Training


We assume that the preprocessing part of the system outputs a sequence of observation vectors, O = o_1, o_2, ..., o_T.


Starting from a certain set of initial values, the parameters of each of the HMMs, λ^l (l = 1, ..., L, one model per class), can be updated as given by eqn. 1.19, with the required gradients given by eqns. 1.44 and 1.48. However, for this particular case, isolated recognition, the likelihoods in the last two equations are calculated in a particular way.
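Eqn. 1.19 referred to here is the generic gradient-based update; schematically (our paraphrase, not reproduced from this excerpt — η denotes the step size and J the optimization criterion):

```latex
\lambda_{\text{new}} \;=\; \lambda_{\text{old}} \;-\; \eta \,
\left. \frac{\partial J}{\partial \lambda} \right|_{\lambda = \lambda_{\text{old}}}
```

with the sign of the step flipped if the criterion is to be maximized rather than minimized.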
First consider the clamped case. Since we have an HMM for each class of units in isolated recognition, we can select the model λ^l of the class l to which the current observation sequence O belongs. Then, starting from eqn. 1.39,


where the second line follows from eqn. 1.3.
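The clamped-case likelihood P(O | λ^l) can be computed with the standard forward algorithm. A minimal sketch for a discrete-observation HMM (variable names are ours, not the text's):

```python
import numpy as np

def forward_likelihood(pi, A, B, obs):
    """Likelihood P(O | lambda) of a discrete HMM via the forward algorithm.

    pi  : (N,)   initial state probabilities
    A   : (N, N) transition probabilities, A[i, j] = P(state j | state i)
    B   : (N, M) emission probabilities, B[i, k] = P(symbol k | state i)
    obs : list of symbol indices o_1 ... o_T
    """
    alpha = pi * B[:, obs[0]]          # alpha_1(i) = pi_i * b_i(o_1)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # induction step over t
    return alpha.sum()                 # P(O | lambda) = sum_i alpha_T(i)
```

In the clamped case, only the model λ^l of the class to which the current sequence belongs is evaluated.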

Similarly, for the free case, starting from eqn. 1.40,


where each term represents the likelihood of the current observation sequence O belonging to class l, as given by the model λ^l. With those likelihoods defined in eqns. 1.52 and 1.53, the gradient equations 1.44 and 1.48 take the forms,



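The contrast between the two cases can be sketched in code. This is our illustration, not the text's equations: eqn. 1.52 is taken as the likelihood under the true class model, and eqn. 1.53 as the total likelihood over all class models, with equal class priors assumed (the forward algorithm is repeated so the sketch is self-contained):

```python
import numpy as np

def forward_likelihood(pi, A, B, obs):
    # Forward algorithm: P(O | lambda) for a discrete HMM.
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()

def clamped_likelihood(models, obs, label):
    # Sketch of eqn. 1.52: likelihood under the model of the true class l.
    return forward_likelihood(*models[label], obs)

def free_likelihood(models, obs):
    # Sketch of eqn. 1.53: total likelihood over all class models
    # (equal class priors are an assumption of this sketch).
    return sum(forward_likelihood(pi, A, B, obs) for (pi, A, B) in models)
```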
Now we can summarize the training procedure as follows.

1. Initialize each HMM, λ^l, with values generated randomly or using an initialization algorithm such as segmental K-means [].
2. Take an observation sequence and update the parameters of each model as given by eqn. 1.19.
3. Go to step (2), unless all the observation sequences have been considered.
4. Repeat steps (2) to (3) until a convergence criterion is satisfied.
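The steps above can be sketched as a gradient-ascent loop on toy data. Everything below is a hedged illustration, not the text's method: the criterion is assumed to be the log-ratio of clamped to free likelihoods, the parameters are kept stochastic via a softmax reparameterization, and a finite-difference gradient stands in for the analytic gradients of eqns. 1.44 and 1.48:

```python
import numpy as np

def softmax(z):
    # Keeps probability vectors/rows valid (positive, summing to 1).
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def forward_likelihood(pi, A, B, obs):
    # Forward algorithm: P(O | lambda) for a discrete HMM.
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()

class HMM:
    # One model per class; unconstrained logits, mapped by softmax.
    def __init__(self, n_states, n_symbols, rng):
        self.logits = [rng.normal(size=n_states),
                       rng.normal(size=(n_states, n_states)),
                       rng.normal(size=(n_states, n_symbols))]
    def params(self):
        zp, zA, zB = self.logits
        return softmax(zp), softmax(zA), softmax(zB)

def criterion(models, obs, label):
    # Assumed criterion: log clamped minus log free likelihood.
    liks = [forward_likelihood(*m.params(), obs) for m in models]
    return np.log(liks[label]) - np.log(sum(liks))

def train(models, data, eta=0.2, epochs=5, eps=1e-5):
    for _ in range(epochs):                          # step (4): repeat
        for obs, label in data:                      # steps (2)-(3): sweep data
            for m in models:
                for z in m.logits:
                    g = np.zeros_like(z)
                    for idx in np.ndindex(z.shape):  # finite-difference gradient
                        z[idx] += eps
                        fp = criterion(models, obs, label)
                        z[idx] -= 2 * eps
                        fm = criterion(models, obs, label)
                        z[idx] += eps
                        g[idx] = (fp - fm) / (2 * eps)
                    z += eta * g                     # eqn. 1.19-style update
    return models
```

The finite-difference gradient is for illustration only; in practice the analytic gradients of eqns. 1.44 and 1.48 are used.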

This procedure can easily be modified if continuous-density HMMs are used, by propagating the gradients via the chain rule to the parameters of the continuous probability distributions. Further, it is worth mentioning that the preprocessor can also be trained simultaneously, by such further back-propagation.
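For instance, with a single-Gaussian output density b_j(o_t) = N(o_t; μ_j, Σ_j) (an illustrative choice, not taken from this excerpt), the chain rule propagates the gradient of the criterion J to the mean as:

```latex
\frac{\partial J}{\partial \mu_j}
  = \sum_{t} \frac{\partial J}{\partial b_j(o_t)}
             \frac{\partial b_j(o_t)}{\partial \mu_j},
\qquad
\frac{\partial b_j(o_t)}{\partial \mu_j}
  = b_j(o_t)\, \Sigma_j^{-1} \left( o_t - \mu_j \right),
```

where ∂J/∂b_j(o_t) is already available from the discrete-case gradients; training the preprocessor simultaneously continues the same chain one stage further, into the preprocessor's parameters.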


Narada Warakagoda
Fri May 10 20:35:10 MET DST 1996
