
Recognition using a HMM continuous recognizer

When we come to the recognition phase, it is assumed that trained HMMs for each of the speech units in the vocabulary are available. The task is to find the underlying speech unit sequence, given an observation sequence $O = o_1 o_2 \ldots o_T$ corresponding to an unknown sentence. Mathematically this operation can be expressed as

$\hat{W} = \arg\max_W P(W | O) = \arg\max_W P(O | W) P(W)$

where $W = w_1 w_2 \ldots w_S$ is an arbitrary speech unit sequence of arbitrary length S. Since $P(W)$ is provided by the statistical language model, the only thing that remains to be done in order to find $\hat{W}$ is to calculate $P(O | W)$ for every possible $W$. It is obvious that this procedure will be computationally very expensive, because there can be a very large number of candidate sentences, even for a small vocabulary. A cheaper solution is to approximate the procedure by finding the most likely state sequence $\hat{Q}$ in the language model $\lambda$, instead of the speech unit sequence $\hat{W}$. Formally,
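The exhaustive decision rule above can be sketched directly: score every candidate sentence by the sum of its acoustic log-likelihood $\log P(O|W)$ and its language model log-probability $\log P(W)$, and keep the best. This is a minimal illustration of why the approach is expensive (one full acoustic scoring pass per candidate); the function names and the callback interface are assumptions for this sketch, not part of the original text.

```python
import math

def recognize_exhaustive(obs, candidates, acoustic_loglik, lm_logprob):
    """Pick the sentence W maximizing P(O|W) P(W), computed in the
    log domain as log P(O|W) + log P(W).

    obs            -- the observation sequence O
    candidates     -- iterable of candidate speech unit sequences W
    acoustic_loglik -- callback returning log P(O|W) (e.g. from the HMMs)
    lm_logprob     -- callback returning log P(W) from the language model
    """
    best_w, best_score = None, -math.inf
    for w in candidates:
        score = acoustic_loglik(obs, w) + lm_logprob(w)
        if score > best_score:
            best_w, best_score = w, score
    return best_w
```

Note that the loop runs once per candidate sentence, so its cost grows with the size of the sentence set, which motivates the cheaper state-sequence approximation described next.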

$\hat{Q} = \arg\max_Q P(O, Q | \lambda)$

It is then possible to trace out the corresponding speech unit sequence via the state sequence. In order to calculate $\hat{Q}$ we can use the Viterbi algorithm directly, or the method called level building, a variant of the Viterbi algorithm. Since Viterbi based recognition is suboptimal, unless each speech unit corresponds to a single HMM state, some attempts have been made to develop efficient methods for calculating the sentence likelihoods themselves. The so called N-best algorithm is one such method.
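The Viterbi computation of $\hat{Q} = \arg\max_Q P(O, Q | \lambda)$ can be sketched for a discrete-observation HMM as follows. This is a generic log-domain Viterbi recursion over a single composite model, assuming transition matrix $A$, emission matrix $B$, and initial distribution $\pi$ are given as log-probabilities; it is an illustrative sketch, not the level-building variant mentioned above.

```python
import numpy as np

def viterbi(log_A, log_B, log_pi, obs):
    """Most likely state sequence Q_hat = argmax_Q P(O, Q | lambda).

    log_A  -- (N, N) log transition probabilities, log_A[i, j] = log a_ij
    log_B  -- (N, M) log emission probabilities for M discrete symbols
    log_pi -- (N,)   log initial state probabilities
    obs    -- sequence of observed symbol indices
    Returns (state sequence, log P(O, Q_hat | lambda)).
    """
    T, N = len(obs), len(log_pi)
    delta = np.full((T, N), -np.inf)   # best path score ending in each state
    psi = np.zeros((T, N), dtype=int)  # backpointers
    delta[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, T):
        # scores[i, j] = delta[t-1, i] + log a_ij : best predecessor per state
        scores = delta[t - 1][:, None] + log_A
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_B[:, obs[t]]
    # Backtrace from the best final state
    q = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        q.append(int(psi[t][q[-1]]))
    return list(reversed(q)), float(delta[-1].max())
```

In a continuous recognizer the same recursion runs over the composite network formed by embedding the unit HMMs in the language model, and the speech unit sequence is read off from which unit each state on the backtraced path belongs to.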

There is another problem associated with continuous recognition, which does not arise in connection with isolated recognition. Due to the complicated and approximate recognition (decoding) procedures used in continuous mode, mismatches can arise between the decoding and training procedures. For example, in MMI training we try to maximize the probability of the correct sentence against the alternative sentences. But if we then use the Viterbi algorithm for decoding in the recognition phase, there will be a mismatch, because the Viterbi algorithm gives the optimum state sequence and not the optimum sentence. In order to reduce such mismatches, several modifications to the basic MMI training criterion have been suggested [, ]. One such training criterion is the so called embedded Viterbi training, where at each time t the probability of the correct state against the alternative states is maximized. Another suggestion is to maximize the probability of the correct state sequence against the alternative state sequences. This training method is consistent with Viterbi decoding. Finally, it is worth mentioning that decoding based on the N-best algorithm is more consistent with MMI training than Viterbi based decoding.
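The frame-level idea behind embedded Viterbi training can be made concrete: at each time t, normalize the score of the correct state against the scores of all competing states, and sum the resulting log-posteriors over frames. The sketch below is only an illustration of that criterion's shape, with assumed inputs (a matrix of per-frame log joint scores), not the exact objective from the cited literature.

```python
import numpy as np

def frame_level_criterion(log_joint, correct_states):
    """Sum over frames of log P(q_t | o_t), the log-posterior of the
    correct state against all competing states at each time t.

    log_joint      -- (T, N) per-frame log scores for each of N states
    correct_states -- length-T sequence of correct state indices q_t
    """
    log_joint = np.asarray(log_joint, dtype=float)
    # Naive per-frame log-sum-exp over states (fine for a small sketch)
    norm = np.log(np.exp(log_joint).sum(axis=1))
    total = 0.0
    for t, q in enumerate(correct_states):
        total += log_joint[t, q] - norm[t]
    return total
```

Maximizing this quantity pushes probability mass toward the correct state at every frame, which is the sense in which the criterion discriminates against the alternative states rather than against whole alternative sentences.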


Narada Warakagoda
Fri May 10 20:35:10 MET DST 1996
