When we come to the recognition phase, it is assumed that trained HMMs for each of the speech units in the vocabulary are available. The task is to find the underlying speech unit sequence, given an observation sequence $O = o_1 o_2 \ldots o_T$ corresponding to an unknown sentence. Mathematically this operation can be expressed as

$$\hat{W} = \arg\max_{W} P(W \mid O) = \arg\max_{W} P(O \mid W)\, P(W),$$
where $W = w_1 w_2 \ldots w_S$ is an arbitrary speech unit sequence of arbitrary length *S*. Since $P(W)$ is provided by the statistical language model, all that needs to be done in order to find $\hat{W}$ is to calculate $P(O \mid W)$ for every possible $W$. It is obvious that this procedure is computationally very expensive, because there can be a very large number of candidate sentences, even for a small vocabulary. A cheaper solution is to approximate the procedure by finding the most likely state sequence $\hat{Q}$ in the language model, instead of the speech unit sequence $\hat{W}$. Formally,

$$\hat{Q} = \arg\max_{Q} P(O, Q \mid \lambda),$$

where $Q$ ranges over the state sequences of the composite model $\lambda$, obtained by connecting the unit HMMs according to the language model.
It is then possible to trace back the corresponding speech unit sequence via the state sequence. In order to calculate $\hat{Q}$ we can use the Viterbi algorithm directly, or the method called *level building*, a variant of the Viterbi algorithm. Since Viterbi-based recognition is suboptimal unless each speech unit corresponds to a single HMM state, some attempts have been made to develop efficient methods for calculating the sentence likelihoods themselves. The so-called *N-best algorithm* is one of these.
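
To make the decoding and traceback steps concrete, the following is a minimal log-domain Viterbi sketch in Python with NumPy. The arrays `log_A`, `log_B`, and `log_pi` are illustrative stand-ins for the composite language-model HMM described above, not part of the original text; in a real recognizer each state would additionally carry a speech unit label, so that the unit sequence can be read off along the decoded path.

```python
import numpy as np

def viterbi(log_A, log_B, log_pi):
    """Most likely state sequence through a (composite) HMM, in the log domain.

    log_A  : (N, N) log transition probabilities between states
    log_B  : (T, N) log emission likelihoods of each observation in each state
    log_pi : (N,)   log initial-state probabilities
    Returns the best state sequence q and its log probability.
    """
    T, N = log_B.shape
    delta = np.full((T, N), -np.inf)   # best log score ending in state j at time t
    psi = np.zeros((T, N), dtype=int)  # back-pointers for the traceback

    delta[0] = log_pi + log_B[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A  # entry (i, j): come from i, land in j
        psi[t] = np.argmax(scores, axis=0)      # best predecessor of each state j
        delta[t] = scores[psi[t], np.arange(N)] + log_B[t]

    # Trace back the optimal state sequence from the best final state.
    q = np.zeros(T, dtype=int)
    q[-1] = int(np.argmax(delta[-1]))
    for t in range(T - 2, -1, -1):
        q[t] = psi[t + 1, q[t + 1]]
    return q, float(delta[-1, q[-1]])

# Toy example with made-up numbers: 2 states, 3 observation frames.
log_A = np.log([[0.7, 0.3], [0.4, 0.6]])
log_pi = np.log([0.6, 0.4])
log_B = np.log([[0.9, 0.2], [0.1, 0.8], [0.2, 0.7]])
q, score = viterbi(log_A, log_B, log_pi)  # q is the decoded state sequence
```

Level building applies the same recursion level by level, one level per speech unit position in the sentence, but the core dynamic program is as above.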

There is another problem associated with continuous recognition which does not arise in connection with isolated recognition. Because the recognition (decoding) procedures in continuous mode are complicated and approximate, mismatches can arise between them and the training procedures. For example, in MMI training we try to maximize the probability of the correct sentence against the alternative sentences. But if we then use the Viterbi algorithm for decoding in the recognition phase, there will be a mismatch, because it gives the optimum state sequence and not the optimum sentence. In order to reduce such mismatches, several modifications to the basic MMI training criterion have been suggested [, ]. One such training criterion is the so-called *embedded Viterbi training*, where at each time *t* the probability of the correct state against the alternative states is maximized. Another suggestion is to maximize the probability of the correct state sequence against the alternative sentences. This training method is consistent with Viterbi decoding. Finally, it is worth mentioning that decoding based on the N-best algorithm is more consistent with MMI training than Viterbi-based decoding.
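
For reference, the basic MMI criterion referred to above is commonly written as follows; the notation ($W_c$ for the correct transcription, $\lambda_W$ for the composite model of sentence $W$) is standard usage rather than taken from the original references:

$$F_{\mathrm{MMI}}(\lambda) = \log \frac{P(O \mid \lambda_{W_c})\, P(W_c)}{\sum_{W'} P(O \mid \lambda_{W'})\, P(W')}.$$

The mismatch discussed above arises because both numerator and denominator here are full (all-path) likelihoods, while Viterbi decoding scores only the single best path; replacing each $P(O \mid \lambda_{W})$ with the best-path likelihood $\max_Q P(O, Q \mid \lambda_{W})$ is one way to obtain a criterion consistent with Viterbi decoding, in the spirit of the second modification mentioned above.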
