In ML we try to maximize the probability of a given sequence of
observations $\mathbf{O}^w$, belonging to a given class
*w*, given the HMM $\lambda_w$ of the class *w*, with respect to the parameters
of the model $\lambda_w$. This probability is the
total likelihood of the observations and can be expressed
mathematically as

$$P_{tot}^w = P(\mathbf{O}^w \mid \lambda_w)$$

However, since we consider only one class *w* at a time, we can drop the
subscript and superscript *w*'s. Then the ML criterion can be given
as

$$\lambda_{ML} = \arg\max_{\lambda} \; P(\mathbf{O} \mid \lambda)$$

However, there is no known way to analytically solve for the model $\lambda$ which maximizes the quantity $P(\mathbf{O} \mid \lambda)$. But we can choose the model parameters such that this quantity is locally maximized, using an iterative procedure such as the Baum-Welch method or a gradient-based method, which are described below.
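As a concrete illustration of the quantity being maximized, the total likelihood $P(\mathbf{O} \mid \lambda)$ can be evaluated efficiently with the forward algorithm. The sketch below is a minimal, hypothetical example: the 2-state transition matrix `A`, emission matrix `B`, initial distribution `pi`, and observation sequence `O` are invented for illustration and are not taken from the text.

```python
# Minimal sketch: evaluating P(O | lambda) for a discrete HMM via the
# forward algorithm. All parameter values here are made up for illustration.
import numpy as np

A = np.array([[0.7, 0.3],        # state transition probabilities a_ij
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],        # emission probabilities b_j(k), 2 symbols
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])        # initial state distribution

O = [0, 1, 0, 0]                 # observation sequence (symbol indices)

# Forward recursion: alpha[i] = P(O_1 .. O_t, state_t = i | lambda)
alpha = pi * B[:, O[0]]          # initialization at t = 1
for o in O[1:]:
    alpha = (alpha @ A) * B[:, o]  # induction step

likelihood = alpha.sum()         # total likelihood P(O | lambda)
print(likelihood)
```

An iterative training procedure such as Baum-Welch would re-estimate `A`, `B`, and `pi` so that successive iterations never decrease this likelihood, converging to a local maximum.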

Fri May 10 20:35:10 MET DST 1996