In ML we try to maximize the probability of a given sequence of observations $O^w$, belonging to a given class $w$, given the HMM $\lambda_w$ of the class $w$, with respect to the parameters of the model $\lambda_w$. This probability is the total likelihood of the observations and can be expressed mathematically as
\[
L_{tot} = p(O^w \mid \lambda_w).
\]
However, since we consider only one class $w$ at a time, we can drop the subscript and superscript $w$'s. Then the ML criterion can be given as
\[
\lambda^{*} = \arg\max_{\lambda} \, L_{tot} = \arg\max_{\lambda} \, p(O \mid \lambda).
\]
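As a concrete illustration, the quantity $p(O \mid \lambda)$ can be evaluated with the forward algorithm. The sketch below is a minimal NumPy implementation, assuming a discrete-observation HMM whose parameters are a transition matrix `A`, an emission matrix `B`, and an initial state distribution `pi`; these names follow the usual HMM notation and are assumptions for illustration, not symbols defined in this section.

```python
import numpy as np

def total_likelihood(obs, A, B, pi):
    """Forward algorithm: computes p(O | lambda) for a discrete-observation HMM.

    obs: sequence of observation symbol indices, length T
    A:   (N, N) transition matrix, A[i, j] = p(state j at t+1 | state i at t)
    B:   (N, M) emission matrix,   B[i, k] = p(symbol k | state i)
    pi:  (N,) initial state distribution
    """
    # Initialisation: alpha_1(i) = pi_i * b_i(o_1)
    alpha = pi * B[:, obs[0]]
    # Induction: alpha_{t+1}(j) = [sum_i alpha_t(i) * a_ij] * b_j(o_{t+1})
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    # Termination: p(O | lambda) = sum_i alpha_T(i)
    return alpha.sum()

# Hypothetical two-state example
A  = np.array([[0.7, 0.3], [0.4, 0.6]])
B  = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.6, 0.4])
print(total_likelihood([0, 1, 0], A, B, pi))
```

In ML training, this is the objective value that the iterative procedures below try to increase with respect to the entries of `A`, `B`, and `pi`.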
However, there is no known way to analytically solve for the model $\lambda$ that maximizes the quantity $L_{tot}$. But we can choose the model parameters such that $L_{tot}$ is locally maximized, using an iterative procedure such as the Baum-Welch method or a gradient-based method, which are described below.
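For the Baum-Welch method in particular, each re-estimation step is guaranteed not to decrease the total likelihood: the re-estimated model $\bar{\lambda}$ satisfies
\[
p(O \mid \bar{\lambda}) \ge p(O \mid \lambda),
\]
with equality holding when $\lambda$ is already a critical point of $L_{tot}$. Iterating until the improvement falls below a chosen threshold therefore yields a locally, though not necessarily globally, maximal model.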