In ML we try to maximize the probability of a given sequence of observations $O^{w}$, belonging to a given class $w$, given the HMM $\lambda_{w}$ of the class $w$, with respect to the parameters of the model $\lambda_{w}$. This probability is the total likelihood of the observations and can be expressed mathematically as
$$L_{tot} = P(O^{w} \mid \lambda_{w}).$$
However, since we consider only one class $w$ at a time, we can drop the subscript and superscript $w$. The ML criterion can then be given as
$$\lambda^{*} = \arg\max_{\lambda} P(O \mid \lambda).$$
However, there is no known way to analytically solve for the model $\lambda$ which maximizes the quantity $P(O \mid \lambda)$. But we can choose model parameters such that this quantity is locally maximized, using an iterative procedure such as the Baum-Welch method or a gradient-based method, both of which are described below.
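Although $P(O \mid \lambda)$ cannot be maximized analytically, it can be evaluated efficiently for any fixed $\lambda$ with the forward algorithm, which is the quantity the iterative procedures below repeatedly compute and improve. The following is a minimal sketch for a discrete-observation HMM; the parameter values ($\pi$, $A$, $B$) and the observation sequence are hypothetical, chosen purely for illustration.

```python
# Hypothetical two-state HMM with three discrete observation symbols.
pi = [0.6, 0.4]                      # initial state probabilities
A = [[0.7, 0.3],                     # A[i][j] = P(state j at t+1 | state i at t)
     [0.4, 0.6]]
B = [[0.5, 0.4, 0.1],                # B[i][k] = P(symbol k | state i)
     [0.1, 0.3, 0.6]]
O = [0, 2, 1]                        # an observation sequence (symbol indices)

def total_likelihood(pi, A, B, O):
    """Forward algorithm: computes P(O | lambda) by summing over all
    state paths in O(T * N^2) time rather than enumerating N^T paths."""
    N = len(pi)
    # Initialisation: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][O[0]] for i in range(N)]
    # Induction: alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(o_{t+1})
    for o in O[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(N)) * B[j][o]
                 for j in range(N)]
    # Termination: P(O | lambda) = sum_i alpha_T(i)
    return sum(alpha)

print(total_likelihood(pi, A, B, O))
```

An iterative training method then adjusts $\pi$, $A$, and $B$ so that this value increases from one iteration to the next until a local maximum is reached.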