
gradient wrt observation probabilities

Using the chain rule, for either of the likelihoods $P$, free or clamped,

\[ \frac{\partial P}{\partial b_j(k)} \;=\; \sum_{t=1}^{T} \frac{\partial P}{\partial b_j(o_t)}\,\frac{\partial b_j(o_t)}{\partial b_j(k)} \tag{1.45} \]

Differentiating eqns. 1.39 and 1.40 with respect to $b_j(k)$, to obtain results for the free and clamped cases respectively, and using the common result in eqn. 1.28, we get substitutions for both terms on the right-hand side of eqn. 1.45. This substitution yields two separate results, one for the free case and one for the clamped case.

\[ \frac{\partial P^{f}}{\partial b_j(k)} \;=\; \sum_{t=1}^{T} \delta(o_t, v_k)\,\frac{\alpha^{f}_t(j)\,\beta^{f}_t(j)}{b_j(k)} \tag{1.46} \]

where $\delta(o_t, v_k)$ is a Kronecker delta, equal to 1 if $o_t = v_k$ and 0 otherwise. And, for the clamped case,

\[ \frac{\partial P^{c}}{\partial b_j(k)} \;=\; \sum_{t=1}^{T} \delta(o_t, v_k)\,\frac{\alpha^{c}_t(j)\,\beta^{c}_t(j)}{b_j(k)} \tag{1.47} \]

Substituting eqns. 1.46 and 1.47 into eqn. 1.38, we get the required result,

\[ \frac{\partial I}{\partial b_j(k)} \;=\; \frac{1}{b_j(k)} \sum_{t=1}^{T} \delta(o_t, v_k) \left[ \frac{\alpha^{c}_t(j)\,\beta^{c}_t(j)}{P^{c}} \;-\; \frac{\alpha^{f}_t(j)\,\beta^{f}_t(j)}{P^{f}} \right] \tag{1.48} \]

This equation can be given a more compact form by defining,

\[ n^{c}(j,k) = \sum_{t=1}^{T} \delta(o_t, v_k)\,\frac{\alpha^{c}_t(j)\,\beta^{c}_t(j)}{P^{c}}, \qquad n^{f}(j,k) = \sum_{t=1}^{T} \delta(o_t, v_k)\,\frac{\alpha^{f}_t(j)\,\beta^{f}_t(j)}{P^{f}} \tag{1.49} \]

where $\delta(o_t, v_k)$ is a Kronecker delta, and

\[ \Delta n(j,k) = n^{c}(j,k) - n^{f}(j,k) \tag{1.50} \]

With these variables, eqn. 1.48 can be expressed in the following form.

\[ \frac{\partial I}{\partial b_j(k)} \;=\; \frac{\Delta n(j,k)}{b_j(k)} \tag{1.51} \]
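In this compact form the gradient is simply a difference of clamped and free occupancy counts, divided by the current probability. A minimal sketch, assuming the counts are accumulated from precomputed (unscaled) forward/backward variables; the helper names are illustrative:

```python
# Sketch: occupancy counts n(j,k) and the resulting MMI gradient direction.
import numpy as np

def occupancy_counts(alpha, beta, obs, K):
    """n(j,k) = sum_t delta(o_t, v_k) * alpha_t(j) * beta_t(j) / P."""
    P = alpha[-1].sum()
    n = np.zeros((alpha.shape[1], K))
    for t, o in enumerate(obs):
        n[:, o] += alpha[t] * beta[t] / P
    return n

def mmi_grad_b(n_clamped, n_free, B):
    """dI/db_j(k) = (n_c(j,k) - n_f(j,k)) / b_j(k), the compact form."""
    return (n_clamped - n_free) / B
```

Note that when the clamped and free counts coincide, the gradient vanishes: the model already assigns all of the free probability mass to the clamped (correct) hypothesis.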

This equation completely defines the update of the observation probabilities. If, however, continuous densities are used, then we can further propagate this derivative using the chain rule, in exactly the same way as in the ML case. Similar comments apply to the preprocessors.
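For instance, with a scalar single-Gaussian density $b_j(o) = \mathcal{N}(o; \mu_j, \sigma_j^2)$ (an assumed form, for illustration only), the chain rule gives $\partial b_j(o)/\partial \mu_j = b_j(o)\,(o - \mu_j)/\sigma_j^2$, so the derivative with respect to the mean can be sketched as:

```python
# Sketch: propagating dI/db_j(o_t) into the mean of a scalar Gaussian density
# via the chain rule (illustrative names; single-Gaussian form is an assumption).
import numpy as np

def gaussian_pdf(o, mu, var):
    return np.exp(-0.5 * (o - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

def grad_mean(dI_db, obs, mu, var):
    """dI/dmu_j = sum_t dI/db_j(o_t) * b_j(o_t) * (o_t - mu_j) / var_j."""
    g = 0.0
    for t, o in enumerate(obs):
        b = gaussian_pdf(o, mu, var)
        g += dI_db[t] * b * (o - mu) / var
    return g
```

The derivative with respect to the variance follows in the same way, and a further application of the chain rule carries the gradient back into any trainable preprocessor.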



Narada Warakagoda
Fri May 10 20:35:10 MET DST 1996
