Summary

This project studied a hybrid ANN-HMM speech recognizer with neural-network-based adaptive pre-processing, with emphasis on the pre-processing part. The feasibility of such a system was demonstrated first by formulating the classical pre-processing, based on mel-scale cepstral coefficients, as a neural network, and then by optimizing the whole system as a single unit under the maximum mutual information (MMI) criterion. Twelve experiments on a speaker-independent, isolated-phoneme recognition task with five broad classes were carried out, with various modifications introduced to the pre-processing part. Across these experiments the degree of adaptivity of the pre-processing was varied over a wide range, from complete non-adaptivity to full adaptivity with different structures.
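The idea of writing mel-cepstral pre-processing as a network can be sketched as below. This is only an illustration under stated assumptions, not the formulation used in the project: the triangular filter shapes, the mel formula, the Hamming window, the unnormalized DCT, and all function names and sizes (`mel_filterbank`, `dct_matrix`, `mfcc_as_network`, 20 filters, 12 coefficients, 8 kHz) are hypothetical choices. The point is that the mel filterbank and the cosine transform are both plain weight matrices, with a logarithm as the nonlinearity between them, so they could serve as initial values for trainable layers.

```python
import numpy as np

def mel(f_hz):
    # Common mel-scale approximation (an assumption; other variants exist).
    return 2595.0 * np.log10(1.0 + f_hz / 700.0)

def inv_mel(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular mel filters as a weight matrix, shape (n_filters, n_fft//2 + 1)."""
    n_bins = n_fft // 2 + 1
    edges_hz = inv_mel(np.linspace(mel(0.0), mel(sample_rate / 2.0), n_filters + 2))
    bin_freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    W = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        lo, mid, hi = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (bin_freqs - lo) / (mid - lo)
        falling = (hi - bin_freqs) / (hi - mid)
        W[i] = np.maximum(0.0, np.minimum(rising, falling))
    return W

def dct_matrix(n_ceps, n_filters):
    """Unnormalized DCT-II basis as a weight matrix, shape (n_ceps, n_filters)."""
    k = np.arange(n_ceps)[:, None]
    n = np.arange(n_filters)[None, :]
    return np.cos(np.pi * k * (n + 0.5) / n_filters)

def mfcc_as_network(frame, W_mel, W_dct, eps=1e-10):
    """Forward pass: fixed spectral stage, then two linear layers around a log."""
    power = np.abs(np.fft.rfft(frame * np.hamming(len(frame)))) ** 2
    hidden = np.log(W_mel @ power + eps)   # linear layer followed by log nonlinearity
    return W_dct @ hidden                  # linear output layer: cepstral coefficients

# Usage: one 256-sample frame of a 440 Hz tone sampled at 8 kHz (illustrative values).
W_mel = mel_filterbank(n_filters=20, n_fft=256, sample_rate=8000)
W_dct = dct_matrix(n_ceps=12, n_filters=20)
frame = np.sin(2 * np.pi * 440 * np.arange(256) / 8000)
cepstra = mfcc_as_network(frame, W_mel, W_dct)
```

With `W_mel` and `W_dct` held fixed, this corresponds to the non-adaptive end of the range; treating them as free parameters to be updated by gradient-based training corresponds to the adaptive end.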

It was shown that full adaptivity suffers from poor generalization, while non-adaptivity has inferior learning ability. It was also demonstrated that generalization can be improved by reducing the number of free parameters, and several methods of parameter reduction were tried.

For pre-processing, both MLP and recurrent structures were tried. A structure with a layer of recurrent neurons at the front end of the system gave the best recognition performance, and a mathematical treatment was given to show that such a recurrent layer performs a short-time Fourier transform that is optimal in the sense of recognition error.
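Why a recurrent neuron can carry out a short-time Fourier analysis can be illustrated with a small sketch. It assumes a single linear neuron with one complex feedback weight, which is a hypothetical simplification and not necessarily the neuron model used in the project; the names `recurrent_neuron`, `windowed_dft`, `omega` and `decay` are likewise illustrative. Unrolling the recurrence y[n] = decay * exp(j*omega) * y[n-1] + x[n] gives a Fourier coefficient of the input at frequency omega under an exponentially decaying window, up to a phase factor exp(j*omega*n).

```python
import numpy as np

def recurrent_neuron(x, omega, decay):
    """Linear recurrent neuron: y[n] = decay * exp(j*omega) * y[n-1] + x[n]."""
    w = decay * np.exp(1j * omega)      # single complex feedback weight
    y = np.zeros(len(x), dtype=complex)
    state = 0.0 + 0.0j
    for n, sample in enumerate(x):
        state = w * state + sample
        y[n] = state
    return y

def windowed_dft(x, omega, decay, n):
    """Direct sum over m = 0..n of decay**(n-m) * x[m] * exp(-j*omega*m)."""
    m = np.arange(n + 1)
    return np.sum(decay ** (n - m) * x[: n + 1] * np.exp(-1j * omega * m))

# Usage: compare the neuron output with the directly computed windowed transform.
rng = np.random.default_rng(0)
x = rng.standard_normal(64)
omega, decay = 2 * np.pi * 0.1, 0.95
y = recurrent_neuron(x, omega, decay)
n = len(x) - 1
direct = np.exp(1j * omega * n) * windowed_dft(x, omega, decay, n)
print(np.allclose(y[n], direct))
```

In this sketch `omega` sets the analysis frequency and `decay` sets the effective window length. If both are trainable, the training criterion decides which frequencies and window lengths the front end uses, which is one way to read "optimal in the sense of recognition error".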

Finally, a comparative evaluation of all the modified versions of the system was given, to illustrate the superiority and irreplaceability of adaptive pre-processing.

Narada Warakagoda
Fri May 10 20:35:10 MET DST 1996
