The best system we investigated in this project is VRCAS_FRQ_REC,
which contains a recurrent loop in the adaptive pre-processing stage.
However, the non-recurrent weights of the recurrent neurons were
fixed by the Hartley transform in order to reduce the number of
parameters. This choice was made only because the same transform is
used in some of the other experiments, not for any principled reason.
This restriction may therefore be a bottleneck for system performance,
and we wish to remove it. However, removing the Hartley transform
directly can degrade the generalization ability.
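The constraint can be pictured as follows. A minimal sketch, assuming the non-recurrent weights of neuron f are fixed to a row of the discrete Hartley transform matrix with the cas kernel; the function name and sizes are illustrative, not from the original system:

```python
import numpy as np

def hartley_weights(num_neurons, frame_len):
    """Rows of the discrete Hartley transform matrix:
    cas(2*pi*f*l/N) = cos(2*pi*f*l/N) + sin(2*pi*f*l/N).
    Every input weight is fixed by the transform, so none of
    these entries is a free parameter."""
    f = np.arange(num_neurons)[:, None]
    l = np.arange(frame_len)[None, :]
    theta = 2.0 * np.pi * f * l / frame_len
    return np.cos(theta) + np.sin(theta)

# 200 inputs per recurrent neuron, as in the original configuration
W = hartley_weights(200, 200)
```

The cas rows are mutually orthogonal (W @ W.T equals N times the identity), which is what makes the transform attractive as a fixed, parameter-free front end.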
We therefore reduce the number of inputs of each recurrent neuron to a
sufficiently low value, such as 50. (Recall that this value was
initially 200.) Note that the frame length of the input speech must
then also be reduced to 50, so for a speech signal of a given length
the number of frames to be processed increases by the same factor, 4.
This can easily be brought back to the original level by sampling the
pre-processed speech vector sequence at every fourth index. The
resulting system can be trained with the same algorithm used for
VRCAS_FRQ_REC, with minor modifications.
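The frame-count bookkeeping can be sketched as follows, assuming non-overlapping frames for simplicity (the actual framing scheme may differ):

```python
import numpy as np

def frame_count(signal_len, frame_len):
    # Number of non-overlapping frames (hypothetical framing scheme).
    return signal_len // frame_len

n = 8000  # illustrative signal length in samples

# Shrinking the frame length from 200 to 50 multiplies the frame count by 4.
assert frame_count(n, 50) == 4 * frame_count(n, 200)

# Pre-processed vector sequence: one 50-dimensional vector per frame.
outputs = np.zeros((frame_count(n, 50), 50))

# Keeping every fourth index restores the original sequence length.
decimated = outputs[::4]
```

This decimation step keeps the downstream stages operating at the same rate as before the frame-length reduction.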
Even the modification described above cannot correct one extremely crude step taken in developing the recurrent-net interpretation of optimal windowing (or the short-time Fourier transform). Namely, we used real values for the weights h(f,l) ( see eqn. ), even though they are actually complex. A more accurate realization can therefore be obtained by expressing the squared magnitude of each complex term as a sum of two squared real terms. This leads to a parallel structure as in fig. (but of course with a feedback loop), which may give better performance.
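The sum-of-squares identity behind the parallel structure can be checked directly. A minimal sketch, without the feedback loop, using an illustrative frame length and frequency index: the cosine- and sine-weighted branches are squared and summed, which reproduces the squared magnitude of the complex Fourier term.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200                       # frame length (illustrative)
f = 3                         # one frequency index (illustrative)
x = rng.standard_normal(N)    # one speech frame

# Two parallel real-weighted branches in place of one complex weight.
cos_branch = np.cos(2 * np.pi * f * np.arange(N) / N) @ x
sin_branch = np.sin(2 * np.pi * f * np.arange(N) / N) @ x
mag_sq = cos_branch**2 + sin_branch**2

# Agrees with the squared magnitude of the complex DFT coefficient.
H = np.exp(-2j * np.pi * f * np.arange(N) / N) @ x
assert np.isclose(mag_sq, np.abs(H)**2)
```

Each recurrent neuron with a complex weight is thus replaced by a pair of real-weighted branches whose squared outputs are summed, at the cost of doubling the number of branches.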