Although voice recognition systems differ in many ways, most of them, if not all, require some form of training that adapts the system to the speech it is expected to recognize. They also share a set of fundamental techniques and a similar structure, as shown in the following figure:
A voice recognition system usually consists of three function blocks:
Feature extraction
Pattern analysis (for training)
Pattern matching (for recognition)
The workflow of a voice recognition system can be divided into two sessions:
Training
Recognition
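The three blocks and two sessions listed above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not how any particular system is built: the function names (`extract_features`, `train`, `recognize`, `tone`) are hypothetical, the two features (mean frame energy and zero-crossing rate) stand in for whatever features a real system uses, and the "pattern analysis" is a plain per-label average matched by nearest distance.

```python
import math

def extract_features(samples, frame_size=80):
    """Feature extraction: reduce many samples to two numbers
    (mean frame energy, mean zero-crossing rate). Toy features, assumed."""
    energies, zcrs = [], []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        energies.append(sum(s * s for s in frame) / frame_size)
        # Count sign changes between neighbouring samples in the frame.
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
        zcrs.append(crossings / (frame_size - 1))
    n = len(energies)
    return (sum(energies) / n, sum(zcrs) / n)

def train(labelled_recordings):
    """Pattern analysis (training session): keep the mean feature
    vector observed for each label."""
    sums = {}
    for label, samples in labelled_recordings:
        feats = extract_features(samples)
        total, count = sums.get(label, ((0.0, 0.0), 0))
        sums[label] = (tuple(t + f for t, f in zip(total, feats)), count + 1)
    return {label: tuple(t / count for t in total)
            for label, (total, count) in sums.items()}

def recognize(model, samples):
    """Pattern matching (recognition session): return the label whose
    stored pattern is closest to the features of the new input."""
    feats = extract_features(samples)
    return min(model, key=lambda label: math.dist(model[label], feats))

def tone(freq_hz, n=800, rate=8000):
    """Synthetic stand-in for a recording: a pure sine tone."""
    return [math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n)]

# Training session: labelled examples go through extraction and analysis.
model = train([("low", tone(200)), ("low", tone(250)),
               ("high", tone(1500)), ("high", tone(1800))])

# Recognition session: new input goes through extraction and matching.
print(recognize(model, tone(220)))
print(recognize(model, tone(1600)))
```

Both sessions call the same `extract_features` function, which mirrors the shared feature extraction block; only the second stage differs between training and recognition. With these synthetic tones the zero-crossing rates of the two groups are far apart, so the 220 Hz input should match the "low" pattern and the 1600 Hz input the "high" one.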
Feature extraction is the process of transforming a large number of audio samples into a relatively small set of discriminative features. The features are chosen carefully so that the feature set represents the voice well without requiring the full volume of audio samples. Pattern analysis finds the distribution of these features and their relationship with the meanings of the...