We first look at feature extraction, a block shared by both pattern analysis for training and pattern matching for recognition.
Various kinds of audio features have been proposed for voice recognition, including linear predictive coding (LPC), cepstral coefficients, and spectral coefficients. Mel-Frequency Cepstral Coefficients (MFCC) are probably the most popular at present because of their simplicity and good performance. Because this chapter focuses on demonstrating rapid prototyping, we use MFCC features and the associated feature-extraction techniques to build a demonstration system. A voice recognition system may, however, combine different kinds of features for better recognition performance.
MFCC is based on the fact that the human perception system has a non-linear frequency response to sounds. The frequency response of the human ear works like a bank of filters spaced linearly at low frequencies and logarithmically...
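The non-linear frequency spacing described above can be illustrated with a small sketch. It assumes one widely used mel-scale convention, mel = 2595 log10(1 + f/700), which is not stated in this chapter and may differ from the variant used later. The function names (hz_to_mel, mel_to_hz, mel_center_frequencies) and the chosen filter count and frequency range are hypothetical and serve only as an illustration.

```python
import math


def hz_to_mel(f_hz):
    # Assumed convention: mel = 2595 * log10(1 + f / 700).
    # Roughly linear below ~1 kHz, logarithmic above.
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    # Inverse of hz_to_mel under the same assumed convention.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_center_frequencies(n_filters, f_min, f_max):
    # Place n_filters center frequencies at equal steps on the mel axis,
    # strictly between f_min and f_max, then map them back to Hz.
    m_min = hz_to_mel(f_min)
    m_max = hz_to_mel(f_max)
    step = (m_max - m_min) / (n_filters + 1)
    return [mel_to_hz(m_min + step * (i + 1)) for i in range(n_filters)]


if __name__ == "__main__":
    # Illustrative values only: 10 filters between 0 Hz and 8 kHz.
    for fc in mel_center_frequencies(10, 0.0, 8000.0):
        print(round(fc, 1))
```

Because the centers are equally spaced in mel, the gap between neighbouring centers in Hz grows with frequency, so the filters sit close together at low frequencies and spread out at high frequencies.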