Speech recognition, also known as Automatic Speech Recognition (ASR) and speech-to-text (STT/S2T), has a long history. More traditional AI approaches have been used in the field for a long time; however, with the recent interest in deep learning, speech recognition is getting a new boost in performance. Many of the world's major tech companies are interested in speech recognition because of the many different ways it can be used, for example, Voice Search by Google, Siri by Apple, and Alexa by Amazon.
Many companies use pre-trained recognition software. However, in the following recipe, we will demonstrate how to implement and train a speech recognition pipeline from scratch. The accuracy of this newly trained model will be lower than that of the models used in industry, mainly because the quality and volume of the training data play a crucial role in accuracy. Interestingly enough, there is a lot of training data (thousands of hours of open source data...