Text to speech with Amazon Polly
Amazon Polly converts text into speech using pretrained deep learning models. It's a fully managed service, so we don't have to provision servers or train and host any models ourselves. You provide the input either as plain text or in Speech Synthesis Markup Language (SSML) format, and Amazon Polly returns an audio stream. It also gives you different languages and voices to choose from, with both male and female options. The output audio can be saved in MP3 format for later use in a web or mobile application, or the output can be requested as JSON speech marks, which describe the timing and position of the spoken text.
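As a rough illustration of the request described above, the following Python sketch builds the parameters for a synthesis call and writes the returned audio stream to a file. It assumes the boto3 SDK and its Polly `synthesize_speech` operation. The helper names `build_request` and `synthesize_to_file` are hypothetical, and the voice "Joanna" is just one possible choice.

```python
def build_request(text, voice_id="Joanna", output_format="mp3", is_ssml=False):
    """Build keyword arguments for a Polly synthesize_speech call.

    The default voice and format are assumptions made for this sketch.
    """
    return {
        "Text": text,
        # Polly accepts plain text or SSML; TextType tells it which one this is.
        "TextType": "ssml" if is_ssml else "text",
        "VoiceId": voice_id,
        "OutputFormat": output_format,
    }


def synthesize_to_file(polly_client, text, path, **options):
    """Synthesize `text` with the given client and save the audio to `path`."""
    response = polly_client.synthesize_speech(**build_request(text, **options))
    # The audio comes back as a stream under the "AudioStream" key.
    with open(path, "wb") as audio_file:
        audio_file.write(response["AudioStream"].read())
    return path
```

With AWS credentials and a region configured, a call such as `synthesize_to_file(boto3.client("polly"), "Baba went to the library", "speech.mp3")` should produce an MP3 file. Passing the client in as an argument keeps the helper easy to exercise with a stand-in object during testing.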
For example, if we were to input the text "Baba went to the library" into Amazon Polly, the output speech mark object would look as follows:
"went" begins 370 milliseconds after the audio stream begins, and starts...
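To make the speech mark description more concrete, here is a minimal Python sketch that parses speech marks and prints the timing of each word. It assumes the marks arrive as one JSON object per line with `time`, `type`, `start`, `end`, and `value` fields. The sample record is hand-written for illustration rather than captured Polly output: the 370 ms timing comes from the example above, while the `start` and `end` offsets are assumptions worked out by hand from the sentence. The function names are hypothetical.

```python
import json


def parse_speech_marks(payload):
    """Parse newline-delimited JSON speech marks into a list of dicts."""
    return [json.loads(line) for line in payload.splitlines() if line.strip()]


def describe(mark):
    """Summarize when a marked word begins relative to the audio stream."""
    return f'"{mark["value"]}" begins {mark["time"]} ms after the audio stream starts'


# Hand-written sample record (an assumption, not real service output).
SAMPLE = '{"time": 370, "type": "word", "start": 5, "end": 9, "value": "went"}\n'

marks = parse_speech_marks(SAMPLE)
print(describe(marks[0]))
```

Timing information of this kind is what lets an application highlight words on screen in step with the audio.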