A text-to-speech program that produces speech easily intelligible to humans can let people with visual or reading impairments listen to written text on a home computer, or let you enjoy a book while driving a car. In this recipe, we'll load a text-to-speech model and have it read a text to us. In the How it works... section, we'll go through the model implementation and the model architecture.
For this recipe, please make sure you have a GPU available. On Google Colab, make sure you activate a GPU runtime. We'll also need the wget library, which we can install from the notebook as follows:
!pip install wget
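Before going further, it can be worth confirming that the runtime really exposes a GPU. The following is a minimal sketch, not part of the recipe itself: it assumes an NVIDIA GPU whose driver ships the nvidia-smi command-line tool (as is the case on Colab GPU runtimes), and the helper name gpu_available is our own, hypothetical choice.

```python
import shutil
import subprocess


def gpu_available() -> bool:
    """Return True if nvidia-smi is on the PATH and lists at least one GPU.

    Assumption: an NVIDIA GPU with its driver tooling installed. Other
    accelerators are not detected by this check.
    """
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False
    try:
        # `nvidia-smi -L` prints one line per detected GPU.
        result = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.SubprocessError):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


print("GPU available:", gpu_available())
```

If this reports False on Google Colab, the runtime type is probably still set to CPU and should be switched to a GPU runtime before continuing.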
We also need to clone the pytorch-dc-tts repository from GitHub and install its requirements. Please run this from the notebook (or run it from the terminal without the leading exclamation marks):
from os.path import exists
if not exists('pytorch-dc-tts'):
    !git clone --quiet https://github.com/tugstugi...