The Google Speech API uses a powerful machine learning model to convert audio to text. The API recognizes over 110 languages and can process audio either as a live stream or from stored audio files. The Speech API can perform the conversion by three methods: synchronous recognition, asynchronous recognition, and streaming recognition.
In this simple recipe, we'll use the Speech API to convert a recorded message to text using the synchronous recognition method.
The following are the initial setup verification steps to complete before the recipe can be executed:
- Create or select a GCP project
- Enable billing and enable the default APIs (some APIs, such as BigQuery, storage, monitoring, and a few others, are enabled automatically)
- Enable the Google Cloud Speech API for the project
- Make sure that the code has access to GCP APIs, using either the Application Default Credentials (ADC) strategy or direct API keys
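Once these steps are complete, a synchronous recognition call can be sketched against the REST endpoint using only the Python standard library. This is a minimal illustration, not the recipe's own code: the helper names (`build_recognize_request`, `recognize`, `extract_transcripts`) are hypothetical, the audio is assumed to be 16 kHz LINEAR16 (uncompressed PCM), and authentication is assumed to use a direct API key rather than Application Default Credentials.

```python
import base64
import json
import urllib.request

# REST endpoint for synchronous recognition (v1 of the Speech API).
SPEECH_ENDPOINT = "https://speech.googleapis.com/v1/speech:recognize"


def build_recognize_request(audio_bytes, language_code="en-US",
                            sample_rate_hertz=16000):
    """Build the JSON body for a synchronous recognition request.

    Assumes the audio is LINEAR16; adjust "encoding" for other formats.
    """
    return {
        "config": {
            "encoding": "LINEAR16",
            "sampleRateHertz": sample_rate_hertz,
            "languageCode": language_code,
        },
        # Inline audio content must be base64-encoded in the JSON body.
        "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
    }


def recognize(audio_bytes, api_key):
    """Send the request with an API key and return the parsed JSON response."""
    body = json.dumps(build_recognize_request(audio_bytes)).encode("utf-8")
    request = urllib.request.Request(
        f"{SPEECH_ENDPOINT}?key={api_key}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def extract_transcripts(response):
    """Collect the top alternative's transcript from each result, if any."""
    transcripts = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            transcripts.append(alternatives[0].get("transcript", ""))
    return transcripts
```

With a short recording loaded into memory, `extract_transcripts(recognize(audio_bytes, api_key))` would return the recognized text segments. A failed call (for example, if the Speech API is not enabled for the project or the key is invalid) surfaces as an `urllib.error.HTTPError`, which makes this a quick way to verify the setup steps above.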