In the last tutorial, we saw how to use siamese networks to recognize a face. Now we will see how to use siamese networks to recognize audio. We will train our network to differentiate between the sound of a dog and the sound of a cat. The cat and dog audio dataset can be downloaded from Kaggle: https://www.kaggle.com/mmoreaux/audio-cats-and-dogs#cats_dogs.zip.
Once we have downloaded the data, we split it into three folders: Dogs, Sub_dogs, and Cats. In Dogs and Sub_dogs, we place the dogs' barking audio, and in Cats, we place the cats' audio. The objective of our network is to recognize whether an audio clip is a dog barking or some different sound. As we know, a siamese network takes its input as a pair: we select one audio clip from the Dogs folder and one from the Sub_dogs folder and mark them as a genuine pair, and we select one clip from the Dogs folder and one from the Cats folder and mark them as an imposter pair. That is, (dogs, subdogs...
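The pairing rule can be sketched in plain Python as follows. This is a minimal illustration, not the tutorial's own code: the helper names `list_audio` and `make_pairs` are hypothetical, the clips are assumed to be `.wav` files, and the labels (1 for a genuine pair, 0 for an imposter pair) are an assumed convention. The pairs hold only file paths; turning each clip into features for the network is a separate step.

```python
import random
from pathlib import Path
from typing import List, Tuple

# (first clip path, second clip path, label); label convention is assumed
Pair = Tuple[str, str, int]


def list_audio(folder: str) -> List[str]:
    """Hypothetical helper: sorted paths of the .wav files in `folder`."""
    return sorted(str(p) for p in Path(folder).glob("*.wav"))


def make_pairs(dogs: List[str], sub_dogs: List[str], cats: List[str],
               n_pairs: int, seed: int = 0) -> List[Pair]:
    """Hypothetical helper: build n_pairs genuine and n_pairs imposter pairs.

    Genuine pair  (label 1): one clip from Dogs and one from Sub_dogs.
    Imposter pair (label 0): one clip from Dogs and one from Cats.
    """
    if not (dogs and sub_dogs and cats):
        raise ValueError("every folder needs at least one audio file")
    rng = random.Random(seed)  # seeded so the sampled pairs are reproducible
    pairs: List[Pair] = []
    for _ in range(n_pairs):
        # Both clips are dog barks, so the network should call them "same".
        pairs.append((rng.choice(dogs), rng.choice(sub_dogs), 1))
        # A dog bark against a cat sound, so the network should call them "different".
        pairs.append((rng.choice(dogs), rng.choice(cats), 0))
    return pairs


if __name__ == "__main__":
    # In-memory file names stand in for real folder contents.
    demo = make_pairs(["d1.wav", "d2.wav"], ["s1.wav"], ["c1.wav"], n_pairs=2)
    for first, second, label in demo:
        print(first, second, label)
```

With the data on disk, one would call something like `make_pairs(list_audio("data/Dogs"), list_audio("data/Sub_dogs"), list_audio("data/Cats"), n_pairs=100)`, where the `data/` prefix is a hypothetical location. Interleaving one genuine and one imposter pair per iteration keeps the two classes balanced.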