While MNIST is a very good dataset for educational purpose, it is quite small. Let's take a look at a different image recognition problem: given a picture, we want to predict if there is a cat on the image or a dog.
For this, we will use the dataset with dogs and cats pictures from a competition run on kaggle, and the dataset can be downloaded from https://www.kaggle.com/c/dogs-vs-cats.
Let's start by first reading the data.
For the dogs versus cats competition, there are two datasets; training, with 25,000 images of dogs and cats, 50% each, and testing. For the purposes of this chapter, we only need to download the training dataset. Once you have downloaded it, unpack it somewhere.
The filenames look like the following:
The label (
cat) is encoded into the filename.
As you know, the first thing we always do is to split the data into training and validation sets...