When we dealt with smaller datasets, it was enough to load the entire dataset into memory. This is simple and works fine as long as the dataset fits; however, real-world datasets are often too large to hold in memory all at once. We will now look at how to overcome this issue.
To avoid loading all our data at once, we need to create a data pipeline that feeds our training data to the model. Among other things, this pipeline is responsible for loading a batch of elements from storage, preprocessing the data, and finally feeding the data to our model. Luckily for us, all of this can be accomplished easily with the TensorFlow data API (tf.data).
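As a rough sketch of what such a pipeline looks like with tf.data, the example below reads elements lazily from a Python generator, preprocesses each one with map, groups them into batches, and uses prefetch so the next batch can be prepared while the current one is consumed. The generator load_examples, the preprocess function, the feature shape, and the batch size are all hypothetical stand-ins, not part of the original text; in practice the source would be files on disk rather than a toy generator.

```python
import numpy as np
import tensorflow as tf

def load_examples():
    # Hypothetical stand-in for reading records from storage one at a time,
    # so the full dataset never has to sit in memory at once.
    for i in range(10):
        yield np.full((4,), i, dtype=np.float32), np.int64(i % 2)

def preprocess(features, label):
    # Example preprocessing step (assumed): scale the features down by 10.
    return features / 10.0, label

dataset = tf.data.Dataset.from_generator(
    load_examples,
    output_signature=(
        tf.TensorSpec(shape=(4,), dtype=tf.float32),
        tf.TensorSpec(shape=(), dtype=tf.int64),
    ),
)

# Preprocess elements in parallel, group them into batches of 4,
# and prefetch so loading overlaps with training.
dataset = (
    dataset.map(preprocess, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(4)
    .prefetch(tf.data.AUTOTUNE)
)

for features, labels in dataset:
    print(features.shape, labels.numpy())
```

With 10 elements and a batch size of 4, the last batch holds only the remaining 2 elements; a dataset like this can be passed directly to model.fit.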
For these examples, we assume that our data has been saved into multiple (here, two) TFRecord files like those described previously. Nothing changes if you have more than two; you just include all of their file names when setting things up.
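To have something concrete to work with, here is a minimal sketch that writes two small TFRecord files and then reads them back by passing both file names in a single list to tf.data.TFRecordDataset. The file names, the temporary directory, and the feature schema ("value" as two floats, "label" as one integer) are assumptions made for illustration only; your own files will follow whatever schema you used when saving them.

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical file names in a temporary directory.
tmp_dir = tempfile.mkdtemp()
filenames = [os.path.join(tmp_dir, f"data_{i}.tfrecord") for i in range(2)]

def make_example(value, label):
    # Assumed schema: a float feature vector and an integer label.
    return tf.train.Example(features=tf.train.Features(feature={
        "value": tf.train.Feature(float_list=tf.train.FloatList(value=value)),
        "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
    }))

# Write three serialized examples into each of the two files.
for file_idx, name in enumerate(filenames):
    with tf.io.TFRecordWriter(name) as writer:
        for j in range(3):
            example = make_example([float(file_idx), float(j)], file_idx)
            writer.write(example.SerializeToString())

# Parsing spec matching the assumed schema above.
feature_spec = {
    "value": tf.io.FixedLenFeature([2], tf.float32),
    "label": tf.io.FixedLenFeature([], tf.int64),
}

def parse(serialized):
    parsed = tf.io.parse_single_example(serialized, feature_spec)
    return parsed["value"], parsed["label"]

# Listing every file name is all it takes to read from more than one file.
dataset = tf.data.TFRecordDataset(filenames).map(parse)

for value, label in dataset:
    print(value.numpy(), label.numpy())
```

By default the files are read one after another, so all records from the first file appear before those from the second.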
We start by creating a TFRecord dataset from a list...