When we first introduced natural language processing, we learned that we would need to perform some sort of transformation to represent words numerically. We did this by creating a term-document matrix. Now that we're dealing with pictures, we'll need a similar kind of transformation to render the images in numeric form.
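As a quick refresher, a term-document matrix can be built with nothing more than word counting. Here is a minimal sketch using a toy corpus of three made-up documents (the documents and terms are purely illustrative):

```python
from collections import Counter

# A toy corpus: each document is a short string (hypothetical examples).
documents = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]

# The vocabulary: every unique term across all documents.
vocabulary = sorted({term for doc in documents for term in doc.split()})

# The term-document matrix: one row per term, one column per document,
# where each entry counts how often that term appears in that document.
counts = [Counter(doc.split()) for doc in documents]
matrix = [[c[term] for c in counts] for term in vocabulary]

for term, row in zip(vocabulary, matrix):
    print(f"{term:>5}: {row}")
```

The row for "the", for instance, comes out as [2, 2, 0]: it appears twice in each of the first two documents and not at all in the third.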
Let's take a look at an image of a few handwritten digits:
These particular digits are taken from the MNIST database of handwritten digits. (Yes, this really is a thing.) This database contains tens of thousands of digits like these, collected from the handwriting samples of US Census Bureau employees and high school students.
Let's suppose now that we wanted to use machine learning to recognize these digits. How might we represent them numerically from the data that we have?
One way might be to map each pixel in our image to a value in a numeric matrix of the same size. We could then represent some property of that pixel as a value in...
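This pixel-to-matrix idea can be sketched in a few lines. The tiny 4x4 "image" below is hypothetical, standing in for a real scanned digit, and the property we record for each pixel is its darkness, scaled to the range [0, 1]:

```python
# A tiny hypothetical 4x4 grayscale "image": each entry is a pixel's
# brightness from 0 (black) to 255 (white), as a scanner might record it.
image = [
    [255, 255, 255, 255],
    [255,  40,  40, 255],
    [255, 255,  40, 255],
    [255,  40,  40, 255],
]

# Map each pixel to a value in a numeric matrix of the same size: here,
# darkness scaled to [0, 1], so inked pixels are near 1 and blank paper is 0.
features = [[(255 - pixel) / 255 for pixel in row] for row in image]

# Flatten the matrix into a single feature vector a model could consume.
feature_vector = [value for row in features for value in row]
print(len(feature_vector))  # one value per pixel
```

A real MNIST digit works the same way, just at 28x28 pixels instead of 4x4, giving a vector of 784 values per image.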