To make the case for CNNs, let's first imagine how we might approach an image classification task using a standard feedforward, fully connected ANN. We start with an image that's 600 x 600 pixels in size with three color channels. There are 1,080,000 pieces of information encoded in such an image (600 x 600 x 3), and therefore our input layer would require 1,080,000 neurons. If the next layer in the network contains 1,000 neurons, we'd need to maintain one billion weights between the first two layers alone. Clearly, the problem is already becoming untenable.
Assuming the ANN in this example can be trained, we'd also run into problems with scale and position invariance. If your task is to identify whether or not an image contains street signs, the network may have difficulty understanding that street signs can be located in any...