Vision data itself is a multidimensional pixel array and, when we see this type of data, our brain intuitively extracts information from it. Actually, extracting such high level information from a dataset is a very complex operation. This operation should take a huge amount of zeros and ones as the input data and it should convert this data into a conclusion such as, "Yes, this is a car!", or sometimes even, "This is Randy's car!". Maybe this process sounds very easy for you but, for computers, it is not.
A computer and the human brain work very differently. As a result, a computer is much more powerful when making calculations and the brain is much more capable with high level tasks such as scene interpretation. So, this chapter concentrates on a weak element of computer science. Despite this, it is surely possible to interpret vision scenes intelligently by using computers, but the approach of extracting meaningful information from vision data is an indirect way of approaching...