This section shows how to structure the raw data to build the features. For each country, the data consists of:
A picture of the flag
Some geographical data such as continent, geographic quadrant, area, and population
The language and religion of the country
The goal is to build a model that predicts a country's language from its flag. Most models expect numeric and/or categorical inputs, so we can't use the image of the flag directly as a feature. The solution is to define some features that describe each flag, for instance the number of colors. In this way, we start from a table whose rows correspond to the countries and whose columns correspond to the flag features.
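To make the idea of a feature table concrete, here is a minimal sketch in Python. It is not the dataset used in this chapter: the three countries, the list of tracked colors, and the names `flag_colors`, `tracked_colors`, and `flag_features` are all assumptions chosen for illustration. Each flag is described only by the set of colors it contains, and from that we derive one numeric feature (the number of colors) and a few yes/no features.

```python
# Illustrative input (an assumption, not the chapter's dataset):
# the set of colors that appear on each flag.
flag_colors = {
    "France": {"blue", "white", "red"},
    "Italy": {"green", "white", "red"},
    "Japan": {"white", "red"},
}

# Colors tracked as yes/no features (a hypothetical choice).
tracked_colors = ["red", "green", "blue", "white"]


def flag_features(colors):
    """Turn a set of color names into a flat dict of numeric features."""
    features = {"n_colors": len(colors)}
    for color in tracked_colors:
        # 1 if the flag contains the color, 0 otherwise
        features["has_" + color] = int(color in colors)
    return features


# One row per country, one column per flag feature.
feature_table = {
    country: flag_features(colors) for country, colors in flag_colors.items()
}

for country, row in feature_table.items():
    print(country, row)
```

Each row of `feature_table` is now something a model can consume: a country described by numbers rather than by a picture. The language of each country would be stored alongside these rows as the value to predict.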
Building the matrix of flag attributes from the pictures would take a lot of time. Fortunately, we can use a dataset that already contains some of these features. The data is still a bit messy, so we need to clean and transform it to build a feature table in the right format...