In Chapter 1, Meet Hunk, we used an Oozie coordinator to import massive amounts of data. The data is partitioned by date and stored in Avro, a binary format that carries its schema. This looks like a production-ready approach, since Avro is well supported across the whole Hadoop ecosystem. Now we are going to create a custom application that uses this data.
Here is a description of the data stored in the base table:
Square ID: The ID of the square that is part of the Milano grid; type: numeric.
Time interval: The beginning of the time interval, expressed as the number of milliseconds elapsed since the Unix epoch (January 1, 1970, 00:00 UTC). The end of the time interval can be obtained by adding 600,000 milliseconds (10 minutes) to this value.
Country code: The phone code of a nation. Depending on the measured activity, this value assumes different meanings that are explained later.
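The time interval arithmetic described above can be sketched in a few lines of Python. This is only an illustration: the function name `interval_bounds`, the constant `INTERVAL_MS`, and the sample timestamp are assumptions made for the example and do not come from the dataset or from Hunk itself.

```python
from datetime import datetime, timedelta, timezone

# Length of one interval: 600,000 ms (10 minutes), as given in the field description.
INTERVAL_MS = 600_000

# The Unix epoch as a timezone-aware UTC datetime.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def interval_bounds(start_ms):
    """Return (start, end) of an interval as timezone-aware UTC datetimes.

    start_ms is the "Time interval" field: milliseconds since the Unix epoch.
    """
    start = EPOCH + timedelta(milliseconds=start_ms)
    end = EPOCH + timedelta(milliseconds=start_ms + INTERVAL_MS)
    return start, end


# Hypothetical sample value, chosen only to demonstrate the conversion.
start, end = interval_bounds(1383264000000)
print(start.isoformat(), end.isoformat())
```

Adding a `timedelta` to a fixed UTC epoch keeps the computation in integer milliseconds until the last step, so the result does not depend on the local time zone of the machine running it.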