Before we get our feet wet in the world of complex analytics, we will take small steps and learn some basic statistical analysis first. This will help us get familiar with the approach that we will use on big data for other solutions as well. For our initial analysis, we will take a simple cars JSON dataset that has details about a few cars from different countries. We will analyze it using Spark SQL and see how easy it is to query and analyze datasets with it. Spark SQL is handy for basic analytics and is well suited to big data: it can run on massive datasets, and the data can reside in HDFS.
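To give a feel for the kind of question such a dataset can answer, here is a minimal sketch of a count-per-country query. It is not Spark SQL: it uses Python's built-in sqlite3 module as a stand-in so that it runs without a cluster, and only the shape of the SQL query is meant to carry over. The sample records, the field names make_display and make_country, and the function name count_makes_by_country are all assumptions made for illustration, not the confirmed schema of the cars dataset.

```python
import json
import sqlite3

# Hypothetical sample records. The field names are assumptions for
# illustration only, not the confirmed schema of the cars dataset.
SAMPLE_JSON = """
[
  {"make_display": "Ford",  "make_country": "USA"},
  {"make_display": "Buick", "make_country": "USA"},
  {"make_display": "Audi",  "make_country": "Germany"},
  {"make_display": "Fiat",  "make_country": "Italy"}
]
"""


def count_makes_by_country(json_text):
    """Load JSON records into an in-memory SQLite table and count makes per country."""
    records = json.loads(json_text)
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE cars (make_display TEXT, make_country TEXT)")
        # Each record is a dict, so named placeholders map directly to its keys.
        conn.executemany(
            "INSERT INTO cars VALUES (:make_display, :make_country)", records
        )
        # A basic aggregate: number of makes per country, largest first,
        # with ties broken alphabetically by country.
        rows = conn.execute(
            "SELECT make_country, COUNT(*) AS total "
            "FROM cars GROUP BY make_country "
            "ORDER BY total DESC, make_country"
        ).fetchall()
    finally:
        conn.close()
    return rows


if __name__ == "__main__":
    for country, total in count_makes_by_country(SAMPLE_JSON):
        print(country, total)
```

The query itself is ordinary SQL with GROUP BY and COUNT, which is the style of statement Spark SQL is designed to run over much larger data.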
To start with a simple case study, we are using a cars dataset. This dataset can be obtained from http://www.carqueryapi.com/, specifically from the link http://www.carqueryapi.com/api/0.3/?callback=?&cmd=getMakes. It contains data about cars in different countries and is in JSON format. It is not a very big dataset from the perspective...