Big Data Analytics with Java
By :
Let's take the first steps toward data analysis now. Apache Spark ships with a prebuilt module called Spark SQL, which is used for structured data processing. Using this module, we can execute SQL queries on our underlying data. Spark can read data from a variety of data sources, such as text, CSV, or Parquet files on HDFS, as well as Hive tables and HBase tables. For simple data analysis tasks, whether you are exploring a dataset for the first time or preparing a report of simple statistics for your end users, this module is tremendously useful.
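To make the Spark SQL workflow concrete, here is a minimal sketch of the basic pattern: get a `SparkSession`, obtain a DataFrame, register it as a temporary view, and query it with plain SQL. It assumes Spark 2.x or later with the `spark-sql` dependency on the classpath. The class name `SparkSqlSketch`, the view name `sales`, its columns, and the sample rows are all hypothetical, chosen only for illustration. To stay self-contained, it builds a tiny in-memory table instead of reading from HDFS; the comment inside shows where a file-based read (with placeholder paths) would go instead.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class SparkSqlSketch {

    // Builds a tiny in-memory table, registers it as a SQL view,
    // and sums the amounts per category with a SQL query.
    public static Map<String, Double> totalsByCategory(SparkSession spark) {
        // Hypothetical schema: a category name and a sale amount
        StructType schema = new StructType()
                .add("category", DataTypes.StringType)
                .add("amount", DataTypes.DoubleType);

        // Hypothetical sample rows
        List<Row> rows = Arrays.asList(
                RowFactory.create("books", 12.5),
                RowFactory.create("books", 7.5),
                RowFactory.create("toys", 20.0));

        Dataset<Row> sales = spark.createDataFrame(rows, schema);

        // With real files you would load the DataFrame from a data source instead, e.g.
        //   spark.read().option("header", "true").csv("hdfs:///path/to/file.csv")
        //   spark.read().parquet("hdfs:///path/to/dir")
        // (both paths are placeholders).

        // Expose the DataFrame to SQL under the name "sales"
        sales.createOrReplaceTempView("sales");

        Dataset<Row> totals = spark.sql(
                "SELECT category, SUM(amount) AS total FROM sales GROUP BY category");

        // The result is small, so collect it to the driver; TreeMap keeps keys sorted
        Map<String, Double> result = new TreeMap<>();
        for (Row r : totals.collectAsList()) {
            result.put(r.getString(0), r.getDouble(1));
        }
        return result;
    }

    public static void main(String[] args) {
        // "local[*]" runs Spark inside this JVM using all available cores
        SparkSession spark = SparkSession.builder()
                .appName("SparkSqlSketch")
                .master("local[*]")
                .getOrCreate();
        try {
            System.out.println(totalsByCategory(spark));
        } finally {
            spark.stop();
        }
    }
}
```

Registering a temporary view is what lets ordinary SQL refer to a DataFrame by name; the same query could also be written with the DataFrame API (`groupBy` and `sum`), and Spark plans both forms the same way.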
We will work with two datasets in this chapter. The first is a simple dataset, and the second is a more complex, real-world dataset from an e-commerce store.
In this chapter, we will cover the following topics: