Data! Data! Data! I can't make bricks without clay!
- Sir Arthur Conan Doyle
So far, we have learned how to perform analytics on what can be referred to as "small data". However, as the amount of data being produced increases, so does the problem of how to analyze it. At that point, we begin to approach "big data": new approaches to solving problems develop, and sometimes new tools are needed as well.
To some extent, nothing changes. You still want high-quality data. You still want to be able to examine the relationships in the data and cast the problem within a predictive analytics framework.
What does change are the steps needed to achieve that end. The data is more difficult to manage, and new tools have evolved to help you manage it.
One of the tools that has evolved in recent years is Apache Spark.
In this chapter, we will cover some basics of Spark. We will start with a known small...