Up until now, we have reviewed a steady stream of pertinent topics concerning statistics and, specifically, predictive analytics. In this chapter, we provide a tutorial dedicated to applying those concepts and practices to very large datasets. First, we'll define the phrase very large, at least as it is used to describe the data that we want to train our predictive models on or run our statistical algorithms against. Next, we will review the challenges imposed by using bigger data sources, and finally, we will offer some ideas for meeting these challenges.
This chapter is broken down into the following sections:
Getting started
The phases of an analytics project
Experience and data of scale
The characteristics of big data
Training models at scale
The specific challenges (of big data)
A path forward