We will again learn by doing in this chapter. This time, we take NCAA college basketball as an illustrative example. Specifically, we use two CSV datasets. The first one, teams.csv, contains the list of all college teams that played in the NCAA Division I competition. Each team is associated with a four-digit ID number. The second dataset, stats.csv, contains the score and statistics of every game during the 2014-2015 regular season. Using the techniques learned in Chapter 2, Building and Exploring Graphs, let's parse these datasets and load them into RDDs.
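Before the game statistics are modeled, it may help to see what parsing a single line of teams.csv could look like. The following is only a minimal sketch: the `Team` case class, the `parseTeam` helper, the assumed "id,name" column layout, and the sample lines are all hypothetical, since the actual layout of the file is not described here. It uses plain Scala so that it can be pasted into the Scala REPL or spark-shell as is.

```scala
// Hypothetical record type; the text only states that each team has a four-digit ID.
case class Team(id: Int, name: String)

// Parse one CSV line, ASSUMING the layout "id,name" (not confirmed by the dataset description).
// Header rows and malformed lines yield None instead of throwing an exception.
def parseTeam(line: String): Option[Team] =
  line.split(",", 2).map(_.trim) match {
    case Array(id, name) if id.length == 4 && id.forall(_.isDigit) && name.nonEmpty =>
      Some(Team(id.toInt, name))
    case _ => None
  }

// Made-up sample lines, for illustration only.
val sample = Seq("Team_Id,Team_Name", "1101,Example State", "bad line")

// Keep only the lines that parse successfully.
val teams = sample.flatMap(line => parseTeam(line).toSeq)
```

Given a SparkContext `sc`, the same helper could presumably be applied to the real file with something like `sc.textFile("teams.csv").flatMap(line => parseTeam(line).toSeq)`, which would produce an RDD of `Team` records. Returning an `Option` keeps header rows and malformed lines from aborting the whole job.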
We create a class GameStats that records the statistics of one team during a specific basketball game:

    case class GameStats(
        val score: Int,
        val fieldGoalMade: Int,
        val fieldGoalAttempt: Int,
        val threePointerMade: Int,
        val threePointerAttempt: Int,
        val threeThrowsMade: Int,
        val threeThrowsAttempt: Int,
        val offensiveRebound: Int,
        val defensiveRebound:...