Apache Spark Graph Processing

NCAA College Basketball datasets


We will again learn by doing in this chapter. This time, we use NCAA college basketball as an illustrative example. Specifically, we use two CSV datasets. The first one, teams.csv, contains the list of all college teams that played in the NCAA Division I competition. Each team is associated with a four-digit ID number. The second dataset, stats.csv, contains the score and statistics of every game played during the 2014-2015 regular season. Using the techniques learned in Chapter 2, Building and Exploring Graphs, let's parse these datasets and load them into RDDs:
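As a rough illustration of the parsing step for teams.csv, here is a minimal sketch in plain Scala. It assumes each line has the form `<four-digit id>,<team name>`. The actual column layout is not shown here, so treat that format, the `Team` case class, and the `TeamParsing.parseTeam` helper as hypothetical names introduced only for this example:

```scala
// Hypothetical record type; the field names are assumptions for illustration.
case class Team(id: Int, name: String)

object TeamParsing {
  // Parse one CSV line of the ASSUMED form "<four-digit id>,<team name>".
  // Returns None for header rows, blank lines, and malformed rows.
  def parseTeam(line: String): Option[Team] =
    line.split(",", 2).map(_.trim) match {
      case Array(id, name)
          if id.length == 4 && id.forall(_.isDigit) && name.nonEmpty =>
        Some(Team(id.toInt, name))
      case _ => None
    }
}

// Possible usage in the Spark shell, where `sc` is the provided SparkContext
// and the file path is an assumption:
//   val teams = sc.textFile("teams.csv")
//                 .flatMap(line => TeamParsing.parseTeam(line))
```

Returning an `Option` lets the header row and any malformed lines drop out when the function is used with `flatMap`, rather than failing the whole job on the first bad record.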

  1. We create a class GameStats that records the statistics of one team during a specific basketball game:

    case class GameStats(
        val score: Int,
        val fieldGoalMade: Int,
        val fieldGoalAttempt: Int,
        val threePointerMade: Int,
        val threePointerAttempt: Int,
        val threeThrowsMade: Int,    // name kept from the source; presumably free throws made
        val threeThrowsAttempt: Int, // presumably free throws attempted
        val offensiveRebound: Int,
        val defensiveRebound:...