Building on Chapter 10, Learning Telco Data on Spark, this chapter extends our Apache Spark machine learning work to a project of learning from open data. In Chapter 9, City Analytics on Spark, we already applied machine learning to open data and built models to predict service requests. Here, we will go a step further: we will explore machine learning approaches for turning more open data into useful insights, and we will build models to score school districts or schools on academic achievement, technology, and other measures. After that, we will build predictive models to explain what affects the ranking and scoring of these districts.
Following the structure established in earlier chapters, we will first review the machine learning methods and related computing for this real-life project of learning from open data. We will then set up Apache Spark computing. At the same time, with our real...