Apache Spark Machine Learning Blueprints

By: Alex Liu

Overview of this book

There's a reason why Apache Spark has become one of the most popular tools in Machine Learning: its ability to handle huge datasets at an impressive speed means you can be much more responsive to the data at your disposal. This book shows you Spark at its very best, demonstrating how to connect it with R and unlock maximum value not only from the tool but also from your data. It is packed with a range of project "blueprints" that demonstrate some of the most interesting challenges that Spark can help you tackle. You'll find out how to use Spark notebooks and how to access, clean, and join different datasets before putting your knowledge into practice with some real-world projects. In these projects, you will see how Spark Machine Learning can help you with everything from fraud detection to analyzing customer attrition. You'll also find out how to build a recommendation engine using Spark's parallel computing powers.

Summary


This chapter extends what was described and discussed in the previous chapters (from Chapter 3, A Holistic View on Spark, to Chapter 9, City Analytics on Spark). Here, we took an approach driven by data and analytical needs rather than by predefined projects. We also developed some predictive models to score subscribers on customer churn, on the probability of calling the Call Center, and even on purchasing propensity.

In this chapter, working through a real-life project of learning from telco data, we followed a step-by-step process for using big data to serve the telco company as well as its clients, processing a large amount of data on Apache Spark along the way. We then built several models, including regression and decision tree models, to predict customer churn, Call Center calls, and purchasing. From these models, we developed rules for alerts and scores to help the telco company and its clients. At the same time, we completed some exploratory analytics...
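
The idea of turning a regression model into subscriber scores and an alert rule can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the chapter's own code: the feature names, coefficients, intercept, and the 0.5 alert threshold are all hypothetical, and in practice the coefficients would come from a model fitted on Spark.

```python
import math

# Hypothetical coefficients of a logistic regression churn model.
# In a real project these would be learned from the telco data.
COEFFICIENTS = {
    "dropped_calls": 0.30,
    "support_calls": 0.45,
    "tenure_years": -0.25,
}
INTERCEPT = -1.0
ALERT_THRESHOLD = 0.5  # assumed cutoff for raising an alert


def churn_score(subscriber):
    """Return a churn probability between 0 and 1 for one subscriber."""
    z = INTERCEPT + sum(
        weight * subscriber[name] for name, weight in COEFFICIENTS.items()
    )
    return 1.0 / (1.0 + math.exp(-z))  # logistic (sigmoid) function


def needs_alert(subscriber, threshold=ALERT_THRESHOLD):
    """Simple alert rule: flag subscribers whose score reaches the threshold."""
    return churn_score(subscriber) >= threshold


# Made-up example subscribers, for illustration only.
subscribers = [
    {"id": "A", "dropped_calls": 6, "support_calls": 3, "tenure_years": 1},
    {"id": "B", "dropped_calls": 0, "support_calls": 0, "tenure_years": 8},
]

for s in subscribers:
    print(s["id"], round(churn_score(s), 3), needs_alert(s))
```

The same pattern would apply to the Call Center and purchasing scores: each model produces a probability per subscriber, and a rule on top of that probability decides when to act.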