Modern Scala Projects

By: Ilango Gurusamy

Overview of this book

Scala is both a functional and an object-oriented programming language, designed to express common programming patterns in a concise, readable, and type-safe way. Complete with step-by-step instructions, Modern Scala Projects will guide you in exploring Scala's capabilities and learning best practices. Along the way, you'll build applications for professional contexts while understanding the core tasks and components. You'll begin with a project for predicting the class of a flower by implementing a simple machine learning model. Next, you'll create a cancer diagnosis classification pipeline, followed by projects on stock price prediction, spam filtering, fraud detection, and a recommendation engine. The focus is on applying ML techniques that classify data and make predictions, with an emphasis on automating data workflows with the Spark ML pipeline API. The book also showcases the best of Scala's functional libraries and other constructs to help you roll out your own scalable data processing frameworks. By the end of this Scala book, you'll have a firm foundation in Scala programming and will have built some interesting real-world projects to add to your portfolio.

Questions


We will now list a set of questions to test your knowledge of what you have learned so far:

  • What do you understand by logistic regression? Why is it important?
  • How does logistic regression differ from linear regression?
  • Name one powerful feature of BinaryClassifier.
  • What are the feature variables in the breast cancer dataset?

The breast cancer dataset problem is a classification task that can also be approached with other machine learning algorithms. Prominent among them are support vector machines (SVMs), k-nearest neighbors, and decision trees. When you run the pipelines developed in this chapter, compare how long each algorithm takes to build a model and how many of the dataset's input rows each one classifies correctly.
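
One way to organize that comparison is with two small helpers: one that times an arbitrary block of code, and one that computes the fraction of rows classified correctly. The following is a minimal plain-Scala sketch, not code from this chapter. The names `CompareClassifiers`, `timed`, and `accuracy` are hypothetical, and the labels and predictions in `main` are made-up stand-in values rather than results from the breast cancer dataset. In practice, the timed block would wrap the model-fitting step of each pipeline.

```scala
object CompareClassifiers {

  /** Runs `block` once and returns its result with the elapsed wall-clock time in milliseconds. */
  def timed[A](block: => A): (A, Double) = {
    val start = System.nanoTime()
    val result = block
    val elapsedMs = (System.nanoTime() - start) / 1e6
    (result, elapsedMs)
  }

  /** Fraction of rows whose predicted label equals the true label. */
  def accuracy(predictions: Seq[Double], labels: Seq[Double]): Double = {
    require(predictions.length == labels.length, "predictions and labels must have the same length")
    require(labels.nonEmpty, "at least one row is required")
    val correct = predictions.zip(labels).count { case (p, l) => p == l }
    correct.toDouble / labels.length
  }

  def main(args: Array[String]): Unit = {
    // Made-up true labels (assumption: 1.0 and 0.0 encode the two classes).
    val labels = Seq(1.0, 0.0, 1.0, 1.0, 0.0)

    // Stand-in for "fit a model, then predict": replace the body of this block
    // with the real training and prediction code for each algorithm.
    val (predictions, elapsedMs) = timed {
      Seq(1.0, 0.0, 0.0, 1.0, 0.0)
    }

    println(f"correctly classified fraction = ${accuracy(predictions, labels)}%.2f")
    println(f"elapsed time = $elapsedMs%.3f ms")
  }
}
```

Running the same two helpers around each algorithm gives one time and one accuracy figure per model, which can then be compared side by side. A single timing run is noisy, so repeating each measurement a few times is a reasonable precaution.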

This concludes the chapter. The next chapter implements a new kind of pipeline: a stock price prediction pipeline. We will see how to use Spark to work on larger datasets. Stock price prediction is not an...