Modern Scala Projects

By: Ilango Gurusamy

Overview of this book

Scala is both a functional programming and object-oriented programming language designed to express common programming patterns in a concise, readable, and type-safe way. Complete with step-by-step instructions, Modern Scala Projects will guide you in exploring Scala capabilities and learning best practices. Along the way, you'll build applications for professional contexts while understanding the core tasks and components. You’ll begin with a project for predicting the class of a flower by implementing a simple machine learning model. Next, you'll create a cancer diagnosis classification pipeline, followed by tackling projects delving into stock price prediction, spam filtering, fraud detection, and a recommendation engine. The focus will be on application of ML techniques that classify data and make predictions, with an emphasis on automating data workflows with the Spark ML pipeline API. The book also showcases the best of Scala’s functional libraries and other constructs to help you roll out your own scalable data processing frameworks. By the end of this Scala book, you’ll have a firm foundation in Scala programming and have built some interesting real-world projects to add to your portfolio.

Spam classification pipeline 


The main development objective of this chapter is to perform spam classification using the following algorithms and feature transformers:

  • Stop word remover
  • Naive Bayes
  • Inverse document frequency
  • Hashing trick transformer
  • Normalizer
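As a rough orientation, the components listed above could be chained into a single Spark ML pipeline along the following lines. This is only a minimal sketch under stated assumptions, not the chapter's actual implementation: the object name `SpamPipelineSketch`, the column names, the toy emails, the feature-vector size, and the extra `Tokenizer` stage (needed here to split text into words, but not part of the list above) are all hypothetical.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.NaiveBayes
import org.apache.spark.ml.feature.{HashingTF, IDF, Normalizer, StopWordsRemover, Tokenizer}
import org.apache.spark.sql.SparkSession

object SpamPipelineSketch {

  // Builds an unfitted pipeline; all column names are illustrative assumptions.
  def buildPipeline(): Pipeline = {
    // Assumed extra stage: split raw text into lowercase words.
    val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
    // Stop word remover: drop common words such as "the" and "a".
    val remover = new StopWordsRemover().setInputCol("words").setOutputCol("filtered")
    // Hashing trick: map each word to an index in a fixed-size term-frequency vector.
    val hashingTF = new HashingTF()
      .setInputCol("filtered")
      .setOutputCol("rawFeatures")
      .setNumFeatures(1 << 10) // assumed size, chosen only for this toy example
    // Inverse document frequency: down-weight terms that occur in many documents.
    val idf = new IDF().setInputCol("rawFeatures").setOutputCol("idfFeatures")
    // Normalizer: scale each feature vector to unit L2 norm.
    val normalizer = new Normalizer()
      .setInputCol("idfFeatures")
      .setOutputCol("features")
      .setP(2.0)
    // Naive Bayes classifier trained on the normalized, non-negative features.
    val naiveBayes = new NaiveBayes().setLabelCol("label").setFeaturesCol("features")

    new Pipeline().setStages(
      Array(tokenizer, remover, hashingTF, idf, normalizer, naiveBayes))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SpamPipelineSketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical toy corpus: label 1.0 marks spam, 0.0 marks regular mail.
    val training = Seq(
      (1.0, "win a free prize now"),
      (1.0, "free money claim your prize"),
      (0.0, "meeting agenda for monday"),
      (0.0, "please review the attached report")
    ).toDF("label", "text")

    // Fit every stage in order, then score unseen documents.
    val model = buildPipeline().fit(training)
    val incoming = Seq(
      "claim your free prize",
      "agenda for the review meeting"
    ).toDF("text")

    model.transform(incoming).select("text", "prediction").show(truncate = false)

    spark.stop()
  }
}
```

Keeping every step as a pipeline stage means the same transformations are applied, in the same order, at training time and at prediction time. A corpus this small says nothing about real accuracy; it only shows how the stages fit together.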

The practical goal of our spam classification task is this: given a new incoming corpus of documents, say, a collection of random emails drawn from either Inbox or Spam, the classifier must be able to identify the spam in it. After all, this is the basis of an effective classifier. The real-world benefit of developing this classifier is to give our readers experience of building their own spam filters. We will first learn how the classifier is put together, and then develop it.

The implementation steps follow in the next section, which takes us straight into developing Scala code in a Spark environment. Spark allows us to write powerful distributed ML programs such as pipelines, and that is exactly what we will set out to do. We will start by understanding...