Modern Scala Projects

By: Ilango Gurusamy

Overview of this book

Scala is both a functional and an object-oriented programming language, designed to express common programming patterns in a concise, readable, and type-safe way. Complete with step-by-step instructions, Modern Scala Projects will guide you in exploring Scala's capabilities and learning best practices. Along the way, you'll build applications for professional contexts while understanding the core tasks and components. You'll begin with a project that predicts the class of a flower by implementing a simple machine learning model. Next, you'll create a cancer diagnosis classification pipeline, followed by projects on stock price prediction, spam filtering, fraud detection, and a recommendation engine. The focus is on applying ML techniques that classify data and make predictions, with an emphasis on automating data workflows with the Spark ML pipeline API. The book also showcases the best of Scala's functional libraries and other constructs to help you roll out your own scalable data processing frameworks. By the end of this book, you'll have a firm foundation in Scala programming and will have built some interesting real-world projects to add to your portfolio.

Project overview – problem formulation


In this chapter, the goal is to build a spam classifier, one capable of identifying spam terms in email messages even when they are mixed in with regular, expected email content. Spam messages are typically sent with the same content to many recipients, unlike regular messages. We start with two email datasets, one representing ham (legitimate email) and one representing spam. After several preprocessing stages, we fit the model on a training set, say 70% of the entire dataset.
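As a rough illustration of the labeling and 70% training split described above, here is a minimal sketch in plain Scala, with no Spark dependency. The names `LabeledEmail`, `labelAll`, and `trainTestSplit`, the 1.0/0.0 label encoding, and the seed value are all assumptions made for this example, not the chapter's own code. In a Spark application, the same split would more likely be done with `randomSplit` on a DataFrame.

```scala
import scala.util.Random

// Hypothetical record type: label 1.0 marks spam, 0.0 marks ham (an assumed encoding).
final case class LabeledEmail(text: String, label: Double)

object SplitSketch {
  // Tag every message from the two source datasets with its class label.
  def labelAll(ham: Seq[String], spam: Seq[String]): Seq[LabeledEmail] =
    ham.map(LabeledEmail(_, 0.0)) ++ spam.map(LabeledEmail(_, 1.0))

  // Shuffle with a fixed seed for repeatability, then cut at trainFraction of the rows.
  def trainTestSplit(
      data: Seq[LabeledEmail],
      trainFraction: Double = 0.7,
      seed: Long = 42L
  ): (Seq[LabeledEmail], Seq[LabeledEmail]) = {
    require(trainFraction > 0.0 && trainFraction < 1.0, "trainFraction must be in (0, 1)")
    val shuffled = new Random(seed).shuffle(data)
    val cut = math.round(shuffled.size * trainFraction).toInt
    shuffled.splitAt(cut)
  }

  def main(args: Array[String]): Unit = {
    // Tiny made-up messages, purely for demonstration.
    val ham = Seq("lunch at noon?", "meeting notes attached", "see you tomorrow",
      "quarterly report draft", "call me when free", "thanks for the update")
    val spam = Seq("win a free prize now", "cheap loans approved",
      "claim your reward", "limited offer click here")

    val (train, test) = trainTestSplit(labelAll(ham, spam))
    println(s"train=${train.size} test=${test.size}")
  }
}
```

Shuffling before the cut matters here because the ham and spam rows are concatenated in order. Without it, the held-out rows would all come from one class.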

This is a typical spam filtering application in that it works on text. We put algorithms to work that help the ML process detect the words, phrases, and terms most likely to be found in spam emails. Next, we will go over the ML workflow at a high level in relation to spam filtering.

 

The ML workflow is as follows:

  • We will develop a pipeline that uses DataFrames
  • A dataframe contains a predictions column...