Modern Scala Projects

By: gurusamy

Overview of this book

Scala is both a functional and an object-oriented programming language, designed to express common programming patterns in a concise, readable, and type-safe way. Complete with step-by-step instructions, Modern Scala Projects will guide you in exploring Scala's capabilities and learning best practices. Along the way, you'll build applications for professional contexts while understanding the core tasks and components. You'll begin with a project that predicts the class of a flower by implementing a simple machine learning model. Next, you'll create a cancer diagnosis classification pipeline, followed by projects on stock price prediction, spam filtering, fraud detection, and a recommendation engine. The focus is on applying ML techniques that classify data and make predictions, with an emphasis on automating data workflows with the Spark ML pipeline API. The book also showcases the best of Scala's functional libraries and other constructs to help you roll out your own scalable data processing frameworks. By the end of this Scala book, you'll have a firm foundation in Scala programming and will have built some interesting real-world projects to add to your portfolio.

Preface

Who this book is for

What this book covers

To get the most out of this book

Get in touch

Predict the Class of a Flower from the Iris Dataset

A multivariate classification problem

Project overview – problem formulation

Getting started with Spark

Implementing the Iris pipeline

Summary

Questions

Build a Breast Cancer Prognosis Pipeline with the Power of Spark and Scala

Breast cancer classification problem

Getting started

Random Forest breast cancer pipeline

LR breast cancer pipeline

Summary

Questions

Stock Price Predictions

Stock price binary classification problem

Getting started

Implementation objectives

Summary

Questions

Building a Spam Classification Pipeline

Spam classification problem

Project overview – problem formulation

Getting started

Spam classification pipeline

Summary

Questions

Further reading

Build a Fraud Detection System

Fraud detection problem

Project overview – problem formulation

Getting started

Implementation steps

Summary

Questions

Further reading

Build Flights Performance Prediction Model

Overview of flight delay prediction

Getting started

Implementation and deployment

Summary

Questions

Further reading

Building a Recommendation Engine

Problem overviews

Detailed overview

Implementation and deployment

Summary

Other Books You May Enjoy

Leave a review - let other readers know what you think

Predict the Class of a Flower from the Iris Dataset

This chapter kicks off a machine learning (ML) initiative in Scala and Spark. Spark's Machine Learning Library (MLlib), which lives under the spark.ml package and is accessed through its DataFrame-based API, will help us develop scalable data analysis applications. The MLlib DataFrame-based API, also known as Spark ML, provides learning algorithms and pipeline-building tools for data analysis. Starting with this chapter, we will use MLlib's classification algorithms.
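To make the idea of pipeline-building tools concrete, here is a minimal sketch of a Spark ML pipeline in Scala. It is an illustration under stated assumptions, not the pipeline this chapter goes on to build: the object name `PipelineSketch`, the column names, the six sample rows, and the choice of `RandomForestClassifier` as the classification algorithm are all assumptions made for the example. It also assumes the spark-sql and spark-mllib artifacts are on the classpath.

```scala
// Assumption: spark-sql and spark-mllib are available on the classpath.
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.RandomForestClassifier
import org.apache.spark.ml.feature.{StringIndexer, VectorAssembler}
import org.apache.spark.sql.SparkSession

object PipelineSketch {

  // Hypothetical column names chosen for this sketch.
  val featureCols: Array[String] =
    Array("sepalLength", "sepalWidth", "petalLength", "petalWidth")

  // Chains three stages: label indexing, feature assembly, and a classifier.
  def buildPipeline(): Pipeline = {
    // Maps the string class name to a numeric label column.
    val indexer = new StringIndexer()
      .setInputCol("species")
      .setOutputCol("label")

    // Packs the four numeric columns into a single feature vector column.
    val assembler = new VectorAssembler()
      .setInputCols(featureCols)
      .setOutputCol("features")

    // One possible classifier; any spark.ml classifier could take its place.
    val classifier = new RandomForestClassifier()
      .setLabelCol("label")
      .setFeaturesCol("features")

    new Pipeline().setStages(Array(indexer, assembler, classifier))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("pipeline-sketch")
      .master("local[*]") // run locally; an assumption for this sketch
      .getOrCreate()
    import spark.implicits._

    // A handful of illustrative rows, far too few for a meaningful model.
    val df = Seq(
      (5.1, 3.5, 1.4, 0.2, "Iris-setosa"),
      (4.9, 3.0, 1.4, 0.2, "Iris-setosa"),
      (7.0, 3.2, 4.7, 1.4, "Iris-versicolor"),
      (6.4, 3.2, 4.5, 1.5, "Iris-versicolor"),
      (6.3, 3.3, 6.0, 2.5, "Iris-virginica"),
      (5.8, 2.7, 5.1, 1.9, "Iris-virginica")
    ).toDF("sepalLength", "sepalWidth", "petalLength", "petalWidth", "species")

    // fit() runs every stage in order and returns a reusable PipelineModel.
    val model = buildPipeline().fit(df)
    model.transform(df).select("species", "label", "prediction").show()

    spark.stop()
  }
}
```

Keeping pipeline construction in its own method, separate from the `SparkSession` setup, means the stage wiring can be inspected or reused without starting Spark.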

The Spark ecosystem, which offers APIs for R, Python, and Java in addition to Scala, lets readers, whether beginners or seasoned data professionals, make sense of and extract analytics from various datasets.

The Iris dataset is among the simplest and yet most famous data analysis tasks in the ML space. This chapter builds a solution to the classification task that the Iris dataset represents.

Here is the dataset we will refer to:

UCI Machine Learning Repository: Iris Data Set
Accessed July 13, 2018
Website URL: https://archive.ics.uci.edu/ml/datasets/Iris

The overarching learning objective of this chapter is to implement a Scala solution to the multivariate classification task represented by the Iris dataset.
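As a rough illustration of what "multivariate" means here, each Iris observation carries four numeric measurements (sepal length, sepal width, petal length, petal width) plus a class label, and the task is to predict the label from the measurements. The sketch below models one such row in plain Scala, with no Spark dependency. The names `IrisRow`, `parse`, and `IrisSketch` are hypothetical, introduced only for this example.

```scala
import scala.util.Try

// One observation: four numeric features plus the class label to predict.
final case class IrisRow(
  sepalLength: Double,
  sepalWidth: Double,
  petalLength: Double,
  petalWidth: Double,
  species: String
)

object IrisRow {
  // Parses one comma-separated line; returns None if the line is malformed.
  def parse(line: String): Option[IrisRow] =
    line.trim.split(",") match {
      case Array(sl, sw, pl, pw, species) =>
        Try(IrisRow(sl.toDouble, sw.toDouble, pl.toDouble, pw.toDouble, species)).toOption
      case _ => None
    }
}

object IrisSketch {
  def main(args: Array[String]): Unit = {
    // Illustrative lines in the dataset's column order.
    val lines = Seq(
      "5.1,3.5,1.4,0.2,Iris-setosa",
      "7.0,3.2,4.7,1.4,Iris-versicolor",
      "6.3,3.3,6.0,2.5,Iris-virginica",
      "not,a,valid,row" // dropped by parse
    )

    val rows = lines.flatMap(IrisRow.parse)

    // The distinct labels are the classes a classifier has to tell apart.
    val classes = rows.map(_.species).distinct
    println(s"Parsed ${rows.size} rows covering ${classes.size} classes")
  }
}
```

Returning `Option[IrisRow]` rather than throwing lets blank or malformed lines be skipped with `flatMap`.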

The following list is a section-wise breakdown of individual learning outcomes:

A multivariate classification problem
Project overview—problem formulation
Getting started with Spark
Implementing a multiclass classification pipeline

The following section offers the reader an in-depth perspective on the Iris dataset classification problem.
