Modern Scala Projects

By: Ilango Gurusamy

Overview of this book

Scala is both a functional and an object-oriented programming language designed to express common programming patterns in a concise, readable, and type-safe way. Complete with step-by-step instructions, Modern Scala Projects will guide you in exploring Scala's capabilities and learning best practices. Along the way, you'll build applications for professional contexts while understanding the core tasks and components. You’ll begin with a project that predicts the class of a flower by implementing a simple machine learning model. Next, you'll create a cancer diagnosis classification pipeline, followed by projects delving into stock price prediction, spam filtering, fraud detection, and a recommendation engine. The focus is on the application of ML techniques that classify data and make predictions, with an emphasis on automating data workflows with the Spark ML pipeline API. The book also showcases the best of Scala’s functional libraries and other constructs to help you roll out your own scalable data processing frameworks. By the end of this Scala book, you’ll have a firm foundation in Scala programming and have built some interesting real-world projects to add to your portfolio.
Table of Contents (14 chapters)
Title Page
Copyright and Credits
Packt Upsell
Contributors
Preface
Index

Preface

Scala, along with the Spark framework, forms a rich and powerful data processing ecosystem. This book is a journey into the depths of this ecosystem. The machine learning (ML) projects presented in this book enable you to create practical, robust data analytics solutions, with an emphasis on automating data workflows with the Spark ML pipeline API. This book showcases a carefully cherry-picked selection of Scala’s functional libraries and other constructs to help readers roll out their own scalable data processing frameworks. The projects in this book enable data practitioners across all industries to gain insights into data that will help organizations obtain a strategic and competitive advantage. Modern Scala Projects focuses on the application of supervised learning ML techniques that classify data and make predictions. You'll begin by working on a project that predicts the class of a flower by implementing a simple machine learning model. Next, you'll create a cancer diagnosis classification pipeline, followed by projects delving into stock price prediction, spam filtering, fraud detection, and a recommendation engine.

By the end of this book, you will be able to build efficient data science projects that fulfill your software requirements.

Who this book is for

This book is for Scala developers who would like to gain hands-on experience with interesting real-world projects. Prior programming experience with Scala is necessary.

What this book covers

Chapter 1, Predict the Class of a Flower from the Iris Dataset, focuses on building a machine learning model that leverages a time-tested statistical method based on regression. The chapter walks the reader through data processing, all the way to training and testing a relatively simple machine learning model.

Chapter 2, Build a Breast Cancer Prognosis Pipeline with the Power of Spark and Scala, taps into a publicly available breast cancer dataset. It evaluates various feature selection algorithms, transforms data, and builds a classification model.

Chapter 3, Stock Price Predictions, acknowledges that stock price prediction can seem an impossible task, and takes a new approach to it: we build and train a neural network model with training data to tackle this apparently intractable problem. A data pipeline, with Spark at its core, distributes training of the model across multiple machines in a cluster. A real-life dataset is fed into the pipeline. The training data goes through preprocessing and normalization steps before a model is trained to fit the data. We may also provide a means to visualize the results of our predictions and to evaluate our model after training.

Chapter 4, Building a Spam Classification Pipeline, has one overarching learning objective: implementing a spam filtering data analysis pipeline. We rely on the machine learning APIs of the Spark ML library, and on its supporting libraries, to build a spam classification pipeline.

Chapter 5, Build a Fraud Detection System, applies machine learning techniques and algorithms to build a practical ML pipeline that helps find questionable charges on consumers’ credit cards. The data is drawn from a publicly accessible Consumer Complaints Database. The chapter demonstrates the tools Spark ML provides for building, evaluating, and tuning a pipeline, and also covers feature extraction, another capability that Spark ML offers.

Chapter 6, Build Flights Performance Prediction Model, shows how to leverage flight departure and arrival data to predict for the user whether their flight will be delayed or canceled. Here, we build a decision tree-based model to derive useful predictors, such as the time of day at which a flight has the minimum chance of delay.

Chapter 7, Building a Recommendation Engine, draws the reader into the implementation of a scalable recommendation engine. The collaborative filtering approach is laid out as the reader walks through a phased recommendation-generating process based on users’ past preferences.

To get the most out of this book

Prior knowledge of Scala is assumed. Familiarity with basic Spark ML concepts will be an advantage.

Download the example code files

You can download the example code files for this book from your account at www.packtpub.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

  1. Log in or register at www.packtpub.com.
  2. Select the SUPPORT tab.
  3. Click on Code Downloads & Errata.
  4. Enter the name of the book in the Search box and follow the onscreen instructions.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

  • WinRAR/7-Zip for Windows
  • Zipeg/iZip/UnRarX for Mac
  • 7-Zip/PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Modern-Scala-Projects. In case there's an update to the code, it will be updated on the existing GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://www.packtpub.com/sites/default/files/downloads/ModernScalaProjects_ColorImages.pdf

Conventions used

There are a number of text conventions used throughout this book.

CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "A variable representing the age of a girl called Huan (Age_Huan)."

A block of code is set as follows:

val dataFrame = spark.createDataFrame(result5).toDF(featureVector, speciesLabel)

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

sc.getConf.getAll
res4: Array[(String, String)] = Array((spark.repl.class.outputDir,C:\Users\Ilango\AppData\Local\Temp\spark-10e24781-9aa8-495c-a8cc-afe121f8252a\repl-c8ccc3f3-62ee-46c7-a1f8-d458019fa05f), (spark.app.name,Spark shell), (spark.sql.catalogImplementation,hive), (spark.driver.port,58009), (spark.debug.maxToStringFields,150),

Any command-line input or output is written as follows:

scala> val dataSetPath = "C:\\Users\\Ilango\\Documents\\Packt\\DevProjects\\Chapter2\\"

Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "Select System info from the Administration panel."

Note

Warnings or important notes appear like this.

Note

Tips and tricks appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: Email [email protected] and mention the book title in the subject of your message. If you have questions about any aspect of this book, please email us at [email protected].

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packtpub.com.