Modern Scala Projects

By: Ilango Gurusamy

Overview of this book

Scala is both a functional and an object-oriented programming language, designed to express common programming patterns in a concise, readable, and type-safe way. Complete with step-by-step instructions, Modern Scala Projects will guide you in exploring Scala's capabilities and learning best practices. Along the way, you'll build applications for professional contexts while understanding the core tasks and components. You’ll begin with a project for predicting the class of a flower by implementing a simple machine learning model. Next, you'll create a cancer diagnosis classification pipeline, followed by projects on stock price prediction, spam filtering, fraud detection, and a recommendation engine. The focus will be on the application of ML techniques that classify data and make predictions, with an emphasis on automating data workflows with the Spark ML pipeline API. The book also showcases the best of Scala’s functional libraries and other constructs to help you roll out your own scalable data processing frameworks. By the end of this Scala book, you’ll have a firm foundation in Scala programming and have built some interesting real-world projects to add to your portfolio.

Stock price binary classification problem


Stock prices tend to go up and down. We want to use Spark ML and a Spark time-series library to explore historical stock price data going back a couple of years and come up with numbers like the average closing price. We also want our stock price prediction model to forecast what the stock price will be over a timeframe of a few days.
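
To make the two ideas above concrete, here is a minimal sketch in plain Scala (no Spark) that computes an average closing price and derives a binary up/down label for each day. The `DailyClose` record, its field names, the sample values, and the labeling rule (1 if the close rose or held compared with the previous day, 0 otherwise) are all assumptions made for illustration; they are not taken from the chapter's dataset or code.

```scala
// Hypothetical record type; the field names are illustrative only.
final case class DailyClose(date: String, close: Double)

object ClosingPriceSketch {

  /** Mean of the closing prices, or None for an empty series. */
  def averageClose(prices: Seq[DailyClose]): Option[Double] =
    if (prices.isEmpty) None
    else Some(prices.map(_.close).sum / prices.size)

  /** Assumed labeling rule: 1 if the close rose or held versus the
    * previous day, 0 if it fell. The first day has no predecessor,
    * so it receives no label.
    */
  def upDownLabels(prices: Seq[DailyClose]): Seq[(String, Int)] =
    prices
      .sliding(2)
      .collect { case Seq(prev, curr) =>
        (curr.date, if (curr.close >= prev.close) 1 else 0)
      }
      .toList

  // Made-up sample values, for illustration only.
  val sample: Seq[DailyClose] = Seq(
    DailyClose("day-1", 100.0),
    DailyClose("day-2", 102.0),
    DailyClose("day-3", 101.0),
    DailyClose("day-4", 101.0)
  )
}
```

Calling `ClosingPriceSketch.averageClose(ClosingPriceSketch.sample)` yields the mean of the four sample closes, and `ClosingPriceSketch.upDownLabels(ClosingPriceSketch.sample)` yields one label per day from the second day onward. A label of this kind is what turns price forecasting into a binary classification problem.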

This chapter presents an ML methodology to reduce the complexity associated with stock price prediction. We will obtain a smaller set of optimal financial indicators by feature selection and employ a Random Forest algorithm to build a price prediction pipeline.

We must first download the dataset from the ModernScalaProjects_Code folder.

Stock price prediction dataset at a glance

We will use data from two sources:

  • Reddit worldnews
  • Dow Jones Industrial Average (DJIA)

The Getting started section that follows has two clear goals:

  • Moving our development environment into a virtual appliance from a previous local Spark shell-centered...