Modern Scala Projects

By: Ilango Gurusamy

Overview of this book

Scala is both a functional programming and object-oriented programming language designed to express common programming patterns in a concise, readable, and type-safe way. Complete with step-by-step instructions, Modern Scala Projects will guide you in exploring Scala capabilities and learning best practices. Along the way, you'll build applications for professional contexts while understanding the core tasks and components. You’ll begin with a project for predicting the class of a flower by implementing a simple machine learning model. Next, you'll create a cancer diagnosis classification pipeline, followed by tackling projects delving into stock price prediction, spam filtering, fraud detection, and a recommendation engine. The focus will be on application of ML techniques that classify data and make predictions, with an emphasis on automating data workflows with the Spark ML pipeline API. The book also showcases the best of Scala’s functional libraries and other constructs to help you roll out your own scalable data processing frameworks. By the end of this Scala book, you’ll have a firm foundation in Scala programming and have built some interesting real-world projects to add to your portfolio.

Implementation and deployment


Implementation depends on having the big data infrastructure set up. Before proceeding, verify that your MongoDB installation is running properly. The implementation objectives are as follows:

  • Splitting data into training, validation, and test datasets
  • Data ingestion
  • Data analysis

 

Implementation objectives

The overall objective is to perform data analysis on an on-time flight dataset covering the years 2007 and 2008. Of the 2007 flight data, 80% will be used as the training dataset and the remaining 20% as the validation dataset. For model performance evaluation, all of the 2008 flight data serves as the testing dataset.
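The split described above can be sketched in plain Scala. This is only an illustration under stated assumptions: the `Flight` case class, its fields, the `trainValidationSplit` helper, the seed, and the sample rows are all hypothetical and are not taken from the book's code or the real dataset schema.

```scala
import scala.util.Random

// Hypothetical, simplified flight record; the real dataset has many more columns.
final case class Flight(year: Int, carrier: String, depDelayMinutes: Int)

object FlightSplit {

  /** Shuffles the rows with a fixed seed, then cuts them at `trainFraction`.
    * Returns (training, validation).
    */
  def trainValidationSplit(
      rows: Seq[Flight],
      trainFraction: Double,
      seed: Long
  ): (Seq[Flight], Seq[Flight]) = {
    require(trainFraction > 0.0 && trainFraction < 1.0, "trainFraction must be in (0, 1)")
    val shuffled = new Random(seed).shuffle(rows)
    val cut = math.round(shuffled.size * trainFraction).toInt
    shuffled.splitAt(cut)
  }

  def main(args: Array[String]): Unit = {
    // Made-up rows standing in for the 2007 and 2008 flight data.
    val flights2007 = (1 to 10).map(i => Flight(2007, "AA", i))
    val flights2008 = (1 to 4).map(i => Flight(2008, "UA", i))

    // 80% of 2007 for training, the remaining 20% for validation.
    val (training, validation) = trainValidationSplit(flights2007, 0.8, seed = 42L)

    // 100% of 2008 is held out for testing.
    val testing = flights2008

    println(s"training=${training.size}, validation=${validation.size}, testing=${testing.size}")
  }
}
```

On a Spark `DataFrame`, `randomSplit(Array(0.8, 0.2), seed)` plays a similar role, although it samples rows randomly, so the resulting proportions are approximate rather than exact.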

The implementation objectives for the flight prediction model are as follows:

  • Download the flight dataset.
  • You may develop the pipeline in four ways:
      • Incrementally in your local Spark shell
      • By starting the Hortonworks Sandbox in a virtual machine managed on your host machine, and developing code in a Zeppelin Notebook environment...