Modern Scala Projects

By: Ilango Gurusamy

Overview of this book

Scala is both a functional programming and object-oriented programming language designed to express common programming patterns in a concise, readable, and type-safe way. Complete with step-by-step instructions, Modern Scala Projects will guide you in exploring Scala capabilities and learning best practices. Along the way, you'll build applications for professional contexts while understanding the core tasks and components. You’ll begin with a project for predicting the class of a flower by implementing a simple machine learning model. Next, you'll create a cancer diagnosis classification pipeline, followed by tackling projects delving into stock price prediction, spam filtering, fraud detection, and a recommendation engine. The focus will be on application of ML techniques that classify data and make predictions, with an emphasis on automating data workflows with the Spark ML pipeline API. The book also showcases the best of Scala’s functional libraries and other constructs to help you roll out your own scalable data processing frameworks. By the end of this Scala book, you’ll have a firm foundation in Scala programming and have built some interesting real-world projects to add to your portfolio.

Questions


Before heading to the next chapter, readers are invited to attempt an upgrade on the flight performance model. The idea is to feed in a couple more predictors that enhance the flight delay ML process, so that its predictions become deeper and more incisive.
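One possible way to approach this upgrade is sketched below, under stated assumptions. In Spark ML, predictors are usually combined into a single feature vector with `VectorAssembler`, so adding predictors largely amounts to listing more input columns. The column names (`depDelay`, `distance`, `dayOfWeek`, `carrierIndex`) and the in-memory rows are hypothetical stand-ins, not the book's flight dataset.

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.{DataFrame, SparkSession}

object ExtraPredictorsSketch {
  // Hypothetical predictor columns: the first two stand in for an existing model,
  // the last two for the "couple more predictors" the exercise asks for.
  val basePredictors: Array[String]  = Array("depDelay", "distance")
  val extraPredictors: Array[String] = Array("dayOfWeek", "carrierIndex")

  // Combine all predictor columns into one vector column named "features".
  def assemble(df: DataFrame): DataFrame =
    new VectorAssembler()
      .setInputCols(basePredictors ++ extraPredictors)
      .setOutputCol("features")
      .transform(df)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("extra-predictors")
      .getOrCreate()
    import spark.implicits._

    // Made-up rows, purely for illustration.
    val flights = Seq(
      (12.0, 550.0, 1.0, 0.0),
      (-3.0, 1200.0, 5.0, 2.0)
    ).toDF("depDelay", "distance", "dayOfWeek", "carrierIndex")

    assemble(flights).select("features").show(truncate = false)
    spark.stop()
  }
}
```

Whether the extra predictors actually improve the model is something to check by re-evaluating it on held-out data, not something the assembler step can tell you.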

Here are a few questions to open further vistas of learning:

  1. What is a Parquet file, and what are its advantages, especially when a dataset grows larger and data shuffling between nodes becomes necessary?
  2. What are the advantages of data compressed in a columnar format?
  3. Occasionally, you might run into this error: "Unable to find encoder stored in Dataset. Primitive types (Int, String, and so on) and Product types (case classes) are supported by importing spark.implicits._". What is the root cause, and how do you get around it? Hint: build a simple DataFrame with a dataset from the first chapter. Use the spark.read approach and attempt a printSchema on it. If that produces the aforementioned error, investigate if it could...
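
As a starting point for exploring these questions, here is a minimal sketch, assuming Spark is on the classpath and a local session is acceptable. The `Flower` case class, its sample rows, and the temporary output path are hypothetical and not taken from the book. It writes a small dataset to Parquet, reads it back, and converts it to a typed Dataset, which only compiles once `spark.implicits._` has been imported from the active session.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical record type, defined outside any method so an encoder can be derived.
case class Flower(sepalLength: Double, petalLength: Double, species: String)

object ParquetEncoderSketch {
  // Write the flowers as Parquet under `path`, then read them back as a typed Dataset.
  def roundTrip(spark: SparkSession, flowers: Seq[Flower], path: String): Dataset[Flower] = {
    // Brings the implicit encoders for primitives and case classes into scope.
    import spark.implicits._

    flowers.toDS().write.mode("overwrite").parquet(path)

    val df = spark.read.parquet(path) // Parquet stores the schema with the data
    df.printSchema()
    df.as[Flower] // would not compile without the implicits import above
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("parquet-encoder")
      .getOrCreate()
    val path = Files.createTempDirectory("flowers").resolve("flowers.parquet").toString
    val sample = Seq(Flower(5.1, 1.4, "setosa"), Flower(6.3, 4.9, "versicolor"))

    // Selecting a single column lets a columnar format skip the other columns on disk.
    roundTrip(spark, sample, path).select("species").show()
    spark.stop()
  }
}
```

Commenting out the `import spark.implicits._` line and recompiling is a quick way to see the encoder error for yourself before investigating its cause.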