
Scala Programming Projects

By : Mikael Valot, Nicolas Jorand
Overview of this book

Scala Programming Projects is a comprehensive project-based introduction for those who are new to Scala. Complete with step-by-step instructions and easy-to-follow tutorials that demonstrate best practices when building applications, this Scala book will have you building real-world projects in no time. Starting with the fundamentals of software development, you'll begin with simple projects, such as developing a financial independence calculator, and then advance to more complex projects, such as building a shopping application and a Bitcoin transaction analyzer. You'll explore a variety of Scala features, including its OOP and FP capabilities, and learn how to write concise, reactive, and concurrent applications in a type-safe manner. You'll also understand how to use libraries such as Akka and Play. Furthermore, you'll be able to integrate your Scala apps with Kafka, Spark, and Zeppelin, and deploy applications on a cloud platform. By the end of the book, you'll have a firm foundation in Scala programming that'll enable you to solve a variety of real-world problems, and you'll have built impressive projects to add to your professional portfolio.

Introducing Spark Streaming


In Chapter 10, Fetching and Persisting Bitcoin Market Data, we used Spark to save transactions in batch mode. Batch mode works well when you need to analyze a large set of data all at once.

But in some cases, you might need to process data as it enters the system. For example, in a trading system, you might want to analyze all the transactions performed by a broker in order to detect fraudulent ones. You could perform this analysis in batch mode after the market closes, but then you can only act after the fact.

Spark Streaming allows you to consume a streaming source (such as a file, a socket, or a Kafka topic) by dividing the input data into many micro-batches. Each micro-batch is an RDD that can then be processed by the Spark engine. Spark divides the input data using a time window: if you define a time window of 10 seconds, Spark Streaming will create and process a new RDD every 10 seconds.
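To make the idea of time-windowed micro-batching concrete, here is a minimal plain-Scala sketch (deliberately not using the Spark API) that groups timestamped events into consecutive 10-second windows, much as Spark Streaming slices an input stream into per-window RDDs. The object and method names are hypothetical, chosen for illustration only:

```scala
// Conceptual sketch of micro-batching: events carry a timestamp in
// milliseconds, and each window of `windowMillis` becomes one "batch",
// analogous to one RDD in Spark Streaming. Names are illustrative.
object MicroBatchSketch {

  // Group timestamped events into windows keyed by the window's start time.
  // Within each window, the original event order is preserved.
  def microBatches[A](events: Seq[(Long, A)], windowMillis: Long): Map[Long, Seq[A]] =
    events
      .groupBy { case (ts, _) => ts / windowMillis }       // window index
      .map { case (w, evs) => (w * windowMillis, evs.map(_._2)) } // window start -> events

  def main(args: Array[String]): Unit = {
    // Three transactions: two fall in the [0s, 10s) window, one in [10s, 20s)
    val events = Seq((1000L, "tx1"), (4000L, "tx2"), (12000L, "tx3"))
    val batches = microBatches(events, windowMillis = 10000L)
    println(batches(0L).mkString(","))     // first 10-second batch
    println(batches(10000L).mkString(",")) // second 10-second batch
  }
}
```

In real Spark Streaming, you do not group events yourself: you give the batch interval to the framework when creating the streaming context, and each interval's worth of input arrives as a ready-made RDD for your processing code.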

Going back to our fraud detection system, by...