Fast Data Processing Systems with SMACK Stack

By: Raúl Estrada

Overview of this book

SMACK is an open source full stack for big data architecture. It combines Spark, Mesos, Akka, Cassandra, and Kafka, and it is one of the newest approaches developers use to tackle critical real-time analytics for big data. This highly practical guide teaches you how to integrate these technologies to create a highly efficient data analysis system for fast data processing. We’ll start with an introduction to SMACK and show you when to use it. First, you’ll get to grips with functional thinking and problem solving using Scala. Next, you’ll come to understand the Akka architecture. Then you’ll learn how to improve the data structure architecture and optimize resources using Apache Spark. Moving forward, you’ll learn how to achieve linear scalability in databases with Apache Cassandra, and you’ll get to grips with high-throughput distributed messaging using Apache Kafka. We’ll show you how to build a cheap but effective cluster infrastructure with Apache Mesos. Finally, you will dive deep into different aspects of SMACK through a few case studies. By the end of the book, you will be able to integrate all the components of the SMACK stack and use them together to achieve highly effective and fast data processing.

Preface

The SMACK stack is a generalized web-scale data pipeline. It was popularized at data engineering meetups and conferences in the San Francisco Bay Area and then spread around the world. SMACK stands for:

  • S = Spark: This is the engine for distributed in-memory data computing. Think of Apache Flink, Apache Ignite, Google MillWheel, and so on.
  • M = Mesos: This is the cluster OS, covering distributed system management, scheduling, and scaling. Think of Apache YARN, Kubernetes, Docker, and so on.
  • A = Akka: This is the API. It is an implementation of the actor model. Think of Scala, Erlang, Elixir, Go, and so on.
  • C = Cassandra: This is the persistence layer, a NoSQL database. Think of Apache HBase, Riak, Google Bigtable, MongoDB, and so on.
  • K = Kafka: This is a distributed streaming platform, the message broker. Think of Apache Storm, ActiveMQ, RabbitMQ, Kestrel, JMS, and so on.
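
To make this division of labor concrete, here is a minimal sketch. It is not code from the book and it does not use the API of any SMACK component. It is plain Python using only the standard library, and every name in it (broker, table, publish, handle, aggregate, run_pipeline) is a hypothetical stand-in chosen for illustration. Mesos has no counterpart here, because a single process has no cluster to schedule.

```python
from collections import Counter, deque

# Hypothetical stand-ins only: none of these names come from Spark, Mesos,
# Akka, Cassandra, or Kafka. Each one imitates a layer's role in memory.

broker = deque()   # Kafka's role: buffer events between producers and consumers
table = {}         # Cassandra's role: keep results, keyed by a primary key


def publish(event):
    """Producer side: append an event to the broker queue."""
    broker.append(event)


def handle(event):
    """Akka's role: react to one message at a time and normalize it."""
    return event.strip().lower()


def aggregate(events):
    """Spark's role: compute over the whole batch, here a count per word."""
    return Counter(events)


def run_pipeline():
    """Drain the broker, process each message, aggregate, then persist."""
    cleaned = []
    while broker:
        cleaned.append(handle(broker.popleft()))
    for word, count in aggregate(cleaned).items():
        table[word] = table.get(word, 0) + count
    return table


for raw in ["Click ", "view", "click", " VIEW", "click"]:
    publish(raw)

print(run_pipeline())
```

In the real stack each stand-in is a separate distributed system, so the interesting work lies in the connections between them, which is the integration problem this book addresses.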

Surveys from 2014, 2015, and 2016 show that, among all software developers, the highest paid are data engineers, data scientists, and data architects. This is because there is huge demand for technical professionals in data and, unfortunately for large organizations and fortunately for developers, very little supply.

If you are reading this book, it is for one of two reasons: either you want to join the best-paid IT professionals, or you already belong to them and want to learn how today's trends will become requirements in the not-too-distant future.

This book explains how to master the SMACK stack, also called Spark++, because it seems to be the open stack that will succeed in the near future.

What this book covers

Chapter 1, Introducing SMACK, covers the fundamental SMACK architecture. We review the differences between the technologies in SMACK and traditional data technologies. We also review every technology in SMACK and briefly present each tool's potential.

Chapter 2, The Model - Scala and Akka, is divided into two parts: Scala (the language) and Akka (the actor model implementation for the JVM). It is a mini Scala and Akka cookbook that teaches through several exercises. The first half covers the fundamental parts of Scala; the second half focuses on the Akka actor model.

Chapter 3, The Engine - Apache Spark, describes the process of setting up a new project with the help of templates by importing an existing project, serving a web application, and using File Watchers.

Chapter 4, The Storage - Apache Cassandra, describes using package managers and building systems for your application by means of WebStorm's built-in features.

Chapter 5, The Broker - Apache Kafka, focuses on the state-of-the-art technologies of the web industry and describes the process of building a typical application in them using the power of WebStorm features.

Chapter 6, The Manager - Apache Mesos, shows you how to use JavaScript, HTML, and CSS to develop a mobile application and how to set up the environment to test run this mobile application.

Chapter 7, Study case 1 - Spark and Cassandra, shows how to perform the debugging, tracing, profiling, and code style checking activities directly in WebStorm.

Chapter 8, Study case 2 - Connectors, presents a couple of proven ways to easily perform application testing in WebStorm using some of the most popular testing libraries.

Chapter 9, Study case 3 - Mesos and Docker, speaks about a second portion of powerful features provided within WebStorm. In this chapter, we focus on some of WebStorm's power features that help us boost productivity and developer experience.

What you need for this book

The reader should have some programming experience (Java or Scala), some experience with Linux/Unix operating systems, and a grasp of database basics:

  • For Scala, the reader should know the basics of programming
  • For Spark, the reader should know the fundamentals of the Scala programming language
  • For Mesos, the reader should know the basics of operating system administration
  • For Cassandra, the reader should know the fundamentals of databases
  • For Kafka, the reader should have basic knowledge of Scala

Who this book is for

This book is for software developers, data architects, and data engineers who want to learn how to integrate the most successful open source data stack architecture, how to choose the correct technology for every layer, and what the practical benefits are in each case.

There are a lot of books that talk about each technology separately. This book is for people looking for alternative technologies and practical examples of how to connect the entire stack.

Conventions

In this book, you will find a number of styles of text that distinguish between different kinds of information. Here are some examples of these styles, and an explanation of their meaning.

Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "In the case of HDFS, we should change the mesos.hdfs.role in the file mesos-site.xml to the value of role1."

A block of code is set as follows:

[default]
exten => s,1,Dial(Zap/1|30)
exten => s,2,Voicemail(u100)
exten => s,102,Voicemail(b100)
exten => i,1,Voicemail(s0)

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

[default]
exten => s,1,Dial(Zap/1|30)
exten => s,2,Voicemail(u100)

exten => s,102,Voicemail(b100)
exten => i,1,Voicemail(s0)

Any command-line input or output is written as follows:

# cp /usr/src/asterisk-addons/configs/cdr_mysql.conf.sample \
     /etc/asterisk/cdr_mysql.conf

New terms and important words are shown in bold. Words that you see on the screen, in menus or dialog boxes, for example, appear in the text like this: "clicking the Next button moves you to the next screen".

Note

Warnings or important notes appear in a box like this.

Tip

Tips and tricks appear like this.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book: what you liked or disliked. Reader feedback is important to us as it helps us develop titles that you will really get the most out of. To send us general feedback, simply e-mail [email protected], and mention the book's title in the subject of your message. If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

You can download the code files by following these steps:

  1. Log in to or register on our website using your e-mail address and password.
  2. Hover the mouse pointer over the SUPPORT tab at the top.
  3. Click on Code Downloads & Errata.
  4. Enter the name of the book in the Search box.
  5. Select the book for which you're looking to download the code files.
  6. Choose from the drop-down menu where you purchased this book.
  7. Click on Code Download.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

  • WinRAR / 7-Zip for Windows
  • Zipeg / iZip / UnRarX for Mac
  • 7-Zip / PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Fast-Data-Processing-Systems-with-SMACK-Stack. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Downloading the color images of this book

We also provide you with a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output. You can download this file from https://www.packtpub.com/sites/default/files/downloads/FastDataProcessingSystemswithSMACKStack_ColorImages.pdf.

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books (maybe a mistake in the text or the code), we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.

To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.

Piracy

Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at [email protected] with a link to the suspected pirated material.

We appreciate your help in protecting our authors and our ability to bring you valuable content.

Questions

If you have a problem with any aspect of this book, you can contact us at [email protected], and we will do our best to address the problem.