Big Data Processing using Apache Spark [Video]

By : Tomasz Lelek
Overview of this book

<p><span id="description" class="sugar_field">Every year, the volume of data we need to store and analyze grows. When we aggregate all the data about our users and analyze it to find insights, terabytes of data undergo processing. To handle such volumes, we need a technology that can distribute computations and make them more efficient. Apache Spark is a technology that lets us process big data in a fast, scalable way.</span></p> <p><span id="description" class="sugar_field">In this course, we will learn how to leverage Apache Spark to process big data quickly. We will cover the basics of the Spark API and its architecture in detail. In the second section of the course, we will learn about data mining and data cleaning, where we will look at the input data structure and how input data is loaded. In the third section, we will write actual jobs that analyze data. By the end of the course, you will have a sound understanding of the Spark framework, which will help you write code and understand the processing of big data.</span></p> <h2><span class="sugar_field">Style and Approach</span></h2> <p><span class="sugar_field"><span id="trade_selling_points_c" class="sugar_field">Filled with hands-on examples, this course will help you learn how to process big data using Apache Spark.</span></span></p>
Table of Contents (3 chapters)
Chapter 2
Data Mining and Data Cleaning
Section 4
Cleaning Input Data
In this video, we look at how to tokenize input data:
- Remove whitespace and other non-relevant tokens
- Learn how to normalize strings
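The course's own Spark code is not shown here, but the cleaning steps above can be sketched in plain Python; the function name `clean_tokens` and the exact cleaning rules (split on whitespace, strip surrounding punctuation, lowercase for normalization) are illustrative assumptions, not the video's actual implementation. The same function could be applied per line in Spark, e.g. via `rdd.flatMap(clean_tokens)`.

```python
import re
import string

def clean_tokens(line):
    """Tokenize one input line, drop whitespace and non-relevant
    (empty or punctuation-only) tokens, and normalize to lowercase.
    Hypothetical helper -- not the course's actual code."""
    # Split on runs of whitespace after trimming the line.
    tokens = re.split(r"\s+", line.strip())
    # Normalize: strip surrounding punctuation, lowercase each token.
    cleaned = (t.strip(string.punctuation).lower() for t in tokens)
    # Drop tokens that became empty (pure punctuation or whitespace).
    return [t for t in cleaned if t]

print(clean_tokens("  Hello,   WORLD  --  Spark! "))  # ['hello', 'world', 'spark']
```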