Sign In Start Free Trial
Account

Add to playlist

Create a Playlist

Modal Close icon
You need to login to use this feature.
  • Book Overview & Buying Learning Hadoop 2
  • Table Of Contents Toc
Learning Hadoop 2

Learning Hadoop 2

By : GABRIELE MODENA
3.8 (4)
close
close
Learning Hadoop 2

Learning Hadoop 2

3.8 (4)
By: GABRIELE MODENA

Overview of this book

If you are a system or application developer interested in learning how to solve practical problems using the Hadoop framework, then this book is ideal for you. You are expected to be familiar with the Unix/Linux command-line interface and have some experience with the Java programming language. Familiarity with Hadoop would be a plus.
Table of Contents (13 chapters)
close
close
12
Index

Writing MapReduce programs

In this chapter, we will be focusing on batch workloads; given a set of historical data, we will look at properties of that dataset. In Chapter 4, Real-time Computation with Samza, and Chapter 5, Iterative Computation with Spark, we will show how a similar type of analysis can be performed over a stream of text collected in real time.

Getting started

In the following examples, we will assume a dataset generated by collecting 1,000 tweets using the stream.py script, as shown in Chapter 1, Introduction:

$ python stream.py –t –n 1000 > tweets.txt

We can then copy the dataset into HDFS with:

$ hdfs dfs -put tweets.txt <destination>

Tip

Note that until now we have been working only with the text of tweets. In the remainder of this book, we'll extend stream.py to output additional tweet metadata in JSON format. Keep this in mind before dumping terabytes of messages with stream.py.

Our first MapReduce program will be the canonical WordCount example...

CONTINUE READING
83
Tech Concepts
36
Programming languages
73
Tech Tools
Icon Unlimited access to the largest independent learning library in tech of over 8,000 expert-authored tech books and videos.
Icon Innovative learning tools, including AI book assistants, code context explainers, and text-to-speech.
Icon 50+ new titles added per month and exclusive early access to books as they are being written.
Learning Hadoop 2
notes
bookmark Notes and Bookmarks search Search in title playlist Add to playlist download Download options font-size Font size

Change the font size

margin-width Margin width

Change margin width

day-mode Day/Sepia/Night Modes

Change background colour

Close icon Search
Country selected

Close icon Your notes and bookmarks

Confirmation

Modal Close icon
claim successful

Buy this book with your credits?

Modal Close icon
Are you sure you want to buy this book with one of your credits?
Close
YES, BUY

Submit Your Feedback

Modal Close icon
Modal Close icon
Modal Close icon