Learning Hadoop 2

Analyzing the Twitter stream


In the following examples, we will use the implementation of JsonLoader provided by Elephant Bird to load and manipulate JSON data. We will use Pig to explore tweet metadata and analyze trends in the dataset. Finally, we will model the interaction between users as a graph and use Apache DataFu to analyze this social network.

Prerequisites

Download the elephant-bird-pig (http://central.maven.org/maven2/com/twitter/elephantbird/elephant-bird-pig/4.5/elephant-bird-pig-4.5.jar), elephant-bird-hadoop-compat (http://central.maven.org/maven2/com/twitter/elephantbird/elephant-bird-hadoop-compat/4.5/elephant-bird-hadoop-compat-4.5.jar), and elephant-bird-core (http://central.maven.org/maven2/com/twitter/elephantbird/elephant-bird-core/4.5/elephant-bird-core-4.5.jar) JAR files from the Maven central repository, then copy them onto HDFS using the following commands:

$ hdfs dfs -put target/elephant-bird-pig-4.5.jar hdfs:///jar/
$ hdfs dfs -put target/elephant-bird-hadoop-compat...
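
The put commands above expect the JAR files to already sit in a local target/ directory. As a minimal sketch of the download step, the following script builds the three Maven URLs listed earlier and prints one curl command per artifact rather than running it. The helper name jar_url, the use of curl, and the choice of target/ as the download directory are assumptions made for illustration; BASE_URL is copied from the text and may need adjusting if that host is not reachable from your environment.

```shell
#!/bin/sh
# Sketch only: prints the download commands instead of executing them.
# BASE_URL and VERSION come from the URLs in the text; jar_url is a
# hypothetical helper, and target/ is an assumed local download directory.
BASE_URL="http://central.maven.org/maven2/com/twitter/elephantbird"
VERSION="4.5"

# Build the Maven URL for one Elephant Bird artifact name.
jar_url() {
  printf '%s\n' "${BASE_URL}/$1/${VERSION}/$1-${VERSION}.jar"
}

for artifact in elephant-bird-pig elephant-bird-hadoop-compat elephant-bird-core; do
  # -f: fail on HTTP errors, -L: follow redirects, -o: write to the named file
  printf '%s\n' "curl -fL -o target/${artifact}-${VERSION}.jar $(jar_url "$artifact")"
done
```

Review the printed commands, create the target/ directory if needed, and run them by hand before copying the files onto HDFS.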