Index
A
- access log, formats
- %h / How to do it...
- %l / How to do it...
- %u / How to do it...
- %t / How to do it...
- %r / How to do it...
- %>s / How to do it...
- %b / How to do it...
- referrer / How to do it...
- user agent / How to do it...
- Apache Spark
- about / Introduction
- URL / Getting ready
- Area Under a Curve
- reference / How to do it...
- Atomic export
- performing, Sqoop used / Performing Atomic export using Sqoop, How it works...
- Avro
- reference / Processing Hive data in the Avro format
- Avro format
- Hive data, processing in / Processing Hive data in the Avro format, How to do it..., How it works...
- AWS EC2
- reference / Getting ready
B
- balancer command
- executing, for uniform data distribution / Executing the balancer command for uniform data distribution, How to do it...
- benchmarking
- performing, on Hadoop cluster / Performing benchmarking on a Hadoop cluster, How to do it..., TestDFSIO, How it works...
C
- Call Data Record analytics
- defining / Call Data Record analytics, Problem Statement, Solution
- URL / Getting ready
- problem statement / Problem Statement
- solution / Solution
- Call Data Record Analytics
- performing, Hive used / How to do it...
- Call Data Records (CDR)
- about / Call Data Record Analytics using Hive, Getting ready
- reference / Call Data Record Analytics using Hive
- Change Data Capture (CDC)
- implementing, Hive used / Implementing Change Data Capture using Hive, How to do it, How it works
- CLI
- HBase operation, performing in / Performing the HBase operation in CLI, How to do it, How it works...
- combined access logs
- command options, Hadoop
- reference / There's more
- compressed data
- saving, on HDFS / Saving compressed data in HDFS, How it works...
- configuration parameters, Hadoop
- hadoop-env.sh / How to do it...
- core-site.xml / How to do it...
- yarn-site.xml / How to do it...
- mapred-site.xml / How to do it...
- hdfs-site.xml / How to do it...
- Confusion Matrix
- reference / How to do it...
- context Ngram
- performing, in Hive / Performing context Ngram in Hive, How it works...
- custom SerDe
- reference / How it works...
D
- data
- loading, from local machine to HDFS / Loading data from a local machine to HDFS, How it works...
- exporting, from HDFS to local machine / Getting ready, How it works...
- importing, from Hadoop cluster / Importing data from another Hadoop cluster, How it works...
- importing from RDBMS to HDFS, Sqoop used / Importing data from RDBMS to HDFS using Sqoop, How to do it...
- exporting, from HDFS to RDBMS / Exporting data from HDFS to RDBMS, Getting ready, How to do it..., How it works...
- importing into Hive table, Sqoop used / Importing data into Hive tables using Sqoop, How to do it..., How it works...
- importing into HDFS, from Mainframes / Importing data into HDFS from Mainframes
- importing from RDBMS to HBase, Sqoop used / Importing data from RDBMS to HBase using Sqoop, How to do it..., How it works...
- importing from Kafka into HDFS, Flume used / Importing data from Kafka into HDFS using Flume, How to do it..., How it works
- Data Encryption Key (DEK) / How it works...
- DataNodes
- decommissioning / Decommissioning DataNodes, How it works...
- deleted data
- recycling, from trash to HDFS / Recycling deleted data from trash to HDFS, How to do it..., How it works...
- DistCp
- about / How to do it...
- -update option / How to do it...
- -overwrite option / How to do it...
- reference / How it works...
- distinct values
- finding, Map Reduce program used / Map Reduce program to find distinct values, How to do it
E
- e-mail action job
- implementing, Oozie used / Implementing an e-mail action job using Oozie, How to do it..., How it works...
- Encrypted Data Encryption Key (EDEK) / How it works...
- Encryption Zone Key (EZK) / How it works...
- Extract-Transform-Load (ETL)
- about / Introduction
F
- FILTER By queries
- performing, in Pig / Performing FILTER By queries in Pig, How to do it...
- Flume
- used, for importing Twitter data into HDFS / Importing Twitter data into HDFS using Flume, How to do it...
- about / How it works
- used, for importing data from Kafka into HDFS / Importing data from Kafka into HDFS using Flume, How to do it..., How it works
- used, for importing web logs data into HDFS / Importing web logs data into HDFS using Flume, How to do it..., How it works...
- full outer join / Full outer join
G
- Google File System (GFS)
- about / Introduction
- graphs
- processing, Graph X used / Processing graphs using Graph X, How to do it...
- Graph X
- used, for processing graphs / Processing graphs using Graph X, How to do it...
- URL / How it works...
- Group By queries
- performing, in Pig / Performing Group By queries in Pig, How to do it...
- Group operator
- reference / How it works...
H
- Hadoop
- about / Introduction
- configuration parameters / How to do it...
- support, adding for new writable data type / Adding support for a new writable data type in Hadoop, How to do it..., How it works...
- used, for sensitive data masking / Sensitive data masking and encryption using Hadoop, Getting ready, Solution, How it works...
- used, for encryption / Sensitive data masking and encryption using Hadoop, Getting ready, Solution, How it works...
- problem statement / Problem statement
- solution / Solution
- Hadoop, components
- storage / Introduction
- processing / Introduction
- Hadoop 2.7
- download link / How to do it...
- Hadoop cluster
- benchmarking, performing on / Performing benchmarking on a Hadoop cluster, How to do it..., TestDFSIO, How it works...
- data, importing from / Importing data from another Hadoop cluster, How it works...
- Map Reduce program, executing in / Executing the Map Reduce program in a Hadoop cluster, How it works...
- Hadoop clusters
- new nodes, adding to / Adding new nodes to existing Hadoop clusters, How it works...
- Hadoop Distributed File System (HDFS)
- HBase
- reference / Introduction, Getting ready
- HBase operation
- performing, in CLI / Performing the HBase operation in CLI, How to do it, How it works...
- performing, in Java / Performing HBase operations in Java, How it works...
- HDFS
- replication factor, modifying of existing file / Changing the replication factor of an existing file in HDFS, How it works...
- transparent encryption, enabling for / Enabling transparent encryption for HDFS, How to do it..., How it works...
- compressed data, saving on / Saving compressed data in HDFS, How it works...
- HDFS, to local machine
- data, exporting from / Getting ready, How it works...
- HDFS block size
- setting, for files in cluster / Getting ready, How it works...
- setting, for specific file in cluster / Getting ready, How it works...
- Hive
- reference / Introduction
- User Defined functions, writing in / Writing a user-defined function in Hive, How to do it
- table joins, performing in / Performing table joins in Hive, How to do it...
- map side joins, executing in / Executing map side joins in Hive, How to do it...
- context Ngram, performing in / Performing context Ngram in Hive, How it works...
- used, for performing Call Data Record Analytics / How to do it..., How it works...
- used, for performing Twitter sentiment analysis / Twitter sentiment analysis using Hive, How to do it..., How it works
- used, for implementing Change Data Capture (CDC) / Implementing Change Data Capture using Hive, How to do it, How it works
- used, for inserting multiple tables / Multiple table inserting using Hive, How to do it
- Hive action job
- implementing, Oozie used / Implementing a Hive action job using Oozie, How to do it..., How it works...
- Hive data
- processing, in sequential file format / Storing and processing Hive data in a sequential file format, How to do it...
- storing, in sequential file format / Storing and processing Hive data in a sequential file format, How to do it...
- processing, in RC file format / Storing and processing Hive data in the ORC file format, How it works...
- storing, in RC file format / Storing and processing Hive data in the ORC file format, How it works...
- processing, in ORC file format / Storing and processing Hive data in the ORC file format, How it works...
- storing, in ORC file format / Storing and processing Hive data in the ORC file format, How it works...
- storing, in Parquet file format / Storing and processing Hive data in the Parquet file format, How to do it...
- processing, in Parquet file format / Storing and processing Hive data in the Parquet file format, How to do it...
- processing, in Avro format / Processing Hive data in the Avro format, How to do it..., How it works...
- Hive JSON SerDe
- used, for processing JSON data / Processing JSON data in Hive using JSON SerDe, How to do it..., How it works...
- Hive XML SerDe
- used, for processing XML data / Processing XML data in Hive using XML SerDe, How to do it..., How it works
I
- incremental import
- defining, Sqoop used / Incremental import using Sqoop, How to do it...
- installing
- Single Node Hadoop Cluster / Getting ready, How to do it..., How it works...
- multi-node Hadoop cluster / Installing a multi-node Hadoop cluster, How to do it..., How it works...
- iris.txt
- URL / How to do it...
- Iris flower data set
- reference / Performing Predictive Analytics using R
- iris flowers
- URL / How to do it...
- item based recommendation engine
- setting up, Mahout used / Creating an item-based recommendation engine using Mahout, How to do it..., How it works...
J
- Java
- HBase operation, performing in / Performing HBase operations in Java, How it works...
- Java action job
- implementing, Oozie used / Implementing a Java action job using Oozie, How to do it
- Java JDK
- download link / How to do it...
- job
- scheduling, in Oozie / Scheduling a job in Oozie, How to do it...
- JOINS
- performing, in Pig / Performing JOINS in Pig, How to do it..., How it works
- JSON data
- processing, Hive JSON SerDe used / Processing JSON data in Hive using JSON SerDe, How to do it..., How it works...
- analyzing, Spark used / Analyzing JSON data using Spark, How to do it..., How it works...
- JSON SerDe binaries
- references / Getting ready
K
- K-Means
- reference / Clustering text data using K-Means
- Kafka
- Key Management Server (KMS) / How to do it...
L
- LazyOutputFormat
- reference link / How it works...
- left outer join / Left outer join
- left semi join / Left semi join
- local machine, to HDFS
- data, loading from / Loading data from a local machine to HDFS, How it works...
- Luhn's Algorithm
- URL / Solution
M
- Mahout
- about / Introduction
- download link / How to do it...
- used, for setting up item based recommendation engine / Creating an item-based recommendation engine using Mahout, How to do it..., How it works...
- used, for setting up user based recommendation engine / Creating a user-based recommendation engine using Mahout, How to do it..., How it works...
- used, for performing predictive analytics on bank data / Using Predictive analytics on Bank Data using Mahout, How to do it...
- Mahout algorithms
- reference / How it works...
- Mahout development environment
- Mainframes
- data, importing into HDFS from / Importing data into HDFS from Mainframes
- about / Importing data into HDFS from Mainframes
- Map Reduce
- about / Introduction
- used, for performing Reduce side Joins / Performing Reduce side Joins using Map Reduce, How to do it, How it works...
- Map Reduce action job
- implementing, Oozie used / Implementing a Map Reduce action job using Oozie, How to do it...
- Map Reduce code
- unit testing, MRUnit used / Unit testing the Map Reduce code using MRUnit, How to do it..., How it works...
- Map Reduce program
- writing, in Java for web log data analysis / Writing the Map Reduce program in Java to analyze web log data, How to do it..., How it works...
- executing, in Hadoop cluster / Executing the Map Reduce program in a Hadoop cluster, How it works...
- user-defined counter, implementing in / Implementing a user-defined counter in a Map Reduce program, How to do it..., How it works...
- used, for finding top X / Map Reduce program to find the top X, How to do it...
- used, for finding distinct values / Map Reduce program to find distinct values, How to do it
- writing, for data partitioning / Map Reduce program to partition data using a custom partitioner, How to do it..., How it works...
- Map Reduce programming, HBase table
- Map Reduce results
- writing, to multiple output files / Writing Map Reduce results to multiple output files, How to do it...
- map side joins
- executing, in Hive / Executing map side joins in Hive, How to do it...
- reference / How it works...
- merge joins
- about / Merge Joins
- reference / Merge Joins
- MRBench
- benchmarking / MRBench
- MRUnit
- Map Reduce code, unit testing / Unit testing the Map Reduce code using MRUnit, How to do it..., How it works...
- multi-node Hadoop cluster
- installing / Installing a multi-node Hadoop cluster, How to do it..., How it works...
- multiple tables
- inserting, Hive used / Multiple table inserting using Hive, How to do it
- MySQL connector
- URL / Getting ready
N
- Naive Bayes algorithm
- reference / How it works...
- Ngrams
- reference / How it works...
- NNBench
- benchmarking / NNBench
- nodes
- adding, to existing Hadoop clusters / Adding new nodes to existing Hadoop clusters, How it works...
O
- Olympics Athletes Data Analytics
- defining, Spark Shell used / Olympics Athletes analytics using the Spark Shell, How to do it...
- URL / How to do it...
- Oozie
- used, for implementing Sqoop action job / Implementing a Sqoop action job using Oozie, How to do it...
- used, for implementing Map Reduce action job / Implementing a Map Reduce action job using Oozie, How to do it...
- used, for implementing Java action job / Implementing a Java action job using Oozie, How to do it
- used, for implementing Hive action job / Implementing a Hive action job using Oozie, How to do it..., How it works...
- used, for implementing Pig action job / Implementing a Pig action job using Oozie, How to do it..., How it works
- used, for implementing e-mail action job / Implementing an e-mail action job using Oozie, How to do it..., How it works...
- used, for executing parallel jobs / Executing parallel jobs using Oozie (fork), How to do it...
- job, scheduling / Scheduling a job in Oozie, How to do it...
- ORC file format
- Hive data, storing in / Storing and processing Hive data in the ORC file format, How it works...
- Hive data, processing in / Storing and processing Hive data in the ORC file format, How it works...
- Order By queries
- performing, in Pig / Performing Order By queries in Pig, How it works...
P
- parallel jobs
- executing, Oozie used / Executing parallel jobs using Oozie (fork), How to do it...
- Parquet
- about / Analyzing Parquet files using Spark
- URL / How to do it...
- Parquet file format
- Hive data, processing in / Storing and processing Hive data in the Parquet file format, How to do it...
- Hive data, storing in / Storing and processing Hive data in the Parquet file format, How it works...
- reference / How it works...
- Parquet files
- analyzing, Spark used / Getting ready, How to do it..., How it works...
- Pearson product-moment correlation coefficient
- reference / How it works...
- people.json sample
- URL / How to do it...
- Pig
- reference / Introduction
- FILTER By queries, performing in / Performing FILTER By queries in Pig, How to do it...
- Group By queries, performing in / Performing Group By queries in Pig, How to do it...
- Order By queries, performing in / Performing Order By queries in Pig, How it works...
- JOINS, performing in / Performing JOINS in Pig, How to do it...
- user-defined function, writing in / Writing a user-defined function in Pig, How to do it...
- used, for analyzing web log data / Analyzing web log data using Pig, How to do it...
- Pig 0.15
- reference / Getting ready
- Pig action job
- implementing, Oozie used / Implementing a Pig action job using Oozie, How to do it..., How it works
- population data analytics
- performing, R used / Performing Population Data Analytics using R, How to do it...
- predictive analytics
- performing, R used / Performing Predictive Analytics using R, How to do it...
- conducting, Spark MLib used / Conducting predictive analytics using Spark MLib, How to do it..., How it works...
Q
- query operator
- used, in Sqoop import / Using query operator in Sqoop import, How it works...
R
- R
- about / Introduction
- used, for performing population data analytics / Performing Population Data Analytics using R, How to do it...
- used, for performing Twitter sentiment analytics / Performing Twitter Sentiment Analytics using R, How to do it..., How it works...
- used, for performing predictive analytics / Performing Predictive Analytics using R, How to do it...
- RC file format
- Hive data, storing in / Storing and processing Hive data in the ORC file format, How it works...
- Hive data, processing in / Storing and processing Hive data in the ORC file format, How it works...
- Reduce side Joins
- performing, Map Reduce used / Performing Reduce side Joins using Map Reduce, How to do it, How it works...
- Remote Procedure Calls (RPC) / Processing Hive data in the Avro format
- replicated joins
- about / Replicated Joins
- reference / Replicated Joins
- replication factor
- modifying, of existing file in HDFS / Changing the replication factor of an existing file in HDFS, How it works...
- right outer join / Right outer join
S
- safe mode
- sequential file format
- Hive data, storing in / Storing and processing Hive data in a sequential file format, How to do it...
- Hive data, processing in / Storing and processing Hive data in a sequential file format, How to do it...
- SGD for logistic regression
- reference / How to do it...
- Single Node Hadoop Cluster
- installing / Getting ready, How to do it..., How it works...
- HDFS file operations, performing on / There's more
- skewed joins
- about / Skewed Joins
- reference / Skewed Joins
- Spark
- running, on YARN / Running Spark on YARN, How to do it...
- used, for analyzing Parquet files / Getting ready, How to do it..., How it works...
- used, for analyzing JSON data / Analyzing JSON data using Spark, How to do it..., How it works...
- Spark Shell
- used, for Olympics Athletes Data Analytics / Olympics Athletes analytics using the Spark Shell, How to do it...
- Spark standalone
- running / Running Spark standalone, How to do it..., How it works...
- Spark Streaming
- used, for Twitter trending topics / Twitter trending topics using Spark streaming, How to do it...
- used, for creating Twitter trending topics / Creating Twitter trending topics using Spark Streaming, How to do it..., How it works...
- URL / How it works...
- Sqoop
- used, for importing data from RDBMS to HDFS / Importing data from RDBMS to HDFS using Sqoop, How to do it...
- used, for performing Atomic export / Performing Atomic export using Sqoop, How it works...
- used, for importing data into Hive table / Importing data into Hive tables using Sqoop, How to do it..., How it works...
- used, for incremental import / Incremental import using Sqoop, How to do it...
- used, for importing data from RDBMS to HBase / Importing data from RDBMS to HBase using Sqoop, How to do it..., How it works...
- Sqoop, in compressed format
- used, for importing data / Importing data using Sqoop in compressed format, How to do it..., How it works...
- Sqoop action job
- implementing, Oozie used / Implementing a Sqoop action job using Oozie, How to do it...
- Sqoop import
- query operator, using / Using query operator in Sqoop import, How to do it...
- Sqoop job
- creating / Creating and executing Sqoop job, How it works...
- executing / Creating and executing Sqoop job, How it works...
- Stochastic Gradient Descent (SGD) / How to do it...
T
- table joins
- performing, in Hive / Performing table joins in Hive, How to do it...
- left outer join / Left outer join
- right outer join / Right outer join
- full outer join / Full outer join
- left semi join / Left semi join
- TestDFSIO
- benchmarking / TestDFSIO
- text data clustering, K-Means
- Mahout used / Clustering text data using K-Means, How to do it...
- top X
- finding, Map Reduce program used / Map Reduce program to find the top X, How to do it...
- transparent encryption
- enabling, for HDFS / Enabling transparent encryption for HDFS, How to do it..., How it works...
- reference / How it works...
- TreeMap
- reference link / How to do it...
- Twitter apps
- URL / How to do it...
- Twitter authorization tokens
- generating / How to do it...
- Twitter data
- importing into HDFS, Flume used / Importing Twitter data into HDFS using Flume, How to do it...
- Twitter sentiment analysis
- performing, Hive used / Twitter sentiment analysis using Hive, How to do it..., How it works
- Twitter sentiment analytics
- performing, R used / Performing Twitter Sentiment Analytics using R, How to do it..., How it works...
- Twitter trending topics
- creating, Spark Streaming used / Creating Twitter trending topics using Spark Streaming, How to do it..., How it works...
- defining, Spark Streaming used / Twitter trending topics using Spark streaming, How to do it...
U
- uniform data distribution
- balancer command, executing for / Executing the balancer command for uniform data distribution, How to do it...
- user-defined counter
- implementing, in Map Reduce program / Implementing a user-defined counter in a Map Reduce program, How to do it..., How it works...
- user-defined function
- writing, in Pig / Writing a user-defined function in Pig, How to do it...
- User-Defined Functions (UDFs)
- about / Introduction
- user based recommendation engine
- setting up, Mahout used / Creating a user-based recommendation engine using Mahout, How to do it..., How it works...
- User Defined functions
- writing, in Hive / Writing a user-defined function in Hive, How to do it
W
- Web log analytics
- defining / Web log analytics, Solution
- references / Getting ready
- problem statement / Problem statement
- solution / Solution
- web log data
- analyzing, Pig used / Analyzing web log data using Pig, How to do it...
- web logs data into HDFS
- importing, Flume used / Importing web logs data into HDFS using Flume, How to do it..., How it works...
X
- XML data
- processing, Hive XML SerDe used / Processing XML data in Hive using XML SerDe, How to do it..., How it works
- XML SerDe
- references / Getting ready
Y
- YARN
- Spark, running on / Running Spark on YARN, How to do it...
- Yet Another Resource Negotiator (YARN)