Elasticsearch for Hadoop

By: Vishal Shukla

Injecting streaming data into Storm

Many of you may already be familiar with Storm. For those who are not, here is a very brief introduction.

Storm is a real-time computation framework for processing streaming data. The stream is Storm's core data abstraction: an unbounded sequence of tuples, where a tuple is a single unit of streaming data.

The worker components of a Storm job come in two kinds: spouts and bolts. A spout is a source of streams. A bolt can consume multiple streams, perform any processing required, and may emit new streams. You can interlink any number of spouts and bolts to create a topology, which is the top-level abstraction that you submit to the Storm cluster for execution.
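To make these terms concrete, here is a minimal sketch of how a spout, a couple of bolts, and a topology relate to each other. It is a toy, in-process model written in plain Java and does not use Storm's actual API: the `Tuple`, `Spout`, and `Bolt` types, the `run` and `demo` methods, and the sample tweets are all hypothetical names made up for illustration. It also assumes a simple linear chain of bolts and a finite input, whereas a real stream is unbounded.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import java.util.function.Consumer;

// Toy model of Storm's concepts. This is NOT the Storm API; every type
// below is hypothetical and exists only to illustrate the terminology.
public class ToyTopology {

    /** A tuple: a single unit of streaming data, modeled as a list of values. */
    static final class Tuple {
        final List<Object> values;
        Tuple(Object... values) { this.values = Arrays.asList(values); }
    }

    /** A spout is a source of a stream: it emits tuples to a collector. */
    interface Spout {
        /** Emits at most one tuple; returns false once the source is exhausted. */
        boolean nextTuple(Consumer<Tuple> collector);
    }

    /** A bolt consumes tuples and may emit new tuples downstream. */
    interface Bolt {
        void execute(Tuple input, Consumer<Tuple> collector);
    }

    /** The "topology": one spout feeding a linear chain of bolts. */
    public static List<Tuple> run(Spout spout, List<Bolt> bolts) {
        List<Tuple> sink = new ArrayList<>();
        // Wire the chain back to front so each bolt emits into the next one.
        Consumer<Tuple> downstream = sink::add;
        for (int i = bolts.size() - 1; i >= 0; i--) {
            Bolt bolt = bolts.get(i);
            Consumer<Tuple> next = downstream;
            downstream = tuple -> bolt.execute(tuple, next);
        }
        // A real stream never ends; this toy stops when the spout runs dry.
        while (spout.nextTuple(downstream)) {
            // each call pushes one tuple through the whole chain
        }
        return sink;
    }

    /** Wires a sample topology: tweets -> extract hashtags -> lowercase. */
    public static List<String> demo(List<String> tweets) {
        Iterator<String> source = tweets.iterator();
        Spout tweetSpout = collector -> {
            if (!source.hasNext()) {
                return false;
            }
            collector.accept(new Tuple(source.next()));
            return true;
        };
        // One input tuple may produce zero, one, or many output tuples.
        Bolt extractHashtags = (input, collector) -> {
            for (String word : ((String) input.values.get(0)).split("\\s+")) {
                if (word.startsWith("#")) {
                    collector.accept(new Tuple(word));
                }
            }
        };
        Bolt normalize = (input, collector) -> collector.accept(
                new Tuple(((String) input.values.get(0)).toLowerCase(Locale.ROOT)));

        List<String> result = new ArrayList<>();
        for (Tuple t : run(tweetSpout, Arrays.asList(extractHashtags, normalize))) {
            result.add((String) t.values.get(0));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [#storm, #elasticsearch, #storm]
        System.out.println(demo(Arrays.asList(
                "Learning #Storm today", "#Elasticsearch and #STORM")));
    }
}
```

In an actual Storm job, the framework rather than a loop like `run` moves tuples between components, and spouts and bolts run in parallel across the cluster; the sketch only mirrors the direction of data flow.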

The following diagram shows a sample Storm topology, with the stream flowing from source to target:

Let's now write our Storm job that will listen to live streaming tweets and inject the fields we want into Elasticsearch. To start with,...