Elasticsearch for Hadoop

By: Vishal Shukla

Summary

In this chapter, we discussed how to set up Storm to run in the local environment. You learned how to analyze a real-time streaming dataset with the Twitter Trends Analyzer example. We created Storm spouts and bolts to fetch real-time tweets and process them. We also created a Storm topology that wires our spouts and bolts to ES-Hadoop's EsBolt, which injects the tweets into Elasticsearch. We explored Elasticsearch's significant terms aggregation query to find trends and unusually common patterns in the indexed data. Finally, we used percolators to classify documents against stored queries.
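As a quick reminder of what a significant terms request looks like, here is a minimal sketch in Python that only builds the JSON search body and sends nothing to a cluster. It is not the chapter's own code: the field names `text` and `@timestamp`, the aggregation name `trends`, and the helper `significant_terms_query` are assumptions made for illustration.

```python
import json


def significant_terms_query(field, since="now-1h", size=10):
    """Build a search body that asks for significant terms on `field`.

    The range query selects the foreground set (recent documents);
    by default Elasticsearch compares term frequencies in that set
    against the rest of the index to find unusually common terms.
    """
    return {
        "size": 0,  # return aggregation results only, no search hits
        # "@timestamp" is a hypothetical date field name
        "query": {"range": {"@timestamp": {"gte": since}}},
        "aggs": {
            # "trends" is an arbitrary name for the aggregation
            "trends": {
                "significant_terms": {"field": field, "size": size}
            }
        },
    }


# "text" is a hypothetical field holding the tweet body
body = significant_terms_query("text")
print(json.dumps(body, indent=2))
```

The printed body could then be sent to the `_search` endpoint of the index that holds the tweets. Narrowing the `since` window makes the foreground set smaller, which tends to surface shorter-lived trends.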

In the next chapter, you will learn important Elasticsearch and ES-Hadoop concepts, such as shards, replicas, data colocation, and advanced configuration options. These concepts and configurations are essential to know before taking your application into production.