Elasticsearch for Hadoop

By: Vishal Shukla

Getting data from Elasticsearch to HDFS


So far, you have seen how the ES-Hadoop library helps you move data from HDFS into an Elasticsearch index. The reverse direction matters too: when your data already lives in Elasticsearch, you may want to pull a specific subset of it into HDFS for complex analysis. That subset can also be narrowed down by full-text search criteria.
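To make the idea of a search-constrained subset concrete, here is a minimal sketch of the settings a read job might carry. The keys es.nodes, es.resource, and es.query are ES-Hadoop configuration settings; the host, the index/type name esh_twitter/tweets, the field name text, and the helper build_read_config are assumptions made up for this illustration, not values taken from the book.

```python
import json


def build_read_config(resource, search_text, field="text"):
    """Return ES-Hadoop style settings for reading a filtered subset.

    `resource`, `field`, and the host below are hypothetical placeholders.
    """
    # A full-text `match` query in the Elasticsearch query DSL.
    query = {"query": {"match": {field: search_text}}}
    return {
        "es.nodes": "localhost:9200",   # assumed local cluster
        "es.resource": resource,        # where to read from
        "es.query": json.dumps(query),  # only matching documents are read
    }


if __name__ == "__main__":
    conf = build_read_config("esh_twitter/tweets", "kibana")
    for key, value in conf.items():
        print(f"{key} = {value}")
```

Because the query travels with the job configuration, the filtering happens inside Elasticsearch and only the matching documents are handed to the Hadoop job.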

Understanding the Twitter dataset

Twitter provides a REST API for accessing its data. Of the wide range of data the API exposes, we will focus only on tweets with the #elasticsearch and #kibana hashtags. The dataset has been dumped to a CSV file in the following format:

"602491467697881088","RT @keskival: We won #IndustryHack @CybercomFinland #Elasticsearch #Logstash #Kibana #MarkovChain #AnomalyDetection https://t.co/Iwes6VVSqk","Sun May 24 20:38:54 IST 2015","Cybercom Finland","<a href=""http://twitter.com/download/iphone"" rel=""nofollow"">Twitter for iPhone</a>"
...
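The sample record shows that every field is quoted and that quotes inside a field are escaped by doubling them. The following sketch parses that one record with Python's standard csv module. The column names in FIELDS are assumptions inferred from the sample values, since the text does not name the columns, and parse_tweet and hashtags are hypothetical helpers.

```python
import csv
import re

# Hypothetical column names, inferred from the sample record.
FIELDS = ["id", "text", "timestamp", "user", "source"]

# The sample record from the text, split across lines for readability.
SAMPLE = (
    '"602491467697881088",'
    '"RT @keskival: We won #IndustryHack @CybercomFinland #Elasticsearch '
    '#Logstash #Kibana #MarkovChain #AnomalyDetection https://t.co/Iwes6VVSqk",'
    '"Sun May 24 20:38:54 IST 2015",'
    '"Cybercom Finland",'
    '"<a href=""http://twitter.com/download/iphone"" rel=""nofollow"">'
    'Twitter for iPhone</a>"'
)


def parse_tweet(line):
    """Parse one CSV record into a dict keyed by the assumed column names."""
    # csv.reader treats a doubled quote inside a quoted field as one quote.
    row = next(csv.reader([line]))
    return dict(zip(FIELDS, row))


def hashtags(text):
    """Return the hashtags in a tweet's text, without the leading '#'."""
    return re.findall(r"#(\w+)", text)


if __name__ == "__main__":
    tweet = parse_tweet(SAMPLE)
    print(tweet["id"], tweet["user"])
    print(hashtags(tweet["text"]))
    print(tweet["source"])
```

The timestamp is left as a string here because abbreviations such as IST are ambiguous and are not reliably parsed by the standard library.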