Elasticsearch for Hadoop

By: Vishal Shukla

Chapter 2. Getting Started with ES-Hadoop

Hadoop provides batch-oriented distributed storage and a computing engine. Elasticsearch is a full-text search engine with rich aggregation capabilities. Moving data from Hadoop into Elasticsearch lets you run data discovery tools over it to find interesting patterns, and to perform full-text search or geospatial analytics. ES-Hadoop is the library that bridges Hadoop and Elasticsearch. The goal of this book is to get you up and running with ES-Hadoop and enable you to solve real-world analytics problems.

Our goal in this chapter is to develop MapReduce jobs that write data to and read data from Elasticsearch. You probably already know how to write basic Hadoop MapReduce jobs that write their output to HDFS. ES-Hadoop is a connector library that provides a dedicated InputFormat and OutputFormat, which you can use in Hadoop jobs to read data from and write data to Elasticsearch. To take the first step in this direction, we will start with how...
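
To make the role of the dedicated InputFormat and OutputFormat more concrete, here is a minimal sketch of the settings such a job typically carries. It is not the book's own code. The class `EsHadoopJobSettings`, its methods, and the `weblogs/entry` resource are hypothetical names chosen for illustration. The sketch uses plain `java.util.Properties` so that it runs without Hadoop on the classpath; in a real job, the same key/value pairs would be set on the Hadoop `Configuration`, and the format classes would normally be set through the `Job` API.

```java
import java.util.Properties;

// Hypothetical helper (not part of ES-Hadoop): collects the key/value
// settings a MapReduce job would place on its Hadoop Configuration.
final class EsHadoopJobSettings {

    // Settings for a job that writes its output to Elasticsearch.
    static Properties forWriting(String nodes, String resource) {
        Properties conf = new Properties();
        conf.setProperty("es.nodes", nodes);        // Elasticsearch host:port
        conf.setProperty("es.resource", resource);  // target index/type
        conf.setProperty("mapreduce.job.outputformat.class",
                "org.elasticsearch.hadoop.mr.EsOutputFormat");
        // Speculative execution could send the same documents twice,
        // so it is commonly switched off when writing to Elasticsearch.
        conf.setProperty("mapreduce.map.speculative", "false");
        conf.setProperty("mapreduce.reduce.speculative", "false");
        return conf;
    }

    // Settings for a job that reads its input from Elasticsearch.
    static Properties forReading(String nodes, String resource, String query) {
        Properties conf = new Properties();
        conf.setProperty("es.nodes", nodes);
        conf.setProperty("es.resource", resource);  // source index/type
        conf.setProperty("es.query", query);        // which documents to read
        conf.setProperty("mapreduce.job.inputformat.class",
                "org.elasticsearch.hadoop.mr.EsInputFormat");
        return conf;
    }

    public static void main(String[] args) {
        // Assumed local node and illustrative resource name.
        forWriting("localhost:9200", "weblogs/entry")
                .forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

The point of the sketch is that the mapper and reducer logic stays ordinary MapReduce code. Only the job configuration changes: it names the Elasticsearch nodes and resource, and swaps in the ES-Hadoop format classes in place of the HDFS-backed ones.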