Hadoop MapReduce v2 Cookbook - Second Edition: RAW

Chapter 8. Searching and Indexing

In this chapter, we will cover the following recipes:

  • Generating an inverted index using Hadoop MapReduce

  • Intradomain web crawling using Apache Nutch

  • Indexing and searching web documents using Apache Solr

  • Configuring Apache HBase as the backend data store for Apache Nutch

  • Whole web crawling with Apache Nutch using a Hadoop/HBase cluster

  • Elasticsearch for indexing and searching

  • Generating the in-links graph for crawled web pages