In this chapter, we will cover the following recipes:
Generating an inverted index using Hadoop MapReduce
Intradomain web crawling using Apache Nutch
Indexing and searching web documents using Apache Solr
Configuring Apache HBase as the backend data store for Apache Nutch
Whole-web crawling with Apache Nutch using a Hadoop/HBase cluster
Elasticsearch for indexing and searching
Generating the in-links graph for crawled web pages