Scaling Big Data with Hadoop and Solr, Second Edition

By: Hrishikesh Vijay Karambelkar

Common problems and their solutions

The following is a list of common problems and their solutions:

When I try to format the HDFS node, I get the exception java.io.IOException: Incompatible clusterIDs in namenode and datanode. How can I fix this?
This issue usually appears when you format a new namenode while the datanodes still point to the cluster ID of a different or older cluster. You can handle it in one of the following ways:
1. Delete the DFS data folder (its location is set in hdfs-site.xml) and restart the cluster. Note that this discards the blocks stored in that folder.
2. Edit the VERSION file of HDFS, usually located at <HDFS-STORAGE-PATH>/hdfs/datanode/current/, so that its clusterID matches the namenode's.
3. Format the namenode with the problematic datanode's cluster ID:
```
  $ hdfs namenode -format -clusterId <cluster-id>
```
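Before choosing one of these fixes, it can help to confirm the mismatch. The following is a minimal shell sketch, assuming the standard VERSION file layout in which the namenode and each datanode record a `clusterID=` line in their `current/VERSION` file. `get_cluster_id` and `compare_cluster_ids` are hypothetical helper names, not part of Hadoop.

```shell
#!/usr/bin/env bash
# Sketch: compare the clusterID recorded by the namenode and by a datanode.
# get_cluster_id and compare_cluster_ids are hypothetical helper names.

# Print the clusterID value stored in an HDFS VERSION file (key=value format).
get_cluster_id() {
  sed -n 's/^clusterID=//p' "$1"
}

# Usage: compare_cluster_ids <namenode VERSION file> <datanode VERSION file>
# Returns 0 on a match, 1 on a mismatch, 2 if either ID cannot be read.
compare_cluster_ids() {
  local nn_id dn_id
  nn_id=$(get_cluster_id "$1")
  dn_id=$(get_cluster_id "$2")
  if [ -z "$nn_id" ] || [ -z "$dn_id" ]; then
    echo "clusterID not found in $1 or $2" >&2
    return 2
  fi
  if [ "$nn_id" = "$dn_id" ]; then
    echo "clusterIDs match: $nn_id"
    return 0
  fi
  echo "mismatch: namenode=$nn_id datanode=$dn_id"
  return 1
}
```

The directories to pass are the values of `dfs.namenode.name.dir` and `dfs.datanode.data.dir`, which `hdfs getconf -confKey dfs.datanode.data.dir` (and the same for the namenode key) prints; each value may be a comma-separated list with a `file://` prefix, so strip the prefix and append `/current/VERSION`.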
My Hadoop instance does not start with the ./start-all.sh script, and when I try to access the web application, it shows a page not found error. What is wrong?
This can be caused by a number of issues. To find the cause, look at the Hadoop logs first. If the precompiled binaries are installed as the root user, the logs are typically in the /var/log folder; otherwise, they are in the logs folder inside the Hadoop installation folder.
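To scan those logs quickly, the following minimal shell sketch prints every line that mentions an exception or an ERROR entry. It assumes the daemon logs are `*.log` files in `$HADOOP_LOG_DIR`, falling back to `$HADOOP_HOME/logs`; for a root install, pass the folder under /var/log explicitly. `scan_hadoop_logs` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: list log lines that mention an exception or an ERROR entry.
# Assumption: logs are *.log files in the given directory, else in
# $HADOOP_LOG_DIR, else in $HADOOP_HOME/logs.
scan_hadoop_logs() {
  local log_dir=${1:-${HADOOP_LOG_DIR:-${HADOOP_HOME:-.}/logs}}
  # -H prints the file name and -n the line number, even if only one file exists.
  grep -H -n -E 'Exception|ERROR' "$log_dir"/*.log
}
```

Because Hadoop names each daemon log after its daemon (a datanode log contains `datanode` in its file name, for example), the file name in each `file:line:text` result shows which daemon failed.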
I have set up an N-node cluster and started it with ./start-all.sh, but I do not see all the nodes in the YARN/NameNode web application. Why?
Again, this can have several causes. Verify the following:
1. Can you reach each cluster node from the namenode by IP address and by machine name? If not, add an entry for each node to the /etc/hosts file.
2. Does ssh login work without a password? If not, put the authorized keys in place so that logins do not require a password.
3. Is the datanode/nodemanager process running on each node, and can it connect to the namenode/ResourceManager? You can validate this over ssh from the node running the namenode/ResourceManager.
4. If all of these work, check the logs for exceptions, as explained in the previous question.
5. Take the corrective action that the specific errors or exceptions in the logs point to.
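Checks 1 to 3 can be scripted per node. The following is a minimal shell sketch, assuming a Linux namenode host with `getent`, passwordless SSH for the current user, and `jps` (shipped with the JDK) on the remote non-interactive PATH. It tests name resolution, passwordless login, and whether the DataNode and NodeManager processes are running; it does not test the connection back to the namenode/ResourceManager. `check_node` and `missing_daemons` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch of checks 1-3 for one worker node (hypothetical helper names).

# Read `jps` output on stdin; print each expected worker daemon that is absent.
missing_daemons() {
  local jps_out daemon
  jps_out=$(cat)
  for daemon in DataNode NodeManager; do
    # jps prints "<pid> <MainClass>", so compare the second field exactly.
    if ! printf '%s\n' "$jps_out" |
        awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
      echo "$daemon"
    fi
  done
}

check_node() {
  local host=$1 missing
  # Check 1: the host name resolves (via /etc/hosts or DNS).
  if ! getent hosts "$host" > /dev/null; then
    echo "$host: cannot resolve; add it to /etc/hosts"
    return 1
  fi
  # Check 2: BatchMode makes ssh fail instead of prompting for a password;
  # -n stops ssh from reading this shell's stdin (safe inside while-read loops).
  if ! ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
    echo "$host: passwordless ssh failed; check the authorized keys"
    return 1
  fi
  # Check 3: if jps is not on the remote PATH, both daemons are reported missing.
  missing=$(ssh -n -o BatchMode=yes "$host" jps | missing_daemons)
  if [ -n "$missing" ]; then
    echo "$host: not running: $(printf '%s' "$missing" | tr '\n' ' ')"
    return 1
  fi
  echo "$host: OK"
}
```

For example, `while read -r h; do check_node "$h"; done < "$HADOOP_HOME/etc/hadoop/slaves"` runs the checks against every worker listed in the slaves file (named `workers` in Hadoop 3).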