Organizations today produce gigabytes of information every day from the many applications their employees use. The data sources range from application databases, online social media, and mobile devices to system logs and sensors in factory operational subsystems. With data this large and heterogeneous, it is a challenge for IT teams to process it together and provide analytics on it. Moreover, the volume of this information is growing exponentially. Given this volume, variety, and veracity, standard data-processing applications struggle with large datasets, and traditional distributed systems cannot handle this Big Data. In this chapter, we look at the problem of handling Big Data using Apache Solr and other distributed systems.
We have already seen some information about NoSQL databases and the CAP theorem in Chapter 2, Getting Started with Apache...