Programming MapReduce with Scalding

By: Antonios Chalkiopoulos

Search platforms


The Apache Lucene library provides Java-based indexing and search technology, as well as spellchecking, hit highlighting, advanced analysis, and tokenization capabilities. Two popular open source projects, Solr and ElasticSearch, build on this library, and each provides a distributed platform with replication and caching capabilities.

Solr has been an Apache open source project since 2006; in that time, it has been adopted by many enterprises and has grown and improved as a project. ElasticSearch was released a few years later and was designed from the beginning to be distributed and easy to scale out to handle massive amounts of data.

As distributed systems, they both fit nicely in the Hadoop environment. Nodes that participate in the cluster can run Hadoop applications alongside an HBase database and a search platform.

As high-memory nodes are usually in place, we can allocate enough memory to each system. Then, depending on the job that is running, we can utilize the processing and caching capabilities of...