Mastering Hadoop

By: Sandeep Karanth


Chapter 12. Analytics Using Hadoop

Hadoop has come to the fore because of its ability to aid data analytics. As data grows in volume, velocity, and variety, there need to be systems capable of analyzing it efficiently and effectively. Scaling hardware vertically to handle this data is not viable because it is expensive and difficult to manage. Distributed computing and horizontal scaling are good alternatives, and frameworks such as Hadoop automatically take care of the fault tolerance, scaling, and distribution needs of such a system.

Analytics is all about data. A question that frequently arises is: when does Hadoop become overkill? Typically, Hadoop is recommended for datasets of 1 TB and upward. However, when the rate of data growth is difficult to predict, it may be a good idea to use Hadoop MapReduce because of its attractive "code once, deploy at any scale" characteristic.

There are organizations that use Hadoop to analyze a few...