Book Image

Hadoop Beginner's Guide

Book Image

Hadoop Beginner's Guide

Overview of this book

Data is arriving faster than you can process it and the overall volumes keep growing at a rate that keeps you awake at night. Hadoop can help you tame the data beast. Effective use of Hadoop however requires a mixture of programming, design, and system administration skills."Hadoop Beginner's Guide" removes the mystery from Hadoop, presenting Hadoop and related technologies with a focus on building working systems and getting the job done, using cloud services to do so when it makes sense. From basic concepts and initial setup through developing applications and keeping the system running as the data grows, the book gives the understanding needed to effectively use Hadoop to solve real world problems.Starting with the basics of installing and configuring Hadoop, the book explains how to develop applications, maintain the system, and how to use additional products to integrate with other systems.While learning different ways to develop applications to run on Hadoop the book also covers tools such as Hive, Sqoop, and Flume that show how Hadoop can be integrated with relational databases and log collection.In addition to examples on Hadoop clusters on Ubuntu uses of cloud services such as Amazon, EC2 and Elastic MapReduce are covered.
Table of Contents (19 chapters)
Hadoop Beginner's Guide
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
Index

About the Reviewers

David Gruzman is a Hadoop and big data architect with more than 18 years of hands-on experience, specializing in the design and implementation of scalable high-performance distributed systems. He has extensive expertise of OOA/OOD and (R)DBMS technology. He is an Agile methodology adept and strongly believes that a daily coding routine makes good software architects. He is interested in solving challenging problems related to real-time analytics and the application of machine learning algorithms to the big data sets.

He founded—and is working with—BigDataCraft.com, a boutique consulting firm in the area of big data. Visit their site at www.bigdatacraft.com. David can be contacted at [email protected]. More detailed information about his skills and experience can be found at http://www.linkedin.com/in/davidgruzman.

Muthusamy Manigandan is a systems architect for a startup. Prior to this, he was a Staff Engineer at VMWare and Principal Engineer with Oracle. Mani has been programming for the past 14 years on large-scale distributed-computing applications. His areas of interest are machine learning and algorithms.

Vidyasagar N V has been interested in computer science since an early age. Some of his serious work in computers and computer networks began during his high school days. Later, he went to the prestigious Institute Of Technology, Banaras Hindu University, for his B.Tech. He has been working as a software developer and data expert, developing and building scalable systems. He has worked with a variety of second, third, and fourth generation languages. He has worked with flat files, indexed files, hierarchical databases, network databases, relational databases, NoSQL databases, Hadoop, and related technologies. Currently, he is working as Senior Developer at Collective Inc., developing big data-based structured data extraction techniques from the Web and local information. He enjoys producing high-quality software and web-based solutions and designing secure and scalable data systems. He can be contacted at .