Hadoop Real-World Solutions Cookbook - Second Edition

By: Tanmay Deshpande

Overview of this book

Handling big data has become a pressing requirement: most organizations produce huge amounts of data every day. With the arrival of Hadoop-like tools, it has become easier for everyone to solve big data problems with great efficiency and at minimal cost. Grasping machine learning techniques will help you greatly in building predictive models and using this data to make the right decisions for your organization. Hadoop Real-World Solutions Cookbook gives readers insights into learning and mastering big data via recipes. The book not only explains most of the big data tools on the market but also provides best practices for using them. The recipes are based on the latest versions of Apache Hadoop 2.X, YARN, Hive, Pig, Sqoop, Flume, Apache Spark, Mahout, and many other ecosystem tools. This real-world-solution cookbook is packed with handy recipes you can apply to your own everyday issues. Each chapter provides in-depth recipes that can be referenced easily. The book also covers detailed practices for the latest technologies, such as YARN and Apache Spark. On completing this book, readers should be able to consider themselves big data experts. This guide is an invaluable tutorial if you are planning to implement a big data warehouse for your business.
Table of Contents (18 chapters)
Hadoop Real-World Solutions Cookbook Second Edition
Credits
About the Author
Acknowledgements
About the Reviewer
www.PacktPub.com
Preface
Index

Performing Population Data Analytics using R


So far, we have talked about how to use Mahout to solve various machine learning problems. Now, we are going to look at another tool and language, R, which has built-in support for various mathematical and statistical operations.

Getting ready

To perform this recipe, you should have R installed on your machine. You can download the Windows installer from https://cran.r-project.org/bin/windows/base/. Installers for other platforms are available from https://cran.r-project.org/.
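
Once R is installed, a quick check from the R console can confirm that the interpreter and its built-in statistical functions are working. The following is only a minimal sketch and not part of the original recipe; the values in sample_values are made up purely for illustration.

```r
# Print the version string of the installed R interpreter
print(R.version.string)

# Made-up values, used only to exercise two built-in functions
sample_values <- c(2, 4, 6, 8)

# Arithmetic mean and sample standard deviation from base R
sample_mean <- mean(sample_values)
sample_sd <- sd(sample_values)

print(sample_mean)
print(sample_sd)
```

If these calls run without errors, the installation is ready for the steps that follow.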

How to do it...

In this recipe, we are going to learn some basic operations that one can perform using R. To start with, we will have one dataset that has information about Australia's population in various states. This is what the dataset looks like:

Year NSW Vic. Qld SA WA Tas. NT ACT Aust.
1917 1904 1409 683 440 306 193 5 3 4941
1927 2402 1727 873 565 392 211 4 8 6182
1937 2693 1853 993 589 457 233 6 11 6836
1947 2985 2055 1106 646 502 257 11 17 7579
1957 3625 2656 1413 873 688 326 21 38 9640
1967 4295 3274 1700 1110 879 375 62 103 11799
1977 5002 3837 2130 1286...
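
As a rough sketch of how such a dataset could be brought into R, the snippet below loads only the six complete rows shown above (1917 to 1967) from inline text using read.table. The variable names (population_lines, population, decade_change, nsw_share) are assumptions chosen for illustration, and the original recipe may well read the data from a file instead.

```r
# Complete rows copied from the table above (1917 to 1967); the truncated
# 1977 row is left out rather than guessed.
population_lines <- c(
  "Year NSW Vic. Qld SA WA Tas. NT ACT Aust.",
  "1917 1904 1409 683 440 306 193 5 3 4941",
  "1927 2402 1727 873 565 392 211 4 8 6182",
  "1937 2693 1853 993 589 457 233 6 11 6836",
  "1947 2985 2055 1106 646 502 257 11 17 7579",
  "1957 3625 2656 1413 873 688 326 21 38 9640",
  "1967 4295 3274 1700 1110 879 375 62 103 11799"
)

# Parse the whitespace-separated text into a data frame
population <- read.table(text = population_lines, header = TRUE)

# Inspect the structure and per-column summary statistics
str(population)
summary(population)

# Change in the national total between consecutive rows
decade_change <- diff(population[["Aust."]])
print(decade_change)

# Share of the national total held by NSW in each listed year
nsw_share <- population$NSW / population[["Aust."]]
print(round(nsw_share, 3))
```

Column names that end in a dot, such as Aust., are valid in R but are easiest to access with the [["..."]] form used here.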