HDInsight Essentials - Second Edition

By: Rajesh Nadipalli

Data is everywhere


We live in a digital era, constantly connected to friends and family through social media and smartphones. In 2014, more than 5,700 tweets were sent and 800 links were shared on Facebook every second, and the digital universe amounted to about 1.7 MB of data per minute for every person on Earth (source: IDC 2014 report). This scale of data sharing and storage is unprecedented and contributes to what is known as big data.

The following infographic shows how the top social media sites are currently used (source: https://leveragenewagemedia.com/):

Other contributors to big data are smart connected devices such as smartphones, appliances, cars, sensors, and nearly everything else we use today that is connected to the Internet. These devices, which will soon number in the trillions, continuously collect data about their environment and communicate with each other to make intelligent decisions and help us live better. This digitization of the world has added to the exponential growth of big data.

The following figure depicts a trend analysis by Microsoft Azure that shows the evolution of big data toward the "Internet of Things". Between 1980 and 1990, IT systems such as ERM/CRM primarily generated well-structured data, with volumes measured in gigabytes. Between 1990 and 2000, Web and mobile applications emerged and data volumes grew to terabytes. After 2000, social networking sites, wikis, blogs, and smart devices emerged, and we are now dealing with petabytes of data. The section in blue highlights the big data era, which includes social media, sensors, and images, and in which Volume, Velocity, and Variety are the norm. One related key trend is the price of hardware, which dropped from $190/GB in 1980 to $0.07/GB in 2010. This has been a key enabler of big data adoption.

According to the 2014 IDC digital universe report, this growth will continue, with the digital universe doubling in size every two years. About 4.4 zettabytes of data were created in 2013, and the forecast for 2020 is 44 zettabytes, which is 44 trillion gigabytes (source: http://www.emc.com/leadership/digital-universe/2014iview/executive-summary.htm).
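The arithmetic behind these figures can be checked with a short sketch. It assumes decimal (SI) units, where one zettabyte is 10^21 bytes and one gigabyte is 10^9 bytes, and it treats "doubling every two years" as smooth exponential growth. The function names are illustrative only and do not come from the IDC report.

```python
# Decimal (SI) storage units, expressed in bytes (an assumption of this sketch).
GIGABYTE = 10**9
ZETTABYTE = 10**21


def zettabytes_to_gigabytes(zb: float) -> float:
    """Convert a size in zettabytes to gigabytes."""
    return zb * ZETTABYTE / GIGABYTE


def project_doubling(start_zb: float, start_year: int, end_year: int,
                     doubling_period_years: float = 2.0) -> float:
    """Project a data volume forward, assuming it doubles every
    `doubling_period_years` years (smooth exponential growth)."""
    elapsed_years = end_year - start_year
    return start_zb * 2 ** (elapsed_years / doubling_period_years)


if __name__ == "__main__":
    # 44 ZB expressed in gigabytes (44 trillion is 44 x 10^12).
    print(f"44 ZB = {zettabytes_to_gigabytes(44):.3e} GB")
    # Size in 2020 implied by doubling every two years from 4.4 ZB in 2013.
    print(f"Projected 2020 size: {project_doubling(4.4, 2013, 2020):.1f} ZB")
```

Under these assumptions, the doubling rule gives roughly 50 zettabytes for 2020, which is in the same range as the 44 zettabyte forecast; the two figures are rounded estimates and are not expected to match exactly.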

Source: Microsoft TechEd North America 2014, From Zero to Data Insights from HDInsight on Microsoft Azure

Business value of big data

While we generated 4.4 zettabytes of data in 2013, only five percent of it was actually analyzed, and this is the real opportunity of big data. The IDC report forecasts that by 2020, smarter sensors and devices will allow us to analyze over 35 percent of generated data. This data will shape new consumer and business behavior and create trillions of dollars in opportunity for IT vendors and for the organizations that analyze it.
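To put these percentages in absolute terms, the following sketch multiplies the volumes quoted in this chapter by the analyzed share. It treats "over 35 percent" as exactly 35 percent, so the 2020 figure is a lower bound, and the helper name is illustrative only.

```python
def analyzed_volume_zb(total_zb: float, analyzed_fraction: float) -> float:
    """Return the analyzed portion of a data volume, in zettabytes."""
    return total_zb * analyzed_fraction


# Figures quoted in this chapter: 5 percent of 4.4 ZB in 2013,
# and (at least) 35 percent of a forecast 44 ZB in 2020.
analyzed_2013 = analyzed_volume_zb(4.4, 0.05)
analyzed_2020 = analyzed_volume_zb(44.0, 0.35)
growth_factor = analyzed_2020 / analyzed_2013

if __name__ == "__main__":
    print(f"Analyzed in 2013: {analyzed_2013:.2f} ZB")
    print(f"Analyzed in 2020: {analyzed_2020:.1f} ZB (lower bound)")
    print(f"Growth in analyzed data: about {growth_factor:.0f}x")
```

Under these assumptions, the volume of analyzed data grows about 70-fold (from about 0.22 to about 15.4 zettabytes) while the total volume grows only 10-fold, which is where the opportunity described above comes from.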

Let's look at some real use cases that have benefited from big data:

  • IT systems at major banks constantly monitor for fraudulent activity and alert customers within milliseconds. These systems apply complex business rules and analyze historical data, geography, type of vendor, and other customer-specific parameters to get accurate results.

  • Commercial drones are transforming agriculture by analyzing real-time aerial images and identifying problem areas. These drones are cheaper and more efficient than satellite imagery, as they fly under the clouds and can take images at any time. They identify problems related to irrigation, pests, or fungal infections, which in turn helps increase crop productivity and quality. These drones are equipped to capture high-quality images every second and transfer them to a cloud-hosted big data system for further processing. (You can refer to http://www.technologyreview.com/featuredstory/526491/agricultural-drones/.)

  • The developers of the blockbuster game Halo 4 were tasked with analyzing player preferences and supporting an online tournament in the cloud. The game attracted over 4 million players in the first five days after its launch. The development team also had to design a solution that kept track of the leaderboard for the global Halo 4 Infinity Challenge, which was open to all players. The team chose the Azure HDInsight service to analyze the massive amounts of unstructured data in a distributed manner. The results from HDInsight were reported using Microsoft SQL Server PowerPivot and SharePoint, and the business was extremely happy with the query response times, which were a few hours or less (source: http://www.microsoft.com/casestudies/Windows-Azure/343-Industries/343-Industries-Gets-New-User-Insights-from-Big-Data-in-the-Cloud/710000002102).