Microsoft SQL Server 2012 with Hadoop

By: Debarchan Sarkar

Overview of this book

With the explosion of data, open source Apache Hadoop is gaining traction, thanks to the large ecosystem that has grown around its core components, the Hadoop Distributed File System (HDFS) and MapReduce. Getting SQL Server to talk to Hadoop has become increasingly important because the two are complementary. Petabytes of unstructured data can be stored in Hadoop but may take hours to query, whereas terabytes of structured data can be stored in SQL Server 2012 and queried in seconds. This creates the need to transfer and integrate data between Hadoop and SQL Server.

Microsoft SQL Server 2012 with Hadoop is aimed at SQL Server developers. It will quickly show you how to get Hadoop up and running alongside SQL Server 2012. Once this is done, the book will focus on how to manage big data with Hadoop and use Hive to query the data. It will also cover topics such as using SQL Server's in-memory features and using BI tools with big data.

The book focuses on data integration techniques between the relational (SQL Server 2012) and non-relational (Hadoop) worlds. It will walk you through different tools for the bi-directional movement of data, with practical examples. You will learn to use open source connectors such as Sqoop to import and export data between SQL Server 2012 and Hadoop, and to work with leading in-memory BI tools to create ETL solutions that use the Hive ODBC driver in your data movement projects. Finally, this book will give you a glimpse of present-day self-service BI tools such as Excel and Power View, which can consume Hadoop data and provide powerful insights into it.
Table of Contents (12 chapters)

Summary


In this chapter, we went through what Big Data is and why it has become one of the compelling needs of the industry. The diversity of data that needs to be processed has taken Information Technology to heights that were never imagined before. Organizations that can use Big Data to process data of any kind will be able to differentiate themselves more effectively and derive new value for the business, whether in the form of revenue growth, cost savings, or entirely new business models. For example, financial firms that use machine learning to build better fraud detection algorithms can go beyond simple business rules based on charge frequency and location and also take into account an individual's own buying patterns, ultimately leading to a better customer experience.

In Big Data implementations, these new requirements challenge traditional data management technologies and call for a new approach that enables organizations to effectively manage, enrich, and gain insights from any data. Apache Hadoop is one of the undisputed leaders in the Big Data industry. The entire ecosystem, along with its supporting projects, provides users with a highly reliable, fault-tolerant framework for massively parallel distributed processing of unstructured and semi-structured data.

In the next chapter, you will see how to use the Sqoop connector to move Hadoop data to SQL Server 2012 and vice versa. Sqoop is another open source project, designed for bi-directional transfer of data between Hadoop and any relational database management system (RDBMS). We will look at its usage as the first step in data integration between Hadoop and SQL Server 2012.