Practical Big Data Analytics

By: Nataraj Dasgupta

Overview of this book

Big Data analytics relates to the strategies used by organizations to collect, organize, and analyze large amounts of data to uncover valuable business insights that cannot be obtained through traditional systems. Crafting an enterprise-scale, cost-efficient Big Data and machine learning solution to uncover insights and value from your organization’s data is a challenge. Today, with hundreds of new Big Data systems, machine learning packages, and BI tools, selecting the right combination of technologies is an even greater challenge. This book will help you do that. With the help of this guide, you will be able to bridge the gap between the theoretical world of technology and the practical reality of building corporate Big Data and data science platforms. You will get hands-on exposure to Hadoop and Spark, build machine learning dashboards using R and R Shiny, create web-based apps using NoSQL databases such as MongoDB, and even learn how to write R code for neural networks. By the end of the book, you will have a clear and concrete understanding of what Big Data analytics means, how it drives revenues for organizations, and how you can develop your own Big Data analytics solution using the different tools and methods articulated in this book.

When do you know you have a big data problem and where do you start your search for the big data solution?


Finally, big data analytics refers to the practice of putting the data to work: in other words, the process of extracting useful information from large volumes of data through the use of appropriate technologies. Many of the terms used to denote different types of analytics have no exact definition, as they can be interpreted in different ways and their meaning can therefore be subjective.

Nevertheless, some working definitions are provided here as reference points to help you form an initial impression:

  • Data mining: Data mining refers to the process of extracting information from datasets by running queries or basic summarization methods such as aggregations. Finding the top 10 products by number of sales in a dataset containing all the sales records of one million products on a website would be an example of mining: that is, extracting useful information from a dataset. NoSQL databases such as Cassandra, Redis, and MongoDB are examples of tools that can support this kind of querying and aggregation over large datasets.
  • Business intelligence: Business intelligence refers to tools such as Tableau, Spotfire, QlikView, and others that provide frontend dashboards enabling users to query data through a graphical interface. Dashboard products have gained prominence in step with the growth of data, as users seek to extract information from it. Easy-to-use interfaces with querying and visualization features that can be used by both technical and non-technical users have laid the groundwork for democratizing analytical access to data.
  • Visualization: Data can be expressed both succinctly and intuitively using easy-to-understand visual depictions of the results. Visualization plays a critical role in understanding data better, especially when examining the nature of a dataset and its distribution prior to more in-depth analytics. JavaScript, which saw a resurgence after a long period of quiet, has produced libraries such as D3.js and ECharts (originally developed at Baidu) that are prime examples of visualization packages in the open source domain. Most BI tools contain advanced visualization capabilities and, as such, visualization has become an indispensable asset for any successful analytics product.
  • Statistical analytics: Statistical analytics refers to tools or platforms that allow end users to run statistical operations on datasets. These tools have existed for many years, but have gained traction with the advent of big data and the challenges that large volumes of data pose for performing statistical operations efficiently. Languages such as R and products such as SAS are well-known names in the area of computational statistics.
  • Machine learning: Machine learning, which is often referred to by other names such as predictive analytics and predictive modeling, is in essence the process of applying advanced algorithms that go beyond the realm of traditional statistics. These algorithms often involve running hundreds or thousands of iterations. Such algorithms are not only inherently complex, but also very computationally intensive.
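
As a rough illustration of the data mining example in the list above (finding the top products by number of sales), the following Python sketch aggregates a handful of made-up sales records. The record layout, the product names, and the `top_products` helper are assumptions introduced only for illustration; a dataset covering a million products would more likely be queried in a database or a distributed engine than in memory.

```python
from collections import Counter

# Hypothetical sales records: (product_id, units_sold), one per transaction.
sales_records = [
    ("widget", 3),
    ("gadget", 5),
    ("widget", 4),
    ("doohickey", 1),
    ("gadget", 3),
    ("gizmo", 6),
]


def top_products(records, n=10):
    """Return the n products with the highest total units sold."""
    totals = Counter()
    for product_id, units in records:
        totals[product_id] += units  # aggregation step: sum units per product
    # most_common sorts by total, highest first, and keeps the first n entries.
    return totals.most_common(n)


if __name__ == "__main__":
    for product_id, units in top_products(sales_records, n=3):
        print(product_id, units)
```

The same pattern (group, aggregate, rank) is what a query engine would perform on the full dataset; only the scale and the storage layer differ.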

Advances in technology have been a key driver in the growth of machine learning in analytics, to the point where it has now become a commonly used term across the industry. Innovations such as self-driving cars, traffic data on maps that adjusts based on traffic patterns, and digital assistants such as Siri and Cortana are examples of the commercialization of machine learning in consumer products.
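
The iterative, computationally intensive nature of machine learning algorithms described in the list above can be seen even in a very small example. The sketch below fits a straight line to made-up data using batch gradient descent, repeating the same update step a couple of thousand times. The function name `fit_line`, the data, and the learning rate are assumptions chosen for illustration and are not taken from the book.

```python
def fit_line(xs, ys, learning_rate=0.01, iterations=1000):
    """Fit y = w * x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(iterations):
        # Prediction errors under the current parameters.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Move each parameter a small step against its gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


if __name__ == "__main__":
    # Made-up data generated from y = 2x + 1.
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [1.0, 3.0, 5.0, 7.0, 9.0]
    w, b = fit_line(xs, ys, learning_rate=0.05, iterations=2000)
    print(round(w, 3), round(b, 3))
```

Each iteration touches every data point, so the cost grows with both the number of iterations and the size of the dataset, which is one reason such algorithms become demanding at big data scale.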