Book Image

Getting Started with Greenplum for Big Data Analytics

By : Sunila Gollapudi
Book Image

Getting Started with Greenplum for Big Data Analytics

By: Sunila Gollapudi

Overview of this book

Organizations are leveraging the use of data and analytics to gain a competitive advantage over their opposition. Therefore, organizations are quickly becoming more and more data driven. With the advent of Big Data, existing Data Warehousing and Business Intelligence solutions are becoming obsolete, and a requisite for new agile platforms consisting of all the aspects of Big Data has become inevitable. From loading/integrating data to presenting analytical visualizations and reports, the new Big Data platforms like Greenplum do it all. It is now the mindset of the user that requires a tuning to put the solutions to work. "Getting Started with Greenplum for Big Data Analytics" is a practical, hands-on guide to learning and implementing Big Data Analytics using the Greenplum Integrated Analytics Platform. From processing structured and unstructured data to presenting the results/insights to key business stakeholders, this book explains it all. "Getting Started with Greenplum for Big Data Analytics" discusses the key characteristics of Big Data and its impact on current Data Warehousing platforms. It will take you through the standard Data Science project lifecycle and will lay down the key requirements for an integrated analytics platform. It then explores the various software and appliance components of Greenplum and discusses the relevance of each component at every level in the Data Science lifecycle. You will also learn Big Data architectural patterns and recap some key advanced analytics techniques in detail. The book will also take a look at programming with R and integration with Greenplum for implementing analytics. Additionally, you will explore MADlib and advanced SQL techniques in Greenplum for analytics. This book also elaborates on the physical architecture aspects of Greenplum with guidance on handling high-availability, back-up, and recovery.
Table of Contents (13 chapters)
Getting Started with Greenplum for Big Data Analytics
Credits
Foreword
About the Author
Acknowledgement
About the Reviewers
www.PacktPub.com
Preface
Index

Foreword

In the last decade, we have seen the impact of exponential advances in technology on the way we work, shop, communicate, and think. At the heart of this change is our ability to collect and gain insights into data; and comments like "Data is the new oil" or "we have a Data Revolution" only amplifies the importance of data in our lives.

Tim Berners-Lee, inventor of the World Wide Web said, "Data is a precious thing and will last longer than the systems themselves." IBM recently stated that people create a staggering 2.5 quintillion bytes of data every day (that's roughly equivalent to over half a billion HD movie downloads). This information is generated from a huge variety of sources including social media posts, digital pictures, videos, retail transactions, and even the GPS tracking functions of mobile phones.

This data explosion has led to the term "Big Data" moving from an Industry buzz word to practically a household term very rapidly. Harnessing "Big Data" to extract insights is not an easy task; the potential rewards for finding these patterns are huge, but it will require technologists and data scientists to work together to solve these problems.

The book written by Sunila Gollapudi, Getting Started with Greenplum for Big Data Analytics, has been carefully crafted to address the needs of both the technologists and data scientists.

Sunila starts with providing excellent background to the Big Data problem and why new thinking and skills are required. Along with a dive deep into advanced analytic techniques, she brings out the difference in thinking between the "new" Big Data science and the traditional "Business Intelligence", this is especially useful to help understand and bridge the skill gap.

She moves on to discuss the computing side of the equation-handling scale, complexity of data sets, and rapid response times. The key here is to eliminate the "noise" in data early in the data science life cycle. Here, she talks about how to use one of the industry's leading product platforms like Greenplum to build Big Data solutions with an explanation on the need for a unified platform that can bring essential software components (commercial/open source) together backed by a hardware/appliance.

She then puts the two together to get the desired result—how to get meaning out of Big Data. In the process, she also brings out the capabilities of the R programming language, which is mainly used in the area of statistical computing, graphics, and advanced analytics.

Her easy-to-read practical style of writing with real examples shows her depth of understanding of this subject. The book would be very useful for both data scientists (who need to learn the computing side and technologies to understand) and also for those who aspire to learn data science.

V. Laxmikanth

Managing Director

Broadridge Financial Solutions (India) Private Limited

www.broadridge.com