Getting Started with Greenplum for Big Data Analytics

By: Sunila Gollapudi

Overview of this book

Organizations are using data and analytics to gain a competitive advantage, and as a result they are quickly becoming more data driven. With the advent of Big Data, existing Data Warehousing and Business Intelligence solutions are becoming obsolete, and the need for new, agile platforms that cover all aspects of Big Data has become inevitable. From loading and integrating data to presenting analytical visualizations and reports, new Big Data platforms like Greenplum do it all. What now needs tuning is the mindset of the user who puts these solutions to work.

"Getting Started with Greenplum for Big Data Analytics" is a practical, hands-on guide to learning and implementing Big Data Analytics using the Greenplum Integrated Analytics Platform. From processing structured and unstructured data to presenting the results and insights to key business stakeholders, this book explains it all.

The book discusses the key characteristics of Big Data and its impact on current Data Warehousing platforms. It takes you through the standard Data Science project lifecycle and lays down the key requirements for an integrated analytics platform. It then explores the various software and appliance components of Greenplum and discusses the relevance of each component at every level of the Data Science lifecycle. You will also learn Big Data architectural patterns and recap some key advanced analytics techniques in detail. The book also looks at programming with R and its integration with Greenplum for implementing analytics. Additionally, you will explore MADlib and advanced SQL techniques in Greenplum for analytics. The book also elaborates on the physical architecture of Greenplum, with guidance on handling high availability, backup, and recovery.
Table of Contents (13 chapters)
About the Reviewers

Brian Feeny is a technologist/evangelist working with many Big Data technologies such as analytics, visualization, data mining, machine learning, and statistics. He is a graduate student in Software Engineering at Harvard University, primarily focused on data science, where he gets to work on interesting data problems using some of the latest methods and technology.

Brian works for Presidio Networked Solutions, where he helps businesses with their Big Data challenges and helps them understand how to make the best use of their data.

Scott Kahler started down the path in the mid-80s when he disconnected the power LED on his Commodore 64. That way, he could run his handwritten Dungeons and Dragons random character generator without his parents complaining about the computer being on all night. Since then, Scott has been involved in technology and data.

He first got his hands on truly large datasets after the year 2000 failed to end technology as we know it. Scott joined up with a bunch of talented people to launch (now playing a role as a jack-of-all-trades: Programmer, DBA, and System Administrator. It was there that he first dealt with datasets that needed to be distributed to multiple nodes to be parsed and churned through relatively quickly. A decade later, he joined Adknowledge and helped implement their Greenplum and Hadoop infrastructures, serving as their Big Data Architect and managing IT Operations. Scott now works for Pivotal as a field engineer, spreading the gospel of the next technology paradigm: scalable distributed storage and compute.

Alan Koskelin is a software developer living in the Madison, Wisconsin area. He has worked in many industries, including biotech, healthcare, and online retail. The software he develops is often data-centric, and his personal interests lean toward ecological, environmental, and biological data.

Alan currently works for a nonprofit organization dedicated to improving reading instruction in the primary grades.

Tuomas Nevanranta is a Business Intelligence professional in Helsinki, Finland. He has an M.Sc. in Economics and Business Administration and a B.Sc. in Business Information Technology. He currently works at a Finnish company called Rongo.

Rongo is a leading Finnish Information Management consultancy. Rongo helps its customers manage, refine, and utilize information in their businesses. Rongo creates added value by offering market-leading Business Intelligence solutions, including Big Data solutions, data warehousing, master data management, reporting, and scorecards.