Getting Started with Greenplum for Big Data Analytics

By: Sunila Gollapudi

Overview of this book

Organizations increasingly use data and analytics to gain a competitive advantage, and they are quickly becoming more data driven. With the advent of Big Data, existing Data Warehousing and Business Intelligence solutions are becoming obsolete, and the need for new, agile platforms that cover all aspects of Big Data has become unavoidable. From loading and integrating data to presenting analytical visualizations and reports, new Big Data platforms like Greenplum do it all. What now needs adjusting is the mindset of the user who puts these solutions to work.

"Getting Started with Greenplum for Big Data Analytics" is a practical, hands-on guide to learning and implementing Big Data Analytics using the Greenplum Integrated Analytics Platform. From processing structured and unstructured data to presenting the results and insights to key business stakeholders, this book explains it all.

The book discusses the key characteristics of Big Data and its impact on current Data Warehousing platforms. It takes you through the standard Data Science project lifecycle and lays down the key requirements for an integrated analytics platform. It then explores the various software and appliance components of Greenplum and discusses the relevance of each component at every stage of the Data Science lifecycle.

You will also learn Big Data architectural patterns and recap some key advanced analytics techniques in detail. The book looks at programming with R and its integration with Greenplum for implementing analytics, and it explores MADlib and advanced SQL techniques for analytics in Greenplum. Finally, it covers the physical architecture of Greenplum, with guidance on handling high availability, backup, and recovery.

Summary

In this chapter, we explored various implementation aspects of Greenplum UAP. We started with data loading strategies for Greenplum and HD. For Greenplum, we looked at loading data with built-in utilities such as gpload and gpfdist, as well as with the Informatica PowerExchange connector. For HD, we explored Hive and the Greenplum bulk loader utility.
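
As a reminder of what a gpfdist-based load looks like, here is a minimal sketch in Python that only builds the SQL text; it does not connect to a database. The host `etl-host`, port `8081`, the file pattern, the `sales` and `ext_sales` table names, and both helper functions are hypothetical and chosen purely for illustration. It assumes a gpfdist process is already serving the directory that holds the CSV files.

```python
def external_table_ddl(name, columns, host, port, pattern):
    """Build a CREATE EXTERNAL TABLE statement whose files are served by gpfdist."""
    cols = ", ".join(f"{col} {typ}" for col, typ in columns)
    return (
        f"CREATE EXTERNAL TABLE {name} ({cols})\n"
        f"LOCATION ('gpfdist://{host}:{port}/{pattern}')\n"
        f"FORMAT 'CSV' (HEADER);"
    )


def load_sql(target, external):
    """Build the INSERT that pulls rows from the external table into a regular table."""
    return f"INSERT INTO {target} SELECT * FROM {external};"


# Hypothetical column layout, assumed for illustration only.
COLUMNS = [("id", "int"), ("sale_date", "date"), ("amount", "numeric(10,2)")]

ddl = external_table_ddl("ext_sales", COLUMNS, "etl-host", 8081, "sales_*.csv")
load = load_sql("sales", "ext_sales")

print(ddl)
print(load)
```

The generated statements would be run through psql or a database driver. Because the segments read from gpfdist in parallel, the load is expressed as an ordinary INSERT ... SELECT rather than a row-by-row copy.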

We then took a deep dive into the distribution and partitioning aspects of Greenplum, along with strategies for querying Greenplum and HD. We looked at commands such as ANALYZE and EXPLAIN for optimizing queries and interpreting query plans. Finally, we explored some in-database analytics options with Greenplum (using window functions, integrating MADlib, and using PL/R). By the end of this chapter, readers should be fairly familiar with the main implementation aspects of using Greenplum in conjunction with Hadoop for Big Data storage and analytics.
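
The distribution, partitioning, and query-plan topics recapped above can be illustrated with a short sketch. The Python below only holds and composes SQL strings; the `sales` table, its columns, the date range, and the `explain` helper are hypothetical names assumed for the example, not taken from the chapter.

```python
# A table hash-distributed on id and range-partitioned by month (hypothetical schema).
CREATE_SALES = """
CREATE TABLE sales (
    id        int,
    region    text,
    sale_date date,
    amount    numeric(10,2)
)
DISTRIBUTED BY (id)
PARTITION BY RANGE (sale_date)
(
    START (date '2013-01-01') INCLUSIVE
    END   (date '2014-01-01') EXCLUSIVE
    EVERY (INTERVAL '1 month')
);
""".strip()

# Refresh optimizer statistics after loading data.
ANALYZE_SALES = "ANALYZE sales;"

# A window function that ranks sales within each region.
RANKED_SALES = """
SELECT region, amount,
       rank() OVER (PARTITION BY region ORDER BY amount DESC) AS amount_rank
FROM sales;
""".strip()


def explain(query):
    """Prefix a query with EXPLAIN so the planner returns its plan instead of rows."""
    return "EXPLAIN " + query.strip()


for statement in (CREATE_SALES, ANALYZE_SALES, explain(RANKED_SALES)):
    print(statement)
    print()
```

Running ANALYZE before EXPLAIN matters because the plan is built from the collected statistics; stale statistics can lead to a plan that does not reflect the data actually loaded.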
