In this chapter, we explored various implementation aspects of Greenplum UAP. We started with data loading strategies for Greenplum and HD. We looked at loading data into Greenplum using its own utilities, such as gpload and gpfdist, and also using the Informatica PowerExchange connector. For HD, we explored Hive and the Greenplum bulk loader utility.
We then took a deep dive into the distribution and partitioning aspects of Greenplum, along with strategies for querying Greenplum and HD. We looked at commands such as ANALYZE and EXPLAIN for optimizing queries and interpreting query plans. Finally, we explored some in-database analytics options with Greenplum (using window functions, integrating MADlib, and using PL/R). At the end of this chapter, readers should be fairly familiar with the main implementation aspects of using Greenplum in conjunction with Hadoop for Big Data storage and analytics.