Greenplum is a massively parallel processing (MPP) data store. Table data is distributed across segments according to the distribution strategy defined for each table.
Every table in Greenplum has a data distribution method, and the DISTRIBUTED BY clause defines the distribution strategy. The distribution key must be chosen so that it does not introduce data skew on any of the segment hosts.
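As a minimal sketch of checking a distribution key for skew, the statements below create a table with an explicit key and then count rows per segment. The table name sales and its columns are hypothetical and chosen only for illustration; gp_segment_id is the Greenplum system column that reports which segment holds each row.

```sql
-- Hypothetical table for illustration; the name and columns are assumptions.
CREATE TABLE sales (
    sale_id     bigint,
    customer_id integer,
    amount      numeric(12,2)
)
DISTRIBUTED BY (sale_id);

-- Count rows per segment to see how evenly the key spreads the data.
SELECT gp_segment_id, count(*) AS row_count
FROM sales
GROUP BY gp_segment_id
ORDER BY gp_segment_id;
```

Roughly equal row counts across segments suggest an even distribution. If one segment holds far more rows than the others, the key is probably skewed and a column with higher cardinality may be a better choice.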
There are two methods of distributing table data across segment hosts:
DISTRIBUTED BY (column name(s))