Each node in the cluster stores local segments of data. The number of local segments on a node is known as the scaling factor. As discussed earlier, when nodes are added or removed, rebalancing redistributes local segments among the nodes so that data stays evenly distributed across the cluster.
The MAXIMUM_SKEW_PERCENT parameter plays a crucial role when the number of segments cannot be evenly divided by the number of nodes in the new cluster. For example, if the scaling factor is 4 and there are initially 4 nodes, there will be 16 (4 x 4) segments in the whole cluster. Suppose one additional node is added to the cluster; it is then not possible to distribute 16 segments evenly among 5 nodes. Hence, Vertica will assign more segments to some nodes than to others. One possible combination is that 4 nodes get 3 segments each and 1 node gets 4 segments. This skew is around 33...
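The arithmetic in this example can be checked with a short Python sketch. It is only an illustration and not Vertica's actual rebalancing algorithm: the function names are hypothetical, and the skew formula (the difference between the most and least loaded nodes, relative to the least loaded node) is an assumption chosen because it agrees with the figure of roughly 33 quoted above.

```python
def distribute_segments(scaling_factor: int, old_nodes: int, new_nodes: int) -> list[int]:
    """Spread the existing segments as evenly as possible over new_nodes.

    Hypothetical helper: the total segment count is fixed by the original
    cluster (scaling_factor * old_nodes), as in the example in the text.
    """
    total = scaling_factor * old_nodes
    base, extra = divmod(total, new_nodes)
    # The first `extra` nodes each take one additional segment.
    return [base + 1 if i < extra else base for i in range(new_nodes)]


def skew_percent(counts: list[int]) -> float:
    """Assumed skew definition: (max - min) / min, expressed as a percentage."""
    lo, hi = min(counts), max(counts)
    if lo == 0:
        raise ValueError("at least one node holds no segments; skew is undefined")
    return (hi - lo) / lo * 100


counts = distribute_segments(scaling_factor=4, old_nodes=4, new_nodes=5)
print(counts)                          # [4, 3, 3, 3, 3]
print(round(skew_percent(counts), 1))  # 33.3
```

With 16 segments on 5 nodes, one node holds 4 segments and the other four hold 3 each, which gives a skew of about 33 percent under the assumed definition.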