Teradata Database distributes rows to AMPs based on the row hash that the hashing algorithm computes from the primary index value; the hash map then assigns each hash bucket to an AMP. This distribution scheme is simple and efficient. The primary index of the table therefore determines which AMP each row resides on. The uniqueness of the primary index becomes very important here: the more unique its values, the more evenly the rows spread across the AMPs.
If data is skewed, meaning that most rows reside on one or a few AMPs, queries that use the index become slow, and the database can run into space issues because the overloaded AMPs fill up before the others.
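One way to check for skew is to count how many rows of a table hash to each AMP. The query below is a minimal sketch that uses Teradata's `HASHROW`, `HASHBUCKET`, and `HASHAMP` functions; the table `sales_db.customer` and its primary index column `customer_id` are hypothetical names chosen for illustration.

```sql
-- Hypothetical table and column names; substitute your own.
-- HASHROW computes the row hash of the primary index value,
-- HASHBUCKET maps that hash to a hash bucket,
-- HASHAMP returns the AMP that owns the bucket.
SELECT HASHAMP(HASHBUCKET(HASHROW(customer_id))) AS amp_no,
       COUNT(*)                                  AS row_count
FROM   sales_db.customer
GROUP  BY 1
ORDER  BY row_count DESC;
```

If a handful of AMPs report far higher counts than the rest, the chosen primary index is probably not unique enough to distribute the rows evenly.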
Unique values for the rows can be generated using the following methods:
- Based on original data
- ETL process, outside Teradata
The third method, and the one we will be using, is the identity column. The identity column feature generates a unique number for each row and inserts it into a column defined in the table. It works like an automatic sequence generator, which helps preserve the uniqueness of rows in the table.
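As a minimal sketch of what such a definition can look like, the statement below creates a table whose key is generated by an identity column and uses that key as a unique primary index. The database, table, and column names are hypothetical, and the start, increment, and limit values are assumptions to adjust to your data.

```sql
-- Hypothetical table; names and numeric limits are illustrative assumptions.
CREATE TABLE sales_db.customer_stage
(
    customer_key  INTEGER GENERATED ALWAYS AS IDENTITY
                  (START WITH 1
                   INCREMENT BY 1
                   MINVALUE 1
                   MAXVALUE 2147483647
                   NO CYCLE),            -- never reuse a generated number
    customer_name VARCHAR(100),
    created_date  DATE
)
UNIQUE PRIMARY INDEX (customer_key);

-- The identity column is omitted from the column list;
-- Teradata supplies its value at insert time.
INSERT INTO sales_db.customer_stage (customer_name, created_date)
VALUES ('Alice', CURRENT_DATE);
```

With `GENERATED ALWAYS` and `NO CYCLE`, the generated numbers are unique, but they should not be expected to be consecutive or to follow insertion order, since numbers are handed out in batches across the parallel units of the system.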