Data Flow best practices in Extract and Load
When we extract data from sources, the main focus is on getting just the changed data (inserts, updates, and deletes) since our last extraction. Reading all data (that could be millions of rows) from sources and then making a comparison row-by-row could cause a bottleneck in the system.
To get just the required data, several techniques can be applied:
Audit columns: Having a date and time column for changes or even having two columns, one for insert and another for the update of a specific record.
Checksums: Using an algorithm to create a unique identifier for a column set. Microsoft included in SQL 2008 a new SQL function that enables this technique to be applied faster and more easily.
Change Data Capture (CDC): Save logs from changes in a specific table using a higher performance approach than using triggers. This is a new functionality included in the SQL 2008 engine and now included in SSIS 2012 with a new task and two components.
Several other...