Validating data
Data from the real world inevitably contains errors. In Chapter 2, Getting Started with Transformations, we saw that errors in data can cause a Transformation to crash, and we learned to deal with them. Other kinds of issues don't cause the Transformation to abort, but they still violate business rules. This section is about detecting these kinds of issues and reporting them.
Validating data with PDI
Validating data is about ensuring that incoming data contains expected values. There are several kinds of constraints that we may need to impose on our data. The following are just some examples:
- A field must contain only digits
- A date field must be formatted as MM-dd-yyyy
- A field must be either YES or NO
- The value of a field must exist in a reference table
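To make these constraints concrete, here is a minimal sketch of the four checks written in plain Python, outside of PDI. It is only an illustration of the rules, not the way PDI implements validation. The field names (zip_code, order_date, active, region) and the KNOWN_REGIONS set, which stands in for a reference table, are hypothetical. The date rule uses Python's strptime directives for a month-day-year date (%m-%d-%Y) rather than the Java-style pattern shown in the list.

```python
import re
from datetime import datetime

# Hypothetical reference data standing in for a lookup table.
KNOWN_REGIONS = {"NORTH", "SOUTH", "EAST", "WEST"}


def validate_row(row):
    """Return a list of rule violations for one row (empty if the row is valid)."""
    errors = []

    # Rule 1: the field must contain only digits.
    zip_code = row.get("zip_code") or ""
    if not re.fullmatch(r"[0-9]+", zip_code):
        errors.append("zip_code must contain only digits")

    # Rule 2: the date must be a real calendar date in month-day-year form.
    # The regex enforces zero padding; strptime rejects dates such as 02-30-2017.
    order_date = row.get("order_date") or ""
    try:
        if not re.fullmatch(r"[0-9]{2}-[0-9]{2}-[0-9]{4}", order_date):
            raise ValueError("wrong layout")
        datetime.strptime(order_date, "%m-%d-%Y")
    except ValueError:
        errors.append("order_date must be a valid month-day-year date")

    # Rule 3: the field must be either YES or NO.
    if row.get("active") not in ("YES", "NO"):
        errors.append("active must be YES or NO")

    # Rule 4: the value must exist in the reference data.
    if row.get("region") not in KNOWN_REGIONS:
        errors.append("region not found in reference data")

    return errors


good = {"zip_code": "90210", "order_date": "12-31-2017", "active": "YES", "region": "NORTH"}
bad = {"zip_code": "90A10", "order_date": "02-30-2017", "active": "Y", "region": "MARS"}

good_errors = validate_row(good)
bad_errors = validate_row(bad)
```

Returning a list of violations, instead of stopping at the first one, lets the caller decide what to do with an invalid row and keeps every broken rule visible.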
If a field doesn't respect these rules or constraints, we have to decide how to handle it. Some options are as follows:
- Reporting the error to the log
- Inserting the inconsistency into a dedicated table
- Writing the line with the...