Real-world data is rarely perfect; it contains errors. We already saw that errors in data can cause our transformations to crash, and we learned how to detect and report those errors while avoiding undesirable situations. The main problem is that in doing so, we discard data that may be important. Sometimes the errors are not so severe, and we may be able to fix them so that we don't lose data. Let's see some examples:
You have a field defined as a string, and that field represents the date of birth of a person. Besides valid dates, the field contains other strings, for example N/A, -, ???, and so on. Any attempt to run a calculation with these values would lead to an error.
You have two dates representing the start date and end date of the execution of a task. Suppose that you have 2013-01-05 and 2012-10-31 as the start date and end date respectively. They are well-formatted dates, but if you try to...