Understanding invalid data, specification mismatch, and data type validation
Sometimes mistakes are made, and the data is simply wrong. Manual data entry in particular produces typos and values placed in the wrong cells. Even if the person entering the data is 99% accurate, that still means one mistake for every hundred values entered, and when you have millions of data points, each made up of dozens of variables, those mistakes add up. Here are a few of the more common mistakes.
Invalid data
Invalid data is data that does not match the expected values or ranges. The cause is usually a typo, though often it is just a matter of inconsistent formatting. Let’s look at an example:
City |
los angeles |
LA |
Los Angeles |
Los Angelus |
la |
los... |
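One way to clean a column like the one above is to map each variant onto a single canonical spelling. The following is a minimal Python sketch, not a definitive method. The `CANONICAL_CITIES` list, the `ALIASES` map, the `normalize_city` helper, and the 0.8 similarity cutoff are all assumptions made for illustration. The sketch relies on the standard library's `difflib.get_close_matches` to catch near-miss spellings.

```python
import difflib

# Hypothetical reference data (not from the original text): the spellings
# we accept as canonical, plus a hand-maintained map of known abbreviations.
CANONICAL_CITIES = ["Los Angeles"]
ALIASES = {"la": "Los Angeles"}


def normalize_city(raw, cutoff=0.8):
    """Return the canonical city name, or None if the value cannot be resolved."""
    key = raw.strip().lower()

    # Known abbreviations are resolved first, ignoring case.
    if key in ALIASES:
        return ALIASES[key]

    # Exact match against the canonical list, ignoring case.
    lookup = {name.lower(): name for name in CANONICAL_CITIES}
    if key in lookup:
        return lookup[key]

    # Fuzzy match to catch small typos such as "Los Angelus".
    # The cutoff is an assumed threshold and would need tuning on real data.
    close = difflib.get_close_matches(key, list(lookup), n=1, cutoff=cutoff)
    return lookup[close[0]] if close else None


if __name__ == "__main__":
    for value in ["los angeles", "LA", "Los Angeles", "Los Angelus", "la"]:
        print(repr(value), "->", normalize_city(value))
```

Returning `None` for anything that is neither an alias nor a close match, rather than guessing, leaves ambiguous entries flagged for manual review.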