Data Engineering Best Practices
We begin with what others have said about data wrangling, starting with a formal definition:
“Data wrangling, also called data cleaning, data remediation, or data munging, refers to a variety of processes designed to transform raw data into more readily used formats. The exact methods differ from project to project depending on the data you’re leveraging and the goal you’re trying to achieve.” (Harvard Business School {https://packt-debp.link/dZ7j9f})
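As a minimal sketch of what "transforming raw data into more readily used formats" can look like in practice, the Python below cleans a small, hypothetical raw customer export. The field names, the two accepted date formats, and the helper functions (`parse_date`, `parse_spend`, `wrangle`) are illustrative assumptions, not part of the source. The sketch normalizes name casing and whitespace, parses mixed date formats, and converts text amounts to numbers.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional

# Hypothetical raw export: inconsistent casing, stray whitespace,
# mixed date formats, and numbers stored as text (some missing or junk).
RAW_ROWS = [
    {"customer": "  alice SMITH ", "signup": "2023-01-15", "spend": "120.50"},
    {"customer": "Bob jones", "signup": "15/02/2023", "spend": ""},
    {"customer": "carol WHITE", "signup": "2023-03-01", "spend": "n/a"},
]

# Date formats we assume the (hypothetical) source system may emit.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


@dataclass(frozen=True)
class Customer:
    name: str
    signup: date
    spend: Optional[float]  # None when the source value is missing or unparseable


def parse_date(text: str) -> date:
    """Try each known format in turn; raise if none matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {text!r}")


def parse_spend(text: str) -> Optional[float]:
    """Convert a spend string to float, mapping blanks and junk to None."""
    try:
        return float(text)
    except ValueError:
        return None


def wrangle(row: dict) -> Customer:
    """Turn one raw row into a typed, consistently formatted record."""
    return Customer(
        # Collapse repeated whitespace, then normalize to title case.
        name=" ".join(row["customer"].split()).title(),
        signup=parse_date(row["signup"]),
        spend=parse_spend(row["spend"]),
    )


if __name__ == "__main__":
    for customer in map(wrangle, RAW_ROWS):
        print(customer)
```

One design choice worth noting: an unrecognized date raises an error, while a missing or unusable spend value becomes `None`. Which fields should fail loudly and which should degrade gracefully depends on the consumer and the goal, which is exactly the project-to-project variation the definition describes.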
We want to stress that, to future-proof your data-engineered solution, you must make data useful to its consumers, often data scientists, so that they can consume it easily. Data should be readily available and should not require reworking. It should be classified, and interfaces should then be created for it; these classifications and interfaces are retained in a data catalog as part of making the data fit for purpose. The consequences of not wrangling data correctly for its intended use come through in the following quotation:
...