Designing folder structures for data transformation
Considering the variety and volume of data that will land in a data lake, it is very important to design a flexible, but a maintainable, folder structure. Badly designed or ad hoc folder structures will become a management nightmare and will render the data lake unusable. Some points to keep in mind while designing the folder structure are given here:
- Human readability—Human-readable folder structures will help improve data exploration and navigation.
- Representation of organizational structure—Aligning the folder structure according to the organizational structure helps segregate the data for billing and access control. Such a folder structure will help restrict cross-team data access.
- Distinguish sensitive data—The folder structure should be such that it can separate sensitive data from general data. Sensitive data will require higher levels of audit, privacy, and security policies, so keeping...