Classifying the data type
First, we will explore how the architect can classify data. Data falls into three types:
- Structured data
- Semi-structured data
- Unstructured data
We will also examine various file types associated with each type of data, as different file formats have their own characteristics, benefits, and drawbacks. For each data type, a solid understanding of these file types and their features can help to optimize storage costs, retrieval speeds, and scalability.
Note that there can be some ambiguity about which file format falls under which data type. In particular, file formats such as CSV and Avro are classified as structured by some and as semi-structured by others, depending on the definition used. However, this exact classification matters little to the data architect. What matters is knowing which file type is optimal in which scenario.
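To make the three categories more concrete, the following minimal sketch handles similar information in each form using only the Python standard library. The sample records, field names, and variable names are hypothetical and invented purely for illustration; they are one possible way of picturing the distinction, not a formal definition.

```python
import csv
import io
import json

# Hypothetical sample data, invented for illustration only.
csv_text = "id,name,city\n1,Ada,London\n2,Grace,New York\n"
json_text = (
    '{"id": 1, "name": "Ada", '
    '"contact": {"email": "ada@example.com"}, "tags": ["vip"]}'
)
free_text = "Ada called on Monday to ask about her latest invoice."

# Structured: every row shares the same named columns.
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: self-describing keys, with nested and optional fields.
doc = json.loads(json_text)

# Unstructured: there are no fields to address, only the raw text.
words = free_text.split()

print(rows[0]["city"])          # column lookup by name
print(doc["contact"]["email"])  # lookup along a nested path
print(len(words))               # only generic text operations apply
```

Note that the csv module returns every value as a string, because a CSV file carries no type information of its own. This may be one reason why CSV is sometimes placed in the semi-structured category rather than the structured one.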
Structured data
Structured data...