Storage versus runtime in-memory versus message-passing formats
When we talk about formats for representing data, there are a few different, sometimes competing properties that we are typically trying to optimize. We can (over)simplify this by talking about three main components, as follows:
- Size—The final size of the data representation
- Serialize/deserialize speed—The performance of converting data between the format and a representation that can be used in memory for computations
- Ease of use—A catch-all category regarding readability, compatibility, features, and so on
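To make the first two components a little more concrete, here is a minimal sketch that measures the encoded size and the serialize/deserialize time of the same list of numbers in three representations, using only the Python standard library. The helper name `measure`, the sample data, and the choice of JSON, pickle, and raw 64-bit floats are assumptions made purely for illustration; timings will vary from machine to machine, so treat the output as a rough comparison rather than a benchmark.

```python
import json
import pickle
import time
from array import array


def measure(name, serialize, deserialize, data):
    """Time one round trip and record the size of the serialized bytes."""
    start = time.perf_counter()
    blob = serialize(data)
    middle = time.perf_counter()
    restored = deserialize(blob)
    end = time.perf_counter()
    return {
        "format": name,
        "size_bytes": len(blob),
        "serialize_s": middle - start,
        "deserialize_s": end - middle,
        "round_trip_ok": restored == data,
    }


def to_raw(data):
    # Pack the values as contiguous 8-byte floats with no metadata at all.
    return array("d", data).tobytes()


def from_raw(blob):
    # Rebuild the list; the reader must already know the values are float64.
    values = array("d")
    values.frombytes(blob)
    return values.tolist()


# Hypothetical sample data: 100,000 floating-point values.
values = [float(i) for i in range(100_000)]

results = [
    measure(
        "json",
        lambda d: json.dumps(d).encode("utf-8"),
        lambda b: json.loads(b.decode("utf-8")),
        values,
    ),
    measure("pickle", pickle.dumps, pickle.loads, values),
    measure("raw float64", to_raw, from_raw, values),
]

for row in results:
    print(
        f"{row['format']:<12} {row['size_bytes']:>9} bytes  "
        f"ser {row['serialize_s']:.4f}s  de {row['deserialize_s']:.4f}s"
    )
```

The third component, ease of use, does not show up in the numbers: the JSON output is human-readable and understood by nearly every language, while the raw float64 bytes are compact but meaningless unless the reader already knows the type and layout.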
How we choose to trade off these components usually depends heavily on the use case for that format. When it comes to working with data, there are three high-level use cases I tend to group most situations into: long-term storage, in-memory runtime processing, and message passing. Yes—these groupings are quite...