In the previous chapters of the case study, we've been skirting an issue that arises frequently when working with complex data. Files have both a logical layout and a physical format. We've been laboring under a tacit assumption that our files are in CSV format, with a layout defined by the first line of the file. In Chapter 2, we touched on file loading. In Chapter 6, we revisited loading data and partitioning it into training and testing sets.
In both of those chapters, we trusted that the data would be in CSV format. This isn't a great assumption to make. We need to look at the alternatives and elevate our assumptions into a design choice. We also need to build in the flexibility to make changes as the context for using our application evolves.
It's common to map complex objects to dictionaries, which have a tidy JSON representation. For this reason, the Classifier web application makes use of dictionaries. We can also parse CSV data...
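To make the connection between CSV rows, dictionaries, and JSON concrete, here's a minimal sketch using only the standard library. The column names, the sample values, and the `rows_from_csv()` helper are assumptions made for illustration; they aren't taken from the case study's code.

```python
import csv
import io
import json

# Hypothetical sample data; the column names and values are assumptions
# chosen for illustration, with the layout defined by the first line.
CSV_TEXT = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
"""


def rows_from_csv(source):
    """Return each data row as a dictionary keyed by the header line."""
    return list(csv.DictReader(source))


# io.StringIO stands in for an open file so the example is self-contained.
rows = rows_from_csv(io.StringIO(CSV_TEXT))

# A list of dictionaries serializes directly to JSON and back again.
as_json = json.dumps(rows)
round_trip = json.loads(as_json)
```

Note that `csv.DictReader` leaves every value as a string, so a row's measurements would still need to be converted to numbers before use. The point of the sketch is only that the same dictionary shape can sit behind either physical format.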