Handling binary metadata files
Ingesting the hotdog-nothotdog dataset was pretty straightforward since you only needed to parse each file’s path to get its label. However, many scientific and open source datasets encode their labels in binary files. This approach reduces the data size and speeds up file parsing. You’re probably wondering whether we can still use them. Of course!
The LabelMe-12 dataset from the technical requirements section is one such example. It includes the label information in the annotation.bin (binary) and annotation.txt (human-readable) files under the ./data/test and ./data/train folders. Let’s focus on the binary file and only use the human-readable copy for troubleshooting.
We will perform the following steps to do this:
- Declare the custom Label enumeration.
- Declare the custom Annotation class.
- Read each image’s labels from the file.
- Confirm the expected counts are present.
- Normalize the file structure.
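Before working with the real files, it may help to see the first four steps in miniature. The following is a minimal sketch in Python, not the dataset's documented format. It assumes each record holds one little-endian 32-bit float score per class. It also uses three placeholder class names rather than the dataset's own classes. The read_annotations and demo helpers, the Annotation fields, and the record layout are all hypothetical, so check them against annotation.txt before relying on them. The sketch writes a tiny synthetic annotation.bin so that it runs without the dataset.

```python
import struct
import tempfile
from collections import Counter
from dataclasses import dataclass
from enum import IntEnum
from pathlib import Path


class Label(IntEnum):
    # Placeholder class names (assumption); the real dataset defines its own.
    PERSON = 0
    CAR = 1
    TREE = 2


@dataclass(frozen=True)
class Annotation:
    index: int    # position of the image's record in the file
    label: Label  # class with the highest score in that record


# Assumed record layout: one little-endian float32 score per Label member.
RECORD = struct.Struct(f"<{len(Label)}f")


def read_annotations(path: Path) -> list[Annotation]:
    """Read every record and keep the highest-scoring class for each image."""
    data = path.read_bytes()
    if len(data) % RECORD.size != 0:
        raise ValueError("file size is not a whole number of records")
    annotations = []
    for index, scores in enumerate(RECORD.iter_unpack(data)):
        best = max(range(len(scores)), key=scores.__getitem__)
        annotations.append(Annotation(index, Label(best)))
    return annotations


def demo() -> Counter:
    """Write a tiny synthetic file, read it back, and count labels."""
    rows = [(1.0, -1.0, -1.0), (-1.0, -1.0, 1.0), (-1.0, -1.0, 1.0)]
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "annotation.bin"
        path.write_bytes(b"".join(RECORD.pack(*row) for row in rows))
        annotations = read_annotations(path)
    return Counter(a.label for a in annotations)


if __name__ == "__main__":
    print(demo())
```

Counting labels with Counter gives a quick way to confirm the expected counts. Comparing a few records against the human-readable annotation.txt is a reasonable cross-check that the assumed layout matches the real file.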