Structured versus unstructured data
The first question we want to ask ourselves about an entire dataset is whether we are working with structured or unstructured data. The answer to this question can mean the difference between needing three days and needing three weeks to perform a proper analysis.
The basic breakdown is as follows (this is a rehashed definition of organized and unorganized data from Chapter 1):
- Structured (that is, organized) data: This is data that can be thought of as observations and characteristics. It is usually laid out as a table (rows and columns), as in a spreadsheet or a relational database.
- Unstructured (that is, unorganized) data: This data exists as a free entity and does not follow any standard organizational hierarchy. Images, text, and videos are typical examples.
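To make the distinction concrete, here is a minimal Python sketch that holds the same facts in both forms. The records, the sentence, and the variable names are all made up for illustration, and the regular expression is only a crude stand-in for the extra work unstructured data usually demands.

```python
import re

# Structured: each dict is an observation (row), each key a characteristic (column).
# These records are invented purely for illustration.
structured = [
    {"name": "Ana", "age": 34, "city": "Lisbon"},
    {"name": "Raj", "age": 27, "city": "Pune"},
]

# Unstructured: the same facts as free text, with no fixed layout.
unstructured = "Ana, who is 34, lives in Lisbon. Raj is 27 and is based in Pune."

# With structured data, pulling out a characteristic is a direct lookup.
ages_structured = [row["age"] for row in structured]

# With unstructured data, we have to impose structure ourselves first.
# This naive pattern grabs any standalone number of one to three digits,
# so it would break as soon as the text mentioned any other number.
ages_unstructured = [int(m) for m in re.findall(r"\b\d{1,3}\b", unstructured)]

print(ages_structured)
print(ages_unstructured)
```

Both lists end up holding the same ages, but only the structured version gets there without guessing at how the text is written. That guessing is where the extra days of analysis tend to go.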
Here are a few examples that could help you differentiate between the two:
- Most data that exists in text form, including server logs and Facebook...