As an example of unstructured data, we have pulled some sample server logs from a public source and included them in a text document. We can take a glimpse of what this unstructured data looks like, so we can recognize it in the future:
# Import our data manipulation tool, pandas
import pandas as pd

# Create a pandas DataFrame from some unstructured server logs
logs = pd.read_table('../data/server_logs.txt', header=None, names=['Info'])

# header=None specifies that the first line of data is the first data point, not a column name
# names=['Info'] sets the column name in our DataFrame for easier access
We created a pandas DataFrame called logs that holds our server logs. To take a look, let's call the .head() method, which returns the first few rows:
# Look at the first 5 rows
logs.head()
This will show us a table of the first 5 rows in our logs DataFrame as follows:
|   | Info |
|---|------|
| 0 | 64.242.88.10 - - [07/Mar/2004:16:05:49 -0800] ... |
| 1 | 64.242.88.10 - - ... |
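Each row in this table is still a single opaque string, which is what makes the data unstructured. As a rough sketch of what giving it structure could involve, the plain-Python snippet below pulls an IP address and a timestamp out of the visible part of row 0. The names `LOG_PREFIX` and `parse_prefix` and the pattern itself are illustrative assumptions, not part of the original example, and the pattern covers only the prefix shown above because the rest of the line is elided.

```python
import re

# The visible prefix of row 0 from the table above; the remainder of the
# line is elided in the text, so it is deliberately left out here.
raw_line = "64.242.88.10 - - [07/Mar/2004:16:05:49 -0800]"

# Hypothetical pattern (an assumption for illustration): a client address,
# two placeholder fields, then a timestamp enclosed in square brackets.
LOG_PREFIX = re.compile(r"^(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\]")


def parse_prefix(line):
    """Return a dict with 'ip' and 'timestamp', or None if the line does not match."""
    match = LOG_PREFIX.match(line)
    return match.groupdict() if match else None


parsed = parse_prefix(raw_line)
print(parsed)
```

Returning `None` for lines that do not match keeps malformed rows easy to spot instead of silently producing partial records.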