Let's talk about pandas, one of the most exciting Python libraries, especially for people who love R and want to work with data in a more vectorized manner. We devote this part of the chapter entirely to pandas and cover some basic data manipulation and handling with pandas DataFrames.
Let's start with one of the most important tasks in any data analysis: parsing data from a CSV or other file.
Tip
I am using https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data
https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.names
Feel free to use any other CSV file.
To begin, please download the data to your local storage from the preceding links and load it into a pandas DataFrame, as shown here:
>>> import pandas as pd
>>> # Please provide the absolute path of the input file
>>> data = pd.read_csv("PATH\\iris.data.txt", header=0)
>>> data.head()
| | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
|---|---|---|---|---|---|
| 0 | 4.7 ... |
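Notice that the column labels in this output are data values rather than names. With `header=0`, pandas treats the first line of the file as the header, so a file without a header row loses its first record to the column labels. The following is a minimal sketch of one way to avoid this, assuming the file has no header row. The three-line in-memory sample and the column names are assumptions made for illustration; they are not part of the original example.

```python
import io

import pandas as pd

# Hypothetical three-row sample in the same comma-separated, headerless layout
raw = (
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "4.7,3.2,1.3,0.2,Iris-setosa\n"
)

# header=0: the first line becomes the column labels, leaving two data rows
with_header = pd.read_csv(io.StringIO(raw), header=0)

# Assumed column names, chosen here for readability
columns = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]

# header=None: no line is treated as a header; names= supplies the labels
data = pd.read_csv(io.StringIO(raw), header=None, names=columns)

print(with_header.shape)  # two rows remain after the first is used as the header
print(data.head())
```

When reading from disk, the same `header=None` and `names=` arguments can be passed along with the file path in place of the `io.StringIO` object.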