Exploratory Data Analysis with Python Cookbook
When we sort data, we arrange it in a specific sequence. This specific sequence typically helps us to spot patterns very quickly. To sort a dataset, we usually must specify one or more columns to sort by and specify the order to sort by (ascending or descending order).
In pandas, the sort_values method can be used to sort a dataset.
We will work with the Marketing Campaign data (https://www.kaggle.com/datasets/imakash3011/customer-personality-analysis) for this recipe. Alternatively, you can retrieve this from the GitHub repository.
We will sort data using the pandas library:
1. Import the pandas library:
import pandas as pd
2. Load the .csv file into a dataframe using read_csv. Then, subset the dataframe to include only relevant columns:
marketing_data = pd.read_csv("data/marketing_campaign.csv")
marketing_data = marketing_data[['ID','Year_Birth', 'Education','Marital_Status','Income','Kidhome', 'Teenhome', 'Dt_Customer', 'Recency','NumStorePurchases', 'NumWebVisitsMonth']]
3. Inspect the data. Check the first two rows and use transpose (T) to show more information. Also, check the data types as well as the number of columns and rows:
marketing_data.head(2).T
                            0           1
ID                       5524        2174
Year_Birth               1957        1954
Education          Graduation  Graduation
…                           …           …
NumWebVisitsMonth           7           5
marketing_data.dtypes
ID int64
Year_Birth int64
Education object
… …
NumWebVisitsMonth int64
marketing_data.shape
(2240, 11)
4. Sort customers based on the number of store purchases in descending order:
sorted_data = marketing_data.sort_values('NumStorePurchases', ascending=False)
sorted_data[['ID','NumStorePurchases']]
         ID  NumStorePurchases
1187   9855                 13
803    9930                 13
1144    819                 13
286   10983                 13
1150   1453                 13
...     ...                ...
164    8475                  0
2214   9303                  0
27     5255                  0
1042  10749                  0
2132  11181                  0
Great! We have sorted our dataset.
In step 1, we import pandas under the alias pd. In step 2, we use read_csv to load the .csv file into a pandas dataframe named marketing_data, and then subset the dataframe to keep only the 11 relevant columns. In step 3, we inspect the dataset using head(2) to see the first two rows; because the dataset has many columns, we chain transpose (T) onto head so that the columns are displayed as rows. We use the dtypes attribute of the dataframe to show the data type of each column. Numeric data has int and float data types, while character data has the object data type. We inspect the number of rows and columns using shape, which returns a tuple with the number of rows as the first element and the number of columns as the second.
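The inspection calls described above can be tried without downloading the dataset. The following is a minimal sketch on a hypothetical three-row dataframe (the name toy and all of its values are made up for illustration and are not taken from the Marketing Campaign data):

```python
import pandas as pd

# Hypothetical toy data; values are invented for illustration only
toy = pd.DataFrame({
    "ID": [1, 2, 3],
    "Education": ["Graduation", "Graduation", "PhD"],
    "Income": [100.0, 200.0, 300.0],
})

preview = toy.head(2).T      # first two rows, with columns displayed as rows
col_types = toy.dtypes       # one data type per column
n_rows, n_cols = toy.shape   # tuple of (number of rows, number of columns)

print(preview)
print(col_types)
print(n_rows, n_cols)
```

Transposing a two-row preview is mostly a display convenience: each original column becomes a row, so wide dataframes fit on the screen.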
In step 4, we apply the sort_values method to sort the dataframe by the NumStorePurchases column in descending order. We pass two arguments: the column to sort by and the sort order, which is set through the ascending parameter. ascending=False sorts in descending order, while ascending=True (the default) sorts in ascending order.
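To see the effect of the ascending parameter side by side, here is a small sketch on a hypothetical four-row dataframe (toy and its values are assumptions made for illustration). It also shows that sort_values returns a new dataframe and keeps the original index labels, which is why the index in the sorted output above appears out of order:

```python
import pandas as pd

# Hypothetical toy data; values are invented for illustration only
toy = pd.DataFrame({
    "ID": [101, 102, 103, 104],
    "NumStorePurchases": [4, 13, 0, 7],
})

# Largest values first
descending = toy.sort_values("NumStorePurchases", ascending=False)

# Smallest values first; ascending=True is the default, so it can be omitted
ascending = toy.sort_values("NumStorePurchases")

print(descending)
print(ascending)
print(toy)  # the original dataframe is left unchanged
```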
Sorting can be done across multiple columns in pandas. We can sort based on multiple columns by supplying the columns as a list to the sort_values method. The sort is performed in the order in which the columns are supplied – that is, by column 1 first, then by column 2 to break ties, and so on for subsequent columns. Also, a sort isn’t limited to numerical columns; it can also be applied to columns containing text.
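A multi-column sort might look like the sketch below. It uses a hypothetical five-row dataframe (toy and its values are made up for illustration) and sorts by a text column first, then by a numeric column. Passing a list to ascending, one entry per sort column, is an optional extra that lets each column have its own direction:

```python
import pandas as pd

# Hypothetical toy data; values are invented for illustration only
toy = pd.DataFrame({
    "ID": [101, 102, 103, 104, 105],
    "Education": ["PhD", "Graduation", "Master", "Graduation", "PhD"],
    "NumStorePurchases": [4, 13, 4, 7, 9],
})

# Sort by Education alphabetically, then by NumStorePurchases
# from highest to lowest within each Education group
multi_sorted = toy.sort_values(
    ["Education", "NumStorePurchases"],
    ascending=[True, False],
)

print(multi_sorted)
```

Here the second column only decides the order among rows that share the same Education value.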