Practical Exercises: Chapter 9
Exercise 9.1: Data Cleaning
You have a dataset with missing values and outliers.
import pandas as pd
import numpy as np
# Create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', None, 'Eve'],
        'Age': [25, np.nan, 35, 40, 50],
        'Salary': [50000, 70000, 120000, 110000, 90000],
        'Experience': [2, 10, np.nan, 7, 15]}
df = pd.DataFrame(data)
- Remove rows where Name is missing.
- Fill missing values in the Age and Experience columns with their respective means.
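Before starting the tasks, it can help to confirm which columns actually contain gaps. The following is a minimal sketch that uses `DataFrame.isna` to count missing values per column. It rebuilds the same sample data so it runs on its own. Note that the `None` in the object-typed Name column counts as missing, just like `np.nan` in the numeric columns.

```python
import numpy as np
import pandas as pd

# Rebuild the sample DataFrame from the exercise
data = {'Name': ['Alice', 'Bob', 'Charlie', None, 'Eve'],
        'Age': [25, np.nan, 35, 40, 50],
        'Salary': [50000, 70000, 120000, 110000, 90000],
        'Experience': [2, 10, np.nan, 7, 15]}
df = pd.DataFrame(data)

# isna() marks both None and NaN as True; summing counts them per column
missing_counts = df.isna().sum()
print(missing_counts)  # Name: 1, Age: 1, Salary: 0, Experience: 1
```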
Solution
# Remove rows where Name is missing
df.dropna(subset=['Name'], inplace=True)
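# Fill missing values with each column's mean (computed after the row removal).
# Assign the result back: fillna(..., inplace=True) on df['Age'] is chained
# assignment and may not update df under pandas Copy-on-Write.
df['Age'] = df['Age'].fillna(df['Age'].mean())
df['Experience'] = df['Experience'].fillna(df['Experience'].mean())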
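The same two steps can also be written without `inplace`, chaining new DataFrames instead. This sketch is an alternative style, not the solution's required form. It passes a dict to `DataFrame.fillna` that maps each column name to its fill value, then checks that no gaps remain. Because the means are computed after the Name-less row is dropped, that row's values (Age 40, Experience 7) do not influence the fill: the Age mean becomes 110/3 and the Experience mean becomes 9.0.

```python
import numpy as np
import pandas as pd

# Same sample data as the exercise
data = {'Name': ['Alice', 'Bob', 'Charlie', None, 'Eve'],
        'Age': [25, np.nan, 35, 40, 50],
        'Salary': [50000, 70000, 120000, 110000, 90000],
        'Experience': [2, 10, np.nan, 7, 15]}
df = pd.DataFrame(data)

# Task 1: drop the row whose Name is missing (returns a new DataFrame)
df = df.dropna(subset=['Name'])

# Task 2: fill gaps with the column means of the remaining rows;
# the dict maps column names to the value used for that column
fill_values = {'Age': df['Age'].mean(), 'Experience': df['Experience'].mean()}
df = df.fillna(fill_values)

# Sanity check: every column should now report zero missing values
print(df.isna().sum())
print(df)
```

The original index labels (0, 1, 2, 4) are kept after `dropna`; call `reset_index(drop=True)` if consecutive labels are needed for later exercises.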
Exercise 9.2: Feature Engineering
Create a new feature called AgeGroup in the above DataFrame...