Introduction to the problem and the dataset
In this exercise, we will use the telecom customer churn dataset, which is available on Kaggle at https://www.kaggle.com/datasets/blastchar/telco-customer-churn. The aim of the exercise is to prepare this data for model training and then train an XGBoost model that predicts whether a customer will churn. The dataset has 21 columns, and the column names are self-explanatory. The following is a preview of the dataset:
Figure 8.1 shows the labeled telecom customer churn dataset. The customerID column holds a unique identifier for each customer. All other columns except Churn represent the set of customer attributes, and the Churn column is the target column.
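The column roles described above can be sketched in Python with pandas. This is a minimal sketch, not the book's own code. It assumes the CSV has already been downloaded from the Kaggle page and that the Churn column holds the strings Yes and No. The helper names `load_churn_data` and `split_features_target` are hypothetical, and the file name in the usage note below is an assumption about the Kaggle download.

```python
import pandas as pd


def load_churn_data(path: str) -> pd.DataFrame:
    """Read the churn CSV and check that it has the expected 21 columns."""
    df = pd.read_csv(path)
    # The dataset is described as having 21 columns; fail early if not.
    if df.shape[1] != 21:
        raise ValueError(f"expected 21 columns, found {df.shape[1]}")
    return df


def split_features_target(df: pd.DataFrame):
    """Drop the ID column and separate the attributes from the Churn target.

    Assumes (hypothetically) that Churn contains only "Yes"/"No" values.
    """
    # customerID only identifies a row, so it is not used as a feature.
    X = df.drop(columns=["customerID", "Churn"])
    # Encode the Yes/No target as 1/0 for a binary classifier.
    y = df["Churn"].map({"Yes": 1, "No": 0})
    # Any value other than Yes/No maps to NaN; surface it instead of hiding it.
    if y.isna().any():
        raise ValueError("Churn contains values other than 'Yes'/'No'")
    return X, y
```

For example, `X, y = split_features_target(load_churn_data("WA_Fn-UseC_-Telco-Customer-Churn.csv"))` would return the attribute table and a 0/1 target that can be passed on to feature engineering. Encoding the target as integers at this stage keeps it compatible with XGBoost's binary classification objective.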
Let's get our hands dirty and perform feature engineering next.