Data processing and feature engineering
In this section, let's use the telecom customer churn dataset to generate the features that will be used for training the model. Let's create a notebook, call it feature-engineering.ipynb, and install the required dependencies:
!pip install pandas scikit-learn python-slugify s3fs sagemaker
Once the installation of the libraries is complete, read the data. For this exercise, I have downloaded the data from Kaggle and saved it in a location where it is accessible from the notebook.
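As a minimal sketch of the read step, assuming the churn file is a CSV: pandas.read_csv accepts a local path, and with s3fs installed the same call also accepts an s3:// URL. The file name, the three columns, and the bucket placeholder below are hypothetical stand-ins for illustration, not the actual Kaggle dataset or its location.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in rows; the real Kaggle file has its own columns.
sample_csv = "customerID,tenure,Churn\n0001,12,No\n0002,3,Yes\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    # Write the sample to a temporary local file (assumed file name).
    local_path = os.path.join(tmp_dir, "churn_sample.csv")
    with open(local_path, "w") as f:
        f.write(sample_csv)

    # Read the ID column as a string so leading zeros are preserved.
    churn_df = pd.read_csv(local_path, dtype={"customerID": str})

# With s3fs installed, the same call works on an S3 URL, for example:
# churn_df = pd.read_csv("s3://<your-bucket>/<your-key>.csv")
print(churn_df.shape)
```

Reading a small local sample first is a cheap way to confirm the column types before pointing the same call at the full file in S3.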
The following code imports the required libraries and reads the data from S3:
import os
import numpy as np
import pandas as pd
from slugify import slugify
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import StandardScaler
""" If you are executing the notebook outside AWS (local JupyterLab, Google Colab, Kaggle, etc.), uncomment the following three lines of code and set the AWS credentials. """
#os.environ...