Data Science for Marketing Analytics

By: Tommy Blanchard, Debasish Behera, Pranshu Bhatnagar

Overview of this book

Data Science for Marketing Analytics covers every stage of data analytics, from working with a raw dataset to segmenting a population and modeling different parts of the population based on the segments. The book starts by teaching you how to use Python libraries, such as pandas and Matplotlib, to read data into Python, manipulate it, and create plots, using both categorical and continuous variables. Then, you'll learn how to segment a population into groups and use different clustering techniques to evaluate customer segmentation. As you make your way through the chapters, you'll explore ways to evaluate and select the best segmentation approach, and go on to create a linear regression model on customer value data to predict lifetime value. In the concluding chapters, you'll gain an understanding of regression techniques and tools for evaluating regression models, and explore ways to predict customer choice using classification algorithms. Finally, you'll apply these techniques to create a churn model for modeling customer product choices. By the end of this book, you will be able to build your own marketing reporting and interactive dashboard solutions.

Using Regularization for Feature Selection


In the previous section, we saw how an evaluation metric such as the RMSE can be used to decide whether a variable should be included in a model or not. However, this method can be cumbersome when there are many variables involved.
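To make the cost of this manual approach concrete, here is a minimal sketch of comparing the RMSE of a linear regression with and without each variable. The data is synthetic and the helper `holdout_rmse` is a hypothetical name introduced only for this illustration; neither comes from the book's exercises.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration): columns 0 and 1 drive
# the outcome, column 2 is unrelated to it.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 3))
y = 4.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=1.0, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

def holdout_rmse(columns):
    """Fit on the chosen columns and return the RMSE on the held-out data."""
    model = LinearRegression().fit(X_train[:, columns], y_train)
    preds = model.predict(X_test[:, columns])
    return np.sqrt(mean_squared_error(y_test, preds))

rmse_all = holdout_rmse([0, 1, 2])
rmse_without_1 = holdout_rmse([0, 2])  # drop a variable that matters
rmse_without_2 = holdout_rmse([0, 1])  # drop the unrelated variable

print("All variables:     ", round(rmse_all, 3))
print("Without variable 1:", round(rmse_without_1, 3))
print("Without variable 2:", round(rmse_without_2, 3))
```

Each candidate variable needs its own fit and comparison, so the number of models to train grows quickly as variables are added.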

When a model contains extraneous variables (variables that are not related to the outcome of interest), the model can become more difficult to interpret. Extraneous variables can also lead to overfitting, where the model fits noise in the training data and so may change drastically if you train it on a different subset of the data. Therefore, it is important to select only those features that are related to the outcome for training the model.

One common way to select which features will be used by a model is to use regularization. The idea of regularization is that the model is asked not only to predict the training points as accurately as possible, but also to satisfy the additional constraint of minimizing the weight that it puts on each of the variables...
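
As a rough illustration of this idea, the sketch below fits an ordinary linear regression and a regularized one to the same data and compares the weights. It assumes scikit-learn's `Lasso`, which adds a penalty proportional to the sum of the absolute values of the coefficients; this is one common form of regularization, and the text above does not say which form is intended. The synthetic data and the value `alpha=0.2` are arbitrary choices made for this example.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

# Synthetic data (an assumption for illustration): five candidate features,
# of which only the first two actually influence the outcome.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

# Ordinary least squares: only tries to predict the training points well.
ols = LinearRegression().fit(X, y)

# Lasso: prediction error plus alpha * sum(|coefficients|), so the model is
# also pushed to keep the weight on each variable small.
lasso = Lasso(alpha=0.2).fit(X, y)

print("OLS coefficients:  ", np.round(ols.coef_, 3))
print("Lasso coefficients:", np.round(lasso.coef_, 3))

# Features that keep a non-zero weight under the penalty are the ones
# the regularized model has effectively selected.
selected = [i for i, c in enumerate(lasso.coef_) if c != 0]
print("Selected feature indices:", selected)
```

Larger values of `alpha` shrink the weights more aggressively and tend to drop more features, so in practice the value is usually tuned rather than fixed in advance.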