Data Science for Marketing Analytics - Second Edition

By: Mirza Rahim Baig, Gururajan Govindan, Vishwesh Ravi Shrimali

Preface

About the Book

Unleash the power of data to reach your marketing goals with this practical guide to data science for business.

This book will help you get started on your journey to becoming a master of marketing analytics with Python. You'll work with relevant datasets and build your practical skills by tackling engaging exercises and activities that simulate real-world market analysis projects.

You'll learn to think like a data scientist, build your problem-solving skills, and discover how to look at data in new ways to deliver business insights and make intelligent data-driven decisions.

As well as learning how to clean, explore, and visualize data, you'll implement machine learning algorithms and build models to make predictions. As you work through the book, you'll use Python tools to analyze sales, visualize advertising data, predict revenue, address customer churn, and implement customer segmentation to understand behavior.

This second edition has been updated to include new case studies that bring a more application-oriented approach to your marketing analytics journey. The code has also been updated to support the latest versions of Python and the popular data science libraries that have been used in the book. The practical exercises and activities have been revamped to prepare you for the real-world problems that marketing analysts need to solve. This will show you how to create a measurable impact on businesses large and small.

By the end of this book, you'll have the knowledge, skills, and confidence to implement data science and machine learning techniques to better understand your marketing data and improve your decision-making.

About the Authors

Mirza Rahim Baig is an avid problem solver who uses deep learning and artificial intelligence to solve complex business problems. He has more than a decade of experience in creating value from data, harnessing the power of the latest in machine learning and AI with proficiency in using unstructured and structured data across areas like marketing, customer experience, catalog, supply chain, and other e-commerce sub-domains. Rahim is also a teacher, designing, creating, and teaching data science for various learning platforms. He loves making the complex easy to understand. He is also an author of The Deep Learning Workshop, a hands-on guide to start your deep learning journey and build your own next-generation deep learning models.

Gururajan Govindan is a data scientist, intrapreneur, and trainer with more than seven years of experience working across domains such as finance and insurance. He is also an author of The Data Analysis Workshop, a book focusing on data analytics. He is well known for his expertise in data-driven decision making and machine learning with Python.

Vishwesh Ravi Shrimali graduated from BITS Pilani, where he studied mechanical engineering. He has a keen interest in programming and AI and has applied that interest in mechanical engineering projects. He has also written multiple blogs on OpenCV, deep learning, and computer vision. When he is not writing blogs or working on projects, he likes to go on long walks or play his acoustic guitar. He is also an author of The Computer Vision Workshop, a book focusing on OpenCV and its applications in real-world scenarios, as well as Machine Learning for OpenCV (2nd Edition), which introduces how to use OpenCV for machine learning applications.

Who This Book Is For

This marketing book is for anyone who wants to learn how to use Python for cutting-edge marketing analytics. Whether you're a developer who wants to move into marketing, or a marketing analyst who wants to learn more sophisticated tools and techniques, this book will get you on the right path. Basic prior knowledge of Python is required to work through the exercises and activities provided in this book.

About the Chapters

Chapter 1, Data Preparation and Cleaning, teaches you skills related to data cleaning along with various data preprocessing techniques using real-world examples.

Chapter 2, Data Exploration and Visualization, teaches you how to explore and analyze data with the help of various aggregation techniques and visualizations using Matplotlib and Seaborn.

Chapter 3, Unsupervised Learning and Customer Segmentation, teaches you customer segmentation, one of the most important skills for a data science professional in marketing. You will learn how to use machine learning to perform customer segmentation with the help of scikit-learn. You will also learn to evaluate segments from a business perspective.

Chapter 4, Evaluating and Choosing the Best Segmentation Approach, expands your repertoire to various advanced clustering techniques and teaches principled numerical methods of evaluating clustering performance.

Chapter 5, Predicting Customer Revenue using Linear Regression, gets you started on predictive modeling of quantities by introducing you to regression and teaching simple linear regression in a hands-on manner using scikit-learn.

Chapter 6, More Tools and Techniques for Evaluating Regression Models, goes into more details of regression techniques, along with different regularization methods available to prevent overfitting. You will also discover the various evaluation metrics available to identify model performance.

Chapter 7, Supervised Learning: Predicting Customer Churn, uses a churn prediction problem as the central problem statement throughout the chapter to cover different classification algorithms and their implementation using scikit-learn.

Chapter 8, Fine-Tuning Classification Algorithms, introduces support vector machines and tree-based classifiers along with the evaluation metrics for classification algorithms. You will also learn about the process of hyperparameter tuning which will help you obtain better results using these algorithms.

Chapter 9, Multiclass Classification Algorithms, introduces a multiclass classification problem statement and the classifiers that can be used to solve such problems. You will learn about imbalanced datasets and their treatment in detail. You will also discover the micro- and macro-evaluation metrics available in scikit-learn for these classifiers.

Conventions

Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, and user input are shown as follows:

"df.head(n) will return the first n rows of the DataFrame. If no n is passed, the function considers n to be 5 by default."

Words that you see on the screen, for example, in menus or dialog boxes, also appear in the same format.

A block of code is set as follows:

sales = pd.read_csv("sales.csv")
sales.head()

New important words are shown like this: "a box plot is used to depict the distribution of numerical data and is primarily used for comparisons".

Key parts of code snippets are emboldened as follows:

df1 = pd.read_csv("timeSpent.csv")

Code Presentation

Lines of code that span multiple lines are split using a backslash (\). When the code is executed, Python will ignore the backslash, and treat the code on the next line as a direct continuation of the current line.

For example,

df = pd.DataFrame({'Currency': pd.Series(['USD','EUR','GBP']),\
                   'ValueInINR': pd.Series([70, 89, 99])})
df = pd.DataFrame.from_dict({'Currency': ['USD','EUR','GBP'],\
                             'ValueInINR':[70, 89, 99]})
df.head()
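
As a side note to the convention above, the following minimal sketch suggests that the backslash is only strictly needed when a line break falls outside brackets, braces, or parentheses; inside them, Python continues the line on its own. Plain dictionaries stand in for the DataFrame example so that the sketch runs without pandas, and the variable names are illustrative only.

```python
# Explicit continuation: the backslash joins the two physical lines.
total_explicit = 70 + 89 + \
                 99

# Implicit continuation: inside (), [] or {} no backslash is needed.
rates_implicit = {
    'Currency': ['USD', 'EUR', 'GBP'],
    'ValueInINR': [70, 89, 99],
}

# The style used in this book: a backslash inside braces is harmless.
rates_backslash = {'Currency': ['USD', 'EUR', 'GBP'],\
                   'ValueInINR': [70, 89, 99]}

print(total_explicit)                      # 258
print(rates_implicit == rates_backslash)   # True
```

Both dictionary definitions produce the same object, so the choice between them is a matter of style rather than behavior.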

Comments are added into code to help explain specific bits of logic. Single-line comments are denoted using the # symbol, as follows:

# Importing the matplotlib library
import matplotlib.pyplot as plt
# Declaring the color of the plot as gray
plt.bar(sales['Product line'], sales['Revenue'], color='gray')

Multi-line comments are used as follows:

"""
Importing classification report and precision, recall, F-score, and
support from sklearn metrics
"""
from sklearn.metrics import classification_report
from sklearn.metrics import precision_recall_fscore_support
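
Strictly speaking, Python has no dedicated multi-line comment syntax: a triple-quoted block is a string literal that is evaluated and then discarded, whereas a # comment never reaches the parser's output. The short sketch below, which uses only the standard ast module and an illustrative source string, shows the difference.

```python
import ast

# Illustrative source text containing both comment styles.
source = '''
# A single-line comment is discarded by the parser
"""
A triple-quoted block is really a string expression
"""
import math
'''

tree = ast.parse(source)

# Collect the type name of every top-level statement the parser kept.
node_types = [type(node).__name__ for node in tree.body]
print(node_types)  # ['Expr', 'Import']
```

The # line leaves no trace, while the triple-quoted block appears as an expression statement. In practice this makes no difference to how the code in this book runs.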

Minimum Hardware Requirements

For an optimal experience, we recommend the following hardware configuration:

  • Processor: Dual Core or better
  • Memory: 4 GB RAM
  • Storage: 10 GB available space
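
If you are unsure whether your machine meets these figures, a rough check can be sketched with the Python standard library. The function name and thresholds below are illustrative; the sketch counts logical CPUs rather than physical cores, and it does not check RAM because the standard library has no portable call for that.

```python
import os
import shutil

MIN_CORES = 2        # "Dual Core or better"
MIN_FREE_GB = 10     # "10 GB available space"

def check_hardware(path="."):
    """Return a dict describing whether this machine meets the stated minimums."""
    cores = os.cpu_count() or 0   # cpu_count() may return None
    free_gb = shutil.disk_usage(path).free / 1024**3
    return {
        "cores": cores,
        "cores_ok": cores >= MIN_CORES,
        "free_gb": round(free_gb, 1),
        "disk_ok": free_gb >= MIN_FREE_GB,
    }

report = check_hardware()
print(report)
```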

Downloading the Code Bundle

Download the code files from GitHub at https://packt.link/59F3X. Refer to these code files for the complete code bundle. The files here contain the exercises, activities, and some intermediate code for each chapter. This can be a useful reference when you become stuck.

On the GitHub repo's page, you can click the green Code button and then click the Download ZIP option to download the complete code as a ZIP file to your disk (refer to Figure 0.1). You can then extract these code files to a folder of your choice, for example, C:\Code.

Figure 0.1: Download ZIP option on GitHub

On your system, the extracted ZIP file should contain all the files present in the GitHub repository:

Figure 0.2: GitHub code directory structure (Windows Explorer)
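
If you prefer to extract the archive from Python rather than from your file explorer, the step can be sketched with the standard zipfile module. The function name, the archive name, and the file names inside the demo archive are hypothetical stand-ins; substitute the path of the ZIP file you actually downloaded.

```python
import tempfile
import zipfile
from pathlib import Path

def extract_bundle(zip_path, target_dir):
    """Extract a ZIP archive and return the sorted relative names of its files."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return sorted(p.relative_to(target).as_posix()
                  for p in target.rglob("*") if p.is_file())

# Demo with a tiny stand-in archive (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = Path(tmp) / "code-bundle.zip"
    with zipfile.ZipFile(demo_zip, "w") as archive:
        archive.writestr("Chapter01/Exercise1.01.ipynb", "{}")
        archive.writestr("environment.yml", "name: ds-marketing\n")
    extracted = extract_bundle(demo_zip, Path(tmp) / "Code")
    print(extracted)  # ['Chapter01/Exercise1.01.ipynb', 'environment.yml']
```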

Setting Up Your Environment

Before you explore the book in detail, you need to set up specific software and tools. In the following section, you shall see how to do that.

Installing Anaconda on Your System

The code for all the exercises and activities in this book can be executed using Jupyter Notebooks. You'll first need to install the Anaconda Navigator, which is an interface through which you can access your Jupyter Notebooks. Anaconda Navigator will be installed as a part of Anaconda Individual Edition, which is an open-source Python distribution platform available for Windows, macOS, and Linux. Installing Anaconda will also install Python. Head to https://www.anaconda.com/distribution/.

  1. From the page that opens, click the Download button (annotated by 1). Make sure you are downloading the Individual Edition.

    Figure 0.3: Anaconda homepage

  2. The installer should start downloading immediately. The website will, by default, choose an installer based on your system configuration. If you prefer downloading Anaconda for a different operating system (Windows, macOS, or Linux) and system configuration (32- or 64-bit), click the Get Additional Installers link at the bottom of the box (refer to Figure 0.3). The page should scroll down to a section (refer to Figure 0.4) that lets you choose from various options based on the operating system and configuration you desire. For this book, it is recommended that you use the latest version of Python (3.8 or higher).

    Figure 0.4: Downloading Anaconda Installers based on the OS

  3. Follow the installation steps presented on the screen.

    Figure 0.5: Anaconda setup

  4. On Windows, if you've never installed Python on your system before, you can select the checkbox that prompts you to add Anaconda to your PATH. This will let you run Anaconda-specific commands (like conda) from the default command prompt. If you have Python installed or had installed an earlier version of Anaconda in the past, it is recommended that you leave it unchecked (you may run Anaconda commands from the Anaconda Prompt application instead). The installation may take a while depending on your system configuration.

    Figure 0.6: Anaconda installation steps

    For more detailed instructions, you may refer to the official documentation for Linux by clicking this link (https://docs.anaconda.com/anaconda/install/linux/), macOS using this link (https://docs.anaconda.com/anaconda/install/mac-os/), and Windows using this link (https://docs.anaconda.com/anaconda/install/windows/).

  5. To check if Anaconda Navigator is correctly installed, look among your applications for one that has the following icon. Depending on your operating system, the icon's appearance may vary slightly.

    Figure 0.7: Anaconda Navigator icon

    You can also search for the application using your operating system's search functionality. For example, on Windows 10, you can use the Windows Key + S combination and type in Anaconda Navigator. On macOS, you can use Spotlight search. On Linux, you can open the terminal and type the anaconda-navigator command and press the return key.

    Figure 0.8: Searching for Anaconda Navigator on Windows 10

    For detailed steps on how to verify if Anaconda Navigator is installed, refer to the following link: https://docs.anaconda.com/anaconda/install/verify-install/.

  6. Click the icon to open Anaconda Navigator. It may take a while to load for the first time, but upon successful installation, you should see a similar screen:

    Figure 0.9: Anaconda Navigator screen

If you have more questions about the installation process, you may refer to the list of frequently asked questions from the Anaconda documentation: https://docs.anaconda.com/anaconda/user-guide/faq/.
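
As a complement to the visual checks above, the following sketch looks for the relevant commands on your PATH using the standard shutil module. The function name is illustrative. Note the assumption that the tools are on PATH: on Windows, if you left the PATH checkbox unchecked during installation, run this from the Anaconda Prompt, otherwise the commands may be reported as missing even though Anaconda is installed.

```python
import shutil

def find_tools(names=("conda", "anaconda-navigator", "jupyter")):
    """Map each command name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}

for tool, location in find_tools().items():
    print(f"{tool}: {location or 'not found on PATH'}")
```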

Launching Jupyter Notebook

Once the Anaconda Navigator is open, you can launch the Jupyter Notebook interface from this screen. The following steps will show you how to do that:

  1. Open Anaconda Navigator. You should see the following screen:

    Figure 0.10: Anaconda Navigator screen

  2. Now, click Launch under the Jupyter Notebook panel to start the notebook interface on your local system.

    Figure 0.11: Jupyter notebook launch option

  3. On clicking the Launch button, you'll notice that even though nothing changes in the window shown in the preceding screenshot, a new tab opens up in your default browser. This is known as the Notebook Dashboard. It will, by default, open to your root folder. For Windows users, this path would be something similar to C:\Users\<username>. On macOS, it will be /Users/<username>/, and on Linux, /home/<username>/.

    Figure 0.12: Notebook dashboard

    Note that you can also open a Jupyter Notebook by simply running the command jupyter notebook in the terminal or command prompt. Or you can search for Jupyter Notebook in your applications just like you did in Figure 0.8.

  4. You can use this Dashboard as a file explorer to navigate to the directory where you have downloaded or stored the code files for the book (refer to the Downloading the Code Bundle section on how to download the files from GitHub). Once you have navigated to your desired directory, you can start by creating a new Notebook. Alternatively, if you've downloaded the code from our repository, you can open an existing Notebook as well (Notebook files will have a .ipynb extension). The menus here are quite simple to use:

    Figure 0.13: Jupyter notebook navigator menu options walkthrough

    If you make any changes to the directory using your operating system's file explorer and the changed file isn't showing up in the Jupyter Notebook Navigator, click the Refresh Notebook List button (annotated as 1). To quit, click the Quit button (annotated as 2). To create a new file (a new Jupyter Notebook), you can click the New button (annotated as 3).

  5. Clicking the New button will open a dropdown menu as follows:

    Figure 0.14: Creating a new Jupyter notebook

Note

A detailed tutorial on the interface and the keyboard shortcuts for Jupyter Notebooks can be found here: https://jupyter-notebook.readthedocs.io/en/stable/notebook.html.
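
To find the notebooks you can open from the Dashboard, a small helper can list every .ipynb file under a folder. This is a sketch using only the standard pathlib module; the function name and the demo file names are hypothetical. The helper defaults to your home directory, which is where the Dashboard opens by default, but scanning a whole home directory can be slow, so the demo runs against a temporary folder instead.

```python
import tempfile
from pathlib import Path

def list_notebooks(root=None):
    """Return notebook paths under root (default: the user's home directory)."""
    base = Path(root) if root is not None else Path.home()
    # Skip Jupyter's autosave copies kept in .ipynb_checkpoints folders.
    return sorted(p for p in base.rglob("*.ipynb")
                  if ".ipynb_checkpoints" not in p.parts)

# Demo on a throwaway folder with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "Chapter01").mkdir()
    (Path(tmp) / "Chapter01" / "Exercise1.01.ipynb").write_text("{}")
    (Path(tmp) / "notes.txt").write_text("not a notebook")
    found = [p.name for p in list_notebooks(tmp)]
    print(found)  # ['Exercise1.01.ipynb']
```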

You can get started and create your first notebook by selecting Python 3; however, it is recommended that you also set up the virtual environment we've provided. Installing the environment will also install all the packages required for running the code in this book. The following section will show you how to do that.

Installing the ds-marketing Virtual Environment

As you run the code for the exercises and activities, you'll notice that even after installing Anaconda, there are certain libraries like kmodes which you'll need to install separately as you progress in the book. Then again, you may already have these libraries installed, but their versions may be different from the ones we've used, which may lead to varying results. That's why we've provided an environment.yml file with this book that will:

  1. Install all the packages and libraries required for this book at once.
  2. Make sure that the version numbers of your libraries match the ones we've used to write the code for this book.
  3. Make sure that the code you write based on this book remains separate from any other coding environment you may have.

You can download the environment.yml file by clicking the following link: http://packt.link/dBv1k.

Save this file, ideally in the same folder where you'll be running the code for this book. If you've downloaded the code from GitHub as detailed in the Downloading the Code Bundle section, this file should already be present in the parent directory, and you won't need to download it separately.

To set up the environment, follow these steps:

  1. On macOS, open Terminal from the Launchpad (you can find more information about Terminal here: https://support.apple.com/en-in/guide/terminal/apd5265185d-f365-44cb-8b09-71a064a42125/mac). On Linux, open the Terminal application that's native to your distribution. On Windows, open the Anaconda Prompt instead: open the Start menu and search for Anaconda Prompt.

    Figure 0.15: Searching for Anaconda Prompt on Windows

    A new terminal like the following should open. By default, it will start in your home directory:

    Figure 0.16: Anaconda terminal prompt

    In the case of Linux, it would look like the following:

    Figure 0.17: Terminal in Linux

  2. In the terminal, navigate to the directory where you've saved the environment.yml file on your computer using the cd command. Say you've saved the file in Documents\Data-Science-for-Marketing-Analytics-Second-Edition. In that case, you'll type the following command in the prompt and press Enter:

    cd Documents\Data-Science-for-Marketing-Analytics-Second-Edition

    Note that the command may vary slightly based on your directory structure and your operating system.

  3. Now that you've navigated to the correct folder, create a new conda environment by typing or pasting the following command in the terminal. Press Enter to run the command.

    conda env create -f environment.yml

    This will install the ds-marketing virtual environment along with the libraries that are required to run the code in this book. In case you see a prompt asking you to confirm before proceeding, type y and press Enter to continue creating the environment. Depending on your system configuration, it may take a while for the process to complete.

    Note

    For a complete list of conda commands, visit the following link: https://conda.io/projects/conda/en/latest/index.html. For a detailed guide on how to manage conda environments, please visit the following link: https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html.

  4. Once complete, type or paste the following command in the shell to activate the newly installed environment, ds-marketing.

    conda activate ds-marketing

    If the installation is successful, you'll see the environment name in brackets change from base to ds-marketing:

    Figure 0.18: Environment name showing up in the shell

  5. Run the following command to install ipykernel in the newly activated conda environment:

    pip install ipykernel

    Note

    On macOS and Linux, you'll need to specify pip3 instead of pip.

  6. In the same environment, run the following command to add ipykernel as a Jupyter kernel:

    python -m ipykernel install --user --name=ds-marketing

  7. Windows only: If you're on Windows, type or paste the following command. Otherwise, you may skip this step and exit the terminal.

    conda install pywin32

  8. Select the created kernel ds-marketing when you start your Jupyter notebook.

    Figure 0.19: Selecting the ds-marketing kernel

A new tab will open with a fresh untitled Jupyter notebook where you can start writing your code:

Figure 0.20: A new Jupyter notebook
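
Once the notebook is open, you may want to confirm that it is really running inside the ds-marketing environment. The sketch below is one possible check, with illustrative function names. It relies on two assumptions: that the environment's folder is named ds-marketing, so the name appears in the interpreter's path, and that conda has set the CONDA_DEFAULT_ENV variable. Inside a notebook, that variable reflects the environment Jupyter was launched from, so the interpreter path is the more reliable signal for the kernel itself.

```python
import os
import sys

EXPECTED_ENV = "ds-marketing"   # environment name used in this book

def active_conda_env():
    """Return the active conda environment name, or None if none is set."""
    return os.environ.get("CONDA_DEFAULT_ENV")

def running_in_expected_env(expected=EXPECTED_ENV):
    """True if the interpreter path or the conda variable names the expected env."""
    in_prefix = expected in os.path.normpath(sys.prefix).split(os.sep)
    return in_prefix or active_conda_env() == expected

print("Interpreter:", sys.executable)
print("Conda env  :", active_conda_env())
print("Matches ds-marketing:", running_in_expected_env())
```

If the last line prints False, re-check that you selected the ds-marketing kernel rather than the default Python 3 kernel.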

Running the Code Online Using Binder

You can also try running the code files for this book in a completely online environment through an interactive Jupyter Notebook interface called Binder. Along with the individual code files that can be downloaded locally, we have provided a link that will help you quickly access the Binder version of the GitHub repository for the book. Using this link, you can run any of the .ipynb code files for this book in a cloud-based online interactive environment. Click the following link to open the online Binder version of the book's repository to give it a try: https://packt.link/GdQOp. It is recommended that you save the link in your browser bookmarks for future reference (you may also use the launch binder link provided in the README section of the book's GitHub page).

Depending on your internet connection, it may take a while to load, but once loaded, you'll get the same interface as you would when running the code in a local Jupyter Notebook (all your shortcuts should work as well):

Figure 0.21: Binder lets you run Jupyter Notebooks in a browser

Binder is an online service that helps you read and execute Jupyter Notebook files (.ipynb) present in any public GitHub repository in a cloud-based environment. However, please note that there are certain memory constraints associated with Binder. This means that running multiple Jupyter Notebook instances at the same time or running processes that consume a lot of memory (like model training) can result in a kernel crash or kernel reset. Moreover, any changes you make in these online Notebooks will not be stored, and the Notebooks will reset to the latest version present in the repository whenever you close and re-open the Binder link. A stable internet connection is required to use Binder. You can find out more about the Binder Project here: https://jupyter.org/binder.

This is a recommended option for readers who want to have a quick look at the code and experiment with it without downloading the entire repository on their local machine.
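
Because of the memory constraints mentioned above, it can help to estimate how much memory a step needs before running it on Binder. The following is a rough sketch using the standard tracemalloc module; the helper names and the toy workload are illustrative. It only measures allocations that Python's tracer can see, so treat the figure as a lower bound, and note that the exact Binder limit is not stated here.

```python
import tracemalloc

def peak_memory_mb(func, *args, **kwargs):
    """Run func and return (result, peak traced Python memory in MB)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak / 1024**2

def build_squares(n):
    """Toy workload standing in for a memory-hungry step."""
    return [i * i for i in range(n)]

squares, peak_mb = peak_memory_mb(build_squares, 100_000)
print(f"{len(squares)} items, peak ~{peak_mb:.1f} MB")
```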

Get in Touch

Feedback from our readers is always welcome.

General feedback: If you have any questions about this book, please mention the book title in the subject of your message and email us at [email protected].

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you could report this to us. Please visit www.packtpub.com/support/errata and complete the form.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you could provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit https://authors.packtpub.com/.

Please Leave a Review

Let us know what you think by leaving a detailed, impartial review on Amazon. We appreciate all feedback – it helps us continue to make great products and help aspiring developers build their skills. Please spare a few minutes to give your thoughts – it makes a big difference to us. You can leave a review by clicking the following link: https://packt.link/r/1800560478.

To Azra, Aiza, Duha and Aidama - you inspire courage, strength, and grace.

- Mirza Rahim Baig

To Appa, Amma, Vindhya, Madhu, and Ishan - The Five Pillars of my life.

- Gururajan Govindan

To Nanaji, Dadaji, and Appa - for their wisdom, inspiration, and unconditional love.

- Vishwesh Ravi Shrimali