Sign In Start Free Trial
Account

Add to playlist

Create a Playlist

Modal Close icon
You need to login to use this feature.
  • Book Overview & Buying Data Science for Marketing Analytics [Instructor Edition]
  • Table Of Contents Toc
Data Science for Marketing Analytics [Instructor Edition]

Data Science for Marketing Analytics [Instructor Edition]

By : Pranshu Bhatnagar, Tommy Blanchard, Debasish Behera
4.3 (203)
close
close
Data Science for Marketing Analytics [Instructor Edition]

Data Science for Marketing Analytics [Instructor Edition]

4.3 (203)
By: Pranshu Bhatnagar, Tommy Blanchard, Debasish Behera

Overview of this book

Data Science for Marketing Analytics covers every stage of data analytics, from working with a raw dataset to segmenting a population and modeling different parts of the population based on the segments. The course starts by teaching you how to use Python libraries, such as pandas and Matplotlib, to read data from Python, manipulate it, and create plots, using both categorical and continuous variables. Then, you'll learn how to segment a population into groups and use different clustering techniques to evaluate customer segmentation. As you make your way through the lessons, you'll explore ways to evaluate and select the best segmentation approach, and go on to create a linear regression model on customer value data to predict lifetime value. In the concluding lessons, you'll gain an understanding of regression techniques and tools for evaluating regression models, and explore ways to predict customer choice using classification algorithms. Finally, you'll apply these techniques to create a churn model for modeling customer product choices. By the end of this course, you will be able to build your own marketing reporting and interactive dashboard solutions.
Table of Contents (11 chapters)
close
close
Preface

Data Models and Structured Data

When you build an analytical solution, the first thing that you need to do is to build a data model. A data model is an overview of the data sources that you will be using, their relationships with other data sources, where exactly the data from a specific source is going to be fetched, and in what form (such as an Excel file, a database, or a JSON from an internet source).

Note

Keep in mind that the data model evolves as data sources and processes change.

A data model can contain data of the following three types:

  • Structured Data: Also known as completely structured or well-structured data, this is the simplest way to manage information. The data is arranged in a flat tabular form with the correct value corresponding to the correct attribute. There is a unique column, known as an index, for easy and quick access to the data, and there are no duplicate columns. For example, in Figure 1.1, employee_id is the unique column. Using the data in this column, you can run SQL queries and quickly access data at a specific row and column in the dataset easily. Furthermore, there are no empty rows, missing entries, or duplicate columns, thereby making this dataset quite easy to work with. What makes structured data most ubiquitous and easy to analyze is that it is stored in a standardized tabular format that makes adding, updating, deleting, and updating entries easy and programmable. With structured data, you may not have to put in much effort during the data preparation and cleaning stage.

    Data stored in relational databases such as MySQL, Amazon Redshift, and more are examples of structured data:

    Figure 1.1: Data in a MySQL table

Figure 1.1: Data in a MySQL table

  • Semi-structured data: You will not find semi-structured data to be stored in a strict, tabular hierarchy as you saw in Figure 1.1. However, it will still have its own hierarchies that group its elements and establish a relationship between them. For example, metadata of a song may include information about the cover art, the artist, song length, and even the lyrics. You can search for the artist's name and find the song you want. Such data does not have a fixed hierarchy mapping the unique column with rows in an expected format, and yet you can find the information you need.

    Another example of semi-structured data is a JSON file. JSON files are self-describing and can be understood easily. In Figure 1.2, you can see a JSON file that contains personally identifiable information of Jack Jones.

    Semi-structured data can be stored accurately in NoSQL databases.

    Figure 1.2: Data in a JSON file

Figure 1.2: Data in a JSON file

  • Unstructured data: Unstructured data may not be tabular, and even if it is tabular, the number of attributes or columns per observation may be completely arbitrary. The same data could be represented in different ways, and the attributes might not match each other, with values leaking into other parts.

    For example, think of reviews of various products stored in rows of an Excel sheet or a dump of the latest tweets of a company's Twitter profile. We can only search for specific keywords in that data, but we cannot store it in a relational database, nor will we be able to establish a concrete hierarchy between different elements or rows. Unstructured data can be stored as text files, CSV files, Excel files, images, and audio clips.

Marketing data, traditionally, comprises all three aforementioned data types. Initially, most data points originate from different data sources. This results in different implications, such as the values of a field could be of different lengths, the value for one field would not match that of other fields because of different field names, and some rows might have missing values for some of the fields.

You'll soon learn how to effectively tackle such problems with your data using Python. The following diagram illustrates what a data model for marketing analytics looks like. The data model comprises all kinds of data: structured data such as databases (top), semi-structured data such as JSON (middle), and unstructured data such as Excel files (bottom):

Figure 1.3: Data model for marketing analytics

Figure 1.3: Data model for marketing analytics

As the data model becomes complex, the probability of having bad data increases. For example, a marketing analyst working with the demographic details of a customer can mistakenly read the age of the customer as a text string instead of a number (integer). In such situations, the analysis would go haywire as the analyst cannot perform any aggregation functions, such as finding the average age of a customer. These types of situations can be overcome by having a proper data quality check to ensure that the data chosen for further analysis is of the correct data type.

This is where programming languages such as Python come into play. Python is an all-purpose general programming language that integrates with almost every platform and helps automate data production and analysis.

Apart from understanding patterns and giving at least a basic structure to data, Python forces the data model to accept the right value for the attribute. The following diagram illustrates how most marketing analytics today structure different kinds of data by passing it through scripts to make it at least semi-structured:

Figure 1.4: Data model of most marketing analytics that use Python

Figure 1.4: Data model of most marketing analytics that use Python

By making use of such structure-enforcing scripts, you will have a data model of semi-structured data coming in with expected values in the right fields; however, the data is not yet in the best possible format to perform analytics. If you can completely structure your data (that is, arrange it in flat tables, with the right value pointing to the right attribute with no nesting), it will be easy to see how every data point individually compares to other points with the help of common fields. You can easily get a feel of the data—that is, see in what range most values lie, identify the clear outliers, and so on—by simply scrolling through it.

While there are a lot of tools that can be used to convert data from an unstructured/semi-structured format to a fully structured format (for example, Spark, STATA, and SAS), the tool that is most widely used for data science, and which can be integrated with practically any framework, has rich functionalities, minimal costs, and is easy to use in our use case, is pandas.

CONTINUE READING
83
Tech Concepts
36
Programming languages
73
Tech Tools
Icon Unlimited access to the largest independent learning library in tech of over 8,000 expert-authored tech books and videos.
Icon Innovative learning tools, including AI book assistants, code context explainers, and text-to-speech.
Icon 50+ new titles added per month and exclusive early access to books as they are being written.
Data Science for Marketing Analytics [Instructor Edition]
notes
bookmark Notes and Bookmarks search Search in title playlist Add to playlist download Download options font-size Font size

Change the font size

margin-width Margin width

Change margin width

day-mode Day/Sepia/Night Modes

Change background colour

Close icon Search
Country selected

Close icon Your notes and bookmarks

Confirmation

Modal Close icon
claim successful

Buy this book with your credits?

Modal Close icon
Are you sure you want to buy this book with one of your credits?
Close
YES, BUY

Submit Your Feedback

Modal Close icon
Modal Close icon
Modal Close icon