Book Image

R Data Analysis Projects

By : Gopi Subramanian
Book Image

R Data Analysis Projects

By: Gopi Subramanian

Overview of this book

R offers a large variety of packages and libraries for fast and accurate data analysis and visualization. As a result, it’s one of the most popularly used languages by data scientists and analysts, or anyone who wants to perform data analysis. This book will demonstrate how you can put to use your existing knowledge of data analysis in R to build highly efficient, end-to-end data analysis pipelines without any hassle. You’ll start by building a content-based recommendation system, followed by building a project on sentiment analysis with tweets. You’ll implement time-series modeling for anomaly detection, and understand cluster analysis of streaming data. You’ll work through projects on performing efficient market data research, building recommendation systems, and analyzing networks accurately, all provided with easy to follow codes. With the help of these real-world projects, you’ll get a better understanding of the challenges faced when building data analysis pipelines, and see how you can overcome them without compromising on the efficiency or accuracy of your systems. The book covers some popularly used R packages such as dplyr, ggplot2, RShiny, and others, and includes tips on using them effectively. By the end of this book, you’ll have a better understanding of data analysis with R, and be able to put your knowledge to practical use without any hassle.
Table of Contents (9 chapters)

Understanding the recommender systems

Recommender systems or recommendation engines are a popular class of machine learning algorithms widely used today by online retail companies. With historical data about users and product interactions, a recommender system can make profitable/useful recommendations about users and their product preferences.

In the last decade, recommender systems have achieved great success with both online retailers and brick and mortar stores. They have allowed retailers to move away from group campaigns, where a group of people receive a single offer. Recommender systems technology has revolutionized marketing campaigns. Today, retailers offer a customized recommendation to each of their customers. Such recommendations can dramatically increase customer stickiness.

Retailers design and run sales campaigns to promote up-selling and cross-selling. Up-selling is a technique by which retailers try to push high-value products to their customers. Cross-selling is the practice of selling additional products to customers. Recommender systems provide an empirical method to generate recommendations for retailers up-selling and cross-selling campaigns.

Retailers can now make quantitative decisions based on solid statistics and math to improve their businesses. There are a growing number of conferences and journals dedicated to recommender systems technology, which plays a vital role today at top successful companies such as Amazon.com, YouTube, Netflix, LinkedIn, Facebook, TripAdvisor, and IMDb.

Based on the type and volume of available data, recommender systems of varying complexity and increased accuracy can be built. In the previous paragraph, we defined historical data as a user and his product interactions. Let's use this definition to illustrate the different types of data in the context of recommender systems.

Transactions

Transactions are purchases made by a customer on a single visit to a retail store. Typically, transaction data can include the products purchased, quantity purchased, the price, discount if applied, and a timestamp. A single transaction can include multiple products. It may register information about the user who made the transaction in some cases, where the customer allows the retailer to store his information by joining a rewards program.

A simplified view of the transaction data is a binary matrix. The rows of this matrix correspond to a unique identifier for a transaction; let's call it transaction ID. The columns correspond to the unique identifier for a product; let's call it product ID. The cell values are zero or one, if that product is either excluded or included in the transaction.

A binary matrix with n transactions and m products is given as follows:

Txn/Product P1 P2 P3 .... Pm
T1 0 1 1 ... 0
T2 1 1 1 .... 1
... ... ... ... ... ...
Tn o 1 1 ... 1

Weighted transactions

This is additional information added to the transaction to denote its importance, such as the profitability of the transaction as a whole or the profitability of the individual products in the transaction. In the case of the preceding binary matrix, a column called weight is added to store the importance of the transaction.

In this chapter, we will show you how to use transaction data to support cross-selling campaigns. We will see how the derived user product preferences, or recommendations from the user's product interactions (transactions/weighted transactions), can fuel successful cross-selling campaigns. We will implement and understand the algorithms that can leverage this data in R. We will work on a superficial use case in which we need to generate recommendations to support a cross-selling campaign for an imaginative retailer.

Our web application

 

Our goal, by the the end of this chapter, is to understand the concepts of association rule mining and related topics, and solve the given cross-selling campaign problem using association rule mining. We will understand how different aspects of the cross-selling campaign problem can be solved using the family of association rule mining algorithms, how to implement them in R, and finally build the web application to display our analysis and results.

We will be following a code-first approach in this book. The style followed throughout this book is to introduce a real-world problem, following which we will briefly introduce the algorithm/technique that can be used to solve this problem. We will keep the algorithm description brief. We will proceed to introduce the R package that implements the algorithm and subsequently start writing the R code to prepare the data in a way that the algorithm expects. As we invoke the actual algorithm in R and explore the results, we will get into the nitty-gritty of the algorithm. Finally, we will provide further references for curious readers.