R Data Analysis Projects

Overview of this book

R offers a large variety of packages and libraries for fast and accurate data analysis and visualization. As a result, it’s one of the most popular languages among data scientists, analysts, and anyone else who wants to perform data analysis. This book demonstrates how you can put your existing knowledge of data analysis in R to use in building efficient, end-to-end data analysis pipelines. You’ll start by building a content-based recommendation system, followed by a project on sentiment analysis with tweets. You’ll implement time-series modeling for anomaly detection and learn cluster analysis of streaming data. You’ll also work through projects on efficient market data research, building recommendation systems, and analyzing networks, all accompanied by easy-to-follow code. Through these real-world projects, you’ll gain a better understanding of the challenges faced when building data analysis pipelines, and see how to overcome them without compromising the efficiency or accuracy of your systems. The book covers popular R packages such as dplyr, ggplot2, and RShiny, and includes tips on using them effectively. By the end of this book, you’ll have a deeper understanding of data analysis with R and be able to put your knowledge to practical use.

Title Page

Credits

About the Author

About the Reviewer

www.PacktPub.com

Customer Feedback

Preface

Association Rule Mining

Understanding the recommender systems

Retailer use case and data

Association rule mining

The cross-selling campaign

Weighted association rule mining

Hyperlink-induced topic search (HITS)

Negative association rules

Rules visualization

Wrapping up

Summary

Fuzzy Logic Induced Content-Based Recommendation

Introducing content-based recommendation

News aggregator use case and data

Designing the content-based recommendation engine

Complete R Code

Summary

Collaborative Filtering

Collaborative filtering

Recommenderlab package

Use case and data

Designing and implementing collaborative filtering

Complete R Code

Summary

Taming Time Series Data Using Deep Neural Networks

Time series data

Deep neural networks

Introduction to the MXNet R package

Symbolic programming in MXNet

Training test split

Complete R code

Summary

Twitter Text Sentiment Classification Using Kernel Density Estimates

Kernel density estimation

Twitter text

Sentiment classification

Dictionary-based scoring

Text pre-processing

Building a sentiment classifier

Assembling an RShiny application

Complete R code

Summary

Record Linkage - Stochastic and Machine Learning Approaches

Introducing our use case

Demonstrating the use of the RecordLinkage package

Stochastic record linkage

Machine learning-based record linkage

Building an RShiny application

Complete R code

Summary

Streaming Data Clustering Analysis in R

Streaming data and its challenges

Introducing stream clustering

Introducing the stream package

Use case and data

Complete R code

Summary

Analyze and Understand Networks Using R

Graphs in R

Use case and data

Data preparation

Product network analysis

Building an RShiny application

The complete R script

Summary

Understanding the recommender systems

Recommender systems, or recommendation engines, are a popular class of machine learning algorithms widely used by online retailers today. Given historical data about users and their interactions with products, a recommender system can make useful and profitable recommendations about which products each user is likely to prefer.

In the last decade, recommender systems have achieved great success with both online retailers and brick-and-mortar stores. They have allowed retailers to move away from group campaigns, in which a whole group of people receives a single offer. Recommender systems technology has revolutionized marketing campaigns: today, retailers can offer a customized recommendation to each of their customers. Such recommendations can dramatically increase customer stickiness.

Retailers design and run sales campaigns to promote up-selling and cross-selling. Up-selling is a technique by which retailers try to push higher-value products to their customers. Cross-selling is the practice of selling additional products to customers. Recommender systems provide an empirical method to generate recommendations for retailers' up-selling and cross-selling campaigns.

Retailers can now make quantitative decisions, grounded in statistics and mathematics, to improve their businesses. A growing number of conferences and journals are dedicated to recommender systems technology, which plays a vital role today at leading companies such as Amazon.com, YouTube, Netflix, LinkedIn, Facebook, TripAdvisor, and IMDb.

Depending on the type and volume of available data, recommender systems of varying complexity and accuracy can be built. Earlier, we defined historical data as data about users and their product interactions. Let's use this definition to illustrate the different types of data in the context of recommender systems.

Transactions

Transactions are purchases made by a customer on a single visit to a retail store. Typically, transaction data includes the products purchased, the quantity purchased, the price, any discount applied, and a timestamp. A single transaction can include multiple products. In some cases, it also records information about the customer who made the transaction, for example when the customer has joined a rewards program and allowed the retailer to store their information.

A simplified view of transaction data is a binary matrix. Each row of this matrix corresponds to a unique transaction identifier; let's call it the transaction ID. Each column corresponds to a unique product identifier; let's call it the product ID. A cell value is one if the product is included in the transaction and zero if it is not.

A binary matrix with n transactions and m products looks as follows:

Txn/Product	P1	P2	P3	...	Pm
T1	0	1	1	...	0
T2	1	1	1	...	1
...	...	...	...	...	...
Tn	0	1	1	...	1
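To make the binary matrix concrete, here is a minimal base-R sketch that builds one from a transaction log. It assumes the log is a data frame with one row per purchased item and two hypothetical columns, transaction_id and product_id; all IDs are made up. The counts produced by table() are collapsed to 0/1, so buying the same product twice in one transaction still yields a single 1.

```r
# Hypothetical transaction log in "long" format: one row per purchased item.
# IDs are made up for illustration; T2 lists P2 twice to mimic a product
# bought in a quantity greater than one.
txn_log <- data.frame(
  transaction_id = c("T1", "T1", "T2", "T2", "T2", "T2", "T3"),
  product_id     = c("P2", "P3", "P1", "P2", "P2", "P3", "P1"),
  stringsAsFactors = FALSE
)

# Cross-tabulate: rows are transaction IDs, columns are product IDs, and
# each cell counts how often the product appears in the transaction.
counts <- unclass(table(transaction = txn_log$transaction_id,
                        product     = txn_log$product_id))

# Collapse counts to presence/absence: 1 if the product appears in the
# transaction at all, 0 otherwise.
txn_matrix <- 1L * (counts > 0)

print(txn_matrix)
```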

Weighted transactions

Weighted transactions carry additional information that denotes the importance of each transaction, such as the profitability of the transaction as a whole or the profitability of the individual products in it. For the preceding binary matrix, this means adding a column called weight to store the importance of each transaction.
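As a sketch of the weighted layout, the following base-R code appends a weight column to a small binary matrix. The matrix, the name txn_weight, and the figures are hypothetical; here the weight is simply a made-up profit for each transaction as a whole.

```r
# Made-up binary transaction matrix: rows are transactions, columns products.
txn_matrix <- matrix(
  c(0L, 1L, 1L,
    1L, 1L, 1L,
    1L, 0L, 0L),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("T1", "T2", "T3"), c("P1", "P2", "P3"))
)

# Hypothetical importance of each transaction (illustrative profit values).
txn_weight <- c(T1 = 12.5, T2 = 30, T3 = 4)

# Append the importance as a "weight" column, matched by transaction ID.
# cbind() coerces the result to double because the weights are not integers.
weighted_txn <- cbind(txn_matrix, weight = txn_weight[rownames(txn_matrix)])
print(weighted_txn)

# One possible use of the weights: each product's share of the total
# transaction weight. Multiplying the matrix by a vector of length nrow
# scales row i by the weight of transaction i.
weighted_share <- colSums(txn_matrix * txn_weight[rownames(txn_matrix)]) /
  sum(txn_weight)
print(weighted_share)
```

The weighted share lets profitable baskets count for more than cheap ones. It is only an illustration of how a weight column can be consumed, not the weighted association rule mining method covered later in the chapter.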

In this chapter, we will show you how to use transaction data to support cross-selling campaigns. We will see how user product preferences derived from users' product interactions (transactions or weighted transactions) can fuel successful cross-selling campaigns. We will implement the algorithms that leverage this data in R and explain how they work. We will work on a hypothetical use case in which we need to generate recommendations to support a cross-selling campaign for a fictitious retailer.

Our web application

Our goal, by the end of this chapter, is to understand the concepts of association rule mining and related topics, and to solve the given cross-selling campaign problem using association rule mining. We will see how different aspects of the cross-selling campaign problem can be solved using the family of association rule mining algorithms, how to implement them in R, and finally how to build a web application that displays our analysis and results.

We will follow a code-first approach in this book. Each project starts with a real-world problem, after which we briefly introduce an algorithm or technique that can solve it. We then introduce the R package that implements the algorithm and write R code to prepare the data in the format the algorithm expects. As we invoke the actual algorithm in R and explore the results, we get into the nitty-gritty of how it works. Finally, we provide further references for curious readers.
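Before introducing an R package for association rule mining, it may help to see two standard rule measures computed by hand. For a rule {P2} => {P3}, support is the fraction of all transactions that contain both P2 and P3, and confidence is the support of {P2, P3} divided by the support of {P2}. The base-R sketch below uses a made-up binary matrix and two hypothetical helper functions, itemset_support() and rule_measures(); they are not part of any package's API.

```r
# Made-up binary transaction matrix: rows are transactions, columns products.
txn_matrix <- matrix(
  c(0L, 1L, 1L,
    1L, 1L, 1L,
    1L, 0L, 0L,
    0L, 1L, 0L),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("T", 1:4), c("P1", "P2", "P3"))
)

# Hypothetical helper: fraction of transactions containing every product
# in `items` (the support of the itemset).
itemset_support <- function(m, items) {
  contains_all <- apply(m[, items, drop = FALSE] == 1L, 1, all)
  mean(contains_all)
}

# Hypothetical helper: support and confidence of the rule lhs => rhs.
rule_measures <- function(m, lhs, rhs) {
  supp_both <- itemset_support(m, c(lhs, rhs))
  c(support    = supp_both,
    confidence = supp_both / itemset_support(m, lhs))
}

# Rule {P2} => {P3}: support 0.5, confidence 2/3 for this data.
rule_measures(txn_matrix, lhs = "P2", rhs = "P3")
```

Here half of the transactions contain both products, and two thirds of the transactions containing P2 also contain P3. That is the kind of evidence a cross-selling campaign can act on, for example by offering P3 to customers who buy P2.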