Sign In Start Free Trial
Account

Add to playlist

Create a Playlist

Modal Close icon
You need to login to use this feature.
  • Book Overview & Buying R for Data Science
  • Table Of Contents Toc
R for Data Science

R for Data Science

By : Toomey
3.4 (5)
close
close
R for Data Science

R for Data Science

3.4 (5)
By: Toomey

Overview of this book

If you are a data analyst who has a firm grip on some advanced data analysis techniques and wants to learn how to leverage the features of R, this is the book for you. You should have some basic knowledge of the R language and should know about some data science topics.
Table of Contents (14 chapters)
close
close

Association rules

Association rules describe associations between two datasets. This is most commonly used in market basket analysis. Given a set of transactions with multiple, different items per transaction (shopping bag), how can the item sales be associated? The most common associations are as follows:

  • Support: This is the percentage of transactions that contain A and B.
  • Confidence: This is the percentage (of time that rule is correct) of cases containing A that also contain B.
  • Lift: This is the ratio of confidence to the percentage of cases containing B. Please note that if lift is 1, then A and B are independent.

Mine for associations

The most widely used tool in R from association rules is apriori.

Usage

The apriori rules library can be called as follows:

apriori(data, parameter = NULL, appearance = NULL, control = NULL)

The various parameters of the apriori library are explained in the following table:

Parameter

Description

data

This is the transaction data.

parameter

This stores the default behavior to mine, with support as 0.1, confidence as 0.8, and maxlen as 10. You can change parameter values accordingly.

appearance

This is used to restrict items that appear in rules.

control

This is used to adjust the performance of the algorithm used.

Example

You will need to load the apriori rules library as follows:

> install.packages("arules")
> library(arules)

The market basket data can be loaded as follows:

> data <- read.csv("http://www.salemmarafi.com/wp-content/uploads/2014/03/groceries.csv")

Then, we can generate rules from the data as follows:

> rules <- apriori(data) 

parameter specification:
confidenceminvalsmaxaremavaloriginalSupport support minlenmaxlen target
        0.8    0.1    1 none FALSE            TRUE     0.1      1     10  rules
   ext
 FALSE

algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

apriori - find association rules with the apriori algorithm
version 4.21 (2004.05.09)        (c) 1996-2004   Christian Borgelt
set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[655 item(s), 15295 transaction(s)] done [0.00s].
sorting and recoding items ... [3 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 done [0.00s].
writing ... [5 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].

There are several points to highlight in the results:

  • As you can see from the display, we are using the default settings (confidence 0.8, and so on)
  • We found 15,000 transactions for three items (picked from the 655 total items available)
  • We generated five rules

We can examine the rules that were generated as follows:

> rules

set of 5 rules 
> inspect(rules)

lhsrhs              support confidence     lift
1 {semi.finished.bread=} => {margarine=}   0.2278522          1 2.501226
2 {semi.finished.bread=} => {ready.soups=} 0.2278522          1 1.861385
3 {margarine=}           => {ready.soups=} 0.3998039          1 1.861385
4 {semi.finished.bread=,                                                
   margarine=}           => {ready.soups=} 0.2278522          1 1.861385
5 {semi.finished.bread=,                                                
   ready.soups=}         => {margarine=}   0.2278522          1 2.501226

The code has been slightly reformatted for readability.

Looking over the rules, there is a clear connection between buying bread, soup, and margarine—at least in the market where and when the data was gathered.

If we change the parameters (thresholds) used in the calculation, we get a different set of rules. For example, check the following code:

> rules <- apriori(data, parameter = list(supp = 0.001, conf = 0.8))

This code generates over 500 rules, but they have questionable meaning as we now have the rules with 0.001 confidence.

CONTINUE READING
83
Tech Concepts
36
Programming languages
73
Tech Tools
Icon Unlimited access to the largest independent learning library in tech of over 8,000 expert-authored tech books and videos.
Icon Innovative learning tools, including AI book assistants, code context explainers, and text-to-speech.
Icon 50+ new titles added per month and exclusive early access to books as they are being written.
R for Data Science
notes
bookmark Notes and Bookmarks search Search in title playlist Add to playlist font-size Font size

Change the font size

margin-width Margin width

Change margin width

day-mode Day/Sepia/Night Modes

Change background colour

Close icon Search
Country selected

Close icon Your notes and bookmarks

Confirmation

Modal Close icon
claim successful

Buy this book with your credits?

Modal Close icon
Are you sure you want to buy this book with one of your credits?
Close
YES, BUY

Submit Your Feedback

Modal Close icon
Modal Close icon
Modal Close icon