R for Data Science
Association rules describe associations between sets of items that occur together in a dataset. They are most commonly used in market basket analysis: given a set of transactions with multiple, different items per transaction (a shopping bag), how are the item sales associated? The most common measures of association are support, confidence, and lift, the three columns reported for each rule later in this section.
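As a minimal sketch of how the strength of an association is usually quantified, the base R code below computes three standard measures for a single rule: support (the fraction of transactions containing all items in the rule), confidence (the fraction of transactions containing the left-hand side that also contain the right-hand side), and lift (confidence divided by the support of the right-hand side). The five baskets and the helper names `support` and `rule_measures` are hypothetical and are not part of any package.

```r
# Hypothetical toy baskets (not taken from the groceries data)
baskets <- list(
  c("bread", "margarine", "soup"),
  c("bread", "margarine", "soup"),
  c("margarine", "soup"),
  c("bread", "milk"),
  c("milk")
)

# Fraction of baskets that contain every item in `items`
support <- function(items, baskets) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

# Support, confidence, and lift for the rule {lhs} => {rhs}
rule_measures <- function(lhs, rhs, baskets) {
  supp <- support(c(lhs, rhs), baskets)
  conf <- supp / support(lhs, baskets)
  lift <- conf / support(rhs, baskets)
  c(support = supp, confidence = conf, lift = lift)
}

m <- rule_measures("bread", "soup", baskets)
print(round(m, 3))
```

A lift above 1 suggests the two sides appear together more often than they would if they were independent; a lift near 1 suggests no real association.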
The most widely used tool in R for association rules is apriori.
The apriori function from the arules package can be called as follows:
apriori(data, parameter = NULL, appearance = NULL, control = NULL)
The various parameters of the apriori function are explained in the following table:
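| Parameter | Description |
|---|---|
| data | This is the transaction data. |
| parameter | This stores the default behavior to mine, with a minimum support of 0.1, a minimum confidence of 0.8, and a maximum rule length (maxlen) of 10. |
| appearance | This is used to restrict items that appear in rules. |
| control | This is used to adjust the performance of the algorithm used. |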
You will need to install and load the arules package, which provides apriori, as follows:
> install.packages("arules")
> library(arules)
The market basket data can be loaded as follows:
> data <- read.csv("http://www.salemmarafi.com/wp-content/uploads/2014/03/groceries.csv")
Then, we can generate rules from the data as follows:
> rules <- apriori(data)
parameter specification:
confidence minval smax arem aval originalSupport support minlen maxlen target
0.8 0.1 1 none FALSE TRUE 0.1 1 10 rules
ext
FALSE
algorithmic control:
filter tree heap memopt load sort verbose
0.1 TRUE TRUE FALSE TRUE 2 TRUE
apriori - find association rules with the apriori algorithm
version 4.21 (2004.05.09) (c) 1996-2004 Christian Borgelt
set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[655 item(s), 15295 transaction(s)] done [0.00s].
sorting and recoding items ... [3 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 done [0.00s].
writing ... [5 rule(s)] done [0.00s].
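creating S4 object ... done [0.00s].
There are several points to highlight in the results: the default thresholds were applied (confidence 0.8, support 0.1, and rule lengths from 1 to 10), the data contained 655 items across 15,295 transactions, and 5 rules were generated.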
We can examine the rules that were generated as follows:
> rules
set of 5 rules
> inspect(rules)
  lhs                       rhs              support   confidence lift
1 {semi.finished.bread=} => {margarine=}     0.2278522 1          2.501226
2 {semi.finished.bread=} => {ready.soups=}   0.2278522 1          1.861385
3 {margarine=}           => {ready.soups=}   0.3998039 1          1.861385
4 {semi.finished.bread=,
   margarine=}           => {ready.soups=}   0.2278522 1          1.861385
5 {semi.finished.bread=,
   ready.soups=}         => {margarine=}     0.2278522 1          2.501226
The code has been slightly reformatted for readability.
The rules show a clear connection among the bread, soup, and margarine items, at least in the market where and when the data was gathered.
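The lift column can be cross-checked against the other columns using the standard definitions confidence = support(lhs and rhs) / support(lhs) and lift = confidence / support(rhs). The following base R sketch does this for rule 1, using only numbers copied from the output above; the variable names are made up for illustration.

```r
# Values copied from the inspect(rules) output
rule3_support    <- 0.3998039  # {margarine=} => {ready.soups=}
rule3_confidence <- 1
rule1_confidence <- 1          # {semi.finished.bread=} => {margarine=}

# confidence = support(lhs and rhs) / support(lhs), so the support of
# {margarine=} alone can be recovered from rule 3
support_margarine <- rule3_support / rule3_confidence

# lift = confidence / support(rhs); the rhs of rule 1 is {margarine=}
rule1_lift <- rule1_confidence / support_margarine
print(round(rule1_lift, 4))
```

The result agrees with the lift of 2.501226 reported for rule 1, up to rounding of the printed values.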
If we change the parameters (thresholds) used in the calculation, we get a different set of rules. For example, check the following code:
> rules <- apriori(data, parameter = list(supp = 0.001, conf = 0.8))
This code generates over 500 rules, but they have questionable meaning because the minimum support is now only 0.001: a rule can qualify even if its items appear together in just 0.1 percent of the transactions.
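The effect of the support threshold can be seen on a much smaller scale. The sketch below assumes the arules package is installed and uses five hypothetical baskets rather than the groceries data; it mines the same transactions at two support thresholds and compares the number of rules. The `minlen = 2` setting is an added assumption that excludes rules with an empty left-hand side.

```r
library(arules)  # assumes the package is already installed

# Hypothetical toy baskets, not the groceries data
baskets <- list(
  c("bread", "margarine", "soup"),
  c("bread", "margarine", "soup"),
  c("margarine", "soup"),
  c("bread", "milk"),
  c("milk")
)
trans <- as(baskets, "transactions")

# Same confidence, two support thresholds; verbose = FALSE silences the log
strict <- apriori(trans,
                  parameter = list(supp = 0.5, conf = 0.8, minlen = 2),
                  control = list(verbose = FALSE))
loose <- apriori(trans,
                 parameter = list(supp = 0.3, conf = 0.8, minlen = 2),
                 control = list(verbose = FALSE))

cat("rules at supp = 0.5:", length(strict), "\n")
cat("rules at supp = 0.3:", length(loose), "\n")

# Ranking by lift is one way to triage a larger rule set
inspect(sort(loose, by = "lift"))
```

Lowering the threshold admits rules that rest on fewer transactions, so a larger rule set usually needs ranking or filtering before it is interpreted.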