Wrapping up
The final step in any data analysis project is documentation—either generating a report of the findings or documenting the scripts and data used. In our case, we are going to wrap up with a small application. We will use RShiny
, an R web application framework. RShiny
is a powerful framework for developing interactive web applications using R. We will leverage the code that we have written to generate a simple, yet powerful, user interface for our retail customers.
To keep things simple, we have a set of three screens. The first screen, as shown in the following screenshot, allows the user to vary support and confidence thresholds and view the rules generated. It also has additional interest measures, lift, conviction, and leverage. The user can sort the rule by any of these interest measures:
Another screen is a scatter plot representation of the rules:
Finally, a graph representation to view the product grouping for the easy selection of products for the cross-selling campaign is as follows:
The complete source code is available in <../App.R>
:
######################################################################## # # R Data Analysis Projects # # Chapter 1 # # Building Recommender System # A step step approach to build Association Rule Mining # # Script: # # Rshiny app # # Gopi Subramanian #########################################################################
library(shiny) library(plotly) library(arules) library(igraph) library(arulesViz)
get.txn <- function(data.path, columns){ # Get transaction object for a given data file # # Args: # data.path: data file name location # columns: transaction id and item id columns. # # Returns: # transaction object transactions.obj <- read.transactions(file = data.path, format = "single", sep = ",", cols = columns, rm.duplicates = FALSE, quote = "", skip = 0, encoding = "unknown") return(transactions.obj) }
get.rules <- function(support, confidence, transactions){ # Get Apriori rules for given support and confidence values # # Args: # support: support parameter # confidence: confidence parameter # # Returns: # rules object parameters = list( support = support, confidence = confidence, minlen = 2, # Minimal number of items per item set maxlen = 10, # Maximal number of items per item set target = "rules" ) rules <- apriori(transactions, parameter = parameters) return(rules) }
find.rules <- function(transactions, support, confidence, topN = 10){ # Generate and prune the rules for given support confidence value # # Args: # transactions: Transaction object, list of transactions # support: Minimum support threshold # confidence: Minimum confidence threshold # Returns: # A data frame with the best set of rules and their support and confidence values # Get rules for given combination of support and confidence all.rules <- get.rules(support, confidence, transactions) rules.df <-data.frame(rules = labels(all.rules) , all.rules@quality) other.im <- interestMeasure(all.rules, transactions = transactions) rules.df <- cbind(rules.df, other.im[,c('conviction','leverage')]) # Keep the best rule based on the interest measure best.rules.df <- head(rules.df[order(-rules.df$leverage),],topN) return(best.rules.df) }
plot.graph <- function(cross.sell.rules){ # Plot the associated items as graph # # Args: # cross.sell.rules: Set of final rules recommended # Returns: # None edges <- unlist(lapply(cross.sell.rules['rules'], strsplit, split='=>')) g <- graph(edges = edges) return(g) }
columns <- c("order_id", "product_id") ## columns of interest in data file data.path = '../../data/data.csv' ## Path to data file transactions.obj <- get.txn(data.path, columns) ## create txn object
server <- function(input, output) {
cross.sell.rules <- reactive({ support <- input$Support confidence <- input$Confidence cross.sell.rules <- find.rules( transactions.obj, support, confidence ) cross.sell.rules$rules <- as.character(cross.sell.rules$rules) return(cross.sell.rules) }) gen.rules <- reactive({ support <- input$Support confidence <- input$Confidence gen.rules <- get.rules( support, confidence ,transactions.obj) return(gen.rules) }) output$rulesTable <- DT::renderDataTable({ cross.sell.rules() }) output$graphPlot <- renderPlot({ g <-plot.graph(cross.sell.rules()) plot(g) }) output$explorePlot <- renderPlot({ plot(x = gen.rules(), method = NULL, measure = "support", shading = "lift", interactive = FALSE) }) }
ui <- fluidPage( headerPanel(title = "X-Sell Recommendations"), sidebarLayout( sidebarPanel( sliderInput("Support", "Support threshold:", min = 0.01, max = 1.0, value = 0.01), sliderInput("Confidence", "Support threshold:", min = 0.05, max = 1.0, value = 0.05) ), mainPanel( tabsetPanel( id = 'xsell', tabPanel('Rules', DT::dataTableOutput('rulesTable')), tabPanel('Explore', plotOutput('explorePlot')), tabPanel('Item Groups', plotOutput('graphPlot')) ) ) ) )
shinyApp(ui = ui, server = server)
We have described the get.txn
, get.rules
, and find.rules
functions in the previous section. We will not go through them again here. The preceding code is a single page RShiny
app code; both the server and the UI component reside in the same file.
The UI component is as follows:
ui <- fluidPage( headerPanel(title = "X-Sell Recommendations"), sidebarLayout( sidebarPanel( sliderInput("Support", "Support threshold:", min = 0.01, max = 1.0, value = 0.01), sliderInput("Confidence", "Support threshold:", min = 0.05, max = 1.0, value = 0.05) ), mainPanel( tabsetPanel( id = 'xsell', tabPanel('Rules', DT::dataTableOutput('rulesTable')), tabPanel('Explore', plotOutput('explorePlot')), tabPanel('Item Groups', plotOutput('graphPlot')) ) ) ) )
We define the screen layout in this section. This section can also be kept in a separate file called UI.R
. The page is defined by two sections, a panel in the left, defined by sidebarPanel
, and a main section defined under mainPanel
. Inside the side bar, we have defined two slider controls for the support and confidence thresholds respectively. The main panel contains a tab-separated window, defined by tabPanel
.
The main panel has three tabs; each tab has a slot defined for the final set of rules, with their interest measures, a scatter plot for the rules, and finally the graph plot of the rules.
The server component is as follows:
server <- function(input, output) {
cross.sell.rules <- reactive({ support <- input$Support confidence <- input$Confidence cross.sell.rules <- find.rules( transactions.obj, support, confidence ) cross.sell.rules$rules <- as.character(cross.sell.rules$rules) return(cross.sell.rules) })
The cross.sell.rules
data frame is defined as a reactive component. When the values of the support and confidence thresholds change in the UI, cross.sell.rules
data frame will be recomputed. This frame will be served to the first page, where we have defined a slot for this table, called rulesTable
:
gen.rules <- reactive({ support <- input$Support confidence <- input$Confidence gen.rules <- get.rules( support, confidence ,transactions.obj) return(gen.rules) })
This reactive component retrieves the calculations and returns the rules object every time the support or/and confidence threshold is changed by the user in the UI:
output$rulesTable <- DT::renderDataTable({ cross.sell.rules() })
The preceding code renders the data frame back to the UI:
output$graphPlot <- renderPlot({ g <-plot.graph(cross.sell.rules()) plot(g) }) output$explorePlot <- renderPlot({ plot(x = gen.rules(), method = NULL, measure = "support", shading = "lift", interactive = FALSE) }) }
The preceding two pieces of code render the plot back to the UI.