library(arules)
library(arulesViz)
library(DT)
Load the grocery data into a sparse matrix.
groceries <- read.transactions("groceries.csv", sep = ",")
summary(groceries)
transactions as itemMatrix in sparse format with
9835 rows (elements/itemsets/transactions) and
169 columns (items) and a density of 0.02609146
most frequent items:
      whole milk other vegetables       rolls/buns             soda           yogurt          (Other)
            2513             1903             1809             1715             1372            34055
element (itemset/transaction) length distribution:
sizes
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
2159 1643 1299 1005 855 645 545 438 350 246 182 117 78 77 55 46 29 14 14 9
21 22 23 24 26 27 28 29 32
11 4 6 1 1 1 1 3 1
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.000 2.000 3.000 4.409 6.000 32.000
includes extended item information - examples:
labels
1 abrasive cleaner
2 artif. sweetener
3 baby cosmetics
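The density in the summary is the share of cells in the 9835-by-169 transaction-by-item matrix that contain a 1. The sketch below checks this with plain arithmetic on the printed length distribution. It makes no arules calls. The total number of purchased items, divided by rows times columns, should reproduce the density, and divided by the number of baskets it should reproduce the mean basket size of 4.409.

```r
# Basket-size distribution copied from summary(groceries)
sizes  <- c(1:20, 21, 22, 23, 24, 26, 27, 28, 29, 32)
counts <- c(2159, 1643, 1299, 1005, 855, 645, 545, 438, 350, 246,
            182, 117, 78, 77, 55, 46, 29, 14, 14, 9,
            11, 4, 6, 1, 1, 1, 1, 3, 1)
n_items <- 169                                # columns (distinct items)

n_transactions <- sum(counts)                 # rows: 9835 baskets
total_items    <- sum(sizes * counts)         # 43367 item occurrences in all baskets
matrix_density <- total_items / (n_transactions * n_items)
mean_size      <- total_items / n_transactions

print(c(n_transactions = n_transactions, total_items = total_items))
print(c(density = matrix_density, mean_size = mean_size))
```

Because fewer than 3% of the cells are non-zero, storing only the 1s, as a sparse matrix does, saves most of the memory a dense matrix would need.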
Look at the first five transactions.
inspect(groceries[1:5])
items
[1] {citrus fruit,margarine,ready soups,semi-finished bread}
[2] {coffee,tropical fruit,yogurt}
[3] {whole milk}
[4] {cream cheese,meat spreads,pip fruit,yogurt}
[5] {condensed milk,long life bakery product,other vegetables,whole milk}
Examine the frequency of items.
itemFrequency(groceries[, 1:3])
abrasive cleaner artif. sweetener baby cosmetics
0.0035587189 0.0032536858 0.0006100661
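By default, itemFrequency() returns relative support: the fraction of transactions that contain each item. Passing type = "absolute" returns raw counts instead. The sketch below is a base-R illustration of the same idea, not the arules implementation. It applies the calculation to the five transactions inspected above.

```r
# The first five transactions, as printed by inspect(groceries[1:5])
baskets <- list(
  c("citrus fruit", "margarine", "ready soups", "semi-finished bread"),
  c("coffee", "tropical fruit", "yogurt"),
  c("whole milk"),
  c("cream cheese", "meat spreads", "pip fruit", "yogurt"),
  c("condensed milk", "long life bakery product", "other vegetables", "whole milk")
)

# Count each item at most once per basket, then divide by the number of baskets
item_counts <- table(unlist(lapply(baskets, unique)))
item_freq   <- item_counts / length(baskets)

item_freq[c("whole milk", "yogurt", "coffee")]
```

Whole milk and yogurt each appear in 2 of the 5 baskets, giving a relative support of 0.4.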
Plot the frequency of items: first all items with a support of at least 10%, then the 20 most frequent items.
itemFrequencyPlot(groceries, support = 0.1)
itemFrequencyPlot(groceries, topN = 20)
A visualization of the sparse matrix for the first five transactions.
image(groceries[1:5])
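image() draws the binary incidence matrix. Each row is a transaction, each column is an item, and a filled cell means the item was bought. The base-R sketch below builds that matrix densely for the same five baskets. Unlike the arules object, it only keeps the 14 items that actually occur, not all 169 columns.

```r
baskets <- list(
  c("citrus fruit", "margarine", "ready soups", "semi-finished bread"),
  c("coffee", "tropical fruit", "yogurt"),
  c("whole milk"),
  c("cream cheese", "meat spreads", "pip fruit", "yogurt"),
  c("condensed milk", "long life bakery product", "other vegetables", "whole milk")
)
items <- sort(unique(unlist(baskets)))

# One row per transaction, one column per item; 1 means the item is in the basket
incidence <- t(vapply(baskets,
                      function(b) as.integer(items %in% b),
                      integer(length(items))))
colnames(incidence) <- items

matrix_density <- mean(incidence)   # share of cells equal to 1
dim(incidence)
matrix_density
```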
Visualization of a random sample of 100 transactions.
image(sample(groceries, 100))
With the default settings (support = 0.1, confidence = 0.8), apriori() learns no rules at all:
apriori(groceries)
Apriori
Parameter specification:
confidence minval smax arem aval originalSupport maxtime support minlen maxlen target ext
0.8 0.1 1 none FALSE TRUE 5 0.1 1 10 rules FALSE
Algorithmic control:
filter tree heap memopt load sort verbose
0.1 TRUE TRUE FALSE TRUE 2 TRUE
Absolute minimum support count: 983
set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
sorting and recoding items ... [8 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 done [0.00s].
writing ... [0 rule(s)] done [0.00s].
creating S4 object ... done [0.00s].
set of 0 rules
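The empty result follows from the numbers already printed in this walkthrough. The first check is the absolute support count: 0.1 × 9835 = 983.5, which apriori reports as 983.

Next, no pair of items reaches that count. The most frequent pair, {other vegetables, whole milk}, occurs in only 736 baskets, and every larger itemset occurs at most as often as the pairs it contains.

Finally, a rule with an empty left-hand side, such as {} => {whole milk}, has a confidence equal to the support of its right-hand side. Whole milk has a support of about 0.26, far below 0.8.

A plain-arithmetic sketch of these checks:

```r
n <- 9835
min_support    <- 0.1   # apriori() default, per the parameter printout
min_confidence <- 0.8   # apriori() default

# Absolute threshold shown as "Absolute minimum support count"
min_count <- floor(min_support * n)

# Most frequent pair in the data: {other vegetables, whole milk}, 736 baskets
best_pair_count <- 736
pair_qualifies  <- best_pair_count >= min_support * n

# {} => {whole milk}: confidence equals the support of whole milk (2513 baskets)
empty_lhs_conf <- 2513 / n
rule_qualifies <- empty_lhs_conf >= min_confidence

c(min_count = min_count, pair_ok = pair_qualifies, rule_ok = rule_qualifies)
```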
Lower the support and confidence thresholds, and require at least two items per rule (minlen = 2), to learn more rules.
groceryrules <- apriori(groceries, parameter = list(support = 0.006, confidence = 0.25, minlen = 2))
Apriori
Parameter specification:
confidence minval smax arem aval originalSupport maxtime support minlen maxlen target ext
0.25 0.1 1 none FALSE TRUE 5 0.006 2 10 rules FALSE
Algorithmic control:
filter tree heap memopt load sort verbose
0.1 TRUE TRUE FALSE TRUE 2 TRUE
Absolute minimum support count: 59
set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
sorting and recoding items ... [109 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 done [0.00s].
writing ... [463 rule(s)] done [0.00s].
creating S4 object ... done [0.00s].
groceryrules
set of 463 rules
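A support of 0.006 corresponds to roughly 59 of the 9835 baskets. The arithmetic below shows that exactly 59 baskets give a support just under 0.006. The smallest support in the rule summary that follows, 0.006101, corresponds to 60 baskets. This is plain arithmetic on printed values, not a reimplementation of apriori.

```r
n <- 9835
min_support <- 0.006

min_count  <- floor(min_support * n)   # 59, the printed absolute minimum support count
support_59 <- 59 / n                   # about 0.005999, just below the threshold
support_60 <- 60 / n                   # about 0.006101, the minimum support in summary()

c(min_count = min_count, support_59 = support_59, support_60 = support_60)
```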
Summary of grocery association rules.
summary(groceryrules)
set of 463 rules
rule length distribution (lhs + rhs):sizes
2 3 4
150 297 16
Min. 1st Qu. Median Mean 3rd Qu. Max.
2.000 2.000 3.000 2.711 3.000 4.000
summary of quality measures:
support confidence lift count
Min. :0.006101 Min. :0.2500 Min. :0.9932 Min. : 60.0
1st Qu.:0.007117 1st Qu.:0.2971 1st Qu.:1.6229 1st Qu.: 70.0
Median :0.008744 Median :0.3554 Median :1.9332 Median : 86.0
Mean :0.011539 Mean :0.3786 Mean :2.0351 Mean :113.5
3rd Qu.:0.012303 3rd Qu.:0.4495 3rd Qu.:2.3565 3rd Qu.:121.0
Max. :0.074835 Max. :0.6600 Max. :3.9565 Max. :736.0
mining info:
data ntransactions support confidence
groceries 9835 0.006 0.25
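The rule-length statistics follow directly from the length distribution: 150 rules with 2 items, 297 with 3 items and 16 with 4 items. A quick base-R check of the mean and median:

```r
rule_len <- c(2, 3, 4)        # items per rule (lhs + rhs)
n_rules  <- c(150, 297, 16)   # rules of each length, from summary(groceryrules)

total_rules <- sum(n_rules)                          # 463
mean_len    <- weighted.mean(rule_len, n_rules)      # 1255 / 463
median_len  <- median(rep(rule_len, n_rules))        # the 232nd rule has length 3

c(total = total_rules, mean = mean_len, median = median_len)
```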
Look at the first ten rules.
inspect(groceryrules[1:10])
lhs rhs support confidence lift count
[1] {potted plants} => {whole milk} 0.006914082 0.4000000 1.565460 68
[2] {pasta} => {whole milk} 0.006100661 0.4054054 1.586614 60
[3] {herbs} => {root vegetables} 0.007015760 0.4312500 3.956477 69
[4] {herbs} => {other vegetables} 0.007727504 0.4750000 2.454874 76
[5] {herbs} => {whole milk} 0.007727504 0.4750000 1.858983 76
[6] {processed cheese} => {whole milk} 0.007015760 0.4233129 1.656698 69
[7] {semi-finished bread} => {whole milk} 0.007117438 0.4022989 1.574457 70
[8] {beverages} => {whole milk} 0.006812405 0.2617188 1.024275 67
[9] {detergent} => {other vegetables} 0.006405694 0.3333333 1.722719 63
[10] {detergent} => {whole milk} 0.008947636 0.4656085 1.822228 88
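The three quality measures can be checked by hand for rule [1], {potted plants} => {whole milk}:

- Support is the share of all baskets that contain both sides.
- Confidence is support(LHS and RHS) / support(LHS).
- Lift is confidence / support(RHS).

The sketch uses only printed numbers; the whole milk count of 2513 comes from summary(groceries). The potted-plants count of 170 is back-computed as count / confidence, not read from the data.

```r
n          <- 9835   # transactions
n_both     <- 68     # baskets with potted plants and whole milk (count column)
confidence <- 0.4    # printed confidence of {potted plants} => {whole milk}
n_milk     <- 2513   # baskets containing whole milk

support <- n_both / n                  # P(potted plants and whole milk)
n_lhs   <- n_both / confidence         # baskets containing potted plants (back-computed)
lift    <- confidence / (n_milk / n)   # how much the LHS raises P(whole milk)

c(support = support, n_lhs = n_lhs, lift = lift)
```

A lift of about 1.57 means whole milk is about 57% more likely in baskets with potted plants than in baskets overall.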
Sort rules by support.
top.support <- sort(groceryrules, decreasing = TRUE, na.last = NA, by = "support")
inspect(head(top.support, 10))
lhs rhs support confidence lift count
[1] {other vegetables} => {whole milk} 0.07483477 0.3867578 1.513634 736
[2] {whole milk} => {other vegetables} 0.07483477 0.2928770 1.513634 736
[3] {rolls/buns} => {whole milk} 0.05663447 0.3079049 1.205032 557
[4] {yogurt} => {whole milk} 0.05602440 0.4016035 1.571735 551
[5] {root vegetables} => {whole milk} 0.04890696 0.4486940 1.756031 481
[6] {root vegetables} => {other vegetables} 0.04738180 0.4347015 2.246605 466
[7] {yogurt} => {other vegetables} 0.04341637 0.3112245 1.608457 427
[8] {tropical fruit} => {whole milk} 0.04229792 0.4031008 1.577595 416
[9] {tropical fruit} => {other vegetables} 0.03589222 0.3420543 1.767790 353
[10] {bottled water} => {whole milk} 0.03436706 0.3109476 1.216940 338
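Rules [1] and [2] are the same pair of items in opposite directions. Support and lift are symmetric in X and Y, so they match. Confidence divides by the support of the left-hand side, so it differs. A base-R check using the item counts from summary(groceries) and the rule count of 736:

```r
n      <- 9835
n_veg  <- 1903   # baskets containing other vegetables
n_milk <- 2513   # baskets containing whole milk
n_both <- 736    # baskets containing both

support  <- n_both / n                                    # same for both directions
conf_v2m <- n_both / n_veg                                # {other vegetables} => {whole milk}
conf_m2v <- n_both / n_milk                               # {whole milk} => {other vegetables}
lift     <- (n_both / n) / ((n_veg / n) * (n_milk / n))   # symmetric in X and Y

c(support = support, conf_v2m = conf_v2m, conf_m2v = conf_m2v, lift = lift)
```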
Sort rules by confidence.
top.confidence <- sort(groceryrules, decreasing = TRUE, na.last = NA, by = "confidence")
inspect(head(top.confidence, 10))
lhs rhs support confidence lift count
[1] {butter,whipped/sour cream} => {whole milk} 0.006710727 0.6600000 2.583008 66
[2] {butter,yogurt} => {whole milk} 0.009354347 0.6388889 2.500387 92
[3] {butter,root vegetables} => {whole milk} 0.008235892 0.6377953 2.496107 81
[4] {curd,tropical fruit} => {whole milk} 0.006507372 0.6336634 2.479936 64
[5] {butter,tropical fruit} => {whole milk} 0.006202339 0.6224490 2.436047 61
[6] {other vegetables,tropical fruit,yogurt} => {whole milk} 0.007625826 0.6198347 2.425816 75
[7] {domestic eggs,tropical fruit} => {whole milk} 0.006914082 0.6071429 2.376144 68
[8] {other vegetables,root vegetables,yogurt} => {whole milk} 0.007829181 0.6062992 2.372842 77
[9] {domestic eggs,root vegetables} => {whole milk} 0.008540925 0.5957447 2.331536 84
[10] {citrus fruit,root vegetables} => {other vegetables} 0.010371124 0.5862069 3.029608 102
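High-confidence rules can rest on few baskets. Dividing a rule's count by its confidence recovers how many baskets contain its left-hand side. For example, only about 100 baskets contain both butter and whipped/sour cream. A base-R sketch using four rows of the table above:

```r
rule_count <- c(66, 92, 81, 102)
rule_conf  <- c(0.6600000, 0.6388889, 0.6377953, 0.5862069)
names(rule_count) <- c("{butter,whipped/sour cream}", "{butter,yogurt}",
                       "{butter,root vegetables}", "{citrus fruit,root vegetables}")

# count(LHS and RHS) / confidence = count(LHS); round away printing error
lhs_count <- round(rule_count / rule_conf)
lhs_count
```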
Sort rules by lift.
top.lift <- sort(groceryrules, decreasing = TRUE, na.last = NA, by = "lift")
inspect(head(top.lift, 10))
lhs rhs support confidence lift count
[1] {herbs} => {root vegetables} 0.007015760 0.4312500 3.956477 69
[2] {berries} => {whipped/sour cream} 0.009049314 0.2721713 3.796886 89
[3] {other vegetables,tropical fruit,whole milk} => {root vegetables} 0.007015760 0.4107143 3.768074 69
[4] {beef,other vegetables} => {root vegetables} 0.007930859 0.4020619 3.688692 78
[5] {other vegetables,tropical fruit} => {pip fruit} 0.009456024 0.2634561 3.482649 93
[6] {beef,whole milk} => {root vegetables} 0.008032537 0.3779904 3.467851 79
[7] {other vegetables,pip fruit} => {tropical fruit} 0.009456024 0.3618677 3.448613 93
[8] {pip fruit,yogurt} => {tropical fruit} 0.006405694 0.3559322 3.392048 63
[9] {citrus fruit,other vegetables} => {root vegetables} 0.010371124 0.3591549 3.295045 102
[10] {other vegetables,whole milk,yogurt} => {tropical fruit} 0.007625826 0.3424658 3.263712 75
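The top-lift rules have only modest confidence. Lift divides confidence by the support of the right-hand side, so a fairly uncommon RHS can produce a high lift. Rearranging lift = confidence / support(RHS) recovers that support from two printed columns. The sketch below does this for rules [1] and [2], using plain arithmetic only.

```r
n <- 9835
# {herbs} => {root vegetables} and {berries} => {whipped/sour cream}
confidence <- c(0.4312500, 0.2721713)
lift       <- c(3.956477, 3.796886)

rhs_support <- confidence / lift          # support of the right-hand-side item
rhs_count   <- round(rhs_support * n)     # baskets containing that item

data.frame(rhs = c("root vegetables", "whipped/sour cream"), rhs_support, rhs_count)
```

Root vegetables appear in only about 11% of baskets. The rule's 43% confidence is therefore nearly four times the baseline.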
Browse the rules in an interactive HTML table (built with the DT package) using inspectDT().
inspectDT(groceryrules)
Save the table as an HTML page and open it in the browser.
p <- inspectDT(groceryrules)
htmlwidgets::saveWidget(p, "arules.html", selfcontained = FALSE)
browseURL("arules.html")
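The plots below use the arulesViz package; see its documentation for the available plot methods and options.
The plots below show:
- support vs. confidence, the default scatter plot, shaded by lift;
- support vs. lift, shaded by confidence;
- a two-key plot, colored by rule length;
- a matrix plot of the rules with confidence above 0.5.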
plot(groceryrules)
plot(groceryrules, measure = c("support", "lift"), shading = "confidence")
plot(groceryrules, method = "two-key plot")
subrules <- groceryrules[quality(groceryrules)$confidence > 0.5]
plot(subrules, method = "matrix", measure = "lift")
Itemsets in Antecedent (LHS)
[1] "{root vegetables,tropical fruit,whole milk}"
[2] "{onions,whole milk}"
[3] "{root vegetables,whole milk,yogurt}"
[4] "{root vegetables,shopping bags}"
[5] "{pork,root vegetables}"
[6] "{root vegetables,tropical fruit}"
[7] "{tropical fruit,whole milk,yogurt}"
[8] "{tropical fruit,whipped/sour cream}"
[9] "{butter,whipped/sour cream}"
[10] "{butter,root vegetables}"
[11] "{citrus fruit,root vegetables}"
[12] "{butter,yogurt}"
[13] "{domestic eggs,root vegetables}"
[14] "{fruit/vegetable juice,root vegetables}"
[15] "{curd,tropical fruit}"
[16] "{pip fruit,root vegetables}"
[17] "{butter,tropical fruit}"
[18] "{other vegetables,tropical fruit,yogurt}"
[19] "{frozen vegetables,root vegetables}"
[20] "{domestic eggs,tropical fruit}"
[21] "{other vegetables,root vegetables,yogurt}"
[22] "{rolls/buns,root vegetables}"
[23] "{other vegetables,sugar}"
[24] "{curd,yogurt}"
[25] "{citrus fruit,whipped/sour cream}"
[26] "{curd,other vegetables}"
[27] "{butter,other vegetables}"
[28] "{other vegetables,root vegetables,tropical fruit}"
[29] "{curd,root vegetables}"
[30] "{root vegetables,yogurt}"
[31] "{frankfurter,yogurt}"
[32] "{root vegetables,whipped/sour cream}"
[33] "{domestic eggs,other vegetables}"
[34] "{pork,rolls/buns}"
[35] "{frozen vegetables,other vegetables}"
[36] "{domestic eggs,yogurt}"
[37] "{margarine,rolls/buns}"
[38] "{rolls/buns,whipped/sour cream}"
[39] "{cream cheese,yogurt}"
[40] "{pip fruit,yogurt}"
[41] "{whipped/sour cream,yogurt}"
[42] "{baking powder}"
[43] "{beef,yogurt}"
[44] "{sausage,tropical fruit}"
[45] "{other vegetables,pip fruit}"
[46] "{tropical fruit,yogurt}"
[47] "{pastry,yogurt}"
[48] "{root vegetables,sausage}"
[49] "{other vegetables,yogurt}"
[50] "{other vegetables,rolls/buns,root vegetables}"
[51] "{pastry,tropical fruit}"
[52] "{other vegetables,whipped/sour cream}"
[53] "{fruit/vegetable juice,yogurt}"
Itemsets in Consequent (RHS)
[1] "{whole milk}" "{other vegetables}"
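The matrix method lays the rules out on a grid. Left-hand-side itemsets form one axis and right-hand-side itemsets the other, and the chosen measure (here lift) colors each cell. The printed lists are the labels of those two axes. The base-R sketch below shows the underlying grid for three of the confidence > 0.5 rules from the table sorted by confidence. arulesViz's own ordering and coloring are not reproduced.

```r
rules <- data.frame(
  lhs  = c("{butter,whipped/sour cream}", "{butter,yogurt}", "{citrus fruit,root vegetables}"),
  rhs  = c("{whole milk}", "{whole milk}", "{other vegetables}"),
  lift = c(2.583008, 2.500387, 3.029608)
)

# Rows = LHS itemsets, columns = RHS itemsets, cell = lift (NA where no rule exists)
lift_grid <- tapply(rules$lift, list(rules$lhs, rules$rhs), max)
lift_grid
```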
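The same matrix as a 3D bar chart. The older method = "matrix3D" is deprecated and prints "method 'matrix3D' is deprecated use method 'matrix' with engine '3d'". Use the replacement form instead:
plot(subrules, method = "matrix", engine = "3d", measure = "lift")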
plot(groceryrules, method = "grouped")
Graph (network) plot and parallel coordinates plot of the 50 rules with the highest lift.
subrules2 <- head(groceryrules, n = 50, by = "lift")
plot(subrules2, method = "graph")
plot(subrules2, method = "paracoord")
oneRule <- sample(groceryrules, 1)
inspect(oneRule)
lhs rhs support confidence lift count
[1] {newspapers} => {whole milk} 0.0273513 0.3426752 1.34111 269
plot(oneRule, method = "doubledecker", data = groceries)
Sorting grocery rules by lift.
inspect(sort(groceryrules, by = "lift")[1:5])
lhs rhs support confidence lift count
[1] {herbs} => {root vegetables} 0.007015760 0.4312500 3.956477 69
[2] {berries} => {whipped/sour cream} 0.009049314 0.2721713 3.796886 89
[3] {other vegetables,tropical fruit,whole milk} => {root vegetables} 0.007015760 0.4107143 3.768074 69
[4] {beef,other vegetables} => {root vegetables} 0.007930859 0.4020619 3.688692 78
[5] {other vegetables,tropical fruit} => {pip fruit} 0.009456024 0.2634561 3.482649 93
Finding subsets of rules containing any berry items.
berryrules <- subset(groceryrules, items %in% "berries")
inspect(berryrules)
lhs rhs support confidence lift count
[1] {berries} => {whipped/sour cream} 0.009049314 0.2721713 3.796886 89
[2] {berries} => {yogurt} 0.010574479 0.3180428 2.279848 104
[3] {berries} => {other vegetables} 0.010269446 0.3088685 1.596280 101
[4] {berries} => {whole milk} 0.011794611 0.3547401 1.388328 116
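In arules, items %in% "berries" keeps a rule if "berries" appears on either side with an exact match. lhs or rhs can be used in place of items to restrict the match to one side. The operators %ain% (all items) and %pin% (partial string match) are alternatives. The base-R sketch below mimics the items %in% logic on a hand-built list of rules taken from this walkthrough. It is not how arules stores rules internally.

```r
rules <- list(
  list(lhs = "berries",            rhs = "whipped/sour cream"),
  list(lhs = "berries",            rhs = "yogurt"),
  list(lhs = c("butter", "yogurt"), rhs = "whole milk"),
  list(lhs = "herbs",              rhs = "root vegetables")
)

# Keep a rule if "berries" is one of its items on either side (exact item match)
has_berries <- vapply(rules, function(r) "berries" %in% c(r$lhs, r$rhs), logical(1))
berry_rules <- rules[has_berries]

length(berry_rules)
```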
plot(berryrules, method = "graph")
plot(berryrules, method = "paracoord")
Warning message: number of rows of result is not a multiple of vector length (arg 2)
Writing the rules to a CSV file.
write(groceryrules, file = "groceryrules.csv",
sep = ",", quote = TRUE, row.names = FALSE)
Converting the rule set to a data frame.
groceryrules_df <- as(groceryrules, "data.frame")
groceryrules_df
str(groceryrules_df)
'data.frame': 463 obs. of 5 variables:
$ rules : Factor w/ 463 levels "{baking powder} => {other vegetables}",..: 340 302 207 206 208 341 402 21 139 140 ...
$ support : num 0.00691 0.0061 0.00702 0.00773 0.00773 ...
$ confidence: num 0.4 0.405 0.431 0.475 0.475 ...
$ lift : num 1.57 1.59 3.96 2.45 1.86 ...
$ count : num 68 60 69 76 76 69 70 67 63 88 ...
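In this data frame, each rule sits in a single rules column of the form "{lhs} => {rhs}". It is a factor here, as str() shows. The base-R sketch below splits that column into separate left- and right-hand-side columns, using two rows copied from the output above. It assumes " => " never occurs inside an item label, which holds for this dataset's labels.

```r
rules_df <- data.frame(
  rules   = c("{potted plants} => {whole milk}", "{herbs} => {root vegetables}"),
  support = c(0.006914082, 0.007015760)
)

# as.character() also handles the factor column produced by older R versions
parts <- strsplit(as.character(rules_df$rules), " => ", fixed = TRUE)
rules_df$lhs <- vapply(parts, `[`, character(1), 1)
rules_df$rhs <- vapply(parts, `[`, character(1), 2)

rules_df[, c("lhs", "rhs", "support")]
```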