I used supermarket receipts to analyze associations (support, confidence, lift, etc.) between consumer goods. Instead of the Apriori algorithm, the traditional solution, I used a simpler method (originally proposed by my friend Tianye Song) that generates the same results in less execution time. Because it avoids loops, it took only 0.04 seconds to analyze 9995 receipts.
As Usman Malik wrote, "Association rule mining is a technique to identify underlying relations between different items. Take an example of a Super Market where customers can buy variety of items. Usually, there is a pattern in what the customers buy. For instance, mothers with babies buy baby products such as milk and diapers. Damsels may buy makeup items whereas bachelors may buy beers and chips etc. In short, transactions involve a pattern. More profit can be generated if the relationship between the items purchased in different transactions can be identified."
Sample - SuperStore.xls is my raw data. It contains 9995 receipts; each has columns such as Row ID, Order ID, Order Date, Ship Date, Ship Mode, Customer ID, Customer Name, Segment, Country, City, State, Postal Code, Region, Product ID, Category, Sub-Category, Product Name, Sales, Quantity, Discount, and Profit.
The code is commented in detail, so each step should be easy to understand. As you can see, this optimized method does not contain any loops (my first version had 3 loops and 2 ifs), so it runs much faster.
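To illustrate how pairwise association metrics can be computed without explicit loops, here is a minimal sketch using pandas. It is not the original code: the function name `pair_rules`, the self-merge approach, and the choice of Order ID and Product ID as the transaction and item columns are assumptions made for illustration, and the tiny data frame is made up.

```python
import pandas as pd

def pair_rules(df, order_col="Order ID", item_col="Product ID"):
    """Hypothetical helper: loop-free support/confidence/lift for item pairs."""
    # Keep one row per (order, item) so repeated items in an order count once.
    baskets = df[[order_col, item_col]].drop_duplicates()
    n_transactions = baskets[order_col].nunique()

    # Number of orders containing each item.
    item_counts = baskets.groupby(item_col).size()

    # Self-join on the order column yields every ordered pair bought together.
    pairs = baskets.merge(baskets, on=order_col, suffixes=("_x", "_y"))
    x_col, y_col = item_col + "_x", item_col + "_y"
    pairs = pairs[pairs[x_col] != pairs[y_col]]

    # Count co-occurrences per (X, Y) pair.
    rules = (
        pairs.groupby([x_col, y_col]).size()
        .reset_index(name="n_XY")
        .rename(columns={x_col: "XID", y_col: "YID"})
    )
    rules["n_X"] = rules["XID"].map(item_counts)
    rules["n_Y"] = rules["YID"].map(item_counts)
    rules["n_Transactions"] = n_transactions
    rules["Support"] = rules["n_XY"] / n_transactions
    rules["Confidence"] = rules["n_XY"] / rules["n_X"]
    rules["Lift"] = rules["Confidence"] / (rules["n_Y"] / n_transactions)
    return rules

# Made-up example: four orders, three products.
demo = pd.DataFrame({
    "Order ID":   ["O1", "O1", "O2", "O2", "O3", "O3", "O4"],
    "Product ID": ["A",  "B",  "A",  "B",  "A",  "C",  "B"],
})
rules = pair_rules(demo)
print(rules)
```

The self-merge trades memory for speed: it materializes every within-order pair at once, which is fine for small baskets but could grow large if single orders contain many items.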
Demo results can be seen in my code file. The file includes XID, YID, n_XY, n_X, n_Y, n_Transactions, Support, Confidence, and Lift. Click here to download the complete Excel file.
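For reference, the three metrics can be written in terms of the count columns. This assumes the file follows the standard definitions for a rule X ⇒ Y, where n_XY is the number of transactions containing both items:

```latex
\begin{align*}
\text{Support}(X \Rightarrow Y) &= \frac{n_{XY}}{n_{\text{Transactions}}} \\
\text{Confidence}(X \Rightarrow Y) &= \frac{n_{XY}}{n_X} \\
\text{Lift}(X \Rightarrow Y) &= \frac{\text{Confidence}(X \Rightarrow Y)}{n_Y / n_{\text{Transactions}}}
  = \frac{n_{XY}\, n_{\text{Transactions}}}{n_X\, n_Y}
\end{align*}
```

A lift above 1 suggests X and Y appear together more often than they would if purchased independently; a lift below 1 suggests the opposite.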
If you would like to know more about association rule mining, here are some suggested sources.