Excel was used to analyze the dataset, and the KNIME workflow below was created to study the frequent item sets and association rules.
With this analysis, I want to answer several questions to help understand the supermarket's sales performance.
1. Study of the variation of the number of frequent item sets (free, closed and maximal) varying the support.
2. Study of the variation of the number of association rules fixing the support.
3. Selection of one of the rules with the largest lift. Justify the value of the lift.
4. What is the conviction of the rule used in 3?
5. Are there differences among the frequent item sets at the beginning of the month and at the end of the month?
Therefore, with this understanding of the data, I tried to answer the questions below.
1. Study of the variation of the number of frequent item sets (free, closed and maximal) varying the support.
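As a minimal sketch of how such counts can be obtained outside KNIME, the Python snippet below enumerates item sets by brute force on a small hypothetical list of transactions (not the supermarket data) and counts the frequent, free, closed and maximal item sets for a given minimum support. The names `TRANSACTIONS`, `frequent_itemsets` and `count_by_type` are illustrative assumptions, and the exhaustive search is only practical for a handful of items.

```python
from itertools import combinations

# Hypothetical toy transactions, used only for illustration.
TRANSACTIONS = [
    {"beef", "root vegetables", "whole milk"},
    {"beef", "root vegetables"},
    {"pastry", "whole milk"},
    {"canned beer"},
    {"pastry", "canned beer", "whole milk"},
    {"beef", "whole milk"},
]


def frequent_itemsets(transactions, min_support):
    """Return {itemset: absolute count} for itemsets with support >= min_support."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        level = {}
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            count = sum(candidate <= t for t in transactions)
            if count / n >= min_support:
                level[candidate] = count
        if not level:
            break  # no frequent set of this size, so none of a larger size
        result.update(level)
    return result


def count_by_type(transactions, min_support):
    """Count frequent, free, closed and maximal itemsets (empty set excluded)."""
    n = len(transactions)
    freq = frequent_itemsets(transactions, min_support)
    # Free (generator): no proper subset, including the empty set, has the same count.
    free = [s for s in freq
            if freq[s] < n and not any(t < s and freq[t] == freq[s] for t in freq)]
    # Closed: no proper superset has the same count.
    closed = [s for s in freq
              if not any(s < t and freq[t] == freq[s] for t in freq)]
    # Maximal: no proper superset is frequent at all.
    maximal = [s for s in freq if not any(s < t for t in freq)]
    return {"frequent": len(freq), "free": len(free),
            "closed": len(closed), "maximal": len(maximal)}


for support in (0.5, 0.3):
    print(support, count_by_type(TRANSACTIONS, support))
```

On this toy data, lowering the support from 0.5 to 0.3 raises the number of frequent item sets from 2 to 8, while only 4 of them are maximal, which illustrates why the three families are counted separately.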
2. Study of the variation of the number of association rules fixing the support.
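One way to read this question is to keep the minimum support fixed and vary the minimum confidence; that reading is an assumption, since the text does not say which parameter was varied. Under that assumption, the sketch below counts the rules generated from the frequent item sets of a small hypothetical dataset. The helper names `support_counts` and `count_rules` are illustrative and do not come from the KNIME workflow.

```python
from itertools import combinations

# Hypothetical toy transactions, used only for illustration.
TRANSACTIONS = [
    {"beef", "root vegetables", "whole milk"},
    {"beef", "root vegetables"},
    {"pastry", "whole milk"},
    {"canned beer"},
    {"pastry", "canned beer", "whole milk"},
    {"beef", "whole milk"},
]


def support_counts(transactions, min_support):
    """Return {itemset: absolute count} for every frequent itemset (brute force)."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    counts = {}
    for size in range(1, len(items) + 1):
        level = {}
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            count = sum(candidate <= t for t in transactions)
            if count / n >= min_support:
                level[candidate] = count
        if not level:
            break  # larger itemsets cannot be frequent either
        counts.update(level)
    return counts


def count_rules(transactions, min_support, min_confidence):
    """Count rules X -> Y with sup(X u Y) >= min_support and conf >= min_confidence."""
    counts = support_counts(transactions, min_support)
    total = 0
    for itemset, count in counts.items():
        # Every non-empty proper subset of a frequent itemset is a candidate antecedent.
        for k in range(1, len(itemset)):
            for antecedent in combinations(itemset, k):
                confidence = count / counts[frozenset(antecedent)]
                if confidence >= min_confidence:
                    total += 1
    return total


for min_conf in (0.5, 0.6, 0.9):
    print(min_conf, count_rules(TRANSACTIONS, 0.3, min_conf))
```

With the support fixed at 0.3, the toy data yields 6 rules at confidence 0.5, 4 at 0.6 and 2 at 0.9, so the rule count can only shrink as the confidence threshold rises.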
3. Selection of one of the rules with the largest lift. Justify the value of the lift. Considering the figure below, I chose the rule (beef -> root vegetables).
4. What is the conviction of the rule used in 3? The conviction is calculated with the formula conv(X -> Y) = (1 - sup(Y)) / (1 - conf(X -> Y)).
In this example, sup(root vegetables) is 0.1089 and conf(beef -> root vegetables) is 0.331.
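The arithmetic behind questions 3 and 4 can be checked with a short sketch. It uses the standard definitions lift(X -> Y) = conf(X -> Y) / sup(Y) and conv(X -> Y) = (1 - sup(Y)) / (1 - conf(X -> Y)), together with the two values quoted above; the function and variable names are illustrative assumptions.

```python
def lift(confidence, consequent_support):
    """Lift of X -> Y: how much more likely Y is given X than overall."""
    return confidence / consequent_support


def conviction(confidence, consequent_support):
    """Conviction of X -> Y; infinite when the rule never fails."""
    if confidence >= 1.0:
        return float("inf")
    return (1.0 - consequent_support) / (1.0 - confidence)


# Values quoted in the text for the rule beef -> root vegetables.
sup_root_vegetables = 0.1089
conf_beef_to_root = 0.331

rule_lift = lift(conf_beef_to_root, sup_root_vegetables)
rule_conviction = conviction(conf_beef_to_root, sup_root_vegetables)

print(round(rule_lift, 3))        # 3.039
print(round(rule_conviction, 3))  # 1.332
```

A lift of about 3.04 means that root vegetables appear roughly three times more often in baskets containing beef than in baskets overall. A conviction of about 1.33 means that the rule would fail about 1.33 times as often if beef and root vegetables were independent.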
5. Are there differences among the frequent item sets at the beginning of the month and at the end of the month?
Small basket – a basket with 7 items or fewer
Large basket – all other baskets
Event – assuming that the increase in large baskets is related to some event, we analyze these days separately
End of month – the days when there is an increase in the number of baskets, subset day 21 to day 28
Beginning of month – we have two branches:
1) to maintain consistency with the weekly cycle at the end of the month, subset day 05 to day 12
2) ignoring the weekly cycle, subset day 01 to day 08
Note: for this we used a support equal to 0.01.
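The basket-size and day-of-month subsets described above could be built along the following lines. This is a sketch under assumptions: each basket is represented as a hypothetical `(day_of_month, items)` pair, the day ranges are treated as inclusive, and the event days are left out because the text does not say which days they are.

```python
# Threshold from the text: a small basket has 7 items or fewer.
SMALL_BASKET_MAX_ITEMS = 7

# Day-of-month windows from the text, assumed to be inclusive on both ends.
PERIODS = {
    "end_of_month": (21, 28),
    "beginning_weekly_aligned": (5, 12),  # keeps the same weekly cycle as days 21-28
    "beginning_calendar": (1, 8),         # ignores the weekly cycle
}


def basket_size_label(items):
    """Label a basket as 'small' or 'large' by its number of items."""
    return "small" if len(items) <= SMALL_BASKET_MAX_ITEMS else "large"


def period_subset(baskets, period):
    """Keep the (day, items) baskets whose day falls inside the named window."""
    first_day, last_day = PERIODS[period]
    return [b for b in baskets if first_day <= b[0] <= last_day]


# Hypothetical baskets, used only to exercise the helpers.
baskets = [
    (2, ["pastry", "canned beer"]),
    (6, ["newspaper"]),
    (10, ["item%d" % i for i in range(8)]),
    (23, ["pastry"]),
    (30, ["whole milk"]),
]

for name in PERIODS:
    subset = period_subset(baskets, name)
    print(name, [(day, basket_size_label(items)) for day, items in subset])
```

Each subset can then be mined separately with the same minimum support (0.01 in this analysis), so the resulting item sets are comparable across periods.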
Considering the figure below, which shows the top 15 most frequent item sets, my conclusions are:
This is expected, since small baskets represent more than 80% of sales.
[Pastry] has a strong presence in the small baskets, which is reflected in the beginning-of-month and end-of-month subsets.
[Canned beer] has a strong presence in the small baskets, with more influence at the beginning of the month.
[newspaper] is present in the small baskets; however, its importance is diluted in the total of transactions, and it is only relevant in the event and end-of-month subsets.
© 2024 Victor Malheiro