Created on 2019-08-20

Project description

Implement the modified Newman-Girvan modularity algorithm from scratch and apply it to transactional data to perform clustering.

Dataset Information

The transactional dataset contains the transactions of a retail company over a given period.

  • InvoiceNo: Invoice number. Nominal, a 6-digit integer uniquely assigned to each transaction. If this code starts with the letter 'c', it indicates a cancellation.

  • StockCode: Product (item) code. Nominal, a 5-digit integer uniquely assigned to each distinct product.

  • Quantity: The quantity of each product (item) per transaction. Numeric.
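As a rough illustration of the columns above, the sketch below reads rows from a CSV source and skips cancellations. It assumes a header row with exactly the names InvoiceNo, StockCode and Quantity, and it assumes cancelled invoices should be dropped before clustering. Neither assumption is stated in this README, and `load_transactions` is a hypothetical helper name.

```python
import csv
import io


def load_transactions(csv_file):
    """Return (InvoiceNo, StockCode, Quantity) tuples, skipping cancellations.

    Assumption: a cancellation is any InvoiceNo starting with 'c' or 'C'.
    """
    rows = []
    for row in csv.DictReader(csv_file):
        if row["InvoiceNo"].lower().startswith("c"):
            continue  # cancelled transaction, excluded in this sketch
        rows.append((row["InvoiceNo"], row["StockCode"], int(row["Quantity"])))
    return rows


# Made-up sample rows, for illustration only.
sample = io.StringIO(
    "InvoiceNo,StockCode,Quantity\n"
    "536365,85123,6\n"
    "c536379,22752,-1\n"
    "536366,71053,2\n"
)
rows = load_transactions(sample)
```

Codes are kept as strings so that the 'c' prefix and any leading zeros survive loading.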

Graph mining techniques

The transactional data is loaded and treated as a bipartite graph with pre-defined sources and targets. The bipartite graph (B) is then transformed into a weighted undirected graph (G), which is analysed using the modified Newman-Girvan modularity.
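One possible way to turn B into G is a co-occurrence projection onto the item side. The sketch below assumes that invoices are the sources and items are the targets, and that the weight of an edge between two items is the number of invoices containing both. This README does not state the weighting scheme, so treat both as assumptions; `project_to_item_graph` is a hypothetical name.

```python
from collections import defaultdict
from itertools import combinations


def project_to_item_graph(transactions):
    """Project bipartite (invoice, item) edges onto a weighted item graph.

    transactions: iterable of (invoice_no, stock_code) pairs (edges of B).
    Returns {(item_a, item_b): weight} with item_a < item_b, where weight
    counts the invoices in which both items appear (assumed weighting).
    """
    items_by_invoice = defaultdict(set)
    for invoice_no, stock_code in transactions:
        items_by_invoice[invoice_no].add(stock_code)

    weights = defaultdict(int)
    for items in items_by_invoice.values():
        # Every pair of items sharing an invoice gets one unit of weight.
        for item_a, item_b in combinations(sorted(items), 2):
            weights[(item_a, item_b)] += 1
    return dict(weights)
```

Storing each undirected edge once, under a sorted key, keeps the graph symmetric without duplicating weights.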

The communities detected after the (n) splits that maximise modularity are taken as the optimal clusters of items (StockCode) and are compared with the results of traditional clustering techniques.
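To make the modularity score concrete, the sketch below computes the standard weighted Newman-Girvan modularity of a given partition: for each community, the fraction of edge weight inside it minus the squared fraction of total weighted degree attached to it. This is the textbook definition, not necessarily the modified variant used in this project, and the function name and input layout are assumptions.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Standard weighted modularity Q of a partition.

    edges: {(u, v): weight}, each undirected edge listed once.
    community_of: {node: community label} covering every node in edges.
    """
    total_weight = sum(edges.values())  # m, total edge weight
    if total_weight == 0:
        return 0.0

    internal = defaultdict(float)    # edge weight inside each community
    degree_sum = defaultdict(float)  # summed weighted degree per community
    for (u, v), weight in edges.items():
        degree_sum[community_of[u]] += weight
        degree_sum[community_of[v]] += weight
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += weight

    return sum(
        internal[c] / total_weight - (degree_sum[c] / (2 * total_weight)) ** 2
        for c in degree_sum
    )
```

A splitting procedure would evaluate this score after each candidate split and keep the partition with the highest value.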