We provide (1) datasets and source code for (2) the benchmark task, (3) the downstream tasks, and (4) the ablation studies of WHATsNET in the paper: **Classification of Edge-Dependent Labels of Nodes in Hypergraphs**, Minyoung Choe, Sunwoo Kim, Jaemin Yoo, and Kijung Shin, KDD 2023.
# (1) Datasets
We provide six real-world datasets for our new benchmark task (`dataset/`) and the preprocessing code (`dataset/PreprocessCode/`):
- Co-authorship: DBLP and AMinerAuthor
- Email: Enron and Eu
- StackOverflow: Biology and Physics
## File Organization
```
|__ hypergraph.txt             # hypergraph structure; the i-th line lists the nodes v_1, v_2, ... of the i-th hyperedge
|__ hypergraph_pos.txt         # edge-dependent node labels; the i-th line lists the labels of v_1, v_2, ... within the i-th hyperedge (same node order as hypergraph.txt)
|__ [valid/test]_hindex_0.txt  # indices used for the train/valid/test split
```
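For concreteness, here is a minimal sketch of parsing these files; the delimiter is an assumption, so see `dataset/PreprocessCode/` for the exact per-dataset format:

```python
# Minimal sketch of parsing hypergraph.txt / hypergraph_pos.txt.
# ASSUMPTION: one hyperedge per line, entries separated by a delimiter
# such as "\t" or ","; check dataset/PreprocessCode/ for the exact format.

def read_lines(path, delimiter="\t"):
    with open(path) as f:
        return [line.strip().split(delimiter) for line in f if line.strip()]

# hyperedges[i] = [v_1, v_2, ...], the nodes of the i-th hyperedge
hyperedges = [[int(v) for v in row] for row in read_lines("hypergraph.txt")]

# labels[i][j] = edge-dependent label of hyperedges[i][j]
# (same node order as hypergraph.txt)
labels = [[int(x) for x in row] for row in read_lines("hypergraph_pos.txt")]

# The two files are aligned line by line and node by node.
assert all(len(e) == len(l) for e, l in zip(hyperedges, labels))
```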
# (2) Benchmark Task
We provide the source code for running WHATsNET as well as nine competitors on all of the above benchmark datasets:
- BaselineU and BaselineP
- HNHN, HGNN, HCHA, HAT, UniGCNII, HNN
- HST, AST
- WHATsNET
# (3) Downstream Task
We apply our benchmark task to the following downstream tasks:
- Ranking Aggregation: https://github.com/uthsavc/hypergraph-halo-ranking
- Clustering: https://github.com/pnnl/HyperNetX/blob/master/tutorials/Tutorial%2011%20-%20Laplacians%20and%20Clustering.ipynb
- Product Return Prediction: https://github.com/jianboli/HyperGo
# (4) Reproducing All Results in the Paper
- Ablation Studies of WHATsNET
- w/o WithinATT and WithinOrderPE
- WHATsNET-IM
- Positional encoding schemes
- Replacing WithinATT in updating node embeddings
- Number of inducing points
- Types of node centralities
- Visualization of WHATsNET
- Evaluation on Node Label Distribution Preservation of WHATsNET
Before training WHATsNET, node centralities must be computed:
```
cd preprocess
python nodecentrality.py --algo [degree,kcore,pagerank,eigenvec] --dataname [name of dataset]
```
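For intuition, the sketch below computes the four centralities with `networkx` on the clique expansion of the hypergraph; the choice of expansion is an assumption, and `nodecentrality.py` remains the authoritative implementation:

```python
# Sketch of node-centrality computation on the clique expansion.
# ASSUMPTION: centralities are taken on a graph where every pair of nodes
# sharing a hyperedge is connected; the actual script may differ.
import itertools
import networkx as nx

def clique_expansion(hyperedges):
    """Connect every pair of nodes that co-occur in a hyperedge."""
    g = nx.Graph()
    for edge in hyperedges:
        g.add_nodes_from(edge)
        g.add_edges_from(itertools.combinations(edge, 2))
    return g

hyperedges = [[0, 1, 2], [1, 2, 3], [3, 4]]  # toy example
g = clique_expansion(hyperedges)
centralities = {
    "degree": dict(g.degree()),
    "kcore": nx.core_number(g),
    "pagerank": nx.pagerank(g),
    "eigenvec": nx.eigenvector_centrality(g, max_iter=1000),
}
```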
You can
- train WHATsNET,
- evaluate WHATsNET on the JSD of node-level label distributions (sketched after the command below),
- predict edge-dependent node labels with a trained WHATsNET, and
- analyze node embeddings for visualization (concatenated embeddings of a node-hyperedge pair, and node embeddings before/after WithinATT)

by running the commands below:
```
python [train.py/evaluate.py/predict.py/analysis.py] \
    --vorder_input "degree_nodecentrality,eigenvec_nodecentrality,pagerank_nodecentrality,kcore_nodecentrality" \
    --embedder whatsnet --att_type_v OrderPE --agg_type_v PrevQ --att_type_e OrderPE --agg_type_e PrevQ \
    --dataset_name [name of dataset] \
    --num_att_layer [number of layers in WithinATT] \
    --num_layers [number of layers] \
    --bs [batch size] \
    --lr [learning rate] \
    --sampling [number of incident hyperedges sampled in aggregation at nodes] \
    [--analyze_att when running analysis.py] \
    --scorer sm --scorer_num_layers 1 --dropout 0.7 --optimizer "adam" --k 0 --gamma 0.99 --dim_hidden 64 --dim_edge 128 --dim_vertex 128 --epochs 100 --test_epoch 5
```
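To make the JSD evaluation concrete, here is a minimal sketch of node-level label distribution preservation: for each node, the normalized histogram of its edge-dependent labels is compared between ground truth and prediction. The uniform weighting and helper names are assumptions; `evaluate.py` is authoritative:

```python
# Sketch of node-level label-distribution preservation via Jensen-Shannon
# divergence. ASSUMPTION: each node's label distribution is the normalized
# histogram of its edge-dependent labels; evaluate.py may weight differently.
import numpy as np
from scipy.spatial.distance import jensenshannon

def label_dist(node_labels, num_classes):
    """Normalized histogram of a node's edge-dependent labels."""
    hist = np.bincount(node_labels, minlength=num_classes).astype(float)
    return hist / hist.sum()

def node_level_jsd(true_labels, pred_labels, num_classes):
    """Mean JSD between true and predicted per-node label distributions.
    true_labels / pred_labels: {node: [label per incident hyperedge]}"""
    jsds = []
    for v in true_labels:
        p = label_dist(true_labels[v], num_classes)
        q = label_dist(pred_labels[v], num_classes)
        # scipy returns the JS *distance* (sqrt of the divergence)
        jsds.append(jensenshannon(p, q, base=2) ** 2)
    return float(np.mean(jsds))

# Toy usage: node 0 appears in three hyperedges.
true = {0: [0, 1, 1]}
pred = {0: [0, 0, 1]}
print(node_level_jsd(true, pred, num_classes=2))
```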
You can run all ten models on each dataset (DBLP, AMinerAuthor, emailEnron, emailEu, StackOverflowBiology, StackOverflowPhysics) by:
```
cd run
./run_[DBLP,AMinerAuthor,emailEnron,emailEu,StackOverflowBiology,StackOverflowPhysics].sh
```
The hyperparameters of each model are those that achieve the best mean of Micro-F1 and Macro-F1 over the search space.
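That selection criterion amounts to the score sketched below with scikit-learn (the surrounding validation loop is omitted, and the helper name is hypothetical):

```python
# Sketch of the model-selection criterion: mean of Micro-F1 and Macro-F1.
from sklearn.metrics import f1_score

def selection_score(y_true, y_pred):
    """Mean of Micro-F1 and Macro-F1, used to pick hyperparameters."""
    micro = f1_score(y_true, y_pred, average="micro")
    macro = f1_score(y_true, y_pred, average="macro")
    return (micro + macro) / 2.0

# e.g., keep the configuration whose validation predictions maximize this:
# best_cfg = max(configs, key=lambda c: selection_score(y_val, predict(c)))
```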
We provide the edge-dependent node labels predicted by WHATsNET, as well as by AST and HST, in `train_results/`.
We also provide shell scripts for the all-in-one process (train, predict, and evaluate on the downstream task) in `run/DownstreamTask/`.
You can run the three downstream tasks with WHATsNET and the baselines as follows:
- Ranking Aggregation: In the `RankingAggregation` directory, run `ranking.py` for the Halo2 game dataset and `aminer_ranking.py` for the AMiner dataset with author H-index.
- Clustering: In the `Clustering` directory, run `clustering.py` for DBLP and `clustering_aminer.py` for AMiner.
- Product Return Prediction: In the `ProductReturnPred` directory, generate the synthetic dataset with `makedata/Simulate data.ipynb` and prepare the dataset for training models on our benchmark task with `makedata/MakeHypergraph.ipynb`. After training the models, run `makedata/prepare_predicted.py` and evaluate them with `script/main_prod.py`.
You can also run all ablation studies of WHATsNET by:
```
cd run
./run_ablation.sh
./run_ablation_centrality.sh
```
The environment for running the code is specified in `requirements.txt`. Additionally, install the required libraries by following `install.sh`.