
CC1 and CC2 Benchmarking #2

Open
AtreusCorp opened this issue Feb 14, 2021 · 5 comments
Labels
question (Further information is requested)

Comments

@AtreusCorp

AtreusCorp commented Feb 14, 2021

Hi there. I've read the associated paper and am very interested in the methodology. In particular, for a course project, I would like to see if I can improve on your results (even in a small way), but I am having trouble seeing how to run the given code on a test set. My questions are as follows:

  1. Does the s2 dataset (for which processing instructions are listed in the README) correspond to the CC2 set from the paper?
  2. Does your code have a quick way of reproducing your test metrics on the imputed data? I would be very happy to implement something like this.
  3. Is there a straightforward path to partitioning the given dataset for train / test purposes? Or perhaps managing 2 datasets, one for train and one for test?

Any clarity here would be very much appreciated. Thanks for the neat paper!

@stefaniaebli
Owner

Hi, thank you for your interest and kind words. We would be very interested in seeing our results improved. I will answer your questions below.

  1. The s2 dataset is very large. The data we work with, CC1 and CC2, are two smaller datasets subsampled from s2 (the appendix of the paper describes how they were subsampled). In particular, CC1 and CC2 are coauthorship complexes, and you can use the script s2_4_bipartite_to_downsampled.py to obtain further coauthorship complexes with the same procedure.

  2. I can add the code we used for reproducing our test metric.

  3. The straightforward way to do it is, as you said, to maintain two datasets. For instance, we used CC1 as the training set and CC2 as the test set. With the script mentioned in point 1, you can subsample many train and test datasets from s2 for this purpose; a rough sketch of that workflow follows below.
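To make the idea concrete, here is a minimal sketch of the two-dataset workflow. It is not our actual code: the file names, the loader and the data layout below are placeholders, and the imputed values stand in for whatever the trained model predicts.

```python
import numpy as np

# Placeholder loader: assumes each downsampled complex is stored as an .npz
# file with the true citation values and a boolean mask of the entries that
# were artificially hidden. The actual repository format may differ.
def load_complex(path):
    data = np.load(path)
    return data["citations"], data["missing_mask"]

train_values, train_mask = load_complex("cc1.npz")  # fit the model on this
test_values, test_mask = load_complex("cc2.npz")    # held out for testing

# `imputed` stands for the trained model's predictions on the test complex;
# here it is only a placeholder array of the right shape.
imputed = np.zeros_like(test_values)

# Compare imputed and true values only where the ground truth was hidden.
mae = np.abs(imputed[test_mask] - test_values[test_mask]).mean()
print(f"mean absolute error on hidden test entries: {mae:.3f}")
```

The real pipeline goes through the preprocessing scripts in this repository; the sketch only shows where CC1 and CC2 would enter as train and test sets.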

Let me know if you have any further questions, I will be more than happy to discuss them.

@mdeff added the question (Further information is requested) label on Mar 25, 2021
@cxw-droid

Hi,

Thanks for the interesting paper. Would you mind posting your testing code so that the paper's results can be reproduced? I looked at impute_citations.py, and it seems it only trains a model but does not test accuracy, etc.

Thanks.
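In the meantime, something along these lines is what I had in mind; the within-tolerance accuracy below is only my guess at a reasonable metric, not necessarily the one reported in the paper.

```python
import numpy as np

def imputation_accuracy(y_true, y_pred, tol=0.05):
    # Fraction of imputed citation counts that land within a relative
    # tolerance of the true counts (the tolerance is an assumption).
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rel_err = np.abs(y_pred - y_true) / np.maximum(np.abs(y_true), 1.0)
    return float((rel_err <= tol).mean())

# Example: three of the four predictions are within 5% of the truth.
print(imputation_accuracy([10, 100, 50, 7], [10, 103, 49, 12]))  # 0.75
```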

@AtreusCorp
Author

@cxw-droid You might find my fork useful. I have implemented some of this, albeit with poor documentation.
https://github.com/AtreusCorp/simplicial_neural_networks

@mdeff
Collaborator

mdeff commented Dec 10, 2021

> I can add the code we used for reproducing our test metric.

Could we do that @stefaniaebli? It doesn't matter if it's ugly. Anything is better than nothing. :)

@stefaniaebli
Owner

Hi, sorry for the late answer! @mdeff, sure!
