
Unable to reproduce results even after doing the pre-training step for each dataset. #8

Open
rajat-tech-002 opened this issue Feb 4, 2021 · 3 comments



rajat-tech-002 commented Feb 4, 2021

[image: screenshot of reproduced results]
I used the same code and did the pre-training for each dataset properly. I also ran the code 5 times and took the mean and std, as mentioned in the paper.
I am unable to reproduce the results for the Search Snippets and Biomedical datasets.
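(For anyone double-checking their aggregation: a minimal sketch of how ACC and NMI are typically computed and averaged over runs. The Hungarian-matching accuracy below is the standard definition used in short text clustering evaluation; the label arrays here are synthetic stand-ins, not the actual datasets or model outputs.)

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score

def clustering_accuracy(y_true, y_pred):
    """Clustering accuracy under the best one-to-one mapping of
    cluster labels to classes (Hungarian algorithm)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    d = max(int(y_pred.max()), int(y_true.max())) + 1
    w = np.zeros((d, d), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        w[p, t] += 1                     # count co-occurrences
    row, col = linear_sum_assignment(-w)  # negate to maximize matches
    return w[row, col].sum() / y_true.size

# Aggregate over several runs (5, as in the paper) and report mean ± std.
accs, nmis = [], []
for seed in range(5):
    rng = np.random.default_rng(seed)
    y_true = rng.integers(0, 3, size=100)                # stand-in labels
    y_pred = (y_true + (rng.random(100) < 0.1)) % 3      # noisy predictions
    accs.append(clustering_accuracy(y_true, y_pred))
    nmis.append(normalized_mutual_info_score(y_true, y_pred))

print(f"ACC {np.mean(accs):.4f} ± {np.std(accs):.4f}")
print(f"NMI {np.mean(nmis):.4f} ± {np.std(nmis):.4f}")
```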

@zkharryhhhh


Hello, thanks for sharing this. I have the same problem; the results I reproduced are similar to yours. For Search Snippets, acc is 69.71 and NMI is 54.23; for Biomedical, acc is 35.87 and NMI is 30.17. I have run the experiments many times, and the results for these two datasets change only slightly; they are all much worse than the results in the paper. Have you managed to solve the problem? Maybe it is the hyperparameter settings? Can you give some advice?

hadifar (Owner) commented Mar 12, 2021

Thanks for your interest in our paper.
As you may have already noticed, pre-training the autoencoder plays an important role in our approach. You can find the pre-trained model in the repo as well.
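(For readers trying to reproduce this: a minimal sketch of what autoencoder pre-training means here. This is a toy linear autoencoder trained with plain gradient descent on random stand-in features; the repo's actual architecture, data, and hyperparameters differ, so treat this only as an illustration of the pre-train-then-cluster pipeline.)

```python
import numpy as np

def pretrain_autoencoder(X, dim=16, epochs=200, lr=0.01, seed=0):
    """Pre-train a minimal linear autoencoder by minimizing the mean
    squared reconstruction error; returns the encoder weight matrix.
    X is an (n_samples, n_features) array of document features."""
    rng = np.random.default_rng(seed)
    n, f = X.shape
    W_enc = rng.normal(scale=0.1, size=(f, dim))
    W_dec = rng.normal(scale=0.1, size=(dim, f))
    for _ in range(epochs):
        H = X @ W_enc          # encode to low-dimensional space
        X_hat = H @ W_dec      # decode back to feature space
        err = X_hat - X        # reconstruction error
        # gradients of the MSE loss w.r.t. each weight matrix
        g_dec = H.T @ err / n
        g_enc = X.T @ (err @ W_dec.T) / n
        W_dec -= lr * g_dec
        W_enc -= lr * g_enc
    return W_enc

# Stand-in "document features"; in practice these would be text embeddings.
X = np.random.default_rng(1).normal(size=(200, 50))
W = pretrain_autoencoder(X)
Z = X @ W  # low-dimensional embeddings to feed into the clustering step
print(Z.shape)
```

The point of the pre-training step is that the clustering objective is then initialized from these learned embeddings rather than from random weights, which is why skipping it (or pre-training poorly) degrades the final ACC/NMI.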

@zkharryhhhh


Thanks very much, @hadifar, and I'm sorry for the late reply. Your advice about the pre-trained autoencoder is correct: pre-training the autoencoder is important. I used both your released model for Stackoverflow and my own pre-trained autoencoder for Stackoverflow, and both give nice results, just as in your paper. But my problem is reproducing the results on the other two datasets, Search Snippets and Biomedical. Your repo does not include pre-trained autoencoders for those two datasets, so I used your code and the data from xu2017 (https://github.com/jacoxu/STC2/tree/master/dataset), just as in your paper, to pre-train an autoencoder model myself. Using that pre-trained model, I still get the worse results I described above. Now I wonder whether the hyperparameters are not proper, or whether there is some other reason. Can you give me some advice on the results for these two datasets? If I have expressed anything unclearly, or the experiments need other settings, is there an email address at which I can contact you? My email address is zhangkai2020c@iscas.ac.cn. I am trying some ideas for short text clustering based on your work and look forward to communicating with you.
