
Unable to reproduce results even after doing the pre-training step for each dataset. #8

Open
rajat-tech-002 opened this issue Feb 4, 2021 · 3 comments



rajat-tech-002 commented Feb 4, 2021

[image: screenshot of reproduced results]
I used the same code and did the pre-training for each dataset properly. I also ran the code 5 times and took the mean and std, as mentioned in the paper.
I am unable to reproduce the results for the Search Snippets and Biomedical datasets.
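(For anyone double-checking their aggregation: a minimal sketch of how ACC and NMI are typically computed and averaged over runs. The Hungarian-matching accuracy below is the standard definition used in short text clustering evaluation; the label arrays here are synthetic stand-ins, not the actual datasets or model outputs.)

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score

def clustering_accuracy(y_true, y_pred):
    """Clustering accuracy under the best one-to-one mapping of
    cluster labels to classes (Hungarian algorithm)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    d = max(int(y_pred.max()), int(y_true.max())) + 1
    w = np.zeros((d, d), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        w[p, t] += 1                     # count co-occurrences
    row, col = linear_sum_assignment(-w)  # negate to maximize matches
    return w[row, col].sum() / y_true.size

# Aggregate over several runs (5, as in the paper) and report mean ± std.
accs, nmis = [], []
for seed in range(5):
    rng = np.random.default_rng(seed)
    y_true = rng.integers(0, 3, size=100)                # stand-in labels
    y_pred = (y_true + (rng.random(100) < 0.1)) % 3      # noisy predictions
    accs.append(clustering_accuracy(y_true, y_pred))
    nmis.append(normalized_mutual_info_score(y_true, y_pred))

print(f"ACC {np.mean(accs):.4f} ± {np.std(accs):.4f}")
print(f"NMI {np.mean(nmis):.4f} ± {np.std(nmis):.4f}")
```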

@zkharryhhhh


Hello, thanks for sharing this. I have the same problem; the results I reproduced are similar to yours. For Search Snippets, acc is 69.71 and NMI is 54.23; for Biomedical, acc is 35.87 and NMI is 30.17. I have run the experiments many times, and the results for these two datasets change only slightly; they are all much worse than the results in the paper. Have you managed to solve the problem? Maybe it is the hyperparameter settings? Can you give some advice?

hadifar (Owner) commented Mar 12, 2021

Thanks for your interest in our paper.
As you may have already noticed, pre-training the autoencoder plays an important role in our approach. You can find the pre-trained model in the repo as well.
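(For readers trying to reproduce this: a minimal sketch of what autoencoder pre-training means here. This is a toy linear autoencoder trained with plain gradient descent on random stand-in features; the repo's actual architecture, data, and hyperparameters differ, so treat this only as an illustration of the pre-train-then-cluster pipeline.)

```python
import numpy as np

def pretrain_autoencoder(X, dim=16, epochs=200, lr=0.01, seed=0):
    """Pre-train a minimal linear autoencoder by minimizing the mean
    squared reconstruction error; returns the encoder weight matrix.
    X is an (n_samples, n_features) array of document features."""
    rng = np.random.default_rng(seed)
    n, f = X.shape
    W_enc = rng.normal(scale=0.1, size=(f, dim))
    W_dec = rng.normal(scale=0.1, size=(dim, f))
    for _ in range(epochs):
        H = X @ W_enc          # encode to low-dimensional space
        X_hat = H @ W_dec      # decode back to feature space
        err = X_hat - X        # reconstruction error
        # gradients of the MSE loss w.r.t. each weight matrix
        g_dec = H.T @ err / n
        g_enc = X.T @ (err @ W_dec.T) / n
        W_dec -= lr * g_dec
        W_enc -= lr * g_enc
    return W_enc

# Stand-in "document features"; in practice these would be text embeddings.
X = np.random.default_rng(1).normal(size=(200, 50))
W = pretrain_autoencoder(X)
Z = X @ W  # low-dimensional embeddings to feed into the clustering step
print(Z.shape)
```

The point of the pre-training step is that the clustering objective is then initialized from these learned embeddings rather than from random weights, which is why skipping it (or pre-training poorly) degrades the final ACC/NMI.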

@zkharryhhhh


Thanks very much, @hadifar, and I'm sorry for the late reply. Your advice about the pre-trained autoencoder is correct: pre-training the autoencoder is important. I used both your released model for Stackoverflow and my own pre-trained autoencoder for Stackoverflow, and both give nice results, just as in your paper. But my problem is reproducing the results on the other two datasets, Search Snippets and Biomedical. Your repo does not include pre-trained autoencoders for those two datasets, so I used your code and the data from xu2017 (https://github.com/jacoxu/STC2/tree/master/dataset), just as in your paper, to pre-train an autoencoder model myself. Using that pre-trained model, I still get the worse results I described above. Now I wonder whether the hyperparameters are not proper, or whether there is some other reason. Can you give me some advice on the results for these two datasets? If I have expressed anything unclearly, or the experiments need other settings, is there an email address at which I can contact you? My email address is zhangkai2020c@iscas.ac.cn. I am trying some ideas for short text clustering based on your work and look forward to communicating with you.
