Check needed for NCOL(dtm) <= # of topics #23
Since you're getting results with some datasets/metrics and not others, I suspect you may have NA, NaN, NULL, or other non-numeric values in your data that are causing this type of error. If you confirm the data aren't the issue, it would be helpful if you could post the traceback to pinpoint the error. Just a note: if memory serves correctly, the original author wrote this package as a grad school project. I took over as the maintainer while working towards my own graduate degree. I'm out of school now, so it's been a while since I've actively worked on the project (hence the delayed response), and there isn't any active development going on. If you're interested in contributing to the project, I'm happy to add you to the repo. Thanks!
Hello, and thank you for your reply. I believe the data are not the issue because (1) only "Arun2010" gives me the error while the other metrics return results, and (2) for some "topics" settings, "Arun2010" also returns a result normally. The following command gives me the error, but if I delete ", 80" from the "topics" option, it returns a result normally.
Anyway, here is the traceback() result:
Any help would be highly appreciated. Thank you.
What are the dimensions of your input dtm? My guess is that svd() is returning fewer singular values than the number of topics you requested, since it can only return as many singular values as the smaller dimension of the matrix. If that's the case, there should be a check to confirm that the number of topics specified in FindTopicsNumber() does not exceed the number of terms.
Yes, you are absolutely right. The column number is 71, and svd() outputs only 71 singular values, which causes the error. And yes again, that number check should be performed, and a more human-readable error message would be nice.
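The dimension mismatch described above can be reproduced in base R alone; this is a minimal illustration (using made-up random data, not the reporter's dtm) of why a 71-term vocabulary can never yield 80 singular values:

```r
# svd() returns only min(nrow, ncol) singular values.
# With 80 requested topics but a 71-term vocabulary, the
# 80 x 71 topic-word matrix yields just 71 singular values,
# so any comparison against a length-80 vector fails.
set.seed(1)
m <- matrix(rnorm(80 * 71), nrow = 80, ncol = 71)
length(svd(m)$d)  # 71, not 80
```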
Ok, glad we were able to identify the issue. I've tagged this as something that needs work. I question whether it ever makes sense to have more topics than terms. My suggestion would be for the check to throw an error if topics > terms, regardless of which algorithm is selected, unless someone can give a good example of why you'd want more topics than terms. The error should occur before actual processing begins -- it wouldn't be fun for your processing to run for a few days only to get an error at the end.
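A minimal sketch of the unconditional guard proposed above (hypothetical function name, not the actual ldatuning implementation), meant to run before any model fitting so a bad `topics` vector fails fast:

```r
# Hypothetical pre-flight check: reject any requested topic count
# that exceeds the number of terms (columns) in the dtm.
check_topics <- function(dtm, topics) {
  n_terms <- NCOL(dtm)
  if (any(topics > n_terms)) {
    stop("Number of topics (max ", max(topics),
         ") must not exceed number of terms (", n_terms, ").")
  }
  invisible(TRUE)
}
```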
Hmm, it may be possible that term A forms topic Alpha, term B forms topic Beta, and terms A and B together form topic Gamma. So 2 terms and 3 topics may be possible, I think. It would be fine to raise an error only when users specify "Arun2010".
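Following that suggestion, the guard could be made conditional on the metric: a hard stop only when "Arun2010" is requested with topics > terms, and just a warning otherwise. This is a hedged sketch with hypothetical names, not the fix that was actually merged:

```r
# Hypothetical metric-aware variant: Arun2010 cannot work with
# topics > terms (svd() yields too few singular values), so stop;
# for other metrics only warn, since more topics than terms is
# unusual but arguably legal.
check_topics_metrics <- function(dtm, topics, metrics) {
  n_terms <- NCOL(dtm)
  if (any(topics > n_terms)) {
    if ("Arun2010" %in% metrics) {
      stop("Arun2010 requires topics <= number of terms (",
           n_terms, ").")
    }
    warning("Some `topics` values exceed the number of terms (",
            n_terms, ").")
  }
  invisible(TRUE)
}
```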
Fix for #23 -- thanks @ko-ichi-h !
Hello,
Thank you for developing such useful software!
When I run FindTopicsNumber(), I get results normally for some data, but I get the following error for other data.
And here is the R script file that gave me the above error:
ldatuning_error.zip
If I exclude "Arun2010" from the "metrics" option, I get results normally without any errors.
My sessionInfo():
I also get the same error with R 3.x.
Best.