CalledProcessError #1

Open
sadickam opened this issue Oct 30, 2022 · 1 comment
Comments

@sadickam
Hello Team,

Thank you for this repo and the Python package.

I am using the Python package for topic modelling on Twitter data and have set up my code based on your example on Medium, as follows:

from gdtm.models import TND
# Set these paths to the path where you saved the Mallet implementation of each model, plus bin/mallet
tnd_path = 'C:/Users/sadick/Downloads/topic-noise-models-source-main.zip/mallet-tnd/bin/mallet'

# We pass in the paths to the java code along with the data set and whatever parameters we want to set
model = TND(dataset=dataset, mallet_path=tnd_path, k=30, beta1=25, top_words=20)

topics = model.get_topics()
noise = model.get_noise_distribution()

When I run the code, I get the traceback below:

CalledProcessError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_20604/2788543094.py in <module>
5
6 # We pass in the paths to the java code along with the data set and whatever parameters we want to set
----> 7 model = TND(dataset=dataset, mallet_path=tnd_path, k=30, beta1=25, top_words=20)
8
9 topics = model.get_topics()
~\.conda\envs\general\lib\site-packages\gdtm\models\tnd.py in __init__(self, dataset, k, alpha, beta0, beta1, noise_words_max, iterations, top_words, topic_word_distribution, noise_distribution, corpus, dictionary, mallet_path, random_seed, run, workers)
76 self._prepare_data()
77 if self.noise_distribution is None:
---> 78 self._compute_tnd()
79
80 def _prepare_data(self):
~\.conda\envs\general\lib\site-packages\gdtm\models\tnd.py in _compute_tnd(self)
96
97 """
---> 98 model = TNDMallet(self.mallet_path, self.corpus, num_topics=self.k, id2word=self.dictionary,
99 workers=self.workers,
100 alpha=self.alpha, beta=self.beta0, skew=self.beta1,
~\.conda\envs\general\lib\site-packages\gdtm\wrappers\tnd.py in __init__(self, mallet_path, corpus, num_topics, alpha, beta, id2word, workers, prefix, optimize_interval, iterations, topic_threshold, random_seed, noise_words_max, skew, is_parent)
81 self.skew = skew
82 if corpus is not None and not is_parent:
---> 83 self.train(corpus)
84
85
~\.conda\envs\general\lib\site-packages\gdtm\wrappers\tnd.py in train(self, corpus)
104
105 """
--> 106 self.convert_input(corpus, infer=False)
107 cmd = self.mallet_path + ' train-topics --input %s --num-topics %s --alpha %s --optimize-interval %s '
108 '--num-threads %s --output-state %s --output-doc-topics %s --output-topic-keys %s '
~\.conda\envs\general\lib\site-packages\gdtm\wrappers\base_wrapper.py in convert_input(self, corpus, infer, serialize_corpus)
215 cmd = cmd % (self.fcorpustxt(), self.fcorpusmallet())
216 logger.info("converting temporary corpus to MALLET format with %s", cmd)
--> 217 check_output(args=cmd, shell=True)
218
219 def __getitem__(self, bow, iterations=100):
~\.conda\envs\general\lib\site-packages\gensim\utils.py in check_output(stdout, *popenargs, **kwargs)
1889 error = subprocess.CalledProcessError(retcode, cmd)
1890 error.output = output
-> 1891 raise error
1892 return output
1893 except KeyboardInterrupt:
CalledProcessError: Command 'C:/Users/sadick/Downloads/topic-noise-models-source-main.zip/mallet-tnd/bin/mallet import-file --preserve-case --keep-sequence --remove-stopwords --token-regex "\S+" --input C:\Users\sadick\AppData\Local\Temp\750b80_corpus.txt --output C:\Users\sadick\AppData\Local\Temp\750b80_corpus.mallet' returned non-zero exit status 1.
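
The traceback reports only the exit status; whatever the shell or Mallet wrote to stderr may not appear in the notebook. A minimal sketch for surfacing it is to re-run the failing command by hand with Python's standard `subprocess.run` and capture stderr. The helper name `run_and_show_error` is hypothetical and not part of gdtm or gensim.

```python
import subprocess


def run_and_show_error(cmd):
    """Run a shell command string and return (exit_code, stderr_text).

    Hypothetical helper for debugging only. It uses shell=True because the
    traceback shows the wrapper invoking the command that way.
    """
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return result.returncode, result.stderr


# Usage (paste the exact command string from the CalledProcessError message):
# code, err = run_and_show_error(r'<command from the error message>')
# print(code)
# print(err)
```

The stderr text usually states whether the launcher was not found, could not be executed, or failed inside Java.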

I would be grateful if you could take a look and provide some guidance on this issue.

Regards
Sadick

@rchurch4
Collaborator

Hi Sadick,

Thanks for trying out our topic models! I am not super familiar with Windows, but I do know that a CalledProcessError usually occurs when the environment is misconfigured. To run a Mallet-based model on Windows, I believe you need to point to the .bat file in bin. Check here for what I mean: https://stackoverflow.com/questions/55288724/gensim-mallet-calledprocesserror-returned-non-zero-exit-status
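
The .bat suggestion above can be sketched as follows. This is only an illustration: `mallet_executable` is a hypothetical helper, and it assumes the model's bin folder ships a `mallet.bat` next to the `mallet` script, as stock Mallet does.

```python
import os
import sys


def mallet_executable(bin_dir, platform=sys.platform):
    """Return the launcher path inside a Mallet bin directory.

    Picks 'mallet.bat' on Windows and the plain 'mallet' script elsewhere.
    The 'platform' argument exists only so the choice can be overridden.
    """
    name = "mallet.bat" if platform.startswith("win") else "mallet"
    return os.path.join(bin_dir, name)


# Usage (the directory below is a placeholder for the extracted mallet-tnd folder):
# tnd_path = mallet_executable("path/to/mallet-tnd/bin")
# if not os.path.isfile(tnd_path):
#     raise FileNotFoundError(tnd_path)
# model = TND(dataset=dataset, mallet_path=tnd_path, k=30, beta1=25, top_words=20)
```

The `os.path.isfile` check is worth keeping: the path in the traceback passes through a folder named `topic-noise-models-source-main.zip`, and if that is still a compressed archive rather than an extracted folder, the launcher will not be found there.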

Please let us know whether that works. Another common issue is file permissions when calling Mallet-based models from a Python script, which requires you to change the permissions on the Mallet source code wherever it lives on your computer. My bet is that your problem is the former.
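
For the permissions case, a minimal sketch on Unix-like systems is to add the execute bit to the launcher script, using the standard `os` and `stat` modules. The name `ensure_executable` is hypothetical, and Windows handles permissions differently, so this is not expected to help there.

```python
import os
import stat


def ensure_executable(path):
    """Add execute permission for owner, group, and others if it is missing.

    Returns True if the file is executable by the current user afterwards.
    """
    if not os.access(path, os.X_OK):
        mode = os.stat(path).st_mode
        os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return os.access(path, os.X_OK)


# Usage (placeholder path to the launcher script):
# ensure_executable("path/to/mallet-tnd/bin/mallet")
```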
