Deveaud 2014 correctly implemented? #27

Open
internaut opened this issue Mar 18, 2022 · 0 comments

My understanding of the metric introduced by Deveaud et al. 2014 (section 3.2) differs from how it is implemented in the ldatuning package. However, I can't tell whether my understanding is correct, since the authors of the paper didn't reply to my questions and don't provide any code. Still, I want to raise the following points that I stumbled upon:

In the ldatuning implementation, the divergence is calculated over the whole word distributions of each pair of topics (lines 254ff). However, my interpretation of the paper is that for any two topics k and k', the top n words of their word distributions are determined first (the sets W_k and W_k' in the paper). This doesn't happen in the implementation – there is no parameter n for the Deveaud2014 function. Furthermore, I think the divergence should only be calculated over the subset of words that occur in the top n lists of both topics, i.e. the intersection of W_k and W_k' (see the subscript of the sums in eq. 2). A minimal sketch of this reading follows below.
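
To make my interpretation concrete, here is a minimal Python sketch – this is *not* the ldatuning code, and the function name, the topics-by-vocabulary probability matrix `phi`, the default `n = 20` and the final averaging are just assumptions I made for this example. It also implements eq. 2 the way I read it (see the two questions further below):

```python
import itertools
import numpy as np

def deveaud2014_top_n(phi, n=20):
    """Mean pairwise topic divergence over shared top-n words (my reading of eq. 2)."""
    k = phi.shape[0]
    # W_k: indices of the n most probable words of each topic
    top = [set(np.argsort(row)[-n:]) for row in phi]
    divs = []
    for i, j in itertools.combinations(range(k), 2):
        shared = sorted(top[i] & top[j])            # intersection of W_k and W_k'
        if not shared:
            # the paper is silent on disjoint top-word sets (question 2 below),
            # so this sketch reluctantly counts such pairs as zero divergence
            divs.append(0.0)
            continue
        p, q = phi[i, shared], phi[j, shared]
        # eq. 2 as I read it: a symmetrised KL-style sum restricted to the
        # shared top words -- note this is *not* the textbook JSD (question 1)
        divs.append(0.5 * np.sum(p * np.log(p / q)) + 0.5 * np.sum(q * np.log(q / p)))
    return float(np.mean(divs))
```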

Apart from these possible issues in the implementation, I was wondering about two things in the paper, but as I said, I couldn't reach the authors for a discussion. I'd still like to raise these questions here, because maybe someone else has an opinion on them:

  1. The formula for the Jensen-Shannon divergence (JSD) in the paper differs from the one that is usually used: JSD(P||Q) = 1/2 * D(P||M) + 1/2 * D(Q||M), with M = 1/2 (P+Q) and D(X||Y) being the Kullback-Leibler divergence. The paper doesn't explain why. I can see in the comments of the code that the author of ldatuning also stumbled upon this.

  2. What if W_k and W_k' are disjoint, i.e. there are no top words that occur in both topics of a pair? This will actually happen quite often with a large vocabulary, a low n and a high number of topics. In my understanding, this should mean that the word distributions for the top words of the two topics completely diverge, since they don't even have common top words. So I'd argue that in this case the divergence for such a pair of topics should be the upper bound of the JSD function, which is 1 when the log base is 2 (see the quick check after this list). The paper doesn't say anything about what should happen if W_k and W_k' are disjoint, so I guess this means such a pair wouldn't add anything to the total divergence, i.e. if a pair of topics doesn't share any common top words, they don't diverge at all, which seems like strange reasoning to me.
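
As a quick numeric check of the upper-bound claim in point 2: with the textbook JSD and log base 2, two distributions with disjoint support do reach 1. The toy distributions below are of course made up just for illustration:

```python
import numpy as np

def jsd(p, q):
    """Textbook Jensen-Shannon divergence with log base 2."""
    m = 0.5 * (p + q)
    def kl(x, y):
        nz = x > 0                        # convention: 0 * log(0/y) = 0
        return np.sum(x[nz] * np.log2(x[nz] / y[nz]))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = np.array([0.5, 0.5, 0.0, 0.0])        # two distributions with
q = np.array([0.0, 0.0, 0.5, 0.5])        # completely disjoint support
print(jsd(p, q))                          # prints 1.0
```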

I also wondered why they came up with their own metric in the first place, since several topic model evaluation metrics were already available at the time (Griffiths & Steyvers 2004, Cao et al. 2009, Wallach et al. 2009, Arun et al. 2010 and more). I don't see in the paper how they assessed the performance of their metric against these existing ones.
