My understanding of the metric introduced by Deveaud et al. 2014 (section 3.2) differs from how it is implemented in the ldatuning package. However, I can't tell if my understanding is correct, since the authors of the paper didn't reply to my questions and don't provide any code. Still, I wanted to raise the following points that I stumbled upon:
In the ldatuning implementation, the divergence is calculated over the whole word distributions for each pair of topics (lines 254ff). However, my interpretation of the paper is that for any two topics k and k', the top n words of their word distributions are determined first (the sets W_k and W_k' in the paper). This doesn't happen in the implementation – there's no parameter n for the Deveaud2014 function. Furthermore, I think the divergence is only calculated for the subset of words that occur in the top n lists of both topics, i.e. the intersection of W_k and W_k' (see the subscripts of the sums in eq. 2).
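For concreteness, here is a minimal base-R sketch of how I read eq. 2. The function name deveaud_topn, the phi matrix layout (K topics × V words) and the default n = 10 are my own assumptions for illustration; this is not the ldatuning code and may well not be what the authors intended:

```r
# Hypothetical sketch of my reading of Deveaud et al. 2014, eq. 2 (not the ldatuning code).
# phi: K x V matrix with phi[k, w] = probability of word w in topic k; n: top-word cutoff.
deveaud_topn <- function(phi, n = 10) {
  K <- nrow(phi)
  divs <- c()
  for (k in 1:(K - 1)) {
    for (l in (k + 1):K) {
      W_k <- order(phi[k, ], decreasing = TRUE)[1:n]   # top n words of topic k
      W_l <- order(phi[l, ], decreasing = TRUE)[1:n]   # top n words of topic l (k' in the paper)
      common <- intersect(W_k, W_l)                    # sum only over W_k ∩ W_k'
      if (length(common) == 0) {
        d <- 0   # the paper is silent on disjoint top-word sets; here they contribute nothing
      } else {
        d <- 0.5 * sum(phi[k, common] * log(phi[k, common] / phi[l, common])) +
             0.5 * sum(phi[l, common] * log(phi[l, common] / phi[k, common]))
      }
      divs <- c(divs, d)
    }
  }
  mean(divs)   # average divergence over all topic pairs
}
```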
Apart from these possible issues in the implementation, I was wondering about two things in the paper, but as I said I couldn't reach the authors for a discussion. I'd still like to raise these questions here, because maybe someone else has an opinion about that:
The formula for the Jensen-Shannon divergence (JSD) in the paper is different from the one that is usually used: JSD(P||Q) = 1/2 * D(P||M) + 1/2 * D(Q||M), with M = 1/2 (P+Q) and D(X||Y) being the Kullback-Leibler divergence. The paper doesn't explain why. I can see from the comments in the code that the author of ldatuning also stumbled upon this.
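For reference, the usual form can be written in a few lines of base R (a sketch, using log base 2 so the result is bounded by 1):

```r
# Standard Jensen-Shannon divergence between probability vectors p and q, log base 2
jsd <- function(p, q) {
  m <- 0.5 * (p + q)
  kl <- function(x, y) sum(ifelse(x > 0, x * log2(x / y), 0))   # Kullback-Leibler divergence
  0.5 * kl(p, m) + 0.5 * kl(q, m)
}
```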
What if W_k and W_k' are disjoint, i.e. there are no top words that occur in both topics of a pair? This will actually happen quite often with a large vocabulary, a low n and a high number of topics. In my understanding, this should mean that the word distributions for the top words of the two topics diverge completely, since they don't even have any top words in common. So I'd argue that in this case the divergence for such a pair of topics should be the upper bound of the JSD function (which is 1 when the log base is 2). The paper doesn't say anything about what should happen if W_k and W_k' were disjoint, so I guess this means they wouldn't add anything to the total divergence, i.e. if a pair of topics doesn't share any top words, it doesn't diverge at all, which seems like strange reasoning to me.
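Using the jsd sketch above, a toy example with completely disjoint supports illustrates the upper bound I mean:

```r
# Two "topics" whose top words don't overlap at all
p <- c(0.5, 0.5, 0.0, 0.0)
q <- c(0.0, 0.0, 0.5, 0.5)
jsd(p, q)   # 1, the maximum possible JSD with log base 2
```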
I also wondered why they came up with their own metric anyway, since there were already several topic model evaluation metrics available at the time (Griffiths & Steyvers 2004, Cao et al. 2009, Wallach et al. 2009, Arun et al. 2010 and more). I don't see in the paper how they assessed the performance of their metric compared to the other metrics.