🐛 Bug
Hi,
when using rouge_score with accumulate="best", the results depend on the order of the references. As I understand it, accumulate="best" should return the best F-score over all references, regardless of their order.
Minimal example:
from torchmetrics.functional.text import rouge_score
preds = "a b c"
references = ["a b c", "c b a"]
references_rev = ["c b a", "a b c"]
print(rouge_score(preds, references, accumulate='best'))
print(rouge_score(preds, references_rev, accumulate='best'))
The two calls give different results.
Did I misread the documentation, or is this a bug? accumulate='avg' works as expected.
Maybe the bug is in https://github.com/Lightning-AI/torchmetrics/blob/v1.1.0/src/torchmetrics/functional/text/rouge.py#L378, where there is a todo comment.
I compared the results to the rouge-score package:
from rouge_score import rouge_scorer
scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=False)
preds = "a b c"
references = ["a b c", "c b a"]
references_rev = ["c b a", "a b c"]
print(scorer.score_multi(references, preds))
print(scorer.score_multi(references_rev, preds))
Here both calls give the same results.
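For reference, here is a minimal sketch of the order-independent behaviour I would expect from accumulate="best". It is a simplified stand-in and not the torchmetrics or rouge-score implementation: `ngram_counts`, `ngram_fmeasure` and `best_fmeasure` are hypothetical helpers that assume whitespace tokenization and plain n-gram overlap, with no stemming.

```python
from collections import Counter


def ngram_counts(text: str, n: int) -> Counter:
    """Count the n-grams of a whitespace-tokenized string (hypothetical helper)."""
    tokens = text.split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_fmeasure(pred: str, ref: str, n: int) -> float:
    """Simplified ROUGE-N-style F-measure; an assumption, not the library's code."""
    pred_counts = ngram_counts(pred, n)
    ref_counts = ngram_counts(ref, n)
    # Clipped overlap: each n-gram counts at most as often as it appears in both.
    overlap = sum((pred_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


def best_fmeasure(pred: str, references: list, n: int) -> float:
    """Take the maximum F-measure over all references, so order cannot matter."""
    return max(ngram_fmeasure(pred, ref, n) for ref in references)


preds = "a b c"
references = ["a b c", "c b a"]
references_rev = ["c b a", "a b c"]

# Bigram overlap separates the two references: "c b a" shares no bigram with preds.
print(best_fmeasure(preds, references, n=2))
print(best_fmeasure(preds, references_rev, n=2))
```

Because the maximum is taken over the whole set of per-reference scores, swapping the references cannot change the result in this sketch.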
Environment