rouge_score with accumulate='best' gives mixed results #2148

Closed
volksen opened this issue Oct 6, 2023 · 4 comments · Fixed by #2830
Labels: bug / fix, good first issue, topic: Text, v1.1.x

Comments

volksen commented Oct 6, 2023

🐛 Bug

Hi,

When using rouge_score with accumulate="best", the results depend on the order of the references. As I understand it, accumulate="best" should return the best F-score over all references.

Minimal example:

from torchmetrics.functional.text import rouge_score

preds = "a b c"
references = ["a b c", "c b a"]
references_rev = ["c b a", "a b c"]

print(rouge_score(preds, references, accumulate='best'))
print(rouge_score(preds, references_rev, accumulate='best'))

gives different results:

{'rouge1_fmeasure': tensor(1.), 'rouge1_precision': tensor(1.), 'rouge1_recall': tensor(1.), 'rouge2_fmeasure': tensor(1.), 'rouge2_precision': tensor(1.), 'rouge2_recall': tensor(1.), 'rougeL_fmeasure': tensor(1.), 'rougeL_precision': tensor(1.), 'rougeL_recall': tensor(1.), 'rougeLsum_fmeasure': tensor(1.), 'rougeLsum_precision': tensor(1.), 'rougeLsum_recall': tensor(1.)}
{'rouge1_fmeasure': tensor(1.), 'rouge1_precision': tensor(1.), 'rouge1_recall': tensor(1.), 'rouge2_fmeasure': tensor(0.), 'rouge2_precision': tensor(0.), 'rouge2_recall': tensor(0.), 'rougeL_fmeasure': tensor(0.3333), 'rougeL_precision': tensor(0.3333), 'rougeL_recall': tensor(0.3333), 'rougeLsum_fmeasure': tensor(0.3333), 'rougeLsum_precision': tensor(0.3333), 'rougeLsum_recall': tensor(0.3333)}

Did I misread the documentation, or is this a bug? accumulate='avg' works as expected.
Maybe the bug is in https://github.com/Lightning-AI/torchmetrics/blob/v1.1.0/src/torchmetrics/functional/text/rouge.py#L378
where there is a TODO comment.
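
To make the suspected failure mode concrete, here is a minimal, self-contained sketch. It is not the torchmetrics implementation; the helpers `rouge_n`, `best_per_key`, and `best_by_first_key` are hypothetical names invented for illustration, and the scoring is a simplified whitespace-token ROUGE-N. It assumes one plausible cause: if the "best" reference is chosen once, using only the first ROUGE key, then a tie on that key is broken by list position, so the other keys inherit whichever reference happens to come first. Taking the maximum F-measure separately for each key does not depend on the order.

```python
from collections import Counter


def ngram_counts(text, n):
    """Count whitespace-token n-grams in a string."""
    tokens = text.split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(pred, ref, n):
    """Simplified ROUGE-N (hypothetical helper, no stemming or normalization)."""
    pred_counts, ref_counts = ngram_counts(pred, n), ngram_counts(ref, n)
    overlap = sum((pred_counts & ref_counts).values())
    precision = overlap / max(sum(pred_counts.values()), 1)
    recall = overlap / max(sum(ref_counts.values()), 1)
    denom = precision + recall
    fmeasure = 0.0 if denom == 0 else 2 * precision * recall / denom
    return {"precision": precision, "recall": recall, "fmeasure": fmeasure}


def score_all(pred, refs):
    """One dict of per-key scores for every reference."""
    return [{f"rouge{n}": rouge_n(pred, ref, n) for n in (1, 2)} for ref in refs]


def best_per_key(pred, refs):
    """Order-independent: pick the highest F-measure separately for each key."""
    per_ref = score_all(pred, refs)
    return {
        key: max((scores[key] for scores in per_ref), key=lambda m: m["fmeasure"])
        for key in per_ref[0]
    }


def best_by_first_key(pred, refs):
    """Order-dependent under ties: one reference, chosen by the first key only."""
    per_ref = score_all(pred, refs)
    first_key = next(iter(per_ref[0]))
    # max() returns the earliest maximal element, so a tie goes to the first reference.
    return max(per_ref, key=lambda scores: scores[first_key]["fmeasure"])


preds = "a b c"
references = ["a b c", "c b a"]
references_rev = ["c b a", "a b c"]

print(best_per_key(preds, references)["rouge2"])
print(best_per_key(preds, references_rev)["rouge2"])
print(best_by_first_key(preds, references)["rouge2"])
print(best_by_first_key(preds, references_rev)["rouge2"])
```

In this sketch both references tie on rouge1, so `best_by_first_key` returns a different rouge2 score depending on which reference is listed first, while `best_per_key` gives the same result for both orderings.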

I compared the results to the rouge-score package:

from rouge_score import rouge_scorer
scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=False)
preds = "a b c"
references = ["a b c", "c b a"]
references_rev = ["c b a", "a b c"]
print(scorer.score_multi(references, preds))
print(scorer.score_multi(references_rev, preds))

which gives the same results in both cases:

{'rouge1': Score(precision=1.0, recall=1.0, fmeasure=1.0), 'rouge2': Score(precision=1.0, recall=1.0, fmeasure=1.0), 'rougeL': Score(precision=1.0, recall=1.0, fmeasure=1.0)}
{'rouge1': Score(precision=1.0, recall=1.0, fmeasure=1.0), 'rouge2': Score(precision=1.0, recall=1.0, fmeasure=1.0), 'rougeL': Score(precision=1.0, recall=1.0, fmeasure=1.0)}

Environment

  • TorchMetrics Version: 1.1.2
  • Python 3.10.12
  • torch 2.0.1
volksen added the bug / fix and help wanted labels Oct 6, 2023
github-actions bot commented Oct 6, 2023

Hi! Thanks for your contribution! Great first issue!

Borda changed the title from "rouge_score with accumulate='best' gives mixed results" to "rouge_score with accumulate='best' gives mixed results" Oct 6, 2023
stancld (Contributor) commented Oct 13, 2023

Thanks for the report! Gonna check this weekend.

Borda (Member) commented Feb 2, 2024

> Thanks for the report! Gonna check this weekend.

@stancld, did you have a chance to have a look at it? 🐰

Borda added the good first issue label Aug 29, 2024
rittik9 (Contributor) commented Sep 10, 2024

@Borda pls assign it to me
