
Why does SFT sum the cross-entropy loss within each sequence? #68

Open
YJWon99 opened this issue Feb 17, 2024 · 3 comments

Comments

@YJWon99

YJWon99 commented Feb 17, 2024

Thank you for maintaining such an important repository. I really enjoyed and learned a lot from reading your DPO paper.

I have one question regarding the SFT loss implementation in the repository. The SFT loss appears to sum the cross-entropy loss within each sequence. However, from my understanding, the language modeling loss is conventionally averaged over all tokens in the batch (Ref: GPT2 Loss). This creates a discrepancy between TRL's SFTTrainer and this repository's SFT loss when computing the standard cross-entropy loss. Why is SFT implemented this way?
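
For context, here is a minimal sketch (not the repository's actual code) of the two reductions being compared: summing the token-level losses within each sequence versus averaging over all non-masked tokens in the batch. The tensor shapes and the loss mask are illustrative assumptions, and `labels` is assumed to already be shifted and masked for prompt tokens.

```python
import torch
import torch.nn.functional as F

def per_sequence_sum_loss(logits, labels, mask):
    """Sum the token losses within each sequence, then average over the batch."""
    # logits: (batch, seq_len, vocab); labels, mask: (batch, seq_len), mask = 1 for loss tokens
    token_loss = F.cross_entropy(logits.transpose(1, 2), labels, reduction="none")  # (batch, seq_len)
    return (token_loss * mask).sum(dim=-1).mean()

def token_mean_loss(logits, labels, mask):
    """Average the loss over all non-masked tokens in the batch (GPT-2 / TRL SFTTrainer convention)."""
    token_loss = F.cross_entropy(logits.transpose(1, 2), labels, reduction="none")
    return (token_loss * mask).sum() / mask.sum()

# toy example: batch of 2 sequences, length 4, vocab size 10
logits = torch.randn(2, 4, 10)
labels = torch.randint(0, 10, (2, 4))
mask = torch.tensor([[1, 1, 1, 0], [1, 1, 1, 1]], dtype=torch.float)
print(per_sequence_sum_loss(logits, labels, mask), token_mean_loss(logits, labels, mask))
```

With variable-length sequences the two reductions are not equivalent: under the per-sequence sum, longer responses contribute a loss (and gradient) roughly proportional to their length, whereas the token-mean convention weights every non-masked token equally.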

@YJWon99 YJWon99 changed the title Why does SFTTrainer sum the cross-entropy loss within each sequence? Why does SFT sum the cross-entropy loss within each sequence? Feb 17, 2024
@HuXiangkun

Same question here. Hi @YJWon99, do you have any ideas now?

@yiyepiaoling0715

Has this been solved? Same question here.

@HuXiangkun

@yiyepiaoling0715 I think it's a bug in their code; the loss should be averaged over each sequence. I made that change in my own experiments.
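
For reference, the change described above could look like the following sketch, again with illustrative names rather than the repository's actual code: the per-sequence sum is replaced by a mean over the non-masked tokens of each sequence.

```python
import torch.nn.functional as F

def per_sequence_mean_loss(logits, labels, mask):
    """Mean of the token losses within each sequence, then mean over the batch."""
    token_loss = F.cross_entropy(logits.transpose(1, 2), labels, reduction="none")  # (batch, seq_len)
    per_seq = (token_loss * mask).sum(dim=-1) / mask.sum(dim=-1).clamp(min=1)
    return per_seq.mean()
```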
