
A question about the SFTTrainer (also a theoretical question about SFT in general) #1083

Closed
PradeepKadubandi opened this issue Dec 11, 2023 · 5 comments


I have a general question about Supervised Fine Tuning (SFT) for Dialogue applications.

Should the SFT process use the same LM objective (next-token prediction) that is used in pre-training a language model?

The "Dialogue" task is predicting "assistant" tokens, right? Shouldn't the objective be predicting only those tokens? Is one way to do this is to set labels for only assistant tokens and ignore the labels on others?

The SFTTrainer implementation does not set labels; as far as I understand, this leads to "input_ids" being cloned into "labels" and shifted (within the transformers code), i.e., the plain next-token prediction objective over the whole sequence.
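
Roughly, I believe the loss computation inside transformers looks like this (a paraphrase, not the exact library code):

```python
import torch.nn.functional as F

def causal_lm_loss(logits, labels):
    # logits: (batch, seq_len, vocab_size); labels: (batch, seq_len)
    # position t predicts token t+1, i.e. next-token prediction
    shift_logits = logits[:, :-1, :]
    shift_labels = labels[:, 1:]
    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
        ignore_index=-100,  # masked positions contribute no loss
    )
```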

On a more philosophical note: if SFT uses the same objective as pre-training, why isn't it simply called "fine-tuning" the model (on a dialogue dataset, of course) rather than "supervised fine-tuning"? What am I missing? Is there a reference paper that explains this well, along with the right approach to SFT for dialogue applications?

PradeepKadubandi commented Dec 11, 2023

It is not obvious, hence the question. For example, the InstructGPT paper mentions SFT but mainly redirects to the (seemingly) first attempt at SFT in this paper, which addresses a "Summarization" task rather than a "Dialogue" task.

In that paper, where human labelers are asked to write summaries and "Behavioral Cloning" is then used to fine-tune the LLM on the task, I'd imagine that only the "Summary" section is treated as the label, not the entire prompt/document. Following that principle, for "Dialogue" tasks I'd intuitively expect that only the "assistant" turns should be part of the labels.

lvwerra commented Dec 14, 2023

We offer both options: doing "vanilla" CLM or masking out the user queries: https://huggingface.co/docs/trl/sft_trainer#advanced-usage
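
For reference, a sketch along the lines of that docs page (the model, dataset, and response_template below are illustrative; the template string has to match your prompt format exactly):

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTTrainer, DataCollatorForCompletionOnlyLM

model_name = "facebook/opt-350m"  # illustrative; any causal LM works
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# illustrative dataset with a "text" field formatted as
# "### Human: ...### Assistant: ..."
dataset = load_dataset("timdettmers/openassistant-guanaco", split="train")

# Everything up to and including the response template gets label -100,
# so only the assistant completion contributes to the loss.
collator = DataCollatorForCompletionOnlyLM(
    response_template="### Assistant:", tokenizer=tokenizer
)

trainer = SFTTrainer(
    model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    data_collator=collator,  # omit this to get "vanilla" CLM instead
)
trainer.train()
```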

I don't think there is a systematic distinction between fine-tuning, supervised fine-tuning, or even instruction tuning; they are just terms people use to describe essentially the same thing :)

@PradeepKadubandi

Thank you for the pointer! DataCollatorForCompletionOnlyLM is good to know (and what I was looking for in a sense :-))

About the terms, yeah, I can see that they can be used loosely interchangeably. Based on my reading of the literature, I have a view of how they are similar and how they are (or should be) different, but perhaps everyone has their own interpretation :-)


This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.


Hyfred commented Nov 9, 2024

> these can be loosely interchangeable

Hi @PradeepKadubandi, thank you for raising this question; it was indeed confusing when I tried to fine-tune a model. The traditional approach separates the input (e.g., a document) from the label (e.g., its summary), and the loss is computed by comparing the generated tokens against the label.

However, the SFTTrainer wraps the input and label together into one sequence (so the labels are the same as the input_ids) and trains on it as a next-token prediction task. While these approaches seem similar, I wonder whether there is a performance difference between the two. Do you have any sense of which method is better suited to which scenarios?
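
To make the contrast concrete, here is a toy sketch of the label tensors under each setup (the token ids are made up):

```python
import torch

prompt = torch.tensor([101, 102, 103])    # e.g. the document / user turn
response = torch.tensor([201, 202])       # e.g. the summary / assistant turn
input_ids = torch.cat([prompt, response])

# SFTTrainer default ("vanilla" CLM): loss on every token
labels_full = input_ids.clone()           # tensor([101, 102, 103, 201, 202])

# traditional / completion-only: loss only on the response tokens
labels_masked = input_ids.clone()
labels_masked[: len(prompt)] = -100       # tensor([-100, -100, -100, 201, 202])
```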
