DPO training results differ between llamafactory and trl #6577

suanflower · 2025-01-09T06:33:41Z

Reminder

I have read the README and searched the existing issues.

System Info

None

Reproduction

Official

Others

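Hello, I ran DPO training with llamafactory and trl using the same data and the same configuration, but the loss curves do not overlap, and there is also a sizable gap in the AT and MT metrics.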
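
For reference, a sketch of the loss being compared, assuming both runs use the standard sigmoid DPO objective. The TRL dump below lists `loss_type:"sigmoid"`, `beta:0.1` and `label_smoothing:0`; the LLaMA-Factory dump does not show the corresponding settings, so treating them as equal is an assumption:

```latex
\mathcal{L}_{\mathrm{DPO}}(\theta)
  = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\left[
      \log \sigma\!\left(
        \beta \left[
          \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
          - \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
        \right]
      \right)
    \right]
```

If the policy is initialized from the reference model, both log-ratios are zero at the first step, so the loss should start near $\log 2 \approx 0.693$ in either framework. Starting values that already differ may point at data preprocessing or truncation rather than at the optimizer.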

## Objective metrics:
llamafactory: AT 6.95, MT 7.97
trl: AT 6.62, MT 7.64

## Loss curves

## Hyperparameters:

LLama_factory
_name_or_path:"" dispatch_batches:null even_batches:true gradient_accumulation_kwargs:null non_blocking:false split_batches:false use_seedable_sampler:true adafactor:false adam_beta1:0.9 adam_beta2:0.999 adam_epsilon:0.00000001 add_cross_attention:false 0:"Qwen2ForCausalLM" attention_dropout:0 auto_find_batch_size:false bad_words_ids:null batch_eval_metrics:false begin_suppress_tokens:null bf16:true bf16_full_eval:false bos_token_id:151,643 chunk_size_feed_forward:0 cross_attention_hidden_size:null data_seed:null dataloader_drop_last:false dataloader_num_workers:0 dataloader_persistent_workers:false dataloader_pin_memory:true dataloader_prefetch_factor:null ddp_backend:null ddp_broadcast_buffers:null ddp_bucket_cap_mb:null ddp_find_unused_parameters:null ddp_timeout:180,000,000 debug: decoder_start_token_id:null deepspeed:"examples/deepspeed/ds_z3_config.json" disable_tqdm:false dispatch_batches:null diversity_penalty:0 do_eval:true do_predict:false do_sample:false do_train:true early_stopping:false encoder_no_repeat_ngram_size:0 eos_token_id:151,643 eval_accumulation_steps:null eval_delay:0 eval_do_concat_batches:true eval_on_start:false eval_steps:100 eval_strategy:"steps" evaluation_strategy:"steps" exponential_decay_length_penalty:null finetuning_task:null forced_bos_token_id:null forced_eos_token_id:null fp16:false fp16_backend:"auto" fp16_full_eval:false fp16_opt_level:"O1" fsdp: min_num_params:0 xla:false xla_fsdp_grad_ckpt:false xla_fsdp_v2:false fsdp_min_num_params:0 fsdp_transformer_layer_cls_to_wrap:null full_determinism:false generation_config:null generation_max_length:null generation_num_beams:null gradient_accumulation_steps:2 gradient_checkpointing:true gradient_checkpointing_kwargs:null greater_is_better:null group_by_length:false half_precision_backend:"auto" hidden_act:"silu" hidden_size:3,584 hub_always_push:false hub_model_id:null hub_private_repo:false hub_strategy:"every_save" hub_token:"<HUB_TOKEN>" 0:"LABEL_0" 1:"LABEL_1" 
ignore_data_skip:false include_inputs_for_metrics:false include_num_input_tokens_seen:false include_tokens_per_second:false initializer_range:0.02 intermediate_size:18,944 is_decoder:false is_encoder_decoder:false jit_mode_eval:false label_names:null label_smoothing_factor:0 LABEL_0:0 LABEL_1:1 learning_rate:0.000001 length_column_name:"length" length_penalty:1 load_best_model_at_end:false local_rank:0 log_level:"passive" log_level_replica:"warning" log_on_each_node:true logging_dir:"" logging_first_step:false logging_nan_inf_filter:true logging_steps:10 logging_strategy:"steps" lr_scheduler_type:"cosine" max_grad_norm:1 max_length:20 max_position_embeddings:131,072 max_steps:-1 max_window_layers:28 metric_for_best_model:null min_length:0 model_type:"qwen2" model/num_parameters:0 mp_parameters:"" neftune_noise_alpha:null no_cuda:false no_repeat_ngram_size:0 num_attention_heads:28 num_beam_groups:1 num_beams:1 num_hidden_layers:28 num_key_value_heads:4 num_return_sequences:1 num_train_epochs:2 optim:"adamw_torch" optim_args:null optim_target_modules:null output_attentions:false output_dir:"" output_hidden_states:false output_scores:false overwrite_output_dir:true pad_token_id:null past_index:-1 per_device_eval_batch_size:1 per_device_train_batch_size:1 per_gpu_eval_batch_size:null per_gpu_train_batch_size:null predict_with_generate:false prediction_loss_only:false prefix:null problem_type:null push_to_hub:false push_to_hub_model_id:null push_to_hub_organization:null push_to_hub_token:"<PUSH_TO_HUB_TOKEN>" ray_scope:"last" remove_invalid_values:false remove_unused_columns:false repetition_penalty:1 0:"wandb" restore_callback_states_from_checkpoint:false resume_from_checkpoint:null return_dict:true return_dict_in_generate:false rms_norm_eps:0.000001 rope_theta:1,000,000 run_name:"dpo_cst_20250105_qwen25_7b_like_v8_daxue" save_on_each_node:false save_only_model:false save_safetensors:true save_steps:500 save_strategy:"epoch" save_total_limit:null seed:42 
sep_token_id:null skip_memory_metrics:true sliding_window:131,072 sortish_sampler:false split_batches:null suppress_tokens:null task_specific_params:null temperature:1 tf_legacy_loss:false tf32:null tie_encoder_decoder:false tie_word_embeddings:false tokenizer_class:null top_k:50 top_p:1 torch_compile:false torch_compile_backend:null torch_compile_mode:null torch_dtype:"float16" torchdynamo:null torchscript:false tpu_metrics_debug:false tpu_num_cores:null transformers_version:"4.42.2" typical_p:1 use_bfloat16:false use_cache:false use_cpu:false use_ipex:false use_legacy_prediction_loop:false use_mps_device:false use_mrope:false use_sliding_window:false vocab_size:152,064 warmup_ratio:0.1 warmup_steps:0 weight_decay:0

TRL
_name_or_path:"" dispatch_batches:null even_batches:true gradient_accumulation_kwargs:null non_blocking:false split_batches:false use_seedable_sampler:true adafactor:false adam_beta1:0.9 adam_beta2:0.999 adam_epsilon:0.00000001 add_cross_attention:false 0:"Qwen2ForCausalLM" attention_dropout:0 auto_find_batch_size:false bad_words_ids:null batch_eval_metrics:false begin_suppress_tokens:null beta:0.1 bf16:true bf16_full_eval:false bos_token_id:151,643 chunk_size_feed_forward:0 cross_attention_hidden_size:null data_seed:null dataloader_drop_last:false dataloader_num_workers:0 dataloader_persistent_workers:false dataloader_pin_memory:true dataloader_prefetch_factor:null dataset_num_proc:null ddp_backend:null ddp_broadcast_buffers:null ddp_bucket_cap_mb:null ddp_find_unused_parameters:null ddp_timeout:180,000,000 debug: decoder_start_token_id:null deepspeed:"examples/accelerate_configs/deepspeed_zero3.json" disable_dropout:true disable_tqdm:false discopop_tau:0.05 dispatch_batches:null diversity_penalty:0 do_eval:true do_predict:false do_sample:false do_train:true early_stopping:false encoder_no_repeat_ngram_size:0 eos_token_id:151,643 eval_accumulation_steps:null eval_delay:0 eval_do_concat_batches:true eval_on_start:false eval_steps:100 eval_strategy:"steps" evaluation_strategy:"steps" exponential_decay_length_penalty:null f_alpha_divergence_coef:1 f_divergence_type:"reverse_kl" finetuning_task:null force_use_ref_model:false forced_bos_token_id:null forced_eos_token_id:null fp16:false fp16_backend:"auto" fp16_full_eval:false fp16_opt_level:"O1" fsdp: min_num_params:0 xla:false xla_fsdp_grad_ckpt:false xla_fsdp_v2:false fsdp_min_num_params:0 fsdp_transformer_layer_cls_to_wrap:null full_determinism:false generate_during_eval:false gradient_accumulation_steps:2 gradient_checkpointing:true gradient_checkpointing_kwargs:null greater_is_better:null group_by_length:false half_precision_backend:"auto" hidden_act:"silu" hidden_size:3,584 hub_always_push:false hub_model_id:null 
hub_private_repo:false hub_strategy:"every_save" hub_token:"<HUB_TOKEN>" 0:"LABEL_0" 1:"LABEL_1" ignore_data_skip:false include_inputs_for_metrics:false include_num_input_tokens_seen:false include_tokens_per_second:false initializer_range:0.02 intermediate_size:18,944 is_decoder:false is_encoder_decoder:null jit_mode_eval:false label_names:null label_pad_token_id:-100 label_smoothing:0 label_smoothing_factor:0 LABEL_0:0 LABEL_1:1 learning_rate:0.000001 length_column_name:"length" length_penalty:1 load_best_model_at_end:false local_rank:0 log_level:"passive" log_level_replica:"warning" log_on_each_node:true logging_dir:"" logging_first_step:false logging_nan_inf_filter:true logging_steps:10 logging_strategy:"steps" loss_type:"sigmoid" lr_scheduler_type:"cosine" max_completion_length:null max_grad_norm:1 max_length:4,096 max_position_embeddings:131,072 max_prompt_length:null max_steps:-1 max_window_layers:28 metric_for_best_model:null min_length:0 model_adapter_name:null model_init_kwargs:null model_type:"qwen2" model/num_parameters:0 mp_parameters:"" neftune_noise_alpha:null no_cuda:false no_repeat_ngram_size:0 num_attention_heads:28 num_beam_groups:1 num_beams:1 num_hidden_layers:28 num_key_value_heads:4 num_return_sequences:1 num_train_epochs:2 optim:"adamw_torch" optim_args:null optim_target_modules:null output_attentions:false output_dir:"" output_hidden_states:false output_scores:false overwrite_output_dir:true pad_token_id:null padding_value:null past_index:-1 per_device_eval_batch_size:1 per_device_train_batch_size:1 per_gpu_eval_batch_size:null per_gpu_train_batch_size:null precompute_ref_batch_size:null precompute_ref_log_probs:false prediction_loss_only:false prefix:null problem_type:null push_to_hub:false push_to_hub_model_id:null push_to_hub_organization:null push_to_hub_token:"<PUSH_TO_HUB_TOKEN>" ray_scope:"last" ref_adapter_name:null ref_model_init_kwargs:null ref_model_mixup_alpha:0.9 ref_model_sync_steps:64 reference_free:false 
remove_invalid_values:false remove_unused_columns:true repetition_penalty:1 0:"wandb" restore_callback_states_from_checkpoint:false resume_from_checkpoint:null return_dict:true return_dict_in_generate:false rms_norm_eps:0.000001 rope_theta:1,000,000 rpo_alpha:null run_name:"dpo_cst_20250108_dpo_log09_del_imend_hp" save_on_each_node:false save_only_model:false save_safetensors:true save_steps:500 save_strategy:"epoch" save_total_limit:3 seed:42 sep_token_id:null skip_memory_metrics:true sliding_window:131,072 split_batches:null suppress_tokens:null sync_ref_model:false task_specific_params:null temperature:1 tf_legacy_loss:false tf32:null tie_encoder_decoder:false tie_word_embeddings:false tokenizer_class:null top_k:50 top_p:1 torch_compile:false torch_compile_backend:null torch_compile_mode:null torch_dtype:"float16" torchdynamo:null torchscript:false tpu_metrics_debug:false tpu_num_cores:null transformers_version:"4.42.2" truncation_mode:"keep_end" typical_p:1 use_bfloat16:false use_cache:false use_cpu:false use_ipex:false use_legacy_prediction_loop:false use_mps_device:false use_mrope:false use_num_logits_to_keep:false use_sliding_window:false use_weighting:false vocab_size:152,064 warmup_ratio:0.1 warmup_steps:0 weight_decay:0
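
Because the two dumps are long flattened `key:value` runs, differences are easy to miss by eye. A minimal sketch of a diff helper follows; `parse_dump` and `diff_dumps` are hypothetical names introduced here, not part of either library. The excerpts reuse a few values copied from the dumps above. Since the dumps mix model config and trainer arguments, a differing key is only a lead to check, not proof of a cause.

```python
def parse_dump(text: str) -> dict[str, list[str]]:
    """Parse a whitespace-separated run of key:value tokens.

    Keys can repeat in these dumps (e.g. `0:` appears several times),
    so each key maps to the list of values seen, in order.
    """
    parsed: dict[str, list[str]] = {}
    for token in text.split():
        key, sep, value = token.partition(":")
        if not sep:
            continue  # skip tokens that are not key:value pairs
        parsed.setdefault(key, []).append(value)
    return parsed


def diff_dumps(left: str, right: str) -> dict[str, tuple]:
    """Return keys whose values differ, or that exist on one side only."""
    a, b = parse_dump(left), parse_dump(right)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(a.keys() | b.keys())
        if a.get(key) != b.get(key)
    }


# Short excerpts copied from the two dumps above; paste the full text to compare everything.
llamafactory_excerpt = (
    "max_length:20 remove_unused_columns:false save_total_limit:null seed:42"
)
trl_excerpt = (
    "beta:0.1 max_length:4,096 remove_unused_columns:true save_total_limit:3 seed:42"
)

for key, (lf_value, trl_value) in diff_dumps(llamafactory_excerpt, trl_excerpt).items():
    print(f"{key}: llamafactory={lf_value} trl={trl_value}")
```

On these excerpts the helper reports `beta`, `max_length`, `remove_unused_columns` and `save_total_limit`, and omits `seed`, which matches on both sides.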

hiyouga · 2025-01-09T13:13:18Z

Unable to reproduce.

github-actions bot added the pending ("This problem is yet to be addressed") label Jan 9, 2025

suanflower closed this as completed Jan 9, 2025

suanflower reopened this Jan 9, 2025

hiyouga added the wontfix ("This will not be worked on") label and removed the pending ("This problem is yet to be addressed") label Jan 9, 2025

hiyouga closed this as not planned Jan 9, 2025
