
[DPO] How should inference be run after DPO training finishes? Can a script similar to the post-SFT inference one be used? #6558

Closed
1 task done
Maydaytyh opened this issue Jan 8, 2025 · 1 comment
Labels
solved This problem has been already solved

Comments

@Maydaytyh

Reminder

  • I have read the README and searched the existing issues.

System Info

  • llamafactory version: 0.9.2.dev0
  • Platform: Linux-5.15.0-105-generic-x86_64-with-glibc2.31
  • Python version: 3.11.11
  • PyTorch version: 2.5.1+cu124 (GPU)
  • Transformers version: 4.46.1
  • Datasets version: 3.1.0
  • Accelerate version: 1.0.1
  • PEFT version: 0.12.0
  • TRL version: 0.9.6
  • GPU type: NVIDIA RTX A6000

Reproduction

This is the training config file:

### model
model_name_or_path: mergd_llama2_7b
# trust_remote_code: true

### method
stage: dpo
do_train: true
finetuning_type: lora
lora_target: q_proj,v_proj
pref_beta: 0.1
pref_loss: sigmoid  # choices: [sigmoid (dpo), orpo, simpo]

### dataset
dataset: dataset
template: llama2
# cutoff_len: 2048
# max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: new_path
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 4
gradient_accumulation_steps: 4
learning_rate: 5.0e-5
num_train_epochs: 1.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000

# ### eval
# val_size: 0.1
# per_device_eval_batch_size: 1
# eval_strategy: steps
# eval_steps: 500  # chatkbqa
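For reference, a config like this is normally launched with `llamafactory-cli train <config>.yaml`. Since `finetuning_type: lora` is set, the DPO run saves only LoRA adapter weights into `output_dir` (`new_path` above), not a full merged model.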

This is the inference config file:

### model
model_name_or_path: mergd_llama2_7b
adapter_name_or_path: new_path

### method
stage: dpo
do_predict: true
finetuning_type: lora
lora_target: q_proj,v_proj

### dataset
template: llama2
eval_dataset: eval_dataset
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: output_dir
logging_steps: 10
save_steps: 1000
plot_loss: true
overwrite_output_dir: true

Running it like this produces no output file, so I would like to ask: how should inference be done after DPO?
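One likely cause, though not confirmed in this thread: with `stage: dpo`, `do_predict` evaluates preference metrics (rewards/accuracies) rather than generating text, so no prediction file is written. Below is a minimal sketch of a generation-style predict config, modeled on the SFT-style predict examples shipped with LLaMA-Factory; paths and dataset names are carried over from the configs above, and `max_new_tokens` is an assumed addition:

```yaml
### model
model_name_or_path: mergd_llama2_7b
adapter_name_or_path: new_path   # LoRA adapter produced by the DPO run

### method
stage: sft                  # generation-style prediction, even after DPO training
do_predict: true
finetuning_type: lora

### dataset
eval_dataset: eval_dataset
template: llama2
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: output_dir      # generated_predictions.jsonl should appear here
overwrite_output_dir: true

### predict
per_device_eval_batch_size: 1
predict_with_generate: true
max_new_tokens: 512         # assumed generation length; adjust as needed
```

Interactive inference via `llamafactory-cli chat` with the same `model_name_or_path`, `adapter_name_or_path`, and `template` settings should also work the same way after DPO as after SFT, since a LoRA DPO run produces an ordinary adapter.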

Others

No response

@github-actions github-actions bot added the pending This problem is yet to be addressed label Jan 8, 2025
@hiyouga hiyouga closed this as completed Jan 8, 2025
@hiyouga hiyouga added solved This problem has been already solved and removed pending This problem is yet to be addressed labels Jan 8, 2025