
[DPO] How should inference be run after DPO training finishes? Can a script similar to the post-SFT inference one be used? #6558

Closed
1 task done
Maydaytyh opened this issue Jan 8, 2025 · 1 comment
Labels
solved This problem has been already solved

Comments

@Maydaytyh

Reminder

  • I have read the README and searched the existing issues.

System Info

  • llamafactory version: 0.9.2.dev0
  • Platform: Linux-5.15.0-105-generic-x86_64-with-glibc2.31
  • Python version: 3.11.11
  • PyTorch version: 2.5.1+cu124 (GPU)
  • Transformers version: 4.46.1
  • Datasets version: 3.1.0
  • Accelerate version: 1.0.1
  • PEFT version: 0.12.0
  • TRL version: 0.9.6
  • GPU type: NVIDIA RTX A6000

Reproduction

This is the training config file:

### model
model_name_or_path: mergd_llama2_7b
# trust_remote_code: true

### method
stage: dpo
do_train: true
finetuning_type: lora
lora_target: q_proj,v_proj
pref_beta: 0.1
pref_loss: sigmoid  # choices: [sigmoid (dpo), orpo, simpo]

### dataset
dataset: dataset
template: llama2
# cutoff_len: 2048
# max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: new_path
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 4
gradient_accumulation_steps: 4
learning_rate: 5.0e-5
num_train_epochs: 1.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000

# ### eval
# val_size: 0.1
# per_device_eval_batch_size: 1
# eval_strategy: steps
# eval_steps: 500  # chatkbqa
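For reference, a config like this is normally launched with `llamafactory-cli train <config>.yaml`. Since `finetuning_type: lora` is set, the DPO run saves only LoRA adapter weights into `output_dir` (`new_path` above), not a full merged model.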

This is the inference config file:

### model
model_name_or_path: mergd_llama2_7b
adapter_name_or_path: new_path

### method
stage: dpo
do_predict: true
finetuning_type: lora
lora_target: q_proj,v_proj

### dataset
template: llama2
eval_dataset: eval_dataset
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: output_dir
logging_steps: 10
save_steps: 1000
plot_loss: true
overwrite_output_dir: true

Running it like this produces no output file, so I would like to ask: how should inference be done after DPO?
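One likely cause, though not confirmed in this thread: with `stage: dpo`, `do_predict` evaluates preference metrics (rewards/accuracies) rather than generating text, so no prediction file is written. Below is a minimal sketch of a generation-style predict config, modeled on the SFT-style predict examples shipped with LLaMA-Factory; paths and dataset names are carried over from the configs above, and `max_new_tokens` is an assumed addition:

```yaml
### model
model_name_or_path: mergd_llama2_7b
adapter_name_or_path: new_path   # LoRA adapter produced by the DPO run

### method
stage: sft                  # generation-style prediction, even after DPO training
do_predict: true
finetuning_type: lora

### dataset
eval_dataset: eval_dataset
template: llama2
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: output_dir      # generated_predictions.jsonl should appear here
overwrite_output_dir: true

### predict
per_device_eval_batch_size: 1
predict_with_generate: true
max_new_tokens: 512         # assumed generation length; adjust as needed
```

Interactive inference via `llamafactory-cli chat` with the same `model_name_or_path`, `adapter_name_or_path`, and `template` settings should also work the same way after DPO as after SFT, since a LoRA DPO run produces an ordinary adapter.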

Others

No response

@github-actions github-actions bot added the pending This problem is yet to be addressed label Jan 8, 2025
@hiyouga hiyouga closed this as completed Jan 8, 2025
@hiyouga hiyouga added solved This problem has been already solved and removed pending This problem is yet to be addressed labels Jan 8, 2025