
How should max_steps be calculated when running SFT with multiple datasets? #6581

Closed
1 task done
128Ghe980 opened this issue Jan 9, 2025 · 1 comment

Labels
solved This problem has been already solved

Comments

@128Ghe980

Reminder

  • I have read the README and searched the existing issues.

System Info

deepspeed --hostfile=/hostfile_remote \
    src/train.py \
    --deepspeed $deepspeed_config \
    --stage sft \
    --do_train \
    --model_name_or_path $MODEL \
    --dataset XXX_img,XXX_noimg \
    --template qwen2_vl \
    --finetuning_type full \
    --streaming True \
    --buffer_size 128 \
    --preprocessing_batch_size 128 \
    --dispatch_batches False \
    --output_dir $output_dir \
    --overwrite_cache \
    --overwrite_output_dir True \
    --warmup_ratio 0.01 \
    --weight_decay 0.1 \
    --adam_beta2 0.95 \
    --per_device_train_batch_size $per_device_train_batch_size \
    --gradient_accumulation_steps $gradient_accumulation_steps \
    --ddp_timeout 9000 \
    --learning_rate 5e-6 \
    --lr_scheduler_type cosine \
    --logging_steps 1 \
    --cutoff_len 10240 \
    --save_steps 2000 \
    --save_total_limit 3 \
    --plot_loss \
    --num_train_epochs 1 \
    --report_to 'wandb' \
    --bf16 True \
    --tf32 True

Reproduction

No reproduction needed.

Others

I am currently running SFT on qwen2-vl with two datasets. Because the data volume is large, processing it directly stalls at the tokenizer stage for a very long time (7h+), so, following other issues, I added
--streaming True \ --buffer_size 128 \ --preprocessing_batch_size 128 \ --dispatch_batches False \
However, training then raises an error:
raise ValueError("Please specify max_steps in streaming mode.")

So is max_steps mandatory here?
And how should I calculate this value, especially when using two datasets?

@github-actions github-actions bot added the pending This problem is yet to be addressed label Jan 9, 2025
@hiyouga
Owner

hiyouga commented Jan 9, 2025

max_steps = num_examples * num_epochs // (per_device_train_batch_size * gradient_accumulation_steps * num_gpus)
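As a rough illustration of how two datasets factor in: their example counts simply add up before applying the formula. A minimal sketch of the arithmetic, using made-up dataset sizes and a made-up parallel setup (none of these numbers come from this issue):

# Hypothetical values for illustration only; substitute your real counts.
num_examples = 800_000 + 200_000       # combined size of XXX_img and XXX_noimg
num_epochs = 1
per_device_train_batch_size = 4
gradient_accumulation_steps = 8
num_gpus = 16

effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_gpus   # 512
max_steps = num_examples * num_epochs // effective_batch_size
print(max_steps)   # 1_000_000 // 512 = 1953

The resulting value is then passed to the training command via --max_steps.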

@hiyouga hiyouga closed this as completed Jan 9, 2025
@hiyouga hiyouga added solved This problem has been already solved and removed pending This problem is yet to be addressed labels Jan 9, 2025