Reminder

System Info

deepspeed --hostfile=/hostfile_remote \
    src/train.py \
    --deepspeed $deepspeed_config \
    --stage sft \
    --do_train \
    --model_name_or_path $MODEL \
    --dataset XXX_img,XXX_noimg \
    --template qwen2_vl \
    --finetuning_type full \
    --streaming True \
    --buffer_size 128 \
    --preprocessing_batch_size 128 \
    --dispatch_batches False \
    --output_dir $output_dir \
    --overwrite_cache \
    --overwrite_output_dir True \
    --warmup_ratio 0.01 \
    --weight_decay 0.1 \
    --adam_beta2 0.95 \
    --per_device_train_batch_size $per_device_train_batch_size \
    --gradient_accumulation_steps $gradient_accumulation_steps \
    --ddp_timeout 9000 \
    --learning_rate 5e-6 \
    --lr_scheduler_type cosine \
    --logging_steps 1 \
    --cutoff_len 10240 \
    --save_steps 2000 \
    --save_total_limit 3 \
    --plot_loss \
    --num_train_epochs 1 \
    --report_to 'wandb' \
    --bf16 True \
    --tf32 True
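Since the num_gpus term in the formula further below depends on how many GPU slots /hostfile_remote declares, here is a minimal counting sketch, assuming the standard DeepSpeed hostfile format of one "hostname slots=N" entry per line:

```python
# Count total GPU slots across all hosts in the DeepSpeed hostfile.
# Assumes the usual format, e.g. "worker-1 slots=8"; comments and blank lines are skipped.
num_gpus = 0
with open("/hostfile_remote") as f:
    for line in f:
        line = line.strip()
        if line and not line.startswith("#"):
            num_gpus += int(line.split("slots=")[1])

print(num_gpus)  # total number of GPUs participating in training
```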
Reproduction

No reproduction needed.

Others

I am currently running SFT on Qwen2-VL with two datasets. Because the data volume is large, processing it directly gets stuck at the tokenization step for a very long time (7+ hours), so, following other issues, I added:

--streaming True \
--buffer_size 128 \
--preprocessing_batch_size 128 \
--dispatch_batches False \

However, training then fails with:

raise ValueError("Please specify `max_steps` in streaming mode.")

So is max_steps mandatory here? And how should I calculate its value, especially when two datasets are used?
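For background on why the error is raised, a minimal sketch (the JSON file name is a hypothetical stand-in for the real dataset): with --streaming True the data is loaded as an IterableDataset, which exposes no length, so the Trainer cannot work out how many optimizer steps the data contains and requires max_steps instead.

```python
from datasets import load_dataset

# streaming=True returns an IterableDataset instead of a map-style Dataset.
# "XXX_img.json" is a hypothetical file name used only for illustration.
stream = load_dataset("json", data_files="XXX_img.json", streaming=True)["train"]

print(type(stream).__name__)  # IterableDataset
try:
    len(stream)  # no __len__ -> the trainer cannot infer the step count
except TypeError as err:
    print(err)   # object of type 'IterableDataset' has no len()
```

The value therefore has to be supplied up front, and can be computed from the known dataset sizes and the effective batch size, as in the formula below.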
max_steps = num_examples * num_epochs // (per_device_train_batch_size * gradient_accumulation_steps * num_gpus)
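As a worked example of that formula (all numbers below are hypothetical placeholders, not values from this run), with two datasets num_examples is simply the combined number of training examples from both:

```python
# Hypothetical sizes and settings, for illustration only.
num_examples = 800_000 + 1_200_000    # XXX_img + XXX_noimg combined
num_epochs = 1                        # matches --num_train_epochs 1
per_device_train_batch_size = 4       # placeholder for $per_device_train_batch_size
gradient_accumulation_steps = 8       # placeholder for $gradient_accumulation_steps
num_gpus = 16                         # total slots from the hostfile

max_steps = num_examples * num_epochs // (
    per_device_train_batch_size * gradient_accumulation_steps * num_gpus
)
print(max_steps)  # 3906 with these numbers
```

The resulting number is then passed on the command line as --max_steps (a standard transformers TrainingArguments field); in streaming mode it takes over from --num_train_epochs as the stopping criterion, and the cosine scheduler decays over that many steps.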