run_training not correctly respecting process_data: False #391

Open
JamesKunstle opened this issue Jan 15, 2025 · 0 comments

When setting up the arguments to subprocess torchrun in main_ds.py::run_training, we set the data path to the following:

f"--data_path={train_args.data_output_dir}/data.jsonl",

This assumes that we preprocessed the input data file and wrote the result to data.jsonl.
However, that isn't the case if process_data=False is set in TrainArgs. In that case, we should honor the data_path argument as the input .jsonl data file.
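
One possible shape for the fix, as a minimal sketch rather than the actual patch: pick the path based on process_data before building the torchrun arguments. The TrainArgs class below is a simplified stand-in holding only the three fields mentioned in this issue, and resolve_data_path is a hypothetical helper name, not something that exists in main_ds.py.

```python
from dataclasses import dataclass


@dataclass
class TrainArgs:
    # Simplified stand-in for the real TrainArgs; only the fields
    # discussed in this issue are modeled here.
    data_path: str
    data_output_dir: str
    process_data: bool = True


def resolve_data_path(train_args: TrainArgs) -> str:
    """Return the .jsonl file that the training subprocess should read.

    Hypothetical helper: when preprocessing ran, the processed output is
    assumed to be at <data_output_dir>/data.jsonl; otherwise the
    user-supplied data_path is used unchanged.
    """
    if train_args.process_data:
        return f"{train_args.data_output_dir}/data.jsonl"
    return train_args.data_path


def data_path_flag(train_args: TrainArgs) -> str:
    # Builds the same style of argument that run_training passes to torchrun.
    return f"--data_path={resolve_data_path(train_args)}"
```

With something like this in place, the argument list in run_training could use data_path_flag(train_args) instead of the hard-coded f-string, so a pre-processed file passed via data_path is honored when process_data=False.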
