`run_training` not correctly respecting `process_data: False` #391

JamesKunstle · 2025-01-15T00:46:01Z

When building the arguments for the `torchrun` subprocess in `main_ds.py::run_training`, we set the data path to the following:

`f"--data_path={train_args.data_output_dir}/data.jsonl",`

This assumes that we preprocessed the input data file and wrote the result to `data.jsonl`.
However, this isn't the case if we've set `process_data=False` in `TrainArgs`. In that case, we should honor the `data_path` argument as the input `.jsonl` data file.
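
One possible shape for the fix, as a minimal sketch rather than the actual patch: pick the path based on `process_data` before building the `torchrun` argument. The `TrainArgs` class here is a stand-in holding only the three fields mentioned above, and `resolve_data_path` and `data_path_flag` are hypothetical helper names that do not exist in the repository.

```python
from dataclasses import dataclass


@dataclass
class TrainArgs:
    # Stand-in for the real TrainArgs; only the fields this issue mentions.
    data_path: str
    data_output_dir: str
    process_data: bool = True


def resolve_data_path(train_args: TrainArgs) -> str:
    """Hypothetical helper: choose the file the training subprocess should read."""
    if train_args.process_data:
        # Preprocessing ran, so its output file is the training input.
        return f"{train_args.data_output_dir}/data.jsonl"
    # Preprocessing was skipped, so use the user-supplied file as-is.
    return train_args.data_path


def data_path_flag(train_args: TrainArgs) -> str:
    """Hypothetical helper: build the --data_path argument for the subprocess."""
    return f"--data_path={resolve_data_path(train_args)}"
```

With this, `process_data=False` passes `data_path` through unchanged, while the default behavior still points at the preprocessed `data.jsonl`.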
