Enable Memory Profiling

Launch a training job with the following command (alternatively, set the same options in the TOML config file):

CONFIG_FILE="./train_configs/debug_model.toml" ./run_llama_train.sh --profiling.enable_memory_snapshot --profiling.save_memory_snapshot_folder memory_snapshot
  • --profiling.enable_memory_snapshot: enables memory profiling
  • --profiling.save_memory_snapshot_folder: sets the folder that memory snapshots are dumped into (./outputs/memory_snapshot/ by default)
    • If the job runs out of memory (OOM), the snapshot is written to ./outputs/memory_snapshot/iteration_x_exit.
    • Regular snapshots (taken every profiling.profile_freq iterations) are written to ./outputs/memory_snapshot/iteration_x.
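After a run, it can help to see which snapshots were written and which of them came from an OOM exit. The following is a minimal sketch that relies only on the folder naming described above (`iteration_x` and `iteration_x_exit`); the helper name `list_snapshot_dirs` is hypothetical and not part of the training code.

```python
import re
from pathlib import Path

# Assumed layout, taken from the description above:
#   <snapshot_root>/iteration_<N>       regular snapshot
#   <snapshot_root>/iteration_<N>_exit  snapshot dumped on OOM
_ITER_DIR = re.compile(r"^iteration_(\d+)(_exit)?$")


def list_snapshot_dirs(snapshot_root):
    """Return (iteration, is_oom_exit, path) tuples sorted by iteration.

    Hypothetical helper: it only inspects directory names and does not
    open or validate the snapshot files themselves.
    """
    found = []
    for entry in Path(snapshot_root).iterdir():
        match = _ITER_DIR.match(entry.name)
        if entry.is_dir() and match:
            iteration = int(match.group(1))
            is_oom_exit = match.group(2) is not None
            found.append((iteration, is_oom_exit, entry))
    # Sort numerically so iteration_10 comes after iteration_2.
    return sorted(found, key=lambda item: (item[0], item[1]))


if __name__ == "__main__":
    root = Path("./outputs/memory_snapshot")  # default location per the text
    if root.is_dir():
        for iteration, is_oom_exit, path in list_snapshot_dirs(root):
            kind = "OOM exit" if is_oom_exit else "regular"
            print(f"iteration {iteration:>6}  {kind:<8}  {path}")
```

Sorting on the parsed integer rather than the directory name avoids the lexicographic ordering in which `iteration_10` would precede `iteration_2`.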

You can find the saved pickle files in your output folder. To visualize a snapshot file, drag and drop it onto https://pytorch.org/memory_viz. To learn more about memory profiling, please visit this tutorial.
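Before uploading a snapshot to the viewer, a quick local check can confirm that the file is readable. The sketch below assumes the pickle contains plain Python containers (the text above only says the files are pickles, so the exact contents are an assumption) and reports the top-level structure without interpreting it. The function name `summarize_snapshot` is hypothetical.

```python
import pickle


def summarize_snapshot(pickle_path):
    """Load one snapshot pickle and describe its top-level structure.

    Hypothetical helper. Assumes the file holds plain Python containers;
    only unpickle files that you produced yourself or otherwise trust.
    """
    with open(pickle_path, "rb") as f:
        snapshot = pickle.load(f)

    summary = {"type": type(snapshot).__name__}
    if isinstance(snapshot, dict):
        # Record the key names and, for list-like values, how many items they hold.
        summary["keys"] = sorted(str(key) for key in snapshot)
        summary["lengths"] = {
            str(key): len(value)
            for key, value in snapshot.items()
            if isinstance(value, (list, tuple))
        }
    return summary
```

For example, `summarize_snapshot(path_to_pickle)` returns a small dictionary that can be printed to sanity-check a file before opening it in the visualizer.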