PyTorch 2.5 & torchtune 0.3+ #315

Open

wants to merge 43 commits into master

Changes from 1 commit

Commits (43)
0fa1419
Started instrumenting recipes from newer torchtune for milabench
rkarhila-amd Nov 12, 2024
3d83577
This specific torchtune version requires a roundabout way of importin…
rkarhila-amd Nov 12, 2024
37b35f7
Updated recipes and configs
rkarhila-amd Nov 14, 2024
87b987f
file left out from previous commit + conf typo fix
rkarhila-amd Nov 15, 2024
b6cf6be
Merge branch 'master' of github.com:mila-iqia/milabench into pytorch2.5
Nov 22, 2024
d47751c
Update dockerfile
Jan 14, 2025
5020932
-
Jan 15, 2025
7936166
Delete benchmarks/geo_gnn/bad.txt
Delaunay Jan 15, 2025
40ff390
Update base.yaml
Delaunay Jan 15, 2025
feb9cca
update dependencies to torch 2.5
Jan 15, 2025
35cdcfa
Add shared setup
Jan 15, 2025
2b2bcb2
Merge branch 'docker' of github.com:mila-iqia/milabench into docker
Jan 15, 2025
a0293eb
Update torchtune and pytorch
Jan 16, 2025
1340e16
Merge branch 'docker' of github.com:mila-iqia/milabench into staging
Jan 16, 2025
776e3e1
Update LLM benchmarks
Jan 16, 2025
df7d8a1
use python 3.10
Jan 17, 2025
cf751f7
Add utility to help launch milabench with docker
Jan 17, 2025
d73af7d
Make torchrun use docker in multinode
Jan 17, 2025
40c35bd
Add docker to ForeachNode
Jan 17, 2025
3f860c7
Add documentation for docker + multinode
Jan 17, 2025
68cc940
Disable GPU warden on prepare
Jan 21, 2025
b2e4cc2
Maximise build space
Jan 21, 2025
1c538b0
Add missing dependencies
Jan 21, 2025
f035e5b
Increase root system size
Jan 21, 2025
0710fff
Add to avoid flooding journald
Jan 21, 2025
8cad4a2
Fix dataset path for vjepa
Jan 22, 2025
a784485
Update llm-lora-ddp-gpus
Jan 22, 2025
684e894
Update llm-lora-ddp-gpus
Jan 22, 2025
d02a574
Update llm-lora-ddp-gpus
Jan 22, 2025
2091a16
Fix rlhf-gpus
Jan 22, 2025
f67e5de
Update llava model
Jan 22, 2025
a1a9a06
Fix llm-lora-ddp-gpus
Jan 22, 2025
612a8c8
Fix llm-lora-single
Jan 22, 2025
130a131
Update llm-full-mp-gpus
Jan 22, 2025
5b4fe16
Remove dataset.pack
Jan 22, 2025
ad2f3e3
update batch resizing logic
Jan 23, 2025
316fdfa
Remove the process monitor from the GPU monitor
Jan 23, 2025
60843ba
Add channel last to resnet50
Jan 23, 2025
61ca0a3
Set right version for cantilever
Feb 4, 2025
ea605fc
Update sizer.py
Delaunay Feb 5, 2025
3249edf
Update recipes.rst
Delaunay Feb 6, 2025
2b16586
MI325 - rocm 6.2 (#331)
Delaunay Feb 26, 2025
ab5e50b
Add L40S config and H100 config
Mar 10, 2025
file left out from previous commit + conf typo fix
rkarhila-amd committed Nov 15, 2024
commit 87b987f66e739fb1d8289a78d915083d53420aa4
12 changes: 12 additions & 0 deletions benchmarks/llm/recipes/full_finetune_distributed.py
@@ -1,3 +1,15 @@
+#!/usr/bin/env python3
+
+# As of November 2024, the development of torchtune is very rapid.
+# This recipe is based on the torchtune recipe at git commit e137afe (post release 0.3.1)
+# https://github.com/pytorch/torchtune/blob/7bfb3336446f0d874ab5d4595249839b735b7076/recipes/lora_finetune_distributed.py
+
+# Torchtune 0.2.1 recipe with device instrumentation (c) Mila
+# https://github.com/mila-iqia/milabench/blob/a60a3aae21e87e46bcce403620a3f56c12878554/benchmarks/llm/recipes/full_finetune_distributed.py
+
+# The instrumentation edits (c) AMD
+
+
 # Copyright (c) Meta Platforms, Inc. and affiliates.
 # All rights reserved.
 #
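The header refers to the "roundabout way of importing" mentioned in commit 3d83577: the recipe is addressed as a file on disk rather than as an installed torchtune module. A minimal sketch of one way such a file-path import can be done; `load_recipe` and the path below are illustrative assumptions, not milabench's actual code:

```python
# Hypothetical sketch: load a recipe file as a module when it cannot be
# imported from an installed package. Names here are illustrative only.
import importlib.util
from pathlib import Path


def load_recipe(path: str):
    """Execute a recipe file from disk and return it as a module object."""
    spec = importlib.util.spec_from_file_location(Path(path).stem, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the recipe's top-level code
    return module


recipe = load_recipe("tuneworkaroundrecipes/full_finetune_distributed.py")
```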
2 changes: 1 addition & 1 deletion config/base.yaml
@@ -652,7 +652,7 @@ llm-full-mp-gpus:
 
   argv:
     #"{milabench_code}/recipes/full_finetune_distributed.py": true
-    tuneworkaroundrecipes.full_finetune_distributed: true
+    tuneworkaroundrecipes/full_finetune_distributed.py: true
     --config: "{milabench_code}/configs/llama3_70B_full.yaml"
     epochs=1: true
     output_dir={milabench_extra}/output: true
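In this `argv` mapping, entries with a `true` value appear to be passed through as bare command-line tokens, while other entries become key/value pairs. A short sketch of that expansion under those assumed semantics (not milabench's actual implementation):

```python
# Hypothetical sketch (assumed semantics, not milabench's actual code):
# flatten an argv mapping like the one above into a command line.
def flatten_argv(argv: dict) -> list[str]:
    args = []
    for key, value in argv.items():
        if value is True:            # bare positional argument or flag
            args.append(key)
        elif value is not False:     # key followed by its value
            args.extend([key, str(value)])
    return args


argv = {
    "tuneworkaroundrecipes/full_finetune_distributed.py": True,
    "--config": "configs/llama3_70B_full.yaml",  # placeholder path
    "epochs=1": True,
    "output_dir=output": True,
}
print(flatten_argv(argv))
# ['tuneworkaroundrecipes/full_finetune_distributed.py',
#  '--config', 'configs/llama3_70B_full.yaml', 'epochs=1', 'output_dir=output']
```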