
Pytorch 2.6+rocm6.2.4 #337

Open · Delaunay wants to merge 8 commits into staging from pytorch_2.6
Conversation

@Delaunay (Collaborator)

No description provided.

@Delaunay (Collaborator, Author)
=================
Benchmark results
=================

System
------
cpu:      AMD EPYC 9575F 64-Core Processor
n_cpu:    256
product:  AMD Instinct MI325 OAM
n_gpu:    8
memory:   262128

Breakdown
---------
bench                    | fail |   n | ngpu |           perf |   sem% |   std% | peak_memory |           score | weight
brax                     |    1 |   1 |    8 |            nan |   nan% |   nan% |         nan |             nan |   1.00
diffusion-gpus           |    0 |   1 |    8 |         263.76 |   3.5% |  26.5% |       63289 |          263.76 |   1.00
diffusion-single         |    0 |   8 |    1 |          45.85 |   0.6% |  13.1% |       58942 |          371.97 |   0.00
dimenet                  |    8 |   8 |    1 |            nan |   nan% |   nan% |         nan |             nan |   1.00
dinov2-giant-gpus        |    1 |   1 |    8 |            nan |   nan% |   nan% |       25115 |             nan |   1.00
dinov2-giant-single      |    8 |   8 |    1 |            nan |   nan% |   nan% |       56759 |             nan |   0.00
dqn                      |    0 |   8 |    1 | 28721415016.72 |   1.3% |  99.3% |        2315 | 229386568802.49 |   0.00
bf16                     |    0 |   8 |    1 |         897.88 |   0.4% |  10.9% |        2611 |         7252.73 |   0.00
fp16                     |    0 |   8 |    1 |         891.03 |   0.4% |  10.3% |        2745 |         7196.57 |   0.00
fp32                     |    0 |   8 |    1 |         151.92 |   0.4% |   9.9% |        3129 |         1226.94 |   0.00
tf32                     |    0 |   8 |    1 |         151.98 |   0.4% |   9.9% |        3129 |         1227.42 |   0.00
bert-fp16                |    0 |   8 |    1 |         440.56 |   1.1% |  17.1% |       17394 |         3623.88 |   0.00
bert-fp32                |    0 |   8 |    1 |         173.22 |   1.1% |  16.6% |       23764 |         1424.65 |   0.00
bert-tf32                |    0 |   8 |    1 |         173.25 |   1.1% |  17.0% |       23764 |         1424.82 |   0.00
bert-tf32-fp16           |    0 |   8 |    1 |         442.41 |   1.1% |  17.0% |       17394 |         3639.61 |   1.00
reformer                 |    0 |   8 |    1 |          89.07 |   0.5% |  11.4% |       14673 |          722.29 |   1.00
t5                       |    0 |   8 |    1 |          95.80 |   0.5% |  11.9% |       35725 |          777.15 |   0.00
whisper                  |    0 |   8 |    1 |         593.22 |   1.3% |  27.9% |       10791 |         4786.90 |   0.00
lightning                |    0 |   8 |    1 |         758.23 |   0.4% |  10.6% |       27150 |         6115.80 |   0.00
lightning-gpus           |    0 |   1 |    8 |        4881.93 |   1.9% |  19.2% |       47046 |         4881.93 |   1.00
llava-single             |    0 |   8 |    1 |           3.61 |   0.5% |  12.3% |       73001 |           29.21 |   1.00
llama                    |    0 |   8 |    1 |         603.95 |   5.0% |  90.4% |       28844 |         4549.66 |   1.00
llm-full-mp-gpus         |    1 |   1 |    8 |            nan |   nan% |   nan% |         nan |             nan |   1.00
llm-lora-ddp-gpus        |    0 |   1 |    8 |       31976.20 |   0.7% |   3.7% |       32863 |        31976.20 |   1.00
llm-lora-mp-gpus         |    1 |   1 |    8 |            nan |   nan% |   nan% |         nan |             nan |   1.00
llm-lora-single          |    0 |   8 |    1 |        6018.13 |   0.7% |  10.6% |       32670 |        48889.42 |   1.00
pna                      |    8 |   8 |    1 |            nan |   nan% |   nan% |         nan |             nan |   1.00
ppo                      |    0 |   8 |    1 |     7480260.93 |   0.5% |  57.8% |        2307 |     59841698.04 |   1.00
recursiongfn             |    0 |   8 |    1 |       12500.20 |   1.3% |  29.6% |        8668 |       100611.94 |   1.00
rlhf-gpus                |    0 |   1 |    8 |       28051.41 |   1.3% |   7.2% |      127147 |        28051.41 |   0.00
rlhf-single              |    0 |   8 |    1 |        3473.90 |   0.6% |  14.8% |      136327 |        28009.42 |   1.00
focalnet                 |    0 |   8 |    1 |         252.13 |   0.6% |  12.6% |       27362 |         2042.82 |   0.00
torchatari               |    0 |   8 |    1 |        4161.54 |   0.4% |   8.5% |        5333 |        33559.31 |   1.00
convnext_large-fp16      |    0 |   8 |    1 |         380.03 |   1.1% |  17.1% |       30951 |         3124.56 |   0.00
convnext_large-fp32      |    0 |   8 |    1 |         207.91 |   1.1% |  16.5% |       53501 |         1708.65 |   0.00
convnext_large-tf32      |    0 |   8 |    1 |         207.72 |   1.0% |  15.9% |       54677 |         1705.06 |   0.00
convnext_large-tf32-fp16 |    0 |   8 |    1 |         372.98 |   1.1% |  17.6% |       29542 |         3061.48 |   1.00
regnet_y_128gf           |    0 |   8 |    1 |         162.62 |   0.6% |  13.4% |       32699 |         1319.27 |   1.00
resnet152-ddp-gpus       |    0 |   1 |    8 |        5578.99 |   1.0% |   7.3% |       32244 |         5578.99 |   0.00
resnet50                 |    0 |   8 |    1 |        1573.27 |   1.2% |  26.7% |       21314 |        12708.11 |   1.00
resnet50-noio            |    0 |   8 |    1 |        2405.55 |   0.1% |   5.9% |       30159 |        19277.66 |   0.00
vjepa-gpus               |    0 |   1 |    8 |         163.44 |   3.5% |  27.6% |       73510 |          163.44 |   1.00
vjepa-single             |    0 |   8 |    1 |          25.59 |   0.8% |  17.3% |       65179 |          206.62 |   1.00

Scores
------
Failure rate:      10.22% (FAIL)
Score:             597.83

Errors
------
28 errors, details in HTML report.
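A note on the arithmetic: the reported failure rate appears to be errors divided by total runs, not failed benches divided by bench count. Counting the n column above gives 33 benches with n=8 plus 10 benches with n=1, i.e. 274 runs, and 28 / 274 ≈ 10.22%, matching the report. A minimal sketch, assuming that interpretation:

```python
# Failure rate as reported above, assuming it is errors / total runs.
# 274 is the sum of the "n" column in the Breakdown table
# (33 benches with n=8 plus 10 benches with n=1).
total_runs = 33 * 8 + 10 * 1  # = 274
errors = 28                   # reported error count
print(f"Failure rate: {errors / total_runs * 100:.2f}%")  # -> 10.22%
```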

@Delaunay (Collaborator, Author)
~/rocm$ du -h -d 2 .
23G     ./results/cache
29M     ./results/runs
314G    ./results/data
2.8M    ./results/extra
20G     ./results/venv
356G    ./results
176K    ./env/bin
24K     ./env/include
20K     ./env/share
18G     ./env/lib
18G     ./env
373G    .

@Delaunay (Collaborator, Author)
Breakdown
---------
bench                    | fail |   n | ngpu |           perf |   sem% |   std% | peak_memory |           score | weight
diffusion-gpus           |    0 |   1 |    8 |         359.57 |   1.4% |  11.0% |      145712 |          359.57 |   1.00
diffusion-single         |    0 |   8 |    1 |          49.40 |   0.6% |  13.1% |      130216 |          400.81 |   0.00
dimenet                  |    0 |   8 |    1 |         787.28 |   0.7% |  14.6% |       11888 |         6375.55 |   1.00
dqn                      |    0 |   8 |    1 | 28519310039.61 |   1.3% |  99.8% |        2315 | 227759976518.32 |   0.00
bf16                     |    0 |   8 |    1 |         899.16 |   0.4% |  10.8% |        2669 |         7263.06 |   0.00
fp16                     |    0 |   8 |    1 |         891.75 |   0.4% |  10.3% |        2745 |         7202.50 |   0.00
fp32                     |    0 |   8 |    1 |         152.03 |   0.4% |   9.9% |        3129 |         1227.78 |   0.00
tf32                     |    0 |   8 |    1 |         151.85 |   0.4% |   9.9% |        3129 |         1226.34 |   0.00
bert-fp16                |    0 |   8 |    1 |         572.05 |   1.0% |  15.4% |      157138 |         4694.99 |   0.00
bert-fp32                |    0 |   8 |    1 |         185.43 |   0.9% |  14.5% |      162416 |         1516.49 |   0.00
bert-tf32                |    0 |   8 |    1 |         186.18 |   0.9% |  14.3% |      162416 |         1522.73 |   0.00
bert-tf32-fp16           |    0 |   8 |    1 |         563.11 |   1.0% |  16.3% |      157138 |         4619.18 |   1.00
reformer                 |    0 |   8 |    1 |          91.55 |   0.3% |   7.4% |      122113 |          738.83 |   1.00
t5                       |    0 |   8 |    1 |         115.39 |   0.5% |  10.3% |      245888 |          934.37 |   0.00
whisper                  |    0 |   8 |    1 |         751.23 |   1.0% |  21.5% |       72295 |         6062.65 |   0.00
lightning                |    0 |   8 |    1 |         761.78 |   0.3% |   9.8% |       52648 |         6147.26 |   0.00
lightning-gpus           |    0 |   1 |    8 |        5367.16 |   1.0% |   9.7% |       81900 |         5367.16 |   1.00
llava-single             |    0 |   8 |    1 |           3.67 |   0.5% |  11.9% |       72991 |           29.74 |   1.00
llama                    |    0 |   8 |    1 |         612.81 |   5.0% |  89.6% |       28844 |         4615.09 |   1.00
llm-full-mp-gpus         |    0 |   1 |    8 |        4985.04 |   2.3% |  12.0% |      138868 |         4985.04 |   1.00
llm-lora-ddp-gpus        |    0 |   1 |    8 |       47545.43 |   1.0% |   3.9% |      147880 |        47545.43 |   1.00
llm-lora-mp-gpus         |    0 |   1 |    8 |        5781.31 |   2.0% |  10.5% |      236748 |         5781.31 |   1.00
llm-lora-single          |    0 |   8 |    1 |        7524.39 |   0.2% |   2.6% |      179404 |        60400.81 |   1.00
pna                      |    0 |   8 |    1 |        9170.15 |   0.7% |  14.8% |       79986 |        74186.69 |   1.00
ppo                      |    0 |   8 |    1 |     7496883.69 |   0.5% |  58.0% |        2307 |     59974873.56 |   1.00
recursiongfn             |    0 |   8 |    1 |       14182.57 |   1.3% |  27.7% |       20000 |       114231.83 |   1.00
rlhf-gpus                |    0 |   1 |    8 |       25193.45 |   2.3% |  12.2% |       41117 |        25193.45 |   0.00
rlhf-single              |    0 |   8 |    1 |        3456.00 |   0.6% |  15.1% |      135695 |        27855.61 |   1.00
focalnet                 |    0 |   8 |    1 |         266.77 |   0.5% |  11.8% |       47849 |         2162.24 |   0.00
torchatari               |    0 |   8 |    1 |        4282.77 |   0.4% |   8.3% |        4997 |        34536.01 |   1.00
convnext_large-fp16      |    0 |   8 |    1 |         404.97 |   1.1% |  17.0% |      128050 |         3329.95 |   0.00
convnext_large-fp32      |    0 |   8 |    1 |         212.31 |   1.1% |  16.6% |      119747 |         1745.54 |   0.00
convnext_large-tf32      |    0 |   8 |    1 |         214.31 |   1.1% |  16.5% |      116787 |         1762.36 |   0.00
convnext_large-tf32-fp16 |    0 |   8 |    1 |         407.50 |   1.1% |  16.9% |      129800 |         3351.39 |   1.00
regnet_y_128gf           |    0 |   8 |    1 |         164.73 |   0.6% |  13.0% |       35155 |         1336.60 |   1.00
resnet152-ddp-gpus       |    0 |   1 |    8 |        5520.13 |   1.0% |   7.3% |       95352 |         5520.13 |   0.00
resnet50                 |    0 |   8 |    1 |        2202.10 |   0.7% |  14.7% |      116265 |        17850.11 |   1.00
resnet50-noio            |    8 |   8 |    1 |            nan |   nan% |   nan% |      256770 |             nan |   0.00
vjepa-gpus               |    0 |   1 |    8 |         167.46 |   4.6% |  36.1% |       93443 |          167.46 |   1.00
vjepa-single             |    0 |   8 |    1 |          33.00 |   1.3% |  29.1% |      222344 |          265.89 |   1.00

Scores
------
Failure rate:       3.03% (FAIL)
Score:            7150.17

Errors
------
8 errors, details in HTML report.
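For a quick bench-by-bench comparison of the two reports, a minimal sketch (the perf values are transcribed by hand from the two Breakdown tables above, not pulled from any milabench API); resnet50, for instance, improved by roughly 40% between the runs:

```python
# Compare per-bench throughput between the two runs above.
# Values are copied manually from the two "Breakdown" tables.
run1 = {"resnet50": 1573.27, "lightning-gpus": 4881.93, "diffusion-gpus": 263.76}
run2 = {"resnet50": 2202.10, "lightning-gpus": 5367.16, "diffusion-gpus": 359.57}

for bench, before in run1.items():
    after = run2[bench]
    print(f"{bench:20s} {before:8.2f} -> {after:8.2f} ({(after / before - 1) * 100:+.1f}%)")
```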

@Delaunay force-pushed the pytorch_2.6 branch 2 times, most recently from 32e1966 to 4732075 on February 28, 2025.
@Delaunay changed the title from "Pytorch 2.6" to "Pytorch 2.6+rocm6.2.4" on Feb 28, 2025.