
rkarhila-amd/mila_rocm_docker


Run tests in the prebuilt container:

Clone this repository with submodules:

git clone --recurse-submodules [email protected]:rkarhila-amd/mila_rocm_docker.git

A branch of milabench called pytorch2.5 is included as a submodule, and you'll need to make sure the milabench code really is on that branch. (I think I messed up royally setting it up.) Because this is still in development, this directory will be mounted inside the running container.
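To check (and, if needed, fix) the submodule branch, something like this should do -- a minimal sketch, assuming the submodule is checked out at ./milabench:

git -C milabench status                 # should report "On branch pytorch2.5"
git -C milabench checkout pytorch2.5    # if it is detached or on another branch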

Then put your Hugging Face token in the marked place in the run script scripts/run_in_rocm_container.sh.
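Hugging Face tooling reads the token from the HF_TOKEN environment variable by default, so the marked entry will look roughly like this (the exact variable name used in the script may differ):

export HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxx    # replace with your own token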

Check the mounts in milabench_container_wrapper.sh and edit them to your liking; otherwise the script will mount the ./results directory into the container. The virtual envs stay inside the container, but the data and model cache, as well as the results, are written to the mounted directory on the host.
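For reference, the wrapper uses the usual docker bind-mount pattern; a hedged sketch of the mount it sets up (the image tag and container-side path are illustrative -- the wrapper defines the real ones):

# host ./results receives the data, the model cache and the benchmark results
docker run --rm -it \
  -v "$(pwd)/results:/results" \
  mila-rocm-docker:torch2.5.1-mi250mi300-2024-11-09_patched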

Then run your tests with wrapper script. For example, to run llama:

./milabench_container_wrapper.sh --select llama
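Assuming the wrapper forwards --select to milabench, any other benchmark name from the tables below can be selected the same way, for example:

./milabench_container_wrapper.sh --select llm-lora-single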

Build the container:

Clone the repo with submodules:

git clone --recurse-submodules [email protected]:rkarhila-amd/mila_rocm_docker.git

If your milabench submodule is not on the pytorch2.5 branch, do git submodule update --remote or some other operation to put it on the correct branch.

To build:

docker/build_and_push_mi250mi300_torch251_image.bash

Patch it with the latest test code:

docker build -f docker/Dockerfile-rocm-mi250mi300_torch251_patch -t mila-rocm-docker:torch2.5.1-mi250mi300-2024-11-09_patched .

This will take some time.

While waiting for it to finish, put your Hugging Face token in the marked place in the run script scripts/run_in_rocm_container.sh.
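Once the build has finished, the patched image should show up in your local image list:

docker images mila-rocm-docker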

Change the wrapper script ./milabench_container_wrapper.sh to use your brand new image (see the sketch below), and then run your tests with the wrapper script. For example, to run llama:

./milabench_container_wrapper.sh --select llama

But don't run it just yet: check the mounts in the wrapper; they tell you where your results will be written.
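As for pointing the wrapper at the new image, a hedged sketch of the edit, assuming the image tag is kept in a variable near the top of the script (the actual variable name may differ):

# in milabench_container_wrapper.sh -- hypothetical variable name
IMAGE=mila-rocm-docker:torch2.5.1-mi250mi300-2024-11-09_patched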

Status on 2024 Nov 15th

Runs now the "llm-*" tests om MI300:

bench                    | fail |   n | ngpu |       perf |   sem% |   std% | peak_memory |      score | weight
llama                    |    2 |  10 |    1 |     595.02 |   5.0% |  90.1% |       28777 |    3602.48 |   1.00
llm-full-mp-gpus         |    6 |   7 |    8 |     389.01 |   2.5% |  13.4% |       36512 |      55.57 |   1.00
llm-full-mp-nodes        |    2 |   2 |    0 |        nan |   nan% |   nan% |         nan |        nan |   1.00
llm-lora-ddp-gpus        |    4 |   5 |    8 |   29023.93 |   0.8% |   4.4% |       88865 |    5804.79 |   1.00
llm-lora-ddp-nodes       |    2 |   2 |    0 |        nan |   nan% |   nan% |         nan |        nan |   1.00
llm-lora-mp-gpus         |    4 |   5 |    8 |    3437.24 |   2.3% |  12.3% |       59343 |     687.45 |   1.00
llm-lora-single          |   11 |  19 |    1 |    5625.20 |   1.0% |  15.6% |       72048 |   19425.03 |   1.00

Lots of other tests are still broken. This is probably a problem with the container rather than with the code updates; it will be investigated next.

Status on 2024 Nov 14th

Some packages have been updated from original dependencies:

pytorch-triton-rocm     3.0.0          => 3.1.0
sympy                   1.13.3         => 1.13.1
torch                   2.4.1+rocm6.1  => 2.5.1+rocm6.1
torchao                 0.3.1          => 0.6.1
torchtune               0.2.1          => e1caa9f82fea24d728f9b244a9dd1957f5ed7465 
                                          (github commit from Nov 10th 2024)
torchvision             0.19.1+rocm6.1 => 0.19.1+rocm6.1

This forced a rewrite of the llm test recipes.
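To double-check the installed versions inside the container's environment, a quick way (a minimal sketch):

# prints the installed torch / triton / sympy packages and their versions
pip list 2>/dev/null | grep -E 'torch|triton|sympy'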

Tested on 8xMI250. Some tasks fail because they depend on CUDA components not available for ROCm; others fail because they run out of memory on the GPU.

=================
Benchmark results
=================
System
------
cpu:      AMD EPYC 7713 64-Core Processor
n_cpu:    128
product:  AMD INSTINCT MI250 (MCM) OAM AC MBA
n_gpu:    8
memory:   65520

Breakdown
---------
bench                    | fail |   n | ngpu |       perf |   sem% |   std% | peak_memory |      score | weight
brax                     |    0 |   1 |    8 |   99552.42 |   1.8% |  13.6% |        1442 |   99552.42 |   1.00
diffusion-gpus           |    0 |   1 |    8 |      99.35 |   0.0% |   0.3% |       59368 |      99.35 |   1.00
diffusion-single         |    0 |   8 |    1 |      12.69 |   0.5% |  11.9% |       55583 |     102.99 |   1.00
dimenet                  |    8 |   8 |    1 |        nan |   nan% |   nan% |         nan |        nan |   1.00
dinov2-giant-gpus        |    1 |   1 |    8 |        nan |   nan% |   nan% |         nan |        nan |   1.00
dinov2-giant-single      |    8 |   8 |    1 |        nan |   nan% |   nan% |         nan |        nan |   1.00
dqn                      |    8 |   8 |    1 |        nan |   nan% |   nan% |         nan |        nan |   1.00
bf16                     |    0 |   8 |    1 |     147.62 |   0.4% |   9.9% |        1172 |    1192.48 |   0.00
fp16                     |    0 |   8 |    1 |     149.77 |   0.4% |   9.7% |        1172 |    1209.76 |   0.00
fp32                     |    0 |   8 |    1 |      38.73 |   0.4% |   9.7% |        4916 |     312.85 |   0.00
tf32                     |    0 |   8 |    1 |      38.74 |   0.4% |   9.7% |        1556 |     312.92 |   0.00
bert-fp16                |    0 |   8 |    1 |      48.56 |   2.6% |  40.6% |       52254 |     390.44 |   0.00
bert-fp32                |    0 |   8 |    1 |      28.96 |   2.2% |  33.9% |       57366 |     234.77 |   0.00
bert-tf32                |    0 |   8 |    1 |      29.49 |   2.3% |  36.0% |       57367 |     237.96 |   0.00
bert-tf32-fp16           |    0 |   8 |    1 |      57.64 |   3.2% |  50.5% |       52252 |     465.45 |   3.00
reformer                 |    0 |   8 |    1 |      27.81 |   0.5% |  10.9% |       24937 |     225.42 |   1.00
t5                       |    1 |   8 |    1 |      29.46 |   0.6% |  12.0% |       65487 |     182.92 |   2.00
whisper                  |    0 |   8 |    1 |     245.72 |   0.5% |  12.0% |        9979 |    1993.69 |   1.00
lightning                |    0 |   8 |    1 |     520.91 |   0.3% |   9.8% |       27272 |    4202.85 |   1.00
lightning-gpus           |    0 |   1 |    8 |    4061.71 |   0.3% |   3.2% |       27698 |    4061.71 |   1.00
llava-single             |    8 |   8 |    1 |        nan |   nan% |   nan% |         nan |        nan |   1.00
llama                    |    0 |   8 |    1 |     249.82 |   5.0% |  90.0% |       33956 |    1882.89 |   1.00
llm-full-mp-gpus         |    1 |   1 |    8 |        nan |   nan% |   nan% |         nan |        nan |   1.00
llm-lora-ddp-gpus        |    0 |   1 |    8 |   10680.84 |   0.7% |   3.8% |       63926 |   10680.84 |   1.00
llm-lora-mp-gpus         |    1 |   1 |    8 |        nan |   nan% |   nan% |         nan |        nan |   1.00
llm-lora-single          |    0 |   8 |    1 |    1975.04 |   0.7% |  11.1% |       62298 |   16097.93 |   1.00
pna                      |    8 |   8 |    1 |        nan |   nan% |   nan% |         nan |        nan |   1.00
ppo                      |    8 |   8 |    1 |        nan |   nan% |   nan% |         nan |        nan |   1.00
recursiongfn             |    8 |   8 |    1 |        nan |   nan% |   nan% |         nan |        nan |   1.00
rlhf-gpus                |    0 |   1 |    8 |   14157.26 |   0.3% |   1.8% |       25877 |   14157.26 |   1.00
rlhf-single              |    0 |   8 |    1 |    1886.36 |   0.5% |  10.7% |       29055 |   15255.09 |   1.00
torchatari               |    0 |   8 |    1 |    2180.08 |   0.4% |   9.6% |        3162 |   17616.78 |   1.00
convnext_large-fp16      |    0 |   8 |    1 |      85.51 |   1.0% |  16.1% |       27648 |     702.94 |   0.00
convnext_large-fp32      |    8 |   8 |    1 |        nan |   nan% |   nan% |       53705 |        nan |   0.00
convnext_large-tf32      |    8 |   8 |    1 |        nan |   nan% |   nan% |       43726 |        nan |   0.00
convnext_large-tf32-fp16 |    0 |   8 |    1 |      85.48 |   1.0% |  16.1% |       39027 |     702.69 |   3.00
regnet_y_128gf           |    0 |   8 |    1 |      23.67 |   2.2% |  48.3% |       65404 |     190.05 |   2.00
resnet152-ddp-gpus       |    0 |   1 |    8 |    3343.14 |   0.0% |   0.2% |       36638 |    3343.14 |   0.00
resnet50                 |    0 |   8 |    1 |     938.07 |   2.0% |  43.8% |       14612 |    7585.17 |   1.00
resnet50-noio            |    7 |   8 |    1 |     808.70 |   0.6% |  18.6% |       65509 |     101.09 |   0.00
vjepa-gpus               |    1 |   1 |    8 |        nan |   nan% |   nan% |         nan |        nan |   1.00
vjepa-single             |    8 |   8 |    1 |        nan |   nan% |   nan% |         nan |        nan |   1.00

Questions? Reach out to [email protected].
