debug: traces for compute engine
This adds strace output captured from both inside and outside
the container.

Signed-off-by: vsoch <[email protected]>
vsoch committed Sep 9, 2024
1 parent 93cb53b commit 24a1c01
Showing 6 changed files with 174,965 additions and 30 deletions.
35 changes: 35 additions & 0 deletions experiments/google/compute-engine/gpu/debug/README.md
@@ -0,0 +1,35 @@
# Debugging

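These traces come from running the OSU `osu_allreduce` benchmark with Flux (as the flux user) inside a Singularity container.
Each run was traced with two sets of `strace` options:

- `strace -f` (follow child processes)
- `strace -f -s 128` (also raise the maximum printed string length from the default of 32 to 128 characters)

Both variants were captured twice: once with `strace` wrapping the whole `flux run` command from outside the container, and once with `strace` wrapping only the benchmark binary inside the container.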

## Traces

### [flux-singularity-trace-f.txt](flux-singularity-trace-f.txt)

```bash
strace -f flux run -opmi=pmix --env OMPI_COMM_WORLD_LOCAL_RANK=0 -N 1 -n 8 -g 1 -o cpu-affinity=per-task -o gpu-affinity=per-task singularity exec --nv --bind /usr/local/cuda /opt/containers/metric-osu-gpu_google-gpu.sif /bin/bash -c "/opt/osu-benchmark/build.openmpi/mpi/collective/osu_allreduce -d cuda H H" 2> flux-singularity-trace-f.txt
```

### [flux-singularity-trace-s-f.txt](flux-singularity-trace-s-f.txt)

```bash
strace -f -s 128 flux run -opmi=pmix --env OMPI_COMM_WORLD_LOCAL_RANK=0 -N 1 -n 8 -g 1 -o cpu-affinity=per-task -o gpu-affinity=per-task singularity exec --nv --bind /usr/local/cuda /opt/containers/metric-osu-gpu_google-gpu.sif /bin/bash -c "/opt/osu-benchmark/build.openmpi/mpi/collective/osu_allreduce -d cuda H H" 2> flux-singularity-trace-s-f.txt
```

### [flux-singularity-inside-container-trace-f.txt](flux-singularity-inside-container-trace-f.txt)

```bash
flux run -opmi=pmix --env OMPI_COMM_WORLD_LOCAL_RANK=0 -N 1 -n 8 -g 1 -o cpu-affinity=per-task -o gpu-affinity=per-task singularity exec --nv --bind /usr/local/cuda /opt/containers/metric-osu-gpu_google-gpu.sif /bin/bash -c "strace -f /opt/osu-benchmark/build.openmpi/mpi/collective/osu_allreduce -d cuda H H" 2> flux-singularity-inside-container-trace-f.txt
```

### [flux-singularity-inside-container-trace-s-f.txt](flux-singularity-inside-container-trace-s-f.txt)

```bash
flux run -opmi=pmix --env OMPI_COMM_WORLD_LOCAL_RANK=0 -N 1 -n 8 -g 1 -o cpu-affinity=per-task -o gpu-affinity=per-task singularity exec --nv --bind /usr/local/cuda /opt/containers/metric-osu-gpu_google-gpu.sif /bin/bash -c "strace -f -s 128 /opt/osu-benchmark/build.openmpi/mpi/collective/osu_allreduce -d cuda H H" 2> flux-singularity-inside-container-trace-s-f.txt
```
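
The trace files are large, so a quick summary can help when comparing the outside and inside runs. The following is a minimal sketch, not part of the original experiment: `syscall_counts` and `errno_counts` are hypothetical helper names, and the parsing assumes the usual `strace -f` text layout (an optional `[pid N] ` prefix, then `name(args) = result`). Because `2>` also captures the program's own stderr, a few non-strace lines may be miscounted.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for summarizing an strace text log.
# They are not part of strace, Flux, or Singularity.

# Print "syscall count" pairs, most frequent first.
# Strips the "[pid N] " prefix that strace -f adds, and ignores
# "<... resumed>", signal ("---"), and exit ("+++") lines.
syscall_counts() {
  sed -E 's/^\[pid +[0-9]+\] //' "$1" |
    grep -oE '^[a-z_][a-z0-9_]*\(' |
    tr -d '(' |
    sort | uniq -c |
    awk '{ print $2, $1 }' |
    sort -k2,2nr -k1,1
}

# Print "ERRNO count" pairs for calls that returned -1, most frequent first.
errno_counts() {
  grep -oE '= -1 E[A-Z0-9]+' "$1" |
    awk '{ print $3 }' |
    sort | uniq -c |
    awk '{ print $2, $1 }' |
    sort -k2,2nr -k1,1
}

# Example usage (file names taken from the sections above):
#   syscall_counts flux-singularity-trace-f.txt | head
#   errno_counts flux-singularity-inside-container-trace-f.txt | head
```

Running both helpers on an outside trace and its inside counterpart, then comparing the output with `diff`, is one way to spot calls or errors that appear in only one of the two.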