The Supermarket Fish Problem

This is part of the Performance study and the single-node-benchmark analysis. The analyses generated a lot of intermediate data and a web interface, so they were moved here for better organization.

Have you ever been to the supermarket and ordered white fish? You may be getting tilapia, flounder, branzino, catfish, cod, haddock, hake, halibut, pollock, sea bass, sole, or whiting. The same is true for cloud CPU architectures. You may know that you are getting some flavor of Intel, but it's unclear if it's Skylake, Ice Lake, Sandy Bridge, or something else. We did a large performance study in August 2024 that looked across many different environments, clouds, and instance types, and can now reflect on what we found. When a single environment turns out to contain a potpourri of architectures, we call this the supermarket fish problem.

Under development: data processing is underway, and a table will be added to each view!

TODO:

  • summary data file for each output file
  • table that summarizes each environment (with counts)
  • flags and bugs should have some kind of venn diagram that crosses spaces
  • sysbench metrics should be plots (not tables)
  • cpuinfo -> cpu MHz and bogomips also needs plots (values are all over the place)
  • Not sure if this is interesting, but data/azure/cyclecloud/cpu/256/node-0/raw/dmidecode has Core Enabled for each of 32 and 64.

Generate

Make some PNGs (they render better in React):

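# Convert every machine.svg under the current directory to machine.png
# (assumes the paths contain no spaces)
for filename in $(find . -name machine.svg); do
    echo "$filename"
    directory=$(dirname "$filename")
    outpng="$directory/machine.png"
    echo inkscape "$filename" -o "$outpng"
    inkscape "$filename" -o "$outpng"
done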
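
For anyone who prefers to drive the conversion from Python (the language of the other scripts here), the loop above could be sketched roughly as follows. This is only an illustration: the function names `png_commands` and `convert_all` are hypothetical and not part of this repository, and it assumes `inkscape` is on the `PATH` and accepts the same `-o` flag used in the shell loop.

```python
import subprocess
from pathlib import Path


def png_commands(root="."):
    """Return one inkscape command (as an argument list) per machine.svg under root."""
    commands = []
    for svg in sorted(Path(root).rglob("machine.svg")):
        # Write machine.png next to the machine.svg it came from
        png = svg.with_name("machine.png")
        commands.append(["inkscape", str(svg), "-o", str(png)])
    return commands


def convert_all(root=".", dry_run=False):
    """Print each command and, unless dry_run is set, run it."""
    for command in png_commands(root):
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)
```

Calling `convert_all(".", dry_run=True)` only prints the commands, which is a cheap way to check which files would be touched. Passing the arguments as a list also avoids the quoting problems the shell loop has with unusual paths.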

To generate data for the gallery:

python 1-generate-gallery.py

Note that I manually added the index.html and script.js to each directory, and tweaked them (titles, dimensions) for each. The following generates the table (requires pip install pandas):

python 2-generate-table.py

Again, I copied and pasted the same table snippet into each UI so that it reads the data generated by the script.

Results

Here are some one-off result images:

CPU Clock Speed

We can see that there is a hidden supermarket fish problem for AWS and clock speed. When a group doesn't show up (e.g., Google and Azure at many sizes), it's because its values are all the same. I think these are the uncolored lines we see in the graph: each is a histogram with a single value.
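
One way to check that reading of the plots is to separate the environments that report a single distinct clock speed from those that report several. The sketch below is an assumption about how that could be done, not the code behind the figures; `split_by_spread` is a hypothetical helper and the numbers in the usage example are made up for illustration, not study results.

```python
def split_by_spread(speeds_by_env):
    """Separate environments with one distinct value from those with several.

    speeds_by_env maps an environment name to a list of clock speeds.
    Returns (constant, varied): constant maps name -> the single value,
    varied maps name -> sorted list of distinct values.
    """
    constant, varied = {}, {}
    for env, speeds in speeds_by_env.items():
        distinct = sorted(set(speeds))
        if len(distinct) == 1:
            constant[env] = distinct[0]
        elif distinct:
            varied[env] = distinct
        # Environments with no measurements are left out of both results
    return constant, varied


# Made-up values, only to show the shape of the input and output
example = {
    "env-a": [2000.0, 2000.0, 2000.0],
    "env-b": [2650.0, 2900.0, 2650.0],
}
constant, varied = split_by_spread(example)
print(constant)  # {'env-a': 2000.0}
print(varied)    # {'env-b': [2650.0, 2900.0]}
```

Environments that land in `constant` are the ones that would collapse to a single line in a histogram.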

Clock Speed CPU Size 32

web/img/clock-speeds-cpu-size-32.png

Clock Speed CPU Size 64

web/img/clock-speeds-cpu-size-64.png

Clock Speed CPU Size 128

web/img/clock-speeds-cpu-size-128.png

Clock Speed CPU Size 256

web/img/clock-speeds-cpu-size-256.png

Clock Speed GPU

web/img/clock-speeds-gpu.png

Max and Current Speeds

CPU Speeds

CPU Size: 32
  Max speed: 2000.0 for google-gke-cpu
  Max speed: 3725.0 for aws-eks-cpu
  Max speed: 3725.0 for aws-parallel-cluster-cpu
  Max speed: 3525.0 for azure-cyclecloud-cpu
  Max speed: 3525.0 for azure-aks-cpu
CPU Size: 64
  Max speed: 2000.0 for google-gke-cpu
  Max speed: 3725.0 for aws-eks-cpu
  Max speed: 3725.0 for aws-parallel-cluster-cpu
  Max speed: 3525.0 for azure-cyclecloud-cpu
  Max speed: 3525.0 for azure-aks-cpu
CPU Size: 128
  Max speed: 2000.0 for google-gke-cpu
  Max speed: 3725.0 for aws-eks-cpu
  Max speed: 3525.0 for azure-cyclecloud-cpu
  Max speed: 3525.0 for azure-aks-cpu
CPU Size: 256
  Max speed: 2000.0 for google-gke-cpu
  Max speed: 3725.0 for aws-eks-cpu
  Max speed: 3525.0 for azure-cyclecloud-cpu
  Max speed: 3525.0 for azure-aks-cpu
CPU Size: 32
  Current speed: 2000.0 for google-gke-cpu
  Current speed: 2650.0 for aws-eks-cpu
  Current speed: 2650.0 for aws-parallel-cluster-cpu
  Current speed: 1850.0 for azure-cyclecloud-cpu
  Current speed: 1850.0 for azure-aks-cpu
CPU Size: 64
  Current speed: 2000.0 for google-gke-cpu
  Current speed: 2650.0 for aws-eks-cpu
  Current speed: 2650.0 for aws-parallel-cluster-cpu
  Current speed: 1850.0 for azure-cyclecloud-cpu
  Current speed: 1850.0 for azure-aks-cpu
CPU Size: 128
  Current speed: 2000.0 for google-gke-cpu
  Current speed: 2650.0 for aws-eks-cpu
  Current speed: 1850.0 for azure-cyclecloud-cpu
  Current speed: 1850.0 for azure-aks-cpu
CPU Size: 256
  Current speed: 2000.0 for google-gke-cpu
  Current speed: 2650.0 for aws-eks-cpu
  Current speed: 1850.0 for azure-cyclecloud-cpu
  Current speed: 1850.0 for azure-aks-cpu

GPU Speeds

GPU Size: 4
  Current speed: 2000.0 for google-gke-gpu
  Current speed: 2000.0 for google-compute-engine-gpu
  Current speed: 3700.0 for azure-cyclecloud-gpu
  Current speed: 3700.0 for azure-aks-gpu
GPU Size: 8
  Current speed: 2000.0 for google-gke-gpu
  Current speed: 2000.0 for google-compute-engine-gpu
  Current speed: 3500.0 for aws-eks-gpu
  Current speed: 3700.0 for azure-cyclecloud-gpu
  Current speed: 3700.0 for azure-aks-gpu
GPU Size: 16
  Current speed: 2000.0 for google-gke-gpu
  Current speed: 2000.0 for google-compute-engine-gpu
  Current speed: 3500.0 for aws-eks-gpu
  Current speed: 3700.0 for azure-cyclecloud-gpu
GPU Size: 32
  Current speed: 2000.0 for google-gke-gpu
  Current speed: 2000.0 for google-compute-engine-gpu
  Current speed: 3700.0 for azure-cyclecloud-gpu
  Current speed: 3700.0 2300.0 for azure-aks-gpu
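
A listing like the one above can be produced by collecting the distinct speed values per environment. The sketch below assumes the speeds come from the `Max Speed` and `Current Speed` fields that `dmidecode` prints in its processor section (raw `dmidecode` output is stored with the data, but that it is the source of these numbers is my assumption). The function name `unique_speeds` and the sample text are hypothetical.

```python
import re
from collections import defaultdict

# Matches lines such as "\tMax Speed: 3700 MHz" or "\tCurrent Speed: 2300 MHz".
# Memory lines like "Speed: 3200 MT/s" do not match because of the prefix and unit.
SPEED_LINE = re.compile(
    r"^[ \t]*(Max|Current) Speed:[ \t]*(\d+(?:\.\d+)?)[ \t]*MHz", re.MULTILINE
)


def unique_speeds(dmidecode_text):
    """Return {'max': [...], 'current': [...]} with sorted distinct speeds."""
    found = defaultdict(set)
    for kind, value in SPEED_LINE.findall(dmidecode_text):
        found[kind.lower()].add(float(value))
    return {kind: sorted(values) for kind, values in found.items()}


# Hypothetical excerpt with two processors that report different current speeds
sample = """Processor Information
\tMax Speed: 3700 MHz
\tCurrent Speed: 3700 MHz
Processor Information
\tMax Speed: 3700 MHz
\tCurrent Speed: 2300 MHz
Memory Device
\tSpeed: 3200 MT/s
"""
print(unique_speeds(sample))
```

Under this approach, an entry that shows two values would mean that more than one distinct speed was reported within that environment.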

Sysbench Plots

Each metric below has two plots, one for the CPU environments and one for the GPU environments. The files are in web/img and are named sysbench-TEST-run-METRIC-cpu.png and sysbench-TEST-run-METRIC-gpu.png (for example, web/img/sysbench-cpu-run-latency_ms_avg-cpu.png). For fileio, the name also includes the test mode, as in web/img/sysbench-fileio-run-file-test-modeseqwr-fsyncs_per_second-cpu.png.

  • cpu: cpu_speed_events_per_second, latency_ms_95th_percentile, latency_ms_avg, latency_ms_max, latency_ms_min, latency_ms_sum, total_number_events
  • fileio (file-test-modeseqwr): fsyncs_per_second, latency_ms_95th_percentile, latency_ms_avg, latency_ms_max, latency_ms_min, latency_ms_sum, total_number_events, writes_per_second, written_mib_per_second
  • mutex: latency_ms_95th_percentile, latency_ms_avg, latency_ms_max, latency_ms_min, latency_ms_sum
  • threads: latency_ms_95th_percentile, latency_ms_avg, latency_ms_max, latency_ms_min, latency_ms_sum, total_number_events

License

HPCIC DevTools is distributed under the terms of the MIT license. All new contributions must be made under this license.

See LICENSE, COPYRIGHT, and NOTICE for details.

SPDX-License-Identifier: (MIT)

LLNL-CODE-842614