Consider switch to Chairmarks.jl #35

Open · MilesCranmer opened this issue Mar 13, 2024 · 14 comments
MilesCranmer (Owner) commented Mar 13, 2024:

Chairmarks.jl (discourse thread here) from @LilithHafner is much faster than BenchmarkTools.jl. With BenchmarkTools.jl, it can definitely take a while to run an entire benchmark suite, especially if you need to use benchmark tuning, so I am very curious about whether we could try to switch AirspeedVelocity.jl to use this!

It would be really nice to get ultra-fast benchmarks for an entire package's suite.

cc @Zentrik @Krastanov in case of interest.

Zentrik (Contributor) commented Mar 13, 2024:

I haven't played around with Chairmarks too much, but setting gctrial to false should reduce startup time for BenchmarkTools by ~300ms.
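
For concreteness, a minimal sketch of that (the suite below is just an assumed example; gctrial is an ordinary BenchmarkTools parameter, so it can be passed to run or set globally):

using BenchmarkTools

SUITE = BenchmarkGroup()
SUITE["rand"] = @benchmarkable rand(1:100, 100_000)

# Skip the extra gc() that BenchmarkTools performs before each trial:
run(SUITE; gctrial=false)
# ...or disable it for everything:
# BenchmarkTools.DEFAULT_PARAMETERS.gctrial = false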

MilesCranmer (Owner, Author) commented:

Cross-posting thread on Chairmarks.jl for translating benchmarks: LilithHafner/Chairmarks.jl#70

> I haven't played around with Chairmarks too much, but setting gctrial to false should reduce startup time for BenchmarkTools by ~300ms.

Interesting. Would this hurt accuracy at all though?

Some benchmarks can take ~20 minutes to run with BenchmarkTools though, so 300ms might not save much.

Zentrik (Contributor) commented Mar 13, 2024:

> Would this hurt accuracy at all though?

I suspect it might, but if the benchmark is 20 minutes long that seems unlikely to matter. Chairmarks essentially runs with gctrial=false anyway.

> Some benchmarks can take ~20 minutes to run with BenchmarkTools though, so 300ms might not save much.

I don't know how well BenchmarkTools and Chairmarks perform on 20-minute workloads, but at that point aren't you only running your benchmark function once? I wouldn't think either would have much overhead, and you could use @time at that point.

LilithHafner (Contributor) commented:

> Some benchmarks can take ~20 minutes to run with BenchmarkTools though, so 300ms might not save much.

Just a guess here: perhaps that 20-minute run includes multiple benchmarks, each of which has its own overhead. If, on the other hand, it is actually a single 20-minute workload...

> I don't know how well BenchmarkTools and Chairmarks perform on 20-minute workloads, but at that point aren't you only running your benchmark function once? I wouldn't think either would have much overhead, and you could use @time at that point.

Chairmarks and @time are pretty much equivalent at those timescales, though BenchmarkTools will run thrice (or twice if evals is specified).

julia> @time @time sleep(10)
 10.011087 seconds (4 allocations: 112 bytes)
 10.019674 seconds (444 allocations: 19.547 KiB, 0.08% compilation time)

julia> @time @b sleep(10)
[ Info: Loading Chairmarks ...
 10.037857 seconds (114.04 k allocations: 5.883 MiB, 0.23% compilation time)
10.011 s (4 allocs: 112 bytes, without a warmup)

julia> @time @btime sleep(10)
[ Info: Loading BenchmarkTools ...
  10.011 s (4 allocations: 112 bytes)
 30.654176 seconds (1.90 M allocations: 96.328 MiB, 1.14% gc time, 0.92% compilation time)

julia> @time @btime sleep(10) evals=1
  10.011 s (4 allocations: 112 bytes)
 20.205827 seconds (32.59 k allocations: 1.615 MiB, 0.82% gc time, 0.07% compilation time)

asinghvi17 commented:

In theory, all this would take is a package extension on Chairmarks that removes run! from whichever code extracts SUITE, and extracts statistics slightly differently...
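
As a rough sketch of the "extracts statistics slightly differently" part (assumptions, not an actual extension): a Chairmarks result holds a vector of samples, each with time in seconds plus bytes and allocs, so the statistics could be pulled out directly:

using Chairmarks: @be
using Statistics: mean, median

result = @be rand(1:100, 100_000)      # an assumed example workload
samples = result.samples

stats = (
    median_time_ns = 1e9 * median(s.time for s in samples),  # Chairmarks reports seconds
    mean_bytes     = mean(s.bytes for s in samples),
    mean_allocs    = mean(s.allocs for s in samples),
)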

LilithHafner (Contributor) commented:

IMO switching to Chairmarks (and dropping support for BenchmarkTools) would be bad. Supporting Chairmarks would be lovely, though.

LilithHafner (Contributor) commented:

@MilesCranmer, what properties does AirspeedVelocity currently require SUITE to have?

e.g. perhaps something around BenchmarkTools.tune!(SUITE) and JSON3.json(Base.run(SUITE))?

I'm asking because I'd like to figure out how to create a fully functional, compatible object using Chairmarks.

MilesCranmer (Owner, Author) commented:

It just expects that SUITE is a BenchmarkGroup that it can call tune, run, etc., on. The relevant code is here:

using BenchmarkTools: @benchmarkable, run, tune!, BenchmarkGroup
using JSON3: JSON3
using Pkg: Pkg

cd($cur_dir)

# Include benchmark, defining SUITE:
@info " [runner] Loading benchmark script: " * $script * "."
cur_project = Pkg.project().path

#! format: off
const _airspeed_velocity_extra_suite = BenchmarkGroup()
if isempty($filter_benchmarks) || any(x -> contains(x, "time_to_load"), $filter_benchmarks)
    _airspeed_velocity_extra_suite["time_to_load"] = @benchmarkable(
        @eval(using $(Symbol(spec.name)): $(Symbol(spec.name)) as _AirspeedVelocityTestImport),
        evals=1,
        samples=1,
    )
end
const _airspeed_velocity_extra_results = run(_airspeed_velocity_extra_suite)
#! format: on

# Safely include, via module:
module AirspeedVelocityRunner
import $(Symbol(spec.name)): $(Symbol(spec.name)) as _AirspeedVelocityTestImport2
import TOML: parsefile as toml_parsefile
const PACKAGE_VERSION = let
    try
        project = toml_parsefile(
            joinpath(pkgdir(_AirspeedVelocityTestImport2), "Project.toml"),
        )
        VersionNumber(project["version"])
    catch
        @warn "Failed to create `PACKAGE_VERSION`"
        VersionNumber("0.0.0")
    end
end
# Included benchmark script:
include($script)
end

using .AirspeedVelocityRunner: AirspeedVelocityRunner

# Assert that SUITE is defined:
if !isdefined(AirspeedVelocityRunner, :SUITE)
    @error " [runner] Benchmark script " * $script * " did not define SUITE."
end
const filtered_suite = AirspeedVelocityRunner.SUITE
if !(typeof(filtered_suite) <: BenchmarkGroup)
    @error " [runner] Benchmark script " *
        $script *
        " did not define SUITE as a BenchmarkGroup."
end

# Assert that `include` did not change environments:
if Pkg.project().path != cur_project
    @error " [runner] Benchmark script " *
        $script *
        " changed the active environment. " *
        "This is not allowed, as it will " *
        "cause the benchmark to produce incorrect results."
end

# Recursively filter the suite to `filter_benchmarks`:
function filter_suite!(f, suite, prefix="")
    filter!(suite.data) do (k, v)
        name = prefix * "/" * string(k)
        if v isa BenchmarkGroup
            filter_suite!(f, v, name)
        else
            f(name)
        end
    end
    return !isempty(suite.data)
end

if !isempty($filter_benchmarks)
    filter_suite!(
        name -> any(x -> contains(name, x), $filter_benchmarks), filtered_suite
    )
end

if $tune
    @info " [runner] Tuning benchmarks."
    tune!(filtered_suite)
end

@info " [runner] Running benchmarks for " * $spec_str * "."
@info "-"^80
results = run(filtered_suite; verbose=true)
@info "-"^80
@info " [runner] Finished benchmarks for " * $spec_str * "."

# Combine extra results:
for (k, v) in _airspeed_velocity_extra_results.data
    results.data[k] = v
end

open($results_filename * ".tmp", "w") do io
    JSON3.write(io, results)
end
@info " [runner] Benchmark results saved at " * $results_filename

LilithHafner (Contributor) commented:

Would this be compatible?

julia> struct Runnable{F}
           f::F
       end

julia> Base.run(r::Runnable; kwargs...) = r.f()

julia> BenchmarkTools.tune!(::Runnable; kwargs...) = nothing

julia> macro chairmarks_benchmarkable(args...)
           :(Runnable(() -> @be $(args...)))
       end
@chairmarks_benchmarkable (macro with 1 method)

julia> SUITE = BenchmarkGroup()
0-element BenchmarkTools.BenchmarkGroup:
  tags: []

julia> SUITE["rand"] = @chairmarks_benchmarkable rand()
Runnable{var"#29#31"}(var"#29#31"())

julia> SUITE["sleep"] = @chairmarks_benchmarkable sleep(.01)
Runnable{var"#33#35"}(var"#33#35"())

julia> SUITE["rand BenchmarkTools"] = @benchmarkable rand()
Benchmark(evals=1, seconds=5.0, samples=10000)

julia> tune!(SUITE)
3-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "sleep" => Runnable{var"#33#35"}(var"#33#35"())
  "rand BenchmarkTools" => Benchmark(evals=1000, seconds=5.0, samples=10000)
  "rand" => Runnable{var"#29#31"}(var"#29#31"())

julia> run(SUITE);

LilithHafner (Contributor) commented:

No, it hits

...
[ Info:     Finished.
[ Info: Loading results from ./[email protected]
[ Info: Flattening results.
ERROR: LoadError: MethodError: no method matching _flatten_results!(::OrderedCollections.OrderedDict{String, Any}, ::Vector{Any}, ::String)
The function `_flatten_results!` exists, but no method is defined for this combination of argument types.

Closest candidates are:
  _flatten_results!(::OrderedCollections.OrderedDict, ::Dict{String, Any}, ::Any)
   @ AirspeedVelocity ~/.julia/packages/AirspeedVelocity/8uSQr/src/Utils.jl:486

Stacktrace:
  [1] _flatten_results!(d::OrderedCollections.OrderedDict{String, Any}, results::Dict{String, Any}, prefix::String)
    @ AirspeedVelocity.Utils ~/.julia/packages/AirspeedVelocity/8uSQr/src/Utils.jl:496
  [2] flatten_results
    @ ~/.julia/packages/AirspeedVelocity/8uSQr/src/Utils.jl:505 [inlined]
  [3] load_results(specs::Vector{Pkg.Types.PackageSpec}; input_dir::String)
    @ AirspeedVelocity.Utils ~/.julia/packages/AirspeedVelocity/8uSQr/src/Utils.jl:537
  [4] load_results
    @ ~/.julia/packages/AirspeedVelocity/8uSQr/src/Utils.jl:527 [inlined]
  [5] #load_results#28
    @ ~/.julia/packages/AirspeedVelocity/8uSQr/src/Utils.jl:557 [inlined]
  [6] load_results
    @ ~/.julia/packages/AirspeedVelocity/8uSQr/src/Utils.jl:555 [inlined]
  [7] benchpkg(package_name::String; rev::String, output_dir::String, script::String, exeflags::String, add::String, tune::Bool, url::String, path::String, bench_on::String, filter::String, nsamples_load_time::Int64, dont_print::Bool)
    @ AirspeedVelocity.BenchPkg ~/.julia/packages/AirspeedVelocity/8uSQr/src/BenchPkg.jl:110
  [8] benchpkg
    @ ~/.julia/packages/AirspeedVelocity/8uSQr/src/BenchPkg.jl:53 [inlined]
  [9] command_main(ARGS::Vector{String})
    @ AirspeedVelocity.BenchPkg ~/.julia/packages/Comonicon/F3QqZ/src/codegen/julia.jl:343
 [10] command_main()
    @ AirspeedVelocity.BenchPkg ~/.julia/packages/Comonicon/F3QqZ/src/codegen/julia.jl:90
 [11] top-level scope
    @ ~/.julia/bin/benchpkg:14
in expression starting at /home/x/.julia/bin/benchpkg:14

I suspect that the leaves of run(SUITE) must, when written and then read back by JSON3, be a dict with the key "times" whose value is a vector of numbers. LilithHafner/Chairmarks.jl#160 could provide this.
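
For reference, a sketch (values made up) of the leaf shape I'd expect the loader to accept once a result has round-tripped through JSON3, based on the fields a BenchmarkTools.Trial serializes:

# Each leaf should come back as a Dict{String,Any} with a numeric "times"
# vector in nanoseconds, e.g.:
expected_leaf = Dict(
    "times"   => [55.2, 54.9, 56.1],
    "gctimes" => [0.0, 0.0, 0.0],
    "memory"  => 0,
    "allocs"  => 0,
)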

LilithHafner (Contributor) commented:

Huzzah! I got this working. With this benchmark.jl

using BenchmarkTools, Chairmarks

struct Runnable{F}
    f::F
end
Base.run(r::Runnable; kwargs...) = r.f()
BenchmarkTools.tune!(::Runnable; kwargs...) = nothing
macro chairmarks_benchmarkable(args...)
    :(Runnable(()->@be $(args...)))
end

SUITE = BenchmarkGroup()

SUITE["BenchmarkTools"] = @benchmarkable rand(1:100, 100000)
SUITE["Chairmarks"] = @chairmarks_benchmarkable rand(1:100, 100000)

and this diff to AirspeedVelocity.jl

diff --git a/src/Utils.jl b/src/Utils.jl
index 8a812d1..70a2e3f 100644
--- a/src/Utils.jl
+++ b/src/Utils.jl
@@ -486,6 +486,14 @@ end
 function _flatten_results!(d::OrderedDict, results::Dict{String,Any}, prefix)
     if "times" in keys(results)
         d[prefix] = compute_summary_statistics(results)
+    elseif "samples" in keys(results) # This branch allows for Chairmarks.jl compatibility
+        samples = results["samples"]
+        results′ = Dict(
+            "times" => 1e9getindex.(samples, "time"),
+            "memory" => mean(getindex.(samples, "bytes")),
+            "allocs" => mean(getindex.(samples, "allocs"))
+        )
+        d[prefix] = compute_summary_statistics(results′)
     elseif "data" in keys(results)
         for (key, value) in results["data"]
             next_prefix = if length(prefix) == 0

I got these results

shell> ../../bin/benchpkg --rev dirty,main
[ Info: Downloading package's latest benchmark script, assuming it is in benchmark/benchmarks.jl
Precompiling DeleteMe...
  1 dependency successfully precompiled in 1 seconds. 60 already precompiled.
[ Info: Found benchmark script at /home/x/.julia/packages/DeleteMe/J9aeM/benchmark/benchmarks.jl.
[ Info: Running benchmarks for DeleteMe@dirty:
[ Info:     Creating temporary environment at /tmp/jl_ycutvq.
[ Info:     Adding packages.
Precompiling project...
  1 dependency successfully precompiled in 1 seconds. 61 already precompiled.
[ Info:     Launching benchmark runner.
[ Info:     [runner] Loading benchmark script: /home/x/.julia/packages/DeleteMe/J9aeM/benchmark/benchmarks.jl.
[ Info:     [runner] Running benchmarks for DeleteMe@dirty.
[ Info: --------------------------------------------------------------------------------
(1/2) benchmarking "BenchmarkTools"...
done (took 3.387454053 seconds)
(2/2) benchmarking "Chairmarks"...
done (took 0.142918835 seconds)
[ Info: --------------------------------------------------------------------------------
[ Info:     [runner] Finished benchmarks for DeleteMe@dirty.
[ Info:     [runner] Benchmark results saved at /home/x/.julia/dev/DeleteMe/[email protected]
[ Info:     Benchmark runner exited.
[ Info:     Reading results.
[ Info:     Running additional time-to-load tests.
[ Info:     Running time-to-load test 2/nsamples_load_time.
[ Info:     Running time-to-load test 3/nsamples_load_time.
[ Info:     Running time-to-load test 4/nsamples_load_time.
[ Info:     Running time-to-load test 5/nsamples_load_time.
[ Info:     Done time-to-load tests.
[ Info:     Finished.
[ Info: Running benchmarks for DeleteMe@main:
[ Info:     Creating temporary environment at /tmp/jl_c2gp66.
[ Info:     Adding packages.
[ Info:     Launching benchmark runner.
[ Info:     [runner] Loading benchmark script: /home/x/.julia/packages/DeleteMe/J9aeM/benchmark/benchmarks.jl.
[ Info:     [runner] Running benchmarks for DeleteMe@main.
[ Info: --------------------------------------------------------------------------------
(1/2) benchmarking "BenchmarkTools"...
done (took 3.395862998 seconds)
(2/2) benchmarking "Chairmarks"...
done (took 0.143114087 seconds)
[ Info: --------------------------------------------------------------------------------
[ Info:     [runner] Finished benchmarks for DeleteMe@main.
[ Info:     [runner] Benchmark results saved at /home/x/.julia/dev/DeleteMe/[email protected]
[ Info:     Benchmark runner exited.
[ Info:     Reading results.
[ Info:     Running additional time-to-load tests.
[ Info:     Running time-to-load test 2/nsamples_load_time.
[ Info:     Running time-to-load test 3/nsamples_load_time.
[ Info:     Running time-to-load test 4/nsamples_load_time.
[ Info:     Running time-to-load test 5/nsamples_load_time.
[ Info:     Done time-to-load tests.
[ Info:     Finished.
[ Info: Loading results from ./[email protected]
[ Info: Flattening results.
[ Info: Loading results from ./[email protected]
[ Info: Flattening results.
|                | dirty              | main             | dirty/main |
|:---------------|:------------------:|:----------------:|:----------:|
| BenchmarkTools | 0.311 ± 0.004 ms   | 0.311 ± 0.004 ms | 1          |
| Chairmarks     | 0.311 ± 0.0097 ms  | 0.31 ± 0.0066 ms | 1          |
| time_to_load   | 0.0353 ± 0.00039 s | 0.035 ± 0.001 s  | 1.01       |

MilesCranmer (Owner, Author) commented:

Nice!!

MilesCranmer (Owner, Author) commented:

How achievable do you think it would be to have a --backend=Chairmarks flag that simply switches from one to the other?

LilithHafner (Contributor) commented:

Very achievable. This is already possible with the flag --add https://github.com/lilithhafner/ChairmarksForAirspeedVelocity.jl and using ChairmarksForAirspeedVelocity instead of using BenchmarkTools or using Chairmarks in benchmark/benchmarks.jl. See https://github.com/LilithHafner/DynamicDiscreteSamplers.jl/tree/27ac23cd18c9dab2b2fcfca6e67240fc86acf86d for an example of this in action.


However, I don't think we need to do that. In an ideal world, it would be possible to use whatever backend you want without even needing a command line flag.

When we talk about a backend, there are two parts I think of:

  1. The thing actually running benchmarks (e.g. run(BenchmarkTools.@benchmarkable ...) or Chairmarks.@be ...)
  2. The aggregator datastructure that benchmarks and results are stored in (e.g. BenchmarkTools.BenchmarkGroup)

BenchmarkTools provides 1 & 2.

Chairmarks provides an alternative for 1; ChairmarksForAirspeedVelocity provides 1 from Chairmarks and re-exports BenchmarkGroup to provide 2.

Implementing #73 well should enable support for any backend in the first category.

BenchmarkTools.BenchmarkGroup is a reasonable aggregator, and right now no alternatives come to mind, so supporting alternative aggregator backends doesn't seem pressing for now.
