Consider switch to Chairmarks.jl #35

Open · MilesCranmer opened this issue Mar 13, 2024 · 14 comments
MilesCranmer (Owner) commented Mar 13, 2024:

Chairmarks.jl (discourse thread here) from @LilithHafner is much faster than BenchmarkTools.jl. With BenchmarkTools.jl, it can definitely take a while to run an entire benchmark suite, especially if you need to use benchmark tuning, so I am very curious about whether we could try to switch AirspeedVelocity.jl to use this!

It would be really nice to get ultra-fast benchmarks for an entire package's suite.

cc @Zentrik @Krastanov in case of interest.

Zentrik (Contributor) commented Mar 13, 2024:

I haven't played around with Chairmarks too much, but setting gctrial to false should reduce startup time for BenchmarkTools by ~300ms.
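
For concreteness, a minimal sketch of that (the suite below is just an assumed example; gctrial is an ordinary BenchmarkTools parameter, so it can be passed to run or set globally):

using BenchmarkTools

SUITE = BenchmarkGroup()
SUITE["rand"] = @benchmarkable rand(1:100, 100_000)

# Skip the extra gc() that BenchmarkTools performs before each trial:
run(SUITE; gctrial=false)
# ...or disable it for everything:
# BenchmarkTools.DEFAULT_PARAMETERS.gctrial = false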

MilesCranmer (Owner, Author) commented:

Cross-posting thread on Chairmarks.jl for translating benchmarks: LilithHafner/Chairmarks.jl#70

> I haven't played around with Chairmarks too much, but setting gctrial to false should reduce startup time for BenchmarkTools by ~300ms.

Interesting. Would this hurt accuracy at all though?

Some benchmarks can take ~20 minutes to run with BenchmarkTools though, so 300ms might not save much.

Zentrik (Contributor) commented Mar 13, 2024:

> Would this hurt accuracy at all though?

I suspect it might, but if the benchmark is 20 minutes long that seems unlikely to matter. Chairmarks essentially runs with gctrial=false anyway.

> Some benchmarks can take ~20 minutes to run with BenchmarkTools though, so 300ms might not save much.

I don't know how well BenchmarkTools and Chairmarks perform on 20-minute workloads, but at that point aren't you only running your benchmark function once? I wouldn't think either would have much overhead, and you could use @time at that point.

LilithHafner (Contributor) commented:

> Some benchmarks can take ~20 minutes to run with BenchmarkTools though, so 300ms might not save much.

Just a guess here: perhaps that 20-minute run includes multiple benchmarks, each of which has its own overhead. If, on the other hand, it is actually a single 20-minute workload...

> I don't know how well BenchmarkTools and Chairmarks perform on 20-minute workloads, but at that point aren't you only running your benchmark function once? I wouldn't think either would have much overhead, and you could use @time at that point.

Chairmarks and @time are pretty much equivalent at those timescales, though BenchmarkTools will run thrice (or twice if evals is specified).

julia> @time @time sleep(10)
 10.011087 seconds (4 allocations: 112 bytes)
 10.019674 seconds (444 allocations: 19.547 KiB, 0.08% compilation time)

julia> @time @b sleep(10)
[ Info: Loading Chairmarks ...
 10.037857 seconds (114.04 k allocations: 5.883 MiB, 0.23% compilation time)
10.011 s (4 allocs: 112 bytes, without a warmup)

julia> @time @btime sleep(10)
[ Info: Loading BenchmarkTools ...
  10.011 s (4 allocations: 112 bytes)
 30.654176 seconds (1.90 M allocations: 96.328 MiB, 1.14% gc time, 0.92% compilation time)

julia> @time @btime sleep(10) evals=1
  10.011 s (4 allocations: 112 bytes)
 20.205827 seconds (32.59 k allocations: 1.615 MiB, 0.82% gc time, 0.07% compilation time)

asinghvi17 commented:

In theory, all this would take is a package extension on Chairmarks that removes run! from whichever code extracts SUITE, and extracts statistics slightly differently...
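
As a rough sketch of the "extracts statistics slightly differently" part (assumptions, not an actual extension): a Chairmarks result holds a vector of samples, each with time in seconds plus bytes and allocs, so the statistics could be pulled out directly:

using Chairmarks: @be
using Statistics: mean, median

result = @be rand(1:100, 100_000)      # an assumed example workload
samples = result.samples

stats = (
    median_time_ns = 1e9 * median(s.time for s in samples),  # Chairmarks reports seconds
    mean_bytes     = mean(s.bytes for s in samples),
    mean_allocs    = mean(s.allocs for s in samples),
)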

LilithHafner (Contributor) commented:

IMO switching to Chairmarks (and dropping support for BenchmarkTools) would be bad. Supporting Chairmarks would be lovely, though.

LilithHafner (Contributor) commented:

@MilesCranmer, what properties does AirspeedVelocity currently require SUITE to have?

e.g. perhaps something around BenchmarkTools.tune!(SUITE) and JSON3.json(Base.run(SUITE))?

I'm asking because I'd like to figure out how to create a fully functional, compatible object using Chairmarks.

MilesCranmer (Owner, Author) commented:

It just expects that SUITE is a BenchmarkGroup that it can call tune, run, etc., on. The relevant code is here:

using BenchmarkTools: @benchmarkable, run, tune!, BenchmarkGroup
using JSON3: JSON3
using Pkg: Pkg

cd($cur_dir)

# Include benchmark, defining SUITE:
@info " [runner] Loading benchmark script: " * $script * "."
cur_project = Pkg.project().path

#! format: off
const _airspeed_velocity_extra_suite = BenchmarkGroup()
if isempty($filter_benchmarks) || any(x -> contains(x, "time_to_load"), $filter_benchmarks)
    _airspeed_velocity_extra_suite["time_to_load"] = @benchmarkable(
        @eval(using $(Symbol(spec.name)): $(Symbol(spec.name)) as _AirspeedVelocityTestImport),
        evals=1,
        samples=1,
    )
end
const _airspeed_velocity_extra_results = run(_airspeed_velocity_extra_suite)
#! format: on

# Safely include, via module:
module AirspeedVelocityRunner
import $(Symbol(spec.name)): $(Symbol(spec.name)) as _AirspeedVelocityTestImport2
import TOML: parsefile as toml_parsefile
const PACKAGE_VERSION = let
    try
        project = toml_parsefile(
            joinpath(pkgdir(_AirspeedVelocityTestImport2), "Project.toml"),
        )
        VersionNumber(project["version"])
    catch
        @warn "Failed to create `PACKAGE_VERSION`"
        VersionNumber("0.0.0")
    end
end
# Included benchmark script:
include($script)
end

using .AirspeedVelocityRunner: AirspeedVelocityRunner

# Assert that SUITE is defined:
if !isdefined(AirspeedVelocityRunner, :SUITE)
    @error " [runner] Benchmark script " * $script * " did not define SUITE."
end
const filtered_suite = AirspeedVelocityRunner.SUITE
if !(typeof(filtered_suite) <: BenchmarkGroup)
    @error " [runner] Benchmark script " *
        $script *
        " did not define SUITE as a BenchmarkGroup."
end

# Assert that `include` did not change environments:
if Pkg.project().path != cur_project
    @error " [runner] Benchmark script " *
        $script *
        " changed the active environment. " *
        "This is not allowed, as it will " *
        "cause the benchmark to produce incorrect results."
end

# Recursively filter the suite to `filter_benchmarks`:
function filter_suite!(f, suite, prefix="")
    filter!(suite.data) do (k, v)
        name = prefix * "/" * string(k)
        if v isa BenchmarkGroup
            filter_suite!(f, v, name)
        else
            f(name)
        end
    end
    return !isempty(suite.data)
end

if !isempty($filter_benchmarks)
    filter_suite!(
        name -> any(x -> contains(name, x), $filter_benchmarks), filtered_suite
    )
end

if $tune
    @info " [runner] Tuning benchmarks."
    tune!(filtered_suite)
end

@info " [runner] Running benchmarks for " * $spec_str * "."
@info "-"^80
results = run(filtered_suite; verbose=true)
@info "-"^80
@info " [runner] Finished benchmarks for " * $spec_str * "."

# Combine extra results:
for (k, v) in _airspeed_velocity_extra_results.data
    results.data[k] = v
end

open($results_filename * ".tmp", "w") do io
    JSON3.write(io, results)
end
@info " [runner] Benchmark results saved at " * $results_filename

LilithHafner (Contributor) commented:

Would this be compatible?

julia> struct Runnable{F}
           f::F
       end

julia> Base.run(r::Runnable; kwargs...) = r.f()

julia> BenchmarkTools.tune!(::Runnable; kwargs...) = nothing

julia> macro chairmarks_benchmarkable(args...)
           :(Runnable(() -> @be $(args...)))
       end
@chairmarks_benchmarkable (macro with 1 method)

julia> SUITE = BenchmarkGroup()
0-element BenchmarkTools.BenchmarkGroup:
  tags: []

julia> SUITE["rand"] = @chairmarks_benchmarkable rand()
Runnable{var"#29#31"}(var"#29#31"())

julia> SUITE["sleep"] = @chairmarks_benchmarkable sleep(.01)
Runnable{var"#33#35"}(var"#33#35"())

julia> SUITE["rand BenchmarkTools"] = @benchmarkable rand()
Benchmark(evals=1, seconds=5.0, samples=10000)

julia> tune!(SUITE)
3-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "sleep" => Runnable{var"#33#35"}(var"#33#35"())
  "rand BenchmarkTools" => Benchmark(evals=1000, seconds=5.0, samples=10000)
  "rand" => Runnable{var"#29#31"}(var"#29#31"())

julia> run(SUITE);

LilithHafner (Contributor) commented:

No, it hits

...
[ Info:     Finished.
[ Info: Loading results from ./[email protected]
[ Info: Flattening results.
ERROR: LoadError: MethodError: no method matching _flatten_results!(::OrderedCollections.OrderedDict{String, Any}, ::Vector{Any}, ::String)
The function `_flatten_results!` exists, but no method is defined for this combination of argument types.

Closest candidates are:
  _flatten_results!(::OrderedCollections.OrderedDict, ::Dict{String, Any}, ::Any)
   @ AirspeedVelocity ~/.julia/packages/AirspeedVelocity/8uSQr/src/Utils.jl:486

Stacktrace:
  [1] _flatten_results!(d::OrderedCollections.OrderedDict{String, Any}, results::Dict{String, Any}, prefix::String)
    @ AirspeedVelocity.Utils ~/.julia/packages/AirspeedVelocity/8uSQr/src/Utils.jl:496
  [2] flatten_results
    @ ~/.julia/packages/AirspeedVelocity/8uSQr/src/Utils.jl:505 [inlined]
  [3] load_results(specs::Vector{Pkg.Types.PackageSpec}; input_dir::String)
    @ AirspeedVelocity.Utils ~/.julia/packages/AirspeedVelocity/8uSQr/src/Utils.jl:537
  [4] load_results
    @ ~/.julia/packages/AirspeedVelocity/8uSQr/src/Utils.jl:527 [inlined]
  [5] #load_results#28
    @ ~/.julia/packages/AirspeedVelocity/8uSQr/src/Utils.jl:557 [inlined]
  [6] load_results
    @ ~/.julia/packages/AirspeedVelocity/8uSQr/src/Utils.jl:555 [inlined]
  [7] benchpkg(package_name::String; rev::String, output_dir::String, script::String, exeflags::String, add::String, tune::Bool, url::String, path::String, bench_on::String, filter::String, nsamples_load_time::Int64, dont_print::Bool)
    @ AirspeedVelocity.BenchPkg ~/.julia/packages/AirspeedVelocity/8uSQr/src/BenchPkg.jl:110
  [8] benchpkg
    @ ~/.julia/packages/AirspeedVelocity/8uSQr/src/BenchPkg.jl:53 [inlined]
  [9] command_main(ARGS::Vector{String})
    @ AirspeedVelocity.BenchPkg ~/.julia/packages/Comonicon/F3QqZ/src/codegen/julia.jl:343
 [10] command_main()
    @ AirspeedVelocity.BenchPkg ~/.julia/packages/Comonicon/F3QqZ/src/codegen/julia.jl:90
 [11] top-level scope
    @ ~/.julia/bin/benchpkg:14
in expression starting at /home/x/.julia/bin/benchpkg:14

I suspect that the leaves of run(SUITE) must, when written and then read back by JSON3, be a dict with the key "times" whose value is a vector of numbers. LilithHafner/Chairmarks.jl#160 could provide this.
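
For reference, a sketch (values made up) of the leaf shape I'd expect the loader to accept once a result has round-tripped through JSON3, based on the fields a BenchmarkTools.Trial serializes:

# Each leaf should come back as a Dict{String,Any} with a numeric "times"
# vector in nanoseconds, e.g.:
expected_leaf = Dict(
    "times"   => [55.2, 54.9, 56.1],
    "gctimes" => [0.0, 0.0, 0.0],
    "memory"  => 0,
    "allocs"  => 0,
)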

LilithHafner (Contributor) commented:

Huzzah! I got this working. With this benchmark.jl

using BenchmarkTools, Chairmarks

struct Runnable{F}
    f::F
end
Base.run(r::Runnable; kwargs...) = r.f()
BenchmarkTools.tune!(::Runnable; kwargs...) = nothing
macro chairmarks_benchmarkable(args...)
    :(Runnable(()->@be $(args...)))
end

SUITE = BenchmarkGroup()

SUITE["BenchmarkTools"] = @benchmarkable rand(1:100, 100000)
SUITE["Chairmarks"] = @chairmarks_benchmarkable rand(1:100, 100000)

and this diff to AirspeedVelocity.jl

diff --git a/src/Utils.jl b/src/Utils.jl
index 8a812d1..70a2e3f 100644
--- a/src/Utils.jl
+++ b/src/Utils.jl
@@ -486,6 +486,14 @@ end
 function _flatten_results!(d::OrderedDict, results::Dict{String,Any}, prefix)
     if "times" in keys(results)
         d[prefix] = compute_summary_statistics(results)
+    elseif "samples" in keys(results) # This branch allows for Chairmarks.jl compatibility
+        samples = results["samples"]
+        results′ = Dict(
+            "times" => 1e9getindex.(samples, "time"),
+            "memory" => mean(getindex.(samples, "bytes")),
+            "allocs" => mean(getindex.(samples, "allocs"))
+        )
+        d[prefix] = compute_summary_statistics(results′)
     elseif "data" in keys(results)
         for (key, value) in results["data"]
             next_prefix = if length(prefix) == 0

I got these results

shell> ../../bin/benchpkg --rev dirty,main
[ Info: Downloading package's latest benchmark script, assuming it is in benchmark/benchmarks.jl
Precompiling DeleteMe...
  1 dependency successfully precompiled in 1 seconds. 60 already precompiled.
[ Info: Found benchmark script at /home/x/.julia/packages/DeleteMe/J9aeM/benchmark/benchmarks.jl.
[ Info: Running benchmarks for DeleteMe@dirty:
[ Info:     Creating temporary environment at /tmp/jl_ycutvq.
[ Info:     Adding packages.
Precompiling project...
  1 dependency successfully precompiled in 1 seconds. 61 already precompiled.
[ Info:     Launching benchmark runner.
[ Info:     [runner] Loading benchmark script: /home/x/.julia/packages/DeleteMe/J9aeM/benchmark/benchmarks.jl.
[ Info:     [runner] Running benchmarks for DeleteMe@dirty.
[ Info: --------------------------------------------------------------------------------
(1/2) benchmarking "BenchmarkTools"...
done (took 3.387454053 seconds)
(2/2) benchmarking "Chairmarks"...
done (took 0.142918835 seconds)
[ Info: --------------------------------------------------------------------------------
[ Info:     [runner] Finished benchmarks for DeleteMe@dirty.
[ Info:     [runner] Benchmark results saved at /home/x/.julia/dev/DeleteMe/[email protected]
[ Info:     Benchmark runner exited.
[ Info:     Reading results.
[ Info:     Running additional time-to-load tests.
[ Info:     Running time-to-load test 2/nsamples_load_time.
[ Info:     Running time-to-load test 3/nsamples_load_time.
[ Info:     Running time-to-load test 4/nsamples_load_time.
[ Info:     Running time-to-load test 5/nsamples_load_time.
[ Info:     Done time-to-load tests.
[ Info:     Finished.
[ Info: Running benchmarks for DeleteMe@main:
[ Info:     Creating temporary environment at /tmp/jl_c2gp66.
[ Info:     Adding packages.
[ Info:     Launching benchmark runner.
[ Info:     [runner] Loading benchmark script: /home/x/.julia/packages/DeleteMe/J9aeM/benchmark/benchmarks.jl.
[ Info:     [runner] Running benchmarks for DeleteMe@main.
[ Info: --------------------------------------------------------------------------------
(1/2) benchmarking "BenchmarkTools"...
done (took 3.395862998 seconds)
(2/2) benchmarking "Chairmarks"...
done (took 0.143114087 seconds)
[ Info: --------------------------------------------------------------------------------
[ Info:     [runner] Finished benchmarks for DeleteMe@main.
[ Info:     [runner] Benchmark results saved at /home/x/.julia/dev/DeleteMe/[email protected]
[ Info:     Benchmark runner exited.
[ Info:     Reading results.
[ Info:     Running additional time-to-load tests.
[ Info:     Running time-to-load test 2/nsamples_load_time.
[ Info:     Running time-to-load test 3/nsamples_load_time.
[ Info:     Running time-to-load test 4/nsamples_load_time.
[ Info:     Running time-to-load test 5/nsamples_load_time.
[ Info:     Done time-to-load tests.
[ Info:     Finished.
[ Info: Loading results from ./[email protected]
[ Info: Flattening results.
[ Info: Loading results from ./[email protected]
[ Info: Flattening results.
|                | dirty              | main             | dirty/main |
|:---------------|:------------------:|:----------------:|:----------:|
| BenchmarkTools | 0.311 ± 0.004 ms   | 0.311 ± 0.004 ms | 1          |
| Chairmarks     | 0.311 ± 0.0097 ms  | 0.31 ± 0.0066 ms | 1          |
| time_to_load   | 0.0353 ± 0.00039 s | 0.035 ± 0.001 s  | 1.01       |

MilesCranmer (Owner, Author) commented:

Nice!!

MilesCranmer (Owner, Author) commented:

How achievable do you think it would be to have a --backend=Chairmarks flag that simply switches from one to the other?

LilithHafner (Contributor) commented:

Very achievable. This is already possible with the flag --add https://github.com/lilithhafner/ChairmarksForAirspeedVelocity.jl and using ChairmarksForAirspeedVelocity instead of using BenchmarkTools or using Chairmarks in benchmark/benchmarks.jl. See https://github.com/LilithHafner/DynamicDiscreteSamplers.jl/tree/27ac23cd18c9dab2b2fcfca6e67240fc86acf86d for an example of this in action.


However, I don't think we need to do that. In an ideal world, it would be possible to use whatever backend you want without even needing a command line flag.

When we talk about a backend, there are two parts I think of:

  1. The thing actually running benchmarks (e.g. run(BenchmarkTools.@benchmarkable ...) or Chairmarks.@be ...)
  2. The aggregator datastructure that benchmarks and results are stored in (e.g. BenchmarkTools.BenchmarkGroup)

BenchmarkTools provides 1 & 2.

Chairmarks provides an alternative for 1; ChairmarksForAirspeedVelocity provides 1 from Chairmarks and re-exports BenchmarkGroup to provide 2.

Implementing #73 well should enable support for any backend in the first category.

BenchmarkTools.BenchmarkGroup is a reasonable aggregator, and right now no alternatives come to mind, so supporting alternative aggregator backends doesn't seem pressing for now.
