Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Execution order changes benchmark results #3766

Open
snnsnn opened this issue Dec 24, 2024 · 9 comments · May be fixed by #3777
Open

Execution order changes benchmark results #3766

snnsnn opened this issue Dec 24, 2024 · 9 comments · May be fixed by #3777

Comments

@snnsnn
Copy link

snnsnn commented Dec 24, 2024

Hi there! 👋

It appears that the execution order of items in the benchmark suite significantly impacts the reported outcomes. When running the same benchmark multiple times, the fastest library changes depending on the order in which the benchmarks are executed. Below are three separate runs of the handle-event benchmark, with varying results:

Run 1

$:hono/benchmarks/handle-event$ node index.mjs 
Hello hey
worktop x 140,112 ops/sec ±3.25% (88 runs sampled)
custom-implementation x 365,763 ops/sec ±2.15% (88 runs sampled)
Hono x 317,173 ops/sec ±4.31% (83 runs sampled)
itty-router x 321,088 ops/sec ±11.34% (72 runs sampled)
Fastest is custom-implementation

Run 2

$:hono/benchmarks/handle-event$ node index.mjs 
Hello hey
custom-implementation x 374,533 ops/sec ±1.89% (90 runs sampled)
worktop x 149,032 ops/sec ±1.58% (93 runs sampled)
Hono x 320,424 ops/sec ±4.13% (84 runs sampled)
itty-router x 307,371 ops/sec ±18.18% (72 runs sampled)
Fastest is custom-implementation

Run 3

$:hono/benchmarks/handle-event$ node index.mjs 
Hello hey
itty-router x 312,020 ops/sec ±17.62% (74 runs sampled)
custom-implementation x 105,017 ops/sec ±37.42% (36 runs sampled)
worktop x 16,306 ops/sec ±65.94% (30 runs sampled)
Hono x 82,493 ops/sec ±43.59% (34 runs sampled)
Fastest is itty-router

Libraries earlier in the execution order tend to perform better, while those executed later show significant performance degradation, likely due to resource constraints or runtime factors like garbage collection.

Could you consider implementing a solution to improve the accuracy and reliability of these benchmarks? It would ensure fair comparisons and provide more actionable performance insights.

Thank you!

@yusukebe
Copy link
Member

yusukebe commented Dec 24, 2024

Hi @snnsnn

I can't produce those behaviors on my machine(Apple M1 Pro/Mem: 32 GB). From fastest to slowest, they are Hono, sunder, itty-router, and worktop. The result is always the same even if I change the order of execution.

@EdamAme-x
Copy link
Contributor

I have previously asked the author of mitata about this issue.

It is possible for the results to vary depending on the order of execution.
However, that is in an environment with significantly less memory, and in an environment with sufficient memory, most of the overhead is shaved off by GC, etc., so the results rarely change depending on the order.

He said.

@snnsnn
Copy link
Author

snnsnn commented Dec 24, 2024

@yusukebe The runtime i used node v20.16.0 (x64-linux) with 16GB RAM, it is probably GC. The way I run the benchmarks is cd into benchmarks/handle-event folder and run pnpm run start after installing dependencies and modifying files. I removed sunder as it is not maintained and had module related issues.

It is probably GC related because I try to run the benchmark with deno v2.1.4, I ran out of memory:

<--- JS stacktrace --->
#
# Fatal JavaScript out of memory: Reached heap limit
#
==== C stack trace ===============================

    deno(+0x2d8b133) [0x5c91184c4133]
    deno(+0x2d8a9fb) [0x5c91184c39fb]
    deno(+0x2d85f18) [0x5c91184bef18]
    deno(+0x2ddc1ac) [0x5c91185151ac]
    deno(+0x2f90467) [0x5c91186c9467]
    deno(+0x2f8e613) [0x5c91186c7613]
    deno(+0x2f85211) [0x5c91186be211]
    deno(+0x2f85ce2) [0x5c91186bece2]
    deno(+0x2f6b55a) [0x5c91186a455a]
    deno(+0x33e0d22) [0x5c9118b19d22]
    deno(+0x426ad36) [0x5c91199a3d36]
Trace/breakpoint trap (core dumped)

With bun, the benchmark failed to execute due to error:

bun index.js 
64 |   }
65 | };
66 | var tryDecodeURI = (str) => tryDecode(str, decodeURI);
67 | var getPath = (request) => {
68 |   const url = request.url;
69 |   const start = url.indexOf("/", 8);
                     ^
TypeError: undefined is not an object (evaluating 'url.indexOf')
      at getPath (/home/user/hono/dist/utils/url.js:69:17)
      at #dispatch (/home/user/hono/dist/hono-base.js:170:18)
      at /home/user/hono/benchmarks/handle-event/index.js:131:19
      at fn (/home/user/hono/benchmarks/handle-event/index.js:130:12)
      at /home/user/hono/benchmarks/handle-event/index.js:143:1

Edit: Updating bun to v1.1.42 fixed the issue.

@yusukebe
Copy link
Member

I am unable to reproduce them. So I cannot do anything regarding this issue.

@snnsnn
Copy link
Author

snnsnn commented Dec 26, 2024

I have previously asked the author of mitata about this issue.

It is possible for the results to vary depending on the order of execution.
However, that is in an environment with significantly less memory, and in an environment with sufficient memory, most of the overhead is shaved off by GC, etc., so the results rarely change depending on the order.

He said.

The handle-event benchmarks uses benchmark, not mitata:

import Benchmark from 'benchmark'

@snnsnn
Copy link
Author

snnsnn commented Dec 26, 2024

@yusukebe Perhaps these tests could be migrated to the latest version of mitata, which provides configurations to mitigate external factors like garbage collection (GC): Mitata#garbage-collection-pressure.

Additionally, using more realistic test cases and increasing the number of tests to better reflect real-world scenarios might yield more reliable results. I’m not sure if using tsx impacts the benchmarks, but running the tests against compiled outputs might help eliminate any inconsistencies.

On the topic of realistic data, some benchmarks use GitHub API paths for testing. For instance, this Rust-based example could provide useful paths: Matchit Benchmark. Extracting similar paths could enhance the relevance of the tests.

At the moment, I’m on a tight schedule and don’t have in-depth knowledge of Hono’s routers. I came across these tests while searching for a fast router implementation in JavaScript. Please feel free to close the issue or keep it open in case someone else decides to tackle these improvements.

Thank you for the great work on Hono!

Best regards

@EdamAme-x EdamAme-x linked a pull request Dec 26, 2024 that will close this issue
@EdamAme-x
Copy link
Contributor

Hi @snnsnn
I sent PR about this
Can you check this?

@snnsnn
Copy link
Author

snnsnn commented Dec 26, 2024

Tests run successfully as PR'd, but the dependencies are not updated to their latest versions.

> [email protected] start
> bun --experimental-specifier-resolution=node index.js

Hello hey
Hello hey
clk: ~4.18 GHz
cpu: AMD Ryzen 7 5800H with Radeon Graphics
runtime: bun 1.1.42 (x64-linux)

benchmark                   avg (min … max) p75   p99    (min … top 1%)
------------------------------------------- -------------------------------
Hello hey
Hello hey
Hono                           2.54 µs/iter   2.50 µs   █▂                 
                      (1.67 µs … 639.07 µs)   6.04 µs ▁███▅▃▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁
itty-router                    3.13 µs/iter   3.13 µs ▄█                   
                        (2.88 µs … 4.24 µs)   4.17 µs ██▆█▇▂▄▁▂▁▂▂▁▂▁▁▂▁▁▁▂
sunder                         3.17 µs/iter   3.24 µs    ▄ █▄▂             
                        (2.95 µs … 3.61 µs)   3.59 µs ▇▇▃█▇███▃▇▅▃▅▁▁▃▁▁▁▃▅
worktop                        4.66 µs/iter   4.79 µs  ▂  ▂▂█      ▂       
                        (4.29 µs … 5.77 µs)   5.24 µs ▇█▄▁███▄▇▁▇▁▁█▄▄▁▁▁▁▄

Updating the competing libraries to their latest versions would make the comparisons more fair and accurate. However, updating them currently causes the tests to break, leaving only Hono operational. Additionally, the start command can’t be run with node v22.12.0 due to missing .js extensions in import statements, probably the --experimental-specifier-resolution=node flag seems unsupported in this version.

> [email protected] start
> bun --experimental-specifier-resolution=node index.js

Hello hey
clk: ~2.14 GHz
cpu: AMD Ryzen 7 5800H with Radeon Graphics
runtime: bun 1.1.42 (x64-linux)

benchmark                   avg (min … max) p75   p99    (min … top 1%)
------------------------------------------- -------------------------------
Hono                           2.57 µs/iter   2.65 µs ▃ █  ▂               
                        (2.37 µs … 3.36 µs)   3.23 µs █▇█▆▅█▇▇▂▄▂▁▁▁▁▁▁▁▁▂▂

To improve the benchmarks, it might help to clean up the test cases by running them in complete isolation. Using a different request and event for each case, along with mitata’s gc('inner') to trigger garbage collection before every iteration, could lead to more consistent results. Using a single handler per case could also reduce variability from different handlers, focusing the benchmark on routing logic. Adding a more realistic number of routes would make the tests closer to real-world scenarios. See this link for routes from different websites https://github.com/DuskSystems/wayfind/tree/main/benches

@EdamAme-x
Copy link
Contributor

EdamAme-x commented Dec 26, 2024

It may be necessary to move the benchmark to another repository and connect it with a submodule.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants