Add transform hierarchy propagation benchmark #9442
base: main
Conversation
So. What do you see when you run the benchmarks?
If you run the benchmark several times without changing anything, how much difference is there between runs? How noisy is it?
group.warm_up_time(std::time::Duration::from_secs(2));
group.measurement_time(std::time::Duration::from_secs(10));
Nit: could be extracted into constants.
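For illustration, a minimal sketch of what that extraction could look like (the constant names here are made up, not taken from the PR):

```rust
use std::time::Duration;

// Hypothetical constant names, shared by all benchmark groups in the file.
const WARM_UP_TIME: Duration = Duration::from_secs(2);
const MEASUREMENT_TIME: Duration = Duration::from_secs(10);

// ...then, inside each benchmark function:
// group.warm_up_time(WARM_UP_TIME);
// group.measurement_time(MEASUREMENT_TIME);
```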
group.bench_function("transform_init", |b| {
    // Building the World (in setup) takes a lot of time, so we shouldn't do that on every
    // iteration.
    // Unfortunately, we can't re-use an App directly in iter() the World would no longer be
I can confirm there is no better way of setting things up yet.
unsafe { cell.world_mut() }.run_schedule(ResetSchedule);

cell
},
|cell| {
    unsafe { cell.world_mut() }.run_schedule(bevy_app::Main);
Could you add safety comments here? This requires both closures to never run concurrently. Do you know this for a fact?
This should be the case. However, I've thought a little more about this and simply replaced the version using `iter_batched`/`UnsafeWorldCell` with `iter_custom`, which doesn't require using `UnsafeWorldCell` here. The only downside is that it's harder to use custom `Measurement` types with `iter_custom`, but they aren't currently used in any bevy benchmark.
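For context, a rough sketch of the `iter_custom` pattern described here, assuming Criterion's default `WallTime` measurement; `build_app` is a placeholder for this PR's actual setup code, not the exact implementation:

```rust
use std::time::{Duration, Instant};

b.iter_custom(|iters| {
    // Expensive setup (building the World and spawning the hierarchy)
    // happens once per sample, not once per iteration.
    let mut app = build_app();
    let mut elapsed = Duration::ZERO;
    for _ in 0..iters {
        // Reset the world outside of the measured region.
        app.world.run_schedule(ResetSchedule);
        let start = Instant::now();
        app.world.run_schedule(bevy_app::Main);
        elapsed += start.elapsed();
    }
    elapsed
});
```

Because a single closure owns the `App` and decides what is timed, there is no need to hand out an `UnsafeWorldCell` to separate setup and measurement closures.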
if enable_update {
    app
        .add_plugins(TimePlugin)
        // Updating transforms *must* be done before `CoreSet::PostUpdate`
There is no such thing as a `CoreSet` anymore. Where does this come from?
This is part of the code I simply copied over from the stress test, so it's probably an oversight from that change:
bevy/examples/stress_tests/transform_hierarchy.rs
Lines 188 to 189 in 505b9a5
// Updating transforms *must* be done before `CoreSet::PostUpdate`
// or the hierarchy will momentarily be in an invalid state.
I removed the comment in the benchmark (but not in the example).
// Run Main schedule once to ensure initial updates are done
app.update();

b.iter(move || { app.update(); });
This also measures the `update` system runtime, right?
No, the names are a little misleading here: `App::update` is simply what usually gets called by the `app.runner` (the closure set by the `ScheduleRunnerPlugin`/winit), and in this case it simply runs the main schedule, since the benchmark app doesn't have any subapps.
And the initial updates I'm referring to in the first comment are both whatever happens in the `Startup` schedule and the first time the propagations are run in `PostUpdate`. I didn't specify that in the comment because I'm trying to keep the benchmark implementation-agnostic; perhaps the propagation systems will run in a different schedule in the future.
Sorry, just realized I misread your comment: Yes, the update system runtime is included here.
IMO we should get rid of the
3fd0c14 to 228a93c
It's a good idea to benchmark the transform propagation; the stress_test example didn't help at all. I ran the benchmark locally and found it very slow. It was not noisy at all, though, so that's also pretty good.
What I'd like to see:
- Do not rely on `App::update` for updates, but rather build a schedule with the two transform systems and measure their aggregate runtime (see the sketch at the end of this comment). Note that this requires manually advancing the world tick (outside of measurement).
- Make the `update` system use `set_changed()` instead of changing the value.
- Remove the `reference` bench.
I would really like to keep the benchmarks focused. It both makes the benchmarks faster and easier to interpret.
Does this make sense?
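To make the first point above more concrete, here is a hedged sketch of a propagation-only schedule. The system names come from `bevy_transform` and may differ between Bevy versions, and the world setup is omitted; treat this as an outline rather than the final benchmark code:

```rust
use bevy_ecs::{schedule::Schedule, world::World};
use bevy_transform::systems::{propagate_transforms, sync_simple_transforms};

// Build a schedule containing only the two propagation-related systems, so the
// measurement isn't dominated by everything else `App::update` runs.
fn build_propagation_schedule() -> Schedule {
    let mut schedule = Schedule::default();
    schedule.add_systems((sync_simple_transforms, propagate_transforms));
    schedule
}

// Rough shape of the per-iteration benchmark body:
fn run_once(world: &mut World, schedule: &mut Schedule) {
    // Measured region: run only the propagation systems.
    schedule.run(world);
    // Change-tick bookkeeping, kept outside of the measured region
    // in the real benchmark.
    world.clear_trackers();
}
```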
"chain", | ||
Cfg { | ||
test_case: TestCase::Tree { | ||
depth: 2500, |
Fairly certain this needs to be multiplied by 5. Ideally we'd have about the same number of entities per test case, in order to more easily compare the transform propagation behavior.
I had that idea as well, but due to the recursive implementation of propagation updates, setting the depth too high will exhaust the stack.
On my system, the benchmark simply crashes when setting depth to 50000. 5000 works, but I haven't tested any further depth values yet.
/// This benchmark tries to measure the cost of the initial transform propagation,
/// i.e. the first time transform propagation runs after we just added all our entities.
fn transform_init(c: &mut Criterion) {
Regarding naming: I think it's a bit misleading to call it "initial propagation". I'd call it "full propagation", e.g. `transform_complete_propagation`.
/// update component with some per-component value
#[derive(Component)]
struct UpdateValue(f32);

/// update positions system
fn update(time: Res<Time>, mut query: Query<(&mut Transform, &mut UpdateValue)>) {
    for (mut t, mut u) in &mut query {
        u.0 += time.delta_seconds() * 0.1;
        set_translation(&mut t.translation, u.0);
    }
}
`UpdateValue` could be a simple marker component, and `update` could just call `t.set_changed()`. IMO this is better, as it avoids the overhead of the trigonometric functions, of setting the change flag on `UpdateValue`, and of loading/storing values in table storage.
We are only interested in how transform propagation behaves. Any other code adding to the runtime is noise.
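A minimal sketch of that suggestion, reusing the names from the quoted snippet but turning `UpdateValue` into a marker:

```rust
use bevy_ecs::prelude::*;
use bevy_transform::prelude::Transform;

/// Marker for entities whose transforms should be flagged as changed each frame.
#[derive(Component)]
struct UpdateValue;

/// "Update" system that only flips the change flag, so the benchmark measures
/// transform propagation rather than the math and table writes of the old system.
fn update(mut query: Query<&mut Transform, With<UpdateValue>>) {
    for mut transform in &mut query {
        transform.set_changed();
    }
}
```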
@nicopap Thanks for all your feedback, it's been very helpful. I've refactored the overall benchmark structure once more in order to hopefully make the benchmarks more useful. I didn't follow all your suggestions precisely, but I think I addressed most of the issues. Please let me know what you think. Overview of the changes:
Some open questions:
@nicopap If I understood you correctly, you're suggesting that the
Here are some example results. Note: these benchmarks were done on a potato CPU compared to current models (an Intel i5-7200U), which I manually downclocked a bit to ensure thermal throttling doesn't influence the results too much. As a result, the absolute numbers might look a little worse than you'd expect.
This looks good to me. Nice job. To me the only change necessary is getting rid of `UPDATE_BENCH_POSTUPDATE_ONLY`.
Also, we run `cargo fmt` on source code to keep a consistent style.
// Measures hierarchy propagation systems when some transforms are updated.
group.bench_with_input(id("updates"), &(cfg, TransformUpdates::Enabled), inner_update_bench);

// Measures hierarchy propagation systems when there are no changes
// during the Update schedule.
group.bench_with_input(id("noop"), &(cfg, TransformUpdates::Disabled), inner_update_bench);
I suggest replacing:
- "updates" with "transform_updates_enabled"
- "noop" with "transform_updates_disabled"

That way, the relationship between the benchmark results and the benchmark source code is a bit more evident.
pub mod init;
/// since the benchmark implementation is a little fragile and rather slow (see comments below).
/// They're included here nevertheless in case they're useful.
fn transform_init(c: &mut Criterion) {
    let mut group = c.benchmark_group("transform_init");
I still think this benchmark is useful. It gives an idea of the behavior in worst-case situations (which do exist in games, e.g. when spawning a new level or complex models). The computational cost of removing many entities was a factor in reverting a change (see #5423 (comment)). This means we care about the computational cost of this sort of thing.
I think that "full recomputation" or something similarly descriptive could be a better name though.
std::hint::black_box({
    last.run(&mut app.world);
    app.world.clear_trackers();
});
Not sure the blackbox is necessary here.
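For illustration, the alternative this hints at would simply drop the wrapper, since both calls already have observable side effects on the `World` (a sketch against the quoted snippet, reusing its `last` and `app` names):

```rust
// Without the black_box wrapper:
last.run(&mut app.world);
app.world.clear_trackers();
```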
fn inner_update_bench(b: &mut Bencher<WallTime>, bench_cfg: &(&Cfg, TransformUpdates)) {
    const UPDATE_BENCH_POSTUPDATE_ONLY: bool = false;
It's a weird way of selecting benchmarks, and inconsistent with how the `transform_init` benchmark works.
In my opinion, the "reference" benchmarks should be removed from this PR, since (from my testing) they only add a constant overhead. But it's OK if they stay in the PR as long as they are consistently declared :P
In my opinion, yes. See #7433 where I was unable to discover how that example is meant to be useful. Doesn't need to be done here, just wanted to link the issue up.
Objective

Solution

- `setup` is using a `&mut World` directly instead of running as a `Startup` system.

Review Considerations

- `include!` or similar methods.
- `UnsafeWorldCell`. I think it should be fine, but someone with a better understanding of `UnsafeWorldCell` might want to take a look at it.