Missed optimization/perf oddity with allocations #128854
Comments
I know this is a "minimal reduction" but is there an example where this impacts actual programs?
For reference, the actual cause, as far as I can tell, is this line in the `alloc` function:

```rust
// Make sure we don't accidentally allow omitting the allocator shim in
// stable code until it is actually stabilized.
core::ptr::read_volatile(&__rust_no_alloc_shim_is_unstable);
```

It, of course, can't be optimized out because it's volatile. That line doesn't appear in `alloc_zeroed`.
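A minimal sketch (hypothetical, not the repro from this issue) of the kind of pattern this affects: an allocation whose result is never really needed. The optimizer is allowed to elide the underlying allocation calls, but the volatile read inlined from `alloc` can never be removed, so it lingers in the generated code and can perturb the surrounding codegen.

```rust
// Hypothetical example, not the minimized repro from this issue.
// The temporary Box is never observed outside this function, so the
// optimizer may remove the allocation itself. The volatile read of
// __rust_no_alloc_shim_is_unstable that gets inlined from `alloc`,
// however, is a side effect and must stay.
pub fn boxed_sum(a: i32, b: i32) -> i32 {
    let tmp = Box::new(a + b);
    *tmp
}
```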
Just noticed a helpful rundown of the cause, the reasoning behind it, and the potential perf win if fixed (@Kobzol's perf run) in this Zulip conversation.
Would you count this as fixed if we make the codegen match by regressing the good case? #130497
…=<try> read_volatile __rust_no_alloc_shim_is_unstable in alloc_zeroed rust-lang#128854 (comment) r? `@ghost`
…=bjorn3 read_volatile __rust_no_alloc_shim_is_unstable in alloc_zeroed: It was pointed out in rust-lang#128854 (comment) that the magic volatile read was probably missing from `alloc_zeroed`. I can't find any mention of `alloc_zeroed` in rust-lang#86844, so it looks like this was just missed initially.
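A hedged sketch of the shape of that change (not the actual PR diff), assuming the fix simply mirrors what `alloc` already does; the `extern` declarations below are simplified stand-ins for the real shim declarations:

```rust
use core::alloc::Layout;

extern "Rust" {
    // Allocator shim symbols as named in the discussion above; the exact
    // declarations in the standard library may differ.
    static __rust_no_alloc_shim_is_unstable: u8;
    fn __rust_alloc_zeroed(size: usize, align: usize) -> *mut u8;
}

/// Sketch of `alloc_zeroed` gaining the same volatile read that `alloc`
/// performs, so both entry points produce matching codegen.
pub unsafe fn alloc_zeroed(layout: Layout) -> *mut u8 {
    unsafe {
        // Make sure we don't accidentally allow omitting the allocator shim in
        // stable code until it is actually stabilized.
        core::ptr::read_volatile(&__rust_no_alloc_shim_is_unstable);

        __rust_alloc_zeroed(layout.size(), layout.align())
    }
}
```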
Bold move @saethlin 😆. I've updated the bug description/repro. Y'all do whatever you want with this, it's just an observation I made a while back while looking into Mojo's claims of being faster than Rust.
I am prototyping a design for DataFusion scalar function vectorization without boilerplate: apache/datafusion#12635. Benchmarking the code below shows that the second function is 6x slower than the first on my machine.

```rust
// Simple Result type. The Result wrappers get optimized away nicely
type Result<T> = std::result::Result<T, String>;

fn simple_sum(a: i32, b: i32, c: i32, d: i32) -> Result<i32> {
    Ok(a + b + c + d)
}

fn curried_sum(a: i32, b: i32, c: i32, d: i32) -> Result<i32> {
    // the arithmetic gets inlined nicely
    Ok(fn_fn_fn_fn(a)?(b)?(c)?(d)?)
}

fn fn_fn_fn_fn(a: i32) -> Result<Box<dyn Fn(i32) -> Result<Box<dyn Fn(i32) -> Result<Box<dyn Fn(i32) -> Result<i32>>>>>>> {
    Ok(Box::new(move |b| Ok(Box::new(move |c| Ok(Box::new(move |d| Ok(a + b + c + d)))))))
}
```

These functional-style functions are going to be used via template functions. The odd syntax is really important to make the templating work. I.e. it's easy to support no lazy eval (no …).
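A rough measurement sketch for the 6x comparison above (a hypothetical harness, not the commenter's actual benchmark; it assumes the `Result` alias and the two functions from the previous block are in scope). A proper comparison would use a benchmarking framework such as criterion:

```rust
use std::hint::black_box;
use std::time::Instant;

// Hypothetical timing harness: call each function many times, with black_box
// keeping the optimizer from removing the calls or constant-folding the inputs.
fn bench(name: &str, f: impl Fn(i32, i32, i32, i32) -> Result<i32>) {
    const ITERS: i32 = 10_000_000;
    let start = Instant::now();
    for i in 0..ITERS {
        black_box(f(black_box(i), 1, 2, 3).unwrap());
    }
    println!("{name}: {:?} for {} iterations", start.elapsed(), ITERS);
}

fn main() {
    bench("simple_sum", simple_sum);
    bench("curried_sum", curried_sum);
}
```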
I definitely see a different instruction sequence that is fixed by removing the volatile read, but on x86_64 the two versions microbenchmark to the same throughput for me. So this is a nice use case, and I personally hate the …
@saethlin thank you for your response! Details in https://gist.github.com/findepi/89497d13a3a249a1d2d1b6d7c2f8b927
Consider the following minimized example:
Expected generated output (Rust 1.81.0):
Actual output (Rust Nightly):
Godbolt: https://www.godbolt.org/z/x458Pv8P5