Runtime SIMD detection? #84

Open
nullchinchilla opened this issue Feb 16, 2021 · 4 comments

Comments

@nullchinchilla

Is it possible for this crate to implement runtime SIMD detection, so that portable binaries with SIMD code inside can be published?

@FallingSnow
Contributor

I believe it should be possible. The aes crate uses cpufeatures to detect SIMD support at runtime.
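For reference, a comparable check can be written with only the standard library. The sketch below does not use cpufeatures; `SimdLevel` and `detect_simd_level` are hypothetical names, and the choice of features (AVX2, SSSE3, Neon) is an assumption about what this crate would care about.

```rust
/// SIMD levels this sketch distinguishes (hypothetical, not part of any crate).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SimdLevel {
    Avx2,
    Ssse3,
    Neon,
    Fallback,
}

/// Asks the CPU at runtime which of the levels above is usable.
pub fn detect_simd_level() -> SimdLevel {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        // Check the widest instruction set first.
        if std::arch::is_x86_feature_detected!("avx2") {
            return SimdLevel::Avx2;
        }
        if std::arch::is_x86_feature_detected!("ssse3") {
            return SimdLevel::Ssse3;
        }
    }
    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("neon") {
            return SimdLevel::Neon;
        }
    }
    // Any other architecture, or an older CPU: plain Rust.
    SimdLevel::Fallback
}
```

The same binary then reports a different level depending on the machine it runs on, which is what a portable release build needs.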

@FallingSnow
Contributor

I spent a good amount of time working on this, first using Rust's nightly portable-simd feature, then using Rust's SIMD abstractions. It turns out that doing runtime detection is seriously hamstrung by inlining. It's currently not possible to force inlining when you don't even know which function to inline. All this results in code that is much slower than build-time SIMD acceleration.

@AndersTrier

AndersTrier commented Nov 26, 2023

@FallingSnow

> It turns out that doing runtime detection is seriously hamstrung by inlining. It's currently not possible to force inlining when you don't even know which function to inline. All this results in code that is much slower than build-time SIMD acceleration.

Yes, that's a bit annoying, but you can still do something like:

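#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*; // _mm256_set1_epi8 and friends

fn foo() {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was just confirmed at runtime.
            return unsafe { foo_avx2() };
        }
    }
    foo_fallback()
}

// Compiled with AVX2 enabled, so everything inlined into it is too.
#[target_feature(enable = "avx2")]
unsafe fn foo_avx2() {
    bar();
    baz();
}

#[inline(always)]
fn bar() {
    unsafe {
        let clr_mask = _mm256_set1_epi8(0x0f);
        [...]
    }
}

#[inline(always)]
fn baz() { ... }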

Because bar() and baz() are marked #[inline(always)], they will get inlined into foo_avx2() and compiled with target_feature(enable = "avx2").
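A complete, compilable instance of the same pattern might look like the sketch below. It XORs one byte slice into another, which is only a stand-in workload; `xor_into`, `xor_into_avx2`, `xor_block_avx2`, and `xor_into_fallback` are hypothetical names and are not taken from reed-solomon-simd.

```rust
#[cfg(target_arch = "x86")]
use std::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Safe entry point: XORs `src` into `dst`, choosing an implementation at runtime.
pub fn xor_into(dst: &mut [u8], src: &[u8]) {
    assert_eq!(dst.len(), src.len());
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::arch::is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was just confirmed at runtime.
            return unsafe { xor_into_avx2(dst, src) };
        }
    }
    xor_into_fallback(dst, src)
}

/// The whole loop lives here, so the dispatch cost is paid once per call.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn xor_into_avx2(dst: &mut [u8], src: &[u8]) {
    let mut d = dst.chunks_exact_mut(32);
    let mut s = src.chunks_exact(32);
    for (dc, sc) in (&mut d).zip(&mut s) {
        // SAFETY: we are inside an AVX2-enabled function and both chunks are 32 bytes.
        unsafe { xor_block_avx2(dc, sc) };
    }
    // Handle the tail that does not fill a 32-byte vector.
    xor_into_fallback(d.into_remainder(), s.remainder());
}

/// Inlined into `xor_into_avx2`, and therefore compiled with AVX2 enabled.
/// Callers must ensure AVX2 is available and both slices hold 32 bytes.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[inline(always)]
unsafe fn xor_block_avx2(dst: &mut [u8], src: &[u8]) {
    debug_assert!(dst.len() == 32 && src.len() == 32);
    unsafe {
        let a = _mm256_loadu_si256(dst.as_ptr() as *const __m256i);
        let b = _mm256_loadu_si256(src.as_ptr() as *const __m256i);
        _mm256_storeu_si256(dst.as_mut_ptr() as *mut __m256i, _mm256_xor_si256(a, b));
    }
}

/// Plain Rust path, used on other architectures and for tails.
fn xor_into_fallback(dst: &mut [u8], src: &[u8]) {
    for (d, s) in dst.iter_mut().zip(src) {
        *d ^= *s;
    }
}
```

The feature check happens once per `xor_into` call rather than once per 32-byte block, so its cost is amortized over the whole slice.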

That's the approach I've taken with the Reed-Solomon library I just published: https://crates.io/crates/reed-solomon-simd
It does runtime selection of SIMD implementation on both AArch64 (Neon) and x86(-64) (SSSE3 and AVX2) with fallback to plain Rust. I don't see any noticeable performance penalty for doing runtime selection.

Feel free to draw some inspiration from that implementation. The relevant code is here: https://github.com/AndersTrier/reed-solomon-simd/tree/master/src/engine

@FallingSnow
Contributor

Yeah, looking at your code, it looks possible. We'd need to change the architecture here to something similar to yours, where the implementation is selected through an underlying abstraction at the highest level (basically as soon as you create a ReedSolomon), similar to your engine abstraction. I was trying to swap out individual calls at the call level rather than an entire Reed-Solomon pipeline, which was a mistake.
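To make that concrete, here is a rough sketch of selecting an engine once at construction time. Everything in it is hypothetical: `Engine`, `FallbackEngine`, `Avx2Engine`, and `Codec` are made-up names, XOR parity stands in for a real Reed-Solomon pipeline, and it mirrors neither this crate's API nor the reed-solomon-simd engine code.

```rust
/// Hypothetical engine abstraction: one implementation per SIMD level.
trait Engine {
    fn name(&self) -> &'static str;
    /// Stand-in for a whole pipeline: XOR-folds every shard into `parity`.
    fn xor_all(&self, shards: &[Vec<u8>], parity: &mut [u8]);
}

/// Shared kernel. Being #[inline(always)], it is compiled with whatever
/// target features the function it is inlined into has enabled.
#[inline(always)]
fn xor_all_kernel(shards: &[Vec<u8>], parity: &mut [u8]) {
    for shard in shards {
        for (p, s) in parity.iter_mut().zip(shard) {
            *p ^= *s;
        }
    }
}

struct FallbackEngine;

impl Engine for FallbackEngine {
    fn name(&self) -> &'static str { "fallback" }
    fn xor_all(&self, shards: &[Vec<u8>], parity: &mut [u8]) {
        xor_all_kernel(shards, parity)
    }
}

/// Must only be constructed after AVX2 has been detected.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
struct Avx2Engine;

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
impl Avx2Engine {
    #[target_feature(enable = "avx2")]
    unsafe fn xor_all_avx2(shards: &[Vec<u8>], parity: &mut [u8]) {
        xor_all_kernel(shards, parity)
    }
}

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
impl Engine for Avx2Engine {
    fn name(&self) -> &'static str { "avx2" }
    fn xor_all(&self, shards: &[Vec<u8>], parity: &mut [u8]) {
        // SAFETY: `Codec::new` only builds this engine when AVX2 is present.
        unsafe { Self::xor_all_avx2(shards, parity) }
    }
}

/// Hypothetical top-level type; the engine is picked once, in `new`.
pub struct Codec {
    engine: Box<dyn Engine>,
}

impl Codec {
    pub fn new() -> Self {
        #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
        {
            if std::arch::is_x86_feature_detected!("avx2") {
                return Codec { engine: Box::new(Avx2Engine) };
            }
        }
        Codec { engine: Box::new(FallbackEngine) }
    }

    pub fn engine_name(&self) -> &'static str {
        self.engine.name()
    }

    /// One virtual call per pipeline run, not one per inner operation.
    pub fn parity(&self, shards: &[Vec<u8>], len: usize) -> Vec<u8> {
        let mut parity = vec![0u8; len];
        self.engine.xor_all(shards, &mut parity);
        parity
    }
}
```

The dynamic dispatch sits at the pipeline boundary, so everything below it can be inlined inside the feature-enabled function. Swapping at the level of individual calls would instead put a non-inlinable call in the hot loop.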
