blake2-rfc is slightly faster than the portable implementation #7

Open
oconnor663 opened this issue Nov 1, 2018 · 4 comments

@oconnor663
Owner

https://github.com/cesarb/blake2-rfc

I measure it to be about 2% faster than portable.rs. Not yet sure why, though it might be using some SIMD under the covers, or maybe getting optimized to SSE2 by the compiler.

However, the relationship is reversed if I set RUSTFLAGS="-C target-cpu=native -C target-feature=-avx2". No idea why. Again, still a small difference. Notably, both implementations tank their performance if I allow them to use AVX2.
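Since the results hinge on which SIMD extensions are in play, one way to sanity-check a build is to print both what the CPU reports at runtime and what the compiler was allowed to assume. The following is a minimal sketch using only the standard library; the helper names `detected_simd_features` and `compiled_with_avx2` are made up for illustration and are not part of either crate.

```rust
/// Returns the subset of {sse2, sse4.1, avx2} that the running CPU reports.
/// On non-x86 targets the list is always empty.
#[allow(unused_mut)]
fn detected_simd_features() -> Vec<&'static str> {
    let mut found = Vec::new();
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        // Runtime checks: these ask the CPU, independent of RUSTFLAGS.
        if is_x86_feature_detected!("sse2") {
            found.push("sse2");
        }
        if is_x86_feature_detected!("sse4.1") {
            found.push("sse4.1");
        }
        if is_x86_feature_detected!("avx2") {
            found.push("avx2");
        }
    }
    found
}

/// Compile-time check: true only if the build enabled AVX2, for example via
/// `-C target-cpu=native` on an AVX2 machine without `-C target-feature=-avx2`.
fn compiled_with_avx2() -> bool {
    cfg!(target_feature = "avx2")
}

fn main() {
    println!("CPU reports: {:?}", detected_simd_features());
    println!("compiled with avx2 enabled: {}", compiled_with_avx2());
}
```

If the second line prints `false` while the first list contains `avx2`, the compiler was not permitted to emit AVX2 instructions for ordinary code, which is the configuration the `target-feature=-avx2` run above is meant to produce.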

@oconnor663
Owner Author

I thought it might be because blake2-rfc was getting autovectorized, but looking at the output of cargo asm that doesn't seem to be the case. So I'm still not sure where the difference comes from.

@oconnor663
Owner Author

When I try it on ARM I get the opposite result. Should look at 32-bit ARM at some point.

@LuoZijun

@oconnor663 I get the same performance as blake2-rfc.

# code copied from https://github.com/shadowsocks/crypto2/tree/dev/src/hash/blake2b
git clone https://github.com/LuoZijun/test_blake2b/
cd test_blake2b
cargo bench
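For a gap as small as the roughly 2% discussed in this thread, run-to-run noise can easily swamp the signal, so it may help to take the fastest of several rounds rather than a single timing. Below is a rough std-only sketch of that idea. The two `candidate_*` functions are hypothetical stand-ins, not BLAKE2; in a real comparison they would be replaced by calls into the two implementations being compared.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Runs `f` over `input` `rounds` times and returns the fastest round.
/// Taking the minimum filters out rounds disturbed by scheduling noise.
fn best_of<F: FnMut(&[u8]) -> u64>(mut f: F, input: &[u8], rounds: u32) -> Duration {
    let mut best = Duration::MAX;
    for _ in 0..rounds {
        let start = Instant::now();
        // black_box keeps the optimizer from discarding the work.
        black_box(f(black_box(input)));
        best = best.min(start.elapsed());
    }
    best
}

// Hypothetical stand-in workload A (NOT a real hash implementation).
fn candidate_a(data: &[u8]) -> u64 {
    data.iter()
        .fold(0u64, |acc, &b| acc.wrapping_mul(31).wrapping_add(b as u64))
}

// Hypothetical stand-in workload B (NOT a real hash implementation).
fn candidate_b(data: &[u8]) -> u64 {
    data.iter().fold(0u64, |acc, &b| acc.rotate_left(5) ^ (b as u64))
}

fn main() {
    let input = vec![0x42u8; 1 << 20]; // 1 MiB of input
    let a = best_of(candidate_a, &input, 20);
    let b = best_of(candidate_b, &input, 20);
    let ratio = a.as_secs_f64() / b.as_secs_f64();
    println!("a: {a:?}, b: {b:?}, a/b: {ratio:.3}");
}
```

Build it with `--release` and the same `RUSTFLAGS` as the code under test, otherwise the comparison says little about the optimized builds discussed here.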

@Ujang360

Ujang360 commented Jan 1, 2023

@oconnor663 I tried it on an ARM Neoverse N1, and there blake2-rfc is slightly faster.

The code:
https://github.com/gemtek-indonesia/blake2b256-bench/blob/249cac1bf8788c224f45990d607c4b510a92c862/src/main.rs#L103-L134

And compiled it with:

RUSTFLAGS="-C target-cpu=native -C codegen-units=1" cargo build --release
