Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add BLAKE3 hash #13194

Open
wants to merge 27 commits into
base: master
Choose a base branch
from
Open

add BLAKE3 hash #13194

wants to merge 27 commits into from

Conversation

divinity76
Copy link
Contributor

@divinity76 divinity76 commented Jan 19, 2024

BLAKE3 is a very fast cryptographically secure hash. It is the latest iteration of the BLAKE hash, which was a SHA3 finalist (but lost to Keccak in the final round, for being too similar to SHA2).

Check this speed chart:

and this bench.php run where BLAKE3 is much faster than every other secure hash, and beats several non-cryptographic hashes (AMD Ryzen 9 7950x):

$ sapi/cli/php ext/hash/bench.php
crc32b       0.001195
crc32c       0.001202
crc32        0.001202
xxh3         0.001234
xxh128       0.001289
xxh64        0.001475
xxh32        0.002235
murmur3f     0.002459
murmur3c     0.003681
murmur3a     0.004289
adler32      0.007718
blake3       0.007752
md4          0.013109
fnv132       0.015075
fnv164       0.015109
fnv1a64      0.015116
fnv1a32      0.015251
joaat        0.018858
md5          0.019797
sha1         0.020472
tiger160,3   0.021290
tiger192,3   0.021363
tiger128,3   0.021366
tiger128,4   0.027518
tiger160,4   0.027743
tiger192,4   0.027870
ripemd128    0.029190
ripemd256    0.029378
sha3-224     0.029787
sha3-256     0.031518
haval256,3   0.038328
haval224,3   0.038479
haval128,3   0.038483
sha3-384     0.038559
haval192,3   0.038564
haval160,3   0.039068
sha512/256   0.039302
sha512       0.039307
sha512/224   0.039472
sha384       0.039508
ripemd160    0.042287
ripemd320    0.043036
sha3-512     0.054044
haval192,4   0.055699
haval224,4   0.055902
haval160,4   0.055925
haval256,4   0.055948
haval128,4   0.056165
sha256       0.057846
sha224       0.058139
haval128,5   0.070442
haval224,5   0.070503
haval256,5   0.070569
haval192,5   0.070576
haval160,5   0.071109
whirlpool    0.086256
gost         0.200251
gost-crypto  0.200709
snefru256    0.449650
snefru       0.451111
md2          1.237880

Quoting Google Bard:

In summary, BLAKE3 is a modern and highly efficient cryptographic hash function that offers security, flexibility, and ease of use. Its impressive speed and adaptability make it a promising choice for a wide range of applications in the digital age.

  • on x86_64 BLAKE3 is slightly faster than KangarooTwelve, but on ARM BLAKE3 is significantly faster than KangarooTwelve.
  • AFAIK the PHP project doesn't like git submodules (for example Dmitry Stogov wanted the new JIT engine to be a submodule but the PHP developers refused), so instead I made ext/hash/blake3/fetch_upstream_blake3.sh
  • I have not added SSE2/AVX2/AVX512/etc optimized builds to MSVC because I don't have a MSVC system to test on, so I just added the bare-minimum portable implementation to MSVC builds. (feel free to fix it, I don't want to.)

BLAKE3 is a very fast cryptograpically secure hash.
It is the latest iteration of the BLAKE hash, which was a SHA3 finalist (but lost to Keccak in the final round, for being too similar to SHA2).

Check this speed chart: https://raw.githubusercontent.com/BLAKE3-team/BLAKE3/master/media/speed.svg
and this bench.php run where BLAKE3 is much faster than every secure hash, and beats several non-cryptographic hashes:
$ sapi/cli/php ext/hash/bench.php
crc32b       0.001195
crc32c       0.001202
crc32        0.001202
xxh3         0.001234
xxh128       0.001289
xxh64        0.001475
xxh32        0.002235
murmur3f     0.002459
murmur3c     0.003681
murmur3a     0.004289
adler32      0.007718
blake3       0.007752
md4          0.013109
fnv132       0.015075
fnv164       0.015109
fnv1a64      0.015116
fnv1a32      0.015251
joaat        0.018858
md5          0.019797
sha1         0.020472
tiger160,3   0.021290
tiger192,3   0.021363
tiger128,3   0.021366
tiger128,4   0.027518
tiger160,4   0.027743
tiger192,4   0.027870
ripemd128    0.029190
ripemd256    0.029378
sha3-224     0.029787
sha3-256     0.031518
haval256,3   0.038328
haval224,3   0.038479
haval128,3   0.038483
sha3-384     0.038559
haval192,3   0.038564
haval160,3   0.039068
sha512/256   0.039302
sha512       0.039307
sha512/224   0.039472
sha384       0.039508
ripemd160    0.042287
ripemd320    0.043036
sha3-512     0.054044
haval192,4   0.055699
haval224,4   0.055902
haval160,4   0.055925
haval256,4   0.055948
haval128,4   0.056165
sha256       0.057846
sha224       0.058139
haval128,5   0.070442
haval224,5   0.070503
haval256,5   0.070569
haval192,5   0.070576
haval160,5   0.071109
whirlpool    0.086256
gost         0.200251
gost-crypto  0.200709
snefru256    0.449650
snefru       0.451111
md2          1.237880
@devnexen
Copy link
Member

That s quite an addition ... worth discussing.

.. don't know what I'm supposed to do with it, but compiler complains about it missing.
@iluuu1994
Copy link
Member

This is indeed a lot of code. I think this should be discussed on the list, and possibly voted on.

@divinity76
Copy link
Contributor Author

divinity76 commented Jan 19, 2024

asked on php-internals mailing list: https://marc.info/?l=php-internals&m=170568974400302&w=2
(and fixed MSVC and ARM+Neon builds, the latter tested on Oracle Cloud ARM64, which is notably different from Apple's ARM)

#define PHP_BLAKE3_CTX blake3_hasher
// help: is V correct?
//#define PHP_BLAKE3_SPEC "b8b8qb64bbbbb1760"
#define PHP_BLAKE3_SPEC "L8L8Qa64CCCCL8Ca1760"
Copy link
Contributor Author

@divinity76 divinity76 Jan 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can someone double-check this? because i couldn't wrap my head around it, and this is just my best guess based on

typedef struct {
uint32_t cv[8];
uint64_t chunk_counter;
uint8_t buf[BLAKE3_BLOCK_LEN];
uint8_t buf_len;
uint8_t blocks_compressed;
uint8_t flags;
} blake3_chunk_state;
typedef struct {
uint32_t key[8];
blake3_chunk_state chunk;
uint8_t cv_stack_len;
// The stack size is MAX_DEPTH + 1 because we do lazy merging. For example,
// with 7 chunks, we have 3 entries in the stack. Adding an 8th chunk
// requires a 4th entry, rather than merging everything down to 1, because we
// don't know whether more input is coming. This is different from how the
// reference implementation does things.
uint8_t cv_stack[(BLAKE3_MAX_DEPTH + 1) * BLAKE3_OUT_LEN];
} blake3_hasher;

@divinity76 divinity76 mentioned this pull request Jan 20, 2024
@divinity76
Copy link
Contributor Author

divinity76 commented Jan 25, 2024

post on the mailing list worth re-posting here:
GCC11.4, even with -march=native -mtune=native, which is not commonly used in PHP,
the compiler didn't stand a chance against the hand-optimized assembly versions
image

wrote some benchmarks, but the TL;DR is:
portable -O2 usually used by PHP managed 1126MB/s,
portable -O2 -march=native managed 533MB/s (wtf? gcc obviously got
something wrong here),
hand-written -O2 SSE2 managed 3144MB/s,
hand-written -O2 SSE41 managed 3332MB/s,
hand-written -O2 avx2 managed 6554MB/s,
hand-writen -O2 AVX512 managed 8913MB/s,
on my AMD Ryzen 9 7950x,
benchmarking code:
https://gist.github.com/divinity76/5729472dd5d77e94cd0acb245aac2226
raw output:

array(6) {
  ["O2-portable-march"]=>
  array(2) {
    ["microseconds_for_16_kib"]=>
    int(29295)
    ["mb_per_second"]=>
    float(533.3674688513398)
  }
  ["O2-portable"]=>
  array(2) {
    ["microseconds_for_16_kib"]=>
    int(13876)
    ["mb_per_second"]=>
    float(1126.0449697319111)
  }
  ["O2-sse2"]=>
  array(2) {
    ["microseconds_for_16_kib"]=>
    int(4969)
    ["mb_per_second"]=>
    float(3144.4958744214127)
  }
  ["O2-sse41"]=>
  array(2) {
    ["microseconds_for_16_kib"]=>
    int(4688)
    ["mb_per_second"]=>
    float(3332.977815699659)
  }
  ["O2-avx2"]=>
  array(2) {
    ["microseconds_for_16_kib"]=>
    int(2384)
    ["mb_per_second"]=>
    float(6554.1107382550335)
  }
  ["O2-avx512"]=>
  array(2) {
    ["microseconds_for_16_kib"]=>
    int(1753)
    ["mb_per_second"]=>
    float(8913.291500285226)
  }
}

Edit:
tested ARM Neon optimizations on Oracle Cloud's cheapest ARM VPS:
VM.Standard.A1.Flex, Ubuntu 22.04, GCC11.4,
results:
image

-O2 portable: 596MB/s
-O2 -march=native portable: 601MB/s
-O2 ARM Neon optimized implementation: 1138MB/s

Again, even with -march=native, the compiler cannot make the portable
implementation nearly as fast as the hand-optimized cpu-specific
implementation.

ext/hash/config.m4 Outdated Show resolved Hide resolved
ext/hash/config.w32 Outdated Show resolved Hide resolved
divinity76 and others added 2 commits January 29, 2024 20:18
Co-authored-by: Peter Kokot <[email protected]>
Co-authored-by: Peter Kokot <[email protected]>
ext/hash/config.m4 Outdated Show resolved Hide resolved
ext/hash/config.m4 Outdated Show resolved Hide resolved
divinity76 and others added 5 commits February 3, 2024 15:43
petk: This detects x86 32-bit processor (can be x86, i386 or similar, or "amd" on FreeBSD.

Co-authored-by: Peter Kokot <[email protected]>
petk: This detects the x86_64 processor. x86_64 is detected as word amd64 on FreeBSD. The $host_os is in Autoconf used for the target CPU when cross-compiling or as a host CPU when building on the same machine as the program will be also run.

Co-authored-by: Peter Kokot <[email protected]>
patches to specifically address a gcc -Werror=logical-op issue explained in BLAKE3-team/BLAKE3#380
and a gcc -Wunused-function issue explained in BLAKE3-team/BLAKE3#382
and optimized upstream git checkout to only fetch the files we want.
divinity76 and others added 3 commits February 8, 2024 12:29
vld1q_u8 and vst1q_u8 has no alignment requirements.

This improves performance on Oracle Cloud's VM.Standard.A1.Flex by 1.15% on a 16*1024 input,
 from 13920 nanoseconds down to 13800 nanoseconds (approx)

ref BLAKE3-team/BLAKE3#384
@divinity76
Copy link
Contributor Author

@bukka @petk added LICENSE headers and reference: 1954b37 - I prefer a reference in README.REDIST.BINS rather than in-lining it.

@cypherbits
Copy link

Having this would be good

@cypherbits
Copy link

More than 2 years since the first attempt to get Blake3 into PHP, why is it so hard?

@divinity76
Copy link
Contributor Author

divinity76 commented Oct 22, 2024

More than 2 years since the first attempt to get Blake3 into PHP, why is it so hard?

the main problem this round was that I just lost steam. Got occupied with other stuff and forgot about this. You reminded me though!

@vodnicearv
Copy link

BLAKE3 is very nice, we need that algorithm in php

@leonardocustodio
Copy link

Seems like this will never be merged, such a shame as a lot of blockchain are starting to use blake3 as standard

@divinity76
Copy link
Contributor Author

Fwiw this should be rebased against the newest blake3 release, iirc (on phone now can't check, but iirc) this pr currently contains 3 patches to blake3 with a dedicated patch file, but all 3 patches have been accepted upstream and is part of the newest Blake3 official release

@divinity76
Copy link
Contributor Author

divinity76 commented Jan 22, 2025

Scratch that, BLAKE3-team/BLAKE3#382 is still not resolved upstream,
so if we want to compile php cleanly on MacOS, we still need at least 1 patch for blake3.

the other 2 are resolved upstream tho.

Edit: also had to add BLAKE3-team/BLAKE3#443

caused
```
/Users/runner/work/php-src/php-src/ext/hash/blake3/upstream_blake3/c/blake3_dispatch.c:237:26: error: unused variable 'features' [-Werror,-Wunused-variable]
  const enum cpu_feature features = get_cpu_features();
```
on 32bit x86-32 linux builds.
@divinity76
Copy link
Contributor Author

divinity76 commented Jan 22, 2025

I'm requesting re-review.

@remicollet
Copy link
Member

-1 from me

I think it is a terrible idea to bundle more crypto stuff in PHP
For such serious things, we should rely on external libraries (OpenSSL, sodium...) provided by project specialized in crypto.
We already have bundled far too much things.

Is there an RFC for this ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants