-
Notifications
You must be signed in to change notification settings - Fork 7.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
array_find: Remove unnecessary refcounting #17536
Conversation
In a post on LinkedIn [1], Bohuslav Šimek reported that the native implementation of `array_find()` was about 3× slower than the equivalent userland implementation. While I was not able to verify this claim, due to a lack of reproducer provided, I could confirm that the native `array_find()` was indeed slower than the equivalent userland implementation. For the following example script: <?php function my_array_find(array $array, callable $callback): mixed { foreach ($array as $key => $value) { if ($callback($value, $key)) { return $value; } } return null; } $array = range(1, 10000); $result = 0; for ($i = 0; $i < 5000; $i++) { $result += array_find($array, static function ($item) { return $item === 5000; }); } var_dump($result); with the `array_find()` call appropriately replaced for each case, a PHP-8.4 release build provided the following results: Benchmark 1: /tmp/before native.php Time (mean ± σ): 765.9 ms ± 7.9 ms [User: 761.1 ms, System: 4.4 ms] Range (min … max): 753.2 ms … 774.7 ms 10 runs Benchmark 2: /tmp/before userland.php Time (mean ± σ): 588.0 ms ± 17.9 ms [User: 583.6 ms, System: 4.1 ms] Range (min … max): 576.0 ms … 633.3 ms 10 runs Summary /tmp/before userland.php ran 1.30 ± 0.04 times faster than /tmp/before native.php Running `native.php` with perf reports that a third of the time is spent in `zend_call_function()` and another 20% in `execute_ex()`, however `php_array_find()` comes next at 14%: # Samples: 3K of event 'cpu_core/cycles/' # Event count (approx.): 2895247444 # # Overhead Command Shared Object Symbol # ........ ....... ................. ........................................... # 32.47% before before [.] zend_call_function 20.63% before before [.] execute_ex 14.06% before before [.] php_array_find 7.89% before before [.] ZEND_IS_IDENTICAL_SPEC_CV_CONST_HANDLER 7.31% before before [.] zend_init_func_execute_data 6.50% before before [.] zend_copy_extra_args which was surprising, because the function doesn’t too all that much. Looking at the implementation, the refcounting stood out and it turns out that it is not actually necessary. The `array` is passed by value to `array_find()` and thus cannot magically change within the callback. This also means that the array will continue to hold a reference to string keys and values, preventing these values from being collected. The refcounting inside of `php_array_find()` thus will never do anything useful. Comparing the updated implementation against the original implementation shows that this change results in a 1.14× improvement: Benchmark 1: /tmp/before native.php Time (mean ± σ): 775.4 ms ± 29.6 ms [User: 771.6 ms, System: 3.5 ms] Range (min … max): 740.2 ms … 844.4 ms 10 runs Benchmark 2: /tmp/after native.php Time (mean ± σ): 677.3 ms ± 16.7 ms [User: 673.9 ms, System: 3.1 ms] Range (min … max): 655.9 ms … 705.0 ms 10 runs Summary /tmp/after native.php ran 1.14 ± 0.05 times faster than /tmp/before native.php Comparing the native implementation against the userland implementation with the new implementation shows that while the native implementation still is slower, the difference reduced to 15% (down from 30%): Benchmark 1: /tmp/after native.php Time (mean ± σ): 670.4 ms ± 9.3 ms [User: 666.7 ms, System: 3.4 ms] Range (min … max): 657.1 ms … 689.0 ms 10 runs Benchmark 2: /tmp/after userland.php Time (mean ± σ): 576.7 ms ± 7.6 ms [User: 572.5 ms, System: 3.7 ms] Range (min … max): 563.9 ms … 588.1 ms 10 runs Summary /tmp/after userland.php ran 1.16 ± 0.02 times faster than /tmp/after native.php Looking at the updated perf results shows that `php_array_find()` now only takes up 8% of the time: # Samples: 2K of event 'cpu_core/cycles/' # Event count (approx.): 2540947159 # # Overhead Command Shared Object Symbol # ........ ....... .................... ........................................... # 34.77% after after [.] zend_call_function 18.57% after after [.] execute_ex 12.28% after after [.] zend_copy_extra_args 10.91% after after [.] zend_init_func_execute_data 8.77% after after [.] php_array_find 6.70% after after [.] ZEND_IS_IDENTICAL_SPEC_CV_CONST_HANDLER 4.68% after after [.] zend_is_identical [1] https://www.linkedin.com/posts/bohuslav-%C5%A1imek-kambo_the-surprising-performance-of-php-84-activity-7287044532280414209-6WnA
This change has no effect on performance, but greatly improves readability of the implementation.
do we have a similar problem in |
looking more closely I see you you adjust |
Yes, they all share the same internal implementation. As an example, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The refcount changes look right to me.
Argument passing increases the refcount of array
, so it's not possible to dtor it until the function returns. Also, the refcount would necessarily be higher than 1 if anything else had access to the array, so it's not possible to mutate it.
another question from someone without enough php-src knowledge: could php-src ship with a plain php based implementation instead of a C implementation in these cases? |
It probably could, but there is no precedent or existing infrastructure for that. Personally I think it would be a desirable goal independent of this specific case [1], because it also would simplify implementation of the standard library in cases where performance is less of a concern (e.g. high-level APIs). [1] I plan to follow up with a second PR that further improves the performance of calling userland code from native code, making the native implementation of array_find() equal to the userland implementation, but will need to do some final tests. |
I somehow remember the HHVM guys did this back then. Can't find the related blog posts though |
This syncs the implementation with the updated implementation of `array_find()` in php#17536. For the following test script: <?php $array = range(1, 8000); $result = 0; for ($i = 0; $i < 4000; $i++) { $result += count(array_filter($array, static function ($item) { return $item <= 4000; })); } var_dump($result); This change results in: Benchmark 1: /tmp/before array_filter.php Time (mean ± σ): 696.9 ms ± 16.3 ms [User: 692.9 ms, System: 3.5 ms] Range (min … max): 681.6 ms … 731.5 ms 10 runs Benchmark 2: /tmp/after array_filter.php Time (mean ± σ): 637.5 ms ± 5.6 ms [User: 633.6 ms, System: 3.8 ms] Range (min … max): 630.2 ms … 648.6 ms 10 runs Summary /tmp/after array_filter.php ran 1.09 ± 0.03 times faster than /tmp/before array_filter.php Or as reported by perf: # Samples: 2K of event 'cpu_core/cycles/' # Event count (approx.): 2567197440 # # Overhead Command Shared Object Symbol # ........ ....... .................... ........................................................ # 37.02% before before [.] zend_call_function 15.50% before before [.] execute_ex 10.60% before before [.] zif_array_filter 9.43% before before [.] zend_hash_index_add_new 9.13% before before [.] ZEND_IS_SMALLER_OR_EQUAL_SPEC_TMPVARCV_CONST_HANDLER 8.46% before before [.] zend_init_func_execute_data 3.78% before before [.] zval_add_ref 3.47% before before [.] zval_ptr_dtor 1.17% before before [.] zend_is_true vs # Samples: 2K of event 'cpu_core/cycles/' # Event count (approx.): 2390202140 # # Overhead Command Shared Object Symbol # ........ ....... .................... ........................................................ # 36.87% after after [.] zend_call_function 20.46% after after [.] execute_ex 8.22% after after [.] zend_init_func_execute_data 7.94% after after [.] zend_hash_index_add_new 7.89% after after [.] zif_array_filter 6.28% after after [.] ZEND_IS_SMALLER_OR_EQUAL_SPEC_TMPVARCV_CONST_HANDLER 3.95% after after [.] zval_add_ref 2.23% after after [.] zend_is_true
This syncs the implementation with the updated implementation of `array_find()` in #17536. For the following test script: <?php $array = range(1, 8000); $result = 0; for ($i = 0; $i < 4000; $i++) { $result += count(array_filter($array, static function ($item) { return $item <= 4000; })); } var_dump($result); This change results in: Benchmark 1: /tmp/before array_filter.php Time (mean ± σ): 696.9 ms ± 16.3 ms [User: 692.9 ms, System: 3.5 ms] Range (min … max): 681.6 ms … 731.5 ms 10 runs Benchmark 2: /tmp/after array_filter.php Time (mean ± σ): 637.5 ms ± 5.6 ms [User: 633.6 ms, System: 3.8 ms] Range (min … max): 630.2 ms … 648.6 ms 10 runs Summary /tmp/after array_filter.php ran 1.09 ± 0.03 times faster than /tmp/before array_filter.php Or as reported by perf: # Samples: 2K of event 'cpu_core/cycles/' # Event count (approx.): 2567197440 # # Overhead Command Shared Object Symbol # ........ ....... .................... ........................................................ # 37.02% before before [.] zend_call_function 15.50% before before [.] execute_ex 10.60% before before [.] zif_array_filter 9.43% before before [.] zend_hash_index_add_new 9.13% before before [.] ZEND_IS_SMALLER_OR_EQUAL_SPEC_TMPVARCV_CONST_HANDLER 8.46% before before [.] zend_init_func_execute_data 3.78% before before [.] zval_add_ref 3.47% before before [.] zval_ptr_dtor 1.17% before before [.] zend_is_true vs # Samples: 2K of event 'cpu_core/cycles/' # Event count (approx.): 2390202140 # # Overhead Command Shared Object Symbol # ........ ....... .................... ........................................................ # 36.87% after after [.] zend_call_function 20.46% after after [.] execute_ex 8.22% after after [.] zend_init_func_execute_data 7.94% after after [.] zend_hash_index_add_new 7.89% after after [.] zif_array_filter 6.28% after after [.] ZEND_IS_SMALLER_OR_EQUAL_SPEC_TMPVARCV_CONST_HANDLER 3.95% after after [.] zval_add_ref 2.23% after after [.] zend_is_true
My branch is based on PHP-8.4 and this PR can probably cleanly be switched to apply to that branch, but this change likely is not acceptable outside of master?
In a post on LinkedIn [1], Bohuslav Šimek reported that the native
implementation of
array_find()
was about 3× slower than the equivalentuserland implementation. While I was not able to verify this claim, due to a
lack of reproducer provided, I could confirm that the native
array_find()
wasindeed slower than the equivalent userland implementation. For the following
example script:
with the
array_find()
call appropriately replaced for each case, a PHP-8.4release build provided the following results:
Running
native.php
with perf reports that a third of the time is spent inzend_call_function()
and another 20% inexecute_ex()
, howeverphp_array_find()
comes next at 14%:which was surprising, because the function doesn’t too all that much. Looking
at the implementation, the refcounting stood out and it turns out that it is
not actually necessary. The
array
is passed by value toarray_find()
andthus cannot magically change within the callback. This also means that the
array will continue to hold a reference to string keys and values, preventing
these values from being collected. The refcounting inside of
php_array_find()
thus will never do anything useful.
Comparing the updated implementation against the original implementation shows
that this change results in a 1.14× improvement:
Comparing the native implementation against the userland implementation with
the new implementation shows that while the native implementation still is
slower, the difference reduced to 15% (down from 30%):
Looking at the updated perf results shows that
php_array_find()
now onlytakes up 8% of the time:
[1] https://www.linkedin.com/posts/bohuslav-%C5%A1imek-kambo_the-surprising-performance-of-php-84-activity-7287044532280414209-6WnA