Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

array_find: Remove unnecessary refcounting #17536

Merged
merged 3 commits into from
Jan 21, 2025

Conversation

TimWolla
Copy link
Member

My branch is based on PHP-8.4 and this PR can probably cleanly be switched to apply to that branch, but this change likely is not acceptable outside of master?


In a post on LinkedIn [1], Bohuslav Šimek reported that the native
implementation of array_find() was about 3× slower than the equivalent
userland implementation. While I was not able to verify this claim, due to a
lack of reproducer provided, I could confirm that the native array_find() was
indeed slower than the equivalent userland implementation. For the following
example script:

<?php

function my_array_find(array $array, callable $callback): mixed {
    foreach ($array as $key => $value) {
        if ($callback($value, $key)) {
            return $value;
        }
    }

    return null;
}

$array = range(1, 10000);

$result = 0;
for ($i = 0; $i < 5000; $i++) {
    $result += array_find($array, static function ($item) {
            return $item === 5000;
    });
}
var_dump($result);

with the array_find() call appropriately replaced for each case, a PHP-8.4
release build provided the following results:

Benchmark 1: /tmp/before native.php
  Time (mean ± σ):     765.9 ms ±   7.9 ms    [User: 761.1 ms, System: 4.4 ms]
  Range (min … max):   753.2 ms … 774.7 ms    10 runs

Benchmark 2: /tmp/before userland.php
  Time (mean ± σ):     588.0 ms ±  17.9 ms    [User: 583.6 ms, System: 4.1 ms]
  Range (min … max):   576.0 ms … 633.3 ms    10 runs

Summary
  /tmp/before userland.php ran
    1.30 ± 0.04 times faster than /tmp/before native.php

Running native.php with perf reports that a third of the time is spent in
zend_call_function() and another 20% in execute_ex(), however
php_array_find() comes next at 14%:

# Samples: 3K of event 'cpu_core/cycles/'
# Event count (approx.): 2895247444
#
# Overhead  Command  Shared Object      Symbol
# ........  .......  .................  ...........................................
#
    32.47%  before   before             [.] zend_call_function
    20.63%  before   before             [.] execute_ex
    14.06%  before   before             [.] php_array_find
     7.89%  before   before             [.] ZEND_IS_IDENTICAL_SPEC_CV_CONST_HANDLER
     7.31%  before   before             [.] zend_init_func_execute_data
     6.50%  before   before             [.] zend_copy_extra_args

which was surprising, because the function doesn’t too all that much. Looking
at the implementation, the refcounting stood out and it turns out that it is
not actually necessary. The array is passed by value to array_find() and
thus cannot magically change within the callback. This also means that the
array will continue to hold a reference to string keys and values, preventing
these values from being collected. The refcounting inside of php_array_find()
thus will never do anything useful.

Comparing the updated implementation against the original implementation shows
that this change results in a 1.14× improvement:

Benchmark 1: /tmp/before native.php
  Time (mean ± σ):     775.4 ms ±  29.6 ms    [User: 771.6 ms, System: 3.5 ms]
  Range (min … max):   740.2 ms … 844.4 ms    10 runs

Benchmark 2: /tmp/after native.php
  Time (mean ± σ):     677.3 ms ±  16.7 ms    [User: 673.9 ms, System: 3.1 ms]
  Range (min … max):   655.9 ms … 705.0 ms    10 runs

Summary
  /tmp/after native.php ran
    1.14 ± 0.05 times faster than /tmp/before native.php

Comparing the native implementation against the userland implementation with
the new implementation shows that while the native implementation still is
slower, the difference reduced to 15% (down from 30%):

Benchmark 1: /tmp/after native.php
  Time (mean ± σ):     670.4 ms ±   9.3 ms    [User: 666.7 ms, System: 3.4 ms]
  Range (min … max):   657.1 ms … 689.0 ms    10 runs

Benchmark 2: /tmp/after userland.php
  Time (mean ± σ):     576.7 ms ±   7.6 ms    [User: 572.5 ms, System: 3.7 ms]
  Range (min … max):   563.9 ms … 588.1 ms    10 runs

Summary
  /tmp/after userland.php ran
    1.16 ± 0.02 times faster than /tmp/after native.php

Looking at the updated perf results shows that php_array_find() now only
takes up 8% of the time:

# Samples: 2K of event 'cpu_core/cycles/'
# Event count (approx.): 2540947159
#
# Overhead  Command  Shared Object         Symbol
# ........  .......  ....................  ...........................................
#
    34.77%  after    after                 [.] zend_call_function
    18.57%  after    after                 [.] execute_ex
    12.28%  after    after                 [.] zend_copy_extra_args
    10.91%  after    after                 [.] zend_init_func_execute_data
     8.77%  after    after                 [.] php_array_find
     6.70%  after    after                 [.] ZEND_IS_IDENTICAL_SPEC_CV_CONST_HANDLER
     4.68%  after    after                 [.] zend_is_identical

[1] https://www.linkedin.com/posts/bohuslav-%C5%A1imek-kambo_the-surprising-performance-of-php-84-activity-7287044532280414209-6WnA

In a post on LinkedIn [1], Bohuslav Šimek reported that the native
implementation of `array_find()` was about 3× slower than the equivalent
userland implementation. While I was not able to verify this claim, due to a
lack of reproducer provided, I could confirm that the native `array_find()` was
indeed slower than the equivalent userland implementation. For the following
example script:

    <?php

    function my_array_find(array $array, callable $callback): mixed {
        foreach ($array as $key => $value) {
            if ($callback($value, $key)) {
                return $value;
            }
        }

        return null;
    }

    $array = range(1, 10000);

    $result = 0;
    for ($i = 0; $i < 5000; $i++) {
    	$result += array_find($array, static function ($item) {
    		return $item === 5000;
    	});
    }
    var_dump($result);

with the `array_find()` call appropriately replaced for each case, a PHP-8.4
release build provided the following results:

    Benchmark 1: /tmp/before native.php
      Time (mean ± σ):     765.9 ms ±   7.9 ms    [User: 761.1 ms, System: 4.4 ms]
      Range (min … max):   753.2 ms … 774.7 ms    10 runs

    Benchmark 2: /tmp/before userland.php
      Time (mean ± σ):     588.0 ms ±  17.9 ms    [User: 583.6 ms, System: 4.1 ms]
      Range (min … max):   576.0 ms … 633.3 ms    10 runs

    Summary
      /tmp/before userland.php ran
        1.30 ± 0.04 times faster than /tmp/before native.php

Running `native.php` with perf reports that a third of the time is spent in
`zend_call_function()` and another 20% in `execute_ex()`, however
`php_array_find()` comes next at 14%:

    # Samples: 3K of event 'cpu_core/cycles/'
    # Event count (approx.): 2895247444
    #
    # Overhead  Command  Shared Object      Symbol
    # ........  .......  .................  ...........................................
    #
        32.47%  before   before             [.] zend_call_function
        20.63%  before   before             [.] execute_ex
        14.06%  before   before             [.] php_array_find
         7.89%  before   before             [.] ZEND_IS_IDENTICAL_SPEC_CV_CONST_HANDLER
         7.31%  before   before             [.] zend_init_func_execute_data
         6.50%  before   before             [.] zend_copy_extra_args

which was surprising, because the function doesn’t too all that much. Looking
at the implementation, the refcounting stood out and it turns out that it is
not actually necessary. The `array` is passed by value to `array_find()` and
thus cannot magically change within the callback. This also means that the
array will continue to hold a reference to string keys and values, preventing
these values from being collected. The refcounting inside of `php_array_find()`
thus will never do anything useful.

Comparing the updated implementation against the original implementation shows
that this change results in a 1.14× improvement:

    Benchmark 1: /tmp/before native.php
      Time (mean ± σ):     775.4 ms ±  29.6 ms    [User: 771.6 ms, System: 3.5 ms]
      Range (min … max):   740.2 ms … 844.4 ms    10 runs

    Benchmark 2: /tmp/after native.php
      Time (mean ± σ):     677.3 ms ±  16.7 ms    [User: 673.9 ms, System: 3.1 ms]
      Range (min … max):   655.9 ms … 705.0 ms    10 runs

    Summary
      /tmp/after native.php ran
        1.14 ± 0.05 times faster than /tmp/before native.php

Comparing the native implementation against the userland implementation with
the new implementation shows that while the native implementation still is
slower, the difference reduced to 15% (down from 30%):

    Benchmark 1: /tmp/after native.php
      Time (mean ± σ):     670.4 ms ±   9.3 ms    [User: 666.7 ms, System: 3.4 ms]
      Range (min … max):   657.1 ms … 689.0 ms    10 runs

    Benchmark 2: /tmp/after userland.php
      Time (mean ± σ):     576.7 ms ±   7.6 ms    [User: 572.5 ms, System: 3.7 ms]
      Range (min … max):   563.9 ms … 588.1 ms    10 runs

    Summary
      /tmp/after userland.php ran
        1.16 ± 0.02 times faster than /tmp/after native.php

Looking at the updated perf results shows that `php_array_find()` now only
takes up 8% of the time:

    # Samples: 2K of event 'cpu_core/cycles/'
    # Event count (approx.): 2540947159
    #
    # Overhead  Command  Shared Object         Symbol
    # ........  .......  ....................  ...........................................
    #
        34.77%  after    after                 [.] zend_call_function
        18.57%  after    after                 [.] execute_ex
        12.28%  after    after                 [.] zend_copy_extra_args
        10.91%  after    after                 [.] zend_init_func_execute_data
         8.77%  after    after                 [.] php_array_find
         6.70%  after    after                 [.] ZEND_IS_IDENTICAL_SPEC_CV_CONST_HANDLER
         4.68%  after    after                 [.] zend_is_identical

[1] https://www.linkedin.com/posts/bohuslav-%C5%A1imek-kambo_the-surprising-performance-of-php-84-activity-7287044532280414209-6WnA
ext/standard/array.c Outdated Show resolved Hide resolved
This change has no effect on performance, but greatly improves readability of
the implementation.
@TimWolla TimWolla requested a review from Girgias January 21, 2025 09:23
@staabm
Copy link
Contributor

staabm commented Jan 21, 2025

do we have a similar problem in array_find_key?

@staabm
Copy link
Contributor

staabm commented Jan 21, 2025

do we have a similar problem in array_find_key?

looking more closely I see you you adjust php_array_find which is used in array_find_key, array_all, array_any etc.
I guess this means these also get faster, right?

@TimWolla
Copy link
Member Author

I guess this means these also get faster, right?

Yes, they all share the same internal implementation. As an example, array_any() is effectively just array_find_key() !== null.

Copy link
Member

@arnaud-lb arnaud-lb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The refcount changes look right to me.

Argument passing increases the refcount of array, so it's not possible to dtor it until the function returns. Also, the refcount would necessarily be higher than 1 if anything else had access to the array, so it's not possible to mutate it.

@staabm
Copy link
Contributor

staabm commented Jan 21, 2025

another question from someone without enough php-src knowledge:

could php-src ship with a plain php based implementation instead of a C implementation in these cases?

@TimWolla
Copy link
Member Author

could php-src ship with a plain php based implementation instead of a C implementation in these cases?

It probably could, but there is no precedent or existing infrastructure for that. Personally I think it would be a desirable goal independent of this specific case [1], because it also would simplify implementation of the standard library in cases where performance is less of a concern (e.g. high-level APIs).

[1] I plan to follow up with a second PR that further improves the performance of calling userland code from native code, making the native implementation of array_find() equal to the userland implementation, but will need to do some final tests.

@staabm
Copy link
Contributor

staabm commented Jan 21, 2025

It probably could, but there is no precedent or existing infrastructure for that.

I somehow remember the HHVM guys did this back then. Can't find the related blog posts though

@TimWolla TimWolla merged commit c39d112 into php:master Jan 21, 2025
10 checks passed
@TimWolla TimWolla deleted the array-find-optimization branch January 21, 2025 11:27
TimWolla added a commit to TimWolla/php-src that referenced this pull request Jan 21, 2025
This syncs the implementation with the updated implementation of `array_find()`
in php#17536. For the following test script:

    <?php

    $array = range(1, 8000);

    $result = 0;
    for ($i = 0; $i < 4000; $i++) {
    	$result += count(array_filter($array, static function ($item) {
    		return $item <= 4000;
    	}));
    }
    var_dump($result);

This change results in:

    Benchmark 1: /tmp/before array_filter.php
      Time (mean ± σ):     696.9 ms ±  16.3 ms    [User: 692.9 ms, System: 3.5 ms]
      Range (min … max):   681.6 ms … 731.5 ms    10 runs

    Benchmark 2: /tmp/after array_filter.php
      Time (mean ± σ):     637.5 ms ±   5.6 ms    [User: 633.6 ms, System: 3.8 ms]
      Range (min … max):   630.2 ms … 648.6 ms    10 runs

    Summary
      /tmp/after array_filter.php ran
        1.09 ± 0.03 times faster than /tmp/before array_filter.php

Or as reported by perf:

    # Samples: 2K of event 'cpu_core/cycles/'
    # Event count (approx.): 2567197440
    #
    # Overhead  Command  Shared Object         Symbol
    # ........  .......  ....................  ........................................................
    #
        37.02%  before   before                [.] zend_call_function
        15.50%  before   before                [.] execute_ex
        10.60%  before   before                [.] zif_array_filter
         9.43%  before   before                [.] zend_hash_index_add_new
         9.13%  before   before                [.] ZEND_IS_SMALLER_OR_EQUAL_SPEC_TMPVARCV_CONST_HANDLER
         8.46%  before   before                [.] zend_init_func_execute_data
         3.78%  before   before                [.] zval_add_ref
         3.47%  before   before                [.] zval_ptr_dtor
         1.17%  before   before                [.] zend_is_true

vs

    # Samples: 2K of event 'cpu_core/cycles/'
    # Event count (approx.): 2390202140
    #
    # Overhead  Command  Shared Object         Symbol
    # ........  .......  ....................  ........................................................
    #
        36.87%  after    after                 [.] zend_call_function
        20.46%  after    after                 [.] execute_ex
         8.22%  after    after                 [.] zend_init_func_execute_data
         7.94%  after    after                 [.] zend_hash_index_add_new
         7.89%  after    after                 [.] zif_array_filter
         6.28%  after    after                 [.] ZEND_IS_SMALLER_OR_EQUAL_SPEC_TMPVARCV_CONST_HANDLER
         3.95%  after    after                 [.] zval_add_ref
         2.23%  after    after                 [.] zend_is_true
TimWolla added a commit that referenced this pull request Jan 21, 2025
This syncs the implementation with the updated implementation of `array_find()`
in #17536. For the following test script:

    <?php

    $array = range(1, 8000);

    $result = 0;
    for ($i = 0; $i < 4000; $i++) {
    	$result += count(array_filter($array, static function ($item) {
    		return $item <= 4000;
    	}));
    }
    var_dump($result);

This change results in:

    Benchmark 1: /tmp/before array_filter.php
      Time (mean ± σ):     696.9 ms ±  16.3 ms    [User: 692.9 ms, System: 3.5 ms]
      Range (min … max):   681.6 ms … 731.5 ms    10 runs

    Benchmark 2: /tmp/after array_filter.php
      Time (mean ± σ):     637.5 ms ±   5.6 ms    [User: 633.6 ms, System: 3.8 ms]
      Range (min … max):   630.2 ms … 648.6 ms    10 runs

    Summary
      /tmp/after array_filter.php ran
        1.09 ± 0.03 times faster than /tmp/before array_filter.php

Or as reported by perf:

    # Samples: 2K of event 'cpu_core/cycles/'
    # Event count (approx.): 2567197440
    #
    # Overhead  Command  Shared Object         Symbol
    # ........  .......  ....................  ........................................................
    #
        37.02%  before   before                [.] zend_call_function
        15.50%  before   before                [.] execute_ex
        10.60%  before   before                [.] zif_array_filter
         9.43%  before   before                [.] zend_hash_index_add_new
         9.13%  before   before                [.] ZEND_IS_SMALLER_OR_EQUAL_SPEC_TMPVARCV_CONST_HANDLER
         8.46%  before   before                [.] zend_init_func_execute_data
         3.78%  before   before                [.] zval_add_ref
         3.47%  before   before                [.] zval_ptr_dtor
         1.17%  before   before                [.] zend_is_true

vs

    # Samples: 2K of event 'cpu_core/cycles/'
    # Event count (approx.): 2390202140
    #
    # Overhead  Command  Shared Object         Symbol
    # ........  .......  ....................  ........................................................
    #
        36.87%  after    after                 [.] zend_call_function
        20.46%  after    after                 [.] execute_ex
         8.22%  after    after                 [.] zend_init_func_execute_data
         7.94%  after    after                 [.] zend_hash_index_add_new
         7.89%  after    after                 [.] zif_array_filter
         6.28%  after    after                 [.] ZEND_IS_SMALLER_OR_EQUAL_SPEC_TMPVARCV_CONST_HANDLER
         3.95%  after    after                 [.] zval_add_ref
         2.23%  after    after                 [.] zend_is_true
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants