Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce memory overhead of DatePeriod via virtual properties #15598

Merged
merged 7 commits into from
Sep 27, 2024

Conversation

kocsismate
Copy link
Member

@kocsismate kocsismate commented Aug 27, 2024

Related to #11644 and #13988. Test for high memory consumption: https://3v4l.org/QW9Tm

@kocsismate kocsismate requested review from nielsdos and Girgias August 27, 2024 11:07
@kocsismate kocsismate requested a review from derickr as a code owner August 27, 2024 11:07
@kocsismate kocsismate marked this pull request as draft August 27, 2024 11:36
@kocsismate kocsismate force-pushed the dateinterval-virtual-properties branch 2 times, most recently from c70de9d to 9144b30 Compare August 27, 2024 14:53
@kocsismate kocsismate marked this pull request as ready for review August 28, 2024 05:36
ext/date/php_date.c Outdated Show resolved Hide resolved
@@ -33,18 +33,25 @@ object(DatePeriod)#%d (%d) {
["start"]=>
object(DateTime)#%d (%d) {
["date"]=>
string(26) "2022-07-14 00:00:00.000000"
string(26) "2024-08-28 00:00:00.000000"
Copy link
Member Author

@kocsismate kocsismate Aug 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unserialization doesn't revert the already done changes, therefore if an error happens at include_start_date (due to the wrong type) then the expected result is that the new values are applied for current, start, end etc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this a change in behaviour then? I don't quite understand why you need to make this changes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous behavior was a by-product of storing properties in two places. The following happened: internal struct members of DatePeriod were assigned a value during unserialization, until an incompatible type of the include_start_date key caused a failure. That's why regular object properties were left intact, since https://github.com/php/php-src/pull/15598/files#diff-975ac482adc74b487c6ac9d8dde32153e4b8a803ef8ec41e8d6594f0a042db69L5846 was skipped. The test reflected the old object property values then, while the internal struct contained the new values.

The new behavior is consistent with the comment here:

/* this function does no rollback on error */
(/* this function does no rollback on error */). If there is an error during serialization, there is no "rollback", so the test reflects the new values for those properties which were affected by the unserialization.

Copy link
Member

@nielsdos nielsdos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your usage of virtual seems right. I'm not the codeowner of this component so I can't comment much on the other aspects. Ultimately it's up to Derick to decide what to do with this.

return;
}

zend_class_entry *ce_ptr = ce;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You moved this out of the else block, but is there a reason for this except for coding style? I would rather not have code motion especially since it's unrelated to this PR. Or am I missing something?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes: #15598 (comment) Although I commented this to the wrong line, because line 320 (before my changes) was missing a return and therefore zend_throw_error was invoked twice

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is unrelated to the actual change though (memory reduction); I also don't think the code style should be updated in the same PR/commit, and even if you did, I would prefer that variable declarations be kept at the top of functions.

@@ -2038,7 +1999,7 @@ static int date_object_compare_timezone(zval *tz1, zval *tz2) /* {{{ */

if (!o1->initialized || !o2->initialized) {
zend_throw_error(date_ce_date_object_error, "Trying to compare uninitialized DateTimeZone objects");
return 1;
return ZEND_UNCOMPARABLE;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like an unrelated cleanup to this PR and should perhaps be done separately.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applied separately

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't it then need a rebase — because it shouldn't show up in this PR again then surely?

@@ -3066,6 +3027,7 @@ PHP_METHOD(DateTime, __wakeup)

if (!php_date_initialize_from_hash(&dateobj, myht)) {
zend_throw_error(NULL, "Invalid serialization data for DateTime object");
RETURN_THROWS();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here: I would not do these cleanups/clarifications in the code if it's not strictly necessary in this PR. Otherwise it shows up in the git history as if it's related and that's extra noise + confusion tbh.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wasn't this done in another PR? Perhaps this needs a rebase then first.

@@ -4656,6 +4623,10 @@ static void php_date_interval_initialize_from_hash(zval **return_value, php_inte
Z_STRVAL_P(date_str),
err->error_messages[0].position,
err->error_messages[0].character ? err->error_messages[0].character : ' ', err->error_messages[0].message);

timelib_time_dtor(time);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a real fix, should perhaps be backported?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that was my original plan but I haven't had time yet: #15598 (comment) Will do it soon.


static HashTable *date_period_get_properties_for(zend_object *object, zend_prop_purpose purpose)
{
switch (purpose) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused by the implementation: why this switch is necessary here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copy-pasted this piece of code. I'm not sure why Nikita added the switch there... but after a closer examination, it seemed that ZEND_PROP_PURPOSE_GET_OBJECT_VARS is missing from the cases 🤔 I don't see the reason... but in any case, I now removed the switch case so that the custom handler kicks in in all cases now.

return;
}

zend_class_entry *ce_ptr = ce;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is unrelated to the actual change though (memory reduction); I also don't think the code style should be updated in the same PR/commit, and even if you did, I would prefer that variable declarations be kept at the top of functions.

@@ -3066,6 +3027,7 @@ PHP_METHOD(DateTime, __wakeup)

if (!php_date_initialize_from_hash(&dateobj, myht)) {
zend_throw_error(NULL, "Invalid serialization data for DateTime object");
RETURN_THROWS();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wasn't this done in another PR? Perhaps this needs a rebase then first.

@@ -5934,20 +5903,47 @@ PHP_METHOD(DatePeriod, __wakeup)
ZEND_PARSE_PARAMETERS_NONE();

period_obj = Z_PHPPERIOD_P(object);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't just add whitespace changes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll get rid of the unnecessary changes once I rebase this PR on another batch of return fixes.

@@ -33,18 +33,25 @@ object(DatePeriod)#%d (%d) {
["start"]=>
object(DateTime)#%d (%d) {
["date"]=>
string(26) "2022-07-14 00:00:00.000000"
string(26) "2024-08-28 00:00:00.000000"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this a change in behaviour then? I don't quite understand why you need to make this changes.

["var3":protected]=>
int(3)
["var4":protected]=>
int(4)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This confused me. I thought these were now added where they originally were not. GH diff is confusing sometimes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the diff in this case was very confusing for me as well... But in fact, the only thing that changed is the order of the properties...

zend_throw_error(date_ce_date_object_error, "Object of type %s not been correctly initialized by calling parent::__construct() in its constructor", ZSTR_VAL(ce->name));
}
zend_throw_error(date_ce_date_object_error, "Object of type %s (inheriting %s) has not been correctly initialized by calling parent::__construct() in its constructor", ZSTR_VAL(ce->name), ZSTR_VAL(ce_ptr->name));
return;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be RETURN_THROWS(), not?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, RETURN_THROWS() requires a return_value variable in scope (thus it can be mainly used inside PHP_FUNCTIONs/PHP_METHODs)

}
zend_throw_error(date_ce_date_object_error, "Object of type %s (inheriting %s) has not been correctly initialized by calling parent::__construct() in its constructor", ZSTR_VAL(ce->name), ZSTR_VAL(ce_ptr->name));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this also misses a RETURN_THROWS() (yes, I know it's the last line in the function, but still)

Copy link
Member Author

@kocsismate kocsismate Sep 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can add them separately for sure. I agree that the RETURN_THROWS() are useful

@@ -2038,7 +1999,7 @@ static int date_object_compare_timezone(zval *tz1, zval *tz2) /* {{{ */

if (!o1->initialized || !o2->initialized) {
zend_throw_error(date_ce_date_object_error, "Trying to compare uninitialized DateTimeZone objects");
return 1;
return ZEND_UNCOMPARABLE;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't it then need a rebase — because it shouldn't show up in this PR again then surely?

ext/date/php_date.stub.php Show resolved Hide resolved
@kocsismate kocsismate force-pushed the dateinterval-virtual-properties branch from 79f50f4 to 042bfa6 Compare September 3, 2024 19:36
kocsismate added a commit that referenced this pull request Sep 14, 2024
@kocsismate kocsismate force-pushed the dateinterval-virtual-properties branch 2 times, most recently from bbc6cf7 to ceb6a95 Compare September 14, 2024 20:23
@nielsdos
Copy link
Member

Note that you'll need an unset handler that forbids unsetting these virtual properties, otherwise it's going to crash because there's no backing storage. I just did something similar for DOM: #15891

@kocsismate
Copy link
Member Author

Note that you'll need an unset handler that forbids unsetting these virtual properties, otherwise it's going to crash because there's no backing storage. I just did something similar for DOM: #15891

Thanks for the heads up, I've just implemented it.

@kocsismate kocsismate force-pushed the dateinterval-virtual-properties branch from f611bf7 to bd6162c Compare September 17, 2024 06:59
@kocsismate
Copy link
Member Author

@nielsdos @derickr Can we still merge it into RC1? Or if it has already been tagged (haven't checked it), could you please take another look so that it can be merged into RC2?

Copy link
Member

@nielsdos nielsdos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was at work when RC1 got tagged.
I'm not so sure that the rules allow optimizations to be merged still after RC1 is released.
From the PoV of how virtual properties works this LGTM. But wait for Derick to confirm. Also you still need to answer https://github.com/php/php-src/pull/15598/files#r1742364030

Copy link
Member

@nielsdos nielsdos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spoke too soon.
The following will crash because when the object is uninitialized it executes the standard has_property object handler, but that can't handle virtuals of course:

<?php

$rc = new ReflectionClass('DatePeriod');
$di = $rc->newInstanceWithoutConstructor();
var_dump(isset($di->recurrences));

@kocsismate kocsismate force-pushed the dateinterval-virtual-properties branch from bd6162c to 176d285 Compare September 26, 2024 21:14
@kocsismate
Copy link
Member Author

The following will crash because when the object is uninitialized it executes the standard has_property object handler, but that can't handle virtuals of course:

Very smart observation, thank you! I've just force pushed a new commit adding the missing functionality along with a rebase to recent master because I think this optimization can wait until PHP 8.5 indeed.

Copy link
Member

@nielsdos nielsdos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only see one mistake anymore

ext/date/php_date.c Show resolved Hide resolved
Copy link
Member

@nielsdos nielsdos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see anything wrong anymore

@kocsismate kocsismate merged commit 181ea64 into php:master Sep 27, 2024
10 checks passed
@kocsismate kocsismate deleted the dateinterval-virtual-properties branch September 27, 2024 20:54
jorgsowa pushed a commit to jorgsowa/php-src that referenced this pull request Oct 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants