-
Notifications
You must be signed in to change notification settings - Fork 107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Disambiguate XPaths for children of BODY with id
, class
, or role
attributes
#1797
Disambiguate XPaths for children of BODY with id
, class
, or role
attributes
#1797
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## trunk #1797 +/- ##
==========================================
+ Coverage 59.58% 59.79% +0.20%
==========================================
Files 84 84
Lines 6522 6573 +51
==========================================
+ Hits 3886 3930 +44
- Misses 2636 2643 +7
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Oh, I just discovered that the Image block with a lightbox is involved here. With the changes in this PR applied, WordPress outputs a <div
class="wp-lightbox-overlay zoom"
data-wp-interactive="core/image"
data-wp-context='{}'
data-wp-bind--role="state.roleAttribute"
data-wp-bind--aria-label="state.currentImage.ariaLabel"
data-wp-bind--aria-modal="state.ariaModal"
data-wp-class--active="state.overlayEnabled"
data-wp-class--show-closing-animation="state.showClosingAnimation"
data-wp-watch="callbacks.setOverlayFocus"
data-wp-on--keydown="actions.handleKeydown"
data-wp-on-async--touchstart="actions.handleTouchStart"
data-wp-on--touchmove="actions.handleTouchMove"
data-wp-on-async--touchend="actions.handleTouchEnd"
data-wp-on-async--click="actions.hideLightbox"
data-wp-on-async-window--resize="callbacks.setOverlayStyles"
data-wp-on-async-window--scroll="actions.handleScroll"
tabindex="-1"
>
<button type="button" aria-label="Close" style="fill: var(--wp--preset--color--contrast)" class="close-button">
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" width="20" height="20" aria-hidden="true" focusable="false"><path d="m13.06 12 6.47-6.47-1.06-1.06L12 10.94 5.53 4.47 4.47 5.53 10.94 12l-6.47 6.47 1.06 1.06L12 13.06l6.47 6.47 1.06-1.06L13.06 12Z"></path></svg>
</button>
<div class="lightbox-image-container">
<figure data-wp-bind--class="state.currentImage.figureClassNames" data-wp-bind--style="state.figureStyles">
<img class="od-missing-dimensions" data-od-added-class data-od-xpath="/HTML/BODY/DIV[@class='wp-lightbox-overlay zoom']/*[2][self::DIV]/*[1][self::FIGURE]/*[1][self::IMG]" data-wp-bind--alt="state.currentImage.alt" data-wp-bind--class="state.currentImage.imgClassNames" data-wp-bind--style="state.imgStyles" data-wp-bind--src="state.currentImage.currentSrc">
</figure>
</div>
<div class="lightbox-image-container">
<figure data-wp-bind--class="state.currentImage.figureClassNames" data-wp-bind--style="state.figureStyles">
<img class="od-missing-dimensions" data-od-added-class data-od-xpath="/HTML/BODY/DIV[@class='wp-lightbox-overlay zoom']/*[3][self::DIV]/*[1][self::FIGURE]/*[1][self::IMG]" data-wp-bind--alt="state.currentImage.alt" data-wp-bind--class="state.currentImage.imgClassNames" data-wp-bind--style="state.imgStyles" data-wp-bind--src="state.enlargedSrc">
</figure>
</div>
<div class="scrim" style="background-color: var(--wp--preset--color--base)" aria-hidden="true"></div>
<style data-wp-text="state.overlayStyles"></style>
</div> Note the XPath for the
Without the disambiguation, this would be:
This could be confused with an Image block appearing somewhere especially when Important If this change is made to add the disambiguation predicates to To ease this migration, we may want to add an performance/plugins/optimization-detective/class-od-element.php Lines 56 to 64 in 6ca5c4b
Then anywhere that we compare XPaths we should use this helper function as opposed to using Update: This is not so straightforward. For example, This is also the case for code that uses Up until now, the XPath has been very constrained in its format, so it has been a reliably fixed string for comparisons. Changing the XPath construction has impacts that need to be carefully evaluated. |
@westonruter I'm not sure I fully understand.
Are you saying this is only inserted with this change? Why that? OD shouldn't alter which DOM elements are inserted by any features right?
So are you saying without this PR merged, there would be a risk of a duplicate? If so, it would be a good indicator that the risk we assessed as small on the other PR isn't actually that small and we should cater for it.
Why that? Would it invalidate even URL metrics that have an index (how they're currently stored, without the recent |
No, I'm saying that this was already being added to the page by Gutenberg. I wasn't aware that Gutenberg was outputting lightbox elements at
Correct.
What I mean is that URL Metrics currently stored use the node index, for example |
@westonruter Makes sense. So maybe we need to leave old data intact for now then and only migrate when we have all the context we need (i.e. in a request to the relevant URL). Since #1790 is not in any release yet, at least we don't have to worry about migrating from that. But we probably should fix it before this upcoming release. This complicates things of course, but given we already found an exception in Core itself, this is probably going to happen more. Any plugin may add some |
@felixarntz Well, I mean, "migration" will happen eventually as URL Metrics turn stale and get purged from the site. It would be ideal if a change to the XPath structure didn't result in this, however. It would be as if someone installed the plugin anew without any data collected. |
…oid needless computation
… add/od-xpath-body-children-disambiguation
The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the If you're merging code through a pull request on GitHub, copy and paste the following into the bottom of the merge commit message.
To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook. |
@felixarntz Finally ready for review. I had to ruminate on this one a while, in particular the "transitional XPath format" aspect to prevent old URL Metrics from becoming invalid. I've updated the description to describe the changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@westonruter I think this looks pretty good, though I have a few important questions on the direction in regards to the transition. Most importantly, it seems to me that we need to wait for the transition to be completed until we actually enabled URL metrics for logged-out users (which was one of the main reasons to change the xpath format in the first place).
* the same relative position, which does not seem likely. Additionally, as noted above, themes almost always wrap | ||
* the page content in an overall `DIV` or else they use semantic HTML tags like `HEADER` and `FOOTER`, so the risk | ||
* is low. | ||
* these `DIV` elements has an `IMG` as the first child. This is also an issue in sites using the Image block |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* these `DIV` elements has an `IMG` as the first child. This is also an issue in sites using the Image block | |
* these `DIV` elements have an `IMG` as the first child. This is also an issue in sites using the Image block |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See c259b59
* these `DIV` elements has an `IMG` as the first child. This is also an issue in sites using the Image block | ||
* because it outputs a `DIV.wp-lightbox-overlay.zoom` in `wp_footer`, resulting in there being a real possibility | ||
* for XPaths to not be unique in the page. Therefore, en lieu of node index being added to children of `BODY`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The image lightbox is a very specific example, but I would include a more general mention of any additional divs
inserted via wp_footer
for instance (which are often for modals, like image lightbox is an example for).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See c259b59
if ( isset( $this->open_stack_tags[1] ) && 'BODY' === $this->open_stack_tags[1] && 2 === $level ) { | ||
$attributes = $this->get_disambiguating_attributes(); | ||
} | ||
$this->open_stack_attributes[] = $attributes; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is not really aligned with what the $open_stack_tags
property does (as this one only acts for the specific case of a body
child element), maybe we should use another name? Could be something very specific like body_child_element_attributes
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #1797 (comment)
@@ -348,6 +381,7 @@ public function next_token(): bool { | |||
} | |||
|
|||
$popped_tag_name = array_pop( $this->open_stack_tags ); | |||
array_pop( $this->open_stack_attributes ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this not need to be conditional (e.g. based on the parent tag and level)? Are we certain this wouldn't remove the attributes when it shouldn't?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't need to be conditional, no, since I'm pushing an empty array onto the stack when not at BODY > *
:
performance/plugins/optimization-detective/class-od-html-tag-processor.php
Lines 354 to 359 in 468b6a7
// For children of the BODY, capture disambiguating comments. See the get_xpath() method for where this data is used. | |
$attributes = array(); | |
if ( isset( $this->open_stack_tags[1] ) && 'BODY' === $this->open_stack_tags[1] && 2 === $level ) { | |
$attributes = $this->get_disambiguating_attributes(); | |
} | |
$this->open_stack_attributes[] = $attributes; |
I decided to do this this way for two reasons:
- It's consistent with how the
open_stack_tags
variable is manipulated:
performance/plugins/optimization-detective/class-od-html-tag-processor.php
Lines 318 to 319 in 468b6a7
$this->open_stack_tags = array(); | |
$this->open_stack_attributes = array(); |
performance/plugins/optimization-detective/class-od-html-tag-processor.php
Lines 345 to 346 in 468b6a7
array_splice( $this->open_stack_tags, (int) $i ); | |
array_splice( $this->open_stack_attributes, (int) $i ); |
performance/plugins/optimization-detective/class-od-html-tag-processor.php
Lines 333 to 334 in 468b6a7
array_pop( $this->open_stack_tags ); | |
array_pop( $this->open_stack_attributes ); |
performance/plugins/optimization-detective/class-od-html-tag-processor.php
Lines 502 to 503 in 468b6a7
$this->open_stack_tags = $this->bookmarked_open_stacks[ $bookmark_name ]['tags']; | |
$this->open_stack_attributes = $this->bookmarked_open_stacks[ $bookmark_name ]['attributes']; |
performance/plugins/optimization-detective/class-od-html-tag-processor.php
Lines 521 to 523 in 468b6a7
$this->bookmarked_open_stacks[ $name ] = array( | |
'tags' => $this->open_stack_tags, | |
'attributes' => $this->open_stack_attributes, |
performance/plugins/optimization-detective/class-od-html-tag-processor.php
Lines 563 to 565 in 468b6a7
foreach ( $this->open_stack_tags as $i => $breadcrumb_tag_name ) { | |
yield array( $breadcrumb_tag_name, $this->open_stack_indices[ $i ], $this->open_stack_attributes[ $i ] ); | |
} |
If we only captured only at the 3rd level (BODY > *
) then additional logic would be needed in all these places to figure out if we're at that level.
- We may want to capture these attributes at additional levels in the future for further disambiguation.
$replacements = array( | ||
|
||
/* | ||
* Convert the original XPath format for elements in the BODY. | ||
* | ||
* Example: | ||
* /*[1][self::HTML]/*[2][self::BODY]/*[1][self::DIV]/*[1][self::IMG] | ||
* => | ||
* /HTML/BODY/DIV/*[1][self::IMG] | ||
*/ | ||
'#^/\*\[1]\[self::HTML]/\*\[2]\[self::BODY]/\*\[\d+]\[self::([a-zA-Z0-9:_-]+)]#' => '/HTML/BODY/$1', | ||
|
||
/* | ||
* Convert the original XPath format for elements in the HEAD. | ||
* | ||
* Example: | ||
* /*[1][self::HTML]/*[1][self::HEAD]/*[1][self::META] | ||
* => | ||
* /HTML/HEAD/*[1][self::META] | ||
*/ | ||
'#^/\*\[1\]\[self::HTML\]/\*\[1\]\[self::HEAD]#' => '/HTML/HEAD', | ||
|
||
/* | ||
* Convert the new XPath format for elements in the BODY. | ||
* | ||
* Note that the new XPath format for elements in the HEAD does not need to be converted to the | ||
* transitional format since disambiguating attributes are not used in the HEAD. | ||
* | ||
* Example: | ||
* /HTML/BODY/DIV[@id='page']/*[1][self::IMG] | ||
* => | ||
* /HTML/BODY/DIV/*[1][self::IMG] | ||
*/ | ||
'#^(/HTML/BODY/\w+)\[@[^\]]+?]#' => '$1', | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't fully understand how this transitional xpath helps. Is it so that all the old xpaths and new xpaths are using the same format temporarily (i.e. without disambiguation of body child div
s)? Wouldn't that be a risk (because of the lack of disambiguation) and because of that it would be better to for now still only record URL metrics for logged-out users? If I understand correctly, I would think we should only enable the URL metric recording for logged-out users once the transition is completed, otherwise we would have the risk that we (per this PR) deemed to risky to leave unaddressed.
We'll eventually remove this so that the new xpaths format should be used exclusively, right? Maybe worth adding a TODO
here too, as I assume this means all this logic and the $transitional_xpath
property can be removed again in the near future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it so that all the old xpaths and new xpaths are using the same format temporarily (i.e. without disambiguation of body child
div
s)?
Yes, that's right.
Wouldn't that be a risk (because of the lack of disambiguation) and because of that it would be better to for now still only record URL metrics for logged-out users?
I don't think whether the user is logged-in matters so much because the presence of the Admin Bar still won't impact the XPaths since at that level the transitional XPath is /HTML/BODY/DIV
.
If I understand correctly, I would think we should only enable the URL metric recording for logged-out users once the transition is completed, otherwise we would have the risk that we (per this PR) deemed to risky to leave unaddressed.
Once the transition is complete, all stored URL Metrics will have the disambiguating attribute predicate present on the XPath so at this point we will be even more safe to share XPaths between logged-in and logged-out users. Elements inside of DIV#page
will have XPaths that start with /HTML/BODY/DIV[@id='page']
.
We'll eventually remove this so that the new xpaths format should be used exclusively, right? Maybe worth adding a
TODO
here too, as I assume this means all this logic and the$transitional_xpath
property can be removed again in the near future.
That's right. The new format will be used exclusively, probably in the next release. And yes, all of the $transitional_xpath
will be removed at that time as well. Basically 94843c0 will be reverted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added todos in 8c526e5
* @return string XPath. | ||
*/ | ||
public function get_xpath(): string { | ||
public function get_xpath( bool $transitional_format = true ): string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am wary of introducing a boolean parameter here, as it is of temporary nature - plus it's a code smell to have a parameter to change what a method does entirely, it should instead be a new method then.
A better path I think would be to introduce a separate method that we could later deprecate again - that gets in the way much less than having to deprecate a parameter of a method that will still be used long-term.
Since the default expectation for the transitional period here is to return the transitional format, I think we should replace the logic of this method entirely with what you have in the if ( $transitional_format )
clause, and move the original logic to some new method like get_xpath_to_store()
or something. The one place in the code that currently calls this with parameter false
could then call the new temporary method, and once we are done with the transition, we could deprecate this method and move its logic back into this method, with fewer annoying side effects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, sounds good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 8c526e5
Co-authored-by: felixarntz <[email protected]>
…ethod Add todos to remove transitional XPath logic Co-authored-by: felixarntz <[email protected]>
The transitional logic here basically maintains the behavior to what was introduced in #1790, where there is no index and no disambiguating attribute predicates added to children of Related: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@westonruter LGTM!
I just noticed if we follow WordPress version convention, the upcoming OD plugin release will be 1.0.0. Is that the plan? Given the changed xpath format and the improved stability of the plugin, I feel like it's great timing actually.
@@ -200,6 +200,7 @@ function od_get_current_url_metrics_etag( OD_Tag_Visitor_Registry $tag_visitor_r | |||
} | |||
|
|||
$data = array( | |||
'xpath_version' => 2, // Bump whenever a major change to the XPath format occurs so that new URL Metrics are proactively gathered. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious: Will this need another bump once we switch from the transition format to the new format? Or is this only relevant for the new format, which now already is being stored?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we maybe set it to 1
, just because it better aligns with the current/upcoming version? And because it wasn't there before its mere presence should trigger an invalidation of existing metrics anyway, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking this should be considered more like $wp_db_version
where it is just an incremented integer. Originally I had it set to OPTIMIZATION_DETECTIVE_VERSION
but there's no need for the ETag to change with each version unless there's something fundamentally changed about the format, like with the change here to the XPath structure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this need another bump once we switch from the transition format to the new format? Or is this only relevant for the new format, which now already is being stored?
We won't need to bump it after the transitional period because at that point the URL Metrics will all have the new XPaths stored, so then it's just a matter of actually using them as opposed to the transitional form (🐵).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we maybe set it to
1
, just because it better aligns with the current/upcoming version? And because it wasn't there before its mere presence should trigger an invalidation of existing metrics anyway, correct?
Right, it could be anything since anything would be different than nothing. But I thought 2
made sense since it's the second version after the initial format we had been using.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would slightly prefer 1
(you could argue the previous one was 0
, just like the release versions have been 0.*
). But either works for me.
We could, but I was thinking to hold off one more release perhaps, until WordCamp Asia, to make the 1.0.0 version. So I think the next version we can have be |
We're obviously not tied to the regular WordPress versioning approach, but we have been doing it that way for all the performance plugins so far (after |
Not quite, as for Modern Image Formats we went from 1.1.1 to 2.0.0 when it was no longer strictly about WebP. I don't feel strongly, however.
Not specifically because of WordCamp Asia, but more because this is the point at which the documentation and code I'd consider stable as it getting increased exposure. |
That's true, though I meant specifically the other part of the versioning approach, to not have a minor version segment greater than a single digit. If the plugin wasn't in a 1.0.0 worthy state, I would definitely reconsider this here. But I feel like it is a good time to make it 1.0.0. Alternatively, why don't we make it
From that perspective, it may be a better approach to release another beta or an RC for that event, as increased exposure would uncover potential problems that would ideally be addressed before a 1.0.0 stable. I think having a 1.0.0 beta or RC release ready for such a major event is still an excellent milestone and invites users to give feedback. |
Doing a 1.0.0 beta release makes sense to me. We'd just have to make sure our tooling supports such version identifiers. |
@westonruter Looks like in some places in our tooling we cater for version suffixes like I think it should be rather trivial to modify the relevant regexes that don't support the suffix yet. Then we could do a |
This is a follow-up to #1790 which fixed #1787.
It is now possible for a generated XPath may not uniquely identify an element in a page. Consider the following example:
DIV
for header and footer,<div id="header" role="banner">
and<div id="footer" role="contentinfo">
, which are direct children of theBODY
.DIV
:In this case, both of the
IMG
tags here will have the same XPath:/HTML/BODY/DIV/*[1][self::IMG]
.I did a search for
<div[^>]*role=["']?banner
in WPDirectory and the only example I could find with a header and footer being direct children ofBODY
is the Prose theme (2,000 active installs and not updated in 12 years): https://themes.trac.wordpress.org/browser/prose/1.1/header.phpGranted, this search assumes that a theme is using the right ARIA roles. It wouldn't account for someone using
<div id="header">
.As noted in #1790 (comment):
This is experimented on in this PR, by adding the
id
,class
, androle
attributes to the XPaths generated for children of theBODY
element. So in the example above, instead of both elements having the same XPath of/HTML/BODY/DIV/*[1][self::IMG]
they would then be distinct:/HTML/BODY/DIV[@id='header'][@role='banner']/*[1][self::IMG]
/HTML/BODY/DIV[@id='footer'][@role='contentinfo']/*[1][self::IMG]
The key changes in this PR are found in:
plugins/optimization-detective/class-od-html-tag-processor.php
.plugins/optimization-detective/class-od-element.php
plugins/optimization-detective/optimization.php
plugins/optimization-detective/storage/data.php
Included in in the changes (94843c0) is the introduction of a "transitional XPath" format so that both of these XPaths:
/*[1][self::HTML]/*[2][self::BODY]/*[1][self::DIV]/*[1][self::IMG]
/HTML/BODY/DIV[@id=\'page\']/*[1][self::IMG]
Are converted to
/HTML/BODY/DIV/*[1][self::IMG]
during the optimization process. Note the the new format is what is added to thedata-od-xpath
attributes and it is stored in theod_url_metrics
custom post type. However, when a tag visitor accesses$processor->get_xpath()
or$element->get_xpath()
the XPaths are normalized to this transitional format. This is to prevent all of the URL Metrics currently captured in the wild from immediately becoming invalid. See theplugins/image-prioritizer/tests/test-cases/common-lcp-image-with-old-xpath-format
test case to show this in action.👉🏻 Note that this transitional format omits the disambiguating attribute for the sake of not throwing away existing URL Metrics. I consider the impact of a non-unique XPath conflict to be less than the negative performance implications of stopping all optimizations immediately upon updating until new URL Metrics are collected.
To further accelerate the collection of new URL Metrics with the new format, an XPath version is added to the data used to compute the current URL Metrics ETag (c47a613 & 2922d4c).