Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disambiguate XPaths for children of BODY with id, class, or role attributes #1797

Merged
merged 18 commits into from
Jan 23, 2025

Conversation

westonruter
Copy link
Member

@westonruter westonruter commented Jan 13, 2025

This is a follow-up to #1790 which fixed #1787.

It is now possible for a generated XPath may not uniquely identify an element in a page. Consider the following example:

  1. A theme uses a DIV for header and footer, <div id="header" role="banner"> and <div id="footer" role="contentinfo">, which are direct children of the BODY.
  2. There is an element of the same tag name in the exact same relative position in both of these DIV:
<html>
  <body>
    <div id="header" role="banner">
      <img src="header-image.jpg">
    </div>
    <div id="footer" role="contentinfo">
      <img src="footer-logo.png">
    </div>
  </body>
</html>

In this case, both of the IMG tags here will have the same XPath: /HTML/BODY/DIV/*[1][self::IMG].

I did a search for <div[^>]*role=["']?banner in WPDirectory and the only example I could find with a header and footer being direct children of BODY is the Prose theme (2,000 active installs and not updated in 12 years): https://themes.trac.wordpress.org/browser/prose/1.1/header.php

Granted, this search assumes that a theme is using the right ARIA roles. It wouldn't account for someone using <div id="header">.

As noted in #1790 (comment):

we could instead look at other ways to disambiguate children of the BODY, such as to capture the id, class, and role attributes to include with the XPath for that level alone.

This is experimented on in this PR, by adding the id, class, and role attributes to the XPaths generated for children of the BODY element. So in the example above, instead of both elements having the same XPath of /HTML/BODY/DIV/*[1][self::IMG] they would then be distinct:

  1. /HTML/BODY/DIV[@id='header'][@role='banner']/*[1][self::IMG]
  2. /HTML/BODY/DIV[@id='footer'][@role='contentinfo']/*[1][self::IMG]

The key changes in this PR are found in:

  • plugins/optimization-detective/class-od-html-tag-processor.php.
  • plugins/optimization-detective/class-od-element.php
  • plugins/optimization-detective/optimization.php
  • plugins/optimization-detective/storage/data.php

Included in in the changes (94843c0) is the introduction of a "transitional XPath" format so that both of these XPaths:

  • Old format: /*[1][self::HTML]/*[2][self::BODY]/*[1][self::DIV]/*[1][self::IMG]
  • New format: /HTML/BODY/DIV[@id=\'page\']/*[1][self::IMG]

Are converted to /HTML/BODY/DIV/*[1][self::IMG] during the optimization process. Note the the new format is what is added to the data-od-xpath attributes and it is stored in the od_url_metrics custom post type. However, when a tag visitor accesses $processor->get_xpath() or $element->get_xpath() the XPaths are normalized to this transitional format. This is to prevent all of the URL Metrics currently captured in the wild from immediately becoming invalid. See the plugins/image-prioritizer/tests/test-cases/common-lcp-image-with-old-xpath-format test case to show this in action.

👉🏻 Note that this transitional format omits the disambiguating attribute for the sake of not throwing away existing URL Metrics. I consider the impact of a non-unique XPath conflict to be less than the negative performance implications of stopping all optimizations immediately upon updating until new URL Metrics are collected.

To further accelerate the collection of new URL Metrics with the new format, an XPath version is added to the data used to compute the current URL Metrics ETag (c47a613 & 2922d4c).

@westonruter westonruter added [Type] Enhancement A suggestion for improvement of an existing feature [Plugin] Optimization Detective Issues for the Optimization Detective plugin [Plugin] Embed Optimizer Issues for the Embed Optimizer plugin (formerly Auto Sizes) [Plugin] Image Prioritizer Issues for the Image Prioritizer plugin (dependent on Optimization Detective) labels Jan 13, 2025
Copy link

codecov bot commented Jan 13, 2025

Codecov Report

Attention: Patch coverage is 75.36232% with 17 lines in your changes missing coverage. Please review.

Project coverage is 59.79%. Comparing base (6803bc9) to head (4f2779a).
Report is 54 commits behind head on trunk.

Files with missing lines Patch % Lines
...mization-detective/class-od-html-tag-processor.php 66.00% 17 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##            trunk    #1797      +/-   ##
==========================================
+ Coverage   59.58%   59.79%   +0.20%     
==========================================
  Files          84       84              
  Lines        6522     6573      +51     
==========================================
+ Hits         3886     3930      +44     
- Misses       2636     2643       +7     
Flag Coverage Δ
multisite 59.79% <75.36%> (+0.20%) ⬆️
single 35.81% <0.00%> (-0.29%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@westonruter
Copy link
Member Author

westonruter commented Jan 13, 2025

Oh, I just discovered that the Image block with a lightbox is involved here. With the changes in this PR applied, WordPress outputs a div.wp-lightbox-overlay.zoom element at wp_footer via block_core_image_print_lightbox_overlay():

<div
        class="wp-lightbox-overlay zoom"
        data-wp-interactive="core/image"
        data-wp-context='{}'
        data-wp-bind--role="state.roleAttribute"
        data-wp-bind--aria-label="state.currentImage.ariaLabel"
        data-wp-bind--aria-modal="state.ariaModal"
        data-wp-class--active="state.overlayEnabled"
        data-wp-class--show-closing-animation="state.showClosingAnimation"
        data-wp-watch="callbacks.setOverlayFocus"
        data-wp-on--keydown="actions.handleKeydown"
        data-wp-on-async--touchstart="actions.handleTouchStart"
        data-wp-on--touchmove="actions.handleTouchMove"
        data-wp-on-async--touchend="actions.handleTouchEnd"
        data-wp-on-async--click="actions.hideLightbox"
        data-wp-on-async-window--resize="callbacks.setOverlayStyles"
        data-wp-on-async-window--scroll="actions.handleScroll"
        tabindex="-1"
>
    <button type="button" aria-label="Close" style="fill: var(--wp--preset--color--contrast)" class="close-button">
        <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" width="20" height="20" aria-hidden="true" focusable="false"><path d="m13.06 12 6.47-6.47-1.06-1.06L12 10.94 5.53 4.47 4.47 5.53 10.94 12l-6.47 6.47 1.06 1.06L12 13.06l6.47 6.47 1.06-1.06L13.06 12Z"></path></svg>
    </button>
    <div class="lightbox-image-container">
        <figure data-wp-bind--class="state.currentImage.figureClassNames" data-wp-bind--style="state.figureStyles">
            <img class="od-missing-dimensions" data-od-added-class data-od-xpath="/HTML/BODY/DIV[@class=&#039;wp-lightbox-overlay zoom&#039;]/*[2][self::DIV]/*[1][self::FIGURE]/*[1][self::IMG]" data-wp-bind--alt="state.currentImage.alt" data-wp-bind--class="state.currentImage.imgClassNames" data-wp-bind--style="state.imgStyles" data-wp-bind--src="state.currentImage.currentSrc">
        </figure>
    </div>
    <div class="lightbox-image-container">
        <figure data-wp-bind--class="state.currentImage.figureClassNames" data-wp-bind--style="state.figureStyles">
            <img class="od-missing-dimensions" data-od-added-class data-od-xpath="/HTML/BODY/DIV[@class=&#039;wp-lightbox-overlay zoom&#039;]/*[3][self::DIV]/*[1][self::FIGURE]/*[1][self::IMG]" data-wp-bind--alt="state.currentImage.alt" data-wp-bind--class="state.currentImage.imgClassNames" data-wp-bind--style="state.imgStyles" data-wp-bind--src="state.enlargedSrc">
        </figure>
    </div>
    <div class="scrim" style="background-color: var(--wp--preset--color--base)" aria-hidden="true"></div>
    <style data-wp-text="state.overlayStyles"></style>
</div>

Note the XPath for the IMG in the overlay:

/HTML/BODY/DIV[@class='wp-lightbox-overlay zoom']/*[3][self::DIV]/*[1][self::FIGURE]/*[1][self::IMG]

Without the disambiguation, this would be:

/HTML/BODY/DIV/*[3][self::DIV]/*[1][self::FIGURE]/*[1][self::IMG]

This could be confused with an Image block appearing somewhere especially when DIV is used to wrap the header, main content, footer, or sidebar.

Important

If this change is made to add the disambiguation predicates to /HTML/BODY/DIV this will result in invalidating all existing XPaths for currently stored URL Metrics.

To ease this migration, we may want to add an od_is_xpath_equal( $xpath1, $xpath2 ) helper function which would check to see if a provided XPath contains the old node index up to the children of the BODY. If so, the comparison can fall back to stripping out both the node indices as well as any attribute predicates. The logic added to the OD_Element constructor to convert the XPath could be moved to this new helper function:

// Convert old XPath scheme.
$xpath = preg_replace(
'#^/\*\[1\]\[self::HTML\]/\*\[2\]\[self::BODY\]/\*\[\d+\]\[self::([a-zA-Z0-9:_-]+)\]#',
'/HTML/BODY/$1',
$this->data['xpath']
);
if ( is_string( $xpath ) && '' !== $xpath ) {
$this->data['xpath'] = $xpath;
}

Then anywhere that we compare XPaths we should use this helper function as opposed to using ===.

Update: This is not so straightforward. For example, \OD_URL_Metric_Group_Collection::is_element_positioned_in_any_initial_viewport() is provided an XPath as the argument. The XPath is then fetched from a hash lookup of the mappings returned by \OD_URL_Metric_Group_Collection::get_all_elements_positioned_in_any_initial_viewport().

This is also the case for code that uses \OD_URL_Metric_Group_Collection::get_xpath_elements_map() or \OD_URL_Metric_Group_Collection::get_all_element_max_intersection_ratios(), and so on.

Up until now, the XPath has been very constrained in its format, so it has been a reliably fixed string for comparisons. Changing the XPath construction has impacts that need to be carefully evaluated.

@felixarntz
Copy link
Member

@westonruter I'm not sure I fully understand.

Oh, I just discovered that the Image block with a lightbox is involved here. With the changes in this PR applied, WordPress outputs a div.wp-lightbox-overlay.zoom element at wp_footer via block_core_image_print_lightbox_overlay():

Are you saying this is only inserted with this change? Why that? OD shouldn't alter which DOM elements are inserted by any features right?

Note the XPath for the IMG in the overlay:

/HTML/BODY/DIV[@class='wp-lightbox-overlay zoom']/*[3][self::DIV]/*[1][self::FIGURE]/*[1][self::IMG]

Without the disambiguation, this would be:

/HTML/BODY/DIV/*[3][self::DIV]/*[1][self::FIGURE]/*[1][self::IMG]

This could be confused with an Image block appearing somewhere especially when DIV is used to wrap the header, main content, footer, or sidebar.

So are you saying without this PR merged, there would be a risk of a duplicate? If so, it would be a good indicator that the risk we assessed as small on the other PR isn't actually that small and we should cater for it.

Important

If this change is made to add the disambiguation predicates to /HTML/BODY/DIV this will result in invalidating all existing XPaths for currently stored URL Metrics.

Why that? Would it invalidate even URL metrics that have an index (how they're currently stored, without the recent trunk changes)? As far as I understand, the URL metrics with an index don't have that problem, so I'd argue the disambiguation only applies to the URL metrics without indices on the body children.

@westonruter
Copy link
Member Author

Are you saying this is only inserted with this change? Why that? OD shouldn't alter which DOM elements are inserted by any features right?

No, I'm saying that this was already being added to the page by Gutenberg. I wasn't aware that Gutenberg was outputting lightbox elements at wp_footer, so it breaks the premise I had that all content is inside of DIV.wp-site-blocks or DIV#page.

So are you saying without this PR merged, there would be a risk of a duplicate? If so, it would be a good indicator that the risk we assessed as small on the other PR isn't actually that small and we should cater for it.

Correct.

Why that? Would it invalidate even URL metrics that have an index (how they're currently stored, without the recent trunk changes)? As far as I understand, the URL metrics with an index don't have that problem, so I'd argue the disambiguation only applies to the URL metrics without indices on the body children.

What I mean is that URL Metrics currently stored use the node index, for example /*[1][self::HTML]/*[2][self::BODY]/*[3][self::DIV]/*[2][self::IMG]. Prior to this PR, with #1790, such old XPaths were migrated to the new schema via OD_Element so this would instead be /HTML/BODY/DIV/*[2][self::IMG]. However, if we start introducing element attributes to disambiguate /HTML/BODY/DIV then we won't be able to so easily migrate since we won't know what the attributes are on /HTML/BODY/DIV. Like, we wouldn't know how to convert /*[1][self::HTML]/*[2][self::BODY]/*[3][self::DIV] to /HTML/BODY/DIV[@id='page']. That is, unless we capture the attributes of this element as we iterate over the document and then use them just in time.

@felixarntz
Copy link
Member

@westonruter Makes sense. So maybe we need to leave old data intact for now then and only migrate when we have all the context we need (i.e. in a request to the relevant URL). Since #1790 is not in any release yet, at least we don't have to worry about migrating from that. But we probably should fix it before this upcoming release.

This complicates things of course, but given we already found an exception in Core itself, this is probably going to happen more. Any plugin may add some div in wp_footer, e.g. for modals etc.

@westonruter
Copy link
Member Author

Makes sense. So maybe we need to leave old data intact for now then and only migrate when we have all the context we need (i.e. in a request to the relevant URL). Since #1790 is not in any release yet, at least we don't have to worry about migrating from that. But we probably should fix it before this upcoming release.

@felixarntz Well, I mean, "migration" will happen eventually as URL Metrics turn stale and get purged from the site. It would be ideal if a change to the XPath structure didn't result in this, however. It would be as if someone installed the plugin anew without any data collected.

… add/od-xpath-body-children-disambiguation
@westonruter westonruter marked this pull request as ready for review January 22, 2025 21:06
Copy link

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

If you're merging code through a pull request on GitHub, copy and paste the following into the bottom of the merge commit message.

Co-authored-by: westonruter <[email protected]>
Co-authored-by: felixarntz <[email protected]>
Co-authored-by: swissspidy <[email protected]>

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

@westonruter
Copy link
Member Author

@felixarntz Finally ready for review. I had to ruminate on this one a while, in particular the "transitional XPath format" aspect to prevent old URL Metrics from becoming invalid. I've updated the description to describe the changes.

Copy link
Member

@felixarntz felixarntz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@westonruter I think this looks pretty good, though I have a few important questions on the direction in regards to the transition. Most importantly, it seems to me that we need to wait for the transition to be completed until we actually enabled URL metrics for logged-out users (which was one of the main reasons to change the xpath format in the first place).

* the same relative position, which does not seem likely. Additionally, as noted above, themes almost always wrap
* the page content in an overall `DIV` or else they use semantic HTML tags like `HEADER` and `FOOTER`, so the risk
* is low.
* these `DIV` elements has an `IMG` as the first child. This is also an issue in sites using the Image block
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* these `DIV` elements has an `IMG` as the first child. This is also an issue in sites using the Image block
* these `DIV` elements have an `IMG` as the first child. This is also an issue in sites using the Image block

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See c259b59

Comment on lines 115 to 117
* these `DIV` elements has an `IMG` as the first child. This is also an issue in sites using the Image block
* because it outputs a `DIV.wp-lightbox-overlay.zoom` in `wp_footer`, resulting in there being a real possibility
* for XPaths to not be unique in the page. Therefore, en lieu of node index being added to children of `BODY`,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The image lightbox is a very specific example, but I would include a more general mention of any additional divs inserted via wp_footer for instance (which are often for modals, like image lightbox is an example for).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See c259b59

if ( isset( $this->open_stack_tags[1] ) && 'BODY' === $this->open_stack_tags[1] && 2 === $level ) {
$attributes = $this->get_disambiguating_attributes();
}
$this->open_stack_attributes[] = $attributes;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is not really aligned with what the $open_stack_tags property does (as this one only acts for the specific case of a body child element), maybe we should use another name? Could be something very specific like body_child_element_attributes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -348,6 +381,7 @@ public function next_token(): bool {
}

$popped_tag_name = array_pop( $this->open_stack_tags );
array_pop( $this->open_stack_attributes );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this not need to be conditional (e.g. based on the parent tag and level)? Are we certain this wouldn't remove the attributes when it shouldn't?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't need to be conditional, no, since I'm pushing an empty array onto the stack when not at BODY > *:

// For children of the BODY, capture disambiguating comments. See the get_xpath() method for where this data is used.
$attributes = array();
if ( isset( $this->open_stack_tags[1] ) && 'BODY' === $this->open_stack_tags[1] && 2 === $level ) {
$attributes = $this->get_disambiguating_attributes();
}
$this->open_stack_attributes[] = $attributes;

I decided to do this this way for two reasons:

  1. It's consistent with how the open_stack_tags variable is manipulated:

$this->open_stack_tags = array();
$this->open_stack_attributes = array();

array_splice( $this->open_stack_tags, (int) $i );
array_splice( $this->open_stack_attributes, (int) $i );

array_pop( $this->open_stack_tags );
array_pop( $this->open_stack_attributes );

$this->open_stack_tags = $this->bookmarked_open_stacks[ $bookmark_name ]['tags'];
$this->open_stack_attributes = $this->bookmarked_open_stacks[ $bookmark_name ]['attributes'];

$this->bookmarked_open_stacks[ $name ] = array(
'tags' => $this->open_stack_tags,
'attributes' => $this->open_stack_attributes,

foreach ( $this->open_stack_tags as $i => $breadcrumb_tag_name ) {
yield array( $breadcrumb_tag_name, $this->open_stack_indices[ $i ], $this->open_stack_attributes[ $i ] );
}

If we only captured only at the 3rd level (BODY > *) then additional logic would be needed in all these places to figure out if we're at that level.

  1. We may want to capture these attributes at additional levels in the future for further disambiguation.

Comment on lines +140 to +174
$replacements = array(

/*
* Convert the original XPath format for elements in the BODY.
*
* Example:
* /*[1][self::HTML]/*[2][self::BODY]/*[1][self::DIV]/*[1][self::IMG]
* =>
* /HTML/BODY/DIV/*[1][self::IMG]
*/
'#^/\*\[1]\[self::HTML]/\*\[2]\[self::BODY]/\*\[\d+]\[self::([a-zA-Z0-9:_-]+)]#' => '/HTML/BODY/$1',

/*
* Convert the original XPath format for elements in the HEAD.
*
* Example:
* /*[1][self::HTML]/*[1][self::HEAD]/*[1][self::META]
* =>
* /HTML/HEAD/*[1][self::META]
*/
'#^/\*\[1\]\[self::HTML\]/\*\[1\]\[self::HEAD]#' => '/HTML/HEAD',

/*
* Convert the new XPath format for elements in the BODY.
*
* Note that the new XPath format for elements in the HEAD does not need to be converted to the
* transitional format since disambiguating attributes are not used in the HEAD.
*
* Example:
* /HTML/BODY/DIV[@id='page']/*[1][self::IMG]
* =>
* /HTML/BODY/DIV/*[1][self::IMG]
*/
'#^(/HTML/BODY/\w+)\[@[^\]]+?]#' => '$1',
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't fully understand how this transitional xpath helps. Is it so that all the old xpaths and new xpaths are using the same format temporarily (i.e. without disambiguation of body child divs)? Wouldn't that be a risk (because of the lack of disambiguation) and because of that it would be better to for now still only record URL metrics for logged-out users? If I understand correctly, I would think we should only enable the URL metric recording for logged-out users once the transition is completed, otherwise we would have the risk that we (per this PR) deemed to risky to leave unaddressed.

We'll eventually remove this so that the new xpaths format should be used exclusively, right? Maybe worth adding a TODO here too, as I assume this means all this logic and the $transitional_xpath property can be removed again in the near future.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it so that all the old xpaths and new xpaths are using the same format temporarily (i.e. without disambiguation of body child divs)?

Yes, that's right.

Wouldn't that be a risk (because of the lack of disambiguation) and because of that it would be better to for now still only record URL metrics for logged-out users?

I don't think whether the user is logged-in matters so much because the presence of the Admin Bar still won't impact the XPaths since at that level the transitional XPath is /HTML/BODY/DIV.

If I understand correctly, I would think we should only enable the URL metric recording for logged-out users once the transition is completed, otherwise we would have the risk that we (per this PR) deemed to risky to leave unaddressed.

Once the transition is complete, all stored URL Metrics will have the disambiguating attribute predicate present on the XPath so at this point we will be even more safe to share XPaths between logged-in and logged-out users. Elements inside of DIV#page will have XPaths that start with /HTML/BODY/DIV[@id='page'].

We'll eventually remove this so that the new xpaths format should be used exclusively, right? Maybe worth adding a TODO here too, as I assume this means all this logic and the $transitional_xpath property can be removed again in the near future.

That's right. The new format will be used exclusively, probably in the next release. And yes, all of the $transitional_xpath will be removed at that time as well. Basically 94843c0 will be reverted.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added todos in 8c526e5

* @return string XPath.
*/
public function get_xpath(): string {
public function get_xpath( bool $transitional_format = true ): string {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wary of introducing a boolean parameter here, as it is of temporary nature - plus it's a code smell to have a parameter to change what a method does entirely, it should instead be a new method then.

A better path I think would be to introduce a separate method that we could later deprecate again - that gets in the way much less than having to deprecate a parameter of a method that will still be used long-term.

Since the default expectation for the transitional period here is to return the transitional format, I think we should replace the logic of this method entirely with what you have in the if ( $transitional_format ) clause, and move the original logic to some new method like get_xpath_to_store() or something. The one place in the code that currently calls this with parameter false could then call the new temporary method, and once we are done with the transition, we could deprecate this method and move its logic back into this method, with fewer annoying side effects.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, sounds good.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 8c526e5

westonruter and others added 2 commits January 22, 2025 14:18
…ethod

Add todos to remove transitional XPath logic

Co-authored-by: felixarntz <[email protected]>
@westonruter
Copy link
Member Author

Most importantly, it seems to me that we need to wait for the transition to be completed until we actually enabled URL metrics for logged-out users (which was one of the main reasons to change the xpath format in the first place).

The transitional logic here basically maintains the behavior to what was introduced in #1790, where there is no index and no disambiguating attribute predicates added to children of BODY. However, the key change is that it starts collecting URL Metrics with the new format so that once the transitional period is over we can go live with the XPaths with the disambiguating predicates. During this transitional period, there is a chance for multiple XPaths to be equal in a given document as outlined in the problem statement of this PR. However, I deem this to not be extremely likely to occur since the two element trees would have to be identical. It seems worthwhile to risk a possible conflict while not invalidating all existing URL Metrics which have the old XPath format.

Related:

Copy link
Member

@felixarntz felixarntz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@westonruter LGTM!

I just noticed if we follow WordPress version convention, the upcoming OD plugin release will be 1.0.0. Is that the plan? Given the changed xpath format and the improved stability of the plugin, I feel like it's great timing actually.

@@ -200,6 +200,7 @@ function od_get_current_url_metrics_etag( OD_Tag_Visitor_Registry $tag_visitor_r
}

$data = array(
'xpath_version' => 2, // Bump whenever a major change to the XPath format occurs so that new URL Metrics are proactively gathered.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious: Will this need another bump once we switch from the transition format to the new format? Or is this only relevant for the new format, which now already is being stored?

Copy link
Member

@felixarntz felixarntz Jan 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we maybe set it to 1, just because it better aligns with the current/upcoming version? And because it wasn't there before its mere presence should trigger an invalidation of existing metrics anyway, correct?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking this should be considered more like $wp_db_version where it is just an incremented integer. Originally I had it set to OPTIMIZATION_DETECTIVE_VERSION but there's no need for the ETag to change with each version unless there's something fundamentally changed about the format, like with the change here to the XPath structure.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this need another bump once we switch from the transition format to the new format? Or is this only relevant for the new format, which now already is being stored?

We won't need to bump it after the transitional period because at that point the URL Metrics will all have the new XPaths stored, so then it's just a matter of actually using them as opposed to the transitional form (🐵).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we maybe set it to 1, just because it better aligns with the current/upcoming version? And because it wasn't there before its mere presence should trigger an invalidation of existing metrics anyway, correct?

Right, it could be anything since anything would be different than nothing. But I thought 2 made sense since it's the second version after the initial format we had been using.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would slightly prefer 1 (you could argue the previous one was 0, just like the release versions have been 0.*). But either works for me.

@westonruter
Copy link
Member Author

I just noticed if we follow WordPress version convention, the upcoming OD plugin release will be 1.0.0. Is that the plan? Given the changed xpath format and the improved stability of the plugin, I feel like it's great timing actually.

We could, but I was thinking to hold off one more release perhaps, until WordCamp Asia, to make the 1.0.0 version. So I think the next version we can have be 0.10.0 as a sort of RC.

@westonruter westonruter merged commit 762e52a into trunk Jan 23, 2025
20 checks passed
@westonruter westonruter deleted the add/od-xpath-body-children-disambiguation branch January 23, 2025 00:08
@felixarntz
Copy link
Member

felixarntz commented Jan 23, 2025

I just noticed if we follow WordPress version convention, the upcoming OD plugin release will be 1.0.0. Is that the plan? Given the changed xpath format and the improved stability of the plugin, I feel like it's great timing actually.

We could, but I was thinking to hold off one more release perhaps, until WordCamp Asia, to make the 1.0.0 version. So I think the next version we can have be 0.10.0 as a sort of RC.

We're obviously not tied to the regular WordPress versioning approach, but we have been doing it that way for all the performance plugins so far (after {x}.9.0 comes {x+1}.0.0). I think especially because of where the Optimization Detective plugin is at this point, I would say it would be a great point to use 1.0.0. I don't think we should tie it to a specific event.

@westonruter
Copy link
Member Author

We're obviously not tied to the regular WordPress versioning approach, but we have been doing it that way for all the performance plugins so far (after {x}.9.0 comes {x+1}.0.0). I think especially because of where the Optimization Detective plugin is at this point, I would say it would be a great point to use 1.0.0.

Not quite, as for Modern Image Formats we went from 1.1.1 to 2.0.0 when it was no longer strictly about WebP. I don't feel strongly, however.

I don't think we should tie it to a specific event.

Not specifically because of WordCamp Asia, but more because this is the point at which the documentation and code I'd consider stable as it getting increased exposure.

@felixarntz
Copy link
Member

Not quite, as for Modern Image Formats we went from 1.1.1 to 2.0.0 when it was no longer strictly about WebP. I don't feel strongly, however.

That's true, though I meant specifically the other part of the versioning approach, to not have a minor version segment greater than a single digit.

If the plugin wasn't in a 1.0.0 worthy state, I would definitely reconsider this here. But I feel like it is a good time to make it 1.0.0.

Alternatively, why don't we make it 1.0.0-beta.1? Of course WordPress.org would still ship this to all users on previous versions, but in this particular scenario I think that is totally fine, because anything pre-1.0.0 can be considered a pre release anyway, just like any -beta release.

Not specifically because of WordCamp Asia, but more because this is the point at which the documentation and code I'd consider stable as it getting increased exposure.

From that perspective, it may be a better approach to release another beta or an RC for that event, as increased exposure would uncover potential problems that would ideally be addressed before a 1.0.0 stable. I think having a 1.0.0 beta or RC release ready for such a major event is still an excellent milestone and invites users to give feedback.

@westonruter
Copy link
Member Author

Doing a 1.0.0 beta release makes sense to me. We'd just have to make sure our tooling supports such version identifiers.

@felixarntz
Copy link
Member

@westonruter Looks like in some places in our tooling we cater for version suffixes like -beta, while in others we don't: https://github.com/search?q=repo%3AWordPress%2Fperformance+%5Cd%2B%5C.%5Cd%2B%5C.%5Cd%2B&type=code

I think it should be rather trivial to modify the relevant regexes that don't support the suffix yet. Then we could do a 1.0.0-beta for Optimization Detective next week.

@westonruter westonruter mentioned this pull request Jan 26, 2025
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[Plugin] Embed Optimizer Issues for the Embed Optimizer plugin (formerly Auto Sizes) [Plugin] Image Prioritizer Issues for the Image Prioritizer plugin (dependent on Optimization Detective) [Plugin] Optimization Detective Issues for the Optimization Detective plugin [Type] Enhancement A suggestion for improvement of an existing feature
Projects
None yet
2 participants