Add a regression test for url_search_params::sort taken from WPT #861

Closed
wants to merge 1 commit into from

npaun
Contributor

@npaun npaun commented Jan 30, 2025

While working on integrating Web Platform Tests as a test suite for Cloudflare's workerd, I found a case where ada-url's implementation of url_search_params::sort does not appear to follow the URL spec.

The URL spec provides the following requirements for URLSearchParams.sort:

The sort() method steps are:

  1. Sort all tuples in this’s list, if any, by their names. Sorting must be done by comparison of code units. The relative order between tuples with equal names must be preserved.

https://url.spec.whatwg.org/#example-searchparams-sort
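The sort step maps fairly directly onto `std::stable_sort`. The sketch below assumes names and values are already held as UTF-16 `std::u16string` (ada works with UTF-8 strings, so this is illustrative only); `param_list` and `sort_params` are hypothetical names.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tuple list: names and values stored as UTF-16 strings so that
// comparisons operate on UTF-16 code units, as JavaScript strings do.
using param_list = std::vector<std::pair<std::u16string, std::u16string>>;

// Sort tuples by name, comparing code units. std::stable_sort (not std::sort)
// keeps tuples with equal names in their original relative order.
void sort_params(param_list& params) {
  std::stable_sort(params.begin(), params.end(),
                   [](const auto& a, const auto& b) {
                     // std::u16string's operator< compares char16_t values
                     // lexicographically, i.e. by UTF-16 code unit.
                     return a.first < b.first;
                   });
}
```

For example, `b=1&a=2&b=3&a=4` becomes `a=2&a=4&b=1&b=3`: names are ordered, and each name's values stay in input order.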

WPT's urlsearchparams-sort.any.js test uses the following example to demonstrate this:

```js
{
  "input": "ﬃ&🌈", // 🌈 > code point, but < code unit because two code units
  "output": [["🌈", ""], ["ﬃ", ""]]
},
```
https://github.com/web-platform-tests/wpt/blob/master/url/urlsearchparams-sort.any.js#L11
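In that WPT case the first name is the single ligature character U+FB03 (ﬃ), not the three ASCII letters "ffi". A small sketch of why the expected output differs between the two orderings; the variable and function names are illustrative only.

```cpp
#include <string>

// U+FB03 LATIN SMALL LIGATURE FFI lies in the BMP: one UTF-16 code unit.
const std::u16string ffi_ligature = u"\uFB03";
// U+1F308 RAINBOW lies outside the BMP: UTF-16 encodes it as the
// surrogate pair 0xD83C 0xDF08.
const std::u16string rainbow = u"\U0001F308";

// Code-point (UTF-32) order: U+1F308 > U+FB03, so the rainbow would sort last.
bool rainbow_first_by_code_point() {
  return std::u32string(U"\U0001F308") < std::u32string(U"\uFB03");
}

// UTF-16 code-unit order: the rainbow's first unit, the high surrogate
// 0xD83C, is less than 0xFB03, so the rainbow sorts first, as WPT expects.
bool rainbow_first_by_code_unit() {
  return rainbow < ffi_ligature;
}
```

`rainbow_first_by_code_point()` returns false while `rainbow_first_by_code_unit()` returns true, which is exactly the disagreement the WPT comment describes.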

So the spec requires "comparison of code units", but as far as I can tell it doesn't say which encoding those code units come from. Based on the WPT test, and on checking other implementations like Node and Chromium, the encoding used is UTF-16. This makes sense, as it matches the behaviour of JavaScript's String type.

I can fix this by converting the keys from UTF-8 to UTF-16 before sorting, using a helper similar to ada::idna::utf8_to_utf32, but please let me know if you have an alternative approach in mind; I'd be happy to implement it that way instead.
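A sketch of the conversion-based approach, assuming well-formed UTF-8 input (no validation is done). `utf8_to_utf16`, `utf8_params` and `sort_by_utf16_name` are hypothetical names, not existing ada functions. Each name is converted once, and the (key, tuple) pairs are then stable-sorted.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical helper: decode UTF-8 and re-encode as UTF-16.
// Assumes well-formed UTF-8; malformed input is not detected.
std::u16string utf8_to_utf16(std::string_view in) {
  std::u16string out;
  std::size_t i = 0;
  while (i < in.size()) {
    unsigned char c = static_cast<unsigned char>(in[i]);
    char32_t cp;
    std::size_t len;
    if (c < 0x80)      { cp = c;        len = 1; }
    else if (c < 0xE0) { cp = c & 0x1F; len = 2; }
    else if (c < 0xF0) { cp = c & 0x0F; len = 3; }
    else               { cp = c & 0x07; len = 4; }
    for (std::size_t k = 1; k < len; ++k)
      cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
    i += len;
    if (cp < 0x10000) {
      out.push_back(static_cast<char16_t>(cp));  // BMP: one code unit
    } else {
      cp -= 0x10000;                             // supplementary: surrogate pair
      out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
      out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    }
  }
  return out;
}

using utf8_params = std::vector<std::pair<std::string, std::string>>;

// Convert each name once, stable-sort by the UTF-16 key, then write back.
void sort_by_utf16_name(utf8_params& params) {
  std::vector<std::pair<std::u16string, std::pair<std::string, std::string>>> keyed;
  keyed.reserve(params.size());
  for (auto& p : params) keyed.emplace_back(utf8_to_utf16(p.first), std::move(p));
  std::stable_sort(keyed.begin(), keyed.end(),
                   [](const auto& a, const auto& b) { return a.first < b.first; });
  for (std::size_t k = 0; k < params.size(); ++k) params[k] = std::move(keyed[k].second);
}
```

This is simple and correct for valid input, but it allocates one `std::u16string` per name plus a temporary vector, which is the overhead a conversion-free comparator would avoid.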

@anonrig anonrig requested a review from lemire January 30, 2025 22:29
@npaun npaun force-pushed the npaun/urlsearchparams-sort-wpt-bug branch from aa8bdf3 to fe2efd1
@lemire
Member

lemire commented Jan 30, 2025

@npaun Sorting in UTF-32 and UTF-8 should be equivalent.

Regarding UTF-16, its ordering will differ from UTF-32/UTF-8 ordering, but only when surrogate pairs are involved (I believe this is correct).
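A small check of both claims, written as a sketch: it encodes single code points and compares their UTF-8 bytes or UTF-16 code units against plain code-point order. `encode_utf8`, `encode_utf16`, `utf8_agrees` and `utf16_agrees` are illustrative names. UTF-8 byte order always matches code-point order; UTF-16 disagrees exactly when a supplementary character (whose first unit is a surrogate in 0xD800 to 0xDBFF) is compared with a BMP character in U+E000 to U+FFFF, such as U+FB03 from the WPT case.

```cpp
#include <string>

// Encode one Unicode scalar value as UTF-8 (input assumed valid).
std::string encode_utf8(char32_t cp) {
  std::string s;
  if (cp < 0x80) {
    s += static_cast<char>(cp);
  } else if (cp < 0x800) {
    s += static_cast<char>(0xC0 | (cp >> 6));
    s += static_cast<char>(0x80 | (cp & 0x3F));
  } else if (cp < 0x10000) {
    s += static_cast<char>(0xE0 | (cp >> 12));
    s += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    s += static_cast<char>(0x80 | (cp & 0x3F));
  } else {
    s += static_cast<char>(0xF0 | (cp >> 18));
    s += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    s += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    s += static_cast<char>(0x80 | (cp & 0x3F));
  }
  return s;
}

// Encode one Unicode scalar value as UTF-16 (surrogate pair above the BMP).
std::u16string encode_utf16(char32_t cp) {
  if (cp < 0x10000) return std::u16string(1, static_cast<char16_t>(cp));
  cp -= 0x10000;
  return {static_cast<char16_t>(0xD800 + (cp >> 10)),
          static_cast<char16_t>(0xDC00 + (cp & 0x3FF))};
}

// Does UTF-8 byte order agree with code-point order for a and b?
// std::string compares its chars as unsigned char, i.e. byte-wise.
bool utf8_agrees(char32_t a, char32_t b) {
  return (encode_utf8(a) < encode_utf8(b)) == (a < b);
}

// Does UTF-16 code-unit order agree with code-point order for a and b?
bool utf16_agrees(char32_t a, char32_t b) {
  return (encode_utf16(a) < encode_utf16(b)) == (a < b);
}
```

For instance, `utf16_agrees(0xD7FF, 0x1F308)` holds because 0xD7FF sits below the surrogate range, while `utf16_agrees(0xFB03, 0x1F308)` does not.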

We can avoid conversion and memory allocation.
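One possible allocation-free approach, sketched under the assumption that names are valid UTF-8: decode code points on the fly and compare them through a key that reproduces UTF-16 order by moving U+E000 to U+FFFF above every supplementary code point. This is not necessarily how the follow-up fix is implemented; `next_code_point`, `utf16_order_key`, `utf16_less` and `sort_params_utf8` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Decode the code point starting at s[i] and advance i past it.
// Assumes s is well-formed UTF-8 (no validation is done here).
char32_t next_code_point(std::string_view s, std::size_t& i) {
  unsigned char c = static_cast<unsigned char>(s[i]);
  char32_t cp;
  std::size_t len;
  if (c < 0x80)      { cp = c;        len = 1; }
  else if (c < 0xE0) { cp = c & 0x1F; len = 2; }
  else if (c < 0xF0) { cp = c & 0x0F; len = 3; }
  else               { cp = c & 0x07; len = 4; }
  for (std::size_t k = 1; k < len; ++k)
    cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
  i += len;
  return cp;
}

// Map a code point to a key whose numeric order matches UTF-16 code-unit
// order. Supplementary characters start with a surrogate (0xD800..0xDBFF),
// so in UTF-16 they sort before U+E000..U+FFFF; shifting that BMP range
// above U+10FFFF reproduces this. Other code points keep their value.
char32_t utf16_order_key(char32_t cp) {
  return (cp >= 0xE000 && cp <= 0xFFFF) ? cp + 0x200000 : cp;
}

// True if a sorts before b by UTF-16 code units; no conversion, no allocation.
bool utf16_less(std::string_view a, std::string_view b) {
  std::size_t i = 0, j = 0;
  while (i < a.size() && j < b.size()) {
    char32_t ka = utf16_order_key(next_code_point(a, i));
    char32_t kb = utf16_order_key(next_code_point(b, j));
    if (ka != kb) return ka < kb;
  }
  // Every compared code point was equal: the shorter string sorts first.
  return i == a.size() && j < b.size();
}

// Stable sort of UTF-8 name/value pairs by name, in UTF-16 order.
void sort_params_utf8(std::vector<std::pair<std::string, std::string>>& params) {
  std::stable_sort(params.begin(), params.end(),
                   [](const auto& x, const auto& y) { return utf16_less(x.first, y.first); });
}
```

The comparator only reads the existing `std::string` buffers, so no converted copies of the names are built, and `std::stable_sort` keeps equal names in their original order as the spec requires.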

@lemire
Member

lemire commented Jan 31, 2025

I am going to close this PR and open a new one with a fix.

@lemire lemire closed this Jan 31, 2025
@lemire lemire mentioned this pull request Jan 31, 2025