Undocumented/Unspecified Behavior #67
Awesome! What was the goal of putting together a new implementation of keep-sorted? In what ways did you generalize it? What happens if you use https://github.com/google/re2j for regular expressions in Java? I suspect that would make some of those differences go away.
This is a remnant of a one-off google3 bug fix :) (see keep-sorted/keepsorted/block.go, lines 94 to 101 at 8f99129)
Uhhh, I see that snippet in comments.in, not ignore_prefixes.in (keep-sorted/goldens/comments.in, line 78 at 8f99129).
That sounds like a recent bug fix: #55. As a human, I'm not sure I would actually want '
Indeed, it's not documented. There's also an outstanding issue to make some improvements in that area: #33
Assuming the string handling you're talking about is how
https://github.com/google/keep-sorted?tab=readme-ov-file#usage mentions that nested keep-sorted blocks are supported. What else would you like to see documented about them?
Taking a step back, I agree that there are some spots where the documentation could be better, but we do need to take the intended audience of the document into account. For instance, I don't think documenting the differences between re2 and other regex engines would be a good use of the README for its intended reader. Perhaps we could have a separate document tailored to an audience that's more interested in implementation details like that, but I would argue that the golden files themselves might be a reasonable enough place to capture nuances like that.
The goal was mostly to avoid waiting for #37 and to implement support for a
By generalization, I meant something along the lines of working from the generic option descriptions rather than hardcoding a lot of language-construct logic.
The goal was less about matching exactly on the regexp front. I figured escaping { would work in both implementations, so I was hoping it would be a quick fix given that the feature is new. For now I'll just pre-process some of the goldens to escape it before running the test cases against them. It's basically just that and the newline one, so not a huge difference.
Thanks, I guess I'll do that too then. Reverse engineering the whitespace/blank-line handling from the test goldens was about 50% of the effort, and this was the only case I couldn't figure out, so I figured it must be a google3 one-off. I intentionally did not look at the source when building my solution.
I agree, though a tl;dr of how keep-sorted treats whitespace might be useful to document.
Funnily enough, my implementation will have the same behavior as noted in that issue, even though there were no tests or docs covering those comment edge cases. I think I might just drop trailing comments from the sort key and use the sort key, instead of the original source, to track commas 🤔
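For what it's worth, here's a minimal Go sketch of that idea. sortKey and commentStart are hypothetical helpers, not keep-sorted's code, and the sketch assumes only // line comments and double-quoted strings with backslash escapes, so it won't cover every language keep-sorted handles.

```go
package main

import (
	"fmt"
	"strings"
)

// commentStart returns the index of the first "//" that lies outside a
// double-quoted string literal, or len(line) if there is none.
// Hypothetical helper: only // comments and "..." strings are recognized.
func commentStart(line string) int {
	inString, escaped := false, false
	for i := 0; i < len(line); i++ {
		c := line[i]
		switch {
		case escaped:
			escaped = false // skip the character after a backslash
		case inString && c == '\\':
			escaped = true
		case c == '"':
			inString = !inString
		case !inString && c == '/' && i+1 < len(line) && line[i+1] == '/':
			return i
		}
	}
	return len(line)
}

// sortKey drops a trailing comment, trims whitespace, and strips a trailing
// comma, reporting whether a comma was present so it can be tracked
// separately from the key.
func sortKey(line string) (key string, hadComma bool) {
	key = strings.TrimSpace(line[:commentStart(line)])
	if strings.HasSuffix(key, ",") {
		return strings.TrimSpace(strings.TrimSuffix(key, ",")), true
	}
	return key, false
}

func main() {
	k, comma := sortKey(`  "a//b",  // trailing comment`)
	fmt.Println(k, comma) // "a//b" true
}
```

Keeping the comma flag separate from the key means items can be compared without their trailing commas, and the commas could then be reattached by position after sorting.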
I saw that but was confused because:
seems to go against what the tests do (in the tests the \n is ignored, I think). I also assumed that comment implied it would break on unbalanced braces inside string literals (and I was preparing for backlash from people not being able to use brackets in comments or strings in files I was going to force-sort), but I only realized it handled them when I ran it against the goldens. So it's a good thing, just unexpected.
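Here's a minimal Go sketch of bracket-balanced grouping that ignores brackets inside string literals, roughly the behavior the goldens seem to exercise with block=yes. depthDelta and groupLines are hypothetical names, only double-quoted strings with backslash escapes are recognized, and this is an illustration rather than how keep-sorted is actually implemented.

```go
package main

import "fmt"

// depthDelta returns the net change in bracket depth for line, counting
// (, [, { as +1 and ), ], } as -1, while skipping anything inside
// double-quoted string literals (backslash escapes honored).
func depthDelta(line string) int {
	depth := 0
	inString, escaped := false, false
	for _, c := range line {
		switch {
		case escaped:
			escaped = false
		case inString && c == '\\':
			escaped = true
		case c == '"':
			inString = !inString
		case inString:
			// Brackets inside a string literal do not affect depth.
		case c == '(' || c == '[' || c == '{':
			depth++
		case c == ')' || c == ']' || c == '}':
			depth--
		}
	}
	return depth
}

// groupLines keeps absorbing lines into the current group until its
// brackets are balanced again, then starts a new group.
func groupLines(lines []string) [][]string {
	var groups [][]string
	var cur []string
	depth := 0
	for _, l := range lines {
		cur = append(cur, l)
		depth += depthDelta(l)
		if depth <= 0 {
			groups = append(groups, cur)
			cur, depth = nil, 0
		}
	}
	if len(cur) > 0 { // unbalanced tail: keep it as its own group
		groups = append(groups, cur)
	}
	return groups
}

func main() {
	lines := []string{
		`b = foo("{",`, // the "{" is inside a string, so it is ignored
		`    2)`,
		`a = bar(1)`,
	}
	fmt.Println(len(groupLines(lines))) // 2
}
```

If the "{" inside the string were counted, the first group would never close and all three lines would end up in a single group.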
Sorry, I missed it. I read all the options, but not the intro sections.
Agreed, my goal was more to just add a
I agree for this case, but I think a more general section with quick FYIs on things like whitespace handling, trailing commas, trailing newlines, and which special language constructs are one-offed might be useful. Basically my goal is that given a golden
re2j did work, fwiw, and all the golden tests now pass in my implementation 👍
Both of those are things I'd like to have in this version of keep-sorted too. If you have the bandwidth I'd love to have contributions for either or both of those :)
https://github.com/google/keep-sorted?tab=readme-ov-file#regular-expressions does link out to http://godoc/pkg/regexp/syntax/, which actually doesn't mention re2. http://godoc/pkg/regexp/ mentions re2, and perhaps we could just document explicitly that we're using re2.
If I'm hearing you correctly, it sounds like these would be the relevant pieces of documentation to add:
Yep, with the final checkbox item being how it treats identical items after processing. It seems like it uses the original, unprocessed line as the sort tie-breaker (I thought I saw an example using a stable sort, but I can no longer find it, so I was likely mistaken earlier):
by_regex (uses the original string to tie-break): keep-sorted/goldens/by_regex.in, line 36 at 8d64151
FOO_ after BAR_: keep-sorted/goldens/by_regex.out, line 36 at 8d64151
FYI: godoc is a Google-internal DNS entry (go.dev is the canonical external redirect domain, I think).
I hacked together a quick greenfield, generalized implementation of keep-sorted in another language based on the README documentation and wanted to verify compatibility with this official Go implementation. I found some interesting cases in the goldens where the Go implementation has unspecified or inconsistent outputs/parsing:
by_regex.in:
keep-sorted-test start block=yes newline_separated=yes by_regex=(\w+)\(\)\s+{
java.util.regex.PatternSyntaxException: Illegal repetition near index 13 (\w+)\(\)\s+{
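For comparison, Go's RE2-based regexp parser accepts that pattern: a { that doesn't begin a valid repetition such as {2} or {1,3} is treated as a literal, whereas java.util.regex reports an illegal repetition. Escaping it as \{ is accepted by both. A minimal Go check, where firstGroup is a hypothetical helper and the input line is made up:

```go
package main

import (
	"fmt"
	"regexp"
)

// firstGroup compiles pattern with Go's regexp package and returns the first
// capture group of its leftmost match in line, or "" if there is no match.
func firstGroup(pattern, line string) string {
	m := regexp.MustCompile(pattern).FindStringSubmatch(line)
	if len(m) < 2 {
		return ""
	}
	return m[1]
}

func main() {
	line := "void init() {"
	// Pattern from by_regex.in with the trailing '{' unescaped: Go treats it
	// as a literal because it does not start a valid repetition.
	fmt.Println(firstGroup(`(\w+)\(\)\s+{`, line)) // init
	// Escaped form, which java.util.regex also accepts.
	fmt.Println(firstGroup(`(\w+)\(\)\s+\{`, line)) // init
}
```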
groups.in:
Nested keep-sorted without indentation
ignore_prefixes.in:
Additional prefixes aren't counted as part of the comment:
I also hit a few other "weird" examples, e.g.
by_regex.in: Prefix order
This seems to have some ambiguity in terms of regex anchoring: \w+_(\w+) could match either DO_MORE or MORE_STUFF in DO_MORE_STUFF (it looks like Go matches the former and Java the latter). But those examples also show a second undocumented behavior: in the case of ties after the regex substitution (e.g. multiple init/final), the original string seems to be used as a tie-breaker (based on the goldens; I didn't look at the source) rather than a stable sort that retains the original order (foo ends up after bar even though foo comes before bar in the original).
I suspect some of these are just remnants of one-off google3 bug fixes/feature additions to make it work in a lot of real-world cases (which sadly seems to add more language syntax/lexing knowledge than I expected), but I just wanted to see if they could be documented (or fixed to align with the docs).