feat(Tactic/Linter): lint unwanted unicode #16215
base: master
PR summary 063b3b2533

| File | Base Count | Head Count | Change |
|---|---|---|---|
| Mathlib.Tactic.Linter.TextBased | 3 | 16 | +13 (+433.33%) |
Import changes for all files

| Files | Import difference |
|---|---|
| There are 4226 files with changed transitive imports taking up over 178601 characters: this is too many to display! You can run scripts/import_trans_difference.sh all locally to see the whole output. | |
Declarations diff
+ ASCII.allowed
+ ASCII.allowedWhitespace
+ ASCII.printable
+ UnicodeVariant.emoji
+ UnicodeVariant.text
+ emojis
+ findBadUnicode
+ findBadUnicodeAux
+ isAllowedCharacter
+ nonEmojis
+ othersInMathlib
+ printCodepointHex
+ removeQuotations
+ unicodeLinter
+ withVSCodeAbbrev
You can run this locally as follows
## summary with just the declaration names:
./scripts/declarations_diff.sh <optional_commit>
## more verbose report:
./scripts/declarations_diff.sh long <optional_commit>
The doc-module for scripts/declarations_diff.sh contains some details about this script.
Some quick comments. Overall, this linter
- flags interesting things
- does not seem to turn up something worrying,
both of which are great.
Would you like to take a look at the "spaces" the linter flags? (I haven't checked what this is about exactly; just looking at it in the text editor is probably not useful...)
Co-authored-by: grunweg <[email protected]>
That seems to be Unicode U+202F (category Zs: Separator, space). I think we'll remove them from Mathlib, but let's see...
I have removed all non-breaking spaces (U+00a0) from Mathlib. This might be excessive:
Thank you! I think splitting the removal of non-breaking spaces into a separate PR (do you know git cherry-pick?) makes sense, and I would be happy to nominate it for the maintainers to look at. Reading the Zulip discussion, changing the spaces does not appear controversial to me. Update: for me, splitting out the space changes is positive anyway, as it's a large enough change (200 lines), completely mechanical, and very different from the rest of this PR. That would be true regardless of whether the change is controversial.
2e7d073 to 9e51102
Removes all non-breaking spaces (U+00a0) from Mathlib. This is in preparation for using the "unicode linter" (#16215), which would (as of now) disallow non-printing characters except for space (U+0020) and linefeed (U+000A).
Preliminary cleanup for the "unwanted unicode linter" added in #16215.
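The rule quoted above (no non-printing characters other than space and linefeed) can be sketched in Lean 4 roughly as follows. This is only an illustration, not the code in this PR: the names `isUnwantedInvisible`, `codepointHex` and `unwantedInvisibles` are hypothetical, and the check covers only ASCII control characters plus the two spaces mentioned in this thread (U+00A0 and U+202F).

```lean
/-- Hypothetical helper: non-printing characters this sketch rejects.
Only ASCII control characters (other than linefeed), DEL, and the two
spaces named in this thread (U+00A0 and U+202F) are covered. -/
def isUnwantedInvisible (c : Char) : Bool :=
  let n := c.toNat
  (decide (n < 0x20) && c != '\n') || n == 0x7F || n == 0xA0 || n == 0x202F

/-- Hypothetical helper: format a codepoint as `U+XXXX`,
zero-padded to at least four uppercase hex digits. -/
def codepointHex (c : Char) : String :=
  let digits := (Nat.toDigits 16 c.toNat).map Char.toUpper
  let padding := "".pushn '0' (4 - digits.length)
  digits.foldl String.push ("U+" ++ padding)

/-- Codepoints of every rejected character in `s`, in order of appearance. -/
def unwantedInvisibles (s : String) : List String :=
  (s.toList.filter isUnwantedInvisible).map codepointHex

-- A string containing one non-breaking space, written as an escape
-- so that it stays visible in the source.
#eval unwantedInvisibles "a\u00A0b"
```

Reporting the codepoint rather than the character itself matters here, since the offending characters are by definition hard to see in an editor.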
@joneugster LGTM
↗︎ E ↘ | ||
↗ E ↘ | ||
1 → N ↓ G → 1 | ||
↘︎ E' ↗︎️ | ||
↘ E' ↗ | ||
|
||
For additive groups: | ||
↗︎ E ↘ | ||
↗ E ↘ | ||
0 → N ↓ G → 0 | ||
↘︎ E' ↗︎️ |
I think these are genuinely intended to be non-breaking; the right solution is to put this in a code block.
Sorry, these are about the text variant selector of the arrow characters, i.e. there were invisible \uFE0E characters following the diagonal arrows to prevent them from being displayed as emojis.
There are no spaces changed here.
If they are supposed to have the variant selector (which is a fair choice), they can be added to the list of text symbols defined here, and the variant selectors will then be fixed automatically by lake exe lint-style --fix
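To make the variant-selector discussion concrete, here is a minimal Lean 4 sketch of what such a normalisation could look like. It is not the logic behind `lake exe lint-style --fix`; the names `textSelector`, `emojiSelector` and `setVariantSelectors` are hypothetical, and the sketch simplifies by dropping every existing selector before re-adding the wanted ones.

```lean
/-- U+FE0E, the text variant selector. -/
def textSelector : Char := Char.ofNat 0xFE0E

/-- U+FE0F, the emoji variant selector. -/
def emojiSelector : Char := Char.ofNat 0xFE0F

/-- Hypothetical normaliser: drop all existing variant selectors, then
append U+FE0E after each character in `textSymbols` and U+FE0F after
each character in `emojiSymbols`. -/
def setVariantSelectors (textSymbols emojiSymbols : List Char) (s : String) : String :=
  s.foldl (fun acc c =>
    if c == textSelector || c == emojiSelector then acc
    else if textSymbols.contains c then (acc.push c).push textSelector
    else if emojiSymbols.contains c then (acc.push c).push emojiSelector
    else acc.push c) ""

-- Codepoints are spelled out so that no invisible selector hides in this example:
-- U+2197 and U+2198 are the diagonal arrows, U+1F50D is the magnifying glass.
def arrows : List Char := [Char.ofNat 0x2197, Char.ofNat 0x2198]
def emojiList : List Char := [Char.ofNat 0x1F50D]

#eval (setVariantSelectors arrows emojiList "↗ E ↘").length
```

Comparing the length before and after is a quick way to confirm that selectors were added, since the rendered text usually looks identical.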
@@ -116,7 +116,7 @@ def suggestSteps (pos : Array Lean.SubExpr.GoalsLocation) (goalType : Expr) (par | |||
@[server_rpc_method] | |||
def CalcPanel.rpc := mkSelectionPanelRPC suggestSteps | |||
"Please select subterms." | |||
"Calc 🔍" | |||
"Calc 🔍️" |
These are also intended to be non-breaking; I think using the unicode escape would be appropriate here.
Same here: there is no space change, but the magnifying glass received an emoji variant selector because it's specified in the list of emojis.
For context, there are currently no non-breaking spaces in any parts of mathlib except in docstrings about condensed mathematics & cat. theory because Dagur's IDE automatically adds them after each closing `
And the dependency PRs fixing whitespace are mainly because Dagur's quite busy and committing a lot :D
Mathlib/Tactic/Linter/TextBased.lean
Outdated
Obtained using Julia code:
```julia
filter(!isascii, unique( <all text in JSON file> )) |> repr
```
Let's do this with Lean!
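A rough Lean 4 counterpart to the Julia one-liner might look like the sketch below. The name `nonAsciiChars` is hypothetical, and the input is taken as a plain `String`; reading the JSON file is left out.

```lean
/-- Hypothetical helper: the distinct non-ASCII characters of `s`,
in order of first appearance. -/
def nonAsciiChars (s : String) : List Char :=
  (s.toList.filter fun c => decide (c.toNat ≥ 128)).eraseDups

#eval nonAsciiChars "∀ ε > 0, ∃ δ > 0, ε ≤ δ"
```

`List.eraseDups` is quadratic in the number of distinct characters, which should be acceptable for a one-off listing like this one.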
'»', '⁅', '⁆', '‖', '₊', '⌊', '⌋', '⌈', '⌉', 'α', 'β', | ||
'χ', '↓', 'ε', 'γ', '∩', 'μ', '¬', '∘', 'Π', '▸', '→', | ||
'↑', '∨', '×', '⁻', '¹', '∼', '·', '⋆', '¿', '₁', '₂', | ||
'₃', '₄', '₅', '₆', '₇', '₈', '₉', '₀', '←', 'Ø', '⅋', |
[lint-style (comment with "bot fix style" to have the bot commit all style suggestions)] reported by reviewdog 🐶
'₃', '₄', '₅', '₆', '₇', '₈', '₉', '₀', '←', 'Ø', '⅋', | |
'₃', '₄', '₅', '₆', '₇', '₈', '₉', '₀', '← ', 'Ø', '⅋', |
Add a text-based style linter that checks all unicode characters.
Discussed at Zulip
The current proposal checks each character against a whitelist. There is also the question of what the whitelist should contain.
A reasonable whitelist might be "all unicode currently in Mathlib" ∪ Wikipedia/Mathematical_operators_and_symbols_in_Unicode ∪ EdAyers/mathlib3/docs/unicode.md. Currently implemented is a subset of this list closer to what's currently present in mathlib.
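As a rough illustration of the whitelist approach, the following Lean 4 sketch reports each character that is neither ASCII nor on the list, together with its line number. It is not the linter from this PR: `sampleWhitelist`, `isWhitelisted` and `offendingChars` are hypothetical names, and the sample list holds only a handful of the symbols quoted earlier in this thread.

```lean
/-- Hypothetical whitelist; the actual contents are what this PR discusses. -/
def sampleWhitelist : List Char := ['α', 'β', '→', '←', '¬', '∘', '×']

/-- A character passes if it is ASCII or appears on the whitelist.
(This sketch does not restrict ASCII control characters.) -/
def isWhitelisted (c : Char) : Bool :=
  decide (c.toNat < 128) || sampleWhitelist.contains c

/-- Pairs `(lineNumber, character)` for every character outside the
whitelist, with 1-based line numbers. -/
def offendingChars (text : String) : List (Nat × Char) :=
  let lines := text.splitOn "\n"
  let numbered := List.zip (List.range lines.length) lines
  numbered.foldr (fun (i, line) acc =>
    ((line.toList.filter fun c => !isWhitelisted c).map fun c => (i + 1, c)) ++ acc) []

#eval offendingChars "theorem foo : α → β\n-- caf\u00E9"
```

Keeping the whitelist as plain data means that widening it (for instance to the union of sources proposed above) changes no checking logic.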