Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(Tactic/Linter): add unicode linter for unicode variant-selectors #17129

Open
wants to merge 66 commits into
base: master
Choose a base branch
from

Conversation

joneugster
Copy link
Collaborator

@joneugster joneugster commented Sep 25, 2024

Unicode characters can be followed by one of the two "variant-selectors" \FE0E (text) or \FE0F (emoji).
These appear to the user as a single unicode character and they might copy them by accident. For example ✅️ (\u2705\uFE0F), (\u2705) and ✅︎ (\u2705\uFE0E) are three variants of the "same" unicode character. The one without variant-selector might display as either emoji or not, depending on the user's device and font.

Add unicode linter ensuring variant-selector only appear on specified characters and also ensuring these characters always have the correct variant selector to avoid confusions.

Everything flagged by this linter can be fixed fully automatically with lake exe lint-style --fix or by commenting bot fix style on the PR.


This is the first PR of a series of PRs about unicode characters in Mathlib. The second PR will be addressing non-breaking spaces.

Zulip discussions:

Generally, the discussions seemed to show favour for the first two PRs in this series (handling variant selectors & dealing with invisible chars. such as   (\u00A0)) while there seems to be still room for discussion about what to do with general unicode characters (whitelist, blacklist, ...)

Related PRs:

Open in Gitpod

adomasbaliuka and others added 30 commits August 28, 2024 12:41
Using whitelist to ignore characters, using further lists to check emoji/text variant selector use
The effect depends on the font that is used.
Copy link

github-actions bot commented Sep 25, 2024

PR summary b9c6d10e3a

Import changes exceeding 2%

% File
+5.00% Mathlib.Tactic.Linter.TextBased

Import changes for modified files

No significant changes to the import graph

Import changes for all files
Files Import difference
Mathlib.Tactic.Linter.TextBased Mathlib.Tactic 1
Mathlib.Tactic.Linter.UnicodeLinter (new file) 18

Declarations diff

+ UnicodeVariant.emoji
+ UnicodeVariant.text
+ emojis
+ findBadUnicode
+ findBadUnicodeAux
+ nonEmojis
+ printCodepointHex
+ removeQuotations
+ unicodeLinter

You can run this locally as follows
## summary with just the declaration names:
./scripts/declarations_diff.sh <optional_commit>

## more verbose report:
./scripts/declarations_diff.sh long <optional_commit>

The doc-module for script/declarations_diff.sh contains some details about this script.


No changes to technical debt.

You can run this locally as

./scripts/technical-debt-metrics.sh pr_summary
  • The relative value is the weighted sum of the differences with weight given by the inverse of the current value of the statistic.
  • The absolute value is the relative value divided by the total sum of the inverses of the current values (i.e. the weighted average of the differences).

@joneugster joneugster added the t-linter Linter label Sep 25, 2024
@joneugster joneugster changed the title feat(Tactic/Linter): add unicode linter for invisible variant-selector characters feat(Tactic/Linter): add unicode linter for unicode variant-selectors Sep 25, 2024
@joneugster
Copy link
Collaborator Author

bot fix style

@leanprover-community-bot-assistant leanprover-community-bot-assistant added the merge-conflict The PR has a merge conflict with master, and needs manual merging. (this label is managed by a bot) label Sep 30, 2024
@github-actions github-actions bot added the large-import Automatically added label for PRs with a significant increase in transitive imports label Oct 9, 2024
@joneugster joneugster removed the merge-conflict The PR has a merge conflict with master, and needs manual merging. (this label is managed by a bot) label Oct 9, 2024
@leanprover-community-bot-assistant leanprover-community-bot-assistant added the merge-conflict The PR has a merge conflict with master, and needs manual merging. (this label is managed by a bot) label Oct 31, 2024
@leanprover-community-bot-assistant leanprover-community-bot-assistant removed the merge-conflict The PR has a merge conflict with master, and needs manual merging. (this label is managed by a bot) label Nov 20, 2024
@leanprover-community-bot-assistant leanprover-community-bot-assistant added the merge-conflict The PR has a merge conflict with master, and needs manual merging. (this label is managed by a bot) label Nov 22, 2024
@leanprover-community-bot-assistant leanprover-community-bot-assistant removed the merge-conflict The PR has a merge conflict with master, and needs manual merging. (this label is managed by a bot) label Jan 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
large-import Automatically added label for PRs with a significant increase in transitive imports t-linter Linter
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants