supernormalize
supernormalize is a JavaScript library that aggressively normalizes text to a standard form. Use cases include:
- Mitigating homoglyph attacks
- Normalizing text for comparison
- Preparation for indexing text in a search engine
- Preparation for blacklisting text
The library performs the following steps:
- Remove all marks (i.e. diacritics) and perform compatibility normalization
- Convert the text to lowercase
- Normalize homoglyphs using a mapping based on the Unicode Consortium's confusables list, version 15.1.0 (the mapping omits homoglyphs that are already handled by the first two steps)
- Replace all whitespace characters with a single space and trim the text
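The steps above can be sketched with built-in string methods. This is only an illustration of the idea, not the library's actual source: `sketchNormalize`, `removeMarks`, and `HOMOGLYPHS` are hypothetical names, and the five-entry map is a stand-in for the much larger Unicode-based mapping (its entries were chosen to match the examples in this README).

```javascript
// Illustrative sketch only; not the library's implementation.
// HOMOGLYPHS is a tiny hypothetical sample of a homoglyph mapping.
const HOMOGLYPHS = new Map([
  ["\u0430", "a"], // Cyrillic small a
  ["\u03b1", "a"], // Greek small alpha
  ["i", "1"],
  ["l", "1"],
  ["o", "0"],
]);

// Step 1: NFKD applies compatibility decomposition and splits off
// combining marks; \p{M} then matches and removes those marks.
function removeMarks(text) {
  return text.normalize("NFKD").replace(/\p{M}/gu, "");
}

function sketchNormalize(text) {
  // Step 2: lowercase the mark-free text.
  const lowered = removeMarks(text).toLowerCase();
  // Step 3: map each code point through the sample homoglyph table.
  const mapped = Array.from(lowered, (ch) => HOMOGLYPHS.get(ch) ?? ch).join("");
  // Step 4: collapse whitespace runs to one space and trim.
  return mapped.replace(/\s+/g, " ").trim();
}

console.log(sketchNormalize("\tHELLO WORLD \n")); // "he110 w0r1d"
console.log(sketchNormalize("A\u0410\u0391")); // "aaa"
```

A per-character lookup like this cannot handle multiletter homoglyphs such as "rn"; the real mapping would need a different matching strategy for those.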
npm install supernormalize
import { supernormalize } from "supernormalize";
const text = "⋿╳⍺rñ⍴lé";
const normalizedText = supernormalize(text);
console.log(normalizedText); // 'examp1e'
Input | Output | Note |
---|---|---|
⋿╳⍺rñ⍴lé | examp1e | The rules below can be combined |
𝕋𝕙𝕚𝕤 𝕚𝕤 𝕒 𝕥𝕖𝕤𝕥! | th1s 1s a test! | Homoglyphs are normalized to a common form |
D̴̝̼̅i̴̱̐͊́a̵̢͎͒͝ĉ̵͓̈́̽r̶͂͝ͅi̷͔͜͝ṭ̴͋͆͘i̵͔̅c̷̛͉̪͂͊s̵̞̝̲͊ | d1acr1t1cs | Diacritics are removed |
AАΑ | aaa | Latin, Cyrillic, and Greek characters are normalized to the same form |
rn | m | Multiletter homoglyphs are normalized |
ffi… | ff1... | Ligatures are normalized to letters |
\tHELLO WORLD \n | he110 w0r1d | Whitespace and casing are normalized |
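For the comparison and blacklisting use cases, the normalized forms are what get compared. Below is a minimal sketch of that idea. `looksAlike`, `isBlocked`, and `toyNormalize` are hypothetical helpers, not part of the library; the normalizer is passed in as a parameter so the snippet runs without the package installed, and in real use you would pass the library's `supernormalize` instead of the stand-in.

```javascript
// Hypothetical helper: two strings "look alike" if their normalized forms match.
// `normalize` is any string -> string function, e.g. supernormalize.
function looksAlike(a, b, normalize) {
  return normalize(a) === normalize(b);
}

// Hypothetical blocklist check: normalize both the terms and the input.
function isBlocked(text, blockedTerms, normalize) {
  const blocked = new Set(blockedTerms.map((term) => normalize(term)));
  return blocked.has(normalize(text));
}

// Stand-in normalizer (marks, case, whitespace only; no homoglyph mapping).
const toyNormalize = (s) =>
  s
    .normalize("NFKD")
    .replace(/\p{M}/gu, "")
    .toLowerCase()
    .replace(/\s+/g, " ")
    .trim();

console.log(looksAlike("Caf\u00e9", "  cafe ", toyNormalize)); // true
console.log(isBlocked("ADMIN", ["admin"], toyNormalize)); // true
console.log(isBlocked("user", ["admin"], toyNormalize)); // false
```

Normalizing the blocked terms with the same function as the input keeps both sides in the same form, which matters because the normalized output is not necessarily readable text (for example, "l" becomes "1").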
Normalizes the given text by performing the steps described above.
Converts the given text to lowercase.
Removes all marks (i.e. diacritics) and performs compatibility normalization on the given text.
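To make "compatibility normalization" concrete, the sketch below uses only the built-in `String.prototype.normalize` to contrast canonical decomposition (NFD) with compatibility decomposition (NFKD). Whether the library uses exactly these calls internally is an assumption, and `stripMarks` is a hypothetical name used for illustration.

```javascript
// Canonical decomposition (NFD) splits a character into base + combining marks.
const decomposed = "\u00e9".normalize("NFD"); // "é" becomes "e" + U+0301
console.log(decomposed.length); // 2

// NFD leaves compatibility characters such as the "ffi" ligature untouched...
const ligature = "\uFB03";
console.log(ligature.normalize("NFD") === ligature); // true

// ...while NFKD rewrites them to their plain equivalents.
console.log(ligature.normalize("NFKD")); // "ffi"
console.log("\u{1D54B}".normalize("NFKD")); // "T" (from double-struck T)
console.log("\u2026".normalize("NFKD")); // "..." (from the ellipsis character)

// Hypothetical helper: decompose, then drop every combining mark (\p{M}).
function stripMarks(text) {
  return text.normalize("NFKD").replace(/\p{M}/gu, "");
}
console.log(stripMarks("\u00f1")); // "n"
```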
Normalizes homoglyphs using a mapping based on the Unicode Consortium's confusables list.
Replaces all whitespace characters with a single space and trims the text.
This project is licensed under the MIT License - see the LICENSE file for details.