Add ASCII-specific Str functions #7473

Open
smores56 opened this issue Jan 6, 2025 · 2 comments
Labels
builtins Relates to roc builtins like Bool, List, Str ... good first issue Good for newcomers

Comments

@smores56
Collaborator

smores56 commented Jan 6, 2025

We want to add three new ASCII-specific Str functions to our standard library. They were proposed in a gist; the type signatures and proposed docs below are copied from it:

Str.with_ascii_lowercased : Str -> Str

Returns a version of the string with all ASCII characters lowercased. Non-ASCII characters are left unmodified. For example:

expect "CAFÉ".with_ascii_lowercased() == "cafÉ"

This function is useful for things like command-line options and environment variables where you know in advance that you're dealing with a hardcoded string containing only ASCII characters. It has better performance than lowercasing operations which take Unicode into account.

That said, strings received from user input can always contain non-ASCII Unicode characters, and lowercasing Unicode works differently in different languages. For example, the string "I" lowercases to "i" in English and to "ı" (a dotless i) in Turkish. These rules can also change in each Unicode release, so we have a separate unicode package for Unicode capitalization that can be upgraded independently from the language's builtins.

To do a case-insensitive comparison of the ASCII characters in a string, use caseless_ascii_equals.
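As a rough illustration of the semantics described above (not the Roc implementation), here is a minimal Python sketch. The function name simply mirrors the proposed Roc name. It relies on Python's `bytes.lower()`, which is documented to change only ASCII letters, so the bytes of a multi-byte UTF-8 character pass through untouched.

```python
def with_ascii_lowercased(s: str) -> str:
    """Hypothetical Python analogue of the proposed Str.with_ascii_lowercased."""
    # bytes.lower() maps only ASCII A-Z to a-z; every byte >= 0x80,
    # i.e. every byte of a non-ASCII UTF-8 sequence, is left unchanged.
    return s.encode("utf-8").lower().decode("utf-8")


# ASCII letters are lowercased, the non-ASCII "É" (U+00C9) is not.
assert with_ascii_lowercased("CAF\u00c9") == "caf\u00c9"
# A typical use case: normalizing a hardcoded, ASCII-only CLI flag.
assert with_ascii_lowercased("--HELP") == "--help"
```

By contrast, Python's Unicode-aware `str.lower()` would also lowercase the "É", which is the kind of behavior the proposal leaves to a separate unicode package.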

Str.with_ascii_uppercased : Str -> Str

Returns a version of the string with all ASCII characters uppercased. Non-ASCII characters are left unmodified. For example:

expect "café".with_ascii_uppercased() == "CAFé"

This function is useful for things like command-line options and environment variables where you know in advance that you're dealing with a hardcoded string containing only ASCII characters. It has better performance than uppercasing operations which take Unicode into account.

That said, strings received from user input can always contain non-ASCII Unicode characters, and uppercasing Unicode works differently in different languages. For example, the string "i" uppercases to "I" in English and to "İ" (a dotted I) in Turkish. These rules can also change in each Unicode release, so we have a separate unicode package for Unicode capitalization that can be upgraded independently from the language's builtins.

To do a case-insensitive comparison of the ASCII characters in a string, use caseless_ascii_equals.
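The uppercase variant can be sketched the same way. This is again a Python approximation under the assumption that operating on the UTF-8 bytes matches the intended behavior; the function name mirrors the proposed Roc name and is not an existing API.

```python
def with_ascii_uppercased(s: str) -> str:
    """Hypothetical Python analogue of the proposed Str.with_ascii_uppercased."""
    # bytes.upper() maps only ASCII a-z to A-Z; the bytes of non-ASCII
    # UTF-8 sequences (all >= 0x80) are left unchanged.
    return s.encode("utf-8").upper().decode("utf-8")


# ASCII letters are uppercased, the non-ASCII "é" (U+00E9) is not.
assert with_ascii_uppercased("caf\u00e9") == "CAF\u00e9"
# A typical use case: normalizing an environment variable name.
assert with_ascii_uppercased("home_dir") == "HOME_DIR"
```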

Str.caseless_ascii_equals : Str, Str -> Bool

Returns True if the two strings are equal when ignoring differences in the capitalization of their ASCII characters. Non-ASCII characters must all be exactly the same, including capitalization. For example:

expect "café".caseless_ascii_equals("CAFé")

expect !"café".caseless_ascii_equals("CAFÉ")

The first call returns True because all the ASCII characters are the same when ignoring differences in capitalization, and the only non-ASCII character (é) is the same in both strings. The second call returns False because é and É are not ASCII characters, and they are different.

This function is useful for things like command-line options and environment variables where you know in advance that you're dealing with a hardcoded string containing only ASCII characters. It has better performance than case-insensitive comparisons which take Unicode into account.

That said, strings received from user input can always contain non-ASCII Unicode characters, and case-insensitive Unicode comparisons work differently in different languages. For example, the strings "i" and "I" are the same in English when ignoring capitalization. In Turkish, the case-insensitive equivalent of "i" is not "I" but rather "İ" (a dotted I), and the case-insensitive equivalent of "I" is not "i" but rather "ı" (a dotless i). These rules can also change in each Unicode release, so we have a separate unicode package for Unicode capitalization that can be upgraded independently from the language's builtins.

To convert a string's ASCII characters to uppercase or lowercase, use with_ascii_uppercased and with_ascii_lowercased.
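The comparison rule can be sketched in Python as well. The function name mirrors the proposed Roc name and is hypothetical. For brevity the sketch lowercases both byte strings before comparing them; a real builtin would more likely compare byte by byte without allocating.

```python
def caseless_ascii_equals(a: str, b: str) -> bool:
    """Hypothetical Python analogue of the proposed Str.caseless_ascii_equals."""
    # Fold only ASCII letters on both sides, then compare the raw bytes.
    # Non-ASCII characters are compared exactly, including their case.
    return a.encode("utf-8").lower() == b.encode("utf-8").lower()


# Same ASCII letters up to case, identical non-ASCII "é": equal.
assert caseless_ascii_equals("caf\u00e9", "CAF\u00e9")
# "é" (U+00E9) and "É" (U+00C9) are not ASCII and differ: not equal.
assert not caseless_ascii_equals("caf\u00e9", "CAF\u00c9")
```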

@smores56 smores56 added good first issue Good for newcomers builtins Relates to roc builtins like Bool, List, Str ... labels Jan 6, 2025
@HajagosNorbert
Contributor

Hi! I want to do this, can you assign me?

@smores56
Collaborator Author

smores56 commented Jan 6, 2025

Good luck, and may the odds be ever in your favor


2 participants