Disallow creation of CFStrings from non-8bit c-strings #5165

Open
wants to merge 1 commit into
base: main


jmschonfeld
Contributor

It is invalid to create a CFString from a c-string of a non-8bit encoding. Encodings that are not 8bit encodings may have embedded null bytes that represent a portion of a scalar (for example, in big-endian UTF-16 the leading byte of every ASCII scalar is a null byte), which means these encodings are not suitable for c-string representation. Currently, we do not enforce this: creation can sometimes fail, sometimes truncate the provided text, and other times create a malformed CFString which may behave unexpectedly when calling CFStringGetCStringPtr on it (it can vend a pointer of the wrong encoding). This change enforces the requirement by trapping at runtime if an invalid encoding is provided.
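As a rough illustration of the embedded-null problem described above, here is a minimal sketch in plain C that does not use CoreFoundation. The helper `has_embedded_nul`, the byte arrays, and `demo_embedded_nul` are hypothetical names made up for this example; they are not part of this change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* The text "abc" in two encodings, with no terminator appended. */
static const unsigned char kTextUTF8[]    = { 0x61, 0x62, 0x63 };
static const unsigned char kTextUTF16BE[] = { 0x00, 0x61, 0x00, 0x62, 0x00, 0x63 };

/* Hypothetical helper (not a CoreFoundation function): returns true if any
 * of the `len` content bytes is 0x00. A c-string reader would stop at that
 * byte, so such content cannot be carried by a null-terminated char buffer. */
static bool has_embedded_nul(const unsigned char *bytes, size_t len) {
    return memchr(bytes, 0, len) != NULL;
}

/* Prints whether each encoding of "abc" contains a null byte in its content. */
static void demo_embedded_nul(void) {
    bool utf8 = has_embedded_nul(kTextUTF8, sizeof kTextUTF8);
    bool utf16 = has_embedded_nul(kTextUTF16BE, sizeof kTextUTF16BE);
    assert(!utf8 && utf16);
    printf("UTF-8:    %s\n", utf8 ? "embedded null" : "no null bytes");
    printf("UTF-16BE: %s\n", utf16 ? "embedded null" : "no null bytes");
}
```

Calling `demo_embedded_nul()` from a `main` function shows that the same three ASCII characters are safe to terminate with a null byte in UTF-8 but not in UTF-16.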

I left some comments in CFStringGetCStringPtr about the potentially confusing behavior that previously arose here due to this issue, with an explanation of how the enforcement at string creation time makes the logic in that function sound.

Resolves #5164

@jmschonfeld
Contributor Author

@swift-ci please test

if (hasNullByte && !__CFStringEncodingIsSupersetOfASCII(encoding)) {
// Non-8bit encodings cannot be safely read as c-strings because they may contain many null bytes
// This was documented as invalid previously, but now we validate that eagerly here to prevent creating truncated strings or strings that incorrectly assume 8bit representation
HALT_MSG("CFStringCreateWithCString can only be called with an 8-bit encoding");
Contributor


Do you think it might be possible that people are calling this with some non-8bit encoding, but only ASCII data, and getting away with it today?

Contributor Author


For many encodings I suspect they couldn't get away with it. For example, ASCII text in UTF-16 has a null byte in every code unit, so depending on the endianness either string creation would fail (because there would be an odd number of bytes) or the string would be empty (because the first byte is a null byte). So it's possible that someone could be relying on the failing (NULL return) or empty-string behavior, but I think that's not super likely. It's possible that there's another encoding that we have that might work (I can double check), but I can't think of one off the top of my head that would behave correctly.
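To make the two UTF-16 cases concrete, here is a small sketch in plain C (no CoreFoundation calls) that counts how many bytes a c-string reader sees before the first null byte. The names `c_string_byte_count`, `kHiUTF16LE`, `kHiUTF16BE`, and `demo_utf16_as_c_string` are hypothetical and exist only for this illustration. It shows the byte counts only, not what string creation does with them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* "Hi" in UTF-16 followed by a 16-bit terminator, in both byte orders. */
static const unsigned char kHiUTF16LE[] = { 0x48, 0x00, 0x69, 0x00, 0x00, 0x00 };
static const unsigned char kHiUTF16BE[] = { 0x00, 0x48, 0x00, 0x69, 0x00, 0x00 };

/* Hypothetical helper: the number of bytes a c-string reader would consume,
 * i.e. everything before the first 0x00 byte. */
static size_t c_string_byte_count(const unsigned char *bytes) {
    return strlen((const char *)bytes);
}

/* Little-endian: one byte is seen, an odd count that cannot be a whole
 * number of 16-bit code units. Big-endian: zero bytes are seen, so the
 * text looks empty. */
static void demo_utf16_as_c_string(void) {
    size_t le = c_string_byte_count(kHiUTF16LE);
    size_t be = c_string_byte_count(kHiUTF16BE);
    assert(le % 2 == 1);
    assert(be == 0);
    printf("UTF-16LE: %zu byte(s) before the first null\n", le);
    printf("UTF-16BE: %zu byte(s) before the first null\n", be);
}
```

Calling `demo_utf16_as_c_string()` from a `main` function prints both counts.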

Development

Successfully merging this pull request may close these issues.

CFStringGetCStringPtr does not check encoding
3 participants