Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CFStringGetCStringPtr does not check encoding #5164

Open
madsmtm opened this issue Jan 28, 2025 · 2 comments · May be fixed by #5165
Open

CFStringGetCStringPtr does not check encoding #5164

madsmtm opened this issue Jan 28, 2025 · 2 comments · May be fixed by #5165

Comments

@madsmtm
Copy link

madsmtm commented Jan 28, 2025

NOTE: I have filed this both here and as feedback FB16417968, unsure which was the most relevant place.

CFStringGetCStringPtr is documented to return a string pointer in the given encoding, and "is simply an optimization", but it does not actually check whether the string is of that encoding, only whether it can be represented in that encoding.

This means that for example the string "♥", which can be represented in UTF-16 as the hex bytes "65 26", can end up being interpreted as the same UTF-8 hex bytes "65 26", but which mean something completely different, namely "e&". The expected result would be that CFStringGetCStringPtr returned NULL in this case.

An alternative to resolving this issue would be to update the documentation to state this footgun.

Reproducer

The example code below shows the discrepancy between CFStringGetCStringPtr and CFStringGetCString:

#include <CoreFoundation/CoreFoundation.h>

int main() {
    // Supposed to contain a "♥".
    char* heart_utf16le = "\x65\x26";
    char* heart_utf8 = "\xE2\x99\xA5";

    // Create CFString with the heart from UTF-16 / Unicode.
    CFStringRef s = CFStringCreateWithCString(kCFAllocatorDefault, heart_utf16le, kCFStringEncodingUTF16LE);

    // CFStringGetCString converts to UTF-8.
    char buf[20];
    CFStringGetCString(s, (char*)&buf, 20, kCFStringEncodingUTF8);
    printf("%s\n", buf); // prints "♥"

    // But CFStringGetCStringPtr completely ignores the UTF-8 conversion
    // we asked it to do, i.e. a huge correctness footgun!
    const char* ptr = CFStringGetCStringPtr(s, kCFStringEncodingUTF8);
    printf("%s\n", ptr); // prints "e&"
}

Run with:

clang -framework CoreFoundation example.c && ./a.out

Expected result:

♥
(null)

Actual result:

♥
e&

Occurs on:

  • Mac OS X 10.12.6 on x86_64.
  • macOS 14.7.1 on Aarch64.
  • macOS 15.1.1 in VM.
@jmschonfeld
Copy link
Contributor

jmschonfeld commented Jan 29, 2025

Thanks for reporting! I took a look at the implementation and you're right, in CoreFoundation we don't actually check whether or not the string is an 8 bit representation explicitly before providing the pointer. This is the code in question:

static inline const char * _CFStringGetCStringPtrInternal(CFStringRef str, CFStringEncoding encoding, Boolean requiresNullTermination, Boolean requiresBridgingCheck) {
if (encoding != __CFStringGetEightBitStringEncoding() && (kCFStringEncodingASCII != __CFStringGetEightBitStringEncoding() || !__CFStringEncodingIsSupersetOfASCII(encoding))) return NULL;
// ??? Also check for encoding = SystemEncoding and perhaps bytes are all ASCII?
if (str == NULL) return NULL; // Should really just crash, but for compatibility... see <rdar://problem/12340248>
if (requiresBridgingCheck) {
CF_SWIFT_FUNCDISPATCHV(_kCFRuntimeIDCFString, const char *, (CFSwiftRef)str, NSString._fastCStringContents, requiresNullTermination);
CF_OBJC_FUNCDISPATCHV(_kCFRuntimeIDCFString, const char *, (NSString *)str, _fastCStringContents:requiresNullTermination);
}
__CFAssertIsString(str);
if ((!requiresNullTermination && __CFStrIsEightBit(str)) || __CFStrHasNullByte(str)) {
// Note: this is called a lot, 27000 times to open a small xcode project with one file open.
// Of these uses about 1500 are for cStrings/utf8strings.
return (const char *)__CFStrContents(str) + __CFStrSkipAnyLengthByte(str);
} else {
return NULL;
}
}

However, this CF implementation is technically valid based on its internal assumptions because it's theoretically prevented by another piece here. You're creating this string by calling CFStringCreateWithCString with kCFStringEncodingUTF16LE, however this itself is invalid as the documentation for CFStringCreateWithCString states:

The encoding must specify an 8-bit encoding.

And UTF-16LE is not an 8-bit encoding. For some bytes (all non-ASCII strings) it happens to work, but for UTF-16LE bytes that contain ascii characters (i.e. contain embedded null bytes when viewed as 8-bit characters) then the function either truncates the input or returns NULL. What we really should be doing is checking the encoding at the creation site here and ensuring that we only ever create strings from C-strings for 8-bit encodings. That means that in the code pointed at above, __CFStrHasNullByte is only ever TRUE for strings that are 8-bit and it would behave as expected.

Thanks for filing this and the feedback - we can use this issue to track an update on the linux side and your feedback will track a change to CoreFoundation on Darwin (which no longer shares source code with this repo). At least for this linux side, I suspect we should look into adding a trap to creation with non-8-bit encodings to avoid this issue along with the truncation/failure issues mentioned above to make it clear you shouldn't do this. In your own code for the moment, you can avoid this by using other CFString creation APIs that just take bytes and provide the length yourself (calculated some other way than strlen which isn't valid for UTF-16)

@madsmtm
Copy link
Author

madsmtm commented Jan 30, 2025

Ah, thanks so much for the answer, it never occurred to me that c-strings would not be valid if not an 8-bit encoding! I did also find it weird that such a simple bug had existed for so many years, but I just couldn't wrap my head around it being otherwise.

And indeed, it was confusing that CF still allowed me to create the string, a trap would've been nice!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants