CFStringGetCStringPtr does not check encoding #5164
Comments
Thanks for reporting! I took a look at the implementation and you're right, in CoreFoundation we don't actually check whether or not the string is an 8-bit representation explicitly before providing the pointer. This is the code in question:
swift-corelibs-foundation/Sources/CoreFoundation/CFString.c, lines 2222 to 2242 in 0129358
However, this CF implementation is technically valid based on its internal assumptions because it's theoretically prevented by another piece here. You're creating this string by calling
And UTF-16LE is not an 8-bit encoding. For some bytes (all non-ASCII strings) it happens to work, but for UTF-16LE bytes that contain ASCII characters (i.e. contain embedded null bytes when viewed as 8-bit characters) the function either truncates the input or returns

Thanks for filing this and the feedback - we can use this issue to track an update on the Linux side, and your feedback will track a change to CoreFoundation on Darwin (which no longer shares source code with this repo). At least for the Linux side, I suspect we should look into adding a trap to creation with non-8-bit encodings, both to avoid this issue and the truncation/failure issues mentioned above, and to make it clear you shouldn't do this. In your own code for the moment, you can avoid this by using other
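To make the embedded-null point above concrete, here is a minimal sketch in plain C that does not use CoreFoundation at all. The helper names `ascii_to_utf16le` and `visible_c_string_length` are hypothetical and exist only for this illustration; the sketch assumes pure ASCII input and simply shows what an 8-bit reader sees when it is handed UTF-16LE bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: widen an ASCII string to UTF-16LE.
 * Each character becomes the pair (byte, 0x00), followed by a 16-bit
 * terminator. Returns the number of bytes written, or 0 if `out` is
 * too small. Assumes the input is pure ASCII. */
static size_t ascii_to_utf16le(const char *ascii, unsigned char *out,
                               size_t out_size) {
    assert(ascii != NULL && out != NULL);
    size_t n = strlen(ascii);
    size_t needed = 2 * (n + 1);
    if (out_size < needed) {
        return 0;
    }
    for (size_t i = 0; i <= n; i++) {
        out[2 * i] = (unsigned char)ascii[i]; /* low byte first */
        out[2 * i + 1] = 0;                   /* high byte is always zero */
    }
    return needed;
}

/* Hypothetical helper: length of the buffer as an 8-bit C-string reader
 * sees it, i.e. up to the first zero byte. */
static size_t visible_c_string_length(const unsigned char *utf16le) {
    return strlen((const char *)utf16le);
}
```

Encoding "abc" this way writes 8 bytes, but an 8-bit reader stops at the zero byte right after 'a' and sees a string of length 1, which is the truncation described above.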
Ah, thanks so much for the answer, it never occurred to me that C strings would not be valid if not in an 8-bit encoding! I did also find it weird that such a simple bug had existed for so many years, but I just couldn't wrap my head around it being otherwise. And indeed, it was confusing that CF still allowed me to create the string, a trap would've been nice!
NOTE: I have filed this both here and as feedback FB16417968, unsure which was the most relevant place.
CFStringGetCStringPtr is documented to return a string pointer in the given encoding, and "is simply an optimization", but it does not actually check whether the string is of that encoding, only whether it can be represented in that encoding.

This means that, for example, the string "♥", which can be represented in UTF-16 (little-endian) as the hex bytes "65 26", can end up being interpreted as the same UTF-8 hex bytes "65 26", which mean something completely different, namely "e&". The expected result would be that CFStringGetCStringPtr returned NULL in this case.

An alternative to resolving this issue would be to update the documentation to state this footgun.
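The byte-level confusion described above can be checked without CoreFoundation. The following is only a sketch, not the reproducer from this report: `encode_bmp_utf16le` and `length_as_c_string` are hypothetical helpers that encode a single BMP code point as UTF-16LE and then look at the result the way an 8-bit C-string reader would.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: encode one BMP code point (not a surrogate) as
 * UTF-16LE into out[0..1], followed by a 16-bit terminator in out[2..3]. */
static void encode_bmp_utf16le(uint32_t cp, unsigned char out[4]) {
    assert(cp < 0xD800 || (cp >= 0xE000 && cp <= 0xFFFF));
    out[0] = (unsigned char)(cp & 0xFF);        /* low byte first */
    out[1] = (unsigned char)((cp >> 8) & 0xFF); /* high byte second */
    out[2] = 0;
    out[3] = 0;
}

/* Hypothetical helper: length of the buffer when it is reinterpreted as
 * an 8-bit C string, i.e. up to the first zero byte. */
static size_t length_as_c_string(const unsigned char *bytes) {
    return strlen((const char *)bytes);
}
```

For U+2665 ("♥") the encoded bytes are 0x65 0x26, and reading that buffer as an 8-bit string yields the two characters "e&", matching the description above.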
Reproducer
The example code below shows the discrepancy between CFStringGetCStringPtr and CFStringGetCString:

Run with:
clang -framework CoreFoundation example.c && ./a.out
Expected result:
Actual result:
Occurs on: