You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Describe the bug
UTF-8 detection in Windows.py relies on an initial attempt to decode as UTF-16 raising UnicodeDecodeError. The unit test, by using an odd-character length UTF-8 test string, shows the detection apparently working, but substitution of an even-length test string shows it doesn't work.
[I found this issue while working on improving the Windows backend password maximum length of 1280 characters, and wanted to capture it as a separate issue.]
To Reproduce
Steps to reproduce the behavior:
Copy the existing test_read_utf8_password test to a new method, e.g. test_read_even_utf8_password
def test_read_even_utf8_password(self):
"""
Write a UTF-8 encoded credential and make sure it can be read back correctly.
"""
service = "keyring-utf8-test"
username = "keyring"
password = "utf8-test0" + UNICODE_CHARS
self.set_utf8_password(service, username, password)
assert self.keyring.get_password(service, username) == password
E AssertionError: assert '瑵㡦琭獥ぴ雗铗鯗駗ꏗꇗ...胑諑럐뷐냐뫐냐苑뻐돐賑뻐뷐' == 'utf8-test0זה...ръзнакатогьон'
E - utf8-test0זהכיףסתםלשמועאיךתנצחקרפדעץטובבגןξεσκεπάζωτηνψυχοφθόραβδελυγμίαСъешьжеещёэтихмягкихфранцузскихбулокдавыпейчаюЖълтатадюлябешещастливачепухъткойтоцъфназамръзнакатогьон
E + 瑵㡦琭獥ぴ雗铗鯗駗ꏗꇗ鷗鳗꧗黗闗ꋗ郗駗髗ꃗꛗ韗ꟗ꣗ꓗ鏗ꋗꗗ飗闗釗釗鋗鿗뻎뗎菏뫎뗎胏곎뛎觏蓏럎뷎裏藏蟏뿎蛏룎賏臏뇎닎듎뗎믎藏돎볎꿎뇎ꇐ諑뗐裑賑뛐뗐뗐觑金跑苑룐藑볐近돐뫐룐藑蓑胑냐뷐蛑菑럐臑뫐룐藑뇐菑믐뻐뫐듐냐닐译뿐뗐말蟑냐軑雐諑믐苑냐苑냐듐軑믐近뇐뗐裑뗐觑냐臑苑믐룐닐냐蟑뗐뿐菑藑諑苑뫐뻐말苑뻐蛑諑蓑뷐냐럐냐볐胑諑럐뷐냐뫐냐苑뻐돐賑뻐뷐
tests\backends\test_Windows.py:73: AssertionError
The problem arises because the apparent success of test_read_utf8_password in falling back to UTF-8 based on UTF-16 not working is based on the choice of an odd # of bytes in its test password. We can't, in general, assume that a UTF-8 encoded password will be an odd # of bytes.
In other words, encoding detection is really language detection, combined with knowledge of which languages tend to use which character encodings.
which indicates the challenge involved in detecting the encoding simply based on the bytes encountered. Without an explicit indication of the encoding expected, I do not know how to fix this issue.
Additional context
If we add a Credential Manager Attribute to the existing Credential structure, we could explicitly indicate which encoding keyring used for a given password. That won't help with identifying credentials generated by other apps, unfortunately.
Another option would be to offer a new API, a variation on get_password (e.g. get_password_bytes) that doesn't attempt to decode the bytes stored by Windows Credential Manager. This would support using keyring to retrieve exogenous credentials, without applying the customary UTF-16 decode to those bytes.
The text was updated successfully, but these errors were encountered:
Thanks. Have you had time to take a look at my "take 2" PR? I think it addresses all of your concerns.
It only resorts to UTF-8 when the password wouldn't have fit and would have raised winerror 1783 under the current implementation. Otherwise it works identically to the current implementation. It is allowing us to use poetry and partifact on Windows with no 1783 errors, without interfering with traditional uses.
Describe the bug
UTF-8 detection in Windows.py relies on an initial attempt to decode as UTF-16 raising
UnicodeDecodeError
. The unit test, by using an odd-character length UTF-8 test string, shows the detection apparently working, but substitution of an even-length test string shows it doesn't work.[I found this issue while working on improving the Windows backend password maximum length of 1280 characters, and wanted to capture it as a separate issue.]
To Reproduce
Steps to reproduce the behavior:
test_read_even_utf8_password
tests\backends\test_Windows.py:73: AssertionError
The problem arises because the apparent success of
test_read_utf8_password
in falling back to UTF-8 based on UTF-16 not working is based on the choice of an odd # of bytes in its test password. We can't, in general, assume that a UTF-8 encoded password will be an odd # of bytes.Expected behavior
If UTF-8 detection (or UTF-16 validation) were possible, I'd expect the test to work the same as the existing
test_read_utf8_password
. However, as the Python package chardet points out in (https://chardet.readthedocs.io/en/latest/faq.html#isnt-that-impossible)which indicates the challenge involved in detecting the encoding simply based on the bytes encountered. Without an explicit indication of the encoding expected, I do not know how to fix this issue.
Environment
Additional context
If we add a Credential Manager Attribute to the existing Credential structure, we could explicitly indicate which encoding
keyring
used for a given password. That won't help with identifying credentials generated by other apps, unfortunately.Another option would be to offer a new API, a variation on
get_password
(e.g.get_password_bytes
) that doesn't attempt to decode the bytes stored by Windows Credential Manager. This would support usingkeyring
to retrieve exogenous credentials, without applying the customary UTF-16 decode to those bytes.The text was updated successfully, but these errors were encountered: