Windows backend UTF-8 detection doesn't work #554

jrobbins-LiveData · 2021-11-29T13:13:33Z

Describe the bug
UTF-8 detection in Windows.py relies on an initial attempt to decode as UTF-16 raising UnicodeDecodeError. The unit test, by using an odd-character length UTF-8 test string, shows the detection apparently working, but substitution of an even-length test string shows it doesn't work.

[I found this issue while working on improving the Windows backend password maximum length of 1280 characters, and wanted to capture it as a separate issue.]

To Reproduce
Steps to reproduce the behavior:

Copy the existing test_read_utf8_password test to a new method, e.g. test_read_even_utf8_password

    def test_read_even_utf8_password(self):
        """
        Write a UTF-8 encoded credential and make sure it can be read back correctly.
        """
        service = "keyring-utf8-test"
        username = "keyring"
        password = "utf8-test0" + UNICODE_CHARS

        self.set_utf8_password(service, username, password)
        assert self.keyring.get_password(service, username) == password

Run the unit tests on Windows
See error

    assert self.keyring.get_password(service, username) == password
E AssertionError: assert '瑵㡦琭獥ぴ雗铗鯗駗ꏗꇗ...胑諑럐뷐냐뫐냐苑뻐돐賑뻐뷐' == 'utf8-test0זה...ръзнакатогьон'
E - utf8-test0זהכיףסתםלשמועאיךתנצחקרפדעץטובבגןξεσκεπάζωτηνψυχοφθόραβδελυγμίαСъешьжеещёэтихмягкихфранцузскихбулокдавыпейчаюЖълтатадюлябешещастливачепухъткойтоцъфназамръзнакатогьон
E + 瑵㡦琭獥ぴ雗铗鯗駗ꏗꇗ꫗鷗鳗꧗黗闗ꋗ郗駗髗꫗ꃗꛗ韗ꟗ꣗ꓗ鏗ꋗꗗ飗闗釗釗鋗鿗뻎뗎菏뫎뗎胏곎뛎觏蓏럎뷎裏藏蟏뿎蛏룎賏臏뇎닎듎뗎믎藏돎볎꿎뇎ꇐ諑뗐裑賑뛐뗐뗐觑金跑苑룐藑볐近돐뫐룐藑蓑胑냐뷐蛑菑럐臑뫐룐藑뇐菑믐뻐뫐듐냐닐译뿐뗐말蟑냐軑雐諑믐苑냐苑냐듐軑믐近뇐뗐裑뗐觑냐臑苑믐룐닐냐蟑뗐뿐菑藑諑苑뫐뻐말苑뻐蛑諑蓑뷐냐럐냐볐胑諑럐뷐냐뫐냐苑뻐돐賑뻐뷐

tests\backends\test_Windows.py:73: AssertionError

The problem arises because the apparent success of test_read_utf8_password in falling back to UTF-8 based on UTF-16 not working is based on the choice of an odd # of bytes in its test password. We can't, in general, assume that a UTF-8 encoded password will be an odd # of bytes.

Expected behavior
If UTF-8 detection (or UTF-16 validation) were possible, I'd expect the test to work the same as the existing test_read_utf8_password. However, as the Python package chardet points out in (https://chardet.readthedocs.io/en/latest/faq.html#isnt-that-impossible)

In other words, encoding detection is really language detection, combined with knowledge of which languages tend to use which character encodings.

which indicates the challenge involved in detecting the encoding simply based on the bytes encountered. Without an explicit indication of the encoding expected, I do not know how to fix this issue.

Environment

OS: Windows 10

$ pip list | grep keyring
keyring            23.4.0 
$ keyring --list-backends
keyring.backends.fail.Keyring (priority: 0)
keyring.backends.chainer.ChainerBackend (priority: -1)
keyring.backends.Windows.WinVaultKeyring (priority: 5)

Additional context
If we add a Credential Manager Attribute to the existing Credential structure, we could explicitly indicate which encoding keyring used for a given password. That won't help with identifying credentials generated by other apps, unfortunately.

Another option would be to offer a new API, a variation on get_password (e.g. get_password_bytes) that doesn't attempt to decode the bytes stored by Windows Credential Manager. This would support using keyring to retrieve exogenous credentials, without applying the customary UTF-16 decode to those bytes.

The text was updated successfully, but these errors were encountered:

jaraco · 2022-01-02T00:07:31Z

Nice analysis and thanks for the report.

I like the idea of using an Attribute. I wish the industry would give us a protocol around which we can rally.

Without that, perhaps this project should lead by example.

I'd like to see an exploration of using Attribute to save the encoding, use UTF-8 by default, and fall back to UTF-16 where needed for compatibility.

jrobbins-LiveData · 2022-01-02T00:20:16Z

Thanks. Have you had time to take a look at my "take 2" PR? I think it addresses all of your concerns.

It only resorts to UTF-8 when the password wouldn't have fit and would have raised winerror 1783 under the current implementation. Otherwise it works identically to the current implementation. It is allowing us to use poetry and partifact on Windows with no 1783 errors, without interfering with traditional uses.

jrobbins-LiveData mentioned this issue Dec 5, 2021

"take-2": new Windows backend, ctypes-based, separate class which shards via Attributes #555

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Windows backend UTF-8 detection doesn't work #554

Windows backend UTF-8 detection doesn't work #554

jrobbins-LiveData commented Nov 29, 2021

jaraco commented Jan 2, 2022

jrobbins-LiveData commented Jan 2, 2022

Windows backend UTF-8 detection doesn't work #554

Windows backend UTF-8 detection doesn't work #554

Comments

jrobbins-LiveData commented Nov 29, 2021

jaraco commented Jan 2, 2022

jrobbins-LiveData commented Jan 2, 2022