Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Windows backend UTF-8 detection doesn't work #554

Open
jrobbins-LiveData opened this issue Nov 29, 2021 · 2 comments
Open

Windows backend UTF-8 detection doesn't work #554

jrobbins-LiveData opened this issue Nov 29, 2021 · 2 comments

Comments

@jrobbins-LiveData
Copy link

Describe the bug
UTF-8 detection in Windows.py relies on an initial attempt to decode as UTF-16 raising UnicodeDecodeError. The unit test, by using an odd-character length UTF-8 test string, shows the detection apparently working, but substitution of an even-length test string shows it doesn't work.

[I found this issue while working on improving the Windows backend password maximum length of 1280 characters, and wanted to capture it as a separate issue.]

To Reproduce
Steps to reproduce the behavior:

  1. Copy the existing test_read_utf8_password test to a new method, e.g. test_read_even_utf8_password
    def test_read_even_utf8_password(self):
        """
        Write a UTF-8 encoded credential and make sure it can be read back correctly.
        """
        service = "keyring-utf8-test"
        username = "keyring"
        password = "utf8-test0" + UNICODE_CHARS

        self.set_utf8_password(service, username, password)
        assert self.keyring.get_password(service, username) == password
  1. Run the unit tests on Windows
  2. See error
    assert self.keyring.get_password(service, username) == password

E AssertionError: assert '瑵㡦琭獥ぴ雗铗鯗駗ꏗꇗ...胑諑럐뷐냐뫐냐苑뻐돐賑뻐뷐' == 'utf8-test0זה...ръзнакатогьон'
E - utf8-test0זהכיףסתםלשמועאיךתנצחקרפדעץטובבגןξεσκεπάζωτηνψυχοφθόραβδελυγμίαСъешьжеещёэтихмягкихфранцузскихбулокдавыпейчаюЖълтатадюлябешещастливачепухъткойтоцъфназамръзнакатогьон
E + 瑵㡦琭獥ぴ雗铗鯗駗ꏗꇗ꫗鷗鳗꧗黗闗ꋗ郗駗髗꫗ꃗꛗ韗ꟗ꣗ꓗ鏗ꋗꗗ飗闗釗釗鋗鿗뻎뗎菏뫎뗎胏곎뛎觏蓏럎뷎裏藏蟏뿎蛏룎賏臏뇎닎듎뗎믎藏돎볎꿎뇎ꇐ諑뗐裑賑뛐뗐뗐觑金跑苑룐藑볐近돐뫐룐藑蓑胑냐뷐蛑菑럐臑뫐룐藑뇐菑믐뻐뫐듐냐닐译뿐뗐말蟑냐軑雐諑믐苑냐苑냐듐軑믐近뇐뗐裑뗐觑냐臑苑믐룐닐냐蟑뗐뿐菑藑諑苑뫐뻐말苑뻐蛑諑蓑뷐냐럐냐볐胑諑럐뷐냐뫐냐苑뻐돐賑뻐뷐

tests\backends\test_Windows.py:73: AssertionError

The problem arises because the apparent success of test_read_utf8_password in falling back to UTF-8 based on UTF-16 not working is based on the choice of an odd # of bytes in its test password. We can't, in general, assume that a UTF-8 encoded password will be an odd # of bytes.

Expected behavior
If UTF-8 detection (or UTF-16 validation) were possible, I'd expect the test to work the same as the existing test_read_utf8_password. However, as the Python package chardet points out in (https://chardet.readthedocs.io/en/latest/faq.html#isnt-that-impossible)

In other words, encoding detection is really language detection, combined with knowledge of which languages tend to use which character encodings.

which indicates the challenge involved in detecting the encoding simply based on the bytes encountered. Without an explicit indication of the encoding expected, I do not know how to fix this issue.

Environment

  • OS: Windows 10
$ pip list | grep keyring
keyring            23.4.0 
$ keyring --list-backends
keyring.backends.fail.Keyring (priority: 0)
keyring.backends.chainer.ChainerBackend (priority: -1)
keyring.backends.Windows.WinVaultKeyring (priority: 5)

Additional context
If we add a Credential Manager Attribute to the existing Credential structure, we could explicitly indicate which encoding keyring used for a given password. That won't help with identifying credentials generated by other apps, unfortunately.

Another option would be to offer a new API, a variation on get_password (e.g. get_password_bytes) that doesn't attempt to decode the bytes stored by Windows Credential Manager. This would support using keyring to retrieve exogenous credentials, without applying the customary UTF-16 decode to those bytes.

@jaraco
Copy link
Owner

jaraco commented Jan 2, 2022

Nice analysis and thanks for the report.

I like the idea of using an Attribute. I wish the industry would give us a protocol around which we can rally.

Without that, perhaps this project should lead by example.

I'd like to see an exploration of using Attribute to save the encoding, use UTF-8 by default, and fall back to UTF-16 where needed for compatibility.

@jrobbins-LiveData
Copy link
Author

Thanks. Have you had time to take a look at my "take 2" PR? I think it addresses all of your concerns.

It only resorts to UTF-8 when the password wouldn't have fit and would have raised winerror 1783 under the current implementation. Otherwise it works identically to the current implementation. It is allowing us to use poetry and partifact on Windows with no 1783 errors, without interfering with traditional uses.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants