Mention header name error message for invalid header encodings #3400

Open
RobertCraigie opened this issue Nov 12, 2024 Discussed in #3399 · 3 comments


@RobertCraigie
Contributor

Discussed in #3399

Originally posted by RobertCraigie November 12, 2024
This openai-python user ran into a confusing error when passing a non-ASCII header value. Would it be possible to mention the header name in the error message?

Minimal repro

```python
import httpx

httpx.Headers({"auth": "здравейздравейздравейздравей"})
```

```
Traceback (most recent call last):
  File "script.py", line 3, in <module>
    httpx.Headers({"auth": "здравейздравейздравейздравей"})
  File ".venv/lib/python3.9/site-packages/httpx/_models.py", line 74, in __init__
    self._list = [
  File ".venv/lib/python3.9/site-packages/httpx/_models.py", line 78, in <listcomp>
    normalize_header_value(v, encoding),
  File ".venv/lib/python3.9/site-packages/httpx/_utils.py", line 53, in normalize_header_value
    return value.encode(encoding or "ascii")
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-27: ordinal not in range(128)
```
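For illustration, here is a minimal sketch of what the requested behaviour could look like. It is not httpx's actual code: `encode_header_value` is a hypothetical helper, and the only thing taken from the traceback above is the `value.encode(encoding or "ascii")` call. The assumption is that the caller knows the header name at the point of encoding.

```python
from typing import Optional


def encode_header_value(name: str, value: str, encoding: Optional[str] = None) -> bytes:
    """Encode a header value, naming the offending header on failure.

    Hypothetical helper for illustration only; not part of httpx.
    """
    codec = encoding or "ascii"
    try:
        return value.encode(codec)
    except UnicodeEncodeError as exc:
        # Report the header name and the character positions, but not the
        # value itself, since it may be a secret such as an auth token.
        raise ValueError(
            f"Header {name!r} has a value that cannot be encoded as {codec!r} "
            f"(characters at positions {exc.start}-{exc.end - 1})"
        ) from exc
```

With the repro value, this would raise a `ValueError` that names `'auth'` instead of a bare `UnicodeEncodeError`, while the original exception stays available through `__cause__`.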
@tomchristie
Member

Would you be able to review what range of characters the h11 package uses for valid HTTP headers?
(Because it's correct, and because it's what we use for the default underlying transport so we may as well be consistent at the higher level abstraction.)

@jasonkaedingrhino

> Would you be able to review what range of characters the h11 package uses for valid HTTP headers?
> (Because it's correct, and because it's what we use for the default underlying transport so we may as well be consistent at the higher level abstraction.)

From https://github.com/python-hyper/h11/blob/master/h11/_headers.py

# Facts
# -----
#
# Headers are:
#   keys: case-insensitive ascii
#   values: mixture of ascii and raw bytes
#
# "Historically, HTTP has allowed field content with text in the ISO-8859-1
# charset [ISO-8859-1], supporting other charsets only through use of
# [RFC2047] encoding.  In practice, most HTTP header field values use only a
# subset of the US-ASCII charset [USASCII]. Newly defined header fields SHOULD
# limit their field values to US-ASCII octets.  A recipient SHOULD treat other
# octets in field content (obs-text) as opaque data."
# And it deprecates all non-ascii values

So h11 is essentially quoting the HTTP/1.1 spec directly, and thus the choice of ASCII encoding makes sense.
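As a rough sketch of the distinction the quoted passage draws, the snippet below sorts a header value into three buckets: plain US-ASCII, text that only fits the historical ISO-8859-1 allowance (obs-text), and text that fits neither. `classify_header_value` and its return labels are made up for illustration; this is not how h11 or httpx validate headers.

```python
def classify_header_value(value: str) -> str:
    """Classify a header value by the narrowest charset that can encode it.

    Hypothetical helper; the labels are illustrative, not from any library.
    """
    try:
        value.encode("ascii")
        return "ascii"
    except UnicodeEncodeError:
        pass
    try:
        # ISO-8859-1 covers the "obs-text" octets the spec mentions.
        value.encode("iso-8859-1")
        return "obs-text"
    except UnicodeEncodeError:
        return "unencodable"
```

Under this sketch the Cyrillic value from the repro falls into the last bucket, so even a latin-1 fallback would not have accepted it.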

@jasonkaedingrhino

jasonkaedingrhino commented Nov 25, 2024

In the main, these situations arise with authentication headers, whose values are often obtained through some sort of "secret management" process that involves encryption/decryption and/or base64 encoding/decoding before the values are injected into actual code. This leaves the door open for upstream human errors to propagate down to this level without being "obvious", because the whole pipeline is opaque.

While the example above is contrived, using the Cyrillic alphabet, the real source of the error was more like a bad copy/paste of the correct value.
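To make the copy/paste case concrete, a caller could run a small pre-flight check on a secret before handing it to the HTTP client. The sketch below is an assumption about how one might do that: `find_non_ascii` is a hypothetical helper, and the token is invented. It reports where the stray characters are without printing the secret itself.

```python
from typing import List, Tuple


def find_non_ascii(value: str) -> List[Tuple[int, str]]:
    """Return (index, code point) pairs for every non-ASCII character."""
    return [(i, f"U+{ord(ch):04X}") for i, ch in enumerate(value) if ord(ch) > 127]


# A made-up token with a non-breaking space pasted into it.
token = "Bearer abc\u00a0def"
for index, code_point in find_non_ascii(token):
    print(f"non-ASCII character {code_point} at index {index}")
```

A non-breaking space or a curly quote is invisible or near-invisible in most editors, which is why the position and code point are more useful here than the raw value.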
