Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inefficient ssl.SSLWantReadError exception slows down very common use-case #123954

Open
tarasko opened this issue Sep 11, 2024 · 12 comments
Open
Assignees
Labels
extension-modules C modules in the Modules dir performance Performance or resource usage topic-SSL type-bug An unexpected behavior, bug, or error

Comments

@tarasko
Copy link

tarasko commented Sep 11, 2024

Bug report

Bug description:

Event loops like uvloop, asyncio use nonblocking ssl. They typically

  1. read data from the socket when epoll returns that it is ready
  2. push data to the incoming MemoryBIO
  3. read from SSLObject until SSLWantReadError is thrown
  4. pass read data to the application protocol

when peers are exchanging relatively small messages, SSLObject.read is typically called 2 times . First call returns data, second - throws SSLWantReadError

perf shows that the second call is almost as expensive as the first call because of time spent on constructing new exception object.

Is it possible to optimize exception object creation for the second call?

I tried to avoid the second call by analyzing MemoryBIO.pending and SSLObject.pending values but they can't always reliably tell that we have to wait for more data.

For example, it is possible that incoming MemoryBIO.pending > 0, SSLObject.pending == 0. We call SSLObject.read and it throws because incoming MemoryBIO doesn't have the full ssl frame yet.

Example echo client that replicates internal logic in asyncio/uvloop:

import socket
import ssl
import select
from typing import Optional

ssl_context = ssl.create_default_context()
ssl_context.check_hostname = False
ssl_context.verify_mode = ssl.CERT_NONE

ep = select.epoll(2)

incoming = ssl.MemoryBIO()
outgoing = ssl.MemoryBIO()

sock: Optional[socket.socket] = None
ssl_sock: Optional[ssl.SSLObject] = None


def wait_data():
    ep.poll()

    try:
        while True:
            chunk = sock.recv(1024)
            incoming.write(chunk)
    except BlockingIOError:
        pass


def wait_data_until_ssl_read_succeed():
    data = bytearray()
    try:
        wait_data()
        # while ssl_sock.pending() > 0 or incoming.pending > 0:
        while True:
            data += ssl_sock.read()
    except ssl.SSLWantReadError:
        pass

    return data


with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
    ep.register(sock.fileno(), select.EPOLLIN)

    ssl_sock = ssl_context.wrap_bio(incoming, outgoing, server_hostname='localhost')

    sock.connect(('127.0.0.1', 25000))
    sock.setblocking(False)

    handshake_complete = False
    message_sent = False

    msg = b"a" * 256

    # do handshake
    while True:
        try:
            ssl_sock.do_handshake()
            break
        except ssl.SSLWantReadError as ex:
            if outgoing.pending > 0:
                chunk = outgoing.read(outgoing.pending)
                sock.send(chunk)
            wait_data()

    # send message and wait for reply
    while True:
        ssl_sock.write(msg)
        chunk = outgoing.read(outgoing.pending)
        sock.send(chunk)

        data = wait_data_until_ssl_read_succeed()
        # print(data)

Perf output:

   17.41%     0.25%            43  python   _ssl.cpython-314-x86_64-linux-gnu.so     [.] _ssl__SSLSocket_read
            |          
             --17.16%--_ssl__SSLSocket_read
                       |          
                       |--8.09%--SSL_read_ex
                       |          |          
                       |           --7.74%--0x7b1fbef883f9
                       |                     |          
                       |                     |--4.83%--0x7b1fbefadc22
                       |                     |          |          
                       |                     |          |--0.93%--0x7b1fbefa6919
                       |                     |          |          |          
                       |                     |          |           --0.90%--EVP_DecryptUpdate
                       |                     |          |                     |          
                       |                     |          |                      --0.90%--0x7b1fbec90c8b
                       |                     |          |                                |          
                       |                     |          |                                 --0.87%--0x7b1fbec90b45
                       |                     |          |          
                       |                     |          |--0.79%--0x7b1fbefa64a5
                       |                     |          |          |          
                       |                     |          |           --0.61%--EVP_CIPHER_CTX_get_iv_length
                       |                     |          |          
                       |                     |          |--0.72%--0x7b1fbefa6721
                       |                     |          |          |          
                       |                     |          |           --0.60%--EVP_CipherInit_ex
                       |                     |          |          
                       |                     |          |--0.57%--0x7b1fbefa6750
                       |                     |          |          
                       |                     |           --0.52%--0x7b1fbefa68f0
                       |                     |          
                       |                     |--1.13%--0x7b1fbefadf94
                       |                     |          |          
                       |                     |           --0.52%--0x7b1fbefac3c7
                       |                     |          
                       |                      --0.68%--0x7b1fbefad750
                       |          
                       |--6.21%--PySSL_SetError.constprop.0
                       |          |          
                       |           --5.17%--fill_and_set_sslerror
                       |                     |          
                       |                     |--2.83%--PyUnicode_FromFormat
                       |                     |          |          
                       |                     |           --2.54%--unicode_from_format
                       |                     |                     |          
                       |                     |                      --1.23%--__sprintf_chk
                       |                     |                                __vsprintf_internal
                       |                     |                                |          
                       |                     |                                 --0.99%--__vfprintf_internal
                       |                     |          
                       |                      --1.01%--PyObject_SetAttr
                       |                                |          
                       |                                 --0.98%--PyObject_GenericSetAttr
                       |                                           |          
                       |                                            --0.52%--_PyObjectDict_SetItem
                       |          
                        --0.60%--SSL_get_error

To reproduce you would need some ssl echo server running on localhost 25000 port. After you have started it, run echo client code under perf.

$ perf record -F 999 -g --call-graph lbr --user-callchains -- python echo_client.py
$ perf report -G -n --stdio

Let it work for 15 seconds and then press Ctrl-C

CPython versions tested on:

CPython main branch

Operating systems tested on:

Linux

Linked PRs

@NoneTypeCoder
Copy link

NoneTypeCoder commented Oct 22, 2024

How do you get this perf result? I want to learn about this, how to do perf for my code, thanks for your answer.

@tarasko
Copy link
Author

tarasko commented Oct 22, 2024

How do you get this perf result? I want to learn about this, how to do perf for my code, thanks for your answer.

I have updated description with instruction on how to run perf

@picnixz
Copy link
Member

picnixz commented Dec 26, 2024

We cannot really do much with SSL_read_ex because the latter is provided by OpenSSL. However, I'll try to improve the CPython part.

@picnixz picnixz self-assigned this Dec 26, 2024
@picnixz
Copy link
Member

picnixz commented Dec 26, 2024

After investigation, it appears it would become quite difficult to optimize the exception construction unless we optimize something for exceptions in general. The problem is as follows:

  • Multiple attributes must be set on the new objects and there are redundant checks which, unless we refactor the module quite a lot, would be hard to add.
  • Most of the time spent for the construction is split between SSL_read_ex and fill_and_set_sslerror. We cannot do anything with SSL_read_ex and we can only try improving calls to PyUnicode_FromFormat and possibly calls for setting attributes. However, changing them to a plain UnicodeWriter does not give much better performances. I havent' really thought much about it but unless someone is willing to optimize the construction of error messages and/or exceptions (we could have dedicated attributes and creating dedicated exceptions inheriting from OSError, maybe this could help us but I'm not really sure we want to do it for now), I'm afraid this issue will be hard to improve =/

Some questions:

  • how much time does it effectively spend for creating the exception?
  • is it possible to protect against that behaviour by waiting for the full SSL frame or would it be against the definition of a non-blocking SSL socket?
  • maybe the issue is somewhere else, namely, maybe we raise the exception too early?

cc @gpshead for another SSL expert.

@picnixz picnixz removed their assignment Dec 26, 2024
@tarasko
Copy link
Author

tarasko commented Dec 26, 2024

I had a couple of thoughts on how to optimize this:

  1. Would it be possible to raise the same exception object and pre-create them during module initialization?

  2. If not, then a half-measure would be to not format anything at all for SSLWantRead/SSLWantWrite error. Use preallocated string or None. I doubt anyone ever would read the message of SSLWantRead exception. This will at least eliminate this part
    2.83%--PyUnicode_FromFormat

It is calling PyUnicode_FromFormat in order to put file, line number into the error message, which doesn't have much practical benefits especially for SSLWantRead/SSLWantWrite

@picnixz
Copy link
Member

picnixz commented Dec 26, 2024

Would it be possible to raise the same exception object and pre-create them during module initialization?

No, because the objects should be freshly created. In addition, the exception might come from various places in ssl.c, and the message it contains depends on the underlying libssl (and what needs to be cached has mostly been cached I think).

If not, then a half-measure would be to not format anything at all for SSLWantRead/SSLWantWrite error. Use preallocated string or None. I doubt anyone ever would read the message of SSLWantRead exception. This will at least eliminate this part

Unfortunately, we cannot say that anyone would ever read the message. People that want to debug issues may want to read it (especially since it contains SSL-based dedicated messages, with the error's reason and library). This is something we should include in the message. It wouldn't be very nice to ask users to log the SSL error attributes in case of a failure (note that SSL exceptions inherit from OSError and those also include errnos & co, so we probably need to format them in a similar manner).

It is calling PyUnicode_FromFormat in order to put file, line number into the error message, which doesn't have much practical benefits especially for SSLWantRead/SSLWantWrite

see above.

@picnixz
Copy link
Member

picnixz commented Dec 26, 2024

Now, if you want to simplify the error messages for some SSL exceptions, I'd first suggest gathering some support on DPO: https://discuss.python.org/c/ideas/6. Depending on the result, we might proceed like that, but I'm not sure the community that is interested in those messages is actually present on DPO (so we would perhaps be asking those that don't need it)

@tarasko
Copy link
Author

tarasko commented Dec 26, 2024

is it possible to protect against that behaviour by waiting for the full SSL frame or would it be against the definition of a non-blocking SSL socket?

It is possible sometimes to predict SSLWantReadError by checking for
MemoryBIO.pending == 0 and SSLObject.pending() == 0

I proposed such change for uvloop MagicStack/uvloop#629

In my particular case it works very well, my client and server exchange small messages. I have literally 0 SSLWantReadError and those checks are very fast comparing to SSLWantReadError construction.

If I replace while True with while ssl_sock.pending() > 0 or incoming.pending > 0 in wait_data_until_ssl_read_succeed
then perf output looks like this:

    10.07%     0.16%            27  python   _ssl.cpython-310-x86_64-linux-gnu.so      [.] _ssl__SSLSocket_read
            |          
             --9.91%--_ssl__SSLSocket_read
                       |          
                        --8.88%--SSL_read_ex
                                  |          
                                   --8.56%--ssl3_read_internal
                                             |          
                                              --8.49%--ssl3_read_bytes
                                                        |          
                                                        |--7.20%--ssl3_get_record
                                                        |          |          
                                                        |          |--5.29%--tls13_enc
                                                        |          |          |          
                                                        |          |          |--1.46%--EVP_DecryptUpdate
                                                        |          |          |          |          
                                                        |          |          |           --1.45%--ossl_gcm_stream_update
                                                        |          |          |                     |          
                                                        |          |          |                      --1.41%--gcm_cipher_internal
                                                        |          |          |                                |          
                                                        |          |          |                                 --1.04%--generic_aes_gcm_cipher_update
                                                        |          |          |          
                                                        |          |          |--1.31%--EVP_CipherInit_ex
                                                        |          |          |          |          
                                                        |          |          |           --1.04%--evp_cipher_init_internal
                                                        |          |          |                     |          
                                                        |          |          |                      --0.80%--EVP_CIPHER_CTX_get_iv_length
                                                        |          |          |          
                                                        |          |           --0.53%--EVP_CIPHER_CTX_ctrl
                                                        |          |          
                                                        |          |--0.63%--ssl3_read_n
                                                        |          |          
                                                        |           --0.58%--ERR_set_mark
                                                        |          
                                                         --0.59%--ssl3_setup_read_buffer


     0.94%     0.21%            37  python   _ssl.cpython-310-x86_64-linux-gnu.so      [.] _ssl__SSLSocket_pending
            |          
             --0.72%--_ssl__SSLSocket_pending

@gpshead
Copy link
Member

gpshead commented Dec 26, 2024

I think a workaround like that for the common case is the most useful strategy today. (no informed opinion from me on if that is the right logic or not, but the idea is good)

Some PyUnicode_FromFormat calls from fill_and_set_sslerror could technically be cached @functools.lrucache style - though that could be a pain to implement. That'd give an answer to the "Would the cache lookup cost be meaningfully cheaper?" question. Probably overkill. Per your profile that'd save at most 40% on this exception path. Per the C code, some of the things being rendered into strings are clearly always constants and could be pre-computed but in C the compiler has no way of knowing that through this API stack and the problem is presumably more that we're dynamically creating strings in the first place, not the total number of things we put in them.

I do think a fast path for the WantRead/Write errors that raise a pre-created singleton with .library and .reason already set to None on the class is viable. If there is some reason a singleton exception instance can't be used, a specific subclass that skips a lot of attribute setting via pre-set class attributes could perhaps still help. These exceptions are a very specific purpose that should be obvious based on the call they are raised from. There is no practical need for extra information on their instances.

We could possibly already avoid the PyObject_SetAttr calls today already by setting those to None on the base class itself and not calling SetAttr when the value would be None.

@gpshead
Copy link
Member

gpshead commented Dec 26, 2024

ie: this exception creating path was clearly designed with informative error messages in mind, not rapid control flow alternate path logic.

@picnixz
Copy link
Member

picnixz commented Dec 26, 2024

I converted the code to use the private UnicodeWriter with dedicated pre-allocation (this can be precomputed so that we don't over-allocate strings) and skip snprintf calls, but it wasn't really convincing.

What I wanted to however do is change the exception type itself and make it a full-fledged heap type inheriting from OSError but the refactoring would be better left to an improved API for creating exception types with pre-defined attributes (probably a clinic thing IMO, or something that could be used elsewhere) so I didn't really go that path.

I can continue tomorrow investigating this.

@kumaraditya303
Copy link
Contributor

I would be interested to know how much difference this makes on asyncio_tcp_ssl benchmark. https://github.com/python/pyperformance/blob/main/pyperformance/data-files/benchmarks/bm_asyncio_tcp/run_benchmark.py

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
extension-modules C modules in the Modules dir performance Performance or resource usage topic-SSL type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

5 participants