Hi, I know that there is ongoing work to remove or reduce the dependency on raw file descriptors in #283 and #311. However, the issue I have is with the existing implementation (reference: 2.0.0b2) and how it works with pipe (os.pipe) and socket (socket.socket) objects in Python, which do expose usable raw file descriptors via fileno().
We can construct a pipe using

import os

pipe_read, pipe_write = os.pipe()
Data can be generated or read in one thread (or process) like
chunk_size = 1024**2  # 1 MiB

with os.fdopen(pipe_write, mode="wb") as write_file:
    while chunk := file_like_source.read(chunk_size):
        write_file.write(chunk)
and be consumed in another thread like
with os.fdopen(pipe_read, mode="rb") as read_file:
    for item in MyStruct.read_multiple(read_file):
        do_something(item)
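Wired together, a minimal sketch of the failing threaded setup might look like this (file_like_source, MyStruct, and do_something are the placeholders from the snippets above):

import os
import threading

pipe_read, pipe_write = os.pipe()
chunk_size = 1024**2  # 1 MiB

def produce():
    # Writer end: copy the source into the pipe chunk by chunk.
    with os.fdopen(pipe_write, mode="wb") as write_file:
        while chunk := file_like_source.read(chunk_size):
            write_file.write(chunk)

producer = threading.Thread(target=produce)
producer.start()

# Reader end: blocks as soon as chunk_size is smaller than one struct item.
with os.fdopen(pipe_read, mode="rb") as read_file:
    for item in MyStruct.read_multiple(read_file):
        do_something(item)

producer.join()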
What I observe is that both MyStruct.read_multiple(read_file) and write_file.write(chunk) block if chunk_size is smaller than the serialized struct item. I hypothesize that this has to do with how the reader peeks into the data, which is in fact a stream, without actually consuming it, but I don't know.
Strangely, if a process outside Python generates the stream via process = subprocess.Popen() and writes it to a pipe via standard output using stdout=subprocess.PIPE, read_multiple() can read it without issues from process.stdout.
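For comparison, the working subprocess variant might look like this (the external command is a hypothetical placeholder):

import subprocess

# Hypothetical external command that writes serialized structs to stdout.
process = subprocess.Popen(["my_generator"], stdout=subprocess.PIPE)

# Reading from process.stdout works even when the writer produces small chunks.
for item in MyStruct.read_multiple(process.stdout):
    do_something(item)

process.wait()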
Maybe someone has an idea why this happens and how it could be circumvented or fixed? Happy to hear your thoughts.
My first guess would be that your second snippet is somehow blocking in the C++ code, while not relinquishing the GIL. That would prevent the other thread from continuing, creating a deadlock. However, the following snippet shows that the GIL is being released since #308:
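A sketch of the kind of check that demonstrates this, with MyStruct standing in for the real type: start a blocking read on an empty pipe in one thread and confirm the main thread still makes progress.

import os
import threading
import time

pipe_read, pipe_write = os.pipe()

def blocking_reader():
    # Blocks inside the C++ reader, waiting for data that never arrives.
    with os.fdopen(pipe_read, mode="rb") as read_file:
        for _ in MyStruct.read_multiple(read_file):
            pass

threading.Thread(target=blocking_reader, daemon=True).start()

# If the blocking read held the GIL, this loop would hang as well.
for i in range(3):
    time.sleep(0.1)
    print("main thread still alive:", i)

os.close(pipe_write)  # EOF lets the reader thread finish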
I finally figured out how to make this approach work using processes, not threads!
Most importantly, for the pipe version, the buffer on the write side has to be set to 0 using os.fdopen(pipe_write, mode="wb", buffering=0):
chunk_size = 1024**2  # 1 MiB

with os.fdopen(pipe_write, mode="wb", buffering=0) as write_file:
    while chunk := file_like_source.read(chunk_size):
        write_file.write(chunk)
The socket version using socket.socket seems to work with buffering enabled on both the sender and the receiver side.
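A sketch of that socket variant, assuming the fork start method on a Unix-like platform so the child inherits the socket:

import multiprocessing
import socket

sock_read, sock_write = socket.socketpair()

def produce(sock):
    # Buffered writes are fine here, unlike the unbuffered pipe above.
    with sock.makefile(mode="wb") as write_file:
        while chunk := file_like_source.read(chunk_size):
            write_file.write(chunk)
    sock.close()

producer = multiprocessing.Process(target=produce, args=(sock_write,))
producer.start()
sock_write.close()  # the parent no longer needs the write end

with sock_read.makefile(mode="rb") as read_file:
    for item in MyStruct.read_multiple(read_file):
        do_something(item)

producer.join()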
There are also pitfalls when using a process, in particular with os.pipe(): depending on the start method and platform (fork, spawn), you may have to transfer the file descriptors into the child process, or make sure to close the unused file descriptors in both the parent and the child process; otherwise the reader will wait for EOF and never terminate.
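For the pipe version, that fd hygiene might look as follows, again assuming the fork start method (with spawn, the descriptors would additionally have to be made inheritable and passed explicitly):

import multiprocessing
import os

pipe_read, pipe_write = os.pipe()

def produce():
    os.close(pipe_read)  # the child only writes; drop its inherited read end
    with os.fdopen(pipe_write, mode="wb", buffering=0) as write_file:
        while chunk := file_like_source.read(chunk_size):
            write_file.write(chunk)

producer = multiprocessing.Process(target=produce)
producer.start()
os.close(pipe_write)  # parent: close the write end, or EOF never arrives

with os.fdopen(pipe_read, mode="rb") as read_file:
    for item in MyStruct.read_multiple(read_file):
        do_something(item)

producer.join()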
However, when I run the exact same code using threading.Thread() instead of multiprocessing.Process, the data channel still blocks. For me, this is a strong indication that there is an issue involving the GIL. In my case, the parent is actually a thread spawned from the main process; could that also be an issue?