Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unexpected connection_lost() call when cancelling a direct_tls connection with an active request can lead to a connection leak #1211

Open
jraby opened this issue Nov 15, 2024 · 0 comments

Comments

@jraby
Copy link

jraby commented Nov 15, 2024

  • asyncpg version: 0.30.0
  • PostgreSQL version: 16
  • Do you use a PostgreSQL SaaS? If so, which? Can you reproduce
    the issue with a local PostgreSQL install?
    : CloudSQL with cloud sql connector, but it is possible to reproduce it locally with a direct_tls setup.
  • Python version: 3.12
  • Platform: linux / macos
  • Do you use pgbouncer?: no
  • Did you install asyncpg with pip?: poetry
  • If you built asyncpg locally, which version of Cython did you use?:
  • Can the issue be reproduced under both asyncio and
    uvloop?
    : the reproducer uses asyncio, but I hit the problem "in prod" with uvicorn / uvloop

Hi, I reported the issue first to sqlalchemy (sqlalchemy/sqlalchemy#12099) but managed to reproduce it using only asyncpg.

From what I could observe, if a direct_tls connection is waiting for postgres to reply and the task within which this connection lives is cancelled, calling connection.close() in an exception handler will fail with unexpected connection_lost() call, leaving the connection open. (a call to connection.terminate() after that doesn't close the connection either)

The problem only occurs with direct_tls=True.
The issue was first observed with sqlalchemy with a connection pool of asyncpg connections.
SA thought the connections were closed and would then open new ones, this lead to using up all the available slots on the pg side.

This script can reproduce the issue, however, it needs a postgres setup where direct_tls can be used.
There a docker-compose file at https://github.com/brittlesoft/repro-starlette-sa-conn-leak that can be used to get a working setup quickly.

import asyncio
import asyncpg

import logging
logging.basicConfig(level=logging.DEBUG)

async def do(i):
    try:
        # connect using direct_tls leads to `unexpected connection_lost() call` when calling conn.close()
        conn = await asyncpg.connect('postgresql://postgres:postgres@localhost:5443/postgres', direct_tls=True)

        # connect using default params works fine
        #conn = await asyncpg.connect('postgresql://postgres:postgres@localhost:5432/postgres')

        # lock and simulate work (or block if lock already taken) -- using select pg_sleep(10) would also work
        await conn.execute("select pg_advisory_lock(1234)")
        await asyncio.sleep(10)

    except BaseException as e:
        print(i,"got exc:", e, type(e))
        try:
            await conn.close(timeout=2)
        except BaseException as e:
            print(i, "close got exc: ",e)

            # NOTE: aborting transport here seems to release the connection
            #conn._transport.abort()

            try:
                print(i, "calling terminate")
                conn.terminate()
            except BaseException as e:
                print(i, "terminate got exc: ",e)


async def main():
    ts = []
    for i in range(10):
        ts.append(asyncio.create_task(do(i)))

    async def timeouter():
        await asyncio.sleep(1)
        for t in ts:
            t.cancel()

    asyncio.create_task(timeouter())

    try:
        await asyncio.gather(*ts)
    except asyncio.CancelledError:
        print("cancelled")

    # Sleep so we can observe the state of the connections
    await asyncio.sleep(30)

if __name__ == '__main__':
    asyncio.run(main())

output:

DEBUG:asyncio:Using selector: KqueueSelector
0 got exc:  <class 'asyncio.exceptions.CancelledError'>
1 got exc:  <class 'asyncio.exceptions.CancelledError'>
2 got exc:  <class 'asyncio.exceptions.CancelledError'>
3 got exc:  <class 'asyncio.exceptions.CancelledError'>
4 got exc:  <class 'asyncio.exceptions.CancelledError'>
5 got exc:  <class 'asyncio.exceptions.CancelledError'>
6 got exc:  <class 'asyncio.exceptions.CancelledError'>
7 got exc:  <class 'asyncio.exceptions.CancelledError'>
8 got exc:  <class 'asyncio.exceptions.CancelledError'>
9 got exc:  <class 'asyncio.exceptions.CancelledError'>
1 close got exc:  unexpected connection_lost() call
1 calling terminate
7 close got exc:  unexpected connection_lost() call
7 calling terminate
0 close got exc:  unexpected connection_lost() call
0 calling terminate
4 close got exc:  unexpected connection_lost() call
4 calling terminate
5 close got exc:  unexpected connection_lost() call
5 calling terminate
3 close got exc:  unexpected connection_lost() call
3 calling terminate
9 close got exc:  unexpected connection_lost() call
9 calling terminate
6 close got exc:  unexpected connection_lost() call
6 calling terminate
8 close got exc:  unexpected connection_lost() call
8 calling terminate

And it postgres we see this:

 2024-11-15 13:29:02.164078+00 | 2024-11-15 13:29:02.164078+00 | 2024-11-15 13:29:02.164078+00 | Lock            | advisory            | active |             |         1584 |          | select pg_advisory_lock(1234)    | client backend
 2024-11-15 13:29:02.188851+00 | 2024-11-15 13:29:02.188851+00 | 2024-11-15 13:29:02.188852+00 | Lock            | advisory            | active |             |         1584 |          | select pg_advisory_lock(1234)    | client backend
                               | 2024-11-15 13:29:02.163992+00 | 2024-11-15 13:29:03.04934+00  | Client          | ClientRead          | idle   |             |              |          | select pg_advisory_lock(1234)    | client backend
 2024-11-15 13:29:02.163992+00 | 2024-11-15 13:29:02.163992+00 | 2024-11-15 13:29:02.163994+00 | Lock            | advisory            | active |             |         1584 |          | select pg_advisory_lock(1234)    | client backend
 2024-11-15 13:29:02.189558+00 | 2024-11-15 13:29:02.189558+00 | 2024-11-15 13:29:02.189558+00 | Lock            | advisory            | active |             |         1584 |          | select pg_advisory_lock(1234)    | client backend
 2024-11-15 13:29:02.188884+00 | 2024-11-15 13:29:02.188884+00 | 2024-11-15 13:29:02.188885+00 | Lock            | advisory            | active |             |         1584 |          | select pg_advisory_lock(1234)    | client backend
 2024-11-15 13:29:02.188826+00 | 2024-11-15 13:29:02.188826+00 | 2024-11-15 13:29:02.188829+00 | Lock            | advisory            | active |             |         1584 |          | select pg_advisory_lock(1234)    | client backend
 2024-11-15 13:29:02.189478+00 | 2024-11-15 13:29:02.189478+00 | 2024-11-15 13:29:02.189479+00 | Lock            | advisory            | active |             |         1584 |          | select pg_advisory_lock(1234)    | client backend
 2024-11-15 13:29:02.19116+00  | 2024-11-15 13:29:02.19116+00  | 2024-11-15 13:29:02.191161+00 | Lock            | advisory            | active |             |         1584 |          | select pg_advisory_lock(1234)    | client backend
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant