Add support for async generator finalization #1564

Merged
merged 23 commits into from
Aug 5, 2020
Changes from 21 commits
Commits
26f1aea
Async generator hooks, simpler approach
oremanj Jun 10, 2020
ac3e46d
blacken
oremanj Jun 10, 2020
2947de7
Respond to review comments + add more tests
oremanj Jun 25, 2020
d198ed2
blacken
oremanj Jun 25, 2020
b720303
Merge remote-tracking branch 'origin/master' into asyncgenhooks-basic
oremanj Jun 25, 2020
45c2eee
flake8
oremanj Jun 25, 2020
2d81bb0
Fix mismerge with master
oremanj Jun 25, 2020
5a37f78
Work correctly in -Werror mode too
oremanj Jun 25, 2020
3acf34b
Handle asyncgens correctly when Trio is the guest of an asyncio host
oremanj Jun 25, 2020
80e8ab3
Fix 3.6
oremanj Jun 25, 2020
cfdb850
Make tests pass on pypy 7.2 which doesn't run firstiter hooks
oremanj Jun 25, 2020
153ce13
Hopefully resolve coverage issues
oremanj Jun 25, 2020
ebaf69c
blacken
oremanj Jun 25, 2020
5b2c544
Merge remote-tracking branch 'origin/master' into asyncgenhooks-basic
oremanj Jun 25, 2020
936ccdb
Add docs and newsfragment
oremanj Jun 25, 2020
7d0fcd9
Fix formatting
oremanj Jun 25, 2020
86a8b7d
Merge remote-tracking branch 'origin/master' into asyncgenhooks-basic
oremanj Jul 7, 2020
4776728
Fix flake8
oremanj Jul 7, 2020
5574b9e
Merge remote-tracking branch 'origin/master' into asyncgenhooks-basic
oremanj Jul 24, 2020
9868f25
Respond to review comments; split asyncgens logic into a separate file
oremanj Jul 24, 2020
2af6974
Fix mypy
oremanj Jul 24, 2020
33d168e
Merge remote-tracking branch 'origin/master' into asyncgenhooks-basic
oremanj Aug 5, 2020
0441952
Review responses
oremanj Aug 5, 2020
Empty file modified docs/source/conf.py
100644 → 100755
Empty file.
176 changes: 176 additions & 0 deletions docs/source/reference-core.rst
@@ -1465,6 +1465,182 @@ don't have any special access to Trio's internals.)
:members:


.. _async-generators:

Notes on async generators
-------------------------

Python 3.6 added support for *async generators*, which can use
``await``, ``async for``, and ``async with`` in between their ``yield``
statements. As you might expect, you use ``async for`` to iterate
over them. :pep:`525` has many more details if you want them.

For example, the following is a roundabout way to print
the numbers 0 through 9 with a 1-second delay before each one::

    async def range_slowly(*args):
        """Like range(), but adds a 1-second sleep before each value."""
        for value in range(*args):
            await trio.sleep(1)
            yield value

    async def use_it():
        async for value in range_slowly(10):
            print(value)

    trio.run(use_it)

Trio supports async generators, with some caveats described in this section.

Finalization
~~~~~~~~~~~~

If you iterate over an async generator in its entirety, like the
example above does, then the execution of the async generator will
occur completely in the context of the code that's iterating over it,
and there aren't too many surprises.

If you abandon a partially-completed async generator, though, such as
by ``break``\ing out of the iteration, things aren't so simple. The
async generator iterator object is still alive, waiting for you to
resume iterating it so it can produce more values. At some point,
Python will realize that you've dropped all references to the
iterator, and will call on Trio to throw in a `GeneratorExit` exception
so that any remaining cleanup code inside the generator has a chance
to run: ``finally`` blocks, ``__aexit__`` handlers, and so on.

So far, so good. Unfortunately, Python provides no guarantees about
*when* this happens. It could be as soon as you break out of the
``async for`` loop, or an arbitrary amount of time later. It could
even be after the entire Trio run has finished! Just about the only
guarantee is that it *won't* happen in the task that was using the
generator. That task will continue on with whatever else it's doing,
and the async generator cleanup will happen "sometime later,
somewhere else": potentially with different context variables,
not subject to timeouts, and/or after any nurseries you're using have
been closed.
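
The hand-off from Python to the event loop described above goes through ``sys.set_asyncgen_hooks()``. The following is a minimal sketch of that mechanism only, not Trio's implementation: the hook bodies, the ``events`` list, and the ``step()`` helper are illustrative assumptions, and the coroutine is driven by hand so that no event loop is needed.

```python
import gc
import sys

events = []


def firstiter(agen):
    # Called by the interpreter the first time the generator is iterated.
    events.append(("firstiter", agen.__name__))


def finalizer(agen):
    # Called by the interpreter when an unfinished generator is about to be
    # garbage collected. A real event loop would arrange for agen.aclose()
    # to be awaited somewhere; this sketch only records the call.
    events.append(("finalizer", agen.__name__))


async def numbers():
    try:
        yield 1
        yield 2
    finally:
        events.append(("cleanup", "numbers"))


def step(awaitable):
    """Drive an awaitable that never suspends and return its result."""
    try:
        awaitable.send(None)
    except StopIteration as exc:
        return exc.value
    raise RuntimeError("awaitable unexpectedly suspended")


old_hooks = sys.get_asyncgen_hooks()
sys.set_asyncgen_hooks(firstiter=firstiter, finalizer=finalizer)
try:
    agen = numbers()
    first = step(agen.__anext__())  # first iteration: firstiter hook fires
    del agen                        # abandon the partially consumed generator
    gc.collect()                    # no-op on CPython; helps non-refcounting runtimes
finally:
    sys.set_asyncgen_hooks(*old_hooks)
```

Because the finalizer in this sketch never closes the generator, its ``finally`` block does not run. Once a finalizer hook is installed, arranging the actual cleanup is entirely the hook's job, which is why the timing and context of that cleanup are up to the event loop.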

If you don't like that ambiguity, and you want to ensure that a
generator's ``finally`` blocks and ``__aexit__`` handlers execute as
soon as you're done using it, then you'll need to wrap your use of the
generator in something like `async_generator.aclosing()
<https://async-generator.readthedocs.io/en/latest/reference.html#context-managers>`__::

    # Instead of this:
    async for value in my_generator():
        if value == 42:
            break

    # Do this:
    async with aclosing(my_generator()) as aiter:
        async for value in aiter:
            if value == 42:
                break

This is cumbersome, but Python unfortunately doesn't provide any other
reliable options. If you use ``aclosing()``, then
your generator's cleanup code executes in the same context as the
rest of its iterations, so timeouts, exceptions, and context
variables work like you'd expect.
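
To make the ordering guarantee concrete, here is a small runnable sketch. It rests on two assumptions: it uses ``asyncio`` purely so that it runs with the standard library alone, and it defines its own minimal ``aclosing`` stand-in rather than importing ``async_generator.aclosing``. The same pattern applies under ``trio.run``.

```python
import asyncio


class aclosing:
    """Minimal stand-in for async_generator.aclosing (illustrative only)."""

    def __init__(self, agen):
        self._agen = agen

    async def __aenter__(self):
        return self._agen

    async def __aexit__(self, *exc_info):
        # Runs the generator's cleanup in the caller's own context.
        await self._agen.aclose()


log = []


async def countdown(n):
    try:
        while n > 0:
            yield n
            n -= 1
    finally:
        # Cleanup is allowed to await because aclose() is being awaited
        # by the task that was using the generator.
        await asyncio.sleep(0)
        log.append("generator cleanup")


async def main():
    async with aclosing(countdown(5)) as aiter:
        async for value in aiter:
            if value == 4:
                break
    log.append("after the async with block")


asyncio.run(main())
```

The cleanup entry is recorded before the code following the ``async with`` block runs, which is the property that the fallback finalization path cannot promise.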

If you don't use ``aclosing()``, then Trio will do
its best anyway, but you'll have to contend with the following semantics:

* The cleanup of the generator occurs in a cancelled context, i.e.,
  all blocking calls executed during cleanup will raise `Cancelled`.
  This is to compensate for the fact that any timeouts surrounding
  the original use of the generator have been long since forgotten.

* The cleanup runs without access to any :ref:`context variables
  <task-local-storage>` that may have been present when the generator
  was originally being used.

* If the generator raises an exception during cleanup, then it's
  logged to the ``trio.async_generator_errors`` logger and otherwise
  ignored.

* If an async generator is still alive at the end of the whole
  call to :func:`trio.run`, then it will be cleaned up after all
  tasks have exited and before :func:`trio.run` returns.
  Since the "system nursery" has already been closed at this point,
  Trio isn't able to support any new calls to
  :func:`trio.lowlevel.spawn_system_task`.
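
Since cleanup exceptions go to a named logger, they can be routed like any other log records. The sketch below uses only the standard ``logging`` module; the ``ListHandler`` class and the simulated ``ValueError`` are hypothetical and exist only to show where such records would land. The logger name is the one given in the list above.

```python
import logging


class ListHandler(logging.Handler):
    """Hypothetical handler that keeps records in memory for inspection."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


handler = ListHandler()
agen_logger = logging.getLogger("trio.async_generator_errors")
agen_logger.addHandler(handler)
agen_logger.setLevel(logging.ERROR)

# Simulate what a finalization failure would look like to the handler.
# In a real program the record would come from Trio, not from this code.
try:
    raise ValueError("cleanup failed")
except ValueError:
    agen_logger.exception(
        "Exception ignored during finalization of async generator %r",
        "my_generator",
    )
```

Attaching a handler like this in a test suite is one way to turn otherwise silent cleanup failures into assertions.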

If you plan to run your code on PyPy to take advantage of its better
performance, you should be aware that PyPy is *far more likely* than
CPython to perform async generator cleanup at a time well after the
last use of the generator. (This is a consequence of the fact that
PyPy does not use reference counting to manage memory.) To help catch
issues like this, Trio will issue a `ResourceWarning` (ignored by
default, but enabled when running under ``python -X dev`` for example)
for each async generator that needs to be handled through the fallback
finalization path.
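
As a sketch of how to surface these warnings from inside a program rather than from the command line, the standard ``warnings`` module can be told to always show `ResourceWarning`. The ``warn_abandoned()`` function below is a hypothetical stand-in for the warning Trio issues; only the filter configuration is the point of the example.

```python
import warnings


def warn_abandoned(name):
    # Hypothetical stand-in for the warning issued on the fallback path.
    warnings.warn(
        f"Async generator {name!r} was garbage collected before it had "
        f"been exhausted.",
        ResourceWarning,
        stacklevel=2,
    )


with warnings.catch_warnings(record=True) as caught:
    # Override the default filter, which ignores ResourceWarning.
    warnings.simplefilter("always", ResourceWarning)
    warn_abandoned("my_generator")

messages = [
    str(w.message) for w in caught if issubclass(w.category, ResourceWarning)
]
```

Outside of tests, running with ``python -W always::ResourceWarning`` should have a similar effect without code changes.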

Cancel scopes and nurseries
~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. warning:: You may not write a ``yield`` statement that suspends an async generator
   inside a `CancelScope` or `Nursery` that was entered within the generator.

That is, this is OK::

    async def some_agen():
        with trio.move_on_after(1):
            await long_operation()
        yield "first"
        async with trio.open_nursery() as nursery:
            nursery.start_soon(task1)
            nursery.start_soon(task2)
        yield "second"
        ...

But this is not::

    async def some_agen():
        with trio.move_on_after(1):
            yield "first"
        async with trio.open_nursery() as nursery:
            yield "second"
        ...

Async generators decorated with ``@asynccontextmanager`` to serve as
the template for an async context manager are *not* subject to this
constraint, because ``@asynccontextmanager`` uses them in a limited
way that doesn't create problems.

Violating the rule described in this section will sometimes get you a
useful error message, but Trio is not able to detect all such cases,
so sometimes you'll get an unhelpful `TrioInternalError`. (And
sometimes it will seem to work, which is probably the worst outcome of
all, since then you might not notice the issue until you perform some
minor refactoring of the generator or the code that's iterating it, or
just get unlucky. There is a `proposed Python enhancement
<https://discuss.python.org/t/preventing-yield-inside-certain-context-managers/1091>`__
that would at least make it fail consistently.)

The reason for the restriction on cancel scopes has to do with the
difficulty of noticing when a generator gets suspended and
resumed. The cancel scopes inside the generator shouldn't affect code
running outside the generator, but Trio isn't involved in the process
of exiting and reentering the generator, so it would be hard pressed
to keep its cancellation plumbing in the correct state. Nurseries
use a cancel scope internally, so they have all the problems of cancel
scopes plus a number of problems of their own: for example, when
the generator is suspended, what should the background tasks do?
There's no good way to suspend them, but if they keep running and throw
an exception, where can that exception be reraised?

If you have an async generator that wants to ``yield`` from within a nursery
or cancel scope, your best bet is to refactor it to be a separate task
that communicates over memory channels.

For more discussion and some experimental partial workarounds, see
Trio issues `264 <https://github.com/python-trio/trio/issues/264>`__
(especially `this comment
<https://github.com/python-trio/trio/issues/264#issuecomment-418989328>`__)
and `638 <https://github.com/python-trio/trio/issues/638>`__.


.. _threads:

Threads (if you must)
6 changes: 6 additions & 0 deletions newsfragments/265.headline.rst
@@ -0,0 +1,6 @@
Trio now supports automatic :ref:`async generator finalization
<async-generators>`, so more async generators will work even if you
don't wrap them in ``async with async_generator.aclosing():``
blocks. Please see the documentation for important caveats; in
particular, yielding within a nursery or cancel scope remains
unsupported.
193 changes: 193 additions & 0 deletions trio/_core/_asyncgens.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,193 @@
import attr
import logging
import sys
import warnings
import weakref

from .._util import name_asyncgen
from . import _run
from .. import _core

# Used to log exceptions in async generator finalizers
ASYNCGEN_LOGGER = logging.getLogger("trio.async_generator_errors")


@attr.s(eq=False, slots=True)
class AsyncGenerators:
    # Async generators are added to this set when first iterated. Any
    # left after the main task exits will be closed before trio.run()
    # returns.  During most of the run, this is a WeakSet so GC works.
    # During shutdown, when we're finalizing all the remaining
    # asyncgens after the system nursery has been closed, it's a
    # regular set so we don't have to deal with GC firing at
    # unexpected times.
    alive = attr.ib(factory=weakref.WeakSet)

    # This collects async generators that get garbage collected during
    # the one-tick window between the system nursery closing and the
    # init task starting end-of-run asyncgen finalization.
    trailing_needs_finalize = attr.ib(factory=set)

    prev_hooks = attr.ib(init=False)

    def install_hooks(self, runner):
        def firstiter(agen):
            if hasattr(_run.GLOBAL_RUN_CONTEXT, "task"):
                self.alive.add(agen)
            else:
                # An async generator first iterated outside of a Trio
                # task doesn't belong to Trio. Probably we're in guest
                # mode and the async generator belongs to our host.
                # The locals dictionary is the only good place to
                # remember this fact, at least until
                # https://bugs.python.org/issue40916 is implemented.
                agen.ag_frame.f_locals["@trio_foreign_asyncgen"] = True
                if self.prev_hooks.firstiter is not None:
                    self.prev_hooks.firstiter(agen)

        def finalize_in_trio_context(agen, agen_name):
            try:
                runner.spawn_system_task(
                    self._finalize_one,
                    agen,
                    agen_name,
                    name=f"close asyncgen {agen_name} (abandoned)",
                )
            except RuntimeError:
                # There is a one-tick window where the system nursery
                # is closed but the init task hasn't yet made
                # self.alive a strong set to disable GC. We seem to
                # have hit it.
                self.trailing_needs_finalize.add(agen)

        def finalizer(agen):
            agen_name = name_asyncgen(agen)
            try:
                is_ours = not agen.ag_frame.f_locals.get("@trio_foreign_asyncgen")
            except AttributeError:  # pragma: no cover
                is_ours = True

            if is_ours:
                runner.entry_queue.run_sync_soon(
                    finalize_in_trio_context, agen, agen_name
                )

                # Do this last, because it might raise an exception
                # depending on the user's warnings filter. (That
                # exception will be printed to the terminal and
                # ignored, since we're running in GC context.)
                warnings.warn(
                    f"Async generator {agen_name!r} was garbage collected before it "
                    f"had been exhausted. Surround its use in 'async with "
                    f"aclosing(...):' to ensure that it gets cleaned up as soon as "
                    f"you're done using it.",
                    ResourceWarning,
                    stacklevel=2,
                    source=agen,
                )
            else:
                # Not ours -> forward to the host loop's async generator finalizer
                if self.prev_hooks.finalizer is not None:
                    self.prev_hooks.finalizer(agen)
                else:
                    # Host has no finalizer.  Reimplement the default
                    # Python behavior with no hooks installed: throw in
                    # GeneratorExit, step once, raise RuntimeError if
                    # it doesn't exit.
                    closer = agen.aclose()
                    try:
                        # If the next thing is a yield, this will raise RuntimeError
                        # which we allow to propagate
                        closer.send(None)
                    except StopIteration:
                        pass
                    else:
                        # If the next thing is an await, we get here. Give a nicer
                        # error than the default "async generator ignored GeneratorExit"
                        raise RuntimeError(
                            f"Non-Trio async generator {agen_name!r} awaited something "
                            f"during finalization; install a finalization hook to "
                            f"support this, or wrap it in 'async with aclosing(...):'"
                        )

        self.prev_hooks = sys.get_asyncgen_hooks()
        sys.set_asyncgen_hooks(firstiter=firstiter, finalizer=finalizer)

    async def finalize_remaining(self, runner):
        # This is called from init after shutting down the system nursery.
        # The only tasks running at this point are init and
        # the run_sync_soon task, and since the system nursery is closed,
        # there's no way for user code to spawn more.
        assert _core.current_task() is runner.init_task
        assert len(runner.tasks) == 2

        # To make async generator finalization easier to reason
        # about, we'll shut down asyncgen garbage collection by turning
        # the alive WeakSet into a regular set.
        self.alive = set(self.alive)

        # Process all pending run_sync_soon callbacks, in case one of
        # them was an asyncgen finalizer that snuck in under the wire.
        runner.entry_queue.run_sync_soon(runner.reschedule, runner.init_task)
        await _core.wait_task_rescheduled(
            lambda _: _core.Abort.FAILED  # pragma: no cover
        )
        self.alive.update(self.trailing_needs_finalize)
        self.trailing_needs_finalize.clear()

        # None of the still-living tasks use async generators, so
        # every async generator must be suspended at a yield point --
        # there's no one to be doing the iteration. That's good,
        # because aclose() only works on an asyncgen that's suspended
        # at a yield point.  (If it's suspended at an event loop trap,
        # because someone is in the middle of iterating it, then you
        # get a RuntimeError on 3.8+, and a nasty surprise on earlier
        # versions due to https://bugs.python.org/issue32526.)
        #
        # However, once we start aclose() of one async generator, it
        # might start fetching the next value from another, thus
        # preventing us from closing that other (at least until
        # aclose() of the first one is complete).  This constraint
        # effectively requires us to finalize the remaining asyncgens
        # one at a time, in arbitrary order, rather than doing all of
        # them at the same time. On 3.8+ we could defer any generator with
        # ag_running=True to a later batch, but that only catches
        # the case where our aclose() starts after the user's
        # asend()/etc. If our aclose() starts first, then the
        # user's asend()/etc will raise RuntimeError, since they're
        # probably not checking ag_running.
        #
        # It might be possible to allow some parallelized cleanup if
        # we can determine that a certain set of asyncgens have no
        # interdependencies, using gc.get_referents() and such.
        # But just doing one at a time will typically work well enough
        # (since each aclose() executes in a cancelled scope) and
        # is much easier to reason about.

        # It's possible that the cleanup code will itself create
        # more async generators, so we iterate repeatedly until
        # all are gone.
        while self.alive:
            batch = self.alive
            self.alive = set()
            for agen in batch:
                await self._finalize_one(agen, name_asyncgen(agen))
Member

Out of curiosity, does the batching here do anything? Would it work just as well to write:

while self.alive:
    agen = self.alive.pop()
    await self._finalize_one(agen, name_asyncgen(agen))

? I'm not too worried either way but I'm wondering if I'm missing something...

Member Author

The batching finalizes the currently-active generators before any new ones that get firstiter'ed during that process. I can't think of anywhere that the difference would matter but the batch-based ordering seems more intuitive to me.
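
To illustrate the ordering difference discussed in this thread, here is a small standalone sketch. It is not Trio code: plain strings stand in for async generators, and the hypothetical ``spawned_by`` mapping says which new items appear while a given item is being finalized.

```python
def finalize_in_batches(initial, spawned_by):
    """Finalize everything currently alive before anything created meanwhile."""
    alive = set(initial)
    order = []
    while alive:
        batch = alive
        alive = set()
        for item in batch:
            order.append(item)
            # Finalizing an item may bring new items to life; they land
            # in the next batch rather than the current one.
            alive.update(spawned_by.get(item, ()))
    return order


order = finalize_in_batches(
    {"a", "b", "c"},
    {"a": ["a2"], "a2": ["a3"]},
)
```

With batching, ``a2`` can only be finalized after all of ``a``, ``b``, and ``c``. With the ``pop()``-based loop, ``a2`` could be popped before ``b`` or ``c``, depending on set ordering.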


    def close(self):
        sys.set_asyncgen_hooks(*self.prev_hooks)

    async def _finalize_one(self, agen, name):
        try:
            # This shield ensures that _finalize_one never exits
            # with an exception, not even a Cancelled. The inside
            # is cancelled so there's no deadlock risk.
            with _core.CancelScope(shield=True) as cancel_scope:
                cancel_scope.cancel()
                await agen.aclose()
        except BaseException:
            ASYNCGEN_LOGGER.exception(
                "Exception ignored during finalization of async generator %r -- "
                "surround your use of the generator in 'async with aclosing(...):' "
                "to raise exceptions like this in the context where they're generated",
                name,
            )