Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"Always create TaskCompletionSource<T> with TaskCreationOptions.RunContinuationsAsynchronously" may be detrimentally overcorrecting. #72

Open
LBensman opened this issue May 21, 2021 · 2 comments

Comments

@LBensman
Copy link

I think at minimum, this advice should be clarified in two ways: example improved to actually show how a deadlock or other adversity can happen (example just shows when it'll run on the same thread, not how one gets in trouble when that happens; I don't have one in C# of the top of my head but see footnote2); and highlight performance trade-offs when this flag is used, as it does come at a penalty(!).

In fact, I spent couple of days thinking about it, and the more I'm thinking about it, the more I'm starting to lean toward that it may actually be a counter-productive (ok, bad 😄, forgive me 😹 ) advice. Here's my rationale.

There are two parts to my argument:

  1. There's a performance cost associated with the TaskCreationOptions.RunContinuationsAsynchronously, and using it indiscriminately bears that cost.
  2. I believe it doesn't solve all the cases, and still demands that care be taken in the continuation tasks; and is not a magic solution to all the problems. (Afterall, I think there's a reason that it's not the default option). In fact, it may be exacerbating them by virtue of masking (hiding) them, rather than fixing them

So, first, onto the performance part.

When a task is run in-line - that is, non-asynchronous continuation - it's run immediately as a direct function call. The cost of invoking that function is simply building the stack call frame for the function and making the execution jump to the first instruction of that function. And that has to happen regardless of inline or async continuation, and so inline execution is pretty much done at zero net cost.

Asynchronous continuation, however, is more involved. Instead of making that direct call, it is placed into a work queue. Another pool thread gets notified that there's available work, which then dequeues the work and only then calls the continuation function. So this trip through the queuing is the extra cost that is incurred with the TaskCreationOptions.RunContinuationsAsynchronously.

Below example is from a part of TPL code - not sure if that's the exact part that handles this or something similar for some particular variation, but I think that's illustrative:

https://github.com/dotnet/runtime/blob/main/src/libraries/Common/src/System/Threading/Tasks/RendezvousAwaitable.cs#L101-L118

The above makes it pretty clear: if no RunContinuationsAsynchronously is demanded, it's a direct call to the continuing action when it does c(). But if it does demand asynchronous continuation, it sends the action down the Task.Run's tunnels which "do a few things" (in fact, I was under impression that TPL used lock-less work-stealing queues, but I do see use of lock in scheduling code, which adds to penalty), and those "few things" add [often] unnecessary performance cost. "Often" as in if flag used indiscriminately whether it's needed to prevent deadlock or if there is no possibility of deadlock even without it.

As I understand it, most tasks that result in calling async functions or task-returning functions default to inline execution. E.g. if you call a task-function without await and that function never results in an operation that requires some resource completion later (e.g. you call Stream.ReadAsync() where data was already in the buffer and didn't require waiting for a pending read operation to complete), the function will complete and continue to run the remaining part of the caller on the same thread. That's efficient. If crossing every task boundary was done via pool roundtrip, it would be much less performant (e.g. every Stream.ReadAsync() call result to be forced through the pool so that non-io completion doesn't run inline).2

So making all TaskCompletionSources be async continuations may help reduce unsuspecting deadlocks or w/e issues due to poorly written caller, I suspect that accomplishes so at a substantial performance cost. This is where I think the advice is not complete in its recommendation, and is misleading.

The real issue is not that continuation may run inline! The real issue is that caller and/or continuation are not written properly to handle such a case! (Ergo, wrong solution at the wrong place.) Which brings me to the second argument.

You can't control all TaskCompletionSources, or rather that all TaskCompletionSources in a code path will use TaskCreationOptions.RunContinuationsAsynchronously configuration option. It's not just a matter of changing it all in one's code! All 3rd party libraries, including .NET Core, would have to all be configured the same with that option. If one instance of TaskCompletionSource in a library one doesn't control doesn't make use of this option, and the scenario is where it can cause the problem, then it can cause the problem b/c it may run inline. Thus, if one is diligent to always set TaskCreationOptions.RunContinuationsAsynchronously on ALL their TaskCompletionSources in their code, it seems they're still screwed in case continuation is called from 3rd party library's TaskCompletionSource that didn't set it. Back to SquareOne™!!! Except now you don't even suspect that that's an issue, because you've been diligent to have it run through the pool!

The problem with this solution, it seems to me, is that it requires the callee (i.e. code issuing TaskCompletionSource instance) to do something special to play well with a caller that it doesn't know. It's much safer, though, when the caller and/or continuation makes itself safe and hardened against potential inline execution of continuation, and takes that burden upon itself, IMHO.

I suspect these are at least part of the reason TPL defaults to inline - to use more performant operation so as not to penalize by default, and because such issue is better resolved at the consumer rather than the producer end. (And doing so is even easier than the C++ example in the footnotes: with a simple await Task.Yield();).

I may very well have missed something, or failed to consider, so very much welcome your thoughts on the above.


1 I mentioned earlier that I don't have a good C# example with which to improve, but I do have C++. One codebase I'm currently working on actually has in-house written C++ asynchronous library modeled, to a degree, on .NET's TPL. It has a notion similar to a promise, which, when resolved, will run all lambdas that were added to be notified of the resolution. It doesn't even have a concept of running such flag to call such functions via a thread pool. Such burden is placed on consumer of API to make such determination, and handle accordingly. And so example when such thing is done to avoid deadlock/exception is when caller is holding a non-recursive (non-reentrant) lock on std::mutex while calling an asynchronous function. Since that function may or may not complete while holding a lock, the handler the explicitly moves further processing via a pool. But other code, when not in danger of inline mutex, would continue to run biz logic potentially inline since in those cases it's safe to do so, and is faster than always queueing to pool. So:

Promise A::DoLockedAsync();  // Must hold lock when calling.
Promise A::DoAsync();  // no need to hold lock to call.

void B::DoIt()
{
  std::lock<std::mutex> l(_mutex);

  A::DoLockedAsync().wait([](auto result) {
    // This may run inline and we're holding a lock, so queue to pool.
    ThreadPool.enqueue([result](){
      std::lock_guard<std::mutex> g(_mutex);
      ConsumeLocked(result) });
  });

  l.unlock();

  A::DoAsync().wait([](auto result) {
    // This may run inline and don't care if so.
    Consume(result) });
  });
}

... but I don't recall C# having non-recursive locks, and so, as I mentioned, can't seem to come up with another example.


2 This problem, by the way, is not native to .NET's TPL. Most JavaScript libraries that issue promises have it where attaching a handler to an already resolved promise will fire that handler inline, rather than via its message pump. AFAIK, these libraries do not attempt to avoid firing inline (by use of setTimeout(0, c), for example) and leave it up to the consumer of the API to handle so appropriately.

@davidfowl
Copy link
Owner

You're not gonna change my mind on this advice. I have no problem helping to clarify things but the extreme stances and language is provided to the 98% case and to avoid the nuance in the 2% case when you know what you're doing.

@ontorder
Copy link

ontorder commented Nov 7, 2024

if i have to be honest async continuation was kinda what i expected from Try/SetResult, so i was surprised when instead some synchronization stuff in my code was not working and i saw that execution was going directly in the method that was waiting for the TCS, but after adjusting to set the state correctly before calling SetResult all was fine

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants