Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document cancel safety #104

Open
madadam opened this issue Aug 16, 2022 · 16 comments
Open

Document cancel safety #104

madadam opened this issue Aug 16, 2022 · 16 comments
Labels
documentation Improvements or additions to documentation

Comments

@madadam
Copy link

madadam commented Aug 16, 2022

It's unclear whether Sender::send_async is cancel-safe. That is, does it guarantee that if the future returned from it is cancelled, the message is not sent? It would be great to document it. For inspiration, see tokio's docs: https://docs.rs/tokio/latest/tokio/sync/mpsc/struct.Sender.html#cancel-safety

@Restioson
Copy link
Collaborator

Restioson commented Aug 16, 2022

Notes toward documentation on this topic:

Send

  • If SendFut is called but never polled, the item is not sent.
  • If SendFut is polled and returns Poll::Pending, it may be received by a receiver at any point.

Currently, upon Drop, the SendFut will remove the item from the queue of waiting senders, if it had not already been sent. There is no method to cancel the send future and return the item if it has not already been dropped, though. This is something that we could consider to be missing from the public API and could address here.

Receive

  • If RecvFut is dropped without being polled, the item is never received.
  • If RecvFut is polled and returns Poll::Pending, it may receive an item at any point.
  • If RecvFut is dropped and it had received an item, the item will be sent back into the queue for other receivers to receive it.

As a general design principle, we decided not to worry about users who leak or mem::forget futures. If I remember correctly, we considered these cases to be logic bugs in the program, and that the channel does not have to ensure messages are received or sent in these cases. Of course, flume should never cause undefined behavior regardless of any (safe) logic bugs, as it is #![deny(unsafe_code)].

I do not quite understand the cancellation safety doc in tokio, to be perfectly honest. It is worded as very specific to tokio::select!, so does this imply that another select combinator would not also preserve this behaviour?

@Restioson Restioson added the documentation Improvements or additions to documentation label Aug 16, 2022
@madadam
Copy link
Author

madadam commented Aug 16, 2022

If SendFut is polled and returns Poll::Pending, it may be received by a receiver at any point.

Bit confused about this. Does this mean that if SendFut is polled at least once and it returns Pending and then never polled again, the message might still be sent? Btw, I would not count that as cancellation. I think cancellation implies the future is dropped before it returned Ready.

As a general design principle, we decided not to worry about users who leak or mem::forget futures.

That sounds valid and reasonable to me.

I do not quite understand the cancellation safety doc in tokio, to be perfectly honest. It is worded as very specific to tokio::select!, so does this imply that another select combinator would not also preserve this behaviour?

I'm not a tokio dev, but my understanding is that they just use tokio::select as an example of cancellation not that the cancel-safety somehow depends on it.

@Restioson
Copy link
Collaborator

Does this mean that if SendFut is polled at least once and it returns Pending and then never polled again, the message might still be sent?

Yes, indeed. The item is sent into a queue of waiters. This is an optimisation for latency - the receiver doesn't need to wake the sender and then wait for the sender to wake, get polled, send the item, and then wake the receiver again. The receiver can just take the item and wake the sender. But, in some cases the sender could be dropped after item taken but before it is woken.

@Restioson
Copy link
Collaborator

By the way, could you elaborate on the usecase mentioned for cancellation safety in launchbadge/sqlx#2054

@madadam
Copy link
Author

madadam commented Aug 16, 2022

Yes, indeed. ...

I see. So that means flume::Sender is not "cancel safe" in the same sense that tokio::mpsc::Sender is. That is, when the send_async future is cancelled (dropped), the message might still be sent. That is a bit unfortunate for my use case but I understand the performance justification.

could you elaborate on the usecase mentioned for cancellation safety in launchbadge/sqlx#2054

Sure. For the mentioned fix to work, I need to make sure that send_async does not send the message when it's cancelled. If that would not be the case (and it seems it indeed isn't), then what can happen is the BEGIN command can still be sent to the background thread before the RAII guard is created and so we could still end up with dangling transaction.

@Restioson
Copy link
Collaborator

Sure. For the mentioned fix to work, I need to make sure that send_async does not send the message when it's cancelled. If that would not be the case (and it seems it indeed isn't), then what can happen is the BEGIN command can still be sent to the background thread before the RAII guard is created and so we could still end up with dangling transaction.

Could the RAII guard be created before the send occurs? The worker thread would need to deal with the potential for spurious messages, however.

This is similar to how you usually create a guard which notifies if something happens before checking if it has actually happened, or you check twice - once before creation and once after. This is in case the notification fires after checking and before creating the listener, as then the listening guard may never be woken. I'm not sure if this makes sense, but here is an example of this pattern.

This may solve this particular issue (or maybe not!) but maybe looking at a cancellation-safe design would be good regardless, or for other use cases.

@madadam
Copy link
Author

madadam commented Aug 16, 2022

Could the RAII guard be created before the send occurs? The worker thread would need to deal with the potential for spurious messages, however.

Yes exactly. That could create the opposite problem - we could end up executing ROLLBACK without the corresponding BEGIN.

But there are other ways to solve this problem. I'm already pondering some ideas. Thanks for the input anyway!

@Restioson
Copy link
Collaborator

Restioson commented Aug 16, 2022

Yes exactly. That could create the opposite problem - we could end up executing ROLLBACK without the corresponding BEGIN.

In theory, maybe the worker can keep track of whether the corresponding BEGIN has been executed and ignore it if so, but this is getting a bit off topic now :) Good luck!

@abonander
Copy link

The item is sent into a queue of waiters. This is an optimisation for latency - the receiver doesn't need to wake the sender and then wait for the sender to wake, get polled, send the item, and then wake the receiver again.

How does this work with bounded channels? This sounds like the sender isn't woken until the item is received, but no one expects a channel to behave like that with nonzero capacity.

@Restioson
Copy link
Collaborator

Restioson commented Aug 17, 2022

Just to clarify, if the channel has capacity, send async will return in one poll. So, this behavior of waiting only applies to when there is no capacity in the channel.

When capacity is full, the sender goes into a queue of waiting senders, which stores its waker and the item. Then, when a receiver receives an item, there is now space in the channel, so the waiting sender is woken and the receiver puts that item on the channel. I think this was unclear in my original message, as I said "it may be received by a receiver at any point". This is better stated as it is pulled by a receiver from the pending queue into the channel's queue (an operation called pull_pending in the code).

By the way, please correct me if the code does not actually do this, but I think I'm reading it right here 🙂

no one expects a channel to behave like that with nonzero capacity.

It may seem slightly surprising at first, but I do not believe that this actually breaks any assumptions about bounded channels. The end result that the send will wait for capacity is the same.

@Restioson Restioson reopened this Aug 17, 2022
@Restioson
Copy link
Collaborator

Restioson commented Aug 17, 2022

Oops sorry, that was an accidental close 😅

@zesterer
Copy link
Owner

This is an interesting discussion and not something that has previous had much thought put into it. Given that we've pretty much "solved" (in the most primitive sense of the word) this issue on the receiving end for receiver slots, it might be viable to do something similar on the sending end, although I've not thought too much about that.

If anybody is interested in looking into this further, I'd be happy to review code / give advice, but I'm not sure I've got enough free time to work on it myself, at least for the next few months.

@abonander
Copy link

I think it's highly surprising that a send_async() operation can be cancelled but still go through, that could easily lead to some very subtle bugs. If nothing else, that needs to be clearly documented.

It also weakens the guarantees of having a bounded channel as, if I understand correctly, you could essentially do tx.send_async(item).now_or_never() in a loop to get unbounded behavior.

@Restioson
Copy link
Collaborator

It also weakens the guarantees of having a bounded channel as, if I understand correctly, you could essentially do tx.send_async(item).now_or_never() in a loop to get unbounded behavior.

This is not the case. When the future is dropped after now or never returns None, the waiter will be removed and the send will be canceled.

@Restioson
Copy link
Collaborator

I think it's highly surprising that a send_async() operation can be cancelled but still go through, that could easily lead to some very subtle bugs. If nothing else, that needs to be clearly documented.

I agree. It's something that should be documented or otherwise changed. As far as changing it goes, I may look into it in a successor to #84

@oblique
Copy link

oblique commented Sep 8, 2023

From my understanding recv_async is cancel safe. Am I correct? Can we document this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

No branches or pull requests

5 participants