Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Canceling/dropping an execution plan should cancel/drop all inner tasks #1103

Closed
crepererum opened this issue Oct 11, 2021 · 4 comments · Fixed by #1121
Closed

Canceling/dropping an execution plan should cancel/drop all inner tasks #1103

crepererum opened this issue Oct 11, 2021 · 4 comments · Fixed by #1121
Labels
bug Something isn't working

Comments

@crepererum
Copy link
Contributor

Describe the bug
Canceling futures in Rust is usually quite easy: just drop them. However DataFusion has a few places where it uses tokio::spawn to create new, independent execution flows and forgets to wire up the JoinHandles in a way that dropping the owning future/stream would also drop the sub-flow.

The sub-flow is usually connected using a futures::channel::oneshot::channel, so dropping the receiver side (the original future/stream) leads to error on the sender side (the sub-flow). Furthermore the sender side can test if the channel was cancelled. However this only works as long as the control flow on the sender side actually tries to send something. For long-running external IO (e.g. pulling data from an object store) or UDFs, this cancellation can be a bit late.

To Reproduce

  1. Create a ExecutionPlan that emit streams that are forever Pending
  2. Use this plan as an input for SortExec (for example, other operators also have this issue)
  3. Start collecting the output of SortExec, it will be Pending
  4. Drop the entire SortExec
  5. The input (the plan created in step 1) will never be dropped

Expected behavior
At least after a short while, the input should be dropped as well even when it is pending forever.

Additional context
This affects master commit 90a4c8435152841117c957a74ea4d046815d6c6a.

@crepererum crepererum added the bug Something isn't working label Oct 11, 2021
@crepererum
Copy link
Contributor Author

I'm actively working on this issue.

@crepererum
Copy link
Contributor Author

Ref #928 as it is at least somewhat related.

crepererum added a commit to crepererum/arrow-datafusion that referenced this issue Oct 11, 2021
@alamb alamb changed the title Canceling/dropping an exuction plan should cancel/drop all inner tasks Canceling/dropping an execution plan should cancel/drop all inner tasks Oct 11, 2021
crepererum added a commit to crepererum/arrow-datafusion that referenced this issue Oct 12, 2021
alamb pushed a commit that referenced this issue Oct 12, 2021
crepererum added a commit to crepererum/arrow-datafusion that referenced this issue Oct 13, 2021
crepererum added a commit to crepererum/arrow-datafusion that referenced this issue Oct 13, 2021
crepererum added a commit to crepererum/arrow-datafusion that referenced this issue Oct 13, 2021
crepererum added a commit to crepererum/arrow-datafusion that referenced this issue Oct 13, 2021
crepererum added a commit to crepererum/arrow-datafusion that referenced this issue Oct 14, 2021
crepererum added a commit to crepererum/arrow-datafusion that referenced this issue Oct 14, 2021
crepererum added a commit to crepererum/arrow-datafusion that referenced this issue Oct 14, 2021
Dandandan pushed a commit that referenced this issue Oct 14, 2021
…ergeExec`, `WindowAggExec` (#1112)

* allow `BlockingExec` to mock multiple partitions

* Clean up spawned task on `SortPreservingMergeExec` drop

Ref #1103.

* Clean up spawned task on `WindowAggExec` drop

Ref #1103.

* isolate common test code

* Clean up spawned task on `RepartitionExec` drop

Ref #1103.

* rename variables / fix copy+paste mistake

* introduce `RepartitionExecState` struct

* introduce `AbortOnDrop{Single,Many}` helpers
@alamb
Copy link
Contributor

alamb commented Oct 14, 2021

With the merge of #1112 is there anything left for this ticket?

@crepererum
Copy link
Contributor Author

Yes, a few more. Will open a PR later today.

crepererum added a commit to crepererum/arrow-datafusion that referenced this issue Oct 15, 2021
crepererum added a commit to crepererum/arrow-datafusion that referenced this issue Oct 15, 2021
crepererum added a commit to crepererum/arrow-datafusion that referenced this issue Oct 15, 2021
As a side-effect, cancelation now works with all users of
`RecordBatchReceiverStream` (e.g. `ParquetExec`) but there the effect
should be slightly less important.

Ref apache#1103.
houqp pushed a commit that referenced this issue Oct 16, 2021
…xec`, `HashAggregateExec` (#1121)

* Clean up spawned task on `HashAggregateExec` drop

Ref #1103.

* Clean up spawned task on `CoalescePartitionsExec` drop

Ref #1103.

* Clean up spawned task on `AnalyzeExec` drop

As a side-effect, cancelation now works with all users of
`RecordBatchReceiverStream` (e.g. `ParquetExec`) but there the effect
should be slightly less important.

Ref #1103.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
2 participants