fix(hub): avoid deadlocks when emitting events #633
Merged
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
In a very specific combination of events, any event emission can deadlock if it tries to add stuff to the Hub, like breadcrumbs. This can be easily reproduced with the following setup:
sentry-log
(orsentry-tracing
iftracing-log
is set up)Should the event fail to be sent for too long, the channel to the sender thread fills up and events start to be dropped. This will generate a debug log line, logging that an event has been dropped and why. With
sentry-log
(ortracing-log
+sentry-tracing
), this would generate a breadcrumb.However, the whole call to
Transport::send_envelope
is done in the context ofHubImpl::with
, that holds a read lock on theHubImpl
's stack. When generating the breadcrumb for the debug line, we end up callingHub::add_breadcrumb
, which callsHubImpl::with_mut
to get a mutable reference to the top of the stack. However, since we already have a read lock on the stack, we deadlock on the write lock.The fix is to move the call to
Transport::send_envelope
outside of the lock zone, and we useHubImpl::with
only to clone the topStackLayer
. Since this structure is onlyArc
s, the only performance hit is twoArc
clones and not the whole stack cloning.We hit this in prod because (for reasons) we had to backport the legacy pre-envelope transport, and this one uses the warning level when the channel is full. The default log filter is safe from this deadlock because debug events are dropped, but anyone enabling at least breadcrumbs for debug events are exposed to it.
Thanks!