-
Notifications
You must be signed in to change notification settings - Fork 85
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deadlock when using Family::get_or_create
#231
Comments
Agreed that this is not intuitive (i.e. a footgun). That said, I am not sure what to do about it. Cloning a metric might be expensive. Thus internally on every |
Some of the metrics are "cheap" to clone (Arc/etc.). There's a few options I can think of to accommodate those but most would require specialization in some form. I wonder if, alternatively, there's a way to move the mutex up one level? Something (very roughly) like this: let guard = family.lock();
Metrics {
inbound: guard.get_or_create(&vec!["dir", "inbound"]).clone(),
outbound: guard.get_or_create(&vec!["dir", "outbound"]).clone()
} It does have some downsides of its own where it holds a write lock slightly longer than it strictly needs to, but this feels like an API that's harder to misuse when multiple metrics are being extracted. I imagine this could either replace the existing |
I've also run into this footgun myself. I am sympathetic to the idea that calling It's worth noting that Taking a look inside of the library, we have metrics definitions that look like this:
These types do already offer The standard library's implementation of The remaining metric type is // info.rs
#[derive(Debug)]
pub struct Info<S>(pub(crate) S); The OpenMetrics specification states:
If these SHOULD NOT change, then they won't be set more than once, and implicit cloning is thus out of the question. They won't be the metric type As for the remaining metric types, My understanding is that this library is intentionally designed to allow consumers to provide their own implementations of counters, histograms, and gauges, so long as they implement I agree with @sfleen's suggestion that we directly return the // family.rs
impl<S: Clone + std::hash::Hash + Eq, M, C: MetricConstructor<M>> Family<S, M, C> {
pub fn get(&self, label_set: &S) -> Option<M> {
// ...
}
pub fn get_or_create(&self, label_set: &S) -> M {
// ...
}
pub fn get_ref(&self, label_set: &S) -> MappedRwLockReadGuard<M> {
// ...
}
}
I'll cross-reference #130 and #230, which both propose a Would this be a welcome change @mxinden? I'll note that this would be a breaking change, but preventing deadlocks feels like a worthwhile design goal for this library. |
Hello @mxinden! I wanted to ask if you had thoughts on the proposal above. I would be happy to drive this work myself, but don't want to spend time writing that if it would not be a welcome PR. |
Thank you @sfleen and @cratelyn for the thorough input! Much appreciated.
For the sake of completeness, only on first call, not on the return-early get path. client_rust/src/metrics/family.rs Lines 237 to 239 in 12923ca
When writing the implementation we have today, I only had the ad-hoc access in mind, e.g. calling
Not off the top of my head.
Definitely interested fixing this footgun.
Instead of a breaking change, can we start off with a non-breaking change, e.g. by adding Thouhgts? @sfleen @cratelyn would either of you be willing to write a prototype pull request? (We could also explore reentrace-mutexes. That said, I find the current |
@mxinden i'd be happy to write a prototype pull request! i'll likely get to this early next week. |
fixes prometheus#231. this introduces a new method to `Family<S, M, C>` to help alleviate the risk of deadlocks when accessing multiple series within a given metrics family. this returns a plain `M`, rather than the `MappedRwLockReadGuard<M>` RAII guard returned by `get_or_create()`. a test case is introduced in this commit to demonstrate that structures accessing multiple series within a single expression will not accidentally create a deadlock. Signed-off-by: katelyn martin <[email protected]>
thanks for your patience! i have opened #244 to provide an accessor that will clone the metric |
fixes prometheus#231. this introduces a new method to `Family<S, M, C>` to help alleviate the risk of deadlocks when accessing multiple series within a given metrics family. this returns a plain `M`, rather than the `MappedRwLockReadGuard<M>` RAII guard returned by `get_or_create()`. a test case is introduced in this commit to demonstrate that structures accessing multiple series within a single expression will not accidentally create a deadlock. Signed-off-by: katelyn martin <[email protected]>
fixes prometheus#231. this introduces a new method to `Family<S, M, C>` to help alleviate the risk of deadlocks when accessing multiple series within a given metrics family. this returns a plain `M`, rather than the `MappedRwLockReadGuard<M>` RAII guard returned by `get_or_create()`. a test case is introduced in this commit to demonstrate that structures accessing multiple series within a single expression will not accidentally create a deadlock. Signed-off-by: katelyn martin <[email protected]>
fixes prometheus#231. this introduces a new method to `Family<S, M, C>` to help alleviate the risk of deadlocks when accessing multiple series within a given metrics family. this returns a plain `M`, rather than the `MappedRwLockReadGuard<M>` RAII guard returned by `get_or_create()`. a test case is introduced in this commit to demonstrate that structures accessing multiple series within a single expression will not accidentally create a deadlock. Signed-off-by: katelyn martin <[email protected]>
It's fairly common practice for us to create a set of metrics from a metric family given a set of labels, and store them for later use. However, this can cause a deadlock when
Family::get_or_create
is called multiple times on the same instance within the same statement:This is due to the
RwLockReadGuard
returned by each call toFamily::get_or_create
being temporaries that are not dropped until the end of the statement.The user-facing solution is to create separate bindings for each metric:
However, this is a footgun that can be pretty tricky to debug. Is there an alternative API here that would work, perhaps just returning the metric directly instead of a mutex guard?
The text was updated successfully, but these errors were encountered: