Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extension to BinaryHeap for efficiently modifying the greatest element #13

Closed
bsteinb opened this issue Feb 14, 2016 · 7 comments
Closed

Comments

@bsteinb
Copy link

bsteinb commented Feb 14, 2016

Hello container enthusiasts.

I have recently been working on a K-Way merge for the itertools crate which internally uses BinaryHeap from std (rust-itertools/itertools#97). Using BinaryHeap's interface as it is, the merge algorithm goes through a lot of pop() -> modify -> optional push() sequences. This is similar to the pop() -> modify -> push() sequence that a decrease-key/increase-key operation can be mapped to. Both sequences can be optimised by allowing (safe) access to the first element of the heap for inspection/modification/removal and only restoring the heap invariant once the operation is complete (unlike pop() -> modify -> push() which has to sift twice).

Inspired by @frankmcsherry's optimised implementation of the algorithm using an ad-hoc heap, I have set out to capture the essence of the optimised access pattern in an interface that does not require access to the internals of the heap data structure. I have added them to a clone of the binary_heap module.

The experiment currently contains two sets of interfaces. One that uses closures for the modify part (or the modify -> optional push() part) and is similar to the poke() interface proposed by @llogiq. I have named the functions pop_push() and pop_and_then_push() which offer two variants of the optimised pop() -> modify -> optional push() sequence and peek_mut() for the optimised pop() -> modify -> push() sequence by way of a mutable reference.

The other interface should cover the same operations but is based on wrapper types that provide access to the greatest element and restore the heap invariant in drop() (similar to Vecs Drain wrapper). The wrapper types are Head which allows pop() -> modify -> optional push() and HeadRef for pop() -> modify -> push().

While the closure based interface is safer (no wrappers that might be leaked), the wrapper interface feels more rustic (just like Iterators and Drain, I guess). The implementations make use of unsafe code for a performance improvement of roughly 2 x in my experiments based on the K-way merge which operate on small heaps.

At the moment, neither interface has the ability to modify any element but the greatest as was discussed in #11.

I now turn to you for feedback on my design. Does the interface seem sensible? Is this something that might be considered for inclusion in std? I know it significantly adds to the surface of BinaryHeaps interface for just an optimisation, but on the other hand it could be used to either re-implement some of the methods (e.g. pop() and replace()) or altogether subsume them.

@Eh2406
Copy link

Eh2406 commented Jul 17, 2016

Just a random programmer from the interwebs, so take my 0.02 with as much salt as you want.

The Head api sounds to me kind of like the Entry api, and seems good to me.

Ok, now for my crazy thoughts. Add the ability for the heap to be in a broken state.
If the heap is in correct form, then pop returns the max item and leaves a hole in the heap.
If the heap has a hole, then pop fixes the hole and then returns the max item and leaves a hole in the heap.
If the heap is in correct form, then push dose the usual.
If the heap has a hole, then push replaces the hole with the new item and fixes the invariant.

I think this naturally optimizes pop then push without new syntax. Is this crazy?

@llogiq
Copy link

llogiq commented Jul 17, 2016

I think it could work with a kind of Entry-like API that mutably borrows the heap, mediating all accesses to it while in an inconsistent state. However, this would hinge on drop() being called which is not guaranteed, so we'd need to poison the heap (e.g. setting size=0 or better some invalid value) while in an invalid state, and restore it on return.

@sfackler
Copy link

We've added this to the standard library's BinaryHeap: https://doc.rust-lang.org/beta/std/collections/struct.BinaryHeap.html#method.peek_mut. It's up for stabilization in 1.12.

@bsteinb bsteinb closed this as completed Jul 17, 2016
@llogiq
Copy link

llogiq commented Jul 17, 2016

@sfackler apparently this does neither poison nor leak-amplify.

@sfackler
Copy link

It doesn't need to. The worst that happens if the guard is leaked is that the heap is improperly ordered, which can already happen if an Ord method panics. Things accessing the heap after that happens may see improperly ordered values, but there are no safety issues.

@llogiq
Copy link

llogiq commented Jul 17, 2016

That is true. However, it means the operation does not fail fast, but leaves the heap in an inconsistent state, which means subsequent operations can silently return wrong values. This is a quite unfortunate error state, if you ask me. Poisoning – perhaps at least in debug mode – would be a preferable solution if you ask me.

@sfackler
Copy link

I'm not aware of a context in which someone would end up accidentally leaking the guard, so it doesn't really seem like something worth investing all that much effort in. I don't believe any other guards poison themselves after a leak (e.g. Vec::drain). They in general do the easiest thing that avoids memory corruption etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants