Rust is missing an advanced heap #11
Note that I posit this is impossible to overcome. This is a fundamental issue of data trust, and no mechanism I am aware of could possibly prevent having to validate the tokens (and in turn, maintaining metadata to make it possible to validate the tokens). Perhaps there's a fundamentally different approach, though? The pairing heap could have an intrusive map, so you just do lookup with actual keys, but this is definitely way more overhead than necessary. Note that this kind of heap is generally an optimization to begin with, so too much overhead negates even using it.
I think a way of handling this is to allocate the elements of the queue "manually" by just storing them in a big vec, and refer to them using indices internally. The elements are allocated in a big vec (which can grow dynamically, indices remain valid!). Each "slot" of this array also has a u64 "version number". Whenever an element is popped, the slot in the big array is cleared, and the corresponding version number is incremented. Tokens contain an index to a slot in the element array, as well as the current version number. decrease_key can thus fail if the token refers to an "unallocated" element in the array, or if the version number isn't correct. Note that this is essentially just implementing manual memory control over an array and using indices, but the upside is that it needs no unsafe code and won't corrupt the global program heap or the runtime or anything. At worst the data structure fails. But yes, this does mean runtime validation of tokens.
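A minimal sketch of this versioned-slot scheme (the `Token` and `Slots` names and the exact API are illustrative, not from an existing crate):

```rust
/// A token handed out by the structure; only valid while the slot's
/// version still matches.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct Token {
    index: usize,
    version: u64,
}

/// Element storage with per-slot version numbers.
pub struct Slots<T> {
    slots: Vec<(u64, Option<T>)>,
}

impl<T> Slots<T> {
    pub fn new() -> Self {
        Slots { slots: Vec::new() }
    }

    /// Store an element and return a token for it.
    pub fn insert(&mut self, value: T) -> Token {
        // For brevity this always appends; a real implementation would
        // reuse vacant slots via a free list.
        let index = self.slots.len();
        self.slots.push((0, Some(value)));
        Token { index, version: 0 }
    }

    /// Remove the element, invalidating all outstanding tokens for this
    /// slot by bumping its version.
    pub fn remove(&mut self, token: Token) -> Option<T> {
        let slot = self.slots.get_mut(token.index)?;
        if slot.0 != token.version {
            return None; // stale token
        }
        slot.0 += 1;
        slot.1.take()
    }

    /// Validate a token and get mutable access to its element,
    /// e.g. for a decrease_key that can fail.
    pub fn get_mut(&mut self, token: Token) -> Option<&mut T> {
        match self.slots.get_mut(token.index) {
            Some((version, Some(value))) if *version == token.version => Some(value),
            _ => None,
        }
    }
}
```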
What are the required complexities of the desired operations? In the persistent data structure world, I use a priority search queue when needing to modify priorities, but that may not be best for Rust.
It seems like this is a good use case for just making the data structure intrusive (as opposed to storing extra intrusive metadata). Then removal from the structure doesn't actually free the associated memory.
Would a BTreeMap work here? You get logarithmic insert, delete, update, and find_min. It's also cache friendly. It would be nice to see a comparison of heap performance on modern hardware, but I'm having trouble finding one.
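For illustration, a rough sketch of what a B-tree-backed queue with updatable priorities could look like (the `Pq` type, the `u64` ids, and the API are all invented for this example):

```rust
use std::collections::{BTreeSet, HashMap};

/// Priority queue with O(log n) set operations for insert, delete-min
/// and decrease-key, plus a hash lookup to find an element's current priority.
#[derive(Default)]
struct Pq {
    ordered: BTreeSet<(u64, u64)>, // (priority, id), smallest first
    priority: HashMap<u64, u64>,   // id -> current priority
}

impl Pq {
    fn push(&mut self, id: u64, prio: u64) {
        self.priority.insert(id, prio);
        self.ordered.insert((prio, id));
    }

    fn decrease_key(&mut self, id: u64, new_prio: u64) {
        if let Some(old) = self.priority.get(&id).copied() {
            if new_prio < old {
                // Re-keying is just a remove followed by an insert.
                self.ordered.remove(&(old, id));
                self.ordered.insert((new_prio, id));
                self.priority.insert(id, new_prio);
            }
        }
    }

    fn pop_min(&mut self) -> Option<(u64, u64)> {
        let entry = *self.ordered.iter().next()?;
        self.ordered.remove(&entry);
        self.priority.remove(&entry.1);
        Some(entry)
    }
}

fn main() {
    let mut pq = Pq::default();
    pq.push(1, 10);
    pq.push(2, 5);
    pq.decrease_key(1, 3);
    assert_eq!(pq.pop_min(), Some((3, 1)));
}
```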
For reference, the persistent tree-based priority search queue:
Algorithmic complexity is notably total nonsense for high-performance heaps. A Fibonacci heap is the theoretical best heap: https://en.wikipedia.org/wiki/Fibonacci_heap
A pairing heap, meanwhile, has non-tight bounds, and the bounds it does have make it theoretically worse than a Fibonacci heap. However, in practice the hidden constants on a Fibonacci heap are atrocious, and a pairing heap is supposed to be better.
So mostly I care about actual perf. A simple benchmark would be "do Dijkstra's as fast as possible".
Yes, I only care about actual performance also. Ironically, I just tweeted today on the pointlessness of Fibonacci heaps :-) https://twitter.com/franklinchen/status/644894021144461312
IME a D-ary heap organized as a complete D-ary tree encoded in an array (implemented like a binary heap, but with better cache locality, since child pointers are implicit etc.) beats all of the fancy heaps in practice. Benchmark to find the best D (it possibly depends on element size). It doesn't have the best time complexity for many operations, but the constant factors are pretty minimal, so you'd have to be doing something pretty out of the ordinary for it to be beaten by a more complex option.
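For concreteness, the implicit index arithmetic of such a D-ary array heap might look like this (a sketch, with D as a const generic):

```rust
/// Index of the parent of node `i` in a D-ary heap stored in a flat array.
fn parent<const D: usize>(i: usize) -> usize {
    (i - 1) / D
}

/// Index of the k-th child (0 <= k < D) of node `i`.
fn child<const D: usize>(i: usize, k: usize) -> usize {
    D * i + k + 1
}

fn main() {
    // In a 4-ary heap, node 3's children are nodes 13..=16,
    // and each of those has node 3 as its parent.
    for k in 0..4 {
        let c = child::<4>(3, k);
        assert_eq!(parent::<4>(c), 3);
    }
}
```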
@ssylvan How does one provide decrease_key that isn't O(n)?
Perhaps unrelated, but I have a binary heap implementation in Java (unfortunately belonging to my employer, so I can't show code) that allows changing the first element without allocation, re-shuffling the heap as necessary (I use this for a fast "first N out of X" where X ≫ N). It uses a bounded array for data storage. This interface could be made quite elegant in Rust, e.g.
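For what it's worth, `std::collections::BinaryHeap::peek_mut` already offers an allocation-free "mutate the top and re-sift on drop" operation in Rust. A sketch of the "first N out of X" pattern built on it (the `smallest_n` helper is invented for this example):

```rust
use std::collections::BinaryHeap;

/// Keep the N smallest values seen so far, using a bounded max-heap of size N.
fn smallest_n(values: impl IntoIterator<Item = u32>, n: usize) -> Vec<u32> {
    let mut heap: BinaryHeap<u32> = BinaryHeap::with_capacity(n);
    for v in values {
        if heap.len() < n {
            heap.push(v);
        } else if let Some(mut top) = heap.peek_mut() {
            // Replace the current maximum in place; the heap re-sifts
            // when `top` goes out of scope, with no allocation.
            if v < *top {
                *top = v;
            }
        }
    }
    heap.into_sorted_vec()
}

fn main() {
    let data = [9, 1, 8, 2, 7, 3, 6, 4, 5];
    assert_eq!(smallest_n(data, 3), vec![1, 2, 3]);
}
```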
@gankro Reduce priority value, then swap with parent if necessary to maintain heap invariant (repeat until you're at root, or the element is in the right place). That's O(log_d n), where d is the branching factor (i.e. higher branching factor means you don't need to do a lot of work for even huge heaps.. e.g. a billion elements for d=8 is only 10 levels deep).
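A sketch of that sift-up as code, assuming a min-heap of `u64` keys stored in a flat `Vec` with branching factor D (illustrative only):

```rust
/// Decrease the key stored at `pos` in a min-heap with branching factor D,
/// then restore the heap invariant by swapping with the parent as needed.
fn decrease_key<const D: usize>(heap: &mut Vec<u64>, mut pos: usize, new_key: u64) {
    assert!(new_key <= heap[pos], "key may only decrease");
    heap[pos] = new_key;
    while pos > 0 {
        let parent = (pos - 1) / D;
        if heap[parent] <= heap[pos] {
            break; // invariant restored
        }
        heap.swap(parent, pos);
        pos = parent;
    }
}

fn main() {
    // A valid binary (D = 2) min-heap.
    let mut heap = vec![1, 5, 3, 9, 6];
    // The element at index 3 becomes the new minimum.
    decrease_key::<2>(&mut heap, 3, 0);
    assert_eq!(heap[0], 0);
}
```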
@ssylvan Note that the constant factors grow with the branching factor; also, as your d approaches n, the operations become essentially linear.
@ssylvan I don't understand how you're finding the element. External users of the data structure can't just have pointers/indices, because those are constantly shifting without a node-based structure.
@gankro You'd need to have some kind of "stable handle" just like in the other heaps. The handle would just have an index into the array (and possibly a version number, for validation). The heap element in turn would have a pointer (or index) to the stable handle and update where the handle points whenever the element moves. This is all O(1).
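One possible shape for that indirection: a handle table mapping handle ids to current array positions, kept in sync on every swap (all names here are invented for illustration):

```rust
/// Heap entry: the key plus the id of its stable handle.
struct Entry {
    key: u64,
    handle: usize,
}

struct Heap {
    entries: Vec<Entry>,
    /// handle id -> current index of that element in `entries`.
    positions: Vec<usize>,
}

impl Heap {
    /// Swap two heap slots, keeping the handle table in sync. O(1).
    fn swap_entries(&mut self, a: usize, b: usize) {
        self.entries.swap(a, b);
        self.positions[self.entries[a].handle] = a;
        self.positions[self.entries[b].handle] = b;
    }

    /// Where does the element behind `handle` currently live?
    fn position_of(&self, handle: usize) -> usize {
        self.positions[handle]
    }
}

fn main() {
    let mut heap = Heap {
        entries: vec![
            Entry { key: 3, handle: 0 },
            Entry { key: 7, handle: 1 },
        ],
        positions: vec![0, 1],
    };
    heap.swap_entries(0, 1);
    // Handles still find their elements after the move.
    assert_eq!(heap.entries[heap.position_of(0)].key, 3);
    assert_eq!(heap.entries[heap.position_of(1)].key, 7);
}
```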
@llogiq Yes. A greater branching factor means a faster decrease_key (because there are fewer levels), but a more expensive delete_min, etc. That said, the actual cost you care about is cache misses, not comparisons, so you can pick a branching factor equal to cache_line_size/sizeof(T) without really affecting the constant factors.
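As a worked example of that rule of thumb, assuming 64-byte cache lines (typical, but hardware-dependent):

```rust
/// Branching factor chosen so one node's children fill a cache line.
const CACHE_LINE: usize = 64;

const fn branching_factor<T>() -> usize {
    CACHE_LINE / std::mem::size_of::<T>()
}

fn main() {
    assert_eq!(branching_factor::<u64>(), 8);  // 64 / 8
    assert_eq!(branching_factor::<u32>(), 16); // 64 / 4
}
```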
A container that provides stable handles to the elements stored inside seems to be the key building block here. It is interesting to ask, then, what kind of data structure could play that role.

Such a container should in general provide an interface supporting the following operations:

```rust
impl<T> Container<T> {
    ...
    pub fn push(&mut self, elem: T) -> Handle;
    pub fn remove(&mut self, handle: Handle) -> T;
    pub fn get(&self, handle: Handle) -> &T;
    pub fn get_mut(&mut self, handle: Handle) -> &mut T;
    ...
}
```

Looking at the existing design space, there seem to be two major approaches.

**Array with indexes as handles**

The simplest design would be to store elements inside a vector and make handles indices into it:

```rust
pub struct Container<T> {
    data: Vec<Entry<T>>,
    /// Head of free-list.
    free: usize,
    /// Number of occupied entries.
    size: usize,
}

enum Entry<T> {
    /// Empty space, with an index of the next free entry.
    Vacant(usize),
    /// Space used by an element.
    Occupied(T),
}

pub struct Handle(usize);
```

To validate an untrusted handle you would:

- check that the index is within the bounds of `data`, and
- check that the entry at that index is `Occupied`.

This is essentially the design used by Carl's slab allocator, and what Sebastian suggested. Note that this only validates the handle as far as memory safety is required. It does not by itself guarantee that the handle still refers to the element it was created for; version numbers, as Sebastian suggested, would be needed for that.

**Reference counted pointers**

Another idea is to use reference counted pointers, as suggested by Gankro:

```rust
pub struct Container<T> {
    head / tail: ... linked list of nodes ...
    nonce: Box<Nonce>,
}

pub struct Handle<T> {
    elem: Weak<Node<T>>,
    nonce: *const Nonce,
}
```

To validate an untrusted handle you would:

- check that `nonce` matches the address of the container's own nonce, and
- try to upgrade the `Weak` reference; failure means the element has already been removed.

The check that the handle is from the current container seems to be crucial if you want to allow more than one container to exist. Together these checks address both the case of elements already removed from the list and the case of a handle created by a different container.
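A sketch of how that validation might look with `Rc`/`Weak` plus a heap-allocated nonce (a `VecDeque` stands in for the linked list, and all names are illustrative):

```rust
use std::collections::VecDeque;
use std::rc::{Rc, Weak};

/// A non-zero-sized marker whose heap address identifies one container.
struct Nonce(#[allow(dead_code)] u8);

struct Container<T> {
    nodes: VecDeque<Rc<T>>, // stand-in for the linked list of nodes
    nonce: Box<Nonce>,
}

struct Handle<T> {
    elem: Weak<T>,
    nonce: *const Nonce,
}

impl<T> Container<T> {
    fn new() -> Self {
        Container { nodes: VecDeque::new(), nonce: Box::new(Nonce(0)) }
    }

    fn push(&mut self, value: T) -> Handle<T> {
        let rc = Rc::new(value);
        let handle = Handle { elem: Rc::downgrade(&rc), nonce: &*self.nonce };
        self.nodes.push_back(rc);
        handle
    }

    /// Validate a handle: it must come from *this* container and the
    /// element must not have been removed yet.
    fn get(&self, handle: &Handle<T>) -> Option<Rc<T>> {
        if handle.nonce != (&*self.nonce as *const Nonce) {
            return None; // handle belongs to a different container
        }
        handle.elem.upgrade() // None if the element was already removed
    }
}

fn main() {
    let mut a = Container::new();
    let mut b = Container::new();
    let ha = a.push(1);
    let hb = b.push(2);
    assert!(a.get(&ha).is_some());
    assert!(a.get(&hb).is_none()); // wrong container
    a.nodes.clear();
    assert!(a.get(&ha).is_none()); // element removed
}
```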
So I wrote https://github.com/apopiak/hollow_heap and I'd like to get a code review / know how to polish it so it might be a contender for contain-rs.
One that lets you modify the priority of keys.
My academic background tells me that a pairing heap is excellent, but I've also heard good things about meldable heaps?
Of note is a nasty API issue around how you specify that a key is to be reprioritized.
My intuition is that `push` returns a token, and then you call `increase_key(&token, new_weight)` or whatever. The token is a pointer to the node that contains the element, but this has a fundamental memory safety issue (the element associated with the token may have been popped).
One solution is to just make decrease/increase unsafe.
The other is to use `Rc`s internally, and have tokens be `Weak` pointers.
Neither is super great, honestly.