Implement an algorithm to make this crate no_std, take 2 #33
Conversation
```rust
unsafe {
    // Initialize the node.
    node_ptr.write(Node::new(listener));
```
Why isn't `.assume_init()` used here?
I bypass it by just casting `node.get()` into a `*mut Node`, which allows me to write the `Node` straight into the slot without worrying about `MaybeUninit`.
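For illustration, here is a minimal standalone sketch of that pattern with stand-in types (not the crate's actual definitions): because `MaybeUninit<T>` is guaranteed to have the same layout as `T`, the `UnsafeCell`'s pointer can be cast straight to `*mut Node` and written through.

```rust
use std::cell::UnsafeCell;
use std::mem::MaybeUninit;
use std::ptr::NonNull;

// Stand-ins for the real types.
struct Listener;
struct Node {
    listener: Listener,
}

struct Slot {
    node: UnsafeCell<MaybeUninit<Node>>,
}

impl Slot {
    /// Write a node into the slot and return a pointer to it.
    ///
    /// Safety: the caller must guarantee the slot is not already in use.
    unsafe fn install(&self, listener: Listener) -> NonNull<Node> {
        // `MaybeUninit<Node>` has the same layout as `Node`, so the cast is sound,
        // and `write` initializes the slot without reading or dropping the old
        // (uninitialized) contents.
        let node_ptr = self.node.get() as *mut Node;
        node_ptr.write(Node { listener });
        NonNull::new_unchecked(node_ptr)
    }
}
```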
```rust
    ///
    /// This is kept synchronized by the `state` variable; therefore, it's
    /// technically `Sync`.
    waker: Cell<Wakeup>,
```
Wouldn't it make more sense to use `UnsafeCell` here?
I originally used `UnsafeCell`, but given that most of the semantics involved are just "replace the waker with something else", I ended up using `Cell` instead.

I tried replacing it with `UnsafeCell<MaybeUninit<Wakeup>>` (and also getting rid of the `Wakeup::None` variant), but any gains were minimal at best. I figure it's better to just use `Cell` and avoid the unsafe code.
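A rough sketch of those "replace the waker with something else" semantics, using a simplified stand-in `Wakeup` enum (the cross-thread synchronization that makes this sound in the crate, i.e. the `state` variable mentioned in the doc comment, is omitted here):

```rust
use std::cell::Cell;
use std::task::Waker;

// Simplified stand-in for the crate's `Wakeup` type.
enum Wakeup {
    None,
    Waker(Waker),
}

struct Listener {
    waker: Cell<Wakeup>,
}

impl Listener {
    /// Store a new waker, dropping whatever was there before.
    fn register(&self, waker: Waker) {
        self.waker.set(Wakeup::Waker(waker));
    }

    /// Take the stored waker out (leaving `None` behind) and wake it.
    fn notify(&self) {
        if let Wakeup::Waker(waker) = self.waker.replace(Wakeup::None) {
            waker.wake();
        }
    }
}
```

Every access is a whole-value swap, which `Cell::replace` and `Cell::set` express without any unsafe code.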
src/queue.rs (Outdated)
```rust
/// Orphan a node in the queue.
///
/// The return value is `Some` if the node was notified, and `true` if the
/// notification was an additional notification.
pub(crate) unsafe fn orphan(&self, node: NonNull<Node>) -> Option<bool> {
    let (needs_drop, notified) = unsafe { node.as_ref().listener.orphan() };

    // If we need to deallocate the node, do so.
    if needs_drop {
        unsafe { self.dealloc(node) };
    }

    notified
}

/// Allocate a new node for a listener.
fn alloc(&self, listener: Listener) -> NonNull<Node> {
    // Try to install a cached node.
    if !self.cache.occupied.swap(true, Ordering::Acquire) {
        // We can now initialize the node.
        let node_ptr = self.cache.node.get() as *mut Node;

        unsafe {
            // Initialize the node.
            node_ptr.write(Node::new(listener));

            // Return the node.
            NonNull::new_unchecked(node_ptr)
        }
    } else {
        // We failed to install a cached node, so we need to allocate a new
        // one.
        unsafe { NonNull::new_unchecked(Box::into_raw(Box::new(Node::new(listener)))) }
    }
}

/// Deallocate a node.
unsafe fn dealloc(&self, node: NonNull<Node>) {
    // Is this node the current cache node?
    if ptr::eq(self.cache.node.get() as *const Node, node.as_ptr()) {
        // We can now clear the cache.
        unsafe { (self.cache.node.get() as *mut Node).drop_in_place() };
        self.cache.occupied.store(false, Ordering::Release);
    } else {
        // We need to deallocate the node on the heap.
        unsafe { Box::from_raw(node.as_ptr()) };
    }
}
```
These methods (`alloc`, `dealloc`, `orphan`) appear to only act on the `CachedNode`, and IMO should be moved into an `impl` block associated with that. (And if these methods are still necessary on the queue, they should be replaced with forwarding methods.)
I've moved `alloc` and `dealloc`, but `orphan` is, semantically, an important operation on the `Queue`, so I've kept it there.
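A rough sketch of what that split might look like (the types below are simplified stand-ins rather than the crate's actual definitions): the cache owns allocation and deallocation, and `Queue` just forwards to it.

```rust
use std::cell::UnsafeCell;
use std::mem::MaybeUninit;
use std::ptr::NonNull;
use std::sync::atomic::{AtomicBool, Ordering};

// Simplified stand-ins for the real types.
struct Listener;
struct Node {
    listener: Listener,
}

/// A single cached slot with a heap fallback.
struct CachedNode {
    occupied: AtomicBool,
    node: UnsafeCell<MaybeUninit<Node>>,
}

impl CachedNode {
    /// Allocate a node, preferring the cached slot.
    fn alloc(&self, listener: Listener) -> NonNull<Node> {
        if !self.occupied.swap(true, Ordering::Acquire) {
            // Claimed the cached slot: initialize it in place.
            let node_ptr = self.node.get() as *mut Node;
            unsafe {
                node_ptr.write(Node { listener });
                NonNull::new_unchecked(node_ptr)
            }
        } else {
            // Slot is busy: fall back to the heap.
            unsafe { NonNull::new_unchecked(Box::into_raw(Box::new(Node { listener }))) }
        }
    }

    /// Deallocate a node previously returned by `alloc`.
    unsafe fn dealloc(&self, node: NonNull<Node>) {
        if std::ptr::eq(self.node.get() as *const Node, node.as_ptr()) {
            // It lives in the cached slot: drop it and release the slot.
            (self.node.get() as *mut Node).drop_in_place();
            self.occupied.store(false, Ordering::Release);
        } else {
            // It lives on the heap.
            drop(Box::from_raw(node.as_ptr()));
        }
    }
}

struct Queue {
    cache: CachedNode,
}

impl Queue {
    /// Forwarding method: allocation is really the cache's concern.
    fn alloc(&self, listener: Listener) -> NonNull<Node> {
        self.cache.alloc(listener)
    }

    /// Forwarding method for deallocation as well.
    unsafe fn dealloc(&self, node: NonNull<Node>) {
        self.cache.dealloc(node)
    }
}
```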
```rust
/// A wrapper around an `Arc` that is initialized atomically in a racy manner.
#[derive(Default)]
pub(crate) struct RacyArc<T> {
```
How big is the difference between this and `OnceCell` impls?
Two main differences:

1. While the once cells are synchronized, this one is racy: the once cells only run one closure at a time, whereas this one runs all closures at once and keeps the value from whichever one finishes first.
2. It's only one machine word in length, while once cells are usually at least three.

In addition, since our once cells are backed by this crate, we can't use once cells here.
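A rough sketch of what such a one-word racy cell looks like (this illustrates the idea only and is not the crate's actual `RacyArc` implementation; a real version would also need a `Drop` impl and may avoid some of the `std` conveniences used here): the entire state is a single `AtomicPtr`, and when several threads race to initialize it, the first `compare_exchange` wins and the losers drop the value they built.

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};
use std::sync::Arc;

/// One-word, racily-initialized `Arc` holder (illustrative sketch).
struct RacyArc<T> {
    ptr: AtomicPtr<T>,
}

impl<T> RacyArc<T> {
    const fn new() -> Self {
        RacyArc {
            ptr: AtomicPtr::new(ptr::null_mut()),
        }
    }

    /// Get the value, initializing it with `init` if it is not set yet.
    ///
    /// Several threads may run `init` concurrently; only the first result
    /// is kept, and the others are dropped.
    fn get_or_init(&self, init: impl FnOnce() -> Arc<T>) -> Arc<T> {
        let mut ptr = self.ptr.load(Ordering::Acquire);
        if ptr.is_null() {
            // Build a candidate value and try to install it.
            let new = Arc::into_raw(init()) as *mut T;
            match self.ptr.compare_exchange(
                ptr::null_mut(),
                new,
                Ordering::AcqRel,
                Ordering::Acquire,
            ) {
                Ok(_) => ptr = new,
                Err(existing) => {
                    // Lost the race: drop our candidate and use the winner's value.
                    unsafe { drop(Arc::from_raw(new)) };
                    ptr = existing;
                }
            }
        }
        // Hand out a new `Arc` handle without consuming the stored one.
        // NOTE: a real implementation also needs `Drop` to release the stored `Arc`.
        unsafe {
            Arc::increment_strong_count(ptr);
            Arc::from_raw(ptr)
        }
    }
}
```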
JFYI, the once_cell crate also provides a racy "OnceCell": https://docs.rs/once_cell/latest/once_cell/race/struct.OnceNonZeroUsize.html
We're trying to avoid the `once_cell` crate due to its recent MSRV bump, see smol-rs/async-io#93.
I'd like to ask about the performance of this: in the past I implemented https://github.com/zseri/revenq, which I originally based on a similar pattern, and hit a huge perf bottleneck. So I have to ask: how much does this get "hammered" during its lifetime, or the lifetime of the chain?
After initialization (which is a one-time process), loads from the `AtomicPtr` are a simple atomic load. A `OnceCell` boils down to a load on its fast path as well, so I doubt there's much overhead aside from at initialization time. At worst, this leads to the initialization code being run once per thread.

I could replace it with a proper once cell to see what happens, but I doubt it would make a difference. Besides, this is what this code used prior to this change.
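To make the steady-state cost concrete, the read path of the illustrative `RacyArc` sketch shown earlier in this thread is just an acquire load plus a null check (again, this is not the crate's exact code):

```rust
impl<T> RacyArc<T> {
    /// Fast path once initialization has happened: a single acquire load
    /// and a null check, with no locking and no closure involved.
    fn get(&self) -> Option<Arc<T>> {
        let ptr = self.ptr.load(Ordering::Acquire);
        if ptr.is_null() {
            None
        } else {
            unsafe {
                Arc::increment_strong_count(ptr);
                Some(Arc::from_raw(ptr))
            }
        }
    }
}
```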
src/queue.rs (Outdated)
```rust
// We can now initialize the node.
let node_ptr = self.node.get() as *mut Node;

unsafe {
    // Initialize the node.
    node_ptr.write(Node::new(listener));

    // Return the node.
    NonNull::new_unchecked(node_ptr)
}
```
I don't like this part at all, especially since it appears to misuse `MaybeUninit` and/or `UnsafeCell` somewhat. How about:
```rust
unsafe {
    let node_ptr: &mut MaybeUninit<_> = &mut *self.node.get();
    // Initialize the node
    node_ptr.write(Node::new(listener));
    // Return the node
    node_ptr.assume_init_mut().into()
}
```
I've replaced the node cast with an `as_mut_ptr()` usage. Unfortunately, I can't use `MaybeUninit::write()` and `MaybeUninit::assume_init_mut()`, since they weren't stabilized until Rust 1.55, which would lead to a significant MSRV bump.
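Continuing the stand-in `Slot` sketch from earlier in this thread, the MSRV-friendly shape keeps the `MaybeUninit` visible and goes through `as_mut_ptr()` (stable since Rust 1.36) instead of `write()`/`assume_init_mut()`; the method name below is illustrative, not the crate's actual code:

```rust
impl Slot {
    /// Same as `install` above, but routed through `MaybeUninit::as_mut_ptr()`,
    /// which is available on compilers older than Rust 1.55.
    ///
    /// Safety: the caller must guarantee the slot is not already in use.
    unsafe fn install_msrv_friendly(&self, listener: Listener) -> NonNull<Node> {
        // `as_mut_ptr()` hands back `*mut Node` without assuming initialization.
        let node_ptr = (*self.node.get()).as_mut_ptr();
        node_ptr.write(Node { listener });
        NonNull::new_unchecked(node_ptr)
    }
}
```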
Closing in favor of #34
I'm not satisfied with the direction #30 has taken, so I've written this PR. It's less destructive, less verbose, and faster than the previous one. I haven't implemented `portable_atomic` or `loom` support here; I'll do that in future PRs.

In this PR, rather than using a `Mutex` to wrap an unsync linked list (what's currently on `master`) or using `ConcurrentQueue` (as in #30), I use an atomic singly linked list to store listeners (a generic sketch of such a list follows at the end of this description). Similarly to #30, listeners in this list can be marked as "orphaned" to indicate that they should not be notified (e.g. when the `EventListener` is dropped). As opposed to #30, it only exhibits a 20% slowdown compared to `master` on my PC. With the increased number of atomic operations involved, I think that is about as low as we can go without using a locking mechanism of some kind.

Future work:

- `portable_atomic` and `loom` support.
- `RacyArc` may be worth splitting off into another crate; the interior mutability in `Listener` can be resolved using `crossbeam_utils::AtomicCell<Wakeup>` or similar, and the linked list could probably be emulated using `AtomicCell` and `Arc` as well. The main concern is that we would have to stop using the cache in `ListenerQueue`, and this (among other things) would likely cause significant performance issues (`Arc` is expensive).
- Keeping the implementation currently on the `master` branch, but replacing the `Mutex` with a spinlock. `notify` and `listen` both try to take the spinlock. If `notify` sees the spinlock held, it places the notification counts in an atomic variable, and whoever holds the spinlock performs the notification once they're about to drop the lock. If `listen` (or any of `EventListener`'s methods) sees the spinlock held, it puts its `Waker` (or `Unparker`) into a short-term `ConcurrentQueue`, which is then used to set up listeners once the current holder is about to drop the lock. This may be a better way of doing it, but I don't know whether it would be faster or slower.
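Since the description only covers the list at a high level, here is a minimal, generic sketch of the kind of atomic singly linked list (Treiber-style push) it refers to. The names and structure are illustrative, not the PR's actual code, and memory reclamation and the notification traversal are left out.

```rust
use std::ptr;
use std::sync::atomic::{AtomicBool, AtomicPtr, Ordering};

/// One listener entry in the list.
struct Node {
    /// Next node in the singly linked list.
    next: AtomicPtr<Node>,
    /// Set when the listener is dropped before being notified,
    /// so that notification passes over it.
    orphaned: AtomicBool,
}

/// Head of the atomic singly linked list.
struct List {
    head: AtomicPtr<Node>,
}

impl List {
    const fn new() -> Self {
        List {
            head: AtomicPtr::new(ptr::null_mut()),
        }
    }

    /// Push a new listener node onto the front of the list without locking.
    /// (Freeing the node again is out of scope for this sketch.)
    fn push(&self) -> *mut Node {
        let node = Box::into_raw(Box::new(Node {
            next: AtomicPtr::new(ptr::null_mut()),
            orphaned: AtomicBool::new(false),
        }));

        // Classic compare-exchange loop: point the new node at the current
        // head, then try to swing the head over to the new node.
        let mut head = self.head.load(Ordering::Acquire);
        loop {
            unsafe { (*node).next.store(head, Ordering::Relaxed) };
            match self
                .head
                .compare_exchange_weak(head, node, Ordering::AcqRel, Ordering::Acquire)
            {
                Ok(_) => return node,
                Err(actual) => head = actual,
            }
        }
    }

    /// Mark a node as orphaned so that notification skips it.
    fn orphan(node: *mut Node) {
        unsafe { (*node).orphaned.store(true, Ordering::Release) };
    }
}
```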