Add timeout to UCXX generic operations (#2398)
rapidsai/ucxx#238 introduced a new timeout argument for `registerGeneric{Pre,Post}` that prevents blocking indefinitely when there are no UCX worker progress wakeup events. This commit passes a 10 ms timeout (`10000000` ns) to both calls. This should also result in new RAFT packages with updated symbols.
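The effect of such a timeout can be illustrated with a toy progress thread. This is a minimal sketch, assuming the timeout is expressed in nanoseconds (as the `10000000 /* 10ms */` argument in the diff suggests) and that a timed-out registration simply returns `false`. `ProgressThread`, `registerGeneric` and `setPaused` are hypothetical names, not the UCXX API.

```cpp
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a UCX worker progress thread; NOT the UCXX API.
class ProgressThread {
 public:
  ProgressThread() : thread_([this] { run(); }) {}

  ~ProgressThread() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stop_ = true;
    }
    wake_.notify_one();
    thread_.join();  // callbacks still queued at shutdown are dropped
  }

  // Queue `callback` for the progress thread and wait at most `timeoutNs`
  // nanoseconds for it to finish. Returns true if it ran in time, false if
  // the wait timed out. On timeout the callback stays queued and may still
  // run later, so it must not capture the caller's stack by reference.
  bool registerGeneric(std::function<void()> callback, std::uint64_t timeoutNs) {
    auto state = std::make_shared<Completion>();
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.push_back([callback = std::move(callback), state] {
        callback();
        {
          std::lock_guard<std::mutex> doneLock(state->mutex);
          state->done = true;
        }
        state->cv.notify_all();
      });
    }
    wake_.notify_one();

    std::unique_lock<std::mutex> lock(state->mutex);
    return state->cv.wait_for(lock, std::chrono::nanoseconds(timeoutNs),
                              [&state] { return state->done; });
  }

  // Simulates a stalled progress thread that receives no wakeup events.
  void setPaused(bool paused) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      paused_ = paused;
    }
    wake_.notify_one();
  }

 private:
  struct Completion {
    std::mutex mutex;
    std::condition_variable cv;
    bool done = false;
  };

  void run() {
    std::unique_lock<std::mutex> lock(mutex_);
    while (true) {
      wake_.wait(lock, [this] { return stop_ || (!paused_ && !queue_.empty()); });
      if (stop_) return;
      auto task = std::move(queue_.front());
      queue_.pop_front();
      lock.unlock();
      task();  // run the callback outside the queue lock
      lock.lock();
    }
  }

  std::mutex mutex_;
  std::condition_variable wake_;
  std::deque<std::function<void()>> queue_;
  bool paused_ = false;
  bool stop_ = false;
  std::thread thread_;  // declared last so it starts after the members above
};
```

Without the timeout, a caller waiting on a paused (wakeup-starved) thread would block forever; with a 10 ms budget it returns `false` and can try again later.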

Authors:
  - Peter Andreas Entschev (https://github.com/pentschev)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #2398
pentschev authored Jul 25, 2024
1 parent e75bac6 commit 7bffdac
Showing 1 changed file with 6 additions and 3 deletions: cpp/include/raft/comms/detail/std_comms.hpp
@@ -307,13 +307,16 @@ class std_comms : public comms_iface {
 bool restart = false; // resets the timeout when any progress was made
 
 if (worker->isProgressThreadRunning()) {
-  // Wait for a UCXX progress thread roundtrip
+  // Wait for a UCXX progress thread roundtrip, prevent waiting for longer
+  // than 10ms for each operation, will retry in next iteration.
   ucxx::utils::CallbackNotifier callbackNotifierPre{};
-  worker->registerGenericPre([&callbackNotifierPre]() { callbackNotifierPre.set(); });
+  worker->registerGenericPre([&callbackNotifierPre]() { callbackNotifierPre.set(); },
+                             10000000 /* 10ms */);
   callbackNotifierPre.wait();
 
   ucxx::utils::CallbackNotifier callbackNotifierPost{};
-  worker->registerGenericPost([&callbackNotifierPost]() { callbackNotifierPost.set(); });
+  worker->registerGenericPost([&callbackNotifierPost]() { callbackNotifierPost.set(); },
+                              10000000 /* 10ms */);
   callbackNotifierPost.wait();
 } else {
   // Causes UCXX to progress through the send/recv message queue
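The new comment's "will retry in next iteration", together with the existing `restart` flag, implies an outer loop around this hunk that the diff does not show. A minimal sketch of such a loop, assuming each iteration performs one bounded (at most 10 ms) progress roundtrip and that an overall timeout is reset whenever progress is made. `waitAllWithTimeout`, `allDone` and `progressOnce` are hypothetical names, not RAFT's code.

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper (not RAFT's implementation): drive pending operations to
// completion by calling `progressOnce` repeatedly. Each call is expected to do
// one bounded wait (for example at most 10 ms) and report whether any
// operation completed. The overall timeout restarts whenever progress is made.
bool waitAllWithTimeout(const std::function<bool()>& allDone,
                        const std::function<bool()>& progressOnce,
                        std::chrono::milliseconds overallTimeout) {
  using Clock = std::chrono::steady_clock;
  auto start = Clock::now();
  while (!allDone()) {
    bool restart = progressOnce();  // one bounded progress roundtrip
    if (restart) {
      start = Clock::now();  // progress was made: reset the timeout
    } else if (Clock::now() - start > overallTimeout) {
      return false;  // no progress for too long: give up instead of hanging
    }
  }
  return true;
}
```

Bounding each roundtrip keeps a single stalled wait from defeating the overall timeout check, which can only run between iterations.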
