
Add marl::DAG - a AoT declarative task graph #167

Merged · 1 commit merged into google:main on Jul 10, 2020

Conversation

ben-clayton (Contributor):
No description provided.

Review thread on:

        notify(ctx, node->outs[0]);
        break;
      default: {
        // 2 or more dependees.

benvanik:
Why not notify all here? Wouldn't you want the fan-out to be able to start all dependees immediately?

ben-clayton (Contributor, author):

> why not notify all here

All are being notified, but concurrently.

> wouldn't you want the fan-out to be able to start all dependees immediately

If "immediately" means "on this thread", the reason is simple: notify() may call invoke(), which blocks until the work is finished. We don't want to run all the dependees serially.

benvanik (Jul 8, 2020):

Ahh that makes sense!

Though notify() may also return early in the case of a join, and each dependee's notify() will schedule() a task that pins a fiber waiting on the waitgroup. If the dependee's notify() doesn't return early and instead invokes, then the waitgroup will also block - so if A -> B | C, the waitgroup won't complete until max(B, C) (which, if B and C are of vastly different durations, will be bad).

ben-clayton (author):

Took me a while to understand what you're saying - yes - this is correct. In other words:

The wg.wait() will block until all dependees have entirely finished. There's the potential for a fiber switch for every node that fans out. The implementation - schedule the other dependees, call notify() on this fiber, then wg.wait() - is an attempt to avoid an unnecessary fiber switch. It picks an arbitrary dependee to call notify() on directly; we could be smarter here, say by picking the dependee with the most child nodes, but that ups the complexity a bit.

> then the waitgroup won't complete until max(B, C) (which, if B and C are of vastly different durations will be bad).

Why is it bad? The node has to wait for all dependees to finish, and the call to wait() will either be a no-op if the schedule()s have already finished, or a fiber switch to go and help out with the tasks.
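A minimal sketch of the schedule-then-notify-then-wait pattern being discussed (illustrative only - node->outs, notify(), and the per-node WaitGroup are taken from the snippets in this review, not marl's exact internals):

    // Fan-out with 2+ dependees: schedule all but one, run one inline on
    // this fiber, then wait for the scheduled ones to finish.
    marl::WaitGroup wg(node->outs.size() - 1);
    for (size_t i = 1; i < node->outs.size(); i++) {
      auto out = node->outs[i];
      marl::schedule([=] {
        notify(ctx, out);  // may invoke() the dependee, running it to completion
        wg.done();
      });
    }
    notify(ctx, node->outs[0]);  // arbitrary dependee, run on this fiber
    wg.wait();  // no-op if already done, else fiber-switch to help with the tasks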

ben-clayton (author, Jul 9, 2020):

Thinking about this some more, I think I see what you're getting at. I can do the dependee-count decrement and test before the marl::schedule() call. That way we don't schedule a whole load of tiny tasks that just decrement an atomic, for the fan-in case.
I'll do this tomorrow morning.
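A sketch of that suggested change (hypothetical names - a per-node atomic counter of unfinished dependencies is assumed):

    // Decrement each dependee's remaining-dependency counter *before*
    // scheduling, so only dependees that are actually ready get a task.
    for (auto out : node->outs) {
      // fetch_sub returns the previous value; 1 means this was the last
      // dependency the dependee was waiting on.
      if (out->remaining.fetch_sub(1, std::memory_order_acq_rel) == 1) {
        marl::schedule([=] { invoke(ctx, out); });
      }
      // Otherwise: no task is scheduled; a later-finishing dependency
      // will observe the counter hit zero and kick the node.
    }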

benvanik:

Sorry, it took me a while to formulate even that :P but I think you got it: trying to avoid creating additional no-op work (the schedule()s/done()s of non-ready work) while also trying to minimize the lifetime of all tasks, so that as few fibers as possible are left waiting.

IIUC, I could have a chain of N nodes where, by construction, I end up with N-1 live fibers sitting at wg.wait() as a depth-first style notify-and-invoke-inline chain continues along. Since the contract for the DAG is really just that all dependees are notified and allowed to proceed - not that all dependees need to have completed execution - that feels suboptimal. Mapping this to std::condition_variable, I'd want to perform a notify_all() here and then immediately return, not caring who was woken or how long they took to do their work (or whether they themselves performed additional notifications).

I think your suggestion may solve this (if the wg goes away), and exploits the fact that you know what you are notifying (other DAG nodes within the same DAG) and can use that information freely.

ben-clayton (author):

> IIUC, I could have a chain of N nodes where, by construction, I end up with N-1 live fibers sitting at wg.wait() as a depth-first style notify-and-invoke-inline chain continues along. Since the contract for the DAG is really just that all dependees are notified and allowed to proceed - not that all dependees need to have completed execution - that feels suboptimal.

Right - gotcha. You're spot on. I've added some follow-up changes that should dramatically reduce the number of fiber switches. Please take a look.

> I think your suggestion may solve this (if the wg goes away), and exploits the fact that you know what you are notifying (other DAG nodes within the same DAG) and can use that information freely.

So we still have one WaitGroup, but it is shared by the entire DAG execution.
DAG::run() still needs to wait until all the nodes have been processed, and we definitely want to process fan-outs concurrently, so a synchronisation primitive is required. Fortunately, because each node does its work before kicking its children, and the node doesn't need to wait on the children to finish (with the exception of the root), we can drop the wg.wait() from each of the nodes.
We still need one call to wg.wait() in DAG::run() so we don't return from run() before the graph has fully processed.
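A rough sketch of that shape (illustrative, not the PR's exact code - RunContext and the exact counting scheme are assumptions):

    // One WaitGroup shared by the whole DAG run: nodes add/done as they
    // are kicked and finish; only DAG::run() ever waits.
    void DAG::run(Data& data) {
      marl::WaitGroup wg(1);    // account for the root node
      RunContext ctx{&data, wg};
      invoke(&ctx, root);       // invoke() calls wg.add(1) for each dependee
                                // it schedules, and wg.done() when its own
                                // node's work completes
      wg.wait();                // return only once the graph has fully processed
    }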

This is much nicer. Thank you!

benvanik:

Looks awesome!

Review thread on:

    auto a0 = root.then([](Data& data) { data.push("A0"); });
    auto a1 = root.then([](Data& data) { data.push("A1"); });

    auto b = builder.node([](Data& data) { data.push("B"); }, {a0, a1});

Reviewer:
This is really nice!
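For context, a fuller usage sketch of this builder API. The DAGBuilder, root(), build(), and run() entry points follow this PR's tests, but treat the details (the Data type, the scheduler setup) as illustrative assumptions:

    #include "marl/dag.h"
    #include "marl/defer.h"
    #include "marl/scheduler.h"

    #include <mutex>
    #include <string>
    #include <vector>

    // Stand-in for the tests' Data type: an order log that is safe to
    // push to from concurrently-running nodes.
    struct Data {
      std::mutex mutex;
      std::vector<std::string> order;
      void push(const char* s) {
        std::lock_guard<std::mutex> lock(mutex);
        order.push_back(s);
      }
    };

    int main() {
      marl::Scheduler scheduler(marl::Scheduler::Config::allCores());
      scheduler.bind();
      defer(scheduler.unbind());

      marl::DAGBuilder<Data&> builder;
      auto root = builder.root();
      // A0 and A1 fan out from the root and may run concurrently...
      auto a0 = root.then([](Data& data) { data.push("A0"); });
      auto a1 = root.then([](Data& data) { data.push("A1"); });
      // ...and B fans in: it runs only after both A0 and A1 complete.
      builder.node([](Data& data) { data.push("B"); }, {a0, a1});

      auto dag = builder.build();
      Data data;
      dag->run(data);  // blocks until the whole graph has executed
      // data.order now holds "A0" and "A1" (in either order), then "B".
    }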


Review thread on:

    // Run this node's work.
    if (node->work) {
      ctx->invoke(node->work);

Reviewer:
What if the work enqueues tasks of its own? Would the work just use a waitgroup / parallelize like normal (in which case invoke here wouldn't return until the join completed)? That seems pretty nice.

ben-clayton (author):

Yup. The work function is free to schedule new tasks and wait on them if it so wishes. The contract is simple: once the work function returns, the DAG node is complete and its dependees will execute as soon as possible. If you want to run sub-DAGs from a work function, you can.
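For example, a node's work can fan out its own tasks and join them before returning (a hedged sketch - marl::schedule, marl::WaitGroup, and defer are marl primitives; the node body is illustrative):

    // A DAG node whose work parallelizes internally. The node is only
    // considered complete once this lambda returns, i.e. after wg.wait().
    auto node = root.then([](Data& data) {
      constexpr int numTasks = 8;
      marl::WaitGroup wg(numTasks);
      for (int i = 0; i < numTasks; i++) {
        marl::schedule([=] {
          defer(wg.done());
          // ... do one slice of this node's work ...
        });
      }
      wg.wait();  // dependees are not notified until this returns
      data.push("done");
    });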

@ben-clayton changed the title from "WIP: Add marl::DAG - a AoT declarative task graph" to "Add marl::DAG - a AoT declarative task graph" on Jul 9, 2020
@ben-clayton marked this pull request as ready for review on July 9, 2020 at 11:15
Review thread on:

    @@ -203,7 +226,7 @@ std::shared_ptr<T> Allocator::make_shared(ARGS&&... args) {
       auto alloc = allocate(request);
       new (alloc.ptr) T(std::forward<ARGS>(args)...);
    -  return std::shared_ptr<T>(reinterpret_cast<T*>(alloc.ptr), Deleter{this});
    +  return std::shared_ptr<T>(reinterpret_cast<T*>(alloc.ptr), Deleter{this, 1});

Reviewer:
I had not noticed this before, but: std::shared_ptr<> typically allocates more memory internally (its control block), so you may end up with a heap allocation that is not caught by your allocator. Not sure how much this matters, but it's something I noticed.

ben-clayton (author):

Great catch. Yes - I hadn't thought about this either, but you're right. Unfortunately there's no make_shared that takes an allocator. Grr - looks like I'm going to have to roll my own ref-counted types.

Mentioning #131 so this isn't forgotten.

ben-clayton (author):

@AWoloszyn just pointed me at std::allocate_shared which looks like it might do exactly what we need. Thanks Andrew!
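For reference, std::allocate_shared takes a type satisfying the standard Allocator requirements, so a thin adapter over marl::Allocator would route both the object and the shared_ptr control block through it. A hedged sketch (the adapter is hypothetical; marl::Allocator's allocate()/free() and the Allocation::Request fields are assumed from marl/memory.h):

    #include <cstddef>
    #include <memory>
    #include "marl/memory.h"

    // Hypothetical adapter satisfying the C++ Allocator requirements,
    // routing all allocations through a marl::Allocator.
    template <typename T>
    struct StlAllocator {
      using value_type = T;

      marl::Allocator* allocator;

      explicit StlAllocator(marl::Allocator* a) : allocator(a) {}
      template <typename U>
      StlAllocator(const StlAllocator<U>& o) : allocator(o.allocator) {}

      T* allocate(size_t n) {
        marl::Allocation::Request request;
        request.size = sizeof(T) * n;
        request.alignment = alignof(T);
        return reinterpret_cast<T*>(allocator->allocate(request).ptr);
      }
      void deallocate(T* ptr, size_t n) {
        marl::Allocation allocation;
        allocation.ptr = ptr;
        allocation.request.size = sizeof(T) * n;
        allocation.request.alignment = alignof(T);
        allocator->free(allocation);
      }
    };

    template <typename T, typename U>
    bool operator==(const StlAllocator<T>& a, const StlAllocator<U>& b) {
      return a.allocator == b.allocator;
    }
    template <typename T, typename U>
    bool operator!=(const StlAllocator<T>& a, const StlAllocator<U>& b) {
      return a.allocator != b.allocator;
    }

    // Usage: both T and the control block come from 'allocator'.
    // auto ptr = std::allocate_shared<T>(StlAllocator<T>(allocator), args...);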

@ben-clayton merged commit 3448974 into google:main on Jul 10, 2020
@ben-clayton deleted the dag branch on July 10, 2020 at 19:46