refactor: use listen and trigger universally #164

enigbe · 2024-01-27T07:47:37Z

What this PR does

Listens for shutdown trigger across all channels
Trigger shutdown when tasks exit with error
Documentation update

Related Issue(s)

Resolves Robustness: Use Listen and Shutdown Trigger Universally #162

Notes

Branched off refactor-dispatch given its advanced state of review.

carlaKC

Thanks for the PR! A few structural/stylistic comments on first pass.

One high-level question I'm wondering about is whether we can simplify this by always triggering shutdown whenever we break out of one of the loop/select patterns in the code. Right now we have to trigger shutdown inline, but if the expected behavior is to break the loop on error, we could just have a single trigger on break? Perhaps with the exception of listener triggering, because we know we can just return there.

Update: see comment below.

I think that could make the shutdown logic (which is a nightmare right now) simpler to understand, but I haven't looked at whether it works in every instance across the codebase. Interested to hear thoughts on this from others!

carlaKC · 2024-01-31T13:47:15Z

Discussed this PR a little more offline. Another option for a cleaner/more testable solution would be:

* Refactor to return `Result` from functions that can trigger shutdown
* Call `trigger` at the spawn site rather than inside of the function

This saves us from having to pass shutdown all the way down to every task, and cuts down on the number of places where we need to call trigger. Also has the benefit of making some of these functions more testable, because we can assert on return values.

E.g., for our simulation results task:

        tasks.spawn(async move {
            if let Err(e) =
                produce_simulation_results(nodes, output_receiver, results_sender, listener_results)
                    .await
            {
                shutdown.trigger();
                log::error!("produce simulation results exited with error: {e:?}.");
            }
        });
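A minimal std-only sketch of that shape, for illustration only. It assumes a hypothetical `Shutdown` type (an `Arc<AtomicBool>`) standing in for the `triggered` crate's `Trigger`/`Listener`, and OS threads instead of tokio tasks; `produce_results`, `spawn_task` and the error variant are made-up names. The task body only returns a `Result`, and the spawn site is the only place that calls `trigger()`.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for a (Trigger, Listener) pair; sim-ln itself uses
/// the `triggered` crate, which this sketch does not depend on.
#[derive(Clone, Default)]
struct Shutdown(Arc<AtomicBool>);

impl Shutdown {
    fn trigger(&self) {
        self.0.store(true, Ordering::SeqCst);
    }

    fn is_triggered(&self) -> bool {
        self.0.load(Ordering::SeqCst)
    }
}

/// Hypothetical error type for this sketch.
#[derive(Debug)]
enum SimulationError {
    ChannelClosed,
}

/// Task body: reports failure only through its return value and never touches
/// the shutdown trigger, so tests can assert on the `Result` directly.
fn produce_results(fail: bool) -> Result<(), SimulationError> {
    if fail {
        return Err(SimulationError::ChannelClosed);
    }
    Ok(())
}

/// Spawn site: the only place that turns a task error into a shutdown.
fn spawn_task(fail: bool, shutdown: Shutdown) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        if let Err(e) = produce_results(fail) {
            shutdown.trigger();
            eprintln!("produce results exited with error: {e:?}.");
        }
    })
}
```

Because `produce_results` never sees the trigger, a unit test can simply assert `produce_results(true).is_err()` without setting up any shutdown plumbing.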

carlaKC · 2024-02-02T16:19:51Z

Can be rebased on #160, and sincerest apologies in advance for all the rebase conflicts :')

enigbe · 2024-02-05T13:41:06Z

I have refactored the handling of errors across all tasks. This reduces the number of places where `trigger()` can be called and streamlines the logic. However, I think we can further reduce the calls to `trigger()` to a single location.

In the snippet below, from `run` in `sim-lib/src/lib.rs`, we await the completion of all tasks in the join set `tasks`. The error from the first task failure can be propagated up to this point, where we call `trigger()` (and break out of the loop) to shut down all listening tasks in the set.

        while let Some(res) = tasks.join_next().await {
            if let Err(e) = res {
                // log::error!("Task exited with error: {e}.");
                // success = false;
                self.shutdown();
                break;
            }
        }

This would mean we need a second loop that waits for all tasks to exit (after `trigger()`) to get a graceful shutdown. I'm uncertain whether this is a good approach and would be happy to get thoughts on it.
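The two-loop idea can be sketched without tokio as below, purely as an illustration: threads report their `Result` over an `mpsc` channel (playing the role of `JoinSet::join_next`), the first error sets a shared shutdown flag, and a second loop then drains the remaining tasks so they all exit before `run` returns. `run`, `TaskError` and the flag are hypothetical, not sim-ln code.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{mpsc, Arc};
use std::thread;
use std::time::Duration;

/// Hypothetical task error carrying the id of the failing task.
#[derive(Debug, PartialEq)]
enum TaskError {
    Failed(usize),
}

/// Runs `n` hypothetical tasks where task `failing` errors immediately and the
/// rest run until shutdown. Assumes `failing < n`; otherwise nothing ever
/// triggers shutdown and the first loop would wait forever.
fn run(n: usize, failing: usize) -> Vec<Result<(), TaskError>> {
    let shutdown = Arc::new(AtomicBool::new(false));
    let (done_tx, done_rx) = mpsc::channel();

    for id in 0..n {
        let shutdown = Arc::clone(&shutdown);
        let done_tx = done_tx.clone();
        thread::spawn(move || {
            let res = if id == failing {
                Err(TaskError::Failed(id))
            } else {
                // Well-behaved tasks only exit once shutdown is signalled.
                while !shutdown.load(Ordering::SeqCst) {
                    thread::sleep(Duration::from_millis(1));
                }
                Ok(())
            };
            // Report completion, playing the role of `JoinSet::join_next`.
            let _ = done_tx.send(res);
        });
    }
    // Drop our own sender so the receiver ends once every task has reported.
    drop(done_tx);

    let mut results = Vec::new();
    // First loop: wait for the first failure, then trigger shutdown once.
    for res in done_rx.iter() {
        let failed = res.is_err();
        results.push(res);
        if failed {
            shutdown.store(true, Ordering::SeqCst);
            break;
        }
    }
    // Second loop: wait for the remaining tasks to exit gracefully.
    results.extend(done_rx.iter());
    results
}
```

Since no healthy task can finish before the flag is set, the failing task's result is always the first one collected.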

carlaKC

vrrynice. Just some annoying nitpicking about logging and breaking from me, I think this is almost there!

Only major comment is about cleaning up the result writer functions a bit, but that's pre-existing.

carlaKC · 2024-02-08T18:31:54Z

docs/ARCHITECTURE.md

+1. [Triggered](https://docs.rs/triggered/latest/triggered): a `Trigger`
+   that can be used to inform threads that it's time to shut down, and
+   a `Listener` that propagates this signal.
+2. The (`Trigger`, `Listener`) pair are used with channels: if a channel errors out across `send()` or `receive()`, shutdown is triggered. There is no reliance on channel mechanics, i.e. errors generated when all senders are and/or a receiver is dropped.


I would clarify this to note that our channels only have a minimal buffer, so we have to select on the shutdown signal on send/receive to make sure that a sender doesn't hang indefinitely if its receiver exits first.

Ie, this scenario:

Task 1: Sending into sender
Task 2: Receiving on receiver

Task 1 sends into sender, unblocks due to buffer size of 1 that we use everywhere

Task 2 errors out before consuming from receiver

Task 1 wants to send into sender again, but can't because the receiver has shut down

If we always select on exit, then we don't run into this.
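A std-only sketch of selecting on shutdown while sending, under the assumption that the stalled receiver is still alive (if it were dropped, the send would simply fail). std has no `select!`, so it polls `try_send` and a hypothetical `AtomicBool` shutdown flag; in the async code the same idea would be a `select!` over the send and the listener. `send_or_shutdown` and `scenario` are illustrative names.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{sync_channel, SyncSender, TrySendError};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

#[derive(Debug, PartialEq)]
enum SendOutcome {
    Sent,
    ShutdownSignalled,
    ReceiverGone,
}

/// Tries to send `value`, giving up if shutdown is signalled while the channel
/// is full. std has no `select!`, so this polls `try_send` and the flag.
fn send_or_shutdown(tx: &SyncSender<u32>, value: u32, shutdown: &AtomicBool) -> SendOutcome {
    loop {
        match tx.try_send(value) {
            Ok(()) => return SendOutcome::Sent,
            Err(TrySendError::Disconnected(_)) => return SendOutcome::ReceiverGone,
            Err(TrySendError::Full(_)) => {
                if shutdown.load(Ordering::SeqCst) {
                    return SendOutcome::ShutdownSignalled;
                }
                thread::sleep(Duration::from_millis(1));
            }
        }
    }
}

/// The receiver stops consuming (but is never dropped) after one message, so
/// a plain blocking `send` of a second value would hang forever.
fn scenario() -> (SendOutcome, SendOutcome) {
    let (tx, _stalled_rx) = sync_channel::<u32>(1);
    let shutdown = Arc::new(AtomicBool::new(false));

    // Fills the single buffer slot.
    let first = send_or_shutdown(&tx, 1, &shutdown);

    // Another task signals shutdown a little later.
    let trigger = Arc::clone(&shutdown);
    let handle = thread::spawn(move || {
        thread::sleep(Duration::from_millis(10));
        trigger.store(true, Ordering::SeqCst);
    });

    // Buffer full and nobody consuming: only the shutdown signal unblocks us.
    let second = send_or_shutdown(&tx, 2, &shutdown);
    handle.join().unwrap();
    (first, second)
}
```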

carlaKC

Thanks for addressing comments, think this only needs one more round!

Remaining comments are really about logging consistency: if we're logging an exit error at a function's call site, there's no need to also log on error return (we'll double log). Would also like to have all the starting/stopping logs moved into the spawn as well.

carlaKC · 2024-02-27T20:19:16Z

sim-lib/src/lib.rs

+                                    set.spawn(track_payment_result(
+                                        source_node.clone(), results.clone(), payment, listener.clone()
+                                    ));


Should we handle errors returned by track_payment_result and trigger shutdown here?

Yes, we should. I refactored produce_simulation_results to include an additional branch to wait on concurrently. Within this branch, we propagate any track_payment_result error to produce_simulation_results and trigger shutdown at the latter's spawn site.

Nice, realize that we weren't actually waiting on that set at all before this ☠️

This method is an interesting one (/different to our other ones) because it has its own set of tasks that it should wait on. As is, if we get the shutdown listener signal we won't wait for all the spawned payment tracking tasks to complete (which is messy).

Don't need to update in this PR, let's gettit in, but note to self to create an issue/fix this up in future!

I've taken note of this and will create an issue for better handling of exits on all tasks spawned in set.

carlaKC

alrightalrightalright!

Actually, actually last comments:

We still need to flush our batched writer to disk on shutdown
Take a look at line wrapping at 120 characters (I think a few places are over; feel free to shorten error messages)

Otherwise, you can go ahead and squash the fixups and we'll merge. Nice stuff 🏅
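On the first point, a sketch of what flushing a batched writer on shutdown could look like, using `std::io::BufWriter`. The batching, the `write_results` name and the use of the results channel closing as the shutdown signal are all assumptions made to keep the example small; sim-ln's writer and listener are not reproduced here.

```rust
use std::io::{self, BufWriter, Write};
use std::sync::mpsc::Receiver;

/// Hypothetical batched results writer: flushes every `batch_size` results,
/// and once more on shutdown so a partially filled batch still reaches `out`.
/// For brevity the results channel closing stands in for the shutdown signal.
fn write_results<W: Write>(results: Receiver<String>, out: W, batch_size: usize) -> io::Result<W> {
    let mut writer = BufWriter::new(out);
    let mut pending = 0;
    for result in results.iter() {
        writeln!(writer, "{result}")?;
        pending += 1;
        if pending == batch_size {
            writer.flush()?;
            pending = 0;
        }
    }
    // Final flush on shutdown. BufWriter's Drop also attempts a flush, but it
    // silently discards any error; flushing here lets us report it.
    writer.flush()?;
    writer.into_inner().map_err(|e| e.into_error())
}
```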

carlaKC · 2024-03-11T16:48:59Z

Testing this and it looks like TrackPayment doesn't run anymore and the simulator hangs on shutdown.

Will take a look at the code, but iirc this is a regression since my last review (I tested last time and this was fine).

enigbe · 2024-03-11T16:54:58Z

That's weird. Taking another look right now.

carlaKC · 2024-03-12T20:10:08Z

sim-lib/src/lib.rs

-                        log::error!("Event consumer exited with error: {e:?}.");
-                    },
+                let consume_event_node = ce_node.clone();
+                let node_guard = ce_node.lock().await;


This is deadlocking with the lock in consume_events!

lol I hate programming, can fix this with:

        tasks.spawn(async move {
            let node_info = ce_node.lock().await.get_info().clone();
            log::debug!("Starting events consumer for {}.", node_info);
            if let Err(e) = consume_events(ce_node, receiver, ce_output_sender, ce_listener).await {
                ce_shutdown.trigger();
                log::error!("Event consumer exited with error: {e:?}.");
            } else {
                log::debug!("Event consumer for node {node_info} received shutdown signal.");
            }
        });

Problem is that the node_guard lock doesn't get dropped until after consume_events (because we're still borrowing node_info in the log). Had to crack out le old rust book for that one.
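The same scoping rule can be shown with `std::sync::Mutex` in a small sketch (hypothetical `Node`, `consume_events` and `start_consumer`; sim-ln uses an async mutex). A guard created as a temporary inside `let node_info = ...clone();` is dropped at the end of that statement, so the callee can take the lock; `try_lock` is used so the sketch fails loudly instead of hanging if the lock were still held.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical node type; in sim-ln the node sits behind an async mutex.
struct Node {
    alias: String,
}

impl Node {
    fn get_info(&self) -> &String {
        &self.alias
    }
}

/// Stand-in for `consume_events`: it needs the node lock for itself.
fn consume_events(node: Arc<Mutex<Node>>) -> String {
    let guard = node.try_lock().expect("lock is still held by the caller");
    guard.alias.clone()
}

fn start_consumer(node: Arc<Mutex<Node>>) -> String {
    // `lock()` returns a temporary guard that is dropped at the end of this
    // statement; `node_info` is an owned clone, so no borrow keeps it alive.
    let node_info = node.lock().unwrap().get_info().clone();
    println!("Starting events consumer for {node_info}.");

    // Binding the guard instead (`let guard = node.lock().unwrap();`) and
    // logging `guard.get_info()` below would keep the lock held across this
    // call: `try_lock` would fail here, and an awaited `lock()` would deadlock.
    let consumed = consume_events(node);
    println!("Event consumer for node {node_info} exited.");
    consumed
}
```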

- additionally, remove every `unwrap()` call that could panic, replacing it with error propagation and/or context via `expect()`
- return `Result<(), SimulationError>` from all spawned tasks
- handle triggering shutdown at the call site for spawned tasks
- move starting/stopping logs to the spawn site

carlaKC

tACK at 5aa5f65.

Some minor comments that can be addressed in a followup if desired.

carlaKC · 2024-03-14T19:16:50Z

sim-lib/src/lib.rs

    }

-    log::trace!("Payment result tracker exiting.");
+    log::trace!("Result tracking complete. Payment result tracker exiting.");


nit: logging at call site, not in function as with others.

carlaKC · 2024-03-14T19:17:47Z

sim-lib/src/lib.rs

@@ -985,12 +1045,14 @@ impl Display for PaymentResultLogger {
    }
 }

+/// Reports a summary of payment results at a duration specified by `interval`
+/// Note that `run_results_logger` does not error in any way, thus it has no
+/// trigger. It listens for triggers to ensure clean exit.


nit: line wrap at 120
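For reference, a sketch of a logger that never errors and only listens for shutdown, as the doc comment above describes. It is std-only and hypothetical: a `Receiver<()>` stands in for the listener, and `recv_timeout` doubles as the reporting interval, so the function returns `()` and has no reason to trigger shutdown itself.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Hypothetical periodic logger: calls `report` every `interval` and exits
/// cleanly once shutdown is signalled. It has no failure path, so it returns
/// `()` rather than a `Result`.
fn run_results_logger(listener: Receiver<()>, interval: Duration, mut report: impl FnMut()) {
    loop {
        match listener.recv_timeout(interval) {
            // Interval elapsed without a shutdown signal: log a summary.
            Err(RecvTimeoutError::Timeout) => report(),
            // Signalled, or the trigger side was dropped: exit.
            Ok(()) | Err(RecvTimeoutError::Disconnected) => return,
        }
    }
}
```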

carlaKC · 2024-03-14T19:19:04Z

sim-lib/src/lib.rs

+                writer.map_or(Ok(()), |(ref mut w, _)| w.flush().map_err(|_| {
+                    SimulationError::FileError
+                }))?;
+                return Ok(());


Don't need ? + return Ok(()) -> can just return writer.map_or...
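A minimal sketch of the suggested simplification, returning the `map_or` expression directly. The `Option<(BufWriter<Vec<u8>>, String)>` writer type is an assumption for illustration, and `SimulationError` here is a one-variant stand-in that only mirrors the `FileError` variant named in the diff. `as_mut()` lets the writer be flushed in place without consuming the `Option`.

```rust
use std::io::{BufWriter, Write};

/// One-variant stand-in mirroring the `FileError` variant from the diff.
#[derive(Debug, PartialEq)]
enum SimulationError {
    FileError,
}

/// Flushes the optional writer (paired with a hypothetical file name).
/// The expression is returned directly instead of `...?; return Ok(());`.
fn flush_writer(writer: &mut Option<(BufWriter<Vec<u8>>, String)>) -> Result<(), SimulationError> {
    writer
        .as_mut()
        .map_or(Ok(()), |(w, _)| w.flush().map_err(|_| SimulationError::FileError))
}
```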

carlaKC · 2024-03-14T19:20:49Z

.gitignore

nit: newline still here - might be editor automatically adding it

carlaKC · 2024-03-14T19:25:07Z

sim-lib/src/lib.rs

+                let source = executor.source_info.clone();
+
+                log::info!(
+                    "Starting activity producer for {}: {}.",
+                    source,


nit: can just inline + clone because we don't have any locks here

carlaKC · 2024-03-14T19:47:09Z

sim-lib/src/lib.rs

+                    None => return Ok(())
+                }
+            },
+            track_payment = set.join_next() => {


nit: add a TODO explaining that we're not going to wait for all tasks to exit, just so we don't forget

enigbe requested review from sr-gi, okjodom and carlaKC January 27, 2024 07:47

enigbe self-assigned this Jan 27, 2024

carlaKC reviewed Jan 29, 2024

enigbe force-pushed the refactor-use-listen-and-trigger-universally branch from 26a6378 to a86e3f6 February 5, 2024 13:05

enigbe marked this pull request as ready for review February 5, 2024 13:41

enigbe requested a review from carlaKC February 5, 2024 13:41

carlaKC reviewed Feb 8, 2024

enigbe mentioned this pull request Feb 13, 2024

chore: add rustfmt and clippy #167

Merged

enigbe force-pushed the refactor-use-listen-and-trigger-universally branch from 452d1da to d8ff458 February 24, 2024 07:58

enigbe requested a review from carlaKC February 27, 2024 14:06

carlaKC reviewed Feb 27, 2024

enigbe force-pushed the refactor-use-listen-and-trigger-universally branch from 9a8a3c8 to b84ce8d February 29, 2024 11:33

enigbe requested a review from carlaKC March 1, 2024 10:36

carlaKC reviewed Mar 2, 2024

carlaKC mentioned this pull request Mar 2, 2024

Add start/duration options to the defined activity configuration #168

Closed

enigbe mentioned this pull request Mar 11, 2024

feat: add simulation config #166

Closed

enigbe force-pushed the refactor-use-listen-and-trigger-universally branch from 3ddaef0 to d0be6ec March 11, 2024 16:36

enigbe requested a review from carlaKC March 11, 2024 16:36

carlaKC reviewed Mar 12, 2024

enigbe force-pushed the refactor-use-listen-and-trigger-universally branch from 581eab9 to 46b4713 March 13, 2024 17:26

enigbe added 3 commits March 14, 2024 08:52

docs: update shutdown section in ARCHITECTURE

3bf6fc8

chore: fmt all code to default max width

5aa5f65

enigbe force-pushed the refactor-use-listen-and-trigger-universally branch from 4b08725 to 5aa5f65 March 14, 2024 07:58

enigbe requested a review from carlaKC March 14, 2024 08:02

carlaKC approved these changes Mar 14, 2024

carlaKC merged commit c3292a8 into bitcoin-dev-project:main Mar 14, 2024
2 checks passed

carlaKC mentioned this pull request Mar 14, 2024

Issue #168: Adding optional start and count values to sim.json file #173

Merged

adamjonas mentioned this pull request Mar 18, 2024

feat: add simulation config adamjonas/sim-ln#1

Open

This was referenced Mar 18, 2024

feat: add simulation config Extheoisah/sim-ln#1

Closed

feat: add simulation config Extheoisah/sim-ln#2

Closed

feat: add simulation config Extheoisah/sim-ln#3

Closed

carlaKC mentioned this pull request Mar 27, 2024

Code-Health: Clean Shutdown Across Threads #28

Closed

carlaKC mentioned this pull request Aug 1, 2024

shutdown: Select on listener on all channel sends #193

Merged
