-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
discovery+graph: track job set dependencies in vb #9241
base: master
Are you sure you want to change the base?
Conversation
Important Review skippedAuto reviews are limited to specific labels. 🏷️ Labels to auto review (1)
Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
7acf321
to
fc00572
Compare
ValidationBarrier
ValidationBarrier
fc72083
to
7d95cd2
Compare
cc: @gijswijs for review |
func (v *ValidationBarrier) SignalDependants(job interface{}, allow bool) { | ||
// SignalDependents signals to any child jobs that this parent job has | ||
// finished. | ||
func (v *ValidationBarrier) SignalDependents(job interface{}, id JobID) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
haha sneaky change from British spelling to American 😝
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this has been driving me crazy for years
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm finding the distinction between parent and child jobs here both confusing and unnecessary. What we have here is a dependency graph of undifferentiated jobIDs. once all of the dependencies have run we can run. once we run we want to signal all of our dependents. We should be able to accomplish this with a single removeJob
that does this index cleanup and dependent signaling.
The main difficulty I'm noticing in this PR is that we have multiple IDs that we want to be able to map to JobID
s from disjoint domains. My recommendation here is to make the core algebra of this component undifferentiated and then have auxilliary mappings that help recover the relevant JobID
from the other unique protocol identifiers.
annID = msg.ShortChannelID | ||
|
||
// TODO: If ok is false, we have serious issues. | ||
parentJobIDs, ok = v.jobDependencies[childJobID] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why does this read not need a mutex lock?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's locked above no?
graph/validation_barrier.go
Outdated
// We don't want to block when sending out the signal. | ||
select { | ||
case notifyChan <- struct{}{}: | ||
default: | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we ok with swallowing the signal instead? Seems like this could case jobs to never be run, particularly if lastJob
is true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what you mean here. 1. How would jobs not be run in the current scenario? 2. What does swallowing the signal look like here?
cc: @gijswijs for review |
Outstanding things to do:
|
@Crypt-iQ - is this ready for re-review given the itest failures? |
Sorry no, it seems like I broke something |
Ready for review now, had an issue with |
@Crypt-iQ - defs ready given all the failures? |
now it is, the fn change broke it |
I could not figure out a decent way to use generics to accomplish this, so
will update this code with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like this change. 👍 Somehow, for me, the old solution with the depency maps is harder to reason about. I like the concept of jobInfoMap
better.
One thing I found (somewhat in line with @ProofOfKeags earlier comment) is that the usage of the words dependency, dependent and parent, child respectively is inconsistent throughout. The code would be more readable if we would stick to one pair. If we choose parent/child we don't need to fight anymore about british vs us spelling, which is an added benefit.
Apart from that I made some small comments in the code.
jobDesc = fmt.Sprintf("job=lnwire.NodeAnnouncement, pub=%s", | ||
vertex) | ||
route.Vertex(msg.NodeID)) | ||
|
||
// Other types of jobs can be executed immediately, so we'll just | ||
// return directly. | ||
case *lnwire.AnnounceSignatures1: | ||
// TODO(roasbeef): need to wait on chan ann? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe return nil
here and remove the following block below:
// If it's not ok, it means either the job is not a dependent type, or
// it doesn't have a dependency signal. Either way, we can return
// early.
if !ok {
return nil
}
Same for the case
after this one.
Less loc and to my mind clearer intent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
graph/validation_barrier.go
Outdated
default: | ||
} | ||
|
||
// If this is the last parent job for this id, also |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit
// If this is the last parent job for this id, also | |
// If this is the last parent job for this annID, also |
Or I could not be understanding this properly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup that's correct it should have been annID
. Is it more clear now?
graph/validation_barrier_test.go
Outdated
@@ -120,11 +131,19 @@ func TestValidationBarrierQuit(t *testing.T) { | |||
// Signal completion for the first half of tasks, but only allow | |||
// dependents to be processed as well for the second quarter. | |||
case i < numTasks/4: | |||
barrier.SignalDependants(anns[i], false) | |||
err := barrier.SignalDependents( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without the deny
and allow
logic the first case can be removed. There is no difference with the second case anymore, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, good catch
This omits calls to InitJobDependencies, SignalDependants, and WaitForDependants. These changes have been made here because the router / builder code does not actually need job dependency management. Calls to the builder code (i.e. AddNode, AddEdge, UpdateEdge) are all blocking in the gossiper. This, combined with the fact that child jobs are run after parent jobs in the gossiper, means that the calls to the router will happen in the proper dependency order.
@ellemouton: review reminder |
This commit does two things: - removes the concept of allow / deny. Having this in place was a minor optimization and removing it makes the solution simpler. - changes the job dependency tracking to track sets of abstact parent jobs rather than individual parent jobs. As a note, the purpose of the ValidationBarrier is that it allows us to launch gossip validation jobs in goroutines while still ensuring that the validation order of these goroutines is adhered to when it comes to validating ChannelAnnouncement _before_ ChannelUpdate and _before_ NodeAnnouncement.
Addressed comments, but may need to hold off on upgrading the |
@Crypt-iQ is this ready for another round or do you want to address CI first? |
I think for this to be ready for review, the changes made to the |
This PR changes the
ValidationBarrier
to track abstract job dependencies. This just means that every time a child job comes in (i.e. channel update or node announcement), we track the set of possible parent jobs that are related to it (channel announcement(s)) that have registered viaInitJobDependencies
. The goroutines containing the child jobs will then wait to be notified every time one of their parent jobs completes. From the child job's POV, this just works as ref-counting except that you're only counting the parent jobs you're interested in.With this, we can now extend the
ValidationBarrier
to track any sort of abstract job that requires both concurrency and waiting for another job to finish. It also makes it possible in a future PR to very easily make node announcements depend on channel announcements. See the commit messages for more details.TODO:
ValidationBarrier
and ensure that all child jobs finish after their related parent jobs.