Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(swarm): rewrite NetworkBehaviour macro for more optimal code gen #5303

Closed
wants to merge 0 commits into from

Conversation

jakubDoka
Copy link
Contributor

Description

I have rewritten NetworkBehavior derive macro to generate more optimal and faster to compile code when using more behaviours (5, 10, 20), I noticed performance degrades even though I benchmarked the same load. This is related to #5026.

New macro implementation generates enums and structs for each type implementing the traits instead of type-level linked lists. In many cases, this makes resulting types more compact (we store just one enum tag, whereas composed Eithers each need to store tags to make values addressable) and makes the enum dispatch constant. This also opened the opportunity to optimize UpgradeInfoIterator and ConnectionHandler into a state machine (they now remember where they stopped polling/iterating and skipped exhausted subhandlers/iterators). We could optimize the NetworkBehaviour itself too, but it would require users to put extra fields into the struct (this could be optional for BC).

Change checklist

  • I have performed a self-review of my code
  • I have made corresponding changes to the documentation (none)
  • I have added tests that prove my fix is effective or that my feature works (no regressions)
  • A changelog entry has been made in the appropriate crates

@jakubDoka
Copy link
Contributor Author

Numbers for new macro

behaviours iters protocols per behaviour - per iter -
1 10000 10 5.0441 ns 5.0892 ns 5.1472 ns
1 10000 100 7.8285 ns 8.0366 ns 8.3113 ns
1 10000 1000 19.439 ms 19.568 ms 19.731 ms
5 10000 2 6.1172 ns 6.2539 ns 6.4356 ns
5 10000 20 14.192 ns 14.835 ns 15.676 ns
5 10000 200 270.12 ms 272.66 ms 275.51 ms
10 10000 1 8.0800 ns 8.2733 ns 8.4751 ns
10 10000 10 18.930 ns 19.989 ns 21.327 ns
10 10000 100 301.00 ms 307.12 ms 313.86 ms
20 5000 1 12.019 ns 12.222 ns 12.488 ns
20 5000 10 40.238 ns 41.747 ns 43.697 ns
20 5000 100 731.50 ms 748.87 ms 767.03 ms

Compared to #5026.

behaviours iters protocols per behaviour - per iter - - per iter - verdict
1 10000 10 6.9099 ns 7.0400 ns 7.1924 ns +19.863% +51.795% +95.414% Performance has regressed.
1 10000 100 11.301 ns 11.701 ns 12.220 ns +3.6304% +61.992% +156.01% No change in performance detected.
1 10000 1000 45.102 ms 45.699 ms 46.335 ms +129.98% +133.54% +137.49% Performance has regressed.
5 10000 2 6.7147 ns 6.8633 ns 7.0595 ns -13.115% +21.191% +65.712% No change in performance detected.
5 10000 20 33.995 ns 36.069 ns 38.805 ns +56.477% +186.56% +410.85% Performance has regressed.
5 10000 200 844.90 ms 863.90 ms 883.72 ms +208.68% +216.83% +224.95% Performance has regressed.
10 10000 1 9.6857 ns 9.8934 ns 10.168 ns -5.8785% +33.452% +94.927% No change in performance detected.
10 10000 10 353.64 ns 380.66 ns 415.67 ns +1326.5% +2344.9% +4154.7% Performance has regressed.
10 10000 100 1.2271 s 1.2532 s 1.2804 s +296.43% +308.03% +320.38% Performance has regressed.
20 5000 1 15.601 ns 16.091 ns 16.726 ns +11.958% +65.527% +145.85% Performance has regressed.
20 5000 10 128.33 µs 138.88 µs 152.28 µs +237805% +421868% +752554% Performance has regressed.
20 5000 100 1.9057 s 1.9361 s 1.9668 s +151.25% +158.54% +166.02% Performance has regressed.

Copy link
Contributor

@thomaseizinger thomaseizinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, thank you for this!

I don't have the capacity to review this in detail, will defer to @jxs. Overall, performance improvements are always welcome. We don't have many explicit tests that ensure the derive macro works correctly though. Esp. if we start to generate our own poll impls, I am a bit worried that we are introducing a source of very-hard-to-find bugs.

We could offer the new version behind a feature toggle and let users experiment with it instead of re-writing it right away. For example, we can start using it internally for all our tests etc.

What do you think?

@jakubDoka
Copy link
Contributor Author

Feature flag sounds very reasonable (since I want to use this in my project), In that case, I can introduce a required field in the behavior that can store the state needed to also improve the main poll implementation. The main idea behind the custom poll is simple:

// previous implementation
#(
    if let std::task::Poll::Ready(event) = self.#fields.poll(cx) {
        return std::task::Poll::Ready(event
            .map_custom(#to_beh::#var_names)
            .map_outbound_open_info(#ooi::#var_names)
            .map_protocol(#ou::#var_names));
    }
)*

// proposed implementation
let mut fuel = #beh_count;
while fuel > 0 {
    // save the poll position to avoid repolling exhaused handlers
    match self.field_index {
        #(#indices => match self.#fields.poll(cx) {
            std::task::Poll::Ready(event) =>
                return std::task::Poll::Ready(event
                    .map_custom(#to_beh::#var_names)
                    .map_outbound_open_info(#ooi::#var_names)
                    .map_protocol(#ou::#var_names)),
            std::task::Poll::Pending => {}
        },)*
        _ => {
            self.field_index = 0;
            continue;
        }
    }
    self.field_index += 1;
    fuel -= 1;
}

in each poll:

  1. we return pending if all handlers return pending
  2. if behavior returns ready, the next poll will poll it first
  3. if we start polling from later behavior, we wrap around if needed

This polling pattern ensures we don't repoll other behaviors while exhausting events from some point of the hierarchy. Hopefully, the compiler is smart enough to use a branch table for the match.

For this to work, I need to maintain an integer between poll calls, thus the extra field.

Copy link
Contributor

mergify bot commented Apr 15, 2024

This pull request has merge conflicts. Could you please resolve them @jakubDoka? 🙏

@drHuangMHT
Copy link
Contributor

Find myself doing this too lol. My implementation mimics existing *Select structs(public or hidden). Somehow I managed to make it work, which is funny. Past discussion: #3902. TL;DR: type-based composition is preferred.

Copy link
Contributor

mergify bot commented Apr 22, 2024

This pull request has merge conflicts. Could you please resolve them @jakubDoka? 🙏

@thomaseizinger thomaseizinger requested a review from jxs May 18, 2024 11:58
Copy link
Contributor

mergify bot commented May 18, 2024

This pull request has merge conflicts. Could you please resolve them @jakubDoka? 🙏

@jxs jxs changed the title fix(swarm): NetworkBehaviour macro rewritten to generate more optimal code fix(swarm): rewrite NetworkBehaviour macro for more optimal code gen Jun 3, 2024
Copy link
Contributor

mergify bot commented Jun 4, 2024

This pull request has merge conflicts. Could you please resolve them @jakubDoka? 🙏

1 similar comment
Copy link
Contributor

mergify bot commented Jun 27, 2024

This pull request has merge conflicts. Could you please resolve them @jakubDoka? 🙏

@dhuseby
Copy link
Contributor

dhuseby commented Jul 2, 2024

@jakubDoka we have a rust-libp2p maintainers meeting where we all discuss technical changes/PRs that I think you should attend. The next one is starting right now: https://lu.ma/2024-07-02-rust-libp2p

But the next one is in two weeks: https://lu.ma/2024-07-16-rust-libp2p

@jxs jxs added this to the v0.54.0 milestone Jul 5, 2024
Copy link
Member

@jxs jxs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @jakubDoka, thanks for this and sorry for the delay in reviewing, can you address the conflicts? Thanks!

@jakubDoka
Copy link
Contributor Author

@dhuseby sorry, I missed the last one, I'll hopefully show up on the 16th

@jakubDoka jakubDoka closed this Jul 8, 2024
@jakubDoka jakubDoka force-pushed the network-behaviour-rework branch from 3f8eb31 to 4e4b594 Compare July 8, 2024 16:27
@jakubDoka
Copy link
Contributor Author

Well that was an accident

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants