Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce binary size by using trait objects more #3110

Open
divergentdave opened this issue May 7, 2024 · 1 comment
Open

Reduce binary size by using trait objects more #3110

divergentdave opened this issue May 7, 2024 · 1 comment

Comments

@divergentdave
Copy link
Collaborator

Compilation time, peak memory usage during linking, and binary size of janus_aggregator are unusually high. I ran cargo-llvm-lines on a few build configurations, and it reports that janus_aggregator_core::datastore::Datastore::run_tx_once::{{closure}}::{{closure}} and janus_aggregator_core::datastore::Datastore::run_tx::{{closure}}::{{closure}} are among the top contributors to code size. Each has 431 instantiations in a test build. Using llvm-cov to generate a test coverage report helps visualize where these symbols come from. I think both nested closures comes from the #[tracing::instrument()] macro on each method.

We may be able to reduce the number of instantiations if we replace the type parameter for the transaction's inner closure with a trait object. The costs of the indirect jump should be okay to absorb, since each transaction is closely tied to a network roundtrip already. I tried applying this change to just run_tx_once(), and I got it to compile with just a few more Sync bounds, and it reduced the number of instantiations of the relevant instrument closures in a release build from from 178 to 71. (We still need the T type parameter, so we can return results of the appropriate type on the stack, which explains most of the instantiation count) The number of LLVM IR lines for one closure dropped from 379073 to 134388, and the other dropped from 121054 to 47816. This is a 4.5% drop in the total number of lines. Applying the same change to both run_tx() and run_tx_once() should get us a bit more.

I previously tried eliminating the C: Clock type parameter, but that didn't have much of an impact on the numbers FWIW.

@branlwyd branlwyd self-assigned this May 8, 2024
@divergentdave
Copy link
Collaborator Author

FWIW using rust-lld instead of GNU ld ameliorates the first two issues greatly, and it will become the default on Linux in two stable releases' time. Compilation time is cut by around ten seconds, and there isn't a memory pressure issue when running doctests anymore, as peak memory consumption is around 2GB.

@branlwyd branlwyd removed their assignment May 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants