Evaluate more advanced optimizations like LTO, PGO, PLO #141
-
Regarding LTO, I ran a quick local test to see how much LTO reduces the binary size. I didn't measure performance improvements in this test, since that is harder to do than a binary size comparison. Env: Fedora 41, Rust 1.83, the latest version of the project at the moment. I added Fat LTO +
-
Oh my goodness! @zamazan4ik, I read your post when you first put it out but I must have forgotten to reply. I'm sorry about that. If you'd like to open a PR to enable |
-
Hi!
I just read an article about Harper on Reddit - nice work! I have several ideas that might be worth trying with Harper regarding its performance and binary size.
First, I noticed that Link-Time Optimization (LTO) is not enabled. Have you tried enabling it for the project before? It can help a lot with reducing the binary size, and it lets the compiler perform more aggressive optimizations (always a good thing to have). If you think that enabling LTO in the default "Release" profile would hurt the developer experience too much (LTO lengthens build times), you can create a dedicated build profile like "advanced_release" or "dist" - many projects enable LTO exactly this way.
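As a rough sketch of the dedicated-profile idea, a Cargo profile along these lines could be added to the workspace's root Cargo.toml. The profile name "dist" is just one of the example names suggested here, and the settings are an assumption for illustration, not the project's actual configuration:

```toml
# Root Cargo.toml of the workspace.
# "dist" is a hypothetical profile name used only for this example.
[profile.dist]
inherits = "release"  # start from the stock release settings
lto = "fat"           # whole-program LTO across all crates in the dependency graph
```

With such a profile in place, running cargo build --profile dist would put the optimized artifacts under target/dist, while the default release build keeps its current build times.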
Secondly, after LTO I highly recommend taking a look at PGO (Profile-Guided Optimization). This optimization gives the compiler more information about how the program is actually executed. Based on this, the compiler can optimize more aggressively for better runtime performance. I collect as many materials about PGO as I can in my repo - https://github.com/zamazan4ik/awesome-pgo . There you can read more about actual PGO benchmarks in various software (parsers, compilers, databases, etc.). I also highly recommend reading the (still unfinished) article/book about PGO - it can answer many of your possible questions.
I also performed some quick PGO benchmarks for the project based on its built-in benchmarks.
Test environment
harper version: master branch on commit ccf14d1535c2f1450b42027afac2a8446f98e11d
taskset -c 0 is used for reducing the OS scheduler's noise during the benchmarks (as much as I can guarantee ofc). For PGO optimization I use the cargo-pgo tool.
I got the following results.
Release (taskset -c 0 cargo bench --workspace --all-features):
PGO optimized compared to Release (taskset -c 0 cargo pgo optimize bench -- --workspace --all-features):
(just for reference) PGO instrumented compared to Release (taskset -c 0 cargo pgo bench -- --workspace --all-features):
According to the results, PGO can improve the library's performance further. However, in the uncached example we see a performance degradation. I think it's due to skew between the training workload and this particular load, or something like that - more experiments can be performed in this area. Before that, maybe this PGO-related information will be helpful for other performance-oriented users.
After PGO, I can suggest evaluating PLO (Post-Link Optimization) with LLVM BOLT as an additional optimization step. However, I recommend enabling it only after PGO (PGO usually works better than PLO in practice for now).
Regarding priorities: I highly suggest enabling LTO now. PGO and PLO, IMHO, can wait (I guess spending the time on actual features would be a better option, since enabling PGO and PLO, plus the possible CI pipeline tweaks, can consume too much human effort).
Thank you!