-
Notifications
You must be signed in to change notification settings - Fork 560
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Perl_sv_setsv_flags - IV/NV & cold code optimisation #22725
Perl_sv_setsv_flags - IV/NV & cold code optimisation #22725
Conversation
Note:I tried to create a fast path for PV->PV assignments, the most popular path after the fast NULL/IV/NV code. It was easy enough to make modifications to reduce the number of instructions and branches, but the number of actual cycles reported by perf either stayed identical or increased! This seemed to be due to much greater front-end and back-end stalls. That could be down to coincidental unfavourable alignment, so might be something to revisit, Here's a section of the gcov output produced by running
|
With regards to the performance questions, I believe Intel vTune is intended for answering these types of question, and unlike years ago is part of the free oneAPI toolkit. I haven't had a need for it myself (yet) but it might worth trying here if you have the time/inclination. |
8458306
to
7cf872e
Compare
One other thing here: this is apparently a performance optimization, the results of benchmarks would be useful, and the conditions they were measured under. |
Thanks for the reminder, @tonycoz. I measured a few different scenarios using No measurable change was expected or observed on:
There seemed to be a tiny improvement to the following, but nothing significant:
(The fast-NULL/IV/NV block at the top of the function, plus the PV->PV path, account Noticeable changes were expected and observed when blead upgraded to a type
blead:
patched:
blead:
patched:
blead:
patched:
blead: 7738900 kbytes |
I'll add doing this to my list for 2025. :) |
When the fast code at the start of Perl_sv_setsv_flags was modified to also support bodyless NVs, the simplest possible change was made. However, this meant that there was no fast handling when one SV was an IV and the other a NV. Actually having this seems desirable since it avoids the need to allocate (and later release) an XPVIV or XPVNV body.
The fast code at the top of Perl_sv_setsv_flags now handles all cases where both SVs are < SVt_NV / SVt_IV, depending on the size of NVs. This means that the subsequent code paths involving those combinations are unreachable and can be removed to streamline there function. Note: Doing this actually made a difference with gcc 12.2.0, which didn't seem to figure out that this was possible by itself. Similarly, sprinking some ASSUME() statements around didn't help.
Perl_sv_setsv_flags has a number of fail-safe checks which will croak if triggered. However, these code paths are *really* cold - they aren't even hit by the test harness. Since they are so cold and always result in an immediate croak, they can be pulled out into an unoptimized helper function. This leaves Perl_sv_setsv_flags smaller and therefore more cache friendly.
a867572
to
c0d317f
Compare
Rebased. Please let me know if there are any further changes needed before it is good to merge. |
Perl_sv_setsv_flags
is one of the hottest functions in the interpreter,at least when looking at the coverage from running the test harness.
This PR basically does two things:
The "fast number" code early in the function now handles one SV being
an IV and the other a NV, if both are "bodyless" types. This has the
upshot that an IV will only be upgraded to an NV, not a PVNV, to store
an NV value. An NV will be downgraded to an IV, not upgraded to a PVIV,
to store an IV value. This saves having to allocate/free actual bodies.
Having done this, subsequent code paths later in the function are
rendered completely unreachable and have been excised.
These changes should be transparent to most Perl users, only anyone who
is actually looking at SV types - presumably in test code - might notice.
The croak-ing code within extremely cold fail-safe code paths has been
moved out into a helper function. (I'd have put this in cold.c if we
had one.) This new function need not be optimised.
This change should be entirely transparent.