Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pp_substr: avoid unnecessary work prior to inserting a replacement string #22910

Open
wants to merge 2 commits into
base: blead
Choose a base branch
from

Conversation

richardleach
Copy link
Contributor

There look to be a couple of inefficiencies in pp_substr when preparing to insert a replacement string:

        if (repl) {
            SV* repl_sv_copy = NULL;

            if (repl_need_utf8_upgrade) {
                repl_sv_copy = newSVsv(repl_sv);
                sv_utf8_upgrade(repl_sv_copy);
                repl = SvPV_const(repl_sv_copy, repl_len);
            }
            if (!SvOK(sv))
                SvPVCLEAR(sv);
            sv_insert_flags(sv, byte_pos, byte_len, repl, repl_len, 0);
            SvREFCNT_dec(repl_sv_copy);
        }

As per the two commit messages:

  • repl_sv_copy = newSVsv(repl_sv); is overkill when we can do repl_sv_copy = newSVpvn(repl, repl_len);, which is a much more straightforward process.
  • By the time we get to this block, sv should definitely be SvOK. (This check has been changed to an assertion, rather than just removed, out of an abundance of caution.)

  • This set of changes does not require a perldelta entry.

Currently, when copying the replacement string in preparation for
`sv_utf8_upgrade()`, `pp_substr` calls:

    repl_sv_copy = newSVsv(repl_sv);

However, `repl_sv` was previously checked for GMAGIC, coerced to a
string if necessary, and the char * and length obtained by:

    repl = SvPV_const(repl_sv, repl_len);

We should be able to produce a copy more directly by just doing:

    repl_sv_copy = newSVpvn(repl, repl_len);
When replacing some characters in pp_substr, there should be no
need to check `SvOK(sv)`, since previous calls to
`SvPV_force_nomg(sv)` and (if needed) `sv_utf8_upgrade_nomg(sv);`
should always return a `sv` that is `SvOK`.
@@ -3434,7 +3434,7 @@ PP(pp_substr)
SV* repl_sv_copy = NULL;

if (repl_need_utf8_upgrade) {
repl_sv_copy = newSVsv(repl_sv);
repl_sv_copy = newSVpvn(repl, repl_len);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old code could COW the copy, this will never do so.

That said, we know at this point we need to convert the byte per code point of the source to utf-8, and we don't actually need an SV to do the sv_insert_flags(), so we can just use bytes_to_utf8_free_me(), which won't even copy the string if the upgrade is trivial, ie. all of the code points are invariants.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, thanks. I'll take a look at bytes_to_utf8_free_me().

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants