Add bytes_to_utf8_temp_pv()
This is like bytes_to_utf8_free_me, but any new memory is arranged to be
freed at the end of the current pseudo block via SAVEFREEPV.

This adds the one function that was missing from the set of inverses of
the utf8_to_bytes_foo() functions.
khwilliamson committed Jan 29, 2025
1 parent e7a6a02 commit dd06810
Showing 6 changed files with 40 additions and 1 deletion.
3 changes: 3 additions & 0 deletions embed.fnc
@@ -800,6 +800,9 @@ Adp |U8 * |bytes_to_utf8_free_me \
|NN const U8 *s \
|NN STRLEN *lenp \
|NULLOK void **free_me
Adip |U8 * |bytes_to_utf8_temp_pv \
|NN const U8 *s \
|NN STRLEN *lenp
AOdp |SSize_t|call_argv |NN const char *sub_name \
|I32 flags \
|NN char **argv
1 change: 1 addition & 0 deletions embed.h
@@ -157,6 +157,7 @@
# define bytes_from_utf8(a,b,c) Perl_bytes_from_utf8(aTHX_ a,b,c)
# define bytes_to_utf8(a,b) Perl_bytes_to_utf8(aTHX_ a,b)
# define bytes_to_utf8_free_me(a,b,c) Perl_bytes_to_utf8_free_me(aTHX_ a,b,c)
# define bytes_to_utf8_temp_pv(a,b) Perl_bytes_to_utf8_temp_pv(aTHX_ a,b)
# define c9strict_utf8_to_uv Perl_c9strict_utf8_to_uv
# define call_argv(a,b,c) Perl_call_argv(aTHX_ a,b,c)
# define call_atexit(a,b) Perl_call_atexit(aTHX_ a,b)
13 changes: 13 additions & 0 deletions inline.h
@@ -1236,6 +1236,19 @@ Perl_bytes_to_utf8(pTHX_ const U8 *s, STRLEN *lenp)
    return bytes_to_utf8_free_me(s, lenp, NULL);
}

PERL_STATIC_INLINE U8 *
Perl_bytes_to_utf8_temp_pv(pTHX_ const U8 *s, STRLEN *lenp)
{
    void * free_me = NULL;
    U8 * converted = bytes_to_utf8_free_me(s, lenp, &free_me);

    if (free_me) {
        SAVEFREEPV(free_me);
    }

    return converted;
}

PERL_STATIC_INLINE bool
Perl_utf8_to_bytes_new_pv(pTHX_ U8 const **s_ptr, STRLEN *lenp, void ** free_me)
{
11 changes: 10 additions & 1 deletion pod/perldelta.pod
@@ -360,7 +360,16 @@ well.

=item *

XXX
Two new API functions are introduced to convert strings encoded in
native bytes format to UTF-8. These return the string unchanged if its
UTF-8 representation is the same as the original. Otherwise, new memory
is allocated to contain the converted string. This is in contrast to
the existing L<perlapi/C<bytes_to_utf8>> which always allocates new
memory. The new functions are L<perlapi/C<bytes_to_utf8_free_me>> and
L<perlapi/C<bytes_to_utf8_temp_pv>>.
L<perlapi/C<bytes_to_utf8_temp_pv>> arranges for the new memory to
automatically be freed. With C<bytes_to_utf8_free_me>, you are
responsible for freeing any newly allocated memory.

=back

5 changes: 5 additions & 0 deletions proto.h

8 changes: 8 additions & 0 deletions utf8.c
@@ -3256,6 +3256,7 @@ Perl_bytes_from_utf8(pTHX_ const U8 *s, STRLEN *lenp, bool *is_utf8p)
/*
=for apidoc bytes_to_utf8
=for apidoc_item bytes_to_utf8_free_me
=for apidoc_item bytes_to_utf8_temp_pv
These each convert a string C<s> of length C<*lenp> bytes from the native
encoding into UTF-8 (UTF-EBCDIC on EBCDIC platforms), returning a pointer to
@@ -3275,6 +3276,13 @@ already there.
In both cases, the caller is responsible for arranging for any new memory to
get freed.
C<bytes_to_utf8_temp_pv> simply returns a pointer to the input string if the
string's UTF-8 representation is the same as its native representation, thus
behaving like C<bytes_to_utf8_free_me> in this situation. Otherwise, it
behaves like C<bytes_to_utf8>, returning a pointer to new memory containing the
conversion of the input. The difference is that it also arranges for the new
memory to automatically be freed by calling C<L</SAVEFREEPV>> on it.
C<bytes_to_utf8_free_me> takes an extra parameter, C<free_me>, to communicate
to the caller that memory was allocated or not. If that parameter is NULL,
C<bytes_to_utf8_free_me> acts identically to C<bytes_to_utf8>, always

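The ownership rules described in the apidoc text above can be illustrated with a small standalone model. This is only a sketch, not Perl's implementation: it assumes an ASCII/Latin-1 platform, and every name in it (`to_utf8_free_me`, `to_utf8_temp_pv`, `save_free_pv`, `leave_block`, `example_block`) is a hypothetical stand-in. In particular, `save_free_pv` and `leave_block` imitate `SAVEFREEPV` and the end of a pseudo block with a plain array.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for Perl's save stack: pointers queued here are
 * freed when the enclosing "pseudo block" ends. */
#define MAX_SAVED 16
static void  *saved[MAX_SAVED];
static size_t saved_count = 0;

/* Models SAVEFREEPV: remember a pointer to free later. */
static void save_free_pv(void *p)
{
    assert(saved_count < MAX_SAVED);
    saved[saved_count++] = p;
}

/* Models leaving the block: free everything queued, return how many. */
static size_t leave_block(void)
{
    size_t freed = saved_count;
    while (saved_count > 0)
        free(saved[--saved_count]);
    return freed;
}

/* Models the bytes_to_utf8_free_me contract for Latin-1 input.
 * With a non-NULL free_me, the input pointer is returned as-is when no byte
 * needs expanding; otherwise new memory is allocated and reported through
 * *free_me. With free_me == NULL, new memory is always allocated. */
static unsigned char *
to_utf8_free_me(const unsigned char *s, size_t *lenp, void **free_me)
{
    size_t i, j = 0, variants = 0;
    unsigned char *d;

    for (i = 0; i < *lenp; i++)
        if (s[i] >= 0x80)
            variants++;              /* each such byte becomes two bytes */

    if (free_me) {
        *free_me = NULL;
        if (variants == 0)
            return (unsigned char *) s;   /* nothing to convert or free */
    }

    d = malloc(*lenp + variants + 1);
    assert(d != NULL);
    for (i = 0; i < *lenp; i++) {
        if (s[i] < 0x80) {
            d[j++] = s[i];
        } else {
            d[j++] = (unsigned char) (0xC0 | (s[i] >> 6));
            d[j++] = (unsigned char) (0x80 | (s[i] & 0x3F));
        }
    }
    d[j] = '\0';
    *lenp = j;
    if (free_me)
        *free_me = d;
    return d;
}

/* Models bytes_to_utf8_temp_pv: same shape as the inline function in the
 * diff, but queuing on the toy save stack instead of calling SAVEFREEPV. */
static unsigned char *
to_utf8_temp_pv(const unsigned char *s, size_t *lenp)
{
    void *free_me = NULL;
    unsigned char *converted = to_utf8_free_me(s, lenp, &free_me);

    if (free_me)
        save_free_pv(free_me);
    return converted;
}

/* Usage: convert two strings, then "leave the block". */
static size_t example_block(void)
{
    const unsigned char ascii[]  = "plain";
    const unsigned char latin1[] = { 'c', 'a', 'f', 0xE9, 0 };
    size_t len = 5;
    unsigned char *a = to_utf8_temp_pv(ascii, &len);

    printf("ascii reused input: %s\n", a == ascii ? "yes" : "no");
    len = 4;
    a = to_utf8_temp_pv(latin1, &len);
    printf("latin1 reused input: %s, length now %zu\n",
           a == latin1 ? "yes" : "no", len);
    return leave_block();            /* number of buffers freed */
}
```

In this model the caller of `to_utf8_temp_pv` never frees anything itself. The only cleanup is the single buffer released by `leave_block`, which mirrors the reason the commit gives for adding the `_temp_pv` variant.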