Issue #8099: Add binary:join/2 to stdlib #8100

Merged

onno-vos-dev
Contributor

See linked issue for details.

Contributor

github-actions bot commented Feb 8, 2024

CT Test Results

2 files, 96 suites, 1h 7m 38s ⏱️
2 163 tests: 2 115 ✅ passed, 48 💤 skipped, 0 ❌ failed
2 523 runs:  2 473 ✅ passed, 50 💤 skipped, 0 ❌ failed

Results for commit 7a3bd5c.

♻️ This comment has been updated with latest results.

To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass.

See the TESTING and DEVELOPMENT HowTo guides for details about how to run tests locally.

Artifacts

// Erlang/OTP Github Action Bot

    <<"a, b, c">>
    ```
    """.
    -doc(#{since => <<"OTP 27.0">>}).
Contributor Author

Not sure if it'll make it into OTP 27 or not but I'm assuming that RC is still open 👍

-spec join([binary()], binary()) -> binary().
join([H], _Separator) -> H;
join([H | T], Separator) ->
join(T, Separator, H).
Contributor

As an option:

Suggested change
- join(T, Separator, H).
+ lists:foldl(fun(Element, Acc) -> <<Acc/binary, Separator/binary, Element/binary>> end, H, T).

Contributor

Would that not lose some performance though? Performance being the main reason for this implementation.

Contributor Author

@onno-vos-dev onno-vos-dev Feb 8, 2024

There's very little change performance-wise with this suggestion, nor do I see a big reduction in complexity. Care to elaborate on why you'd prefer this option? 😄

Contributor

Just a wish to replace generic code patterns with standard library functions (that is, to use higher-order functions instead of manual recursion).
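
For comparison, the two variants discussed in this thread can be sketched side by side. This is a minimal illustration, not the code from the PR: the module name `join_variants` and the function names `join_rec/2` and `join_fold/2` are hypothetical, and the body of the recursive helper `join_rec/3` is an assumption, since only its call site appears in the diff above. Neither variant handles the empty list, matching the two clauses shown in the diff.

```erlang
-module(join_variants).
-export([join_rec/2, join_fold/2]).

%% Manual recursion: seed the accumulator with the head, then append
%% Separator and the next element on each step.
-spec join_rec([binary()], binary()) -> binary().
join_rec([H], _Separator) -> H;
join_rec([H | T], Separator) ->
    join_rec(T, Separator, H).

%% Assumed shape of the three-argument helper (not shown in the diff).
join_rec([], _Separator, Acc) -> Acc;
join_rec([H | T], Separator, Acc) ->
    join_rec(T, Separator, <<Acc/binary, Separator/binary, H/binary>>).

%% The suggested alternative: the same accumulation via lists:foldl/3.
-spec join_fold([binary()], binary()) -> binary().
join_fold([H], _Separator) -> H;
join_fold([H | T], Separator) ->
    lists:foldl(fun(Element, Acc) ->
                        <<Acc/binary, Separator/binary, Element/binary>>
                end, H, T).
```

Both functions build the result left to right with the same binary append expression, so for the input `[<<"a">>, <<"b">>, <<"c">>]` and separator `<<", ">>` each returns `<<"a, b, c">>`. The difference is only in who drives the loop.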

@onno-vos-dev onno-vos-dev force-pushed the issue-8099-add-binary-join-to-stdlib branch 2 times, most recently from 335d9fc to 26e06f7 Compare February 8, 2024 10:10
@onno-vos-dev onno-vos-dev force-pushed the issue-8099-add-binary-join-to-stdlib branch 2 times, most recently from 4b2771c to e0df3f0 Compare February 8, 2024 11:53
@onno-vos-dev
Contributor Author

Squashed to get rid of some of the commit noise

@onno-vos-dev onno-vos-dev force-pushed the issue-8099-add-binary-join-to-stdlib branch from e0df3f0 to 036b654 Compare February 8, 2024 12:39
@onno-vos-dev onno-vos-dev force-pushed the issue-8099-add-binary-join-to-stdlib branch from 3fc013a to fa6d3b7 Compare February 8, 2024 14:49
@paulo-ferraz-oliveira
Contributor

FWIW, there was an attempt at this a few years back that ended up not making it in due to a core team decision. It's possible, though, that things have changed since then.

Comment on lines 955 to 957
%% Starting with an empty binary convinces the compiler to use the new "private append" optimisation
Acc = <<>>,
join(T, Separator, <<Acc/binary, H/binary>>);
Contributor

@bjorng @jhogberg @michalmuskala

I understand why starting with the empty string makes the compiler use the new "private append" optimisation. However I wonder if this could be generalized?

While H cannot be privately appended because it comes from an external function, the next iteration of join/3 can be privately appended, because no one uses the intermediate binary. Therefore, for those "closed loops", would it make sense to have a bit in the binary that tells when to private append or not? Generally speaking, everything after the first iteration would be privately appended. This would allow private append to happen in more situations, although I am not aware of the costs of reserving one extra bit for binaries.

PS: I may be completely off the mark here.
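
The pattern under discussion can be sketched as a standalone module. This is an illustration under stated assumptions rather than the PR's exact code: the module name `join_seeded` is hypothetical, the `join/3` clauses are assumed (only the call site is quoted above), and whether the compiler actually applies the private append optimisation depends on the OTP version in use.

```erlang
-module(join_seeded).
-export([join/2]).

-spec join([binary()], binary()) -> binary().
join([H], _Separator) -> H;
join([H | T], Separator) ->
    %% Seed with a binary created in this function, so the accumulator
    %% passed to join/3 does not originate from the caller.
    Acc = <<>>,
    join(T, Separator, <<Acc/binary, H/binary>>).

%% Assumed helper: append Separator and the next element until the
%% list is exhausted. The intermediate accumulator is never used
%% anywhere else, which is the property the question above relies on.
join([], _Separator, Acc) -> Acc;
join([H | T], Separator, Acc) ->
    join(T, Separator, <<Acc/binary, Separator/binary, H/binary>>).
```

Passing `H` directly as the initial accumulator would give the same result; the seeding only changes where the first binary is created.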

Contributor

> I understand why starting with the empty string makes the compiler use the new "private append" optimisation. However I wonder if this could be generalized?
>
> While H cannot be privately appended because it comes from an external function, the next iteration of join/3 can be privately appended, because no one uses the intermediate binary.

The private append operation is faster because it does fewer tests and less work than the general append operation. Therefore, private append must only be used with a binary that has been specially prepared.

However, it should be possible for the compiler to generalize the optimization. If the call looks like:

join(T, Separator, H)

the compiler could rewrite it to:

    Acc = <<>>,
    join(T, Separator, <<Acc/binary, H/binary>>);

Not sure that this code pattern is common enough to make the optimization worthwhile to implement, though. It would not be needed if the clause is rewritten to:

join([_ | _]=List, Separator) when is_binary(Separator) ->
    join(List, Separator, <<>>);
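
A sketch of how the rewritten clause could fit into a complete function follows. Only the `join/2` clause comes from the comment above; the module name `join_empty_acc` and the `join/3` clauses are assumptions added for illustration. Because the accumulator now starts as the literal `<<>>`, the helper has to append the separator after each element except the last.

```erlang
-module(join_empty_acc).
-export([join/2]).

-spec join([binary(), ...], binary()) -> binary().
join([_ | _] = List, Separator) when is_binary(Separator) ->
    join(List, Separator, <<>>).

%% Assumed helper (not shown in the comment above): the last element is
%% appended without a trailing separator.
join([H], _Separator, Acc) ->
    <<Acc/binary, H/binary>>;
join([H | T], Separator, Acc) ->
    join(T, Separator, <<Acc/binary, H/binary, Separator/binary>>).
```

With this shape no separate seeding step is needed, since every append in `join/3` operates on an accumulator that began life inside the function.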

@rickard-green rickard-green added the team:VM Assigned to OTP team VM label Feb 12, 2024
@jhogberg jhogberg added the stalled waiting for input by the Erlang/OTP team label Feb 19, 2024
@bjorng bjorng added this to the OTP-28.0 milestone Feb 20, 2024
@bjorng
Contributor

bjorng commented Feb 20, 2024

After the first release candidate, we generally focus on bug fixes and polishing of features already included or planned for the release. To ensure that Erlang/OTP 27 will be as good as it possibly can be, we need to minimize the time we spend on things not to be included in the release. Therefore, we will not review this pull request until after OTP 27 has been released. If we have not come back to it before September, feel free to remind us.

@bjorng bjorng force-pushed the issue-8099-add-binary-join-to-stdlib branch from 83b55e7 to 9aed9c9 Compare October 31, 2024 05:08
@bjorng
Contributor

bjorng commented Oct 31, 2024

Now we are back working on Erlang/OTP 28. The OTP Technical Board will have to approve this addition to the binary module. It is likely that it will be approved within a week or two.

Meanwhile, I've pushed a commit with suggested cleanups and proper error handling. Please review. If you approve, please rebase on the latest master and squash into one commit.

@onno-vos-dev onno-vos-dev force-pushed the issue-8099-add-binary-join-to-stdlib branch from 9aed9c9 to 7a3bd5c Compare October 31, 2024 07:51
@onno-vos-dev
Contributor Author

> Now we are back working on Erlang/OTP 28. The OTP Technical Board will have to approve this addition to the binary module. It is likely that it will be approved within a week or two.
>
> Meanwhile, I've pushed a commit with suggested cleanups and proper error handling. Please review. If you approve, please rebase on the latest master and squash into one commit.

Thank you! Squashed and rebased now 👍

@bjorng bjorng added testing currently being tested, tag is used by OTP internal CI and removed stalled waiting for input by the Erlang/OTP team labels Oct 31, 2024
@bjorng
Contributor

bjorng commented Oct 31, 2024

Thanks! Added to our daily builds.

@bjorng bjorng merged commit b828aa5 into erlang:master Nov 7, 2024
20 checks passed
@bjorng
Contributor

bjorng commented Nov 7, 2024

Thanks for the pull request.
