Optimize binary get_number implementation by reading multiple bytes at once #4391

Merged
nlohmann merged 7 commits into nlohmann:develop from TianyiChen:msgpack-int
Nov 29, 2024

Conversation

TianyiChen
Contributor

@TianyiChen TianyiChen commented Jun 7, 2024

This PR improves the performance of the get_number implementation by reading multiple bytes at once, which saves call overhead, especially when the input comes from file I/O. It adds get_elements to the input adapters so that each adapter can use a more efficient underlying call to read multiple bytes when one is available.
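To make the idea concrete, here is a minimal sketch of a FILE-backed adapter that offers both a per-byte read and a bulk read. It is not the library's code: `file_adapter_sketch` and `get_number_sketch` are hypothetical names invented for illustration, and only the method names `get_character` and `get_elements` are taken from the description above. The sketch assumes big-endian input, which is the byte order MessagePack uses for multi-byte numbers.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Hypothetical adapter over a C FILE*; an illustration, not the library's class.
class file_adapter_sketch
{
  public:
    explicit file_adapter_sketch(std::FILE* f) : m_file(f) {}

    // Per-byte path: one library call for every byte of the number.
    int get_character()
    {
        return std::fgetc(m_file);
    }

    // Bulk path: a single fread fills the destination buffer.
    // Returns the number of bytes actually read.
    template<typename T>
    std::size_t get_elements(T* dest, std::size_t count = 1)
    {
        return std::fread(dest, 1, count * sizeof(T), m_file);
    }

  private:
    std::FILE* m_file;
};

// Read a big-endian unsigned integer with one bulk read instead of
// sizeof(UInt) separate get_character() calls.
template<typename UInt>
bool get_number_sketch(file_adapter_sketch& in, UInt& result)
{
    unsigned char bytes[sizeof(UInt)];
    if (in.get_elements(bytes, sizeof(UInt)) != sizeof(UInt))
    {
        return false;  // unexpected end of input
    }
    result = 0;
    for (std::size_t i = 0; i < sizeof(UInt); ++i)
    {
        // Shift in one byte at a time, most significant byte first.
        result = static_cast<UInt>((result << 8) | bytes[i]);
    }
    return true;
}
```

With this shape, an adapter that has no cheaper bulk call could implement `get_elements` as a loop over `get_character`, so callers would not need to know which path is taken.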

Performance for reading msgpack from C style FILE

develop:

Benchmark                              Time             CPU   Iterations Throughput
FromMsgpack/floats             166717521 ns    163715750 ns            4 bytes_per_second=52.4267Mi/s
FromMsgpack/signed_ints        171707844 ns    167527000 ns            4 bytes_per_second=51.234Mi/s
FromMsgpack/unsigned_ints      167388344 ns    164910750 ns            4 bytes_per_second=52.0468Mi/s
FromMsgpack/small_signed_ints  102578387 ns    100799000 ns            7 bytes_per_second=46.3689Mi/s

branch:

Benchmark                              Time             CPU   Iterations Throughput
FromMsgpack/floats              61241576 ns     60370091 ns           11 bytes_per_second=142.174Mi/s
FromMsgpack/signed_ints         67659257 ns     62698083 ns           12 bytes_per_second=136.895Mi/s
FromMsgpack/unsigned_ints       59830260 ns     57518000 ns           13 bytes_per_second=149.224Mi/s
FromMsgpack/small_signed_ints   63264757 ns     61850167 ns           12 bytes_per_second=75.5688Mi/s

Questions:

  • For get_elements in wide_string_input_adapter: without the method I get a compile error, but with the method left unimplemented the tests pass locally. That seems to make sense: if wchar is used, the content likely contains non-ASCII text, and we would not want to interpret it as binary numbers, which is the only place get_elements is currently used. After looking further into the UTF-8 decoding rules, I think parsing binary data from a wide string doesn't make sense, so I am making it an error.
  • Adding the msgpack parsing via C FILE, which I used for benchmarking, to the benchmark suite.
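The first point can be pictured with a small sketch of an adapter that serves text one code unit at a time but rejects bulk binary reads. This is an assumption-laden illustration rather than the library's implementation: `wide_adapter_sketch`, the use of `std::runtime_error`, and the error text are all made up here, and the UTF-8 re-encoding a real wide-string adapter would perform is omitted.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical wide-string adapter; an illustration, not the library's class.
class wide_adapter_sketch
{
  public:
    explicit wide_adapter_sketch(std::wstring text) : m_text(std::move(text)) {}

    // Text path: hand out one code unit per call, or -1 at end of input.
    long get_character()
    {
        return m_pos < m_text.size() ? static_cast<long>(m_text[m_pos++]) : -1L;
    }

    // Binary path: wide characters are text code units, not raw bytes,
    // so a bulk binary read is reported as an error instead of guessing.
    template<typename T>
    std::size_t get_elements(T* /*dest*/, std::size_t /*count*/ = 1)
    {
        throw std::runtime_error("wide string input cannot be read as binary data");
    }

  private:
    std::wstring m_text;
    std::size_t m_pos = 0;
};
```

Declaring the method keeps generic code that calls `get_elements` compiling for every adapter type, while the exception makes misuse visible at run time.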

Pull request checklist

Read the Contribution Guidelines for detailed information.

  • Changes are described in the pull request, or an existing issue is referenced.
  • The test suite compiles and runs without error.
  • Code coverage is 100%. Test cases can be added by editing the test suite.
  • The source code is amalgamated; that is, after making changes to the sources in the include/nlohmann directory, run make amalgamate to create the single-header files single_include/nlohmann/json.hpp and single_include/nlohmann/json_fwd.hpp. The whole process is described here.

Please don't

  • The C++11 support varies between different compilers and versions. Please note the list of supported compilers. Some compilers like GCC 4.7 (and earlier), Clang 3.3 (and earlier), or Microsoft Visual Studio 13.0 and earlier are known not to work due to missing or incomplete C++11 support. Please refrain from proposing changes that work around these compilers' limitations with #ifdefs or other means.
  • Specifically, I am aware of compilation problems with Microsoft Visual Studio (there even is an issue label for this kind of bug). I understand that even in 2016, complete C++11 support isn't there yet. But please also understand that I do not want to drop features or uglify the code just to make Microsoft's sub-standard compiler happy. The past has shown that there are ways to express the functionality such that the code compiles with the most recent MSVC - unfortunately, this is not the main objective of the project.
  • Please refrain from proposing changes that would break JSON conformance. If you propose a conformant extension of JSON to be supported by the library, please motivate this extension.
  • Please do not open pull requests that address multiple issues.

@github-actions github-actions bot added the L label Jun 7, 2024
@TianyiChen TianyiChen force-pushed the msgpack-int branch 3 times, most recently from b25a53f to ea8b03d Compare June 7, 2024 01:44

github-actions bot commented Jun 7, 2024

🔴 Amalgamation check failed! 🔴

The source code has not been amalgamated. @TianyiChen
Please read and follow the Contribution Guidelines.

@coveralls

Coverage Status

coverage: 99.951% (-0.05%) from 100.0%
when pulling ea8b03d on TianyiChen:msgpack-int
into 8c391e0 on nlohmann:develop.

@nlohmann
Owner

I like this idea! @TianyiChen any plans to continue working on this?

@nlohmann nlohmann added the please rebase Please rebase your branch to origin/develop label Nov 26, 2024
@github-actions github-actions bot added the tests label Nov 27, 2024
@TianyiChen
Contributor Author

TianyiChen commented Nov 27, 2024

Sure. I have rebased the branch and added a from-msgpack benchmark that reads through a C FILE. Could you retrigger the CI so we can see what is still missing? I also see that AppVeyor is failing; it was not triggered on earlier rebases, so I am taking a look.

@coveralls

coveralls commented Nov 27, 2024

Coverage Status

coverage: 99.649% (+0.002%) from 99.647%
when pulling 26cddc6 on TianyiChen:msgpack-int
into e41905f on nlohmann:develop.

@nlohmann nlohmann marked this pull request as ready for review November 27, 2024 21:31
@nlohmann nlohmann self-requested a review as a code owner November 27, 2024 21:31
@TianyiChen TianyiChen force-pushed the msgpack-int branch 3 times, most recently from 895132a to 7acebd4 Compare November 27, 2024 23:55
@nlohmann nlohmann removed the please rebase Please rebase your branch to origin/develop label Nov 28, 2024
Owner

@nlohmann nlohmann left a comment

Looks good to me.

@nlohmann
Owner

I could reproduce the benchmark results. Good job, thank you!

Before

Benchmark                              Time             CPU   Iterations Throughput
FromMsgpack/jeopardy          1149154833 ns   1148401000 ns            1 bytes_per_second=38.3318Mi/s
FromMsgpack/canada              26770413 ns     26757962 ns           26 bytes_per_second=37.6418Mi/s
FromMsgpack/citm_catalog         9823588 ns      9819069 ns           72 bytes_per_second=33.2626Mi/s
FromMsgpack/twitter             10075707 ns     10066348 ns           69 bytes_per_second=38.0386Mi/s
FromMsgpack/floats             175452521 ns    175401500 ns            4 bytes_per_second=48.9339Mi/s
FromMsgpack/signed_ints        176531916 ns    176491750 ns            4 bytes_per_second=48.6316Mi/s
FromMsgpack/unsigned_ints      176015917 ns    175288250 ns            4 bytes_per_second=48.9655Mi/s
FromMsgpack/small_signed_ints  108130396 ns    107524500 ns            6 bytes_per_second=43.4686Mi/s

After

Benchmark                              Time             CPU   Iterations Throughput
FromMsgpack/jeopardy          1137093749 ns   1134339000 ns            1 bytes_per_second=38.807Mi/s
FromMsgpack/canada              13543979 ns     13528462 ns           52 bytes_per_second=74.4518Mi/s
FromMsgpack/citm_catalog         8898435 ns      8895269 ns           78 bytes_per_second=36.717Mi/s
FromMsgpack/twitter              9764953 ns      9760380 ns           71 bytes_per_second=39.231Mi/s
FromMsgpack/floats              58513708 ns     58492750 ns           12 bytes_per_second=146.737Mi/s
FromMsgpack/signed_ints         59872042 ns     59853500 ns           12 bytes_per_second=143.401Mi/s
FromMsgpack/unsigned_ints       55908673 ns     55896000 ns           13 bytes_per_second=153.554Mi/s
FromMsgpack/small_signed_ints   61281736 ns     61260583 ns           12 bytes_per_second=76.2961Mi/s

@nlohmann nlohmann added this to the Release 3.11.4 milestone Nov 28, 2024
@nlohmann nlohmann merged commit 935c6ee into nlohmann:develop Nov 29, 2024
123 checks passed
slowriot pushed a commit to slowriot/json that referenced this pull request Jan 10, 2025
… at once (nlohmann#4391)

* multibyte binary reader

* wide_string_input_adapter fallback to get_character

Update input_adapters.hpp

* Update json.hpp

* Add from msgpack test

* Test for broken msgpack with stream, address some warnings

* Reading binary number from wchar as an error, address warnings

* Not casting float to int, it violates strict aliasing rule
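
The last commit message refers to the strict aliasing rule. The commit's code is not shown here, so the following is only a generic illustration of the issue, with a hypothetical helper name (`float_from_big_endian`): reading a `float` through a pointer to an integer type, or the reverse, is undefined behavior in C++, whereas copying the object representation with `std::memcpy` is well defined. The sketch assumes a 32-bit IEEE 754 `float` whose byte order matches that of `std::uint32_t` on the host.

```cpp
#include <cstdint>
#include <cstring>

// Decode a big-endian binary32 value from four raw bytes.
inline float float_from_big_endian(const unsigned char* bytes)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes a 32-bit float");

    // Assemble the bit pattern in an integer, independent of host byte order.
    const std::uint32_t bits = (static_cast<std::uint32_t>(bytes[0]) << 24) |
                               (static_cast<std::uint32_t>(bytes[1]) << 16) |
                               (static_cast<std::uint32_t>(bytes[2]) << 8) |
                               static_cast<std::uint32_t>(bytes[3]);

    // Not allowed: *reinterpret_cast<const float*>(&bits) violates strict aliasing.
    // Allowed: copy the bytes of the integer into a float object.
    float value;
    std::memcpy(&value, &bits, sizeof(value));
    return value;
}
```

Compilers typically turn a fixed-size `std::memcpy` like this into a plain register move, so the well-defined form usually costs nothing at run time.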