Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RCORE-2065 use compressed string view for quick comparison if the leaf is compressed #7880

Merged
merged 5 commits into from
Jul 16, 2024

Conversation

nicola-cab
Copy link
Member

@nicola-cab nicola-cab commented Jul 11, 2024

What, How & Why?

Optimize string searching which are not for equality.
Current search relies on decompressing strings and comparing them via StringData comparison operator.
We can instead rely on the CompressedStringView if the leaf is compressed, and we have got a compressed_string_id for the current StringNode.

☑️ ToDos

  • 📝 Changelog update
  • 🚦 Tests (or not relevant)
  • C-API, if public C++ API changed
  • bindgen/spec.yml, if public C++ API changed

@nicola-cab nicola-cab requested a review from ironage July 11, 2024 15:40
@jedelbo jedelbo removed their request for review July 12, 2024 12:55
Copy link

coveralls-official bot commented Jul 12, 2024

Pull Request Test Coverage Report for Build nicola.cabiddu_1844

Details

  • 66 of 66 (100.0%) changed or added relevant lines in 4 files are covered.
  • 70 unchanged lines in 15 files lost coverage.
  • Overall coverage decreased (-0.001%) to 90.937%

Files with Coverage Reduction New Missed Lines %
src/realm/array_mixed.cpp 1 94.26%
src/realm/dictionary.cpp 1 85.52%
src/realm/index_string.hpp 1 93.48%
src/realm/node_header.hpp 1 63.35%
src/realm/object-store/sync/sync_session.cpp 1 92.01%
src/realm/query_engine.hpp 1 94.09%
src/realm/sync/noinst/server/server.cpp 1 74.26%
src/realm/util/serializer.cpp 1 90.43%
src/realm/query_expression.hpp 2 93.82%
src/realm/mixed.cpp 3 86.46%
Totals Coverage Status
Change from base Build jorgen.edelbo_366: -0.001%
Covered Lines: 218212
Relevant Lines: 239959

💛 - Coveralls

StringData t = get_string(s);
if constexpr (case_sensitive_comparison) {
// case insensitive not implemented for: >, >=, <, <=
if (cond(t, m_string_value))
return s;
}
else {
StringData t = get_string(s);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this necessary?

Copy link
Member Author

@nicola-cab nicola-cab Jul 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for operators like BeginsWith and EndsWith. Especially for case-insensitive comparison.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same line of code exists 7 lines up, so it seems redundant?


// special handling for !=, <, <= , >, >= if the leaf is compressed and we have got a compressed string
// id.
if constexpr (realm::is_any_v<TConditionFunction, NotEqual, Greater, Less, GreaterEqual, LessEqual>) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of these operators provide an integer overload for bool operator()(int64_t v1, int64_t v2) const which already does what you want, no need to reimplement it here. Just call it with the result of the comparison, and the other argument can be 0.

}
// The compressed strings are identical or one is the prefix of the other
return B.size - A.size;
return A.size - B.size;
Copy link
Contributor

@ironage ironage Jul 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make sure the comment below matches this change.
I'm also not a fan of such implicit unsigned to signed casting because the result is actually a large unsigned number when A.size is < B.size and we rely on the type conversion to do the right thing here. I think it does work for all platforms we care about, but if for some reason the return type changed to int64_t then this code would be incorrect.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, I can change the comment and maybe add an explicit cast. I don't think we can ever have a platform where int matches with int64_t, but never say never.

@nicola-cab nicola-cab requested a review from ironage July 15, 2024 15:32
@nicola-cab nicola-cab merged commit aec4ca0 into feature/string-compression Jul 16, 2024
43 checks passed
@nicola-cab nicola-cab deleted the nc/CORE-2065 branch July 16, 2024 09:25
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 15, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants