-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add HashSet based filtering optimization to XContentMapValues #17160
base: main
Are you sure you want to change the base?
Conversation
This optimization enhances document filtering when field names are simple (no dots or wildcards in field names and no dots in document keys). In such cases, it uses a HashSet-based implementation instead of automaton matching to prevent TooComplexToDeterminizeException when processing documents with numerous long field names. Changes: - Add HashSet optimization for simple field names - Split filter implementation into set-based and automaton-based - Add helper methods to check field name patterns Signed-off-by: hye-on <[email protected]>
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #17160 +/- ##
============================================
+ Coverage 72.24% 72.31% +0.07%
- Complexity 65704 65711 +7
============================================
Files 5318 5318
Lines 305674 305698 +24
Branches 44349 44355 +6
============================================
+ Hits 220834 221075 +241
+ Misses 66769 66469 -300
- Partials 18071 18154 +83 ☔ View full report in Codecov by Sentry. |
server/src/main/java/org/opensearch/common/xcontent/support/XContentMapValues.java
Outdated
Show resolved
Hide resolved
…dots Signed-off-by: hye-on <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot @hye-on ! This looks great.
I checked the code coverage report and the new code is well-covered the existing tests, which is nice.
We've missed the code freeze date for 2.19, so I think this will ship with 3.0, which we're currently planning as the next release.
Can you please add an entry to the CHANGELOG-3.0.md
file with an end-user-friendly description of the change? Maybe something like "Use simpler matching logic for source fields when explicit field names (no wildcards or dot-paths) are specified".
Signed-off-by: hye-on <[email protected]>
@msfroh Sorry for the late response! Thanks for the suggested changelog entry—I’ve added it! |
❌ Gradle check result for dd9749e: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Signed-off-by: Michael Froh <[email protected]>
Description
This optimization enhances document filtering when field names are simple (no dots or wildcards in field names and no dots in document keys). In such cases, it uses a HashSet-based implementation instead of automaton matching to prevent TooComplexToDeterminizeException when processing documents with numerous long field names.
Changes:
Reason for Checking Dots in Document Keys
Since dots in field names are treated as sub-objects (e.g., if a document contains
a.b
as a property anda
is an include, thena.b
will be kept in the filtered map as per existing JavaDoc below), we check document keys for dots to determine whether to use HashSet-based filtering or fall back to automaton matching.JavaDoc
Note on Testing
Thank you for taking the time to review this PR.
This change focuses on internal optimization, and I believe the existing tests already sufficiently cover the functional scenarios, so I did not add new tests. I considered including tests to verify the implementation selection (HashSet vs. automaton), but I felt such tests might be too closely tied to implementation details, potentially making them fragile. If you believe additional tests are necessary, I would greatly appreciate your feedback.
Comment
I made every effort to ensure the changes align with the style and intent of the existing codebase. If you notice any areas, even minor ones, that could benefit from revision, I would be happy to incorporate your suggestions.
Thank you again for your time and guidance! 😀
Related Issues
Resolves #17114
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.