Fix geospatial queries to use the MongoDB index #171
Merged
Resolves #170
TL;DR
This changes the existing MongoDB geospatial index to "2d" and reworks the query logic to use it.
Prior to this PR, MongoDB was not able to use the geospatial index for bounding box queries, which led to major performance degradation and high RAM usage on the MongoDB server instance. See #170 for details.
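The diff itself isn't part of this description, so as a hedged sketch only (TypeScript with the modern Node.js driver; the database, collection, field, and index names are assumptions), the index migration could look like this:

```typescript
import { MongoClient } from "mongodb";

// Sketch of the index change. The database, collection, field, and index
// names are assumptions for illustration; the PR's actual code isn't shown.
async function migrateToLegacy2dIndex(uri: string): Promise<void> {
  const client = new MongoClient(uri);
  await client.connect();
  try {
    const places = client.db("app").collection("places");
    // Drop the old 2dsphere index ("location_2dsphere" is the default
    // auto-generated name for an index on { location: "2dsphere" }).
    await places.dropIndex("location_2dsphere");
    // Create the legacy "2d" index that $polygon queries can use.
    await places.createIndex({ location: "2d" });
  } finally {
    await client.close();
  }
}
```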
Technical Stuff
MongoDB uses the `2dsphere` index only for queries with the `$geometry` operator, see https://docs.mongodb.com/manual/tutorial/query-a-2dsphere-index/. For "basic" `$polygon` queries only the legacy `2d` index can be used. To leverage the `2dsphere` index, a newer version of the MongoDB library would have to be used; unfortunately, the legacy MongoDB driver (1.x) that is currently used throughout the codebase cannot do that.
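For illustration, here is a hedged sketch of the reworked query shape, again in TypeScript with the modern Node.js driver (the codebase actually uses the legacy 1.x driver, and the field name is an assumption). A bounding box expressed as an axis-aligned `$polygon` under `$geoWithin` can be answered from the `2d` index, while `$geometry` would require `2dsphere`:

```typescript
import { Collection } from "mongodb";

// Bounding-box query as an axis-aligned $polygon under $geoWithin.
// The field name "location" (holding [lng, lat] pairs) is an assumption.
function findInBoundingBox(
  places: Collection,
  minLng: number,
  minLat: number,
  maxLng: number,
  maxLat: number,
) {
  const corners = [
    [minLng, minLat],
    [maxLng, minLat],
    [maxLng, maxLat],
    [minLng, maxLat],
  ];
  // $geoWithin with $polygon can be answered from the legacy 2d index;
  // $geometry (GeoJSON) would require a 2dsphere index instead.
  return places
    .find({ location: { $geoWithin: { $polygon: corners } } })
    .toArray();
}
```

For axis-aligned rectangles, `$box` under `$geoWithin` is an equivalent shorthand that the `2d` index can serve as well.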
Benchmarks
On my local machine, the query time for a bounding box query went down from ~400ms to ~30ms. With the index in use, the mongo instance also runs fine with 512MB of RAM in Docker.
Sorting Behavior Change
Additionally, this change skips sorting the result set by ID for bounding box queries to gain even more performance. This matters especially for large result sets, e.g. 500 records or more.
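To make the behavior change concrete, here is a minimal sketch under the same assumed names as above, skipping the sort only on the bounding box path:

```typescript
import { Collection } from "mongodb";

// Sketch of the conditional sort (assumed names): bounding-box queries
// return results in index order, skipping the sort-by-_id pass that gets
// expensive on large result sets.
function listPlaces(places: Collection, polygon?: number[][]) {
  if (polygon) {
    // Spatial path: no sort stage.
    return places
      .find({ location: { $geoWithin: { $polygon: polygon } } })
      .toArray();
  }
  // Non-spatial path keeps the stable ordering by _id.
  return places.find({}).sort({ _id: 1 }).toArray();
}
```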