
Bad API Query Performance (1000ms for a basic bounding box request) #170

Closed
remuslazar opened this issue Jan 27, 2021 · 15 comments · Fixed by #171

Comments

@remuslazar (Contributor)

I have noticed quite severe performance degradation in the current OCM public API. For example:

export ocm_key="some-valid-api-key-here"

time curl -H "X-API-Key: $ocm_key" -s https://api.openchargemap.io/v3/poi'?output=json&maxresults=1&compact=true&verbose=false&camelcase=true&boundingbox=(48.8,9.1),(48.7,9.2)' >/dev/null

curl -H "X-API-Key: $ocm_key" -s  > /dev/null  0,02s user 0,01s system 2% cpu 1,094 total

This very basic request, just using a bounding box for filtering and limiting the result set to one single record takes about one second to complete.

I also set up an OCM mirror as described here to see whether the issue persists there and to be able to debug it. And, yes, it does:

time curl -H "X-API-Key: $ocm_key" -s https://ocm-mirror.io.ev-freaks.com/v3/poi'?output=json&maxresults=1&compact=true&verbose=false&camelcase=true&boundingbox=(48.8,9.1),(48.7,9.2)' >/dev/null
curl -H "X-API-Key: $ocm_key" -s  > /dev/null  0,01s user 0,00s system 3% cpu 0,480 total

I am getting better results using my own mirror (480ms vs. 1094ms), but this is still quite bad.

I also looked at the MongoDB queries in Docker and saw queries like this:

(..)
		"command" : {
			"$truncated" : "{ find: \"poi\", filter: { $and: [ { SpatialPosition: { $within: { $polygon: [ [ 9.077199999999999, 48.8741 ], [ 9.239000000000001, 48.8741 ], [ 9.239000000000001, 49.0802 ], [ 9.077199999999999, 49.0802 ], [ 9.077199999999999, 48.8741 ] ] } } }, { AddressInfo: { $ne: null } }, { $or: [ { SubmissionStatusTypeID: null }, { SubmissionStatusTypeID: 100 }, { SubmissionStatusTypeID: 200 }, { _id: { $type: -1 } }, { $and: [ { SubmissionStatusTypeID: { $ne: null } }, { _id: { $type: -1 } }, { SubmissionStatusTypeID: null } ] } ] }, { SubmissionStatusTypeID: { $ne: null } }, { SubmissionStatusTypeID: { $ne: 1010 } }, { $or: [ { StatusTypeID: { $ne: 200 } }, { StatusTypeID: { $in: [ -1 ] }, _id: { $type: -1 } } ] }, { Connections: { $elemMatch: { $or: [ { ConnectionType.ID: { $in: [ 25, 1036, 33, 28 ] } }, { _id: { $type: -1 } } ] } } } ] }, sort: { ID: -1 }, skip: 0, limit: 501, noCursorTimeout: false, $db: \"ocm_mirror\", lsid: { id: UUID(\"7e18d1b8-9720-4a14-8ce2-274315bed0bd\") } }"
		},
(..)

IMO $within will not use the 2dsphere index. To leverage the 2dsphere index, the $geoWithin operator has to be used; see also

https://docs.mongodb.com/manual/reference/operator/query/geoWithin/
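For illustration, here is how the logged filter could be rewritten with $geoWithin. This is only a sketch of the raw query documents (coordinates taken from the logged query above), not the project's driver code, and it assumes SpatialPosition holds GeoJSON points:

```javascript
// Sketch: the same polygon filter expressed with the legacy $within
// operator (as seen in the log above) and with $geoWithin using a
// GeoJSON $geometry, which can be served by a 2dsphere index.
const ring = [
  [9.0772, 48.8741],
  [9.2390, 48.8741],
  [9.2390, 49.0802],
  [9.0772, 49.0802],
  [9.0772, 48.8741], // closed ring: first point repeated
];

// Legacy form (as produced by the 1.x-compatible driver API):
const legacyFilter = {
  SpatialPosition: { $within: { $polygon: ring } },
};

// 2dsphere-friendly form:
const geoWithinFilter = {
  SpatialPosition: {
    $geoWithin: {
      $geometry: { type: "Polygon", coordinates: [ring] },
    },
  },
};

console.log(JSON.stringify(geoWithinFilter));
```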

For my own projects I was able to get much better query performance using a "2d" index (instead of 2dsphere), btw.

Unfortunately I am a Swift developer and do not have any .NET expertise :(

@webprofusion-chrisc (Member)

Thanks for investigating! We use the MongoDB SDK (possibly an outdated version) to construct our queries, so I'd need to investigate how best to optimize that:

poiList = poiList.Where(q => Query.WithinPolygon("SpatialPosition", pointList).Inject());

@webprofusion-chrisc (Member)

It's also worth noting that on my machine the MongoDB part of the query takes an average of 42ms for a bounding box query with 88 results. I think the real bottleneck is probably elsewhere, in marshalling the output into the format required by the API. For instance, choosing compact=true&verbose=false requires postprocessing of the DB results so we can strip out unwanted parts of the object model, so it transfers less data but takes a little longer (not much, though).

Our mirrors are the primary servers for API reads (a couple of fairly low-spec Linux servers proxied via Cloudflare Workers) and they are susceptible to being flooded by requests, which is one of the reasons we'd rather people run their own mirrors if they need specific query performance. Any optimisations are appreciated, though.

@remuslazar (Contributor, Author)

> It's also worth noting that on my machine the actual query time for the mongodb part takes an average of 42ms for a bounding box query with 88 results

... and you did this test against the full POI database (117,148 records)?

Here in Docker the MongoDB query time is by far the biggest bottleneck. With MongoDB query logging enabled, I could see that the query took roughly 350ms (on my MacBook, in Docker). Without the boundingbox param, Mongo was able to use the index, and the query ran in about 10ms(!).

When not using the bounding box param, the response time is way better. I also played with the compact and verbose params but could not see a significant difference.

@webprofusion-chrisc (Member)

Cool, I'm basing my comments on a small benchmark that we have where we start a .NET Stopwatch, perform the query, then stop the stopwatch and record the time.

How do you monitor MongoDB query performance? I'm far from being an expert in MongoDB.

@webprofusion-chrisc (Member)

In case it's relevant, our mirrors run with 2 GB of RAM plus swap space; less than that quickly runs into query performance issues.

@remuslazar (Contributor, Author)

> Cool, I'm basing my comments on a small benchmark that we have where we start a .NET Stopwatch, perform the query, then stop the stopwatch and record the time.

... and the benchmark is using the full database with 117,148 POIs?

I think the root of the problem is that the query is not using the index at all, and the resulting full collection scan leads to a major performance issue. With more RAM, MongoDB can cache the results, so running the same query over and over again should be served from cache and run fast. But in practice the boundingbox param changes all the time as the user pans the map, for example, so caching will not help much.

I suggest using a 2d index (instead of 2dsphere):

  1. Set up the 2d index:

SpatialPosition.coordinates => 2d

  2. Then use a $geoWithin query:

"SpatialPosition.coordinates": { "$geoWithin": { "$box": [ [ SW_LON, SW_LAT ], [ NE_LON, NE_LAT ] ] } },
(..)

SW_LON/SW_LAT and NE_LON/NE_LAT being the coordinates of the bounding box corners.
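The two steps above could be sketched like this (the index is created separately, e.g. `db.poi.createIndex({ "SpatialPosition.coordinates": "2d" })` in the mongo shell; the `buildBoundingBoxFilter` helper is illustrative, not code from the OCM project):

```javascript
// Sketch of the suggested $geoWithin/$box filter for a 2d index on
// SpatialPosition.coordinates. Corners are [lon, lat] pairs, in the
// coordinate order MongoDB expects.
function buildBoundingBoxFilter(sw, ne) {
  return {
    "SpatialPosition.coordinates": {
      $geoWithin: { $box: [sw, ne] },
    },
  };
}

// Example: the bounding box from the curl requests above.
const filter = buildBoundingBoxFilter([9.1, 48.7], [9.2, 48.8]);
console.log(JSON.stringify(filter));
```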

It may also be worth using the $hint operator to tell Mongo to use this index for this specific query; see

https://docs.mongodb.com/manual/reference/operator/meta/hint/

Unfortunately I have neither a .NET environment here nor any .NET knowledge, so I cannot help out with a pull request, much as I would love to.

@webprofusion-chrisc (Member)

Thanks, yes, tests are with a recent snapshot of the database and with randomised bounding box coordinates to avoid caching.

You can build from source under Linux or macOS using dotnet build. As mentioned, we don't directly construct the query, so we can't manually specify $geoWithin currently. I agree there is plenty of room for more investigation, though.

@remuslazar (Contributor, Author)

@webprofusion-chrisc I will go with MS Visual Studio, which is also available for Mac, to play around with the current codebase and test things out.

dotnet build is already working; that is what I am doing inside the Docker container. But without a decent IDE it would be quite hard (for me), so let's see how far I get!

@remuslazar (Contributor, Author)

@webprofusion-chrisc ok, I was able to get along with the lightweight VS Code. About the .NET SDK: the current 3.1 LTS release is 3.1.405, but the project configuration uses 3.1.100. Any reason not to use the latest 3.1 LTS release? I changed the version in global.json and it seems to be fine.

Anyway, dotnet restore and build work fine, and I'm also getting some IDE features with VS Code. And it is... fast! :)

@remuslazar (Contributor, Author)

@webprofusion-chrisc an update on this: I was able to run the project locally (on the Mac, MongoDB in Docker), and I have also played with the MongoDB query logic.

Unfortunately the project is using the old "mongocsharpdriver" package, which is basically a legacy 1.x compatibility layer on top of MongoDB.Driver:

https://mongodb.github.io/mongo-csharp-driver/2.11/getting_started/installation/

The 1.x legacy API does not support GeoWithin and related operators, AFAICS :(

Do you see any chance of updating the project to use the current MongoDB.Driver? The latter being asynchronous, this should also bring some performance gain when parallel requests hit the same worker.

@webprofusion-chrisc (Member) commented Jan 31, 2021

Thanks, I've actually attempted the upgrade before but had to back out of it for reasons I can't remember; I think it turned into a lot of extra work and testing. We will upgrade eventually, though; it's just about devoting free time to it. I haven't been putting a lot of time into OCM over the last few years, in the hope that others would step up a little :)

@remuslazar (Contributor, Author)

@webprofusion-chrisc I understand... the pain of legacy code.

Nevertheless, I was able to fix the issue; see #171. I have already tested it locally and on my own OCM mirror, and it works nicely! I was also able to lower the RAM constraints for the MongoDB Docker container to 512 MB to save some resources, and this works without any performance tradeoffs.

BTW, I also updated https://github.com/ev-freaks/ocm-mirror to use an HAProxy container that adds gzip compression. This is a nice add-on, especially for mobile clients, the gzip compression ratio for the compact JSON payload being about 10:1 ;)

@webprofusion-chrisc (Member)

Excellent, thanks, I've merged that now.

Regarding the gzip compression: we automatically serve responses compressed with Brotli, gzip, etc., so you shouldn't have to do that. Note that when testing with curl you'll get the uncompressed response unless you specify --compressed.

@webprofusion-chrisc (Member)

So it turns out we do need the original index for geoNear queries, as otherwise the distance-based queries throw an exception; I've patched that now. Your change (for bounding boxes) is now in production on our API servers. Thanks again.

@remuslazar (Contributor, Author)

Oh, my bad, I was focusing on the bounding box queries and did not test anything else :( I just looked at 818a211; looks good. MongoDB should support both 2d and 2dsphere indexes simultaneously.

remuslazar added a commit to ev-freaks/ocm-system that referenced this issue Nov 10, 2021
This fixes a regression introduced by bounding box queries.

See openchargemap#170 and related
remuslazar added a commit to ev-freaks/ocm-system that referenced this issue Nov 10, 2021
This fixes a regression introduced by bc61dca.

See openchargemap#170 and related