-
-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Do not import both whosonfirst-data-latest.db
and whosonfirst-data-admin-$$-latest.db
#469
Comments
Hi @slvlirnoff, So one fix would be to fix the bug and ensure only the global sqlite db is ever downloaded. I know the WOF team is working on a process to produce up to date global sqlite db, so it will be up to date again soon. This approach would probably be the easiest and require changing the least amount of Pelias code. The other approach would be to start using the country sqlite dbs, since they are more up to date and many people have requested a way to import data for only specific countries without having to download the whole planet. We should probably make the easy fix i mentioned first, to get things working as expected, before embarking on the longer term job of incorporating the country specific dbs. |
My 2c would be for option 2 since it is cleaner and more versatile and doesn't rely on WOF to complete their planet db files again. Looks like bundleList.js:getDBList simply needs to filter out the "latest" file, and an additional filter could be added there to specify specific country codes from the config file. |
👍 I would also love it if we could support the new country-specific sqlite database (and also the planet database) indifferently. Julian and I have a full calendar of work up till the end of the year but I'd be happy to support the work (through review and discussion) if someone else were to author it. |
Actually yes, I take it back. Moving to support the country DBs might be better. I didn't realize data from them is actually imported. We would have to do some work for the PIP service, since it's currently hardcoded to use the global sqlite DB. On the other hand, the spatial service is on the horizon. |
Okay, I've been taking a look at SQLite based builds so that we can ensure Pelias works out of the box with up to date WOF data, avoiding issues such as those in pelias/docker#141. At this point I'm pretty convinced we need to do both of the following things:
If anyone has any thoughts on this, let me know. Looking especially to hear from @missinglink and @Joxit |
It's no secret that I'm a big SQLite fan 😄 so I would like to:
This will clean up a lot of the code which has been in place since the inception of the WOF project and during its early development. Great let's do it!
What's the backstory?The original intention was to continue using the full planet file (the only one available at the time) and switch over to During this time the WOF team introduced country-specific SQLite distributions in order to address emails from Github which forced them to reduce the size of the main At this time there were large changes required by the WOF team to split the data into multiple repos, a bunch of changes were made to the bundling code which unfortunately resulted in errors and omissions still being investigated and resolved today. What's the current state of play?We've been chatting with the WOF team throughout and I've highlighted some issues with the per-country bundles. The planet-wide bundle also hasn't been updated since There are two different full-planet SQLite databases kicking around for testing, one produced by me by merging all the per-country repos together and one by Stephen Epps which is produced directly from the filesystem without any merging. My one isn't usable due to the 'per-country' repo bug linked above and @orangejulius has noticed some issues with the one Stephen produced (Julian could you please add more info so I can confirm these?). Either way, there is currently no planet-wide SQLite database available from https://dist.whosonfirst.org which contains data since May 2019. That one also contains some errors (particularly AU state abbreviation issues) which have since been resolved. So what can we do about it?Honestly, not a lot right now. For Geocode Earth we are running older, trusted data and applying patches where necessary. I imagine others are doing this too and it's really not anyone's first choice as we'd like to be pulling in all the changes from the last 6 months rather than having to manually curate our own 'fork' of the canonical WOF data. If the SQLite distributions were all working again what could we do?If those issues are resolved we could switch over to using As Julian noted we have not done any work to support country-specific distributions AFAIK, so this would need to happen. Part of that work will require some understanding of how the country-specific distro files work, specifically the One potential issue to consider is that the default compile-time options for SQLite limits the total amount of attached databases (SQLITE_MAX_ATTACHED) to This may or may not be an issue, it depends on how we implement it, if we're doing a streaming import then it can be avoided by only having one database open at a time, but if we wanted to do a complex join operation or resolve an inter-country hierarchy traversal query then this would be a constraint. I think that's the glorious future of WOF support in Pelias, but requires a few data fixes and a few code updates to get us there, if we can achieve that then it will come with the benefit that building smaller 'region-specific' Pelias builds (such as the |
I'm going to do some work to move this forward:
|
For Jawg Maps, I'm building the SQLite from GitHub repositories since December 2018 (because of whosonfirst/go-whosonfirst-dist#9). I thought using their bundles was too risky, so I chose to make mine from their sources. Now I'm updating my SQLite every 2-3 months.
Definitely agree, doing a full planet import with the per country SQLite can be complex. Hum... Maybe not as complex as I thought, #471 highlights the fact that the placetype order is important. Only The per country database will not be an issue for pelias/wof-admin-lookup. |
The country-specific SQLite DBs currently available for download from https://dist.whosonfirst.org have a few integrity issues. This causes problems in Pelias such as addresses in NYC or Berlin not having proper admin information. The planet-wide SQLite DB, while a little out of date, is overall more valid. While in the long term we probably want to utilize the country-specific DBs, for now it makes sense to filter them out. Fixes #469 Connects #460
The country-specific SQLite DBs currently available for download from https://dist.whosonfirst.org have a few integrity issues. This causes problems in Pelias such as addresses in NYC or Berlin not having proper admin information. The planet-wide SQLite DB, while a little out of date, is overall more valid. While in the long term we probably want to utilize the country-specific DBs, for now it makes sense to filter them out. Fixes #469 Connects #460 Fixes #437
Geocode Earth are now providing distribution files at https://geocode.earth/data/whosonfirst |
From this page it seems that https://dist.whosonfirst.org/sqlite/
whosonfirst-data-latest.db
hasn't been updated since May while manywhosonfirst-data-admin-$$-latest.db
have been updated several times.Is there some information within
whosonfirst-data-latest.db
that is not within all ofwhosonfirst-data-admin-$$-latest.db
?From https://github.com/whosonfirst-data/whosonfirst-data repository README:
With
pelias/docker
and for the whole planet usingsqlite: true
the scripts imports both data-latest and the various data-admin per country (using inventory.json).This seems to cause an issue for Switzerland for instance: many layers have been fixed recently (localities/localadmin whosonfirst-data/whosonfirst-data-admin-ch#4), since the importation order of the file is:
whosonfirst-data-admin-$$-latest.db
(includes the fix)whosonfirst-data-latest.db
. (older records, don't include the fix)The old records are replacing the updated ones since they mostly share the same whosonfirst ids.
Related: #437
Also probably related: pelias/pelias#811
Best,
Cyprien
The text was updated successfully, but these errors were encountered: