Extract data via WOF SQLite database #157
Hey @Joxit sorry I took a while to get to this. It's intended to be a replacement for https://github.com/pelias/placeholder/blob/master/cmd/wof_extract.sh but the two scripts have different functionality. The existing shell script The nice thing about this is that it can be broken up into two steps, one to create the 'extract' and one to import the data into placeholder. The node script in this PR performs both of those actions in one step. One other thing I was wondering, is if it would be possible to combine https://github.com/pelias/placeholder/blob/master/cmd/placetype.filter and https://github.com/pelias/placeholder/pull/157/files#diff-2a0ae160a26801a4054b1d47da4defddR9 into a single file so it gets edited in one place. |
Hum, yes, that makes sense and this will save the current workflow. So the script will output records on stdout and will use the |
It made a big difference when working with the filesystem because the 'extract' file took about an hour to generate. Any idea how long it takes to extract that data from the sqlite db? |
I just tried a world extraction, it ended with an error (heap out of memory need |
I added one commit. |
I tried a world extraction + placeholder sqlite build with |
I tested this out the other day, and it worked great! I think it took around 25 minutes to build the extract and sqlite DB. That's pretty fast! Is there anything that needs to be done before we can merge this? The PR has been sitting for a while, so I imagine there's at least a little cleanup. |
Yeah! That's cool, I think there is only |
Right. I just set it to 8000 on all the other services, based on the latest SQLite file from WOF, so that would be a good value. |
dc47406 to 642d851
okay, memory limit updated and rebased to master :) |
🚀 |
Hi @Joxit, |
The package |
Yeah, if you have the time, that would be great. |
back story... originally I wanted to keep repos like turns out that didn't really work and makes them inconsistent with the other repos and harder to deploy 🤷‍♂️. |
Okay, but I think the deprecation of |
Quick update with 54f99fe: this adds support for For a full |
I've tested this out pretty extensively, and it appears to create very nice extracts based on the SQLite DB. This does create a requirement to set an extra ENV var (
|
In fact, I used the same variables as the current placeholder/cmd/wof_extract.sh Lines 7 to 9 in 8e9d311
Yes, of course you can add some extra commit 😉 |
Oh, weird. I wonder how it worked before then... |
Hum... You're right, this is an excellent question, with the current pelias/docker configuration, the extract should not work 😅 |
Okay, I think it comes from these lines in the Lines 19 to 20 in 8e9d311
Additionally, there must be something about the way we are calling into |
Nice catch. |
54f99fe to dabfb35
dabfb35 to c1eb859
So I was hoping to merge this today but discovered a snag: if we merge this PR, then non-full-planet projects with This is because the current flow for a non-full-planet build is as follows:
We've discussed a shared place to put WOF utilities in the past, and this seems like another case that needs to utilize that. The whosonfirst importer is already an ad-hoc start to this, but we'd really like to avoid cementing more dependencies on the importer, and rather extract shared stuff to a library. @missinglink, is github.com/pelias/wof the start of this? @Joxit do you have other ideas on how to preserve all the functionality we want? |
Hum... There is the function If we want compatibility with "by country repositories", we should ensure that each id is in at least one of the targeted repositories. For example, France is both an Empire (136253037) in repo XY and a Country (85633147) in repo FR. So the placeholder/cmd/wof_extract_sqlite.js Lines 42 to 45 in c1eb859
|
We need to resolve the issue that all per-country downloads must also pull the |
I guess what I'm saying is that we need to resolve the download issues before we can really tackle the ingestion ones. |
Okay, upon further thought, I think we actually are okay to merge this! The only catch is we can't yet default I think it's ok to merge this code now, then bring in shared code from some whosonfirst tooling to allow Placeholder to read from and filter a SQLite DB directly. Finally, after that, we can worry about combining multiple country-specific SQLite DBs. My guess is there will be a bit of work there, and in the meantime the priority is that people can use the SQLite downloads at all. Let me know if that seems to make sense for everyone else or if I'm missing something. |
Heya, I had a need for this code today. It sounds like you two, @Joxit and @orangejulius, are both ok to merge it, so can we? |
Erp, one comment, the string |
Hey, I can do an update to be more like pelias/wof-admin-lookup#289 |
You know what? Since this PR first landed a lot has changed regarding WOF, and now that the SQLite bundles are up-to-date and reliable there seems to be less need to provide backwards support for bundles. I'd love to delete the old code which reads from the filesystem but I guess we can't do that yet because it'd break For my purposes I found that it's much easier to accomplish this task now using the
#!/bin/bash
set -euxo pipefail
# ensure dependencies are installed
export DEBIAN_FRONTEND=noninteractive
apt-get update -y
apt-get install -y jq pigz
jq --version
# clone & compile wof tools
cd /code
git clone https://github.com/pelias/placeholder.git
# filter records by placetype
# note: the placetype list was lifted from cmd/wof_extract.sh within pelias/placeholder
# note: `read -d ''` returns non-zero at end of input, so guard it under `set -e`
read -r -d '' SQL <<'EOF' || true
SELECT geojson.body
FROM geojson
WHERE geojson.is_alt = 0
AND geojson.id IN (
  SELECT DISTINCT spr.id
  FROM spr
  WHERE (
    spr.is_deprecated = 0 AND
    spr.is_superseded = 0
  )
  AND spr.placetype IN (
    'ocean',
    'continent',
    'marinearea',
    'empire',
    'country',
    'dependency',
    'disputed',
    'macroregion',
    'region',
    'macrocounty',
    'county',
    'localadmin',
    'locality',
    'borough',
    'macrohood',
    'neighbourhood'
  )
);
EOF
# generate placeholder extract
wof sqlite export --sql="$SQL" "${sqlite_path}" |\
jq -c -M -f '/code/placeholder/cmd/jq.filter' |\
pigz > ${placeholder_extract_path} |
This is an example of terrible community management on our part, where the github comments are more lines than the code 🤣 sorry about that! Have a look at the |
c1eb859 to 9495956
Oh, cool. I did the change in placeholder anyway 😅 |
Okay, after pelias/whosonfirst#487 we will definitely need this. Looks like you integrated my changes from #169, so this works great as is. If you feel compelled to do a bunch of cleanup (we don't even need to check |
a200fea to 9e453ab
Now it's ok, I dropped bundle support and synced the branch 👍 |
I tried this out today with
|
Oh yes, you're right: since bundles are no longer supported, the default value here can be Lines 17 to 20 in 0250dae
Or should we add (You can pull the new docker image) |
704ee7d to da649a6
BREAKING CHANGE: this drops support for Who's on First bundles. Going forward, only SQLite distributions are supported, and Placeholder will now behave as if `imports.whosonfirst.sqlite` is set to true, regardless of its actual value.
502b722 to 269d6bc
This updates `wof_extract_sqlite` to support the `imports.whosonfirst.importPlace` property, using similar logic to `wof-admin-lookup`. When `importPlace` specifies a small area, like a city, this speeds up the import process considerably!
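For illustration, a config that sets `importPlace` might look roughly like the sketch below. It writes a minimal `pelias.json` to a scratch directory and points `PELIAS_CONFIG` at it. The `datapath` value and the scratch location are assumptions made for this example, and the ID is the France country record (85633147) mentioned earlier in the thread, used only because it is recognisable; a city-sized record would show the speed-up better.

```shell
# write a minimal pelias.json to a scratch directory (location is illustrative)
config_dir="$(mktemp -d)"

cat > "${config_dir}/pelias.json" <<'EOF'
{
  "imports": {
    "whosonfirst": {
      "datapath": "/data/whosonfirst",
      "importPlace": [85633147]
    }
  }
}
EOF

# point Pelias tooling at this file for the current shell session
export PELIAS_CONFIG="${config_dir}/pelias.json"
```

With this in place, only records related to the listed place should end up in the extract, rather than the full planet.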
269d6bc to 2fcd7df
Okay, I've also taken over this PR, rebased it a bunch and edited the commit messages so that the release notes will be nice and clean :) I also added support for |
This feature is to avoid any use of WOF bundles.
It should also speed up the placeholder build.