Localities missing from GB import #508

Closed
tomtaylor opened this issue May 6, 2020 · 13 comments

@tomtaylor

tomtaylor commented May 6, 2020

I've just set up a test Pelias installation locally using Docker. I'm using this pelias.json to load in the whole of the UK, and running the following commands:

pelias download all
pelias prepare all
pelias import wof

(I don't need street/address geocoding.)

Most of the places I'd expect have loaded in fine, but some are missing, for example Nuneaton, Huddersfield and Sittingbourne. They all exist in my local WOF SQLite database, but aren't present in the Elasticsearch index. They work fine on the geocode.earth online tool.

Take Huddersfield. It's not in the ElasticSearch index:

> curl -I http://localhost:9200/pelias/_doc/whosonfirst:locality:101750573
HTTP/1.1 404 Not Found
content-type: application/json; charset=UTF-8
content-length: 87

While a sibling locality, Holme Valley, has loaded just fine:

> curl -I http://localhost:9200/pelias/_doc/whosonfirst:locality:1360754629
HTTP/1.1 200 OK
content-type: application/json; charset=UTF-8
content-length: 943
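
To run the same check for several places at once, the two `curl -I` requests above could be scripted. This is only a minimal sketch and not part of Pelias: it assumes the same `localhost:9200` host and `pelias` index as the commands above, and the helper names (`doc_url`, `doc_exists`, `report`) are made up for illustration.

```python
import urllib.error
import urllib.request

# Assumption: same host and index as in the curl commands above.
ES_BASE = "http://localhost:9200/pelias/_doc"


def doc_url(gid, base=ES_BASE):
    """Build the document URL for a gid such as 'whosonfirst:locality:101750573'."""
    return f"{base}/{gid}"


def doc_exists(gid, base=ES_BASE, opener=urllib.request.urlopen):
    """Send a HEAD request; return True on 200 and False on 404."""
    request = urllib.request.Request(doc_url(gid, base), method="HEAD")
    try:
        with opener(request) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # any other HTTP error is unexpected, so surface it


def report(gids, **kwargs):
    """Map each gid to True (indexed) or False (missing)."""
    return {gid: doc_exists(gid, **kwargs) for gid in gids}
```

Calling `report(["whosonfirst:locality:101750573", "whosonfirst:locality:1360754629"])` against a running instance would return a dictionary of booleans, one per gid. The `opener` parameter exists only so the HTTP call can be swapped out when no Elasticsearch instance is available.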

I've run pelias import wof multiple times with no errors, and I've tried flushing the index too.

Is there a way of debugging why they might not be getting loaded?

tomtaylor added the bug label May 6, 2020
@tomtaylor
Author

OK, so I've inspected the geojson table in the SQLite database more thoroughly, and it looks like the places that haven't been imported only have rows with is_alt set to 1:

# Holme Valley, works fine
sqlite> SELECT id, source, is_alt FROM geojson WHERE id = 1360754629;
1360754629|quattroshapes|0
# Huddersfield, doesn't
sqlite> SELECT id, source, is_alt FROM geojson WHERE id = 101750573;
101750573|quattroshapes|1
101750573|quattroshapes_pg|1
# Nuneaton, doesn't
sqlite> SELECT id, source, is_alt FROM geojson WHERE id = 101750471;
101750471|quattroshapes_pg|1
101750471|whosonfirst|1
# Sittingbourne, doesn't
sqlite> SELECT id, source, is_alt FROM geojson WHERE id = 101853501;
101853501|quattroshapes|1
101853501|quattroshapes_pg|1
101853501|whosonfirst|1
# Hackney, works fine
sqlite> SELECT id, source, is_alt FROM geojson WHERE id = 1158857273;
1158857273|gbr-datalondon|0

It looks like this is expected behaviour with the whosonfirst importer.

I'm now thinking this might be an issue with how the SQLite distribution is generated... @missinglink it looks like you might be working on something related?

@missinglink
Member

Sounds a lot like the bug I fixed yesterday.
pelias/wof#13

Please try downloading the SQLite database again and checking the same IDs. You should find exactly one record with is_alt=0 per ID.
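
That expectation can be checked across the whole database rather than one ID at a time. The sketch below assumes only the `geojson` table and the `id` and `is_alt` columns seen in the queries earlier in this thread; the function name is hypothetical and this is not how the importer itself validates data.

```python
import sqlite3


def ids_without_single_primary(conn):
    """Return (id, primary_count) for every id whose count of is_alt = 0 rows is not exactly 1."""
    query = """
        SELECT id,
               SUM(CASE WHEN is_alt = 0 THEN 1 ELSE 0 END) AS primary_count
        FROM geojson
        GROUP BY id
        HAVING SUM(CASE WHEN is_alt = 0 THEN 1 ELSE 0 END) != 1
        ORDER BY id
    """
    return conn.execute(query).fetchall()
```

Passing it a connection opened with `sqlite3.connect("whosonfirst-data-admin-gb-latest.db")` should return an empty list for a correctly generated file. Any rows returned are IDs with zero or several primary records.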

@tomtaylor
Author

tomtaylor commented May 6, 2020

Thanks @missinglink - I don't think whosonfirst-data-admin-gb-latest.db.bz2 has updated yet. I still get the same results with the new file. Is this still rolling out or did something go awry?

@missinglink
Member

Can you please post a shasum of the database file and paste a query that shows no is_alt=0? I'll have a look tomorrow.

@missinglink
Member

@tomtaylor
Author

Sure thing, thanks for that.

> shasum -a 256 whosonfirst-data-admin-gb-latest.db.bz2
044dc0e263647a487dc192740f7619ee1536c8cf3f8c927a1d7f09e862cb0c09  whosonfirst-data-admin-gb-latest.db.bz2
> shasum -a 256 whosonfirst-data-admin-gb-latest.db
d6a43a27bc6fd6412400d3b679e5c1a417b58fd7fc59a9cf14c05531c00c992b  whosonfirst-data-admin-gb-latest.db
> sqlite3 whosonfirst-data-admin-gb-latest.db
SQLite version 3.28.0 2019-04-15 14:49:49
Enter ".help" for usage hints.
sqlite> SELECT id, source, is_alt FROM geojson WHERE id = 101750573;
101750573|quattroshapes|1
101750573|quattroshapes_pg|1
sqlite> 

@missinglink
Member

missinglink commented May 7, 2020

Fix merged in pelias/wof#16, data files are being regenerated by @pelias-bot

@tomtaylor
Author

Great, thank you!

@missinglink
Member

shasum -a 256 whosonfirst-data-admin-gb-latest.db
14d758d982e0d2661563ce761fc7d079df981a4eee1cf11d694fa28dbebf4e69  whosonfirst-data-admin-gb-latest.db
sqlite3 whosonfirst-data-admin-gb-latest.db 'SELECT id, source, alt_label, is_alt FROM geojson WHERE id = 101750573;'
101750573|quattroshapes||0
101750573|quattroshapes|quattroshapes|1
101750573|quattroshapes_pg|quattroshapes_pg|1

Looks like it was fixed. The files are generated alphabetically and it's up to 'H', so they'll all be uploaded in the next couple of hours.

@missinglink
Member

curl 'https://data.geocode.earth/wof/dist/sqlite/whosonfirst-data-admin-gb-latest.db.bz2' | lbunzip2 | tee >(shasum -a 256) > whosonfirst-data-admin-gb-latest.db
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  367M  100  367M    0     0  13.4M      0  0:00:27  0:00:27 --:--:-- 15.8M

14d758d982e0d2661563ce761fc7d079df981a4eee1cf11d694fa28dbebf4e69  -
sqlite3 whosonfirst-data-admin-gb-latest.db 'SELECT id, source, alt_label, is_alt FROM geojson WHERE id = 101750573;'
101750573|quattroshapes||0
101750573|quattroshapes|quattroshapes|1
101750573|quattroshapes_pg|quattroshapes_pg|1
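
For anyone without `lbunzip2` or `tee` to hand, the decompress-and-hash step of the pipeline above can be approximated in Python. This is a rough sketch using only the standard library; the function name is hypothetical, and it hashes the decompressed bytes, which is what `shasum` sees in the pipeline above.

```python
import bz2
import hashlib


def decompress_and_hash(chunks, out):
    """Write the bz2-decompressed stream to `out`; return the SHA-256 hex digest of the decompressed bytes."""
    decompressor = bz2.BZ2Decompressor()
    digest = hashlib.sha256()
    for chunk in chunks:
        while chunk:
            data = decompressor.decompress(chunk)
            if data:
                digest.update(data)
                out.write(data)
            if decompressor.eof:
                # A .bz2 file may hold several concatenated streams;
                # start a fresh decompressor on whatever bytes are left over.
                chunk = decompressor.unused_data
                decompressor = bz2.BZ2Decompressor()
            else:
                chunk = b""
    return digest.hexdigest()
```

The `chunks` argument can be any iterable of byte strings, for example `iter(lambda: response.read(1 << 20), b"")` over an open HTTP response or file object, with `out` being a file opened in `"wb"` mode. The returned digest can then be compared with the value posted above.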

@missinglink
Member

missinglink commented May 7, 2020

Thanks for the bug report. The store.sqlite3.gz file we are hosting will also need regenerating, so I'll kick that off now. It takes hours to complete.

If the problem is solved for you, please close the GitHub issue.
FYI, we recently started an OpenCollective. We are hoping to use the funds to hire someone part time to keep the community assets/code up to date.

@missinglink
Member

This issue should now be resolved?

@missinglink
Member

Please reopen if you find it's not fixed.
