FATAL ERROR: 74 regression(s) detected. #192

Closed
getorca opened this issue Apr 25, 2020 · 14 comments
Labels
bug Something isn't working

@getorca

getorca commented Apr 25, 2020

Describe the bug
After installing a fresh North America build, I get the error "FATAL ERROR: 74 regression(s) detected."

A number of the errors seem to be related to Canada, at least the ones that don't just look like name changes:

  ✘ regression [2] "/v1/search?sources=wof&text=Canada": score 1 out of 4
  diff:
    layer
      expected: country
      actual:   locality
    country
      expected: Canada
      actual:   Mexico
    country_a
      expected: CAN
      actual:   MEX
  ✘ regression [4] "/v1/search?text=22 Lloyd George Ave, Toronto Ontario CA": score 0 out of 5
  diff:
    layer
      expected: address
      actual:   county
    country_a
      expected: CAN
      actual:   USA
    locality
      expected: Toronto
      actual:   
    street
      expected: Lloyd George Ave
      actual:   
    housenumber
      expected: 22
      actual:   
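Each entry above compares the expected fields of a test case with the top result. As a rough illustration only, assuming the score is simply the number of expected fields the result matches (consistent with the 0 out of 5 for regression [4], but not taken from the test runner's source), the comparison could be sketched like this; score_result is a hypothetical name:

```python
# Hypothetical sketch of a per-field regression score, assuming the score
# is the number of expected fields that the top result matches exactly.
from typing import Dict, List, Tuple


def score_result(
    expected: Dict[str, str], actual: Dict[str, str]
) -> Tuple[int, int, List[str]]:
    """Return (matched, total, diff_lines) for expected vs actual properties."""
    matched = 0
    diff: List[str] = []
    for field, want in expected.items():
        got = actual.get(field, "")  # a missing field counts as an empty string
        if got == want:
            matched += 1
        else:
            diff += [f"  {field}", f"    expected: {want}", f"    actual:   {got}"]
    return matched, len(expected), diff


if __name__ == "__main__":
    # Expected values from regression [4] above; the actual top result was a
    # county in the USA with no locality, street or housenumber.
    expected = {
        "layer": "address",
        "country_a": "CAN",
        "locality": "Toronto",
        "street": "Lloyd George Ave",
        "housenumber": "22",
    }
    actual = {"layer": "county", "country_a": "USA"}
    matched, total, diff = score_result(expected, actual)
    print(f"score {matched} out of {total}")
    print("\n".join(diff))
```

With the regression [4] values it prints "score 0 out of 5" followed by one diff entry per field.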

libpostal seems to be parsing the input correctly:

            "parsed_text": {
                "number": "22",
                "street": "lloyd george ave",
                "city": "toronto",
                "state": "ontario",
                "country": "CAN"
            }

But the Who's on First FeatureCollection that comes back is wrong:

    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {
                "type": "Point",
                "coordinates": [
                    -97.471799,
                    31.05239
                ]
            },
            "properties": {
                "id": "102087307",
                "gid": "whosonfirst:county:102087307",
                "layer": "county",
                "source": "whosonfirst",
                "source_id": "102087307",
                "name": "Bell County",
                "confidence": 0.4,
                "match_type": "fallback",
                "accuracy": "centroid",
                "country": "United States",
                "country_gid": "whosonfirst:country:85633793",
                "country_a": "USA",
                "region": "Texas",
                "region_gid": "whosonfirst:region:85688753",
                "region_a": "TX",
                "county": "Bell County",
                "county_gid": "whosonfirst:county:102087307",
                "county_a": "BL",
                "label": "Bell County, TX, USA"
            }

This seems to be the case for all of Canada: Canada is never returned as the country, and results from a different country are returned instead.
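To check many queries quickly, it can help to pull just the country of the top-ranked feature out of a /v1/search response. A minimal sketch, assuming the response has already been decoded from JSON; top_country is a hypothetical helper, and only the properties shown above are relied on:

```python
# Hypothetical helper: report the country of the top-ranked feature in a decoded
# /v1/search response (a GeoJSON FeatureCollection).
import json
from typing import Optional


def top_country(response: dict) -> Optional[str]:
    """Return properties.country_a of the first feature, or None if there are none."""
    features = response.get("features", [])
    if not features:
        return None
    return features[0].get("properties", {}).get("country_a")


if __name__ == "__main__":
    # Trimmed copy of the response above, keeping only the fields inspected here.
    sample = json.loads(
        """
        {"type": "FeatureCollection",
         "features": [{"type": "Feature",
                       "properties": {"layer": "county",
                                      "country_a": "USA",
                                      "label": "Bell County, TX, USA"}}]}
        """
    )
    country = top_country(sample)
    if country != "CAN":
        print(f"expected CAN, top result is in {country}")
```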

Steps to Reproduce
Install a fresh Docker build of North America.

Expected behavior
Canada is returned as the country.

Environment (please complete the following information):

  • OS: Ubuntu 18.04
  • Docker versions: Docker version 19.03.8, docker-compose version 1.22.0
@getorca getorca added the bug Something isn't working label Apr 25, 2020
@getorca
Author

getorca commented May 1, 2020

I can confirm the same bug occurs on the planet build.

@missinglink
Member

Can you please post the complete output of the test suite as a gist or a pastebin including any errors?

@missinglink
Member

There was a major change to the WOF data downloads
https://dist.whosonfirst.org/

The old ones were very old (>1yr), so this error could be due to:

  • more up-to-date data
  • Incorrect data (please check the sqlite database for errors)
  • Incorrect configuration
  • Too much data for the PIP service (please check for errors)
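One way to act on "check the sqlite database for errors" is a quick read-only sanity check. This is a minimal sketch, not a Pelias tool; it assumes the WOF SQLite distribution has a geojson table with id, source and is_alt columns (the columns queried later in this thread), and check_wof_db is a hypothetical name:

```python
# Hypothetical health check for a downloaded Who's on First SQLite file.
# Assumes a geojson table with id / source / is_alt columns.
import sqlite3
from pathlib import Path
from typing import Dict, Union


def check_wof_db(path: Union[str, Path]) -> Dict[str, Union[str, int]]:
    """Run SQLite's integrity check and count primary vs alternate geometry rows."""
    uri = Path(path).resolve().as_uri() + "?mode=ro"  # open read-only, never create
    con = sqlite3.connect(uri, uri=True)
    try:
        integrity = con.execute("PRAGMA integrity_check").fetchone()[0]
        primary = con.execute(
            "SELECT COUNT(*) FROM geojson WHERE is_alt = 0"
        ).fetchone()[0]
        alternate = con.execute(
            "SELECT COUNT(*) FROM geojson WHERE is_alt = 1"
        ).fetchone()[0]
    finally:
        con.close()
    return {"integrity": integrity, "primary_rows": primary, "alt_rows": alternate}


if __name__ == "__main__":
    import sys

    # Usage: python check_wof_db.py whosonfirst-data-admin-ca-latest.db
    for arg in sys.argv[1:]:
        print(arg, check_wof_db(arg))
```

A healthy file reports "ok" for the integrity check; the row counts are only useful when compared between downloads.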

Are there any other errors in the Docker logs (pelias compose logs) which may be relevant?

Does this look to be isolated to Canada to you?

@getorca
Author

getorca commented May 1, 2020

Here is the pastebin of the test suite output for a planet build:

https://pastebin.com/G60472XC

@getorca
Author

getorca commented May 1, 2020

> There was a major change to the WOF data downloads
> https://dist.whosonfirst.org/

I'm using the latest version (from several days ago), which uses geocode.earth.

> Are there any other errors in the Docker logs (pelias compose logs) which may be relevant?

Possibly, I'm not sure how to debug or resolve the following:

Example 1:
query:
/v1/search?sources=wof&text=Canada
logs:

placeholder_1    | took: 1.308ms
placeholder_1    | parent not found! locality_id -1
placeholder_1    | parent not found! continent_id -1
placeholder_1    | parent not found! country_id -1
placeholder_1    | parent not found! region_id -1
placeholder_1    | info: [placeholder] ::ffff:172.18.0.11 - GET /parser/search?text=Canada&lang=eng HTTP/1.1 200 51606 - 51.811 ms
placeholder_1    | info: [placeholder] ::ffff:172.18.0.11 - GET /parser/findbyid?ids=102191575%2C85632713%2C890455299%2C1360274397%2C85670669%2C85632323%2C421203405%2C1360154343%2C85671905%2C1360103365%2C85633293%2C102074151%2C1343941459%2C85686527%2C102078629%2C1343640541%2C85686665%2C102191581%2C85633735%2C1511678941%2C1343550883%2C85687349%2C1511678869%2C1327024295%2C85633129%2C404329281%2C1326861795%2C404227381%2C85682625%2C85633793%2C102083105%2C404522459%2C1326841517%2C85688543%2C102191577%2C85633009%2C102053019%2C1326716699%2C1511777415%2C85681931&lang=eng HTTP/1.1 200 49855 - 6.436 ms

Example 2:
query:
/v1/search?text=22 Lloyd George Ave, Toronto Ontario CA
logs:

placeholder_1    | took: 23.761ms
placeholder_1    | parent not found! country_id 85633111
placeholder_1    | parent not found! region_id 85679209
placeholder_1    | parent not found! locality_id -1
placeholder_1    | parent not found! region_id 85672037
placeholder_1    | parent not found! region_id 85672037
placeholder_1    | parent not found! region_id 85679391
placeholder_1    | parent not found! region_id 85679299
placeholder_1    | parent not found! region_id 85675561
placeholder_1    | parent not found! region_id 85672229
placeholder_1    | parent not found! region_id 85679201
placeholder_1    | info: [placeholder] ::ffff:172.18.0.11 - GET /parser/search?text=toronto%20ontario%20CAN&lang=eng HTTP/1.1 200 33675 - 57.119 ms
placeholder_1    | info: [placeholder] ::ffff:172.18.0.11 - GET /parser/findbyid?ids=102191581%2C85633147%2C102072139%2C136253037%2C404382871%2C101750149%2C404227825%2C1108826393%2C85683579%2C102191575%2C85633793%2C102087307%2C85688753%2C102191573%2C85632735%2C1108564375%2C1293330729%2C85675451%2C102191569%2C85632393%2C890461697%2C85632739%2C1108696521%2C85675177%2C85632203%2C102073305%2C1360169861%2C85672027%2C85632287%2C1108785237%2C1344175169%2C85671591%2C1108709601%2C1344101021%2C85632541%2C1092013489%2C1343955595%2C85670063%2C85632245%2C1092012045%2C1343639693%2C85669945&lang=eng HTTP/1.1 200 20791 - 11.664 ms
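Long placeholder logs are easier to read when summarised. The sketch below is a hypothetical helper, not part of Pelias; it assumes the two log line formats shown above, tallies the unresolved parent ids (skipping -1) and lists the ids requested from /parser/findbyid:

```python
# Hypothetical log summariser (not part of Pelias). Assumes the
# "parent not found! <field> <id>" and "GET /parser/findbyid?ids=..." formats above.
import re
from collections import Counter
from typing import List
from urllib.parse import parse_qs, urlsplit

PARENT_RE = re.compile(r"parent not found! (\w+) (-?\d+)")
FINDBYID_RE = re.compile(r"GET (/parser/findbyid\?\S+)")


def missing_parents(lines: List[str]) -> Counter:
    """Count (field, id) pairs from 'parent not found!' lines, skipping id -1."""
    counts: Counter = Counter()
    for line in lines:
        match = PARENT_RE.search(line)
        if match and match.group(2) != "-1":
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


def findbyid_ids(lines: List[str]) -> List[int]:
    """Return the WOF ids requested via /parser/findbyid, in log order."""
    ids: List[int] = []
    for line in lines:
        match = FINDBYID_RE.search(line)
        if match:
            # parse_qs percent-decodes %2C back into commas
            query = parse_qs(urlsplit(match.group(1)).query)
            for value in query.get("ids", []):
                ids.extend(int(i) for i in value.split(",") if i)
    return ids


if __name__ == "__main__":
    log = [
        "placeholder_1    | parent not found! country_id 85633111",
        "placeholder_1    | parent not found! region_id 85672037",
        "placeholder_1    | parent not found! region_id 85672037",
        "placeholder_1    | parent not found! locality_id -1",
        "placeholder_1    | info: [placeholder] ::ffff:172.18.0.11 - GET "
        "/parser/findbyid?ids=102191581%2C85633147&lang=eng HTTP/1.1 200 20791 - 11.664 ms",
    ]
    print(missing_parents(log).most_common())
    print(findbyid_ids(log))
```

In practice the lines would come from a saved copy of the pelias compose logs output, e.g. open(path).read().splitlines().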

> Does this look to be isolated to Canada to you?

On the North America build it definitely seemed to be related to Canada: the country isn't returned. The same is true of the planet build, possibly related to the errors above.

@getorca
Author

getorca commented May 1, 2020

placeholder_1    | ------------------------------------------------------
placeholder_1    | Database schema is out-of-date!
placeholder_1    | Your database files do not match the expected schema.
placeholder_1    | Please follow instructions in the README to obtain new database files.
placeholder_1    | This is the expected behaviour for breaking schema updates.
placeholder_1    | more info: https://github.com/pelias/placeholder
placeholder_1    | ------------------------------------------------------

Strange, since this is a new build. I'm trying to run pelias prepare placeholder to see if that fixes it.

Looking at the code for pelias prepare placeholder, it appears to build store.sqlite3 from wof.extract, whereas the documentation (https://github.com/pelias/placeholder) tells you to download store.sqlite3 from https://data.geocode.earth/placeholder/store.sqlite3.gz. Am I correct to assume that either the Docker prepare placeholder step hasn't been updated, or the wof.extract or SQL importer isn't correct for the new schema, and that this is why the "Database schema is out-of-date" error appears on new builds?
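Fetching the prebuilt database described in the placeholder README can be scripted. A minimal sketch using only the Python standard library; the URL comes from the README link above, while fetch_store, gunzip, looks_like_sqlite and the dest_dir argument are hypothetical, and where placeholder expects store.sqlite3 to live depends on your setup:

```python
# Minimal sketch: fetch the prebuilt placeholder store and decompress it.
# dest_dir is a hypothetical parameter; point it at wherever your placeholder
# service reads its data from.
import gzip
import shutil
import urllib.request
from pathlib import Path

STORE_URL = "https://data.geocode.earth/placeholder/store.sqlite3.gz"
SQLITE_MAGIC = b"SQLite format 3\x00"  # first 16 bytes of every SQLite 3 database


def gunzip(src: Path, dst: Path) -> None:
    """Stream-decompress src (.gz) into dst without loading it all into memory."""
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)


def looks_like_sqlite(path: Path) -> bool:
    """Cheap sanity check that the decompressed file is an SQLite database."""
    with open(path, "rb") as f:
        return f.read(len(SQLITE_MAGIC)) == SQLITE_MAGIC


def fetch_store(dest_dir: Path, url: str = STORE_URL) -> Path:
    """Download store.sqlite3.gz into dest_dir and decompress it to store.sqlite3."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive = dest_dir / "store.sqlite3.gz"
    with urllib.request.urlopen(url) as resp, open(archive, "wb") as out:
        shutil.copyfileobj(resp, out)
    target = dest_dir / "store.sqlite3"
    gunzip(archive, target)
    if not looks_like_sqlite(target):
        raise ValueError(f"{target} does not look like an SQLite database")
    return target
```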

@getorca
Author

getorca commented May 1, 2020

Interestingly, downloading the store.sqlite3 file from geocode.earth seems to have solved some of the issues, but I'm still getting 141 regression errors, vs 140 for planet. Here's the latest pastebin: https://pastebin.com/SGhPXW75

It's solved the original Canada issue I was seeing, and the following tests are much better:

  • Who's on First regions
  • Who's on First neighbourhoods
  • Who's on First countries

Is the correct order to download, prepare and import each source documented? I believe the order matters because some sources depend on others; is that right?

@missinglink
Member

@getorca, thanks for the detailed report. I've managed to find the source of the Canada issue and opened a PR to resolve it:
pelias/wof#13

I'll kick off a rebuild of all the Geocode Earth data downloads; they should be available in less than a day.

@missinglink
Member

missinglink commented May 2, 2020

@getorca I've republished the data, please re-download and try again:

curl -O https://data.geocode.earth/wof/dist/sqlite/whosonfirst-data-admin-ca-latest.db.bz2

lbunzip2 whosonfirst-data-admin-ca-latest.db.bz2

shasum -a256 whosonfirst-data-admin-ca-latest.db
b8bdb2a618927e921cb43debcabff1958e10afb438bdfb64af4e13138afc7bc3

sqlite3 whosonfirst-data-admin-ca-latest.db 'SELECT id, source, is_alt FROM geojson WHERE id = 85633041'
85633041|whosonfirst|0                               <--- this is the row which was missing
85633041|naturalearth-display-terrestrial-zoom6|1
85633041|naturalearth|1
85633041|quattroshapes|1
85633041|whosonfirst-reversegeo|1
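The same checks can be scripted, which is handy when re-downloading after each regeneration. A minimal sketch with hashlib and sqlite3; the checksum below is the one posted for this particular republish (the data was regenerated again afterwards, so newer files will differ), and ids_without_primary is a hypothetical helper that generalises the query above to every id whose rows are all alternates (is_alt = 1), assuming each record should have one non-alternate row:

```python
# Minimal sketch of the checksum and missing-row checks (hashlib + sqlite3 only).
# EXPECTED_SHA256 is the checksum posted for the May 2 republish; later
# regenerations of the data will have a different checksum.
import hashlib
import sqlite3
from pathlib import Path
from typing import List

EXPECTED_SHA256 = "b8bdb2a618927e921cb43debcabff1958e10afb438bdfb64af4e13138afc7bc3"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a (possibly multi-GB) file in chunks, like shasum -a256."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ids_without_primary(db_path: Path) -> List[int]:
    """Ids whose geojson rows are all alternates (is_alt = 1), i.e. the kind of
    missing primary row described for 85633041 above."""
    con = sqlite3.connect(str(db_path))
    try:
        rows = con.execute(
            "SELECT id FROM geojson GROUP BY id "
            "HAVING SUM(CASE WHEN is_alt = 0 THEN 1 ELSE 0 END) = 0 ORDER BY id"
        ).fetchall()
    finally:
        con.close()
    return [row[0] for row in rows]


if __name__ == "__main__":
    db = Path("whosonfirst-data-admin-ca-latest.db")
    if db.exists():
        print("sha256 matches:", sha256_of(db) == EXPECTED_SHA256)
        missing = ids_without_primary(db)
        print(f"{len(missing)} ids without a primary row", missing[:10])
```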

@getorca
Author

getorca commented May 2, 2020

Thanks for your hard work and for helping me find the errors. It can be a bit intimidating with the long build times and the number of sources/importers to debug. Trying now; will report back.

@getorca
Author

getorca commented May 4, 2020

Yup, solves the Canada issue.

@missinglink
Member

There was unfortunately another bug so I had to regenerate all the data again, should all be 💯 again now.

If the problem is solved for you, please close the GitHub issue.
FYI, we recently started an OpenCollective; we are hoping to use the funds to hire someone part-time to keep the community assets/code up to date.

@getorca
Author

getorca commented May 8, 2020

Yup, I will let you know ASAP; it might not be until later today or tomorrow. I destroyed the droplets I had the planet and NA builds on, and I want to let my current import finish running.

@missinglink
Member

I'm going to close this issue due to inactivity.

I suspect that some minor regressions may remain in the test suite related to the change of hosts for the WOF dist files, but I believe the critical errors have been resolved.

Thanks for your help in detecting the bugs.

Please feel free to open a new issue or let me know if this needs to be reopened.
