Error: invalid regex test #456

Closed
vrozental opened this issue Jun 30, 2019 · 4 comments · Fixed by pelias/polylines#225

Comments

@vrozental

The error appears while running `npm start` in the whosonfirst importer on Ubuntu 18.04:

    2019-06-30T21:53:33.579Z - error: [whosonfirst] doc generator error: invalid regex test, http://en.wikipedia.org/wiki/Narsdorf should not match /https?:\/\//
    2019-06-30T21:53:33.579Z - error: [whosonfirst] {
      "id": 1125331493,
      "name": "Narsdorf",
      "name_aliases": [],
      "name_langs": {
        "li": [
          "http://en.wikipedia.org/wiki/Narsdorf"
        ]
      },
      "place_type": "localadmin",
      "lat": 51.0167,
      "lon": 12.7167,
      "bounding_box": "12.7167,51.0167,12.7167,51.0167",
      "population": 1707,
      "hierarchies": [
        {
          "continent_id": 102191581,
          "country_id": 85633111,
          "county_id": 102064227,
          "localadmin_id": 1125331493,
          "region_id": 85682523
        }
      ]
    }

@missinglink
Member

@vrozental you can ignore these 'errors'; they indicate that the document had a URL where we were expecting a name.
The actual problem is that these warnings are too verbose in the logging, which makes people assume there was an import error when there wasn't.

Duplicate of pelias/docker#89 and pelias/polylines#216.

@vrozental
Author

Thank you @missinglink

@stepps00

stepps00 commented Jul 1, 2019

I saw this issue come through and confirmed that the record no longer contains this bunk name property.

@missinglink - this test seems very useful. Can you point me to where these tests live?

@missinglink
Member

@stepps00 we actually introduced the regex to catch bad data in OSM, but it seems to catch errors in all datasets.

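The test itself is in pelias/model. It's a simple regex test, `/https?:\/\//`, that checks that a `name` property doesn't contain `http://` or `https://` (the pattern is not anchored, so it matches anywhere in the value, not only at the start).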
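To make the behaviour of that pattern concrete, here is a minimal sketch in plain JavaScript. It is not the pelias/model code: `looksLikeUrl` and `startsWithUrl` are hypothetical helper names, and the anchored `/^https?:\/\//` variant is included only for comparison. Because the quoted pattern has no `^` anchor, it rejects a value that contains a URL anywhere, not just one that begins with it.

```javascript
// Hypothetical sketch; the actual check lives in pelias/model and may differ.
const URL_PATTERN = /https?:\/\//;          // the pattern quoted above (unanchored)
const LEADING_URL_PATTERN = /^https?:\/\//; // anchored variant, for comparison only

// True when the value contains "http://" or "https://" anywhere.
function looksLikeUrl(name) {
  return URL_PATTERN.test(name);
}

// True only when the value starts with "http://" or "https://".
function startsWithUrl(name) {
  return LEADING_URL_PATTERN.test(name);
}

console.log(looksLikeUrl('Narsdorf'));                              // false
console.log(looksLikeUrl('http://en.wikipedia.org/wiki/Narsdorf')); // true
console.log(looksLikeUrl('see https://example.com'));               // true
console.log(startsWithUrl('see https://example.com'));              // false
```

For the Narsdorf record in the log above, both variants would reject the `li` value, since it begins with the URL.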

orangejulius added a commit to pelias/polylines that referenced this issue Jul 3, 2019
We have had numerous reports from Pelias users about a concerning error
message during builds regarding the URL regex filter from
pelias/model#115.

While this filter is good, the resulting error message is alarming.
Looking today at the output of a planet build, it appears that many of
these errors come from the polylines file created by Valhalla out of the
OSM street network.

Looking at the contents of the polyline file and corresponding record on
OSM, it seems that Valhalla puts the contents of the `ref` tag in the
polyline file as an alternate name. The [ref tag](https://wiki.openstreetmap.org/wiki/Key:ref?uselang=en-US) will
often contain a URL.

This means that not only will the error happen frequently, but many
records that are actually valid will be filtered out.

An example of this is the [Iowa Women of Achievement
bridge](https://www.openstreetmap.org/way/65066830) which is completely
valid in terms of name, geometry, and tagging but contains a URL in the
`ref` field.

The polylines importer currently selects a single name value from the
list of names in the polylines file by choosing the longest.

This PR adds an additional filter that first removes any URL-like values
from consideration, and should completely eliminate any of the otherwise
concerning errors while ensuring all valid records make it into
Elasticsearch.
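The selection logic described above (discard URL-like values, then take the longest remaining name) can be sketched as follows. This is a minimal illustration, not the pelias/polylines implementation: `selectName` is a hypothetical name, the input is assumed to be a plain array of strings, and reusing the `/https?:\/\//` pattern from pelias/model as the "URL-like" test is an assumption; the actual filter may use a different pattern.

```javascript
// Hypothetical sketch: `selectName` and the plain string-array input are
// assumptions, not the actual pelias/polylines implementation.
const URL_PATTERN = /https?:\/\//;

// Choose one name from a polyline's list of names:
//   1. drop empty values and anything that looks like a URL
//      (for example a `ref` value carried over as an alternate name),
//   2. return the longest remaining value (the first one wins on ties).
// Returns undefined when no usable name is left.
function selectName(names) {
  return names
    .filter((name) => typeof name === 'string' && name.trim().length > 0)
    .filter((name) => !URL_PATTERN.test(name))
    .reduce(
      (longest, name) =>
        longest === undefined || name.length > longest.length ? name : longest,
      undefined
    );
}

// Made-up input: without the URL filter, the URL would be chosen
// simply because it is the longest string.
const names = [
  'Example Street Bridge',
  'https://example.org/a/much/longer/reference/value',
];
console.log(selectName(names));                   // Example Street Bridge
console.log(selectName(['http://only.example'])); // undefined
```

Filtering before choosing the longest value means a record is kept with its real name instead of being rejected outright because its longest alternate name happened to be a URL.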

Fixes pelias/whosonfirst#456
Fixes #216
Fixes pelias/docker#89
Connects pelias/model#116