Error: invalid regex test #456
Comments
@vrozental you can ignore these 'errors'; they indicate that the document had a URL where we were expecting a name. Duplicate of pelias/docker#89 and pelias/polylines#216.
Thank you @missinglink
I saw this issue come through and confirmed that the record no longer contains this bunk name property. @missinglink, this test seems very useful. Can you point me to where these tests live?
@stepps00 we actually introduced the regex to catch bad data in OSM, but it seems to catch errors in all datasets. The test itself is in
We have had numerous reports from Pelias users about concerning error messages during builds, caused by the URL regex filter from pelias/model#115. While this filter is good, the resulting error message is alarming.

Looking today at the output of a planet build, it appears that many of these errors come from the polylines file created by Valhalla out of the OSM street network. Looking at the contents of the polylines file and the corresponding record on OSM, it seems that Valhalla puts the contents of the `ref` tag in the polylines file as an alternate name. The [ref tag](https://wiki.openstreetmap.org/wiki/Key:ref?uselang=en-US) will often contain a URL. This means that not only will the error happen frequently, but many records that are actually valid will be filtered out. An example of this is the [Iowa Women of Achievement bridge](https://www.openstreetmap.org/way/65066830), which is completely valid in terms of name, geometry, and tagging but contains a URL in the `ref` field.

The polylines importer currently selects a single name value from the list of names in the polylines file by choosing the longest. This PR adds an additional filter that first removes any URL-like values from consideration, and should completely eliminate these otherwise concerning errors while ensuring all valid records make it into Elasticsearch.

Fixes pelias/whosonfirst#456
Fixes #216
Fixes pelias/docker#89
Connects pelias/model#116
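To illustrate the selection logic described above, here is a minimal sketch, not the importer's actual code. The `URL_LIKE` pattern and the `selectName` helper are hypothetical names introduced for this example, and the pattern is an assumption about what counts as "URL-like". The filter used in pelias/model may differ.

```javascript
// Assumption: a value is "URL-like" if it contains "http://" or "https://".
// This pattern is illustrative and is not taken from pelias/model.
const URL_LIKE = /https?:\/\//i;

// Hypothetical helper: choose one name from the candidate names of a
// single polylines row.
function selectName(names) {
  // Drop empty values and anything that looks like a URL before choosing.
  const candidates = names
    .map((name) => name.trim())
    .filter((name) => name.length > 0 && !URL_LIKE.test(name));

  // The longest remaining value wins; on a tie the earlier value is kept.
  const longest = candidates.reduce(
    (best, name) => (name.length > best.length ? name : best),
    ''
  );

  // Return null when no usable name remains.
  return longest || null;
}

// Illustrative values only; these are not the real tags of any OSM way.
console.log(selectName(['Example Bridge', 'https://example.com/a/very/long/reference']));
```

Filtering before picking the longest value matters because a URL is often the longest candidate. Without the filter it would be selected and then rejected downstream, and the whole record would be lost.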
The error appears while running `npm start` in the whosonfirst importer on Ubuntu 18.04: