Commit
feat(parser): Filter out URLs before sending to pelias/model
We have had numerous reports from Pelias users about concerning error messages during builds caused by the URL regex filter from pelias/model#115. While this filter is good, the resulting error message is alarming.

Looking today at the output of a planet build, it appears that many of these errors come from the polylines file created by Valhalla out of the OSM street network. Judging by the contents of the polylines file and the corresponding records on OSM, it seems that Valhalla puts the contents of the `ref` tag in the polylines file as an alternate name. The [ref tag](https://wiki.openstreetmap.org/wiki/Key:ref?uselang=en-US) will often contain a URL. This means that not only will the error happen frequently, but many records that are actually valid will be filtered out. An example of this is the [Iowa Women of Achievement bridge](https://www.openstreetmap.org/way/65066830), which is completely valid in terms of name, geometry, and tagging but contains a URL in the `ref` field.

The polylines importer currently selects a single name value from the list of names in the polylines file by choosing the longest. This PR adds an additional filter that first removes any URL-like values from consideration. It should completely eliminate these otherwise concerning errors while ensuring all valid records make it into Elasticsearch.

Fixes pelias/whosonfirst#456
Fixes #216
Fixes pelias/docker#89
Connects pelias/model#116
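
The selection logic described above can be sketched roughly as follows. This is a minimal illustration, not the importer's actual code: the function names `isUrlLike` and `selectName`, the URL heuristic, and the sample names are all assumptions made for the example.

```javascript
// Hypothetical sketch: the real importer's function names and URL check may differ.

// Assumed heuristic: a value is URL-like if it contains "http://", "https://" or "www.".
const URL_LIKE = /(https?:\/\/|www\.)/i;

function isUrlLike(name) {
  return URL_LIKE.test(name);
}

// Pick one name from the list of names found on a polyline row.
// URL-like values are dropped first, then the longest remaining name wins.
// Returns undefined when no usable name is left.
function selectName(names) {
  const candidates = names
    .filter((name) => typeof name === 'string' && name.trim().length > 0)
    .filter((name) => !isUrlLike(name));

  if (candidates.length === 0) {
    return undefined;
  }

  // On a tie in length, the earlier name is kept.
  return candidates.reduce((longest, name) =>
    name.length > longest.length ? name : longest
  );
}

// Made-up input: a street name plus a URL taken from a `ref`-style value.
const picked = selectName(['Example Bridge', 'https://example.com/some/long/path']);
```

Without the URL filter, the longest-name rule would pick the URL in this input, because it is longer than the real name. That is how a URL could end up as the record's name and trip the check in pelias/model.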