Including city name in forward geocoding text search not working as expected. #107
Comments
Hi @gagandeepsingh1105, the 'administrative hierarchy' (i.e. the city/province/country) of each record in Pelias is sourced exclusively from the WhosOnFirst dataset through point-in-polygon lookups at index time.
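To make the point-in-polygon step concrete, here is a minimal sketch of the idea. It is not Pelias code: the ray-casting function, the `assign_parents` helper, the ids and the square polygons are all hypothetical and only illustrate how a record picks up parents from the areas that contain it.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (lon, lat)


def point_in_polygon(point: Point, ring: List[Point]) -> bool:
    """Ray-casting test: cast a ray east from the point and count edge crossings."""
    x, y = point
    inside = False
    for i in range(len(ring)):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % len(ring)]
        # Only edges that straddle the horizontal line through the point can cross the ray.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_parents(point: Point, admin_areas: List[Dict]) -> Dict[str, List[str]]:
    """Collect the ids of every area containing the point, grouped by layer."""
    parents: Dict[str, List[str]] = {}
    for area in admin_areas:
        if point_in_polygon(point, area["ring"]):
            parents.setdefault(area["layer"], []).append(area["id"])
    return parents


# Hypothetical areas: two proper polygons and one area stored as a single point.
AREAS = [
    {"layer": "country", "id": "country:1",
     "ring": [(-60.0, 40.0), (-50.0, 40.0), (-50.0, 50.0), (-60.0, 50.0)]},
    {"layer": "locality", "id": "locality:1",
     "ring": [(-53.0, 47.0), (-52.0, 47.0), (-52.0, 48.0), (-53.0, 48.0)]},
    {"layer": "locality", "id": "locality:2", "ring": [(-52.7, 47.56)]},
]

address_parents = assign_parents((-52.7, 47.56), AREAS)
```

In this sketch an area whose geometry is only a point can never contain an address, so it contributes no parent id, even when the address sits exactly on that point.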
I believe this is a duplicate of #74 |
I'm not against adding this option to custom builds, the issue is that currently all administrative regions are composed of a source, id and term (with an optional abbreviation). We could use An autoincrement value could work here but would have the disadvantage that two places in the same area would have differing parent IDs. |
It's possible to have multiple associated 'parents' for a single layer, so for example a record can have multiple 'region' records associated. The issue would be that we only return one (i.e. the first one), so it would either need to be decided (or made configurable) whether the record from the CSV file or the WOF one was returned in the case where both data sources returned a match.
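The choice described in this comment could look something like the sketch below. This is purely illustrative: `pick_parent`, the `preferred_source` option and the custom id "1" are hypothetical and do not exist in Pelias today; the WOF region id is the one that appears in the debug output later in this thread.

```python
from typing import Dict, List, Optional


def pick_parent(candidates: List[Dict[str, str]],
                preferred_source: Optional[str] = None) -> Optional[Dict[str, str]]:
    """Return a single parent for a layer.

    With no preference this mirrors "return the first one"; with a preference,
    the first candidate from that source wins if there is one.
    """
    if not candidates:
        return None
    if preferred_source is not None:
        for candidate in candidates:
            if candidate["source"] == preferred_source:
                return candidate
    return candidates[0]


# Two 'region' parents for the same record: one from WOF, one from the CSV file.
region_candidates = [
    {"source": "whosonfirst", "id": "85682123", "name": "Newfoundland and Labrador"},
    {"source": "custom", "id": "1", "name": "Newfoundland and Labrador"},
]

default_choice = pick_parent(region_candidates)
custom_choice = pick_parent(region_candidates, preferred_source="custom")
```

Falling back to the first candidate when the preferred source has no match keeps the current behaviour for records that only have a WOF parent.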
Hello, I am a developer on the original poster's team. I think this is an issue with how WOF is passed back as the first record returned, or with how readily it is searched for a 'fallback' match when a locality name is present despite a focus on a more granular location.
I performed the same two searches from the original post without the "sources=custom" filter in the API call and encountered the same behaviour. A search for "283 Prince Philip dr NL" (https://geocoder.alpha.phac.gc.ca/api/search?text="283%20prince%20philip%20dr%20NL") resulted in a match from the custom source with confidence 1.0. However, a search for "283 Prince Philip dr St. John's NL" results in a match from WOF, and seemingly ignores a filter on the address layer type.
We'd like to use the custom data source for batch forward geocoding, and it is useful to pass an 'address, city, province' search term where the inclusion of the city helps refine the search. As identified in the original issue, this does not appear to be what happens when the city name is included.
We understand that WOF is the exclusive source for administrative hierarchy in Pelias, but the inclusion of the place name shouldn't cue the fallback behaviour when an accurate match at the desired layer granularity (street address) is available. In this scenario a street address supplemented by a city name should refine the area for a search, but it seems to prompt a fallback match instead. It also seems to ignore a layer search filter in the API call when the city name is included, triggering the fallback result from WOF.
Thank you for your help!
You can see that the Placeholder service ran and found a matching locality:
{
"controller:placeholder": [
{
"id": 890456615,
"name": "St. John's",
"placetype": "locality",
"population": 99182,
"lineage": [
{
"country": {
"id": 85633041,
"name": "Canada",
"abbr": "CAN",
"languageDefaulted": false
},
"county": {
"id": 1158869009,
"name": "Division No. 1",
"languageDefaulted": false
},
"locality": {
"id": 890456615,
"name": "St. John's",
"languageDefaulted": false
},
"region": {
"id": 85682123,
"name": "Newfoundland and Labrador",
"abbr": "NL",
"languageDefaulted": false
}
}
],
"geom": {
"bbox": "-52.72931,47.54494,-52.68931,47.58494",
"lat": 47.56494,
"lon": -52.70931
},
"languageDefaulted": false
}
]
}
Then when the Elasticsearch query is run, the ID of the locality matched above is added as a filter condition (i.e. a mandatory condition):
{
"filter": {
"bool": {
"minimum_should_match": 1,
"should": [
{
"terms": {
"parent.locality_id": [
"890456615"
]
}
}
],
"must": [
{
"terms": {
"layer": [
"address"
]
}
}
]
}
}
}
Of course this results in 0 hits:
{
"controller:search": {
"queryType": {
"address_search_using_ids": {
"es_took": 36,
"response_time": 42,
"retries": 0,
"es_hits": 0,
"es_result_count": 0
}
}
}
}
At this point there are zero matches. I forget the exact workflow here, but I believe it falls back to a legacy search method which was more lenient. I don't like that the request specifies only
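The effect of that filter can be sketched with a small in-memory model. This is a simplification for illustration, not the real Elasticsearch query or Pelias code: `matches_filter` and `search` are hypothetical helpers, and the parent ids on the two sample documents are assumptions about what the indexed custom record might look like.

```python
from typing import Any, Dict, List

PLACEHOLDER_LOCALITY_ID = "890456615"  # St. John's, from the Placeholder debug output


def matches_filter(doc: Dict[str, Any], layers: List[str], locality_ids: List[str]) -> bool:
    """Mimic the bool filter: the layer 'must' match AND at least one 'should' id must match."""
    in_layer = doc["layer"] in layers
    doc_localities = doc.get("parent", {}).get("locality_id", [])
    in_locality = any(loc_id in locality_ids for loc_id in doc_localities)
    return in_layer and in_locality


def search(index: List[Dict[str, Any]], layers: List[str], locality_ids: List[str]) -> List[Dict[str, Any]]:
    return [doc for doc in index if matches_filter(doc, layers, locality_ids)]


# Assumed shape: a custom address that only received a country parent at index time.
address_country_only = {
    "source": "custom", "layer": "address", "name": "283 Prince Philip Dr",
    "parent": {"country_id": ["85633041"]},
}
# The same address, had it also been assigned the locality as a parent.
address_with_locality = {
    "source": "custom", "layer": "address", "name": "283 Prince Philip Dr",
    "parent": {"country_id": ["85633041"], "locality_id": [PLACEHOLDER_LOCALITY_ID]},
}

hits_without_parent = search([address_country_only], ["address"], [PLACEHOLDER_LOCALITY_ID])
hits_with_parent = search([address_with_locality], ["address"], [PLACEHOLDER_LOCALITY_ID])
```

Under these assumptions the first search returns nothing, because the address has no `parent.locality_id` to satisfy the mandatory condition, while the second returns the address.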
The geometry of 890456615 St. John's is of type |
Maybe for your use case you can disable the Placeholder service, or possibly not add any data to it?
@the-epeecurean is there better open geo data for that region? The only dataset I can find is points only. Does the Canadian government publish something better than this? https://opendata.gov.nl.ca/public/opendata/page/?page-id=datasetdetails&id=265
@missinglink There are ... Statistics Canada publishes a hierarchy of delineated boundaries. I've just been evaluating some cherry-picked WOF 'fallback' results we've been seeing in testing.
Here's a link to an open REST endpoint for the collected Cartographic Boundary files published by Statistics Canada:
And a reference to descriptions of the Cartographic Boundary files made available (at the bottom under "1. Spatial information products"):
A polygon for the example cited in the issue above (St. John's NL) appears at the CSD (census subdivision) and CMA (census metropolitan area) levels.
If there is any way we can help get this spatial information included in WOF, please let us know. It would help our use case greatly to see more Canadian localities represented as polygons.
Adding an issue upstream in Who's On First to help facilitate this work: tl;dr the new 2021 cartographic boundary files from Stats Canada look great and we'd love to import them! |
Hi @missinglink, I am facing a similar issue.
The csv-import jobs ran successfully, and I can now see WOF attributes displayed for most of the address records returned in the API responses. But if I include the city name in the requested address, I obtain only fallback records based on whosonfirst data (no address records). To be clearer:
Is this behaviour normal? For example, I can see that the returned address record has WOF properties for the country only (not the city, region, borough, etc.):
Does each CSV record need to be explicitly associated with its city, borough, etc. through the parent_json column? Thank you so much for your help.
Compared with the public Pelias instance https://pelias.github.io/compare/#/v1/search?text=4+avenue+de+paris+78000+versailles , the address seems to be returned even if the city is mentioned in the search string:
On my own Pelias instance, when the address record is returned, it does not have all these WOF properties, only the country:
When I try the same request including the city name:
From the Pelias API:
From the Placeholder service:
From libpostal:
Thanks again @missinglink
Hi there,
I am an engineer at the Public Health Agency of Canada. We currently have a use case for which we are looking to deploy an instance of the Pelias Geocoder.
For this use case, we have some custom input data (a CSV file) of Canadian locations only, and we want to use the Pelias Geocoder's forward geocoding to convert text addresses to longitudes and latitudes.
For this reason we are trying to deploy csv-importer. Below is a snapshot of the input data that we have ingested into our Elasticsearch instance:
When using forward geocoding, if we supply the street number, street name and province, the API returns a response with confidence level = 1 and source = custom:
Api request: https://geocoder.alpha.phac.gc.ca/v1/search?text="283 prince philip dr nl"&sources=custom
But if we also include the city name in the input text, the confidence level drops to 0.6 and the match type changes to fallback. As you may have noted, we do have a column named 'city' in our input data, but somehow csv-importer is not able to read it and falls back to the whosonfirst data source.
We have tried a couple of things on our end to resolve this issue:
Snapshot of pelias.json file:
"csv": {
"datapath": "/data/csv-importer-files",
"files": ["NLFD_test_changed.csv"],
"docs": [
{
"name": "LAT",
"type": "number",
"required": true
},
{
"name": "LON",
"type": "number",
"required": true
},
{
"name": "SOURCE",
"type": "number",
"required": true
},
{
"name": "LAYER",
"type": "number",
"required": true
},
{
"name": "NUMBER",
"type": "string",
"required": false,
"es_field": "address.number"
},
{
"name": "STREET",
"type": "string",
"required": false,
"es_field": "address.street"
},
{
"name": "CITY",
"type": "string",
"required": false,
"es_field": "address.city"
},
{
"name": "NAME",
"type": "string",
"required": false,
"es_field": "address.name"
},
{
"name": "MAIL_PROV_ABVN",
"type": "string",
"required": false,
"es_field": "address.region"
},
{
"name": "POSTALCODE",
"type": "string",
"required": false,
"es_field": "address.postalcode"
}
],
"download": []
}
Snapshot of pelias.json file
{
"imports": {
"csv": {
"datapath": "/data",
"files": [
"canada-locations.csv"
],
"mappings": "/code/csv_mapping.json"
}
}
}
and then defined the column mappings in a separate file:
{
"mappings": {
"id": "id",
"latitude": "latitude",
"longitude": "longitude",
"number": "house_number",
"street": "street",
"city": "city",
"region": "region",
"province": "province",
"country": "country",
"postalcode": "postalcode",
"category": "category",
"name": "name",
"layer": "address"
}
}
Steps to Reproduce
https://geocoder.alpha.phac.gc.ca/v1/search?text="283 prince philip dr nl"&sources=custom
https://geocoder.alpha.phac.gc.ca/v1/search?text="283 prince philip dr st john's nl"&sources=custom
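For batch testing, the two requests above can be built with properly encoded query strings. A minimal sketch, assuming the surrounding double quotes in the `text` value are not needed and using only the Python standard library; `build_search_url` is a hypothetical helper, not part of Pelias.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://geocoder.alpha.phac.gc.ca/v1/search"


def build_search_url(text: str,
                     sources: Optional[str] = None,
                     layers: Optional[str] = None,
                     base_url: str = BASE_URL) -> str:
    """Build a forward-geocoding URL, percent-encoding spaces and apostrophes."""
    params = {"text": text}
    if sources:
        params["sources"] = sources
    if layers:
        params["layers"] = layers
    return f"{base_url}?{urlencode(params)}"


without_city = build_search_url("283 prince philip dr nl", sources="custom")
with_city = build_search_url("283 prince philip dr st john's nl", sources="custom")
```

Encoding the apostrophe in "st john's" avoids any ambiguity about how the shell or HTTP client handles it when the two results are compared.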
Expected behavior
Including the city name in the search text should also give confidence=1 and source=custom.
Environment (please complete the following information):
We are currently running an instance of the Pelias Geocoder on a Kubernetes cluster on Google Cloud Platform.
Please let us know if you require any additional information to debug this issue.
Thanks in advance.