-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GTC-3081: Add political/id-lookup endpoint #616
Merged
Merged
Changes from 45 commits
Commits
Show all changes
48 commits
Select commit
Hold shift + click to select a range
b94f3fa
Initial attempt at stub route for geoencoder
dmannarino 505cc16
Turn URL params into query params; add boundary version, source;
dmannarino 0514d42
Initial test, not expected to pass
dmannarino 9d2ae13
Disable geoencoder tests to get it to deploy
dmannarino 44a253f
WIP: Run a query instead of canned response
dmannarino 9c2e0ca
WIP: refactor to make better use of helper fcns
dmannarino 899e772
More error handling and tests
dmannarino d633596
Correct region/subregion->name fields
dmannarino 7f5e9f3
Correct case of multiple WHEREs
dmannarino 2f2facd
Update pre-commit packages, disable docformatter until it's fixed
dmannarino 15de81a
Too much, TBH: Add limiting query to specified admin level; enforce s…
dmannarino 40f7772
pipenv finally updated packages; add unidecode for geoencode endpoint
dmannarino 2cee550
Optionally unaccent names in request to geoencode endpoint
dmannarino e07c4f4
Update lockfile for new raterio/numpy
dmannarino bb69f18
Don't pass Nones to unidecode
dmannarino 1b95ca2
Actually search the unaccented columns
dmannarino 79ae7c3
Add output example from ticket as a test, and adjust code to pass
dmannarino 09e628e
Get regular fields, not unaccented ones
dmannarino 2934fd6
Fix bug introduced in last commit: GET name fields, MATCH on (potenti…
dmannarino e09cf01
Add a test for getting from unaccented fields
dmannarino 2d979a2
Hide extraneous fields
dmannarino 0c6d541
Add docstrings, add a test, and slightly improve error message on non…
dmannarino ed5f2cd
Decapitalize as part of normalization; add tests
dmannarino 34c41f8
Return GIDs as relative, not complete values. eg: GID_1=1 instead of …
dmannarino 1952fc0
Minor doc addition
dmannarino c6384fd
Merge branch 'develop' into gtc-3081_geoencoder_endpoint
dmannarino 4aef63c
Merge branch 'develop' into gtc-3081_geoencoder_endpoint
dmannarino 0f80b9e
WIP: Move geoencode query params into a model in order to implement a…
dmannarino b23bf7f
Fix resolving version to string
dmannarino 275ff6e
WIP: Temporarily include geoencode route in docs
dmannarino 53ddba5
Fix for last commit: PRepend 'v' to version string, again.
dmannarino 3bc92f5
WIP: Add lookup_admin_source_version helper, duplicating some code
dmannarino fbacd70
Raise ValueErrors instead of AssertionErrors on bad params
dmannarino 9c5fc87
After much pain and gnashing of teeth, get validator working again
dmannarino 849d68a
Add models for Geoencoder responses and children
dmannarino 4668790
Use dolar sign quoting to avoid PostgreSQL complaining about apostrop…
dmannarino 68f6590
Add type hint per Dan's suggestion
dmannarino d35bb02
Re-enable docformatter precommit hook
dmannarino 0bdb3e5
Merge branch 'develop' into gtc-3081_geoencoder_endpoint
dmannarino a9cf2df
Improve error messages
dmannarino 83aa0a0
Move geoencoder to /political/geoencoder
dmannarino 0effc5b
Break forming Geoencoder response out into a helper
dmannarino 84bf869
Rename geoencoder endpoint to id-lookup
dmannarino af3c244
Set version of GADM 4.1 in various environments
dmannarino 498ecc6
Implement Gary's suggestions to rename from geoencoder -> admin id lo…
dmannarino d9e0cda
Use AdminIDLookupResponseData properly and add a few type hints
dmannarino 55e1a2f
Merge branch 'develop' into gtc-3081_geoencoder_endpoint
dmannarino ffe9b6b
Use this branch's Pipfile, I need unidecode
dmannarino File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Large diffs are not rendered by default.
Oops, something went wrong.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,97 @@ | ||
from typing import List, Optional | ||
|
||
from fastapi.params import Query | ||
from pydantic import Field, root_validator | ||
|
||
from app.models.pydantic.base import StrictBaseModel | ||
from app.models.pydantic.responses import Response | ||
from app.settings.globals import ENV, per_env_admin_boundary_versions | ||
|
||
|
||
class AdminIDLookupQueryParams(StrictBaseModel): | ||
admin_source: str = Field( | ||
"GADM", | ||
description=( | ||
"The source of administrative boundaries to use " | ||
"(currently the only valid choice is 'GADM')." | ||
), | ||
) | ||
admin_version: str = Query( | ||
..., | ||
description=( | ||
"The version of the administrative boundaries to use " | ||
"(note that this represents the release of the source dataset, " | ||
"not the GFW Data API's idea of the version in the database)." | ||
), | ||
) | ||
country: str = Query( | ||
..., | ||
description="Name of the country to match.", | ||
) | ||
region: Optional[str] = Query( | ||
None, | ||
description="Name of the region to match.", | ||
) | ||
subregion: Optional[str] = Query( | ||
None, | ||
description="Name of the subregion to match.", | ||
) | ||
normalize_search: bool = Query( | ||
True, | ||
description=( | ||
"Whether or not to perform a case- and accent-insensitive search." | ||
), | ||
) | ||
|
||
@root_validator(pre=True) | ||
def validate_params(cls, values): | ||
source = values.get("admin_source") | ||
if source is None: | ||
raise ValueError( | ||
"You must provide admin_source or leave unset for the " | ||
"default value of 'GADM'." | ||
) | ||
|
||
version = values.get("admin_version") | ||
if version is None: | ||
raise ValueError("You must provide an admin_version") | ||
|
||
sources_in_this_env = per_env_admin_boundary_versions[ENV] | ||
|
||
versions_of_source_in_this_env = sources_in_this_env.get(source) | ||
if versions_of_source_in_this_env is None: | ||
raise ValueError( | ||
f"Invalid administrative boundary source {source}. Valid " | ||
f"sources in this environment are {[v for v in sources_in_this_env.keys()]}" | ||
) | ||
|
||
deployed_version_in_data_api = versions_of_source_in_this_env.get(version) | ||
if deployed_version_in_data_api is None: | ||
raise ValueError( | ||
f"Invalid version {version} for administrative boundary source " | ||
f"{source}. Valid versions for this source in this environment are " | ||
f"{[v for v in versions_of_source_in_this_env.keys()]}" | ||
) | ||
|
||
return values | ||
|
||
|
||
class AdminIDLookupMatchElement(StrictBaseModel): | ||
id: str | None | ||
name: str | None | ||
|
||
|
||
class AdminIDLookupMatch(StrictBaseModel): | ||
country: AdminIDLookupMatchElement | ||
region: AdminIDLookupMatchElement | ||
subregion: AdminIDLookupMatchElement | ||
|
||
|
||
class AdminIDLookupResponseData(StrictBaseModel): | ||
adminSource: str | ||
adminVersion: str | ||
matches: List[AdminIDLookupMatch] | ||
|
||
|
||
class AdminIDLookupResponse(Response): | ||
data: AdminIDLookupResponseData |
Empty file.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,166 @@ | ||
from typing import Annotated, Any, Dict, List | ||
|
||
from fastapi import APIRouter, HTTPException, Query | ||
from unidecode import unidecode | ||
|
||
from app.models.pydantic.political import ( | ||
AdminIDLookupQueryParams, | ||
AdminIDLookupResponse, | ||
) | ||
from app.routes.datasets.queries import _query_dataset_json | ||
from app.settings.globals import ENV, per_env_admin_boundary_versions | ||
|
||
router = APIRouter() | ||
|
||
|
||
@router.get("/id-lookup", status_code=200, include_in_schema=False) | ||
async def id_lookup(params: Annotated[AdminIDLookupQueryParams, Query()]): | ||
"""Look up administrative boundary IDs matching a specified country name | ||
(and region name and subregion name, if specified).""" | ||
admin_source_to_dataset: Dict[str, str] = {"GADM": "gadm_administrative_boundaries"} | ||
|
||
try: | ||
dataset: str = admin_source_to_dataset[params.admin_source] | ||
except KeyError: | ||
raise HTTPException( | ||
status_code=400, | ||
detail=( | ||
"Invalid admin boundary source. Valid sources:" | ||
f" {[source for source in admin_source_to_dataset.keys()]}" | ||
), | ||
) | ||
|
||
version_str: str = lookup_admin_source_version( | ||
params.admin_source, params.admin_version | ||
) | ||
|
||
names: List[str | None] = normalize_names( | ||
params.normalize_search, params.country, params.region, params.subregion | ||
) | ||
|
||
adm_level: int = determine_admin_level(*names) | ||
|
||
sql: str = _admin_boundary_lookup_sql( | ||
adm_level, params.normalize_search, dataset, *names | ||
) | ||
|
||
json_data: List[Dict[str, Any]] = await _query_dataset_json( | ||
dataset, version_str, sql, None | ||
) | ||
|
||
return form_admin_id_lookup_response( | ||
params.admin_source, params.admin_version, adm_level, json_data | ||
) | ||
|
||
|
||
def normalize_names( | ||
normalize_search: bool, | ||
country: str | None, | ||
region: str | None, | ||
subregion: str | None, | ||
) -> List[str | None]: | ||
"""Turn any empty strings into Nones, enforces the admin level hierarchy, | ||
and optionally unaccents and decapitalizes names.""" | ||
names: List[str | None] = [] | ||
|
||
if subregion and not region: | ||
raise HTTPException( | ||
status_code=400, | ||
detail="If subregion is specified, region must be specified as well.", | ||
) | ||
|
||
for name in (country, region, subregion): | ||
if name and normalize_search: | ||
names.append(unidecode(name).lower()) | ||
elif name: | ||
names.append(name) | ||
else: | ||
names.append(None) | ||
return names | ||
|
||
|
||
def determine_admin_level( | ||
country: str | None, region: str | None, subregion: str | None | ||
) -> int: | ||
"""Infer the native admin level of a request based on the presence of non- | ||
empty fields.""" | ||
if subregion: | ||
return 2 | ||
elif region: | ||
return 1 | ||
elif country: | ||
return 0 | ||
else: # Shouldn't get here if FastAPI route definition worked | ||
raise HTTPException(status_code=400, detail="Country MUST be specified.") | ||
|
||
|
||
def _admin_boundary_lookup_sql( | ||
adm_level: int, | ||
normalize_search: bool, | ||
dataset: str, | ||
country_name: str, | ||
region_name: str | None, | ||
subregion_name: str | None, | ||
) -> str: | ||
"""Generate the SQL required to look up administrative boundary IDs by | ||
name.""" | ||
name_fields: List[str] = ["country", "name_1", "name_2"] | ||
if normalize_search: | ||
match_name_fields = [name_field + "_normalized" for name_field in name_fields] | ||
else: | ||
match_name_fields = name_fields | ||
|
||
sql = ( | ||
f"SELECT gid_0, gid_1, gid_2, {name_fields[0]}, {name_fields[1]}, {name_fields[2]}" | ||
f" FROM {dataset} WHERE {match_name_fields[0]}=$country${country_name}$country$" | ||
) | ||
if region_name is not None: | ||
sql += f" AND {match_name_fields[1]}=$region${region_name}$region$" | ||
if subregion_name is not None: | ||
sql += f" AND {match_name_fields[2]}=$subregion${subregion_name}$subregion$" | ||
|
||
sql += f" AND adm_level='{adm_level}'" | ||
|
||
return sql | ||
|
||
|
||
def lookup_admin_source_version(source, version) -> str: | ||
# The AdminIDLookupQueryParams validator should have already ensured | ||
# that the following is safe | ||
deployed_version_in_data_api = per_env_admin_boundary_versions[ENV][source][version] | ||
|
||
return deployed_version_in_data_api | ||
|
||
|
||
def form_admin_id_lookup_response( | ||
admin_source, admin_version, adm_level, match_list | ||
) -> AdminIDLookupResponse: | ||
matches = [] | ||
|
||
for match in match_list: | ||
country = {"id": extract_level_gid(0, match), "name": match["country"]} | ||
|
||
if adm_level < 1: | ||
region = {"id": None, "name": None} | ||
else: | ||
region = {"id": extract_level_gid(1, match), "name": match["name_1"]} | ||
|
||
if adm_level < 2: | ||
subregion = {"id": None, "name": None} | ||
else: | ||
subregion = {"id": extract_level_gid(2, match), "name": match["name_2"]} | ||
|
||
matches.append({"country": country, "region": region, "subregion": subregion}) | ||
|
||
data = { | ||
"adminSource": admin_source, | ||
"adminVersion": admin_version, | ||
"matches": matches, | ||
} | ||
resp = AdminIDLookupResponse(**{"data": data}) | ||
return resp | ||
|
||
|
||
def extract_level_gid(gid_level, match): | ||
gid_level_name = f"gid_{gid_level}" | ||
return (match[gid_level_name].rsplit("_")[0]).split(".")[gid_level] |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Empty file.
Empty file.
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit (optional): I guess this code works in Pydantic, but the types would be clearer if 'data' was a AdminIDLookupResponseData, which I guess you could do via:
data = AdminIDLookupResponseData(**{
"adminSource": admin_source,
"adminVersion": admin_version,
"matches": matches,
})
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, will implement, thanks!