GTC-3081: Add political/id-lookup endpoint #616

Merged Jan 29, 2025
48 commits (changes shown from 45 commits)
b94f3fa
Initial attempt at stub route for geoencoder
dmannarino Dec 17, 2024
505cc16
Turn URL params into query params; add boundary version, source;
dmannarino Dec 18, 2024
0514d42
Initial test, not expected to pass
dmannarino Dec 18, 2024
9d2ae13
Disable geoencoder tests to get it to deploy
dmannarino Dec 19, 2024
44a253f
WIP: Run a query instead of canned response
dmannarino Dec 19, 2024
9c2e0ca
WIP: refactor to make better use of helper fcns
dmannarino Dec 20, 2024
899e772
More error handling and tests
dmannarino Dec 20, 2024
d633596
Correct region/subregion->name fields
dmannarino Dec 21, 2024
7f5e9f3
Correct case of multiple WHEREs
dmannarino Dec 21, 2024
2f2facd
Update pre-commit packages, disable docformatter until it's fixed
dmannarino Dec 21, 2024
15de81a
Too much, TBH: Add limiting query to specified admin level; enforce s…
dmannarino Dec 23, 2024
40f7772
pipenv finally updated packages; add unidecode for geoencode endpoint
dmannarino Dec 26, 2024
2cee550
Optionally unaccent names in request to geoencode endpoint
dmannarino Dec 26, 2024
e07c4f4
Update lockfile for new rasterio/numpy
dmannarino Dec 26, 2024
bb69f18
Don't pass Nones to unidecode
dmannarino Dec 26, 2024
1b95ca2
Actually search the unaccented columns
dmannarino Dec 27, 2024
79ae7c3
Add output example from ticket as a test, and adjust code to pass
dmannarino Dec 27, 2024
09e628e
Get regular fields, not unaccented ones
dmannarino Dec 27, 2024
2934fd6
Fix bug introduced in last commit: GET name fields, MATCH on (potenti…
dmannarino Dec 27, 2024
e09cf01
Add a test for getting from unaccented fields
dmannarino Dec 27, 2024
2d979a2
Hide extraneous fields
dmannarino Dec 27, 2024
0c6d541
Add docstrings, add a test, and slightly improve error message on non…
dmannarino Dec 27, 2024
ed5f2cd
Decapitalize as part of normalization; add tests
dmannarino Dec 27, 2024
34c41f8
Return GIDs as relative, not complete values. eg: GID_1=1 instead of …
dmannarino Dec 28, 2024
1952fc0
Minor doc addition
dmannarino Dec 28, 2024
c6384fd
Merge branch 'develop' into gtc-3081_geoencoder_endpoint
dmannarino Dec 28, 2024
4aef63c
Merge branch 'develop' into gtc-3081_geoencoder_endpoint
dmannarino Jan 7, 2025
0f80b9e
WIP: Move geoencode query params into a model in order to implement a…
dmannarino Jan 12, 2025
b23bf7f
Fix resolving version to string
dmannarino Jan 12, 2025
275ff6e
WIP: Temporarily include geoencode route in docs
dmannarino Jan 12, 2025
53ddba5
Fix for last commit: Prepend 'v' to version string, again.
dmannarino Jan 13, 2025
3bc92f5
WIP: Add lookup_admin_source_version helper, duplicating some code
dmannarino Jan 13, 2025
fbacd70
Raise ValueErrors instead of AssertionErrors on bad params
dmannarino Jan 13, 2025
9c5fc87
After much pain and gnashing of teeth, get validator working again
dmannarino Jan 14, 2025
849d68a
Add models for Geoencoder responses and children
dmannarino Jan 14, 2025
4668790
Use dollar sign quoting to avoid PostgreSQL complaining about apostrop…
dmannarino Jan 14, 2025
68f6590
Add type hint per Dan's suggestion
dmannarino Jan 21, 2025
d35bb02
Re-enable docformatter precommit hook
dmannarino Jan 24, 2025
0bdb3e5
Merge branch 'develop' into gtc-3081_geoencoder_endpoint
dmannarino Jan 24, 2025
a9cf2df
Improve error messages
dmannarino Jan 24, 2025
83aa0a0
Move geoencoder to /political/geoencoder
dmannarino Jan 24, 2025
0effc5b
Break forming Geoencoder response out into a helper
dmannarino Jan 24, 2025
84bf869
Rename geoencoder endpoint to id-lookup
dmannarino Jan 24, 2025
af3c244
Set version of GADM 4.1 in various environments
dmannarino Jan 27, 2025
498ecc6
Implement Gary's suggestions to rename from geoencoder -> admin id lo…
dmannarino Jan 27, 2025
d9e0cda
Use AdminIDLookupResponseData properly and add a few type hints
dmannarino Jan 28, 2025
55e1a2f
Merge branch 'develop' into gtc-3081_geoencoder_endpoint
dmannarino Jan 29, 2025
ffe9b6b
Use this branch's Pipfile, I need unidecode
dmannarino Jan 29, 2025
2 changes: 1 addition & 1 deletion .isort.cfg
@@ -2,4 +2,4 @@
line_length = 88
multi_line_output = 3
include_trailing_comma = True
known_third_party = _pytest,aenum,affine,alembic,asgi_lifespan,async_lru,asyncpg,aws_utils,boto3,botocore,click,docker,ee,errors,fastapi,fiona,gdal_utils,geoalchemy2,geojson,gfw_pixetl,gino,gino_starlette,google,httpx,httpx_auth,logger,logging_utils,moto,numpy,orjson,osgeo,pandas,pendulum,pglast,psutil,psycopg2,pydantic,pyproj,pytest,pytest_asyncio,rasterio,shapely,sqlalchemy,sqlalchemy_utils,starlette,tileputty,tiles_geojson,typer
known_third_party = _pytest,aenum,affine,alembic,asgi_lifespan,async_lru,asyncpg,aws_utils,boto3,botocore,click,docker,ee,errors,fastapi,fiona,gdal_utils,geoalchemy2,geojson,gfw_pixetl,gino,gino_starlette,google,httpx,httpx_auth,logger,logging_utils,moto,numpy,orjson,osgeo,pandas,pendulum,pglast,psutil,psycopg2,pydantic,pyproj,pytest,pytest_asyncio,rasterio,shapely,sqlalchemy,sqlalchemy_utils,starlette,tileputty,tiles_geojson,typer,unidecode
20 changes: 10 additions & 10 deletions .pre-commit-config.yaml
@@ -10,32 +10,32 @@ repos:
    hooks:
      - id: isort
  - repo: https://github.com/myint/docformatter
-   rev: v1.4
+   rev: eb1df347edd128b30cd3368dddc3aa65edcfac38  # pragma: allowlist secret
    hooks:
      - id: docformatter
        args: [--in-place]
  - repo: https://github.com/ambv/black
-   rev: 22.12.0
+   rev: 24.10.0
    hooks:
      - id: black
        language_version: python3.10
  - repo: https://github.com/pre-commit/pre-commit-hooks
-   rev: v4.3.0
+   rev: v5.0.0
    hooks:
      - id: detect-aws-credentials
      - id: detect-private-key
      - id: trailing-whitespace
  - repo: https://github.com/pycqa/flake8
-   rev: 6.0.0
+   rev: 7.1.1
    hooks:
      - id: flake8
  - repo: https://github.com/pre-commit/mirrors-mypy
-   rev: v0.971
+   rev: v1.14.1
    hooks:
      - id: mypy
  - repo: https://github.com/Yelp/detect-secrets
-   rev: v1.3.0
+   rev: v1.5.0
    hooks:
      - id: detect-secrets
        args: ['--baseline', '.secrets.baseline']  # run: `pip install detect-secrets` to establish baseline
        exclude: Pipfile.lock
2 changes: 1 addition & 1 deletion .secrets.baseline
@@ -82,7 +82,7 @@
      }
    ]
  },
- "version": "1.3.0",
+ "version": "1.5.0",
  "filters_used": [
    {
      "path": "detect_secrets.filters.allowlist.is_line_allowlisted"
1 change: 1 addition & 0 deletions Pipfile
@@ -60,6 +60,7 @@ sqlalchemy = "<1.4"
 sqlalchemy-utils = "*"
 starlette = "*"
 typer = "*"
+unidecode = "*"
 uvicorn = {version = "*", extras = ["standard"]}

 [requires]
4,349 changes: 2,211 additions & 2,138 deletions Pipfile.lock

Large diffs are not rendered by default.

8 changes: 8 additions & 0 deletions app/main.py
@@ -13,6 +13,7 @@
 from starlette.middleware.base import BaseHTTPMiddleware

 from app.errors import http_error_handler
+from app.routes.political import id_lookup

 from .application import app
 from .middleware import no_cache_response_header, redirect_latest, set_db_mode
@@ -128,6 +129,13 @@ async def rve_error_handler(
 app.include_router(r, prefix="/dataset")


+#################
+# POLITICAL API #
+#################
+
+app.include_router(id_lookup.router, prefix="/political")
+
+
 ###############
 # ASSET API
 ###############
97 changes: 97 additions & 0 deletions app/models/pydantic/political.py
@@ -0,0 +1,97 @@
from typing import List, Optional

from fastapi.params import Query
from pydantic import Field, root_validator

from app.models.pydantic.base import StrictBaseModel
from app.models.pydantic.responses import Response
from app.settings.globals import ENV, per_env_admin_boundary_versions


class AdminIDLookupQueryParams(StrictBaseModel):
    admin_source: str = Field(
        "GADM",
        description=(
            "The source of administrative boundaries to use "
            "(currently the only valid choice is 'GADM')."
        ),
    )
    admin_version: str = Query(
        ...,
        description=(
            "The version of the administrative boundaries to use "
            "(note that this represents the release of the source dataset, "
            "not the GFW Data API's idea of the version in the database)."
        ),
    )
    country: str = Query(
        ...,
        description="Name of the country to match.",
    )
    region: Optional[str] = Query(
        None,
        description="Name of the region to match.",
    )
    subregion: Optional[str] = Query(
        None,
        description="Name of the subregion to match.",
    )
    normalize_search: bool = Query(
        True,
        description=(
            "Whether or not to perform a case- and accent-insensitive search."
        ),
    )

    @root_validator(pre=True)
    def validate_params(cls, values):
        source = values.get("admin_source")
        if source is None:
            raise ValueError(
                "You must provide admin_source or leave unset for the "
                "default value of 'GADM'."
            )

        version = values.get("admin_version")
        if version is None:
            raise ValueError("You must provide an admin_version")

        sources_in_this_env = per_env_admin_boundary_versions[ENV]

        versions_of_source_in_this_env = sources_in_this_env.get(source)
        if versions_of_source_in_this_env is None:
            raise ValueError(
                f"Invalid administrative boundary source {source}. Valid "
                f"sources in this environment are {[v for v in sources_in_this_env.keys()]}"
            )

        deployed_version_in_data_api = versions_of_source_in_this_env.get(version)
        if deployed_version_in_data_api is None:
            raise ValueError(
                f"Invalid version {version} for administrative boundary source "
                f"{source}. Valid versions for this source in this environment are "
                f"{[v for v in versions_of_source_in_this_env.keys()]}"
            )

        return values


class AdminIDLookupMatchElement(StrictBaseModel):
    id: str | None
    name: str | None


class AdminIDLookupMatch(StrictBaseModel):
    country: AdminIDLookupMatchElement
    region: AdminIDLookupMatchElement
    subregion: AdminIDLookupMatchElement


class AdminIDLookupResponseData(StrictBaseModel):
    adminSource: str
    adminVersion: str
    matches: List[AdminIDLookupMatch]


class AdminIDLookupResponse(Response):
    data: AdminIDLookupResponseData
Empty file.
166 changes: 166 additions & 0 deletions app/routes/political/id_lookup.py
@@ -0,0 +1,166 @@
from typing import Annotated, Any, Dict, List

from fastapi import APIRouter, HTTPException, Query
from unidecode import unidecode

from app.models.pydantic.political import (
    AdminIDLookupQueryParams,
    AdminIDLookupResponse,
)
from app.routes.datasets.queries import _query_dataset_json
from app.settings.globals import ENV, per_env_admin_boundary_versions

router = APIRouter()


@router.get("/id-lookup", status_code=200, include_in_schema=False)
async def id_lookup(params: Annotated[AdminIDLookupQueryParams, Query()]):
    """Look up administrative boundary IDs matching a specified country name
    (and region name and subregion name, if specified)."""
    admin_source_to_dataset: Dict[str, str] = {"GADM": "gadm_administrative_boundaries"}

    try:
        dataset: str = admin_source_to_dataset[params.admin_source]
    except KeyError:
        raise HTTPException(
            status_code=400,
            detail=(
                "Invalid admin boundary source. Valid sources:"
                f" {[source for source in admin_source_to_dataset.keys()]}"
            ),
        )

    version_str: str = lookup_admin_source_version(
        params.admin_source, params.admin_version
    )

    names: List[str | None] = normalize_names(
        params.normalize_search, params.country, params.region, params.subregion
    )

    adm_level: int = determine_admin_level(*names)

    sql: str = _admin_boundary_lookup_sql(
        adm_level, params.normalize_search, dataset, *names
    )

    json_data: List[Dict[str, Any]] = await _query_dataset_json(
        dataset, version_str, sql, None
    )

    return form_admin_id_lookup_response(
        params.admin_source, params.admin_version, adm_level, json_data
    )


def normalize_names(
    normalize_search: bool,
    country: str | None,
    region: str | None,
    subregion: str | None,
) -> List[str | None]:
    """Turn any empty strings into Nones, enforce the admin level hierarchy,
    and optionally unaccent and decapitalize names."""
    names: List[str | None] = []

    if subregion and not region:
        raise HTTPException(
            status_code=400,
            detail="If subregion is specified, region must be specified as well.",
        )

    for name in (country, region, subregion):
        if name and normalize_search:
            names.append(unidecode(name).lower())
        elif name:
            names.append(name)
        else:
            names.append(None)
    return names
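The `normalize_names` helper relies on the third-party `unidecode` package added to the Pipfile in this PR. For Latin-script names, the same case- and accent-folding can be sketched with only the standard library (a rough approximation — `unidecode` also transliterates non-Latin scripts, which this does not):

```python
import unicodedata


def normalize_name(name: str) -> str:
    # Decompose accented characters (NFKD), drop the combining marks,
    # then lowercase -- approximating unidecode(name).lower() for
    # Latin-script input.
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


print(normalize_name("São Paulo"))  # -> sao paulo
print(normalize_name("Córdoba"))    # -> cordoba
```

This matches the "Decapitalize as part of normalization" commit above: both accents and case are folded before matching against the `*_normalized` columns.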


def determine_admin_level(
    country: str | None, region: str | None, subregion: str | None
) -> int:
    """Infer the native admin level of a request based on the presence of
    non-empty fields."""
    if subregion:
        return 2
    elif region:
        return 1
    elif country:
        return 0
    else:  # Shouldn't get here if FastAPI route definition worked
        raise HTTPException(status_code=400, detail="Country MUST be specified.")


def _admin_boundary_lookup_sql(
    adm_level: int,
    normalize_search: bool,
    dataset: str,
    country_name: str,
    region_name: str | None,
    subregion_name: str | None,
) -> str:
    """Generate the SQL required to look up administrative boundary IDs by
    name."""
    name_fields: List[str] = ["country", "name_1", "name_2"]
    if normalize_search:
        match_name_fields = [name_field + "_normalized" for name_field in name_fields]
    else:
        match_name_fields = name_fields

    sql = (
        f"SELECT gid_0, gid_1, gid_2, {name_fields[0]}, {name_fields[1]}, {name_fields[2]}"
        f" FROM {dataset} WHERE {match_name_fields[0]}=$country${country_name}$country$"
    )
    if region_name is not None:
        sql += f" AND {match_name_fields[1]}=$region${region_name}$region$"
    if subregion_name is not None:
        sql += f" AND {match_name_fields[2]}=$subregion${subregion_name}$subregion$"

    sql += f" AND adm_level='{adm_level}'"

    return sql
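The dollar quoting in `_admin_boundary_lookup_sql` (`$country$…$country$`) is what the "Use dollar sign quoting" commit refers to: in PostgreSQL, a dollar-quoted string is a literal in which embedded single quotes need no escaping, so names like "cote d'ivoire" pass through intact. A minimal standalone sketch of the idea (`dollar_quote` is a hypothetical helper, not part of the PR's code):

```python
def dollar_quote(tag: str, value: str) -> str:
    # PostgreSQL dollar quoting: $tag$ ... $tag$ is a string literal in
    # which embedded single quotes need no escaping.
    return f"${tag}${value}${tag}$"


name = "cote d'ivoire"
clause = f"WHERE country_normalized={dollar_quote('country', name)}"
print(clause)  # -> WHERE country_normalized=$country$cote d'ivoire$country$
```

One caveat worth noting: a value that itself contained the delimiter (the literal text `$country$`) would still break out of the quoted region, so this approach relies on the chosen tags never appearing in user input.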


def lookup_admin_source_version(source, version) -> str:
    # The AdminIDLookupQueryParams validator should have already ensured
    # that the following is safe
    deployed_version_in_data_api = per_env_admin_boundary_versions[ENV][source][version]

    return deployed_version_in_data_api


def form_admin_id_lookup_response(
    admin_source, admin_version, adm_level, match_list
) -> AdminIDLookupResponse:
    matches = []

    for match in match_list:
        country = {"id": extract_level_gid(0, match), "name": match["country"]}

        if adm_level < 1:
            region = {"id": None, "name": None}
        else:
            region = {"id": extract_level_gid(1, match), "name": match["name_1"]}

        if adm_level < 2:
            subregion = {"id": None, "name": None}
        else:
            subregion = {"id": extract_level_gid(2, match), "name": match["name_2"]}

        matches.append({"country": country, "region": region, "subregion": subregion})

    data = {
        "adminSource": admin_source,
        "adminVersion": admin_version,
        "matches": matches,
    }
Collaborator:
Nit (optional): I guess this code works in Pydantic, but the types would be clearer if 'data' was an AdminIDLookupResponseData, which I guess you could do via:

data = AdminIDLookupResponseData(**{
    "adminSource": admin_source,
    "adminVersion": admin_version,
    "matches": matches,
})

Member (author):
Sounds good, will implement, thanks!

    resp = AdminIDLookupResponse(**{"data": data})
    return resp


def extract_level_gid(gid_level, match):
    gid_level_name = f"gid_{gid_level}"
    return (match[gid_level_name].rsplit("_")[0]).split(".")[gid_level]
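GADM GIDs are hierarchical strings such as `BRA.12.34_1` (dot-separated country, region, and subregion components with a trailing `_<version>` suffix), and `extract_level_gid` pulls out just the component for the requested level — which is how the endpoint returns relative IDs, per the "Return GIDs as relative, not complete values" commit. A self-contained sketch with hypothetical sample values:

```python
def extract_level_gid(gid_level: int, match: dict) -> str:
    gid_level_name = f"gid_{gid_level}"
    # e.g. "BRA.12.34_1": strip the "_1" version suffix, then take the
    # dot-separated component for the requested admin level.
    return (match[gid_level_name].rsplit("_")[0]).split(".")[gid_level]


row = {"gid_0": "BRA_1", "gid_1": "BRA.12_1", "gid_2": "BRA.12.34_1"}
print(extract_level_gid(0, row))  # -> BRA
print(extract_level_gid(1, row))  # -> 12
print(extract_level_gid(2, row))  # -> 34
```

The sample row is illustrative only; real rows come from the `gadm_administrative_boundaries` dataset queried above.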
27 changes: 26 additions & 1 deletion app/settings/globals.py
@@ -1,6 +1,6 @@
 import json
 from pathlib import Path
-from typing import Optional
+from typing import Dict, Optional

 from starlette.config import Config
 from starlette.datastructures import Secret
@@ -185,3 +185,28 @@
 RASTER_ANALYSIS_STATE_MACHINE_ARN = config(
     "RASTER_ANALYSIS_STATE_MACHINE_ARN", cast=str, default=None
 )
+
+# TODO: Find a good home for this:
+per_env_admin_boundary_versions: Dict[str, Dict[str, Dict]] = {
+    "test": {
+        "GADM": {
+            "3.6": "v3.6",
+            "4.1": "v4.1.64",
+        }
+    },
+    "dev": {
+        "GADM": {
+            "4.1": "v4.1.64",
+        }
+    },
+    "staging": {
+        "GADM": {
+            "4.1": "v4.1.64",
+        }
+    },
+    "production": {
+        "GADM": {
+            "4.1": "v4.1.64",
+        }
+    },
+}
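Resolving a user-facing release (e.g. GADM "4.1") to the Data API's deployed version string ("v4.1.64") is a nested dictionary lookup keyed by environment, source, and version. A sketch using the test-environment mapping from above (the real `lookup_admin_source_version` reads the module-level `ENV` and assumes the query-param validator already rejected bad keys; the explicit `env` parameter and the `KeyError`-to-`ValueError` wrapping here are illustrative additions):

```python
per_env_admin_boundary_versions = {
    "test": {"GADM": {"3.6": "v3.6", "4.1": "v4.1.64"}},
}


def lookup_admin_source_version(env: str, source: str, version: str) -> str:
    # Walk env -> source -> version; surface any missing key as a
    # ValueError rather than a bare KeyError.
    try:
        return per_env_admin_boundary_versions[env][source][version]
    except KeyError as exc:
        raise ValueError(f"Unknown env/source/version: {exc}") from exc


print(lookup_admin_source_version("test", "GADM", "4.1"))  # -> v4.1.64
```

Keeping the mapping per-environment is what lets the "Set version of GADM 4.1 in various environments" commit pin different deployed dataset versions in test, dev, staging, and production.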
Empty file.
Empty file.