GTC-3081: Add political/id-lookup endpoint #616

Merged
merged 48 commits into from
Jan 29, 2025
Commits
b94f3fa
Initial attempt at stub route for geoencoder
dmannarino Dec 17, 2024
505cc16
Turn URL params into query params; add boundary version, source;
dmannarino Dec 18, 2024
0514d42
Initial test, not expected to pass
dmannarino Dec 18, 2024
9d2ae13
Disable geoencoder tests to get it to deploy
dmannarino Dec 19, 2024
44a253f
WIP: Run a query instead of canned response
dmannarino Dec 19, 2024
9c2e0ca
WIP: refactor to make better use of helper fcns
dmannarino Dec 20, 2024
899e772
More error handling and tests
dmannarino Dec 20, 2024
d633596
Correct region/subregion->name fields
dmannarino Dec 21, 2024
7f5e9f3
Correct case of multiple WHEREs
dmannarino Dec 21, 2024
2f2facd
Update pre-commit packages, disable docformatter until it's fixed
dmannarino Dec 21, 2024
15de81a
Too much, TBH: Add limiting query to specified admin level; enforce s…
dmannarino Dec 23, 2024
40f7772
pipenv finally updated packages; add unidecode for geoencode endpoint
dmannarino Dec 26, 2024
2cee550
Optionally unaccent names in request to geoencode endpoint
dmannarino Dec 26, 2024
e07c4f4
Update lockfile for new raterio/numpy
dmannarino Dec 26, 2024
bb69f18
Don't pass Nones to unidecode
dmannarino Dec 26, 2024
1b95ca2
Actually search the unaccented columns
dmannarino Dec 27, 2024
79ae7c3
Add output example from ticket as a test, and adjust code to pass
dmannarino Dec 27, 2024
09e628e
Get regular fields, not unaccented ones
dmannarino Dec 27, 2024
2934fd6
Fix bug introduced in last commit: GET name fields, MATCH on (potenti…
dmannarino Dec 27, 2024
e09cf01
Add a test for getting from unaccented fields
dmannarino Dec 27, 2024
2d979a2
Hide extraneous fields
dmannarino Dec 27, 2024
0c6d541
Add docstrings, add a test, and slightly improve error message on non…
dmannarino Dec 27, 2024
ed5f2cd
Decapitalize as part of normalization; add tests
dmannarino Dec 27, 2024
34c41f8
Return GIDs as relative, not complete values. eg: GID_1=1 instead of …
dmannarino Dec 28, 2024
1952fc0
Minor doc addition
dmannarino Dec 28, 2024
c6384fd
Merge branch 'develop' into gtc-3081_geoencoder_endpoint
dmannarino Dec 28, 2024
4aef63c
Merge branch 'develop' into gtc-3081_geoencoder_endpoint
dmannarino Jan 7, 2025
0f80b9e
WIP: Move geoencode query params into a model in order to implement a…
dmannarino Jan 12, 2025
b23bf7f
Fix resolving version to string
dmannarino Jan 12, 2025
275ff6e
WIP: Temporarily include geoencode route in docs
dmannarino Jan 12, 2025
53ddba5
Fix for last commit: PRepend 'v' to version string, again.
dmannarino Jan 13, 2025
3bc92f5
WIP: Add lookup_admin_source_version helper, duplicating some code
dmannarino Jan 13, 2025
fbacd70
Raise ValueErrors instead of AssertionErrors on bad params
dmannarino Jan 13, 2025
9c5fc87
After much pain and gnashing of teeth, get validator working again
dmannarino Jan 14, 2025
849d68a
Add models for Geoencoder responses and children
dmannarino Jan 14, 2025
4668790
Use dolar sign quoting to avoid PostgreSQL complaining about apostrop…
dmannarino Jan 14, 2025
68f6590
Add type hint per Dan's suggestion
dmannarino Jan 21, 2025
d35bb02
Re-enable docformatter precommit hook
dmannarino Jan 24, 2025
0bdb3e5
Merge branch 'develop' into gtc-3081_geoencoder_endpoint
dmannarino Jan 24, 2025
a9cf2df
Improve error messages
dmannarino Jan 24, 2025
83aa0a0
Move geoencoder to /political/geoencoder
dmannarino Jan 24, 2025
0effc5b
Break forming Geoencoder response out into a helper
dmannarino Jan 24, 2025
84bf869
Rename geoencoder endpoint to id-lookup
dmannarino Jan 24, 2025
af3c244
Set version of GADM 4.1 in various environments
dmannarino Jan 27, 2025
498ecc6
Implement Gary's suggestions to rename from geoencoder -> admin id lo…
dmannarino Jan 27, 2025
d9e0cda
Use AdminIDLookupResponseData properly and add a few type hints
dmannarino Jan 28, 2025
55e1a2f
Merge branch 'develop' into gtc-3081_geoencoder_endpoint
dmannarino Jan 29, 2025
ffe9b6b
Use this branch's Pipfile, I need unidecode
dmannarino Jan 29, 2025
More error handling and tests
dmannarino committed Dec 20, 2024
commit 899e772cb5a9737916ae3910e9980f1c831e044f
58 changes: 48 additions & 10 deletions app/routes/thematic/geoencoder.py
@@ -1,7 +1,12 @@
+import re

Member

@dmannarino I know you said you didn't like "thematic". I was thinking of other ideas: since these are admin-area-specific endpoints, maybe we just put them under something like "political", e.g. /political/geoencoder. In the future that could also hold additional GADM endpoints, WDPA, concessions, etc. I'm not sure whether we need to distinguish it from the rest of the API, or whether we can just group it all under a header in the docs.

Member

But I do think "geoencoder" is a little vague if it only works for admin areas; geocoding implies converting any textual description of a place into coordinates: https://en.wikipedia.org/wiki/Address_geocoding

Member Author

Changed to /political/id-lookup

 from typing import Optional, Any, Dict, List

 from fastapi import APIRouter, HTTPException, Query

+from app.crud.versions import get_version, get_version_names
+from app.errors import RecordNotFoundError
+from app.models.pydantic.responses import Response
+from app.routes import VERSION_REGEX
 from app.routes.datasets.queries import _query_dataset_json


@@ -20,7 +25,7 @@ async def geoencode(
         description="The source of administrative boundaries to use."
     ),
     admin_version: str = Query(
-        None,
+        ...,
         description="Version of the administrative boundaries dataset to use.",

Member

Similarly, is there a way to document the available choices? I guess this may get more confusing if we ever have multiple providers.

Member

Wild idea: should we consolidate admin version and admin dataset into one field, with options like "GADM 3.6", "GADM 4.1", "geoBoundaries 1.0", "middleEarth 3.2"? Then it'll be clear what you're getting from a set of options.

Contributor

Combining the provider and version has merit (middleEarth 3.2): it has excellent cohesion.
This is such a simple scenario that it's hard to justify breaking it up.

However, we'd be encoding special information in a string (i.e., provider, space, version).
It's generally good practice to split information up, even though it adds a bit of complexity.
It's more flexible in the long run (although I'm starting to violate YAGNI with this statement).
Finally, we are splitting this information up in the area/geostore microservice interface for all of the reasons mentioned above.

That's my two cents.
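
A minimal sketch of the trade-off discussed in this thread, assuming nothing about the real request model: a combined "PROVIDER VERSION" string has to be split back apart (here on the last space, which is exactly the implicit encoding the reviewer cautions against), while separate fields need no parsing at all. The names `AdminBoundarySpec`, `parse_combined`, and `from_separate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdminBoundarySpec:
    """Hypothetical value object: which boundary provider and version to query."""

    source: str   # e.g. "GADM"
    version: str  # e.g. "4.1"


def parse_combined(value: str) -> AdminBoundarySpec:
    """Option A: one field such as "GADM 4.1" or "middleEarth 3.2".

    Provider and version are recovered by splitting on the last space,
    so the string format itself becomes part of the API contract.
    """
    source, sep, version = value.strip().rpartition(" ")
    if not sep or not source or not version:
        raise ValueError(f"Expected 'PROVIDER VERSION', got {value!r}")
    return AdminBoundarySpec(source=source, version=version)


def from_separate(admin_source: str, admin_version: str) -> AdminBoundarySpec:
    """Option B (the PR's design): two fields, no string parsing required."""
    return AdminBoundarySpec(source=admin_source, version=admin_version)
```

Both shapes resolve to the same value here; the difference is that Option A needs an error path for malformed strings like "GADM", while Option B can validate each field independently.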

     ),
     country: str = Query(
@@ -46,13 +51,15 @@ async def geoencode(
         dataset = admin_source_to_dataset[admin_source.upper()]
     except KeyError:
         raise HTTPException(
-            status_code=404,
+            status_code=400,
             detail=f"Invalid admin boundary source. Valid sources: {admin_source_to_dataset.keys()}"
         )

     version_str = "v" + str(admin_version).lstrip("v")

-    sql: str = await admin_boundary_lookup_sql(
+    await version_is_valid(dataset, version_str)

Member

As mentioned about the documentation above, shouldn't the versions we support for each provider be known in advance, rather than just trying the request and throwing back an error?

Member Author

I fundamentally agree that your suggestion is a good one, but it turns out to be difficult to do in practice, especially considering the limited use of this endpoint and the fact that we will be omitting it from the docs.
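
For reference, the request-time check in this diff normalizes the version by prepending "v" and then matches it against `VERSION_REGEX`. The actual value of `VERSION_REGEX` is not shown here, so the pattern below is a hypothetical stand-in derived from the error message ("begin with a 'v' and consist of one to three integers separated by periods"); the helper names are illustrative as well.

```python
import re

# Hypothetical stand-in for app.routes.VERSION_REGEX (its real value is not
# shown in this diff): a leading "v" followed by one to three dot-separated
# integers, per the error message.
VERSION_PATTERN = re.compile(r"^v\d+(\.\d+){0,2}$")


def normalize_version(admin_version: str) -> str:
    """Mirror the diff's normalization: "4.1" and "v4.1" both become "v4.1"."""
    return "v" + str(admin_version).lstrip("v")


def version_pattern_ok(version: str) -> bool:
    """True if the (already normalized) version has an acceptable shape."""
    return VERSION_PATTERN.match(version) is not None
```

Note that `str.lstrip("v")` removes every leading "v", so "vv4.1" also normalizes to "v4.1", and a value like `fails_regex` (used in the tests) becomes `vfails_regex` and is rejected by the pattern.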

+
+    sql: str = _admin_boundary_lookup_sql(
         admin_source,
         country,
         region,
@@ -63,14 +70,16 @@ async def geoencode(
         dataset, version_str, sql, None
     )

-    return {
-        "adminSource": admin_source,
-        "adminVersion": admin_version,
-        "matches": json_data
-    }
+    return Response(
+        data={

Member

This might be cleaner as a Pydantic model instead of a dict.

Member Author

Made a model, plus a helper function to construct the response in a (hopefully) clear way.
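
A minimal sketch of the reviewer's suggestion, using a stdlib `dataclass` as a stand-in for a Pydantic model so it runs without extra dependencies. The field names mirror the camelCase keys in the diff's response dict, and the `{"status": "success", "data": ...}` envelope matches what the tests in this commit expect. The class and helper names are hypothetical; the commit list mentions an `AdminIDLookupResponseData` model, but its definition is not shown here.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class LookupResponseData:
    """Hypothetical typed payload; field names follow the diff's camelCase keys."""

    adminSource: str
    adminVersion: str
    matches: List[Dict[str, Any]] = field(default_factory=list)


def form_lookup_response(
    admin_source: str, admin_version: str, matches: List[Dict[str, Any]]
) -> Dict[str, Any]:
    """Hypothetical helper: build the typed payload, then wrap it in the envelope."""
    data = LookupResponseData(
        adminSource=admin_source, adminVersion=admin_version, matches=matches
    )
    return {"status": "success", "data": asdict(data)}
```

Compared with a bare dict, a misspelled or missing key now fails at construction time instead of silently producing a malformed response.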

+            "adminSource": admin_source,
+            "adminVersion": admin_version,
+            "matches": json_data
+        }
+    )


-async def admin_boundary_lookup_sql(
+def _admin_boundary_lookup_sql(
     dataset: str,
     country_name: str,
     region_name: Optional[str],
@@ -81,11 +90,40 @@ async def admin_boundary_lookup_sql(
     """
     sql = (
         f"SELECT gid_0, gid_1, gid_2, country, name_1, name_2 FROM {dataset}"
-        f" AND WHERE country='{country_name}'"
+        f" WHERE country='{country_name}'"
     )
     if region_name is not None:
         sql += f" AND WHERE region='{region_name}'"
     if subregion_name is not None:
         sql += f" AND WHERE subregion='{subregion_name}'"

     return sql
+
+
+async def version_is_valid(
+    dataset: str,
+    version: str,
+) -> None:
+    """
+
+    """
+    if re.match(VERSION_REGEX, version) is None:

Member

Per the above, maybe we should just have pre-validated versions rather than checking the version.

Member Author

Now specified on a per-env basis!
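
A minimal sketch of what "specified on a per-env basis" could look like, assuming a simple lookup table keyed by environment and boundary source. The environment names, the `ENV` variable, and the version strings are placeholders, not the values set in commit af3c244. It raises `ValueError` on bad input, in line with the later commit that switched from assertions to `ValueError`s.

```python
import os
from typing import Dict, Optional

# Placeholder table: environment -> boundary source -> pinned dataset version.
# The real environment names and versions are not shown in this PR page.
ADMIN_VERSIONS_BY_ENV: Dict[str, Dict[str, str]] = {
    "dev": {"GADM": "v4.1"},
    "staging": {"GADM": "v4.1"},
    "production": {"GADM": "v4.1"},
}


def resolve_admin_version(admin_source: str, env: Optional[str] = None) -> str:
    """Return the pinned version for a boundary source in the given environment."""
    # "ENV" is a hypothetical environment variable name used for illustration.
    env = env or os.environ.get("ENV", "dev")
    versions = ADMIN_VERSIONS_BY_ENV.get(env)
    if versions is None:
        raise ValueError(f"Unknown environment {env!r}")
    try:
        return versions[admin_source.upper()]
    except KeyError:
        raise ValueError(
            f"Invalid admin boundary source {admin_source!r}. "
            f"Valid sources: {sorted(versions)}"
        ) from None
```

With a table like this, the set of valid source/version pairs is known before any request is made, which is what the reviewer asked for, and it can be rendered into the endpoint's documentation.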

+        raise HTTPException(
+            status_code=400,
+            detail=(
+                "Invalid version name. Version names begin with a 'v' and "
+                "consist of one to three integers separated by periods. "
+                "eg. 'v1', 'v7.1', 'v4.1.0', 'v20240801'"
+            )
+        )
+
+    try:
+        _ = await get_version(dataset, version)
+    except RecordNotFoundError:
+        raise HTTPException(
+            status_code=400,
+            detail=(
+                "Version not found. Existing versions for this dataset "
+                f"include {await get_version_names(dataset)}"
+            )
+        )
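
As committed here, `_admin_boundary_lookup_sql` appends `" AND WHERE ..."` for each optional filter, which yields SQL with more than one `WHERE`; the test below encodes that output, and a later commit in this PR (7f5e9f3, "Correct case of multiple WHEREs") appears to address it. A minimal sketch of one way to build the clause: collect the conditions and join them with `AND` under a single `WHERE`, and wrap values in PostgreSQL dollar quotes so names containing apostrophes need no escaping (commit 4668790 mentions dollar-sign quoting, but its exact code is not shown). Mapping region/subregion to `name_1`/`name_2` is an assumption based on commit d633596. Dollar quoting sidesteps the apostrophe problem but is not a substitute for parameterized queries.

```python
from typing import List, Optional


def dollar_quote(value: str, tag: str = "") -> str:
    """Wrap a literal in PostgreSQL dollar quotes ($$...$$ or $tag$...$tag$).

    Apostrophes inside the value need no escaping. The literal ends at the
    first occurrence of the delimiter, so reject any value where that would
    happen before the intended end (including a trailing "$").
    """
    delim = f"${tag}$"
    if (value + delim).find(delim) != len(value):
        raise ValueError(f"Value cannot be safely quoted with {delim!r}")
    return f"{delim}{value}{delim}"


def admin_boundary_lookup_sql(
    dataset: str,
    country_name: str,
    region_name: Optional[str] = None,
    subregion_name: Optional[str] = None,
) -> str:
    """Build a single WHERE clause whose conditions are joined with AND.

    Assumes region/subregion live in name_1/name_2 and that `dataset` is a
    trusted table name from a server-side mapping (it is not quoted).
    """
    conditions: List[str] = [f"country={dollar_quote(country_name)}"]
    if region_name is not None:
        conditions.append(f"name_1={dollar_quote(region_name)}")
    if subregion_name is not None:
        conditions.append(f"name_2={dollar_quote(subregion_name)}")
    return (
        f"SELECT gid_0, gid_1, gid_2, country, name_1, name_2 FROM {dataset} "
        "WHERE " + " AND ".join(conditions)
    )
```

For example, a country name such as "Côte d'Ivoire" is emitted as `$$Côte d'Ivoire$$` and needs no quote doubling.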
132 changes: 104 additions & 28 deletions tests_v2/unit/app/routes/thematic/geoencoder/test_geoencoder.py
@@ -1,28 +1,104 @@
-# import pytest
-# from httpx import AsyncClient
-#
-#
-# @pytest.mark.asyncio
-# async def test_geoencoder_no_version(async_client: AsyncClient) -> None:
-#     params = {"country": "Canada"}
-#
-#     resp = await async_client.get("/thematic/geoencode", params=params)
-#
-#     assert resp.status_code == 400
-#
-#
-# @pytest.mark.asyncio
-# async def test_geoencoder_fake_country_no_matches(async_client: AsyncClient) -> None:
-#
-#     params = {"admin_version": "4.1", "country": "Canadiastan"}
-#
-#     resp = await async_client.get("/thematic/geoencode", params=params)
-#
-#     assert resp.json() == {
-#         "status": "success",
-#         "data": {
-#             "adminVersion": "4.1",
-#             "matches": []
-#         }
-#     }
-#     assert resp.status_code == 200
+from typing import Optional, Any, Dict, List
+
+import pytest
+from httpx import AsyncClient
+
+from app.models.pydantic.geostore import GeostoreCommon
+from app.routes.thematic import geoencoder
+from app.routes.thematic.geoencoder import _admin_boundary_lookup_sql
+
+
+@pytest.mark.asyncio
+async def test__admin_boundary_lookup_sql() -> None:
+    sql = _admin_boundary_lookup_sql(
+        "some_dataset", "some_country", "some_region", "some_subregion"
+    )
+    assert sql == (
+        "SELECT gid_0, gid_1, gid_2, country, name_1, name_2 FROM some_dataset "
+        "WHERE country='some_country' "
+        "AND WHERE region='some_region' "
+        "AND WHERE subregion='some_subregion'"
+    )
+
+
+@pytest.mark.asyncio
+async def test_geoencoder_no_admin_version(async_client: AsyncClient) -> None:
+    params = {"country": "Canada"}
+
+    resp = await async_client.get("/thematic/geoencode", params=params)
+
+    assert resp.status_code == 422
+
+
+@pytest.mark.asyncio
+async def test_geoencoder_invalid_version_pattern(async_client: AsyncClient) -> None:
+    params = {"country": "Canada", "admin_version": "fails_regex"}
+
+    resp = await async_client.get("/thematic/geoencode", params=params)
+
+    assert resp.json().get("message", {}).startswith("Invalid version")
+    assert resp.status_code == 400
+
+
+@pytest.mark.asyncio
+async def test_geoencoder_nonexistant_version(async_client: AsyncClient) -> None:
+    params = {"country": "Canada", "admin_version": "v4.0"}
+
+    resp = await async_client.get("/thematic/geoencode", params=params)
+
+    assert resp.json().get("message", {}).startswith("Version not found")
+    assert resp.status_code == 400
+
+
+@pytest.mark.asyncio
+async def test_geoencoder_bad_boundary_source(async_client: AsyncClient) -> None:
+    params = {
+        "admin_source": "bobs_boundaries",
+        "admin_version": "4.1",
+        "country": "Canadiastan"
+    }
+
+    resp = await async_client.get("/thematic/geoencode", params=params)
+
+    assert resp.json().get("message", {}).startswith("Invalid admin boundary source")
+    assert resp.status_code == 400
+
+
+@pytest.mark.asyncio
+async def test_geoencoder_no_matches(
+    async_client: AsyncClient,
+    monkeypatch: pytest.MonkeyPatch
+) -> None:
+    admin_source = "gadm"
+    admin_version = "v4.1"
+
+    params = {
+        "admin_source": admin_source,
+        "admin_version": admin_version,
+        "country": "Canadiastan"
+    }
+
+    async def mock_version_is_valid(dataset: str, version: str): return None
+    monkeypatch.setattr(geoencoder, "version_is_valid", mock_version_is_valid)
+    monkeypatch.setattr(geoencoder, "_query_dataset_json", _query_dataset_json_mocked_results)
+
+    resp = await async_client.get("/thematic/geoencode", params=params)
+
+    assert resp.json() == {
+        "status": "success",
+        "data": {
+            "adminSource": admin_source,
+            "adminVersion": admin_version,
+            "matches": []
+        }
+    }
+    assert resp.status_code == 200
+
+
+async def _query_dataset_json_mocked_results(
+    dataset: str,
+    version: str,
+    sql: str,
+    geostore: Optional[GeostoreCommon],
+) -> List[Dict[str, Any]]:
+    return []