LIKE, IN, BETWEEN operators -- filter extension #178

Merged
merged 7 commits into from
Jan 5, 2024
Changes from 4 commits
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -9,6 +9,8 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.

### Added

- LIKE search operator to Filter extension [#178](https://github.com/stac-utils/stac-fastapi-elasticsearch/pull/178)

### Changed

- Elasticsearch drivers from 7.17.9 to 8.11.0 [#169](https://github.com/stac-utils/stac-fastapi-elasticsearch/pull/169)
7 changes: 6 additions & 1 deletion stac_fastapi/elasticsearch/stac_fastapi/elasticsearch/app.py
@@ -24,6 +24,11 @@
settings = ElasticsearchSettings()
session = Session.create_from_settings(settings)

filter_extension = FilterExtension(client=EsAsyncBaseFiltersClient())
filter_extension.conformance_classes.append(
    "http://www.opengis.net/spec/cql2/1.0/req/advanced-comparison-operators"
Collaborator

this shouldn't be advertised unless IN and BETWEEN are also implemented.

Collaborator Author

Okay, I have implemented IN and BETWEEN in the latest commit.

The conformance class endpoint above aligns with what STAC Browser expects. The Filter extension README has the conformance class endpoint as http://www.opengis.net/spec/cql2/1.0/conf/advanced-comparison-operators. Which should be used? I get a 404 response from both...

I followed this example which has the BETWEEN bounds being passed as an array.

{
  "filter-lang": "cql2-json",
  "filter": {
    "op": "between",
    "args": [
      { "property": "eo:cloud_cover" },
      [ 0, 50 ]
    ]
  }
}

This required adjusting the Clause filter class to allow lists, plus some type checks in IN and BETWEEN:

class Clause(BaseModel):
    """Filter extension clause."""

    op: Union[LogicalOp, ComparisonOp, AdvancedComparisonOp, SpatialIntersectsOp]
    args: List[Union[Arg, List[Arg]]]

elif self.op == AdvancedComparisonOp.between:
    if not isinstance(self.args[1], List):
        raise RuntimeError(f"Arg {self.args[1]} is not a list")
    return {
        "range": {
            to_es(self.args[0]): {
                "gte": to_es(self.args[1][0]),
                "lte": to_es(self.args[1][1]),
            }
        }
    }

elif self.op == AdvancedComparisonOp._in:
    if not isinstance(self.args[1], List):
        raise RuntimeError(f"Arg {self.args[1]} is not a list")
    return {
        "terms": {to_es(self.args[0]): [to_es(arg) for arg in self.args[1]]}
    }

Collaborator

> Okay, I have implemented IN and BETWEEN in the latest commit.

nice, will review.

> The conformance class endpoint above aligns with what STAC Browser expects. The Filter extension README has the conformance class endpoint as http://www.opengis.net/spec/cql2/1.0/conf/advanced-comparison-operators. Which should be used? I get a 404 response from both...

Ah, that seems to be a bug in STAC Browser: the conformance classes should be the URLs with conf in them, not req. They aren't resolvable as URLs; many of the OGC URL-like conformance classes aren't.
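Because these URIs work as identifiers rather than fetchable links, a client would normally compare them as strings against the `conformsTo` list instead of requesting them. A minimal sketch, assuming a hypothetical helper named `supports_advanced_comparison`; accepting the `req` spelling in lenient mode is an illustrative choice, not something the specification asks for:

```python
# Conformance class identifier with `conf`, as recommended in the comment above.
CONF_URI = "http://www.opengis.net/spec/cql2/1.0/conf/advanced-comparison-operators"
# The `req` spelling; tolerated only when the caller opts in (assumption).
REQ_URI = CONF_URI.replace("/conf/", "/req/")


def supports_advanced_comparison(conforms_to, lenient=False):
    """Return True if the advertised classes include advanced comparison operators.

    `conforms_to` is a list of conformance class strings. The comparison is
    purely textual; the URIs are never dereferenced.
    """
    accepted = {CONF_URI, REQ_URI} if lenient else {CONF_URI}
    return any(uri in accepted for uri in conforms_to)
```

With `lenient=False` only the `conf` form counts, which matches the guidance given in this thread.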

> I followed this example which has the BETWEEN bounds being passed as an array.

This has been changed in newer versions of CQL2 (or v1.0.0-beta.5 had bad examples?). The current version of Filter Extension is here, specifically BETWEEN: https://github.com/stac-api-extensions/filter/blob/main/README.md#example-11-using-between
I think you'll probably need to make some changes to work with this instead of the beta.5 version.
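For illustration, here is a rough sketch of how a BETWEEN clause could map to an Elasticsearch range query if the newer form passes the lower and upper bounds as two separate arguments after the property, rather than as a nested array. That argument shape is an assumption here; the linked README example is the authoritative reference. The function name `between_to_es` is hypothetical, and the property name is used as-is instead of going through the project's `to_es` mapping:

```python
def between_to_es(clause):
    """Translate a cql2-json `between` clause with flat bounds into an ES range query."""
    args = clause["args"]
    if len(args) != 3:
        raise ValueError(f"between expects 3 args, got {len(args)}")
    prop, lower, upper = args
    # BETWEEN is inclusive on both ends, hence gte/lte.
    return {"range": {prop["property"]: {"gte": lower, "lte": upper}}}


clause = {
    "op": "between",
    "args": [{"property": "eo:cloud_cover"}, 0, 50],
}
query = between_to_es(clause)
```

Compared with the array form, the only structural change is reading the bounds from `args[1]` and `args[2]` instead of `args[1][0]` and `args[1][1]`.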

Collaborator Author

Okay. Thank you, Phil.

Updated BETWEEN syntax and the conformance class endpoint in e7330dc

)

extensions = [
    TransactionExtension(client=TransactionsClient(session=session), settings=settings),
    BulkTransactionExtension(client=BulkTransactionsClient(session=session)),
@@ -32,7 +37,7 @@
    SortExtension(),
    TokenPaginationExtension(),
    ContextExtension(),
-    FilterExtension(client=EsAsyncBaseFiltersClient()),
+    filter_extension,
]

post_request_model = create_post_request_model(extensions)
@@ -4,6 +4,9 @@
Basic CQL2 (AND, OR, NOT), comparison operators (=, <>, <, <=, >, >=), and IS NULL.
The comparison operators are allowed against string, numeric, boolean, date, and datetime types.

Advanced CQL2 LIKE comparison operator (http://www.opengis.net/spec/cql2/1.0/req/advanced-comparison-operators).
The LIKE comparison operator is allowed against string types.

Basic Spatial Operators (http://www.opengis.net/spec/cql2/1.0/conf/basic-spatial-operators)
defines the intersects operator (S_INTERSECTS).
"""
@@ -78,6 +81,15 @@ def to_es(self):
)


class AdvancedComparisonOp(str, Enum):
    """Advanced Comparison operator.

    CQL2 advanced comparison operator like (~).
    """

    like = "like"


class SpatialIntersectsOp(str, Enum):
    """Spatial intersections operator s_intersects."""

@@ -152,7 +164,7 @@ def validate(cls, v):
class Clause(BaseModel):
    """Filter extension clause."""

-    op: Union[LogicalOp, ComparisonOp, SpatialIntersectsOp]
+    op: Union[LogicalOp, ComparisonOp, AdvancedComparisonOp, SpatialIntersectsOp]
    args: List[Arg]

    def to_es(self):
@@ -171,6 +183,16 @@ def to_es(self):
                    "must_not": [{"term": {to_es(self.args[0]): to_es(self.args[1])}}]
                }
            }
        elif self.op == AdvancedComparisonOp.like:
            return {
                "wildcard": {
                    to_es(self.args[0]): {
                        "value": cql2_like_to_es(str(to_es(self.args[1]))),
                        "boost": 1.0,
                        "case_insensitive": "true",
Collaborator

I think this should be case sensitive

Collaborator Author

Do you mean that case_insensitive should be set to false?

Collaborator

yes, since AFAIK LIKE is a case-sensitive operation (e.g., b% matches ba but not Ba).
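The case-sensitivity point can be checked with a small standalone sketch that interprets a LIKE pattern as a regular expression. This only illustrates the semantics described above and is not how the PR evaluates LIKE (that happens in Elasticsearch). The helper names `like_to_regex` and `like` are hypothetical, and escape characters are deliberately left out:

```python
import re


def like_to_regex(pattern):
    """Compile a LIKE pattern into a case-sensitive regex.

    '%' matches any run of characters, '_' matches exactly one character.
    Escape sequences are not handled in this simplified sketch.
    """
    parts = []
    for char in pattern:
        if char == "%":
            parts.append(".*")
        elif char == "_":
            parts.append(".")
        else:
            parts.append(re.escape(char))
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True if the whole value matches the LIKE pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


matches_lower = like("ba", "b%")  # lowercase 'b' prefix matches
matches_upper = like("Ba", "b%")  # uppercase 'B' does not
```

No `re.IGNORECASE` flag is passed, so the match is case sensitive, which corresponds to setting `case_insensitive` to false in the wildcard query.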

Collaborator Author

Got it. Updated in e7330dc

                    }
                }
            }
        elif (
            self.op == ComparisonOp.lt
            or self.op == ComparisonOp.lte
@@ -210,3 +232,22 @@ def to_es(arg: Arg):
        return arg
    else:
        raise RuntimeError(f"unknown arg {repr(arg)}")


def cql2_like_to_es(input_string):
    """Convert argument in CQL2 ('_' and '%') to Elasticsearch wildcard operators ('?' and '*', respectively). Handle escape characters and handle Elasticsearch wildcards directly."""
    es_string = ""
    escape = False

    for char in input_string:
        if char == "\\":
            escape = True
        elif char == "_" and not escape:
            es_string += "?"
        elif char == "%" and not escape:
            es_string += "*"
        else:
            es_string += char
            escape = False

    return es_string
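To make the conversion concrete, the sketch below restates `cql2_like_to_es` from this diff (with indentation reconstructed) and wraps it in a hypothetical `like_to_wildcard_query` helper that is not part of the PR. Passing `case_insensitive` as the boolean `False` is an assumption based on the review discussion, not a copy of the final commit:

```python
def cql2_like_to_es(input_string):
    """Map CQL2 LIKE wildcards ('_', '%') to Elasticsearch ones ('?', '*')."""
    es_string = ""
    escape = False

    for char in input_string:
        if char == "\\":
            # Remember the backslash so the next character is taken literally.
            escape = True
        elif char == "_" and not escape:
            es_string += "?"
        elif char == "%" and not escape:
            es_string += "*"
        else:
            es_string += char
            escape = False

    return es_string


def like_to_wildcard_query(field, pattern):
    """Build an Elasticsearch wildcard query body (hypothetical helper)."""
    return {
        "wildcard": {
            field: {
                "value": cql2_like_to_es(pattern),
                "case_insensitive": False,  # assumption: case-sensitive LIKE
            }
        }
    }


multi = cql2_like_to_es("LC82030282019133%")       # trailing '%' becomes '*'
single = cql2_like_to_es("LC82030282019133LGN0_")  # trailing '_' becomes '?'
escaped = cql2_like_to_es("LC08\\_L1TP_")          # escaped '_' stays literal
```

Existing `?` and `*` characters in the input pass through unchanged, so they still act as Elasticsearch wildcards; the `test_search_filter_extension_wildcard_es` test in this PR relies on that behavior.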
@@ -0,0 +1,9 @@
{
  "op": "like",
  "args": [
    {
      "property": "scene_id"
    },
    "LC82030282019133%"
  ]
}
10 changes: 10 additions & 0 deletions stac_fastapi/elasticsearch/tests/extensions/cql2/example20.json
@@ -0,0 +1,10 @@
{
  "op": "like",
  "args": [
    {
      "property": "scene_id"
    },
    "LC82030282019133LGN0_"
  ]
}

96 changes: 96 additions & 0 deletions stac_fastapi/elasticsearch/tests/extensions/test_filter.py
@@ -213,3 +213,99 @@ async def test_search_filter_extension_floats_post(app_client, ctx):

    assert resp.status_code == 200
    assert len(resp.json()["features"]) == 1


@pytest.mark.asyncio
async def test_search_filter_extension_wildcard_cql2(app_client, ctx):
    single_char = ctx.item["id"][:-1] + "_"
    multi_char = ctx.item["id"][:-3] + "%"

    params = {
        "filter": {
            "op": "and",
            "args": [
                {"op": "=", "args": [{"property": "id"}, ctx.item["id"]]},
                {
                    "op": "like",
                    "args": [
                        {"property": "id"},
                        single_char,
                    ],
                },
                {
                    "op": "like",
                    "args": [
                        {"property": "id"},
                        multi_char,
                    ],
                },
            ],
        }
    }

    resp = await app_client.post("/search", json=params)

    assert resp.status_code == 200
    assert len(resp.json()["features"]) == 1


@pytest.mark.asyncio
async def test_search_filter_extension_wildcard_es(app_client, ctx):
    single_char = ctx.item["id"][:-1] + "?"
    multi_char = ctx.item["id"][:-3] + "*"

    params = {
        "filter": {
            "op": "and",
            "args": [
                {"op": "=", "args": [{"property": "id"}, ctx.item["id"]]},
                {
                    "op": "like",
                    "args": [
                        {"property": "id"},
                        single_char,
                    ],
                },
                {
                    "op": "like",
                    "args": [
                        {"property": "id"},
                        multi_char,
                    ],
                },
            ],
        }
    }

    resp = await app_client.post("/search", json=params)

    assert resp.status_code == 200
    assert len(resp.json()["features"]) == 1


@pytest.mark.asyncio
async def test_search_filter_extension_escape_chars(app_client, ctx):
    esc_chars = (
        ctx.item["properties"]["landsat:product_id"].replace("_", "\\_")[:-1] + "_"
    )

    params = {
        "filter": {
            "op": "and",
            "args": [
                {"op": "=", "args": [{"property": "id"}, ctx.item["id"]]},
                {
                    "op": "like",
                    "args": [
                        {"property": "properties.landsat:product_id"},
                        esc_chars,
                    ],
                },
            ],
        }
    }

    resp = await app_client.post("/search", json=params)

    assert resp.status_code == 200
    assert len(resp.json()["features"]) == 1