-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Filter by datetime objects instead of date strings #32
base: main
Are you sure you want to change the base?
Filter by datetime objects instead of date strings #32
Conversation
It is recommended to use Mongo datetime objects instead of strings. The previous implementation of queries using date strings didn't work when the MongoDB uses datetime objects and not date strings.
Greetings currently the I've tackled this in the past but ended up using a solution I didn't really like cause it was a bit messy code, which involved updating
# ...
import dateutil.parser
# ...
@staticmethod
def apply_datetime_filter(search: MongoSearchAdapter, datetime_search):
"""Apply a filter to search based on datetime field.
Args:
search (Search): The search object to filter.
datetime_search (dict): The datetime filter criteria.
Returns:
Search: The filtered search object.
"""
datetime_filter = {}
if "eq" in datetime_search:
datetime_filter["$eq"] = dateutil.parser.parse(datetime_search["eq"])
else:
if "gte" in datetime_search and datetime_search["gte"]:
datetime_filter["$gte"] = dateutil.parser.parse(datetime_search["gte"])
if "lte" in datetime_search and datetime_search["lte"]:
datetime_filter["$lte"] = dateutil.parser.parse(datetime_search["lte"])
if datetime_filter:
search.add_filter({"properties.datetime": datetime_filter})
# ...
async def execute_search(
self,
search: MongoSearchAdapter,
limit: int,
token: Optional[str],
sort: Optional[Dict[str, Dict[str, str]]],
collection_ids: Optional[List[str]],
ignore_unavailable: bool = True,
) -> Tuple[Iterable[Dict[str, Any]], Optional[int], Optional[str]]:
"""Execute a search query with limit and other optional parameters.
Args:
search (Search): The search query to be executed.
limit (int): The maximum number of results to be returned.
token (Optional[str]): The token used to return the next set of results.
sort (Optional[Dict[str, Dict[str, str]]]): Specifies how the results should be sorted.
collection_ids (Optional[List[str]]): The collection ids to search.
ignore_unavailable (bool, optional): Whether to ignore unavailable collections. Defaults to True.
Returns:
Tuple[Iterable[Dict[str, Any]], Optional[int], Optional[str]]: A tuple containing:
- An iterable of search results, where each result is a dictionary with keys and values representing the
fields and values of each document.
- The total number of results (if the count could be computed), or None if the count could not be
computed.
- The token to be used to retrieve the next set of results, or None if there are no more results.
Raises:
NotFoundError: If the collections specified in `collection_ids` do not exist.
"""
db = self.client[DATABASE]
collection = db[ITEMS_INDEX]
query = {"$and": search.filters} if search and search.filters else {}
for sub_query in query.get('$and', []):
for field in DATETIME_FIELDS:
properties_field = f"properties.{field}"
if properties_field in sub_query:
field_data = sub_query.pop(properties_field)
sub_query[properties_field] = {
key: dateutil.parser.parse(value) if isinstance(value, str) else value
for key, value in field_data.items()
}
if collection_ids:
query["collection"] = {"$in": collection_ids}
sort_criteria = sort if sort else [("id", 1)] # Default sort
try:
if token:
last_id = decode_token(token)
skip_count = int(last_id)
else:
skip_count = 0
cursor = (
collection.find(query)
.sort(sort_criteria)
.skip(skip_count)
.limit(limit + 1)
)
items = await cursor.to_list(length=limit + 1)
next_token = None
if len(items) > limit:
next_skip = skip_count + limit
next_token = base64.urlsafe_b64encode(str(next_skip).encode()).decode()
items = items[:-1]
maybe_count = None
if not token:
maybe_count = await collection.count_documents(query)
return items, maybe_count, next_token
except PyMongoError as e:
print(f"Database operation failed: {e}")
raise
# ...
async def create_item(self, item: Item, refresh: bool = False):
"""
Asynchronously inserts a STAC item into MongoDB, ensuring the item does not already exist.
Args:
item (Item): The STAC item to be created.
refresh (bool, optional): Not used for MongoDB, kept for compatibility with Elasticsearch interface.
Raises:
ConflictError: If the item with the same ID already exists within the collection
NotFoundError: If the specified collection does not exist in MongoDB.
"""
db = self.client[DATABASE]
items_collection = db[ITEMS_INDEX]
collections_collection = db[COLLECTIONS_INDEX]
collection_exists = await collections_collection.count_documents(
{"id": item["collection"]}, limit=1
)
if not collection_exists:
raise NotFoundError(f"Collection {item['collection']} does not exist")
new_item = item.copy()
new_item["_id"] = item.get("_id", ObjectId())
for field in DATETIME_FIELDS:
if field in new_item["properties"]:
new_item["properties"][field] = dateutil.parser.parse(new_item["properties"][field])
existing_item = await items_collection.find_one({"_id": new_item["_id"]})
if existing_item:
raise ConflictError(f"Item with _id {item['_id']} already exists")
await items_collection.insert_one(new_item)
item = serialize_doc(item)
# ... |
Also datetime strings do work in mongo, the issue is that your query needs to use same format |
Thanks for the insights @pedro-cf .
Yes the method
Yes but datetime objects are recommended over datetime strings for MongoDB. Datetime objects are easier to use in Python, it avoids the need for converting strings to python datetime objects and vice-versa. To not break the current behaviour for people who use datetime strings in their MongoDB, I see two possibilities here:
|
The CI pipeline fails because the unit tests related to date/time failed. I guess it's because the test MongoDB is fed with datetime strings instead of datetime objects, but I didn't manage in the code where this happens. |
Actually I found out that it is easy to work with datetime strings in MongoDB, it is transparent, it just required adding the So maybe this PR is not needed after all. But we should at least document that the MongoDB is expected to have datetime strings and not datetime objects. |
@GorgiAstro Definitely sounds like a good idea to document this I think. |
Related Issue(s):
Description:
utilities.py/parse_datestring method now returns datetime and not str, otherwise time filtering does not work for MongoDB databases using datetime objects (which is recommended over using date strings)
PR Checklist:
pre-commit run --all-files
)make test
): no, for the same reason as in create indexes in MongoDB at startup only if env variable (default: true) #31