chore(breadbox): Celery checker #144

Open · wants to merge 5 commits into master

Conversation

@jessica-cheng (Contributor) commented Dec 5, 2024

  • Add a util function that checks whether Celery is running (see the sketch below)
  • In every function that calls a Celery task, call that util function first. The goal is to fail fast if Celery isn't running
  • Added a health check endpoint that checks whether Celery is running (TBD whether to keep it, since the health_check/ok endpoint already calls that utils function)

To be done: tests involving Celery are currently failing; the behavior needs to be modified.
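
For reference, a minimal sketch of what such a utility and health-check endpoint could look like (names such as the Celery app, CeleryConnectionError, and the route path are assumptions based on this conversation, not taken verbatim from the diff):

    from celery import Celery
    from fastapi import APIRouter

    app = Celery("breadbox")  # assumed app name
    router = APIRouter()


    class CeleryConnectionError(Exception):
        """Raised when no Celery worker responds."""


    def check_celery() -> None:
        # inspect().ping() broadcasts a ping to all workers and collects replies;
        # it returns None when no worker answers within the (default 1s) timeout.
        replies = app.control.inspect().ping()
        if not replies:
            raise CeleryConnectionError("Celery workers are not responding")


    @router.get("/health_check/celery")
    def celery_health():
        check_celery()  # raises, and surfaces an error, if Celery is not running
        return {"ok": True}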

@pgm (Contributor) left a comment


Looks fine, but I personally think the repeated try/catches are unnecessary and distracting:

try:
    check_celery()
except CeleryConnectionError as err:
    raise err

What's the motivation for those?

breadbox/breadbox/api/dataset_uploads.py (outdated)
@@ -67,7 +69,10 @@ def export_dataset(
user: str = Depends(get_user),
settings: Settings = Depends(get_settings),
):

try:
@pgm (Contributor)


I wonder if we could consolidate these calls.

Instead of calling utils.check_celery() before job submission, what do you think about putting it in get_task_status?

That's when the worker being offline is a problem: the status will be reported as pending forever.
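
For illustration, a rough sketch of that consolidation (the real get_task_status signature isn't shown in this PR, so the shape below is assumed):

    from celery.result import AsyncResult

    def get_task_status(task_id: str) -> dict:
        result = AsyncResult(task_id, app=app)
        if result.state == "PENDING":
            # A task can sit in PENDING forever if no worker is alive,
            # so only then do we pay for the worker check.
            check_celery()  # raises CeleryConnectionError when no worker responds
        return {"id": task_id, "state": result.state}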

@jessica-cheng (Author) commented Dec 6, 2024


Interesting! I figured the fail-fast approach would be better. I also considered creating a decorator or using FastAPI's dependency injection (Depends()), but that would still modify each endpoint regardless.

Ideally, I would put the check in Celery's delay() function to avoid repeating the call in each function that uses Celery, but patching that function felt too hacky…
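
For context, the Depends() variant being described might look roughly like this (the route path and endpoint name are hypothetical); as noted, every endpoint still has to opt in:

    from fastapi import Depends, HTTPException

    def require_celery() -> None:
        # FastAPI dependency that fails fast when no worker is available.
        try:
            check_celery()
        except CeleryConnectionError as err:
            raise HTTPException(status_code=503, detail=str(err)) from err


    @router.post("/dataset_uploads", dependencies=[Depends(require_celery)])
    def upload_dataset():
        ...  # endpoint body unchanged; the check runs before it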

Pings to see if any worker responds and returns true if successful response detected
"""
inspect = app.control.inspect()
try:
@pgm (Contributor)


I don't know that we need to explicitly connect to the broker before submitting a task. If the broker is offline, then we already get an exception thrown when it tries to submit the task to the broker.

That being said, I suppose it's harmless to try. Probably just costs a few milliseconds and I can't imagine it having any real downside.
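
For reference, the exception mentioned at submit time can also be caught around the .delay() call itself; a hedged sketch, with a hypothetical task name:

    from kombu.exceptions import OperationalError

    @app.task
    def run_custom_analysis(params: dict) -> dict:  # hypothetical task
        return params

    try:
        result = run_custom_analysis.delay({"dataset_id": "abc"})
    except OperationalError as exc:
        # Celery surfaces kombu's OperationalError when the broker is unreachable.
        raise CeleryConnectionError("Failed to connect to celery redis broker!") from exc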

@jessica-cheng (Author)


@pgm Usually when Celery isn't running (at least in development), it's because we've forgotten to start the Celery broker, so I figured returning an explicit error to the client could be helpful. This check does make 3 retries to connect to the broker, though, which takes a few seconds rather than a few milliseconds, so maybe I should decrease the number of retries.
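
For illustration, one way (assumed, not taken from the diff) to bound the broker check to a single retry so the fail-fast path stays well under a few seconds:

    def check_broker() -> None:
        # app.connection() returns a kombu Connection to the broker;
        # ensure_connection() raises if it cannot connect within max_retries.
        with app.connection() as conn:
            conn.ensure_connection(max_retries=1)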

"Failed to connect to celery redis broker!"
) from exc
# Pings workers to see if any of them respond. Returns None if no response
ping = inspect.ping()
@pgm (Contributor)


That's interesting... The worker is asynchronous but this call is clearly synchronous. There must be a timeout baked into this.

Regardless, if you've verified that this reliably identifies when the worker is offline, then it looks good to me.
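
For reference, the timeout is configurable when constructing the inspector; a minimal sketch (5 seconds is an arbitrary example value):

    # inspect() takes a timeout in seconds (default 1); ping() blocks until
    # replies arrive or the timeout elapses, so busy workers get more time to answer.
    replies = app.control.inspect(timeout=5.0).ping()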

@pgm (Contributor)


I looked up inspect.ping(), and from what I see in this SO post, I think any time there's a backlog of work, an exception will be thrown.

Imagine we have 1 worker and two users. User A submits a "custom analysis" job and, while that's running, user B submits a batch download. I think user B will get an exception saying "Celery workers are not responding" instead of A's job running and then B's job running.

Is that behavior intentional? Perhaps it's not bad, but I think at the very least we'd want a longer timeout than the default of 1 second.

Rejecting requests when there's a backlog has some nice properties ("load shedding"), but it also has the downside that, when the server is busy, users will have their jobs fail instead of just taking longer.

@pgm (Contributor)


I could be wrong, though. It's something I think would be good to explicitly try, to make sure we understand what's going to happen before deploying this change.

@jessica-cheng (Author) commented Dec 6, 2024


@pgm Yes, ping() has a default timeout of 1 second. HOWEVER, I just saw a concerning post suggesting that calling ping() may not return any response if all workers are busy running tasks. I don't think that's the behavior we want, since this check is meant to verify that workers exist. I'm thinking stats(), as the top Stack Overflow post suggested, would be better.

@jessica-cheng (Author)


@pgm Oops, I didn't see your subsequent comments; you've basically addressed everything I was pointing out, including the SO post 😅 You're right, throwing an error when workers are occupied wasn't my intention. I think using stats(), as the top SO reply suggests, might be what I want, since it seems to return None when no workers are present. I will test this first before making the change official.
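
A sketch of the stats()-based variant being considered (per the SO thread referenced above; its behavior under load still needs the verification mentioned):

    def check_celery() -> None:
        # stats() asks each worker for runtime statistics and returns a dict
        # keyed by worker hostname, or None when no workers reply at all.
        stats = app.control.inspect().stats()
        if not stats:
            raise CeleryConnectionError("No Celery workers are available")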

@jessica-cheng (Author) commented Dec 6, 2024

Looks fine, but I personally think the repeated try/catches are unnecessary and distracting:

try:
    check_celery()
except CeleryConnectionError as err:
    raise err

What's the motivation for those?

I can get rid of them! I think this was from before I decided to put the try/excepts in check_celery() instead. I have a habit of putting these try/excepts in the API layer before deciding otherwise.

@jessica-cheng changed the title from "Celery pinger" to "chore(breadbox): Celery checker" on Dec 10, 2024