Off by one hour issue with daily workflow trigger #116

Closed
hellais opened this issue Jan 23, 2025 · 2 comments
Labels: bug (Something isn't working correctly), funder/drl2022-2024, priority/high (Important issue that needs attention soon)

Comments

hellais (Member) commented Jan 23, 2025

While performing QA on the data generated by the new pipeline, I noticed that since the switch to Airflow the schedule has an off-by-one-hour issue.

The root cause is that the schedule is defined as @daily, so the run for a given day is triggered at midnight, as soon as that day ends.
Since the measurement uploader lags behind by about one hour (it only uploads measurements to S3 once the preceding hour has fully elapsed), the bucket may not yet be fully populated (or populated at all) for the final hour of the day when the run starts, so that last hour does not get processed.

As a temporary solution, we should move the daily schedule to 3 AM, as we used to do with Temporal.
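A minimal sketch of the timing race, using only the Python standard library. It assumes, based on this thread, that the uploader timer fires at 10 minutes past each hour, that a run takes about 7 minutes, and that each run uploads the hour that just elapsed. It also assumes the run for calendar day D fires at a fixed hour on D+1. The helper names are hypothetical and not part of the pipeline.

```python
from datetime import datetime, timedelta

# Assumptions from this thread: the uploader timer fires at :10 past each hour,
# a run takes about 7 minutes, and each run uploads the hour that just elapsed.
UPLOADER_OFFSET = timedelta(minutes=10)
UPLOADER_DURATION = timedelta(minutes=7)


def last_hour_ready_at(day: datetime) -> datetime:
    """Estimated time at which the 23:00-24:00 bucket of `day` is in S3."""
    end_of_day = day + timedelta(days=1)
    return end_of_day + UPLOADER_OFFSET + UPLOADER_DURATION


def daily_trigger_at(day: datetime, hour_offset: int) -> datetime:
    """Trigger time of the run for calendar day `day` (midnight-aligned).

    Assumes the run processing `day` fires at `hour_offset`:00 on the next day;
    hour_offset=0 corresponds to @daily, which fires as soon as the day ends.
    """
    return day + timedelta(days=1, hours=hour_offset)


day = datetime(2025, 1, 22)
for label, offset in [("@daily", 0), ("03:00", 3)]:
    trigger = daily_trigger_at(day, offset)
    complete = trigger >= last_hour_ready_at(day)
    print(f"{label}: trigger={trigger:%Y-%m-%d %H:%M} last hour complete={complete}")
```

Under these assumptions the @daily run starts about 17 minutes before the last hour lands in the bucket, while a 3 AM trigger leaves roughly 2 h 43 min of slack.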

The longer-term solution would be to declare an explicit dependency between the uploader and the observation generation workflows, as described in ooni/backend#926.
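The idea behind the dependency approach is to gate processing on completion rather than on a fixed delay. A minimal sketch, assuming the uploader can report which hourly buckets of a day it has finished uploading; the `uploaded_hours` input and the `day_is_complete` helper are hypothetical and not taken from ooni/backend#926.

```python
from datetime import datetime, timedelta


def hours_of_day(day: datetime) -> list[datetime]:
    """The 24 hourly buckets, 00:00 through 23:00, that make up `day`."""
    start = day.replace(hour=0, minute=0, second=0, microsecond=0)
    return [start + timedelta(hours=h) for h in range(24)]


def day_is_complete(day: datetime, uploaded_hours: set[datetime]) -> bool:
    """True only once every hour of `day` has been reported as uploaded.

    An orchestrator would check this (or wait on an equivalent upstream signal)
    before starting observation generation, instead of relying on a fixed delay.
    """
    return all(hour in uploaded_hours for hour in hours_of_day(day))


day = datetime(2025, 1, 22)
uploaded = set(hours_of_day(day)[:23])  # the 23:00 bucket is still missing
print(day_is_complete(day, uploaded))   # False
uploaded.add(day.replace(hour=23))
print(day_is_complete(day, uploaded))   # True
```

Unlike a fixed offset, this check stays correct even if an upload run occasionally takes much longer than usual.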

This issue should also be taken into account when switching observation generation to hourly (see #111).

hellais added the bug, funder/drl2022-2024 and priority/high labels Jan 23, 2025
hellais self-assigned this Jan 23, 2025
hellais added this to Roadmap Jan 23, 2025
hellais moved this to Sprint Backlog in Roadmap Jan 23, 2025
hellais (Member, Author) commented Jan 23, 2025

Note that the uploader timer fires at 10 minutes past each hour (see https://github.com/ooni/backend/blob/master/api/debian/ooni-api-uploader.timer#L7).

A recent measurement upload run took 7 minutes, so offsetting the schedule by 30 minutes should leave a comfortable margin for the uploader task to finish.
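The margin can be checked with a quick calculation. This sketch assumes the uploader starts at :10, a run takes 7 minutes (the recent run quoted above), and the downstream workflow is scheduled 30 minutes past the hour; the `safety_margin` helper is hypothetical.

```python
from datetime import timedelta

UPLOADER_START = timedelta(minutes=10)   # timer fires at :10 past the hour
UPLOADER_RUNTIME = timedelta(minutes=7)  # duration of a recent upload run
SCHEDULE_OFFSET = timedelta(minutes=30)  # proposed offset for the workflow


def safety_margin(start: timedelta, runtime: timedelta, offset: timedelta) -> timedelta:
    """Slack between the uploader finishing and the workflow starting."""
    return offset - (start + runtime)


margin = safety_margin(UPLOADER_START, UPLOADER_RUNTIME, SCHEDULE_OFFSET)
print(margin)  # 0:13:00
```

With these numbers an upload run could grow to 20 minutes before it would overlap with the workflow start.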

hellais added a commit that referenced this issue Jan 23, 2025
* Fixes off-by one in workflow trigger: #116
* Closes: #109
* Closes: #111
hellais (Member, Author) commented Jan 24, 2025

A fix for this has been deployed.

hellais closed this as completed Jan 24, 2025
github-project-automation bot moved this from Sprint Backlog to Done in Roadmap Jan 24, 2025
hellais moved this from Done to TMP in Roadmap Jan 31, 2025