Off-by-one-hour issue with daily workflow trigger #116
Labels: bug, funder/drl2022-2024, priority/high
While performing some QA on the data generated by the new pipeline, I realized that, since the switch to Airflow, there has been an off-by-one-hour issue with the schedule.
The root cause is that the schedule is defined as `@daily`, which means the run for a given day is triggered at midnight, right as that day ends. Since the measurement uploader lags behind by one hour (it only uploads to S3 once the preceding hour has elapsed), the bucket may not yet be fully populated (or may not be populated at all) when the run starts, so the last hour of the day does not get processed.
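To make the race concrete, here is a minimal Python sketch that models when each hour of a day becomes available in the bucket and checks which hours are still missing when the run fires. It assumes the uploader ships an hour's data once that hour has elapsed, plus a hypothetical `UPLOAD_DURATION` of 10 minutes; the date, the UTC timezone and the helper names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Assumption: an hour's measurements reach S3 only after that hour has fully
# elapsed, plus some (hypothetical) time needed to perform the upload itself.
UPLOAD_DURATION = timedelta(minutes=10)


def hour_available_at(hour_start: datetime) -> datetime:
    """Earliest time the measurements for [hour_start, hour_start + 1h) are in S3."""
    return hour_start + timedelta(hours=1) + UPLOAD_DURATION


def missing_hours(day: datetime, trigger_time: datetime) -> list[int]:
    """Hours of `day` whose uploads are not yet in the bucket at `trigger_time`."""
    return [
        h
        for h in range(24)
        if hour_available_at(day + timedelta(hours=h)) > trigger_time
    ]


day = datetime(2024, 1, 15, tzinfo=timezone.utc)

# With @daily, the run for `day` fires at midnight, exactly as the day ends.
daily_trigger = day + timedelta(days=1)
print(missing_hours(day, daily_trigger))  # [23]

# Firing at 03:00 the next day instead leaves a margin for the last upload.
print(missing_hours(day, day + timedelta(days=1, hours=3)))  # []
```

Under these assumptions only hour 23 is affected, which matches the symptom of the last hour of the day being skipped.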
As a temporary fix, we should move the daily run to 3 AM, as we used to do with Temporal.
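One thing to watch with a 3 AM schedule: assuming the DAG keeps Airflow 2's default interval-based timetable for cron strings, a `0 3 * * *` run covers the window from 03:00 to 03:00 rather than a calendar day, so the workflow needs to derive the day it processes from the start of that interval. The sketch below is plain Python, not Airflow code; `SCHEDULE`, `trigger_time_for`, `data_interval` and `day_to_process` are hypothetical names, and UTC is assumed.

```python
from datetime import date, datetime, timedelta, timezone

# Hypothetical replacement for @daily: one run per day at 03:00 (cron "0 3 * * *").
SCHEDULE = timedelta(hours=3)


def trigger_time_for(day: date) -> datetime:
    """When the run that should process `day` fires: 03:00 on the following day."""
    midnight = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    return midnight + timedelta(days=1) + SCHEDULE


def data_interval(trigger_time: datetime) -> tuple[datetime, datetime]:
    """Interval covered by a run firing at `trigger_time`.

    With interval-based cron scheduling, a run covers the span between the
    previous tick and the tick at which it fires, i.e. 03:00 to 03:00 here.
    """
    return trigger_time - timedelta(days=1), trigger_time


def day_to_process(trigger_time: datetime) -> date:
    """Calendar day to turn into observations: the date the interval starts on."""
    start, _ = data_interval(trigger_time)
    return start.date()


day = date(2024, 1, 15)
run = trigger_time_for(day)
print(run)                  # 2024-01-16 03:00:00+00:00
print(day_to_process(run))  # 2024-01-15
```

Taking the date of the interval start keeps the processed unit a full calendar day (00:00 to 24:00) while the run itself only fires three hours after that day has ended.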
A longer-term solution would be to declare an explicit dependency between the uploader and the observation generation workflows, as specified in ooni/backend#926.
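As a rough illustration of what such a dependency would guarantee, here is a minimal sketch of a readiness check: observation generation for a day only starts once all 24 hourly uploads are reported as complete. It assumes the uploader records its finished hours somewhere; the `completed_uploads` set and both helpers are hypothetical. In Airflow this might end up as a sensor or as data-aware scheduling, but the actual mechanism is the one described in ooni/backend#926.

```python
from datetime import datetime, timedelta, timezone


def expected_hours(day: datetime) -> list[datetime]:
    """The 24 hourly upload slots that make up `day`."""
    return [day + timedelta(hours=h) for h in range(24)]


def ready_to_process(day: datetime, completed_uploads: set[datetime]) -> bool:
    """True once the uploader has reported every hour of `day` as uploaded.

    `completed_uploads` is a hypothetical record of finished uploader runs,
    keyed by the start of the hour each one covers.
    """
    return all(hour in completed_uploads for hour in expected_hours(day))


day = datetime(2024, 1, 15, tzinfo=timezone.utc)
done = set(expected_hours(day)[:23])  # uploader still working on 23:00
print(ready_to_process(day, done))    # False
done.add(day + timedelta(hours=23))
print(ready_to_process(day, done))    # True
```

Unlike a fixed 3 AM offset, a check like this does not depend on guessing how long the uploader lags, which is why it also carries over to an hourly schedule.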
This issue should also be taken into account when switching observation generation to an hourly schedule (see #111).