You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
possibly offer a way for an automated "lateness" alert based on past runs time, for example alert anything that is >3 std from avg as a simple toggle
reason: sometimes the download might take much longer due to more data produced (could be 100x more in db migration/reconciliation cases), sometimes, the download might be stuck. In a different case (airflow separating scheduler from runner) the SLA check is done by the scheduler because the runner might be dead (zombie - dead but did not report failure, or running but disconnected from reporting success)
It would be nice to also collect any variables that were sent to the resource when data was requested, so we can report back which instance/request/date/segment of the task failed.
We need user-readable logs
and callback parameters
Pipeline/ task name
env name - In airflow you have a base url that tells you which instance is failing.
logs url
Schema changes
error message
success message:
Partial success:
Retry:
The text was updated successfully, but these errors were encountered: