Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix:fail early for any required script failure #198

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

say-paul
Copy link
Member

When greenboot encounters an error in required scripts, it breaks out the healtcheck loop and moves to red script execution. Logs are added to notify users of the skipped scripts due to critical failure.

fixes: #143

@say-paul say-paul force-pushed the required_failed_early branch from a5d67cc to 8500356 Compare January 28, 2025 13:12
@say-paul say-paul force-pushed the required_failed_early branch from 8500356 to 6b5427e Compare January 28, 2025 13:22
Copy link
Member

@runcom runcom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall, will wait for @djach7 ACK to clear up @say-paul 's comment
although, tests are failing somehow

@say-paul say-paul force-pushed the required_failed_early branch from c30ca82 to 1e6f7c2 Compare January 29, 2025 06:35
@LorbusChris
Copy link
Member

I just wanted to note that having all tests run and not breaking after the first failure is actually a feature, not a bug, as in case of multiple failures each one will be reported (otherwise following failures will only be uncovered when the first one is fixed and the device is booted again). Maybe that isn't needed, but OTOH this could be made optional with e.g. an --exit-immediately flag

if [[ "$required_hc_failed" == true && "$mode" == "strict" ]]; then
echo "<5>'$(basename "$script")' was skipped due to a previous failure"
elif is_disabled "$(basename "$script")"; then
echo "<5>'$(basename "$script")' was skipped, as specified in config"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we pls update warning /error msg on failure on line 48 & 52.
"relaxed")
echo "<2>$failure_msg. Continuing..." >&2 -> continuing since relaxed mode or something
and for
"strict")
required_hc_failed=true
echo "<0>$failure_msg. Continuing..." >&2 -> Exiting due to required script failure

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is internal to greenboot as when strict/relaxed is used , it logs for users in terms of required/wanted/green/red scripts adding relax/strict will add to the confusion.

usr/libexec/greenboot/greenboot Outdated Show resolved Hide resolved
@@ -82,10 +64,12 @@ case "$1" in
"check")
rc=0
for health_check_path in "${SCRIPTS_CHECK_PATHS[@]}"; do
script_runner "$health_check_path/required.d" "strict" "Running Required Health Check Scripts..." || rc=1
script_runner "$health_check_path/required.d" "strict" "Running Required Health Check Scripts..." || {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the logic part here but why do we need OR (||) operator here?
can we not just use something like
If return val of script_runner !=0 i.e failure then break .
(or is it the bash thing !?)

Copy link
Member Author

@say-paul say-paul Feb 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

both approach yields same result, I don't see what if..then .. offers more.

@runcom
Copy link
Member

runcom commented Jan 29, 2025

I just wanted to note that having all tests run and not breaking after the first failure is actually a feature, not a bug, as in case of multiple failures each one will be reported (otherwise following failures will only be uncovered when the first one is fixed and the device is booted again). Maybe that isn't needed, but OTOH this could be made optional with e.g. an --exit-immediately flag

right, this could be enhanced for sure to have something that specifies that we want to exit as soon as a failure happens. Although if even one of the healthcheck fails, we're going to rollback anyway and surely at the next upgrade something else may fail but that's the flow anyway right?

@say-paul say-paul force-pushed the required_failed_early branch from 1e6f7c2 to 283683e Compare February 3, 2025 11:37
@say-paul
Copy link
Member Author

say-paul commented Feb 3, 2025

I agree with @runcom that even if one or multiple scripts fail, greenboot will follow the same path of running red scripts.. then restart/rollback.
@LorbusChris what issue do you see if all the scripts are not checked?

When greenboot encounters an error in required scripts,it braks out of
the healthcheck loop and moves to redscript excution skipping the other
required and wanted scripts.Logs are added to notify users of the
skipped required scripts due to critical failure.Test modified to
validate only one error is reported.Also cleasning code base of redundant
codes.

Signed-off-by: saypaul <[email protected]>
@say-paul say-paul force-pushed the required_failed_early branch from 4e9e152 to b943b39 Compare February 7, 2025 05:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Greenboot does not fail when first error is found
5 participants