[sof-test] add set -e to all test cases (Unless You Love Debugging) #312
This should be done very gradually, on a test-by-test basis, starting of course with all new test cases like #296.
Now that we have commit 1aef6b9 ("print a call trace on every failure") any premature exit is easy to locate even if it doesn't print any error. Incremental progress towards issue thesofproject#312 "add set -e to all tests". Provides some decent coverage of case-lib/*.sh routines while carefully avoiding any big disruption. Will also help detect failures to start the logger when combined with some other, incoming work. Signed-off-by: Marc Herbert <[email protected]>
Fixes: thesofproject#312 Signed-off-by: RuiqingX <[email protected]>
I think there is still a fair number of tests without `set -e`.
@marc-hb @aiChaoSONG @keqiaozhang Is this feature done?
No, but I'm going to reduce the priority because more than half the tests have `set -e`.
@marc-hb I found some cases where it is hard to use `set -e`.
Sorry for the delay, too much email. For that case, use something=$( ... | grep something || true). Please provide more details/sample code if this does not seem possible.
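A sketch of that workaround in context. The file and patterns here are hypothetical, created only for the demo; the point is that `|| true` keeps a fruitless `grep` inside a command substitution from aborting the script.

```shell
#!/bin/bash
set -e

# Hypothetical sample file, created just for this demo.
log=$(mktemp)
printf 'line one\nline two\n' > "$log"

# Without "|| true", a grep that finds nothing would abort the script.
misses=$(grep 'no such pattern' "$log" || true)
hits=$(grep 'line one' "$log" || true)

echo "misses='$misses' hits='$hits'"   # -> misses='' hits='line one'
rm -f "$log"
```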
I just added a very long description with rationale, guidelines, workarounds and a good reference.
Notably remove top-level "expr" because "expr 0" is surprisingly a failure and expr is antiquated anyway (SC2003). Progress towards #312. Also add a dlogc() before killing so no one mistakes the deaths of aplay and arecord for failures. Add shellcheck source=case-lib/lib.sh, which removes 5 warnings. Signed-off-by: Marc Herbert <[email protected]>
https://sof-ci.01.org/softestpr/PR663/build673/devicetest/ is an interesting list of tests that don't have `set -e`.
For reasons beyond the scope of this commit, our Jenkins CI does not know when a test script has finished running. Instead of checking the exit status like a normal test runner, it waits for a "FAIL" or "PASS" or "N/A" string to be printed. When the test aborts without printing one of these, Jenkins falls back on a fairly long timeout, per test. This unusual approach has been causing issues with `set -e` (thesofproject#312). In some relatively rare but possible failures the DSP or the drivers do not load at all, examples: https://sof-ci.01.org/sofpr/PR4766/build10353/devicetest/ https://sof-ci.01.org/sofpr/PR4768/build10348/devicetest/ https://sof-ci.01.org/sofpr/PR4782/build10387/devicetest/ In that case the $logFile is not found and many tests abort without printing FAIL. This commit prints FAIL when $logfile is missing and avoids the many per-test Jenkins timeouts. Signed-off-by: Marc Herbert <[email protected]>
To help with thesofproject#312, commit 2e24024 ("test-case: add set -e to 7 function test") blindly added "set -e" without reviewing the code of any test and without making sure it's actually compatible with set -e. In this test this meant missing the rtcwake exit code. Signed-off-by: Marc Herbert <[email protected]>
Reviewing the tests here: https://github.com/thesofproject/sof-test/tree/main/test-case

The remaining tests without `set -e`:
Also, the location of `set -e` matters.
Partial fix for thesofproject#312 Signed-off-by: Marc Herbert <[email protected]>
This was its intended use--did I even test this??
Add set -e to all of our test cases for better error handling.
EDIT @marc-hb: shellcheck can catch a good number of set -e incompatibilities: do not try to add `set -e` to a test before it's shellcheck-clean (#729).

@marc-hb's very long description added here:
Rationale
The family of Unix shell language(s) is one of the few that don't support exceptions and silently ignore errors by default. Both C and the shell were designed at a time when RAM was measured in kilobytes. This is a huge drawback for test code, which is of course meant to find and report errors. There is unfortunately no more convenient language than the shell to deal with processes, files and pipes. It is also very productive to use the exact same language as for interactive use.
There is fortunately one underrated and standard feature of the shell that mitigates this error handling design issue: `set -e`, which is a shortcut for `set -o errexit`. errexit stops a script immediately when a command fails. The aborted script exits (as usual) with the exit status of the last command that was run.

errexit is far from perfect; for instance it does not catch all errors. However it catches many errors, so it should be used in all new scripts. It should be added to existing scripts, but only after extensive testing, because using `set -e` requires some minor coding changes described below.

Why stop immediately when something fails?
https://en.wikipedia.org/wiki/Fail-fast
This is simply what users expect. When a test fails, people don't look at every single line of a test log. They go straight to the end of the log, because that is where they expect the relevant and useful error message(s) to be. Some test logs can be very long and it would be ridiculous to expect users to scan every line for error message(s).
Unexpected errors often cause a cascade of more unexpected errors. After some unexpected error, all bets are off; no one has any idea what the following code will do. It will typically add more and more errors at random points until some error finally gets caught manually. This means:
a) The last (and only) error message users look at is not relevant and very confusing; see point 1 above.
b) The behaviour of the rest of the code is now undefined and could lead to much more serious issues. Here's a simplified example.
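The simplified example mentioned above appears to have been lost from this page. A safe sketch of such a cascade could look like the following: errexit is deliberately OFF, the path is hypothetical, and a harmless `echo` stands in for a destructive cleanup.

```shell
#!/bin/bash
set +e   # errexit deliberately OFF, to show the cascade

workdir=/nonexistent/path
cd "$workdir" 2>/dev/null   # fails, but execution continues anyway...
where=$(pwd)                # ...so we are still in the OLD directory
# Imagine a real "rm -rf *" here instead of this harmless echo:
echo "about to 'clean up' in: $where"
```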
Fun fact, unexpected errors in userspace C code are more likely to "fail faster" than in shell scripts because memory corruption often gets caught by the operating system.
An unexpected test failure is a stressful situation because it's an unexpected delay. Spending even more time to understand test logs and debug test code, instead of debugging the actual product code, adds a lot more tension and delays. As opposed to some product code, test code should never try to recover from unexpected failures and perform some work anyway, because this either hides bugs or generates complicated, time-consuming cascades of failures that are even harder to debug.
Why not just check the exit code of every command like in C?
Because it makes the code extremely verbose.
Because life is always too short, everyone who promised to check every exit code of every command never delivered on that promise (this failed promise tends to happen in C too). Instead they only check the exit codes of the commands they "expect" to fail. But the line between expected and unexpected failures is very subjective; it depends on the developer.
Should I still manually check the exit code of some commands (which ones) even when using set -e?
You should simply check commands that return an error but don't provide a good error message; no need to do anything for commands that print good error messages. For instance some commands will just print "file not found" without printing the filename. In that case you must manually check the exit code, exactly like when not using errexit.
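The original example and its output were lost from this page. A minimal reconstruction, with a hypothetical filename and failure message, might look like:

```shell
#!/bin/bash
set -e

# Hypothetical file, for illustration only.
f=/no/such/file.txt
status=ok

# Suppose the command's own error message does not name the file;
# check the exit code manually and add the missing context:
if ! grep -q 'pattern' "$f" 2>/dev/null; then
    status="FAIL: could not search $f"
fi
echo "$status"   # -> FAIL: could not search /no/such/file.txt
```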
Why do I need "some_command || true" and other tricks?
There are some commands that return a non-zero code when it's not an error, or not always an error. This is why adding `set -e` to existing scripts requires a fair amount of testing, almost as much testing as when they were written the first time.

grep is a typical example: a plain grep does NOT work when it finds no match, the script stops silently. Appending `|| true` is OK; it works because `set -e` is obviously disabled when checking the exit code manually. An explicit `if grep ...` test is also OK.
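A sketch of the broken pattern and both fixes, using a hypothetical file created just for the demo:

```shell
#!/bin/bash
set -e
log=$(mktemp)
printf 'hello\n' > "$log"

# Does NOT work: grep finds nothing, returns 1, the script stops silently:
#   grep 'absent' "$log"

# OK: neutralize the harmless non-zero exit code.
grep 'absent' "$log" || true

# Also OK: set -e is disabled inside a condition.
if grep -q 'absent' "$log"; then
    found=yes
else
    found=no
fi
echo "found=$found"   # -> found=no
rm -f "$log"
```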
The return value of `expr` can sometimes be non-zero for surprising reasons. Simple fix: don't use `expr` at all, it's antiquated. Use `$(( ))` instead. See https://github.com/koalaman/shellcheck/wiki/SC2003

How do I use $?
Most of the time you shouldn't use `$?`; see why: https://github.com/koalaman/shellcheck/wiki/SC2181
But sometimes you really need `$?`, for instance because you want to treat different error codes differently. Solution:

Avoid cmd1 && cmd2
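Referring back to the `$?` question above: a common pattern for capturing the exact exit code without tripping `set -e` is the `||` form. This is a sketch; `mycmd` is a hypothetical stand-in for any command whose exit code matters.

```shell
#!/bin/bash
set -e

# "mycmd" is a stand-in for any command whose exact exit code matters.
mycmd() { return 3; }

ret=0
mycmd || ret=$?   # the failure is caught by ||, so set -e stays quiet

case "$ret" in
    0) echo "success" ;;
    3) echo "known, recoverable error 3" ;;
    *) echo "unexpected error $ret"; exit "$ret" ;;
esac
```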
https://www.shellcheck.net/wiki/SC2015
`set -e` is compatible with `&&` and it is possible to use both in the same script; however the interaction between `&&` and `set -e` can be quite tricky. Prefer `||` and `if then else` over `&&` when possible.

Especially avoid negations like `test ! good_thing && handle_bad`; always replace this with the simpler, "assert-style" `test good_thing || handle_bad`.

Examples:
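The original examples were lost from this page; a hypothetical assert-style sketch following the advice above:

```shell
#!/bin/bash
set -e

handled=no
handle_bad() { handled=yes; }

# Avoid:  test ! -e /nonexistent && handle_bad
# Prefer the assert-style "||":
test -e /nonexistent || handle_bad

# Or an explicit conditional:
if ! test -e /nonexistent; then
    echo "/nonexistent is missing, as expected"
fi
echo "handled=$handled"   # -> handled=yes
```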
To remember why set -e has no effect when cmd1 fails, just remember that `cmd1 && cmd2` is similar to `if cmd1; then cmd2; fi`, and that `set -e` obviously doesn't affect tests and conditions, otherwise conditionals would not be possible.

Demo:
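A short sketch of this condition-like behaviour: the failure on the left of `&&` is ignored by errexit, while a plain failing command aborts the (sub)shell.

```shell
#!/bin/bash
set -e

false && echo "not printed"   # no abort: the failure is part of a condition
echo "still alive"

# A plain failing command, by contrast, aborts the (sub)shell:
set +e
( set -e; false; echo "never reached" )
rc=$?
set -e
echo "the plain 'false' aborted the subshell with status $rc"
```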
You especially want to avoid `&&` on the last line of the script, for a subtle reason explained at http://redsymbol.net/articles/unofficial-bash-strict-mode/#short-circuiting

Must separate local/export and $( )
https://github.com/koalaman/shellcheck/wiki/SC2155
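The SC2155 pattern, sketched with a hypothetical helper: `local var=$(cmd)` masks the command's exit status with `local`'s own (always 0) status, so declare and assign on separate lines.

```shell
#!/bin/bash
set -e

# Hypothetical helper whose failure must not be masked.
might_fail() { echo "value"; }

demo() {
    # BAD:  local out=$(might_fail)   # masks might_fail's exit status
    # OK: declare first, then assign, so set -e sees the real status.
    local out
    out=$(might_fail)
    echo "$out"
}

result=$(demo)
echo "result=$result"   # -> result=value
```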
More
http://redsymbol.net/articles/unofficial-bash-strict-mode/#issues-and-solutions
Warning: this page documents an even stricter way to use bash, so some "solutions" are unrelated to `set -e`.