
[sof-test] add set -e to all test cases (Unless You Love Debugging) #312

Open
aiChaoSONG opened this issue Aug 4, 2020 · 9 comments · Fixed by #400
Labels
False Pass / green failure, good first issue (Good for newcomers), P2 (Critical bugs or normal features), type:bug (Something doesn't work as expected)

Comments

@aiChaoSONG

aiChaoSONG commented Aug 4, 2020

Add set -e to all of our test cases for better error handling.

EDIT @marc-hb : shellcheck can catch a good number of set -e incompatibilities: do not try to add set -e to a test before it's shellcheck-clean (#729)

@marc-hb 's very long description added here:

Rationale

The Unix shell is one of the few language families that don't support exceptions and silently ignore errors by default. Both C and the shell were designed at a time when RAM was measured in kilobytes. This is a huge drawback for test code, which is of course meant to find and report errors. Unfortunately, there is no more convenient language than the shell to deal with processes, files and pipes. It is also very productive to use the exact same language as for interactive use.

There is fortunately one underrated and standard shell feature that mitigates this error-handling design issue: set -e, which is a shortcut for set -o errexit. errexit stops a script immediately when a command fails. The aborted script exits (as usual) with the exit status of the last command that ran.

errexit is far from perfect; for instance, it does not catch all errors. However it catches many, so it should be used in all new scripts. It should also be added to existing scripts, but only after extensive testing, because using set -e requires some minor coding changes described below.
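A minimal, self-contained demonstration of this behavior (the snippet is run in a subshell so its output and exit status can be observed; the commands are illustrative placeholders):

```shell
#!/bin/sh
# Run a failing snippet under errexit and observe that execution stops
# at the first failing command, and that this command's exit status
# becomes the script's exit status.
out=$(sh -ec 'echo before; false; echo never printed') && status=0 || status=$?
echo "output: $out"     # only "before": the script stopped at "false"
echo "status: $status"  # 1, the exit status of "false"
```

Without `set -e` (i.e. `sh -c` instead of `sh -ec`), the same snippet would print both lines and exit 0, silently losing the failure.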

Why stop immediately when something fails?

https://en.wikipedia.org/wiki/Fail-fast

  1. This is simply what users expect. When a test fails, people don't look at every single line of the test log. They go straight to the end of the log, because that is where they expect the relevant and useful error message(s) to be. Some test logs can be very long and it would be ridiculous to expect users to scan every line for error messages.

  2. Unexpected errors often cause a cascade of more unexpected errors. After some unexpected error all bets are off; no one has any idea what the following code will do. It will typically add more and more errors at random points until some error finally gets caught manually. This means:
    a) The last error message, the only one users look at, is not relevant and very confusing, see 1. above.
    b) The behaviour of the rest of the code is now undefined and could lead to much more serious issues. Here's a simplified example.

some_directory=$(failing_command)
# without set -e, $some_directory is empty and the entire $basedir gets deleted!
rm -rf "$basedir/$some_directory"

Fun fact, unexpected errors in userspace C code are more likely to "fail faster" than in shell scripts because memory corruption often gets caught by the operating system.

An unexpected test failure is a stressful situation because it's an unexpected delay. Spending even more time understanding test logs and debugging test code instead of debugging the actual product code adds a lot more tension and delay. Unlike some product code, test code should never try to recover from unexpected failures and perform some work anyway, because this either hides bugs and/or generates complicated, time-consuming cascades of failures that are even harder to debug.

Why not just check the exit code of every command like in C?

Because it makes the code extremely verbose.

Because life is always too short, everyone who promised to check every exit code of every command never delivered on that promise (this failed promise tends to happen in C too). Instead they only check the exit codes of the commands they "expect" to fail. But the line between expected and unexpected failures is very subjective; it depends on the developer.

Should I still manually check the exit code of some commands (which ones) even when using set -e?

You should only manually check commands that return an error but don't print a good error message; there is no need to do anything for commands that print good error messages. For instance, some commands will just print "file not found" without printing the filename. In that case you must manually check the exit code, exactly like when not using errexit. Example:

cmd1_with_bad_error_messages "$somefile" ||
    die 'Failed to run cmd1 on file %s\n' "$somefile"

This will print:

File not found               # from cmd1
Failed to run cmd1 on file some_file  # from you

Why do I need "some_command || true" and other tricks?

There are some commands that return a non-zero code when it's not an error, or not always an error. This is why adding set -e to existing scripts requires a fair amount of testing, almost as much as when they were first written.
grep is a typical example. This does NOT work; the script stops silently:

set -e

found_maybe=$( some_command | grep something) # aborts when nothing is found
if [ -n "$found_maybe" ]; then ...  # does not run when nothing is found

OK:

set -e

found_maybe=$( some_command | grep something || true)
if [ -n "$found_maybe" ]; then ...

|| true works because set -e is obviously disabled when checking the exit code manually.
Also OK:

set -e

if [ -n "$( some_command | grep something)" ] ; then ...

The return value of expr can sometimes be non-zero for surprising reasons. Simple fix: don't use expr at all, it's antiquated. Use $(( )) instead. See https://github.com/koalaman/shellcheck/wiki/SC2003
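A self-contained sketch of the SC2003 pitfall and its fix (the numbers are arbitrary):

```shell
#!/bin/sh
# expr exits with status 1 when its result is 0, so "n=$(expr 5 - 5)"
# would abort an errexit script even though nothing went wrong.
# Observe the surprising status without errexit first:
st=0
expr 5 - 5 >/dev/null || st=$?
echo "expr status: $st"   # 1, although the computation succeeded
# Arithmetic expansion has no such surprise:
set -e
n=$((5 - 5))
echo "n=$n"               # n=0, and the script keeps running
```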

How do I use $?

Most of the time you shouldn't use $?, see why https://github.com/koalaman/shellcheck/wiki/SC2181

But sometimes you really need $?, for instance because you want to treat different error codes differently. Solution:

ret=0
some_command || ret=$?
case $ret in ...
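A complete, runnable instance of this pattern, using grep's documented exit statuses (0 = match found, 1 = no match, >1 = error) as the example command:

```shell
#!/bin/sh
set -e
# Capture the exit status without tripping errexit, then branch on it.
ret=0
grep -q pattern /nonexistent-file 2>/dev/null || ret=$?
case $ret in
    0) echo "found" ;;
    1) echo "not found" ;;
    *) echo "grep failed with status $ret" ;;
esac
```

Here the file does not exist, so grep reports an error status greater than 1 and the last branch runs; without the `|| ret=$?` capture, errexit would have aborted the script before reaching the case statement.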

Avoid cmd1 && cmd2

https://www.shellcheck.net/wiki/SC2015

set -e is compatible with && and it is possible to use both in the same script, however the interaction between && and set -e can be quite tricky. Prefer || and if then else over && when possible.

Especially avoid negations like ! good_thing && handle_bad; always replace them with the simpler, "assert-style" good_thing || handle_bad.

Examples:

# set -e aborts the script as expected after cmd2_fails
cmd1_OK && cmd2_fails

# ... but set -e has no effect when the first command fails,
# the script keeps running after this!
cmd1_fails && cmd2_does_not_run

To remember why set -e has no effect when cmd1 fails, just remember that cmd1 && cmd2 is similar to if cmd1; then cmd2; fi, and that set -e obviously doesn't affect tests and conditions, otherwise conditionals would be impossible.

Demo:

set -e
false && true # does not abort
true && false # aborts

You especially want to avoid && on the last line of the script for a subtle reason explained at http://redsymbol.net/articles/unofficial-bash-strict-mode/#short-circuiting
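A sketch of that last-line subtlety: the two subshells below run the exact same failing && line; only what comes after it differs. Neither aborts (set -e ignores the left side of &&), but the script that ends on the failed list exits non-zero, which a test runner reports as a failure:

```shell
#!/bin/sh
# "false && echo never" does not trigger errexit in either script,
# but it does set the script's exit status when it is the LAST line.
sh -c 'set -e; false && echo never; true'; echo "not last: $?"
sh -c 'set -e; false && echo never';       echo "as last:  $?"
```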

Must separate local/export and $( )

# set -e does NOT catch this failure!
local foo=$(failed_command) 
# FIXED: set -e does catch this failure
local foo; foo=$(failed_command)

https://github.com/koalaman/shellcheck/wiki/SC2155
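A runnable demonstration of why (bash is assumed here, since local is not strict POSIX sh): the exit status of `local foo=$(cmd)` is the status of the local builtin itself (0), which masks cmd's failure even under errexit; declaring and assigning separately lets errexit catch it.

```shell
#!/bin/bash
# SC2155 demo: combined declaration+assignment masks the failure.
out1=$(bash -ec 'f() { local x=$(false); echo "kept going"; }; f')
echo "combined:  '$out1'"                     # f kept going, failure lost
st=0
out2=$(bash -ec 'f() { local x; x=$(false); echo "kept going"; }; f') || st=$?
echo "separated: status $st, output '$out2'"  # aborted before the echo
```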

More

http://redsymbol.net/articles/unofficial-bash-strict-mode/#issues-and-solutions

Warning: this page documents an even stricter way to use bash, so some "solutions" are unrelated to set -e.

@marc-hb
Collaborator

marc-hb commented Aug 4, 2020

This should be done very gradually on a test by test basis, starting of course with all new test cases like #296

marc-hb added a commit to marc-hb/sof-test that referenced this issue Aug 12, 2020
Now that we have commit 1aef6b9 ("print a call trace on every
failure") any premature exit is easy to locate even if it doesn't print any
error.

Incremental progress towards issue thesofproject#312 "add set -e to all
tests". Provides some decent coverage of case-lib/*.sh routines while
carefully avoiding any big disruption.

Will also help detect failures to start the logger when combined with
some other, incoming work.

Signed-off-by: Marc Herbert <[email protected]>
marc-hb added a commit to marc-hb/sof-test that referenced this issue Aug 13, 2020
aiChaoSONG pushed a commit that referenced this issue Aug 13, 2020
@mengdonglin mengdonglin added type:enhancement (New framework feature or request) and P1 (Blocker bugs or important features) labels Sep 14, 2020
RuiqingX added a commit to RuiqingX/sof-test that referenced this issue Sep 21, 2020
RuiqingX added a commit to RuiqingX/sof-test that referenced this issue Sep 21, 2020
RuiqingX added a commit to RuiqingX/sof-test that referenced this issue Sep 23, 2020
xiulipan pushed a commit that referenced this issue Sep 24, 2020
@marc-hb
Collaborator

marc-hb commented Sep 24, 2020

I think there is still a fair number of tests without set -e.

@marc-hb marc-hb reopened this Sep 24, 2020
@mengdonglin
Contributor

@marc-hb @aiChaoSONG @keqiaozhang Is this feature done?

@marc-hb
Collaborator

marc-hb commented Oct 19, 2020

No but I'm going to reduce the priority because more than half the tests have set -e now.

@marc-hb marc-hb added good first issue (Good for newcomers), negative testing, P2 (Critical bugs or normal features) and removed P1 (Blocker bugs or important features) labels Oct 19, 2020
@libinyang
Contributor

@marc-hb I found some cases that are hard to convert to set -e, such as cases using grep. Do you have any good suggestions? My current user-space cases hit this issue.

@marc-hb
Collaborator

marc-hb commented Oct 24, 2020

Sorry for the delay, too much email.

For grep it's usually as easy as:

something=$( ... | grep something || true)

Please provide more details/sample code if this does not seem possible.

@marc-hb marc-hb changed the title [sof-test] add set -e to all test cases [sof-test] add set -e to all test cases (Unless You Love Debugging) Oct 26, 2020
@marc-hb
Collaborator

marc-hb commented Oct 26, 2020

Do you have some good suggestion?

I just added a very long description with rationale, guidelines, workarounds and a good reference.

aiChaoSONG pushed a commit that referenced this issue Nov 17, 2020
Notably remove top-level "expr", because "expr 0" is surprisingly a
failure and expr is antiquated anyway (SC2003)

Progress towards #312.

Also add a dlogc() before killing so no one mistakes the deaths of aplay
and arecord for failures.

Add shellcheck source=case-lib/lib.sh, removes 5 warnings.

Signed-off-by: Marc Herbert <[email protected]>
@marc-hb
Collaborator

marc-hb commented Apr 23, 2021

https://sof-ci.01.org/softestpr/PR663/build673/devicetest/ is an interesting list of tests that don't have set -e. Almost all tests should have failed on BYT but some did not because they don't have set -e.

@marc-hb marc-hb added type:bug (Something doesn't work as expected) and removed type:enhancement (New framework feature or request) labels Jun 10, 2021
marc-hb added a commit to marc-hb/sof-test that referenced this issue Sep 17, 2021
For reasons beyond the scope of this commit, our Jenkins CI does not
know when a test script has finished running. Instead of checking the
exit status like a normal test runner, it waits for a "FAIL", "PASS" or
"N/A" string to be printed. When the test aborts without printing one of
these, Jenkins falls back on a fairly long timeout, per test.

This unusual approach has been causing issues with `set -e` (thesofproject#312)

In some relatively rare but possible failures the DSP or the drivers do
not load at all, examples:

https://sof-ci.01.org/sofpr/PR4766/build10353/devicetest/
https://sof-ci.01.org/sofpr/PR4768/build10348/devicetest/
https://sof-ci.01.org/sofpr/PR4782/build10387/devicetest/

In that case the $logFile is not found and many tests abort without
printing FAIL.

This commit prints FAIL when $logfile is missing and avoids the many
per-test Jenkins timeouts.

Signed-off-by: Marc Herbert <[email protected]>
marc-hb added a commit that referenced this issue Sep 21, 2021
marc-hb added a commit to marc-hb/sof-test that referenced this issue May 2, 2022
To help with thesofproject#312, commit 2e24024 ("test-case: add set -e to 7
function test") blindly added "set -e" without reviewing the code of any
test and without making sure it's actually compatible with set -e. In
this test this meant missing the rtcwake exit code.

Signed-off-by: Marc Herbert <[email protected]>
marc-hb added a commit that referenced this issue May 4, 2022
@greg-intel
Contributor

Reviewing the tests here: https://github.com/thesofproject/sof-test/tree/main/test-case

The remaining tests without set -e:

  • check-alsabat.sh
  • check-runtime-pm-double-active.sh
  • check-runtime-pm-status.sh
  • check-xrun-injection.sh
  • simultaneous-playback-capture.sh
  • test-speaker.sh
  • verify-ucm-config.sh
  • volume-basic-test.sh

Also, the location of set -e bounces between different spots in the files: sometimes before the comment block, sometimes after.

marc-hb added a commit to marc-hb/sof-test that referenced this issue Jun 27, 2023
Partial fix for thesofproject#312

Signed-off-by: Marc Herbert <[email protected]>
marc-hb added a commit to marc-hb/sof-test that referenced this issue Jun 29, 2023
marc-hb added a commit that referenced this issue Jul 4, 2023
marc-hb referenced this issue in caldwell/build-emacs Feb 10, 2024
This was its intended use--did I even test this??