
[sof-test] add set -e to all test cases (Unless You Love Debugging) #312

Open
aiChaoSONG opened this issue Aug 4, 2020 · 9 comments · Fixed by #400
Labels
False Pass / green failure, good first issue (Good for newcomers), P2 (Critical bugs or normal features), type:bug (Something doesn't work as expected)

Comments

@aiChaoSONG

aiChaoSONG commented Aug 4, 2020

Add set -e to all of our test cases for better error handling.

EDIT @marc-hb : shellcheck can catch a good number of set -e incompatibilities: do not try to add set -e to a test before it's shellcheck-clean (#729)

@marc-hb 's very long description added here:

Rationale

The Unix shell is one of the few language families that don't support exceptions and silently ignore errors by default. Both C and the shell were designed at a time when RAM was measured in kilobytes. This is a huge drawback for test code, which is of course meant to find and report errors. Unfortunately, there is no more convenient language than the shell to deal with processes, files and pipes. It is also very productive to use the exact same language as for interactive use.

There is fortunately one underrated and standard shell feature that mitigates this error-handling design issue: set -e, which is a shortcut for set -o errexit. errexit stops a script immediately when a command fails. The aborted script exits (as usual) with the exit status of the last command that ran.

errexit is far from perfect; for instance, it does not catch all errors. However it catches many, so it should be used in all new scripts. It should also be added to existing scripts, but only after extensive testing, because using set -e requires some minor coding changes described below.
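A minimal, self-contained demonstration of this behavior (the snippet is run in a subshell so its output and exit status can be observed; the commands are illustrative placeholders):

```shell
#!/bin/sh
# Run a failing snippet under errexit and observe that execution stops
# at the first failing command, and that this command's exit status
# becomes the script's exit status.
out=$(sh -ec 'echo before; false; echo never printed') && status=0 || status=$?
echo "output: $out"     # only "before": the script stopped at "false"
echo "status: $status"  # 1, the exit status of "false"
```

Without `set -e` (i.e. `sh -c` instead of `sh -ec`), the same snippet would print both lines and exit 0, silently losing the failure.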

Why stop immediately when something fails?

https://en.wikipedia.org/wiki/Fail-fast

  1. This is simply what users expect. When a test fails, people don't look at every single line of the test log. They go straight to the end of the log, because that is where they expect the relevant and useful error message(s) to be. Some test logs can be very long and it would be ridiculous to expect users to scan every line for error messages.

  2. Unexpected errors often cause a cascade of more unexpected errors. After some unexpected error all bets are off; no one has any idea what the following code will do. It will typically add more and more errors at random points until some error finally gets caught manually. This means:
    a) The last error message, the only one users look at, is not relevant and very confusing, see 1. above.
    b) The behaviour of the rest of the code is now undefined and could lead to much more serious issues. Here's a simplified example.

some_directory=$(failing_command)
# without set -e, $some_directory is empty and the entire $basedir gets deleted!
rm -rf "$basedir/$some_directory"

Fun fact, unexpected errors in userspace C code are more likely to "fail faster" than in shell scripts because memory corruption often gets caught by the operating system.

An unexpected test failure is a stressful situation because it's an unexpected delay. Spending even more time understanding test logs and debugging test code instead of debugging the actual product code adds a lot more tension and delay. Unlike some product code, test code should never try to recover from unexpected failures and perform some work anyway, because this either hides bugs and/or generates complicated, time-consuming cascades of failures that are even harder to debug.

Why not just check the exit code of every command like in C?

Because it makes the code extremely verbose.

Because life is always too short, everyone who promised to check every exit code of every command never delivered on that promise (this failed promise tends to happen in C too). Instead they only check the exit codes of the commands they "expect" to fail. But the line between expected and unexpected failures is very subjective; it depends on the developer.

Should I still manually check the exit code of some commands (which ones) even when using set -e?

You should only manually check commands that return an error but don't print a good error message; there is no need to do anything for commands that print good error messages. For instance, some commands will just print "file not found" without printing the filename. In that case you must manually check the exit code, exactly like when not using errexit. Example:

cmd1_with_bad_error_messages "$somefile" ||
    die 'Failed to run cmd1 on file %s\n' "$somefile"

This will print:

File not found               # from cmd1
Failed to run cmd1 on file some_file  # from you

Why do I need "some_command || true" and other tricks?

There are some commands that return a non-zero code when it's not an error, or not always an error. This is why adding set -e to existing scripts requires a fair amount of testing, almost as much as when they were first written.
grep is a typical example. This does NOT work; the script stops silently:

set -e

found_maybe=$( some_command | grep something) # aborts when nothing is found
if [ -n "$found_maybe" ]; then ...  # does not run when nothing is found

OK:

set -e

found_maybe=$( some_command | grep something || true)
if [ -n "$found_maybe" ]; then ...

|| true works because set -e is obviously disabled when checking the exit code manually.
Also OK:

set -e

if [ -n "$( some_command | grep something)" ] ; then ...

The return value of expr can sometimes be non-zero for surprising reasons. Simple fix: don't use expr at all, it's antiquated. Use $(( )) instead. See https://github.com/koalaman/shellcheck/wiki/SC2003
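A self-contained sketch of the SC2003 pitfall and its fix (the numbers are arbitrary):

```shell
#!/bin/sh
# expr exits with status 1 when its result is 0, so "n=$(expr 5 - 5)"
# would abort an errexit script even though nothing went wrong.
# Observe the surprising status without errexit first:
st=0
expr 5 - 5 >/dev/null || st=$?
echo "expr status: $st"   # 1, although the computation succeeded
# Arithmetic expansion has no such surprise:
set -e
n=$((5 - 5))
echo "n=$n"               # n=0, and the script keeps running
```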

How do I use $?

Most of the time you shouldn't use $?, see why https://github.com/koalaman/shellcheck/wiki/SC2181

But sometimes you really need $?, for instance because you want to treat different error codes differently. Solution:

ret=0
some_command || ret=$?
case $ret in ...
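A complete, runnable instance of this pattern, using grep's documented exit statuses (0 = match found, 1 = no match, >1 = error) as the example command:

```shell
#!/bin/sh
set -e
# Capture the exit status without tripping errexit, then branch on it.
ret=0
grep -q pattern /nonexistent-file 2>/dev/null || ret=$?
case $ret in
    0) echo "found" ;;
    1) echo "not found" ;;
    *) echo "grep failed with status $ret" ;;
esac
```

Here the file does not exist, so grep reports an error status greater than 1 and the last branch runs; without the `|| ret=$?` capture, errexit would have aborted the script before reaching the case statement.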

Avoid cmd1 && cmd2

https://www.shellcheck.net/wiki/SC2015

set -e is compatible with && and it is possible to use both in the same script, however the interaction between && and set -e can be quite tricky. Prefer || and if then else over && when possible.

Especially avoid negations like ! good_thing && handle_bad; always replace them with the simpler, "assert-style" good_thing || handle_bad.

Examples:

# set -e aborts the script as expected after cmd2_fails
cmd1_OK && cmd2_fails

# ... but set -e has no effect when the first command fails,
# the script keeps running after this!
cmd1_fails && cmd2_does_not_run

To remember why set -e has no effect when cmd1 fails, just remember that cmd1 && cmd2 is similar to if cmd1; then cmd2; fi, and that set -e obviously doesn't affect tests and conditions, otherwise conditionals would be impossible.

Demo:

set -e
false && true # does not abort
true && false # aborts

You especially want to avoid && on the last line of the script for a subtle reason explained at http://redsymbol.net/articles/unofficial-bash-strict-mode/#short-circuiting
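A sketch of that last-line subtlety: the two subshells below run the exact same failing && line; only what comes after it differs. Neither aborts (set -e ignores the left side of &&), but the script that ends on the failed list exits non-zero, which a test runner reports as a failure:

```shell
#!/bin/sh
# "false && echo never" does not trigger errexit in either script,
# but it does set the script's exit status when it is the LAST line.
sh -c 'set -e; false && echo never; true'; echo "not last: $?"
sh -c 'set -e; false && echo never';       echo "as last:  $?"
```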

Must separate local/export and $( )

# set -e does NOT catch this failure!
local foo=$(failed_command) 
# FIXED: set -e does catch this failure
local foo; foo=$(failed_command)

https://github.com/koalaman/shellcheck/wiki/SC2155
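A runnable demonstration of why (bash is assumed here, since local is not strict POSIX sh): the exit status of `local foo=$(cmd)` is the status of the local builtin itself (0), which masks cmd's failure even under errexit; declaring and assigning separately lets errexit catch it.

```shell
#!/bin/bash
# SC2155 demo: combined declaration+assignment masks the failure.
out1=$(bash -ec 'f() { local x=$(false); echo "kept going"; }; f')
echo "combined:  '$out1'"                     # f kept going, failure lost
st=0
out2=$(bash -ec 'f() { local x; x=$(false); echo "kept going"; }; f') || st=$?
echo "separated: status $st, output '$out2'"  # aborted before the echo
```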

More

http://redsymbol.net/articles/unofficial-bash-strict-mode/#issues-and-solutions

Warning: this page documents an even stricter way to use bash, so some "solutions" are unrelated to set -e.

@marc-hb
Collaborator

marc-hb commented Aug 4, 2020

This should be done very gradually on a test by test basis, starting of course with all new test cases like #296

marc-hb added a commit to marc-hb/sof-test that referenced this issue Aug 12, 2020
Now that we have commit 1aef6b9 ("print a call trace on every
failure") any premature exit is easy to locate even if it doesn't print any
error.

Incremental progress towards issue thesofproject#312 "add set -e to all
tests". Provides some decent coverage of case-lib/*.sh routines while
carefully avoiding any big disruption.

Will also help detect failures to start the logger when combined with
some other, incoming work.

Signed-off-by: Marc Herbert <[email protected]>
marc-hb added a commit to marc-hb/sof-test that referenced this issue Aug 13, 2020
aiChaoSONG pushed a commit that referenced this issue Aug 13, 2020
@mengdonglin mengdonglin added type:enhancement (New framework feature or request) and P1 (Blocker bugs or important features) labels Sep 14, 2020
RuiqingX added a commit to RuiqingX/sof-test that referenced this issue Sep 21, 2020
RuiqingX added a commit to RuiqingX/sof-test that referenced this issue Sep 21, 2020
RuiqingX added a commit to RuiqingX/sof-test that referenced this issue Sep 23, 2020
xiulipan pushed a commit that referenced this issue Sep 24, 2020
@marc-hb
Collaborator

marc-hb commented Sep 24, 2020

I think there is still a fair number of tests without set -e.

@marc-hb marc-hb reopened this Sep 24, 2020
@mengdonglin
Contributor

@marc-hb @aiChaoSONG @keqiaozhang Is this feature done?

@marc-hb
Collaborator

marc-hb commented Oct 19, 2020

No but I'm going to reduce the priority because more than half the tests have set -e now.

@marc-hb marc-hb added good first issue (Good for newcomers), negative testing, P2 (Critical bugs or normal features) and removed P1 (Blocker bugs or important features) labels Oct 19, 2020
@libinyang
Contributor

@marc-hb I found some cases that are hard to convert to set -e, such as cases using grep. Do you have any good suggestions? My current user-space cases hit this issue.

@marc-hb
Collaborator

marc-hb commented Oct 24, 2020

Sorry for the delay, too much email.

For grep it's usually as easy as:

something=$( ... | grep something || true)

Please provide more details/sample code if this does not seem possible.

@marc-hb marc-hb changed the title [sof-test] add set -e to all test cases [sof-test] add set -e to all test cases (Unless You Love Debugging) Oct 26, 2020
@marc-hb
Collaborator

marc-hb commented Oct 26, 2020

Do you have some good suggestion?

I just added a very long description with rationale, guidelines, workarounds and a good reference.

aiChaoSONG pushed a commit that referenced this issue Nov 17, 2020
Notably remove top-level "expr", because "expr 0" is surprisingly a
failure and expr is antiquated anyway (SC2003)

Progress towards #312.

Also add a dlogc() before killing so no one mistakes the deaths of aplay
and arecord for failures.

Add shellcheck source=case-lib/lib.sh, removes 5 warnings.

Signed-off-by: Marc Herbert <[email protected]>
@marc-hb
Collaborator

marc-hb commented Apr 23, 2021

https://sof-ci.01.org/softestpr/PR663/build673/devicetest/ is an interesting list of tests that don't have set -e. Almost all tests should have failed on BYT but some did not because they don't have set -e.

@marc-hb marc-hb added type:bug (Something doesn't work as expected) and removed type:enhancement (New framework feature or request) labels Jun 10, 2021
marc-hb added a commit to marc-hb/sof-test that referenced this issue Sep 17, 2021
For reasons beyond the scope of this commit, our Jenkins CI does not
know when a test script has finished running. Instead of checking the
exit status like a normal test runner, it waits for a "FAIL", "PASS" or
"N/A" string to be printed. When the test aborts without printing one of
these, Jenkins falls back on a fairly long timeout, per test.

This unusual approach has been causing issues with `set -e` (thesofproject#312)

In some relatively rare but possible failures the DSP or the drivers do
not load at all, examples:

https://sof-ci.01.org/sofpr/PR4766/build10353/devicetest/
https://sof-ci.01.org/sofpr/PR4768/build10348/devicetest/
https://sof-ci.01.org/sofpr/PR4782/build10387/devicetest/

In that case the $logFile is not found and many tests abort without
printing FAIL.

This commit prints FAIL when $logfile is missing and avoids the many
per-test Jenkins timeouts.

Signed-off-by: Marc Herbert <[email protected]>
marc-hb added a commit that referenced this issue Sep 21, 2021
marc-hb added a commit to marc-hb/sof-test that referenced this issue May 2, 2022
To help with thesofproject#312, commit 2e24024 ("test-case: add set -e to 7
function test") blindly added "set -e" without reviewing the code of any
test and without making sure it's actually compatible with set -e. In
this test this meant missing the rtcwake exit code.

Signed-off-by: Marc Herbert <[email protected]>
marc-hb added a commit that referenced this issue May 4, 2022
@greg-intel
Contributor

Reviewing the tests here: https://github.com/thesofproject/sof-test/tree/main/test-case

The remaining tests without set -e:

  • check-alsabat.sh
  • check-runtime-pm-double-active.sh
  • check-runtime-pm-status.sh
  • check-xrun-injection.sh
  • simultaneous-playback-capture.sh
  • test-speaker.sh
  • verify-ucm-config.sh
  • volume-basic-test.sh

Also, the location of set -e bounces between different spots in the files: sometimes before the comment block, sometimes after.

marc-hb added a commit to marc-hb/sof-test that referenced this issue Jun 27, 2023
Partial fix for thesofproject#312

Signed-off-by: Marc Herbert <[email protected]>
marc-hb added a commit to marc-hb/sof-test that referenced this issue Jun 29, 2023
marc-hb added a commit that referenced this issue Jul 4, 2023
marc-hb referenced this issue in caldwell/build-emacs Feb 10, 2024
This was its intended use--did I even test this??