Improve test flakiness #3

Open
lydell opened this issue Mar 12, 2022 · 1 comment

lydell commented Mar 12, 2022

The tests are very comprehensive, and I’m very happy that they helped me find so many edge cases. They are written at a very high level, which gives a lot of confidence (and should make a potential rewrite in another language nice). However, that high level involves real time passing, real file system watching and real WebSockets. While that did help me understand file watching better, for example (like, how many watcher events do you get if the same file changes rapidly?), it does make the tests a bit flaky. I’ve used some pretty … clever … hacks to stabilize many tests, but not all.

Currently, the tests pass locally on Linux, Windows and macOS. In CI, jest.retryTimes(2) is needed, but even with that one or two jobs usually fail; by manually restarting them it’s possible to get all green checkmarks. So the tests still give a lot of confidence, they’re just a little bit annoying.
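
For reference, here is a minimal sketch of that retry setup, assuming a dedicated setup file loaded via Jest’s setupFilesAfterEnv (the file name and exact wiring here are assumptions, not necessarily how this repo does it):

// retry.setup.ts (hypothetical file name)
// Requires the jest-circus runner (the default since Jest 27).
// Each failing test is re-run up to 2 more times before Jest reports it as failed.
jest.retryTimes(2);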

It doesn’t help that I got tired of testing and “fixed” some tests with arbitrary sleeps (in the tests, not in the source code). The readme says “elm-watch is serious about stability” – so this is a bit embarrassing.

As soon as I get some more energy I want to get back to this and clean the tests up.

lydell added a commit that referenced this issue Jul 16, 2022
As a stop-gap solution for #3, retry flaky tests. This improves confidence by getting those green checkmarks.

The retried tests are logged so I should be able to scrape which ones are the flakiest and improve them over time.

The Windows tests seem to fail consistently in CI (but not locally), though, even with retries.
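
One way that retry logging could be wired up is with a custom test environment. This is a hypothetical sketch, not necessarily how elm-watch does it, and it assumes the jest-circus runner forwards a "test_retry" event to handleTestEvent:

// retryLoggingEnvironment.ts (hypothetical file name)
import NodeEnvironment from "jest-environment-node";
import type { Circus } from "@jest/types";

export default class RetryLoggingEnvironment extends NodeEnvironment {
  handleTestEvent(event: Circus.Event): void {
    if (event.type === "test_retry") {
      // One greppable line per retried test, in the spirit of the
      // "RETRY ERRORS" lines scraped by the script further down.
      console.error(`RETRY: ${event.test.name}`);
    }
  }
}

Such an environment would then be pointed at from the testEnvironment option in the Jest config.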

lydell commented Jul 23, 2022

Here is a fish script to see which tests are retried the most:

# Fill in a GitHub personal access token (needed to download the CI logs):
set token FILL_ME_IN

# Directory next to this script where the logs are collected:
set dir (status dirname)/scrape
mkdir -p $dir

# Fetch the latest workflow runs, one compact JSON object per list element:
set workflow_runs (curl -H "Accept: application/vnd.github+json" "https://api.github.com/repos/lydell/elm-watch/actions/runs?per_page=100&created=>=2022-07-16&exclude_pull_requests=true" | jq -c '.workflow_runs[]')

set count (count $workflow_runs)

# Download the logs of every "Test" workflow run into its own directory:
for i in (seq $count)
    set workflow_run $workflow_runs[$i]
    set name (string join \n -- $workflow_run | jq -r '.name')
    if test $name != Test
        echo "### $i/$count: Skipping: $name"
        continue
    end
    set created_at (string join \n -- $workflow_run | jq -r '.created_at')
    set logs_url (string join \n -- $workflow_run | jq -r '.logs_url')
    set subdir $dir/$created_at
    set zip $subdir/logs.zip
    echo "### $i/$count: Download logs from $created_at to $subdir"
    rm -rf $subdir
    mkdir -p $subdir
    curl -L -H "Authorization: token $token" -H "Accept: application/vnd.github+json" $logs_url >$zip
    unzip -d $subdir $zip
end

# Pull every "RETRY ERRORS" line out of the logs, split it into tab-separated fields,
# and count the last field (the test name) to see which tests are retried the most:
set results_file $dir/results.tsv
rg 'RETRY ERRORS  (.+)' $dir -or '$1' | rg '([^/]+Z)[^(]+\(([^,]+), (\d+)\)[^:]+:(.+)' -or '$1'\t'$2'\t'$3'\t'$4' >$results_file
cut -f 4 $results_file | sort | uniq -c | sort -nr
