ci: standardize and optimize Check workflow #749

trueNAHO · 2025-01-05T22:03:01Z

NAHO (4):
  ci: standardize output redirection formatting
  ci: optimize CI time by running only relevant checks
  ci: avoid downloading the repository twice
  ci: consistently install repository with actions/checkout GitHub
    Action

 .github/workflows/check.yml | 124 +++++++++++++++++++++++++-----------
 1 file changed, 87 insertions(+), 37 deletions(-)

See the individual commit messages for more information.

Note

These commits should be merged individually, not squashed, to ensure a clear and easily reversible changelog. In case of change requests, review the commits separately.

I have successfully tested the CI of this patchset in a dummy repository.

Since we seem to be currently out of GitHub CI budget, I have successfully tested this patchset with:

nix run github:trueNAHO/stylix/ci-standardize-and-optimize-check-workflow#nix-flake-check

This PR could be merged with the following merge commit message:

ci: standardize and optimize Check workflow

Link: https://github.com/danth/stylix/pull/749

Approved-by: Daniel Thwaites <[email protected]>

Significantly optimize the CI time by running only the relevant checks based on the changed files. The lightweight and compatible implementation for getting the changed files is adapted from [1]. [1]: https://stackoverflow.com/questions/74265821

Avoid downloading the repository twice by leveraging the repository already installed by the actions/checkout GitHub Action, instead of downloading it again from GitHub.

trueNAHO · 2025-01-06T01:03:33Z

Changelog

v1: `16b69eb`

Add the /stylix/ directory to the critical files.
Re-add accidentally removed empty lines

v0: `00c1b73`

danth · 2025-01-06T16:14:48Z

Since we seem to be currently out of GitHub CI budget

GitHub Actions is unlimited for public repositories. I think the reason the CI is not running is because it's only configured to run on pushes to branches within the main repository:

stylix/.github/workflows/check.yml

Lines 4 to 5 in a6b53aa

    
           on:  # yamllint disable-line rule:truthy 
        
             push:

To run on pull requests from forks, we must add:

on:
  push:
  pull_request:

And then to avoid running twice for pull requests which aren't from forks, limit the push event to protected branches only:

on:
  push:
    branches:
      - master
      - release-**
  pull_request:

This is how it was configured in the old workflow:

stylix/.github/workflows/build.yml

Lines 3 to 8 in 7c8874b

    
           on: 
        
             push: 
        
               branches: 
        
                 - master 
        
                 - release-** 
        
             pull_request:

danth

Only running supposedly changed derivations defeats part of the purpose of the CI, as it's able to notice errors caused by other modules.

For example, say module A reuses a template file from module B. Module B is updated, and the author is not aware that the template is used elsewhere. The current CI would run all testbeds, and encounter a failure for module A. This filtered CI would not pick up on that.

Even if we try to discourage dependencies between modules, a situation like this may arise without anyone being aware of it.

I would recommend just combining the separate build tasks into a single job. Although this would lose some parallelism, a lot of the testbeds share common dependencies which can be reused, reducing the total minutes spent.

Also, we can only spawn 100 jobs within a workflow, so at some point we will need to combine them into a single job anyway as we will have more than 100 outputs to build.

Also, combining them would reduce the number of cache queries, potentially making #745 obsolete.

.github/workflows/check.yml

danth · 2025-01-06T16:19:37Z

.github/workflows/check.yml

+                      if .key == "nix-flake-check" then
+                        ($changed_files | any(. == "flake.nix"))
+
+                      elif .key == "docs" then
+                        ($changed_files | any(startswith("docs/")))
+
+                      elif .key == "palette-generator" then
+                        ($changed_files | any(startswith("palette-generator/")))
+
+                      elif (.key | test("^testbed-[^-]+-")) then
+                        (
+                          .key | capture("^testbed-(?<module>[^-]+)") | .module
+                        ) as $module |
+                        (
+                          $changed_files |
+                          any(startswith("modules/\($module)/"))
+                        )
+
+                      # Always keep the git-hooks derivation.
+                      elif .key == "git-hooks" then
+                        true
+
+                      else
+                        error(
+                          "Derivation must be handled or explicitly ignored: " +
+                          .key
+                        )


This increases the maintenance burden compared to the previous workflow, as this will need to be updated whenever files are renamed or new outputs are introduced.

The last branch will trigger an error when something unexpected happens. When a new derivation has to be ignored, the second-last branch can be extended:

# Always keep these derivations. elif .key == "git-hooks" or .key == "<NEW_DERIVATION>" then true

However, this probably happens very rarely and is a very quick fix.

.github/workflows/check.yml

Run the CI on PRs, while preventing it from running twice for non-fork PRs by limiting the push event to protected branches. [1] This restores the workflow trigger that was accidentally modified in commit 2b85a56 ("ci: simplify workflows"). [1]: danth#749 (comment) Fixes: 2b85a56 ("ci: simplify workflows")

Run the CI on PRs, while preventing it from running twice for non-fork PRs by limiting the push event to protected branches. [1] This restores the workflow trigger that was accidentally modified in commit 2b85a56 ("ci: simplify workflows"). [1]: danth#749 (comment) Fixes: 2b85a56 ("ci: simplify workflows") Link: danth#751

trueNAHO · 2025-01-06T18:18:12Z

I think the reason the CI is not running is because it's only configured to run on pushes to branches within the main repository: [...]

I opened #751.

Only running supposedly changed derivations defeats part of the purpose of the CI, as it's able to notice errors caused by other modules.

For example, say module A reuses a template file from module B. Module B is updated, and the author is not aware that the template is used elsewhere. The current CI would run all testbeds, and encounter a failure for module A. This filtered CI would not pick up on that.

Even if we try to discourage dependencies between modules, a situation like this may arise without anyone being aware of it.

AFAIK, Nixpkgs uses https://github.com/NixOS/nixpkgs-vet in their CI to ensure that parent files are not accessed. For example, ../../flake.nix should be replaced with something like "${self}/flake.nix".

However, I agree that this is heuristics-based and may miss some edge cases.

I would recommend just combining the separate build tasks into a single job. Although this would lose some parallelism, a lot of the testbeds share common dependencies which can be reused, reducing the total minutes spent.

Also, we can only spawn 100 jobs within a workflow, so at some point we will need to combine them into a single job anyway as we will have more than 100 outputs to build.

Also, combining them would reduce the number of cache queries, potentially making #745 obsolete.

IIRC, GitHub Action runners have 2 CPUs. The Last time I tested, nix flake check and nix run .#nix-flake-check were significantly slower than using the GitHub matrix pattern because 2 CPUs do not scale as much as 100 jobs.

I assume that once we exceed the maximum number of jobs within a workflow, we can evenly distribute the work among the 100 jobs. Otherwise, we can look at how other projects, like Nixpkgs, are handling this.

Run the CI on PRs, while preventing it from running twice for non-fork PRs by limiting the push event to protected branches. [1] This restores the workflow trigger that was accidentally modified in commit 2b85a56 ("ci: simplify workflows"). [1]: #749 (comment)

danth · 2025-01-06T21:55:16Z

we can evenly distribute the work among the 100 jobs

IMO, this would be more confusing for contributors as the individual status checks lose their meaning. (Why did job X, in particular, fail?) Also, quite complex to implement.

In most cases contributors are required to wait for a review anyway, so the additional time taken to run the checks sequentially would not be a huge deal. With a single job, we can also set it as a requirement to merge, which means we can use the auto-merge feature to merge once the checks are complete.

I believe OfBorg is the tool used by Nixpkgs.

trueNAHO · 2025-01-06T22:51:30Z

we can evenly distribute the work among the 100 jobs

IMO, this would be more confusing for contributors as the individual status checks lose their meaning. (Why did job X, in particular, fail?) Also, quite complex to implement.

In most cases contributors are required to wait for a review anyway, so the additional time taken to run the checks sequentially would not be a huge deal. With a single job, we can also set it as a requirement to merge, which means we can use the auto-merge feature to merge once the checks are complete.

True. However, if the CI takes over 30 minutes to run, it can be rather annoying to wait. We should try to make the CI reasonably fast. For reference, nix run .#nix-flake-check takes 70 seconds on my machine.

I believe OfBorg is the tool used by Nixpkgs.

They are actually currently migrating away from OfBorg: NixOS/nixpkgs#355847.

trueNAHO · 2025-01-07T18:54:43Z

Since we seem to be currently out of GitHub CI budget

GitHub Actions is unlimited for public repositories.

Should we close this PR for now and pick it back up if our CI no longer scales with the number of tests, or becomes horribly slow?

In that case, I have cherry-picked the "ci: standardize output redirection formatting" commit from this PR into #756.

ci: standardize output redirection formatting

dafaaea

trueNAHO requested a review from danth January 5, 2025 22:03

trueNAHO added 3 commits January 6, 2025 01:56

ci: avoid downloading the repository twice

f99342b

Avoid downloading the repository twice by leveraging the repository already installed by the actions/checkout GitHub Action, instead of downloading it again from GitHub.

ci: consistently install repository with actions/checkout GitHub Action

16b69eb

trueNAHO force-pushed the ci-standardize-and-optimize-check-workflow branch from 00c1b73 to 16b69eb Compare January 6, 2025 01:01

trueNAHO added the backport release-24.11 label Jan 6, 2025

danth requested changes Jan 6, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ci: standardize and optimize Check workflow #749

ci: standardize and optimize Check workflow #749

trueNAHO commented Jan 5, 2025 •

edited

Loading

trueNAHO commented Jan 6, 2025

danth commented Jan 6, 2025

danth left a comment

danth Jan 6, 2025

trueNAHO Jan 6, 2025 •

edited

Loading

trueNAHO commented Jan 6, 2025

danth commented Jan 6, 2025

trueNAHO commented Jan 6, 2025

trueNAHO commented Jan 7, 2025

ci: standardize and optimize Check workflow #749

Are you sure you want to change the base?

ci: standardize and optimize Check workflow #749

Conversation

trueNAHO commented Jan 5, 2025 • edited Loading

trueNAHO commented Jan 6, 2025

Changelog

v1: 16b69eb

v0: 00c1b73

danth commented Jan 6, 2025

danth left a comment

Choose a reason for hiding this comment

danth Jan 6, 2025

Choose a reason for hiding this comment

trueNAHO Jan 6, 2025 • edited Loading

Choose a reason for hiding this comment

trueNAHO commented Jan 6, 2025

danth commented Jan 6, 2025

trueNAHO commented Jan 6, 2025

trueNAHO commented Jan 7, 2025

trueNAHO commented Jan 5, 2025 •

edited

Loading

v1: `16b69eb`

v0: `00c1b73`

trueNAHO Jan 6, 2025 •

edited

Loading