Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ci: standardize and optimize Check workflow #749

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

trueNAHO
Copy link
Collaborator

@trueNAHO trueNAHO commented Jan 5, 2025

NAHO (4):
  ci: standardize output redirection formatting
  ci: optimize CI time by running only relevant checks
  ci: avoid downloading the repository twice
  ci: consistently install repository with actions/checkout GitHub
    Action

 .github/workflows/check.yml | 124 +++++++++++++++++++++++++-----------
 1 file changed, 87 insertions(+), 37 deletions(-)

See the individual commit messages for more information.

Note

These commits should be merged individually, not squashed, to ensure a clear and easily reversible changelog. In case of change requests, review the commits separately.


I have successfully tested the CI of this patchset in a dummy repository.

Since we seem to be currently out of GitHub CI budget, I have successfully tested this patchset with:

nix run github:trueNAHO/stylix/ci-standardize-and-optimize-check-workflow#nix-flake-check

This PR could be merged with the following merge commit message:

ci: standardize and optimize Check workflow

Link: https://github.com/danth/stylix/pull/749

Approved-by: Daniel Thwaites <[email protected]>

@trueNAHO trueNAHO requested a review from danth January 5, 2025 22:03
Significantly optimize the CI time by running only the relevant checks
based on the changed files.

The lightweight and compatible implementation for getting the changed
files is adapted from [1].

[1]: https://stackoverflow.com/questions/74265821
Avoid downloading the repository twice by leveraging the repository
already installed by the actions/checkout GitHub Action, instead of
downloading it again from GitHub.
@trueNAHO trueNAHO force-pushed the ci-standardize-and-optimize-check-workflow branch from 00c1b73 to 16b69eb Compare January 6, 2025 01:01
@trueNAHO
Copy link
Collaborator Author

trueNAHO commented Jan 6, 2025

Changelog

v1: 16b69eb

  • Add the /stylix/ directory to the critical files.
  • Re-add accidentally removed empty lines

v0: 00c1b73

@danth
Copy link
Owner

danth commented Jan 6, 2025

Since we seem to be currently out of GitHub CI budget

GitHub Actions is unlimited for public repositories. I think the reason the CI is not running is because it's only configured to run on pushes to branches within the main repository:

on: # yamllint disable-line rule:truthy
push:

To run on pull requests from forks, we must add:

on:
  push:
  pull_request:

And then to avoid running twice for pull requests which aren't from forks, limit the push event to protected branches only:

on:
  push:
    branches:
      - master
      - release-**
  pull_request:

This is how it was configured in the old workflow:

on:
push:
branches:
- master
- release-**
pull_request:

Copy link
Owner

@danth danth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only running supposedly changed derivations defeats part of the purpose of the CI, as it's able to notice errors caused by other modules.

For example, say module A reuses a template file from module B. Module B is updated, and the author is not aware that the template is used elsewhere. The current CI would run all testbeds, and encounter a failure for module A. This filtered CI would not pick up on that.

Even if we try to discourage dependencies between modules, a situation like this may arise without anyone being aware of it.

I would recommend just combining the separate build tasks into a single job. Although this would lose some parallelism, a lot of the testbeds share common dependencies which can be reused, reducing the total minutes spent.

Also, we can only spawn 100 jobs within a workflow, so at some point we will need to combine them into a single job anyway as we will have more than 100 outputs to build.

Also, combining them would reduce the number of cache queries, potentially making #745 obsolete.

.github/workflows/check.yml Show resolved Hide resolved
Comment on lines +76 to +102
if .key == "nix-flake-check" then
($changed_files | any(. == "flake.nix"))

elif .key == "docs" then
($changed_files | any(startswith("docs/")))

elif .key == "palette-generator" then
($changed_files | any(startswith("palette-generator/")))

elif (.key | test("^testbed-[^-]+-")) then
(
.key | capture("^testbed-(?<module>[^-]+)") | .module
) as $module |
(
$changed_files |
any(startswith("modules/\($module)/"))
)

# Always keep the git-hooks derivation.
elif .key == "git-hooks" then
true

else
error(
"Derivation must be handled or explicitly ignored: " +
.key
)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This increases the maintenance burden compared to the previous workflow, as this will need to be updated whenever files are renamed or new outputs are introduced.

Copy link
Collaborator Author

@trueNAHO trueNAHO Jan 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last branch will trigger an error when something unexpected happens. When a new derivation has to be ignored, the second-last branch can be extended:

# Always keep these derivations.
elif .key == "git-hooks" or .key == "<NEW_DERIVATION>" then
  true

However, this probably happens very rarely and is a very quick fix.

.github/workflows/check.yml Show resolved Hide resolved
trueNAHO added a commit to trueNAHO/stylix that referenced this pull request Jan 6, 2025
Run the CI on PRs, while preventing it from running twice for non-fork
PRs by limiting the push event to protected branches. [1]

This restores the workflow trigger that was accidentally modified in
commit 2b85a56 ("ci: simplify workflows").

[1]: danth#749 (comment)

Fixes: 2b85a56 ("ci: simplify workflows")
trueNAHO added a commit to trueNAHO/stylix that referenced this pull request Jan 6, 2025
Run the CI on PRs, while preventing it from running twice for non-fork
PRs by limiting the push event to protected branches. [1]

This restores the workflow trigger that was accidentally modified in
commit 2b85a56 ("ci: simplify workflows").

[1]: danth#749 (comment)

Fixes: 2b85a56 ("ci: simplify workflows")
Link: danth#751
@trueNAHO
Copy link
Collaborator Author

trueNAHO commented Jan 6, 2025

I think the reason the CI is not running is because it's only configured to run on pushes to branches within the main repository: [...]

I opened #751.

Only running supposedly changed derivations defeats part of the purpose of the CI, as it's able to notice errors caused by other modules.

For example, say module A reuses a template file from module B. Module B is updated, and the author is not aware that the template is used elsewhere. The current CI would run all testbeds, and encounter a failure for module A. This filtered CI would not pick up on that.

Even if we try to discourage dependencies between modules, a situation like this may arise without anyone being aware of it.

AFAIK, Nixpkgs uses https://github.com/NixOS/nixpkgs-vet in their CI to ensure that parent files are not accessed. For example, ../../flake.nix should be replaced with something like "${self}/flake.nix".

However, I agree that this is heuristics-based and may miss some edge cases.

I would recommend just combining the separate build tasks into a single job. Although this would lose some parallelism, a lot of the testbeds share common dependencies which can be reused, reducing the total minutes spent.

Also, we can only spawn 100 jobs within a workflow, so at some point we will need to combine them into a single job anyway as we will have more than 100 outputs to build.

Also, combining them would reduce the number of cache queries, potentially making #745 obsolete.

IIRC, GitHub Action runners have 2 CPUs. The Last time I tested, nix flake check and nix run .#nix-flake-check were significantly slower than using the GitHub matrix pattern because 2 CPUs do not scale as much as 100 jobs.

I assume that once we exceed the maximum number of jobs within a workflow, we can evenly distribute the work among the 100 jobs. Otherwise, we can look at how other projects, like Nixpkgs, are handling this.

danth pushed a commit that referenced this pull request Jan 6, 2025
Run the CI on PRs, while preventing it from running twice for non-fork
PRs by limiting the push event to protected branches. [1]

This restores the workflow trigger that was accidentally modified in
commit 2b85a56 ("ci: simplify workflows").

[1]: #749 (comment)
@danth
Copy link
Owner

danth commented Jan 6, 2025

we can evenly distribute the work among the 100 jobs

IMO, this would be more confusing for contributors as the individual status checks lose their meaning. (Why did job X, in particular, fail?) Also, quite complex to implement.

In most cases contributors are required to wait for a review anyway, so the additional time taken to run the checks sequentially would not be a huge deal. With a single job, we can also set it as a requirement to merge, which means we can use the auto-merge feature to merge once the checks are complete.

I believe OfBorg is the tool used by Nixpkgs.

@trueNAHO
Copy link
Collaborator Author

trueNAHO commented Jan 6, 2025

we can evenly distribute the work among the 100 jobs

IMO, this would be more confusing for contributors as the individual status checks lose their meaning. (Why did job X, in particular, fail?) Also, quite complex to implement.

In most cases contributors are required to wait for a review anyway, so the additional time taken to run the checks sequentially would not be a huge deal. With a single job, we can also set it as a requirement to merge, which means we can use the auto-merge feature to merge once the checks are complete.

True. However, if the CI takes over 30 minutes to run, it can be rather annoying to wait. We should try to make the CI reasonably fast. For reference, nix run .#nix-flake-check takes 70 seconds on my machine.

I believe OfBorg is the tool used by Nixpkgs.

They are actually currently migrating away from OfBorg: NixOS/nixpkgs#355847.

@trueNAHO
Copy link
Collaborator Author

trueNAHO commented Jan 7, 2025

Since we seem to be currently out of GitHub CI budget

GitHub Actions is unlimited for public repositories.

Should we close this PR for now and pick it back up if our CI no longer scales with the number of tests, or becomes horribly slow?

In that case, I have cherry-picked the "ci: standardize output redirection formatting" commit from this PR into #756.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants