WIP: Use HPC for CI (#386)
Works on #415 

Added sync with Gitlab, now also for pull requests

---------

Co-authored-by: Robert Speck <[email protected]> and Thomas Baumann <[email protected]>
jakob-fritz and brownbaerchen authored Apr 22, 2024
1 parent fb4b745 commit cdb77d4
Showing 17 changed files with 157 additions and 35 deletions.
4 changes: 1 addition & 3 deletions .github/workflows/ci_pipeline.yml
@@ -151,8 +151,6 @@ jobs:
# run: |
# pytest --continue-on-collection-errors -v --durations=0 pySDC/tests -m ${{ matrix.env }}



# # - name: Get and prepare artifacts
# # run: |
# # pipeline_id=$(curl --header "PRIVATE-TOKEN: ${{ secrets.GITLAB_SECRET_H }}" --silent "https://gitlab.hzdr.de/api/v4/projects/3525/repository/commits/${{ github.head_ref || github.ref_name }}" | jq '.last_pipeline.id')
@@ -196,7 +194,7 @@ jobs:
run: |
ls -artl cpu-test-artifacts
cp cpu-test-artifacts/data_3.10/* data/.
python -m coverage combine cpu-test-artifacts/coverage_*_3.10.dat
python -m coverage combine cpu-test-artifacts/coverage_*.dat
python -m coverage xml
python -m coverage html
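
One effect of widening the glob passed to `coverage combine` is that coverage files without the `_3.10` suffix now match as well. The following is a minimal Python sketch of that matching behaviour, using `fnmatch` as a stand-in for shell globbing. The file names in `artifacts` are illustrative assumptions; whether such files actually end up in `cpu-test-artifacts` is not shown in this diff.

```python
from fnmatch import fnmatch

OLD_PATTERN = "coverage_*_3.10.dat"  # pattern before this commit
NEW_PATTERN = "coverage_*.dat"       # pattern after this commit

# Hypothetical artifact names: one file with a Python-version suffix and
# two files named after the SHELL_SCRIPT matrix entries in .gitlab-ci.yml.
artifacts = ["coverage_base_3.10.dat", "coverage_cupy.dat", "coverage_benchmark.dat"]

old_matches = [name for name in artifacts if fnmatch(name, OLD_PATTERN)]
new_matches = [name for name in artifacts if fnmatch(name, NEW_PATTERN)]

print("old pattern matches:", old_matches)
print("new pattern matches:", new_matches)
```

With the old pattern only the version-suffixed file is selected; with the new pattern all three files are.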
6 changes: 4 additions & 2 deletions .github/workflows/gitlab_ci.yml
@@ -1,3 +1,5 @@
---

name: Mirror to Gitlab to trigger CI

on:
@@ -37,7 +39,7 @@ jobs:
echo 'Permissions granted'
exit 0
else
echo 'Not enough permissions'
echo 'Not enough permissions. Please ask a member of Parallel-in-Time to rerun the job.'
exit 1
fi
- name: Pass if workflow from push or schedule
@@ -105,7 +107,7 @@ jobs:
- name: Mirror and wait for Gitlab-CI
uses: jakob-fritz/[email protected]
env:
MODE: 'all' # Either 'mirror', 'get_status', 'get_artifact', or 'all'
MODE: 'all' # Either 'mirror', 'get_status', 'get_artifact', or 'all'
GITLAB_TOKEN: ${{ secrets.GITLAB_SECRET }}
FORCE_PUSH: "true"
GITLAB_HOSTNAME: "gitlab.jsc.fz-juelich.de"
84 changes: 80 additions & 4 deletions .gitlab-ci.yml
@@ -1,9 +1,82 @@
---

stages:
- test
- benchmark
- execute
- upload

#test_kit:

# job_juwels_compute:
# stage: execute
# variables:
# SCHEDULER_PARAMETERS: '--account=cstma --nodes=1 --partition=devel'
# tags:
# - juwels
# - jacamar
# - compute
# - slurm
# artifacts:
# paths:
# - test.file
# script:
# - echo $SYSTEMNAME
# - touch test.file
# after_script:
# - hostname
# - id


variables:
JUWELS_ACCOUNT: "cstma"


test_JUWELS:
stage: benchmark
rules:
- if: $CI_COMMIT_MESSAGE !~ /.*\[CI-no-benchmarks\]/
tags:
- jacamar
- juwels
- login
- shell
parallel:
matrix:
- SHELL_SCRIPT: ['benchmark', 'cupy']
artifacts:
when: always
paths:
- coverage_*.dat
- sbatch.err
- sbatch.out
before_script:
- mkdir -p benchmarks
# load the latest Python module (currently 3.11)
- module --force purge
- module load Stages/2024
- module load GCC
- module load OpenMPI
- module load FFTW
- module load mpi4py
- module load SciPy-Stack
- module load CuPy
- pip install -e .
- pip install pytest-benchmark coverage
script:
# - touch benchmarks/output.json
- echo $SYSTEMNAME
- sbatch --wait etc/juwels_${SHELL_SCRIPT}.sh
- touch .coverage.empty
- python -m coverage combine
- mv .coverage coverage_${SHELL_SCRIPT}.dat
after_script:
- echo "Following Errors occurred:"
- cat sbatch.err
- echo "Following was written to stdout:"
- cat sbatch.out
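
The `rules` entry of `test_JUWELS` runs the job only when the commit message does not match `/.*\[CI-no-benchmarks\]/`. A rough Python sketch of that check follows. GitLab evaluates the expression with its own regex engine, so Python's `re` is only an approximation here, and the function name `runs_benchmark_job` is hypothetical.

```python
import re

# Same pattern as in the `rules: if:` line of .gitlab-ci.yml.
SKIP_BENCHMARKS = re.compile(r".*\[CI-no-benchmarks\]")


def runs_benchmark_job(commit_message: str) -> bool:
    """Approximate `$CI_COMMIT_MESSAGE !~ /.*\\[CI-no-benchmarks\\]/`.

    Returns True when the marker is absent, i.e. when the job would run.
    """
    return SKIP_BENCHMARKS.search(commit_message) is None


for message in ["WIP: Use HPC for CI (#386)", "Fix typo [CI-no-benchmarks]"]:
    print(f"{message!r}: job runs = {runs_benchmark_job(message)}")
```

Under this reading, adding `[CI-no-benchmarks]` anywhere in the commit message skips the job.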


# test_kit:
# image: rcaspart/micromamba-cuda
# stage: benchmark
# variables:
@@ -34,7 +107,7 @@ stages:
# - mv data data_${PYTHON}


#test_kit_bare:
# test_kit_bare:
# stage: test
# rules:
# - if: $CI_COMMIT_MESSAGE !~ /.*\[CI-no-GPU\].*/
@@ -64,6 +137,9 @@ stages:
benchmark:
image: mambaorg/micromamba
stage: benchmark
when: manual
tags:
- docker
rules:
- if: $CI_COMMIT_MESSAGE !~ /.*\[CI-no-benchmarks\]/
artifacts:
@@ -79,7 +155,7 @@ benchmark:
- mkdir -p benchmarks
- pytest --continue-on-collection-errors -v pySDC/tests -m "benchmark" --benchmark-json=benchmarks/output.json

#benchmark_kit:
# benchmark_kit:
# image: rcaspart/micromamba
# stage: benchmark
# variables:
@@ -104,7 +180,7 @@ benchmark:
# - pytest --continue-on-collection-errors -v pySDC/tests -m "benchmark" --benchmark-json=benchmarks/output.json


#bundle:
# bundle:
# image: mambaorg/micromamba
# stage: upload
# artifacts:
14 changes: 7 additions & 7 deletions CODE_OF_CONDUCT.md
@@ -61,7 +61,7 @@ representative at an online or offline event.
## Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported to the community leaders responsible for enforcement
reported to the community leaders responsible for enforcement
[here](mailto:[email protected]).
All complaints will be reviewed and investigated promptly and fairly.

@@ -118,15 +118,15 @@ the community.

This Code of Conduct is adapted from the [Contributor Covenant][homepage],
version 2.0, available at
https://www.contributor-covenant.org/version/2/0/code_of_conduct.html.
<https://www.contributor-covenant.org/version/2/0/code_of_conduct.html>.

Community Impact Guidelines were inspired by [Mozilla's code of conduct
enforcement ladder](https://github.com/mozilla/diversity).
enforcement ladder](<https://github.com/mozilla/diversity>).

[homepage]: https://www.contributor-covenant.org
[homepage]: <https://www.contributor-covenant.org>

For answers to common questions about this code of conduct, see the FAQ at
https://www.contributor-covenant.org/faq. Translations are available at
https://www.contributor-covenant.org/translations.
<https://www.contributor-covenant.org/faq>. Translations are available at
<https://www.contributor-covenant.org/translations>.

:arrow_left: [Back to main page](./README.md)
:arrow_left: [Back to main page](./README.md)
31 changes: 31 additions & 0 deletions docs/contrib/02_continuous_integration.md
@@ -103,6 +103,37 @@ pytest -v pySDC/tests
> pytest -v pySDC/tests/test_nodes.py # only test nodes generation
> ```
## Running CI on HPC from pull requests
By syncing the GitHub repository to a certain Gitlab instance, CI-Jobs can be run on HPC machines. This can be helpful for benchmarks or when running on accelerators that are not available as GitHub runners.
For security and accounting reasons, a few extra steps are needed in order to run the contents of a pull request on HPC:
- The pull request needs to have the tag "gitlab-mirror" assigned to it.
- A person with write permission for the Parallel-in-Time pySDC repository needs to trigger the workflow. If needed, ask someone with the required permissions to rerun the workflow.
- The workflow checks if the code can be merged. If this is not the case, the code is not mirrored and the workflow fails. In this case, please merge upstream changes, fix all conflicts, and rerun the workflow.
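
The three conditions above can be sketched as a small Python check. This is only an illustration of the described logic, not the workflow's actual implementation: the `PullRequest` class, its field names, the function name, and the order of the checks are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class PullRequest:
    """Hypothetical stand-in for the pull-request state the workflow inspects."""

    labels: set = field(default_factory=set)
    triggered_by_writer: bool = False  # was the run started by someone with write permission?
    mergeable: bool = False  # can the branch be merged without conflicts?


def may_mirror_to_gitlab(pr: PullRequest) -> tuple:
    """Return (allowed, reason) for mirroring a pull request to Gitlab."""
    if "gitlab-mirror" not in pr.labels:
        return False, "missing 'gitlab-mirror' label"
    if not pr.triggered_by_writer:
        return False, "workflow must be (re)run by someone with write permission"
    if not pr.mergeable:
        return False, "merge upstream changes and fix conflicts first"
    return True, "mirroring to Gitlab"


print(may_mirror_to_gitlab(PullRequest({"gitlab-mirror"}, True, True)))
print(may_mirror_to_gitlab(PullRequest(set(), True, True)))
```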
> :bell: Note that direct pushes to Parallel-in-Time/pySDC will always trigger the HPC pipeline on Gitlab
Regardless of why the Gitlab pipeline was triggered, the following holds true:
- The return state from Gitlab (success or failure) is transmitted to GitHub, leading to the same result there.
- Logs from Gitlab are also transferred. The full logs of all jobs can be read from within GitHub. For a better overview, the logs are folded and need to be unfolded before reading.
- Artifacts from Gitlab jobs are also transferred back to GitHub.
- Information such as coverage is transferred to GitHub, but not yet merged across multiple GitHub workflows. Therefore, there is no complete summary of, e.g., coverage reports across all jobs in all workflows.
> :warning: The coverage report from the HPC tests is not yet merged with other reports. The test coverage will not show up on the respective website or in the badge. We are working on this.
### HPC test environments
In order to run tests on GPUs, please use the pytest marker `cupy`.
If you want to create a new HPC test environment, the following steps need to be completed:
- Create a new slurm job-script in `etc/juwels_*.sh`. The name and location of the file are important.
- Adapt `.gitlab-ci.yml` to include the new job-script. For this, add a name in the job "test_JUWELS" in the section `parallel: matrix: SHELL_SCRIPT`. The name there must match the name of the newly created file.
As a starting point it is recommended to copy and adapt an existing file (e.g. `etc/juwels_cupy.sh`).
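
The naming convention matters because `.gitlab-ci.yml` derives both the job script path (`etc/juwels_${SHELL_SCRIPT}.sh`) and the coverage artifact name (`coverage_${SHELL_SCRIPT}.dat`) from the matrix entry. A minimal Python sketch of that mapping follows. The helper `hpc_job_files` and the entry `my_new_env` are hypothetical and only serve as illustration.

```python
def hpc_job_files(shell_script: str) -> dict:
    """Map a SHELL_SCRIPT matrix entry to the file names used in .gitlab-ci.yml."""
    return {
        # submitted with `sbatch --wait etc/juwels_${SHELL_SCRIPT}.sh`
        "job_script": f"etc/juwels_{shell_script}.sh",
        # created with `mv .coverage coverage_${SHELL_SCRIPT}.dat`
        "coverage_artifact": f"coverage_{shell_script}.dat",
    }


# 'benchmark' and 'cupy' are the existing entries; 'my_new_env' is a made-up new one.
for name in ["benchmark", "cupy", "my_new_env"]:
    print(name, hpc_job_files(name))
```

Under this convention, a new matrix entry `my_new_env` would require a job script at `etc/juwels_my_new_env.sh`.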
## Code coverage
This stage checks how much of the `pySDC` code is tested by the previous stage. It is based on the [coverage](https://pypi.org/project/coverage/) library and is currently applied to the following directories:
9 changes: 9 additions & 0 deletions etc/juwels_benchmark.sh
@@ -0,0 +1,9 @@
#!/bin/bash -x
#SBATCH --account=cstma
#SBATCH --nodes=1
#SBATCH --time=00:10:00
#SBATCH --partition=devel
#SBATCH --output=sbatch.out
#SBATCH --error=sbatch.err

srun python -m pytest --continue-on-collection-errors -v pySDC/tests -m "benchmark" --benchmark-json=benchmarks.json
9 changes: 9 additions & 0 deletions etc/juwels_cupy.sh
@@ -0,0 +1,9 @@
#!/bin/bash -x
#SBATCH --account=cstma
#SBATCH --nodes=1
#SBATCH --time=00:10:00
#SBATCH --partition=develgpus
#SBATCH --output=sbatch.out
#SBATCH --error=sbatch.err

srun python -m coverage run -m pytest --continue-on-collection-errors -v pySDC/tests -m "cupy"
8 changes: 2 additions & 6 deletions pySDC/implementations/problem_classes/AllenCahn_1D_FD.py
@@ -762,9 +762,7 @@ def eval_f(self, u, t):
f = self.dtype_f(self.init)
f.impl[:] = self.A.dot(u)
f.expl[:] = (
-2.0 / self.eps**2 * u * (1.0 - u) * (1.0 - 2.0 * u)
- 6.0 * self.dw * u * (1.0 - u)
+ 0.0 / self.eps**2 * u
-2.0 / self.eps**2 * u * (1.0 - u) * (1.0 - 2.0 * u) - 6.0 * self.dw * u * (1.0 - u) + 0.0 / self.eps**2 * u
)
self.work_counters['rhs']()
return f
@@ -849,9 +847,7 @@ def eval_f(self, u, t):
f = self.dtype_f(self.init)
f.comp1[:] = self.A.dot(u)
f.comp2[:] = (
-2.0 / self.eps**2 * u * (1.0 - u) * (1.0 - 2.0 * u)
- 6.0 * self.dw * u * (1.0 - u)
+ 0.0 / self.eps**2 * u
-2.0 / self.eps**2 * u * (1.0 - u) * (1.0 - 2.0 * u) - 6.0 * self.dw * u * (1.0 - u) + 0.0 / self.eps**2 * u
)
self.work_counters['rhs']()
return f
2 changes: 2 additions & 0 deletions pySDC/implementations/problem_classes/Battery.py
@@ -329,6 +329,7 @@ class battery(battery_n_capacitors):
----
This class has the same attributes as the class it inherits from.
"""

dtype_f = imex_mesh

def __init__(self, ncapacitors=1, Vs=5.0, Rs=0.5, C=None, R=1.0, L=1.0, alpha=1.2, V_ref=None):
@@ -455,6 +456,7 @@ class battery_implicit(battery):
work_counters : WorkCounter
Counts different things, here: Number of Newton iterations is counted.
"""

dtype_f = mesh

def __init__(
Original file line number Diff line number Diff line change
@@ -132,9 +132,7 @@ def formJacobian(self, snes, X, J, P):
row.field = 1
col.field = 1
val = 1.0 - self.factor * (
self.prob.Dv * (-2.0 / self.dx**2 - 2.0 / self.dy**2)
+ 2.0 * x[i, j, 0] * x[i, j, 1]
- self.prob.B
self.prob.Dv * (-2.0 / self.dx**2 - 2.0 / self.dy**2) + 2.0 * x[i, j, 0] * x[i, j, 1] - self.prob.B
)
P.setValueStencil(row, col, val)
row.field = 1
1 change: 1 addition & 0 deletions pySDC/implementations/problem_classes/GrayScott_MPIFFT.py
@@ -590,6 +590,7 @@ class grayscott_mi_linear(grayscott_imex_linear):
The problem in this class will be treated in a *multi-implicit* way for time-stepping, i.e., the system containing
the diffusion part will be solved by FFT, and for the linear part a Newton solver is used.
"""

dtype_f = comp2_mesh

def __init__(
Original file line number Diff line number Diff line change
@@ -302,6 +302,7 @@ class heatNd_unforced(heatNd_forced):
The implementation in this class uses the ``CuPy`` package in order to make ``pySDC`` available for GPUs.
"""

dtype_f = cupy_mesh

def eval_f(self, u, t):
1 change: 1 addition & 0 deletions pySDC/implementations/problem_classes/LogisticEquation.py
@@ -34,6 +34,7 @@ class logistics_equation(ptype):
stop_at_nan : bool, optional
Indicates if the Newton solver stops when nan values arise.
"""

dtype_u = mesh
dtype_f = mesh

5 changes: 1 addition & 4 deletions pySDC/implementations/problem_classes/PenningTrap_3D.py
@@ -122,10 +122,7 @@ def fast_interactions(N, pos, sig, q):

for j in range(N):
dist2 = (
(pos[0, i] - pos[0, j]) ** 2
+ (pos[1, i] - pos[1, j]) ** 2
+ (pos[2, i] - pos[2, j]) ** 2
+ sig**2
(pos[0, i] - pos[0, j]) ** 2 + (pos[1, i] - pos[1, j]) ** 2 + (pos[2, i] - pos[2, j]) ** 2 + sig**2
)
contrib += q[j] * (pos[:, i] - pos[:, j]) / dist2**1.5

1 change: 1 addition & 0 deletions pySDC/implementations/problem_classes/generic_ND_FD.py
@@ -76,6 +76,7 @@ class GenericNDimFinDiff(ptype):
xvalues : np.1darray
Values of spatial grid.
"""

dtype_u = mesh
dtype_f = mesh

8 changes: 5 additions & 3 deletions pySDC/projects/Resilience/work_precision.py
@@ -1109,9 +1109,11 @@ def get_configs(mode, problem):
configurations[-1] = {
'strategies': [
ERKStrategy(useMPI=True),
ARKStrategy(useMPI=True)
if problem.__name__ in ['run_Schroedinger', 'run_AC']
else ESDIRKStrategy(useMPI=True),
(
ARKStrategy(useMPI=True)
if problem.__name__ in ['run_Schroedinger', 'run_AC']
else ESDIRKStrategy(useMPI=True)
),
],
'num_procs': 1,
}
Original file line number Diff line number Diff line change
@@ -19,9 +19,7 @@ def eval_jacobian(self, u):
"""

# noinspection PyTypeChecker
dfdu = self.A[1:-1, 1:-1] + sp.diags(
self.lambda0**2 - self.lambda0**2 * (self.nu + 1) * u**self.nu, offsets=0
)
dfdu = self.A[1:-1, 1:-1] + sp.diags(self.lambda0**2 - self.lambda0**2 * (self.nu + 1) * u**self.nu, offsets=0)

return dfdu
