1.1 ref sim ci workflow (#514)
* added catalog file for 1.1 reference simulations to /data

* possible approach to packaging reference simulation data files

* moved files to better location

* fixed up filepaths for 1.1_uniform reference simulation in /data

* attempting to add benchmarks and updated ref_sim_1.1 in /data

* benchmark needs more work but almost there

* re-commented line

* preliminary commit for benchmark workflow. Still need to set up 1.1_uniform for remote downloading.

* pytest should now hopefully work remotely -- file on gdrive

* trying slight change to see if all other tests run to completion properly

* updated testsuite to install requests

* test change

* change and mpiexec

* tried possibly fixing caching

* benchmark action stuff

* minor updates to see if alert works and if -np 4 speeds up run

* test 3

* just alert

* test failed assert

* swap back to ==

* swapping back

* small TODO

* started adding new ref sim tests

* formatting

* added 1.1_gauss

* got 1.1 uniform and gauss working, and resolved warnings

* cosmetic update to testsuite

* slight changes to test_run_ref.py

* swapped to no longer writing the UVData objects and just returning one

* changed gaussian beam to have proper syntax as well

* preliminary attempt at sequential execution using matrix-lock

* hopefully fixes the issue

* 2nd attempt

* had bad version of matrix-lock

* removed matrix-lock

* attempt to fix up caching

* dummy commit

* swapped back to triplicate for tests -- no longer doing command line input to pytest

* added 1.3 sim files to data

* added 1.2 simulations to data

* google drive links

* swapped workflow simulation run order

* Swapped to downloading files from the BDR -- will trim workflow runtime down to ~1 hour using pedantic. Added mwa uvbeam sim files to data; not yet tested.

* figured out approach to parametrize all reference simulation tests using workflow. Still need to integrate mwa sims. Added pedantic benchmarking. Need to determine best approach to setting up workflow matrix.

* filled out workflow refsim names, and added 1.1_mwa reference simulation to pytest and workflow.

* changed the Brown Digital Repository file downloading to use a collection approach, added some print statements, added a line for formatting

* removed 1.2_mwa files, minor comments change

* Intermediate commit while attempting to switch benchmarking approach to use artifacts. Need to re-integrate the benchmark action, and create structure for concatenating benchmark output and uploading it.

* fixed syntax error

* commented out line to be re-added later

* failed to comment out another line that skipped second part of workflow

* test for python script which concatenates benchmarks

* intermediate commit

* first attempt at gh-pages approach

* dummy change

* preliminary approach to only pushing results of benchmark if push to 'main', and running only if Tests finishes successfully

* removed dependence on Tests as that workflow seems to be failing independently

* hopefully fixed yaml syntax

* added initial output statistics to the reference simulation comparisons. Currently only asserts '=='; could implement other checks or even an absolute tolerance

* re-added setting history to be equal

* fix

* all current ref sims should run now, and implemented hopefully more robust downloading

* commented out the 0 tolerance sim comparison check

* added dummy counter (#513)

* added dummy counter

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* only one TODO left to resolve in test_run_ref (determining how strict object comparison should be)

* cleaned up compare_post_benchmark.yaml a bit. now need to test running compare_post_benchmark using completion of another workflow (pull request and push)

* updated approach to computing num_mismatched and fixed style

* swapped compare_post_benchmark to run after Tests

* minor edits to compare_post_benchmark so hopefully it runs

* not sure why linking to tests isn't working -- swapping back

* edited README / environment.yaml to discuss/require requests and pytest-benchmark

* edited README

* swapping to have defaults expected for pull request

* changed underscore to hyphen to match style

* Tentative README update -- should probably add a section on regression testing / CI in the developer section of the docs, and amend the README to link to it

* made data comparison same as np.testing.assert_allclose defaults, removed some commented out code and comments

* fixed typos in ci workflow

* fixed formatting for a line

* Further updated the README

* switching back to multiple ids

* refactored job matrix

* swapped discussion to docs for pytest regression testing

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
burdorfmitchell and pre-commit-ci[bot] authored Dec 19, 2024
1 parent f5e5228 commit 8cb4e68
Showing 21 changed files with 773 additions and 0 deletions.
202 changes: 202 additions & 0 deletions .github/workflows/compare-post-benchmark.yaml
@@ -0,0 +1,202 @@
name: "Run Compare Post Benchmark"
on: [push, pull_request]
#on:
# workflow_run:
# workflows: [Tests]
# types:
# - in_progress

concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
cancel-in-progress: true

jobs:
benchmark:
# Job that clones pyuvsim@main, sets up a conda environment with the necessary dependencies,
# then locally installs pyuvsim and additionally installs pytest-benchmark and requests from
# PYPI. Runs in parallel as a matrix with input individual reference simulations. The "id"
# input is passed as a flag to pytest which parametrizes the reference simulation test
# function. pytest-benchmark output is saved as an artifact with its current workflow run
# and attempt as part of name key.
#
# Link to discussion of artifacts
# https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/storing-and-sharing-data-from-a-workflow#about-workflow-artifacts
# https://github.com/actions/upload-artifact
#
# uncomment when Tests is working and done testing
#if: github.event.workflow_run.conclusion == 'success'
name: Performance Benchmark
env:
ENV_NAME: pyuvsim_tests_mpich
PYTHON: "3.12"
runs-on: ubuntu-latest

strategy:
# all jobs should run in parallel
matrix:
id: [1.1_uniform, 1.1_gauss, 1.1_mwa, 1.2_uniform, 1.2_gauss, 1.3_uniform, 1.3_gauss]

defaults:
run:
# Adding -l {0} helps ensure conda can be found properly.
shell: bash -l {0}
steps:
- uses: actions/checkout@main

- name: Setup Miniforge
uses: conda-incubator/setup-miniconda@v3
with:
miniforge-version: latest
python-version: ${{ env.PYTHON }}
environment-file: ci/${{ env.ENV_NAME }}.yaml
activate-environment: ${{ env.ENV_NAME }}
run-post: false

- name: Conda Info
run: |
conda info -a
conda list
PYVER=`python -c "import sys; print('{:d}.{:d}'.format(sys.version_info.major, sys.version_info.minor))"`
if [[ $PYVER != $PYTHON ]]; then
exit 1;
fi
# pip install benchmark utility and requests from PYPI, and local install pyuvsim
- name: Install
run: |
pip install pytest-benchmark
pip install requests
pip install .
# make the artifacts directory, then run pytest using mpiexec with only 1 node and core, specifying the
# reference simulation to run using the "refsim" flag. Save the pytest benchmark output in artifacts/
# with a sufficiently unique name
- name: Run benchmark
run: |
mkdir artifacts/
mpiexec -n 1 -np 1 pytest --refsim=${{ matrix.id }} --benchmark-only --benchmark-json artifacts/output_${{ matrix.id }}.json -s
# upload the benchmark output as an artifact with name key corresponding to the current
# workflow run and attempt only store artifacts for 1 day
- name: Upload result artifacts
uses: actions/upload-artifact@v4
with:
name: ${{ github.run_id }}-${{ github.run_attempt }}-${{ matrix.id }}
path: artifacts/
if-no-files-found: error
include-hidden-files: true
retention-days: 1

  collate-post-benchmark:
    # Job that loads the saved artifacts corresponding to the specific workflow run and attempt id,
    # then creates a net benchmark output file named output.json with a python script action. The
    # net benchmark file should still be accurate, except that the explicit machine info will be
    # mostly lost. The net benchmark file is then fed to github-action-benchmark, which compares the
    # current benchmark output with the latest data in the gh-pages branch. If the current workflow
    # is a push to main, github-action-benchmark then pushes the current benchmark output to
    # gh-pages. If a performance regression occurs during the benchmark comparison, a comment with
    # the benchmark comparison output is made on the workflow and this job fails.
    #
    # Inspired by this workflow by yewstack/yew and the github-action-benchmark README:
    # https://github.com/yewstack/yew/blob/master/.github/workflows/benchmark.yml
    # https://github.com/yewstack/yew/blob/master/.github/workflows/post-benchmark.yml
    # https://github.com/benchmark-action/github-action-benchmark
    # https://github.com/actions/download-artifact

    name: Concatenate and Post Benchmark Results
    needs: benchmark
    runs-on: ubuntu-latest

    steps:
      # Checkout repo for the github-action-benchmark action
      - uses: actions/checkout@v4

      # setup python
      - uses: actions/setup-python@v4
        with:
          python-version: '3.x'

      # only downloads artifacts from the current workflow run and attempt via the pattern matching;
      # loads the saved benchmark artifacts from running the benchmark matrix into artifacts/
      - name: Download result artifacts
        uses: actions/download-artifact@v4
        with:
          github-token: "${{ secrets.GITHUB_TOKEN }}"
          pattern: ${{ github.run_id }}-${{ github.run_attempt }}-*
          merge-multiple: true
          path: artifacts

      # prints directory info recursively, removable
      # (could maybe swap this to exa or lsd because base ls doesn't do tree)
      - name: Display structure of downloaded files
        run: ls -R

      # approach to putting all the benchmark output in one file, with the machine/run info
      # of only one of the pytest-benchmark runs. Loads all the benchmark output artifact files,
      # then takes the benchmark timing info from the other n-1 files and adds it to the first file.
      # With this approach, the benchmark comparison output is a single table, and we only
      # get one alert comment in the workflow.
      - uses: jannekem/run-python-script-action@v1
        with:
          script: |
            import os
            import json
            # make a list of paths to artifact files, excluding hidden files
            filepath_arr = [os.path.join('artifacts', bench) for bench in os.listdir('artifacts') if not bench.startswith('.')]
            print(filepath_arr)
            output_jsons = []
            # open each filepath in filepath_arr, load it as json, and append it to the list
            for filepath in filepath_arr:
                with open(filepath) as f:
                    output_jsons.append(json.load(f))
            # choose the first json as the one to modify to contain all the benchmark data
            net_json = output_jsons[0]
            # iterate through the other jsons and append their benchmark data to net_json
            for json_out in output_jsons[1:]:
                net_json['benchmarks'].append(json_out['benchmarks'][0])
            # save net_json as output.json in the current working directory
            with open('output.json', 'w') as f:
                json.dump(net_json, f)
      # Print the github event_name and ref_name, and the boolean check for whether gh-pages should be updated
      - name: Print Event, Ref, and Upload Boolean
        run: |
          echo "Event Name: ${{ github.event_name }}"
          echo "Ref Name: ${{ github.ref_name }}"
          echo "Update gh-pages: ${{ github.event_name == 'push' && github.ref_name == 'main' }}"
      # Compares the data from the specified "output-file-path" with the latest data
      # from the gh-pages branch. If a performance regression occurs, fails the job
      # and alerts. Will only comment if a performance regression has occurred.
      # NOTE: it is important that this does not modify gh-pages on pull request
      # https://github.com/benchmark-action/github-action-benchmark?tab=readme-ov-file#caveats
      # This only updates gh-pages if a push to main occurs
      - name: Compare benchmarks
        uses: benchmark-action/github-action-benchmark@v1
        with:
          # What benchmark tool the output.json came from
          tool: 'pytest'
          # Where the output from the benchmark tool is stored
          output-file-path: output.json
          # Where the previous data file is stored
          # should fail consistently
          alert-threshold: "120%"
          # Workflow will fail when an alert happens
          fail-on-alert: true
          # Comment on the PR if the branch is not a fork
          comment-on-alert: true
          # Enable Job Summary for PRs
          summary-always: true
          # Always leave a comment
          #comment-always: true
          github-token: ${{ secrets.GITHUB_TOKEN }}
          # Push and deploy GitHub pages branch automatically
          auto-push: ${{ github.event_name == 'push' && github.ref_name == 'main' }}
          save-data-file: ${{ github.event_name == 'push' && github.ref_name == 'main' }}
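
The workflow above assumes a `--refsim` pytest option that parametrizes the reference simulation test function; the conftest/test code that provides it is part of this commit but not shown on this page. As a hypothetical sketch only (names and layout are assumptions, not the actual pyuvsim code), the plumbing could look like:

```python
# conftest.py -- hypothetical sketch of a repeatable --refsim pytest option


def pytest_addoption(parser):
    # allow e.g. `pytest --refsim=1.1_uniform --refsim=1.2_gauss --benchmark-only`
    parser.addoption(
        "--refsim",
        action="append",
        default=[],
        help="reference simulation id(s) to benchmark",
    )


def pytest_generate_tests(metafunc):
    # parametrize any test that requests a `refsim` argument with the ids passed on the CLI
    if "refsim" in metafunc.fixturenames:
        metafunc.parametrize("refsim", metafunc.config.getoption("refsim"))
```

With a layout like this, each matrix job runs exactly one parametrized test case (presumably wrapping the simulation call with pytest-benchmark, e.g. `benchmark.pedantic(..., rounds=1, iterations=1)` given the "pedantic" commits above), so each `--benchmark-json` artifact holds a single entry -- which is why the collation script can take `json_out['benchmarks'][0]` from every downloaded file.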
4 changes: 4 additions & 0 deletions README.md
@@ -127,6 +127,10 @@ One other package, pytest-xdist, is not required, but can be used to speed up running
the test suite by running tests in parallel. To use it call pytest with the
```-n auto``` option.

Two additional packages, pytest-benchmark and requests, are required if you need to locally run
single-core regression testing of the reference simulations. For more realistic benchmarking at
any level of scale, and for instructions on regression testing with pytest, see [Benchmarking](https://pyuvsim.readthedocs.io/en/latest/developers.html#benchmarking).

One way to ensure you have all the needed packages is to use the included
`environment.yaml` file to create a new environment that will
contain all the optional dependencies along with dependencies required for
30 changes: 30 additions & 0 deletions docs/developers.rst
@@ -31,10 +31,40 @@ For more details, see `reference_simulations/README.md <https://github.com/Radio
Benchmarking
------------

Benchmarking Simulations
~~~~~~~~~~~~~~~~~~~~~~~~

The ``benchmarking`` directory contains tools to test the runtime and memory usage of large simulations. There is no requirement to check benchmarks for pull requests, but it's a good idea to make sure changes don't drastically alter the runtime. The file BENCHMARKS.log keeps a record of performance over time.

The README file in the ``benchmarking`` directory gives more details on how to do benchmarking.

Note that the benchmarking scripts are designed only for SLURM systems.

For more details, see `benchmarking/README.md <https://github.com/RadioAstronomySoftwareGroup/pyuvsim/tree/main/benchmarking>`_.

Running a Reference Simulation with pytest-benchmark
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To run a single-core regression test of the reference simulations, you need to specify a reference
simulation with the ``refsim`` flag and use ``--benchmark-only``. Additionally, you need to use
mpiexec to run pytest as follows:

.. code-block:: bash

    # use mpiexec to run pytest specifying one core
    > mpiexec -n 1 -np 1 pytest --refsim=1.1_uniform --benchmark-only

Here "1.1_uniform" is the specific reference simulation being tested. You can use the ``refsim``
flag multiple times to parametrize multiple reference simulations: ``--refsim=refsim1 --refsim=refsim2``.

We run single-core regression tests of the available reference simulations with pytest and
pytest-benchmark via our GitHub CI workflow on every push or pull request, to ensure output and
runtime consistency. As we only run the simulations with a single core, the benchmarking aspect of
these tests only tracks serial runtime and is not a test of any parallelism.

The available ``refsim`` values are:

* 1.1_uniform
* 1.1_gauss
* 1.1_mwa
* 1.2_uniform
* 1.2_gauss
* 1.3_uniform
* 1.3_gauss
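
The regression test itself is not shown in this view, but based on the commit history above (returning a `UVData` object rather than writing it, equalizing `history`, and using the `np.testing.assert_allclose` defaults), the comparison it performs is roughly of the following form. This is a sketch only -- the reference filename and paths are assumed, and it must be run under mpiexec as described in the docs:

```python
# Sketch (not the actual test_run_ref.py) of a reference-simulation comparison.
import numpy as np
from pyuvdata import UVData

import pyuvsim

# run the simulation in memory instead of writing the output to disk
uv_new = pyuvsim.run_uvsim("obsparam_ref_1.1_gauss.yaml", return_uv=True)

# reference output previously downloaded, e.g. from the Brown Digital Repository
uv_ref = UVData.from_file("ref_1.1_gauss.uvh5")  # assumed filename

# histories legitimately differ between runs, so equalize them before comparing
uv_ref.history = uv_new.history

# same defaults as np.testing.assert_allclose (rtol=1e-7, atol=0)
np.testing.assert_allclose(uv_ref.data_array, uv_new.data_array)
```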
2 changes: 2 additions & 0 deletions environment.yml
@@ -15,6 +15,8 @@ dependencies:
- pytest
- pytest-cov>=5.0.0
- pytest-xdist
- pytest-benchmark
- requests
- pyuvdata>=3.1.2
- pyyaml>=5.4.1
- scipy>=1.8
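
`requests` is added to the environment because the reference simulation tests download their comparison files from the Brown Digital Repository (per the commit history above). The actual download helper is not shown in this diff; a minimal sketch of streaming a large file to disk with `requests` (the URL and function name are placeholders) might be:

```python
# Hypothetical sketch of fetching a reference simulation file with requests;
# not the actual pyuvsim download code.
import requests


def download_reference_file(url: str, dest: str, chunk_size: int = 1024 * 1024) -> None:
    # stream the response to disk so large UVH5 files are never held fully in memory
    with requests.get(url, stream=True, timeout=60) as resp:
        resp.raise_for_status()
        with open(dest, "wb") as f:
            for chunk in resp.iter_content(chunk_size=chunk_size):
                f.write(chunk)
```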
6 changes: 6 additions & 0 deletions src/pyuvsim/data/baseline_lite.csv
@@ -0,0 +1,6 @@
Name Number BeamID E N U

ANT1 0 0 0.0000 0.0000 0.0000
ANT2 1 0 50.000 0.0000 0.0000
ANT3 2 0 0.0000 -50.00 0.0000
ANT4 3 0 26.000 -26.00 0.0000
6 changes: 6 additions & 0 deletions src/pyuvsim/data/bl_lite_gauss.yaml
@@ -0,0 +1,6 @@
beam_paths:
  0 : !AnalyticBeam
    class: GaussianBeam
    sigma: 0.08449
telescope_location: (-30.72152777777791, 21.428305555555557, 1073.0000000093132)
telescope_name: BLLITE
5 changes: 5 additions & 0 deletions src/pyuvsim/data/bl_lite_uniform.yaml
@@ -0,0 +1,5 @@
beam_paths:
  0 : !AnalyticBeam
    class: UniformBeam
telescope_location: (-30.72152777777791, 21.428305555555557, 1073.0000000093132)
telescope_name: BLLITE
7 changes: 7 additions & 0 deletions src/pyuvsim/data/mwa88_nocore_config_MWA.yaml
@@ -0,0 +1,7 @@
beam_paths:
  0: !UVBeam
    filename: mwa_full_embedded_element_pattern.h5
    path_variable: pyuvsim.data.DATA_PATH
    pixels_per_deg: 1
telescope_location: (-30.72152777777791, 21.428305555555557, 1073.0000000093132)
telescope_name: MWA
6 changes: 6 additions & 0 deletions src/pyuvsim/data/mwa88_nocore_config_gauss.yaml
@@ -0,0 +1,6 @@
beam_paths:
  0 : !AnalyticBeam
    class: GaussianBeam
    sigma: 0.08449
telescope_location: (-30.72152777777791, 21.428305555555557, 1073.0000000093132)
telescope_name: MWA
13 changes: 13 additions & 0 deletions src/pyuvsim/data/test_catalogs/letter_R_12pt_2458098.38824015.txt
@@ -0,0 +1,13 @@
SOURCE_ID RA_ICRS [deg] Dec_ICRS [deg] Flux [Jy] Frequency [Hz]
HERATEST5 59.37045 -28.778843 1 100000000.0
HERATEST6 57.08925 -28.74223 1 100000000.0
HERATEST12 59.38125 -27.778828 1 100000000.0
HERATEST13 58.25100 -27.765359 1 100000000.0
HERATEST21 59.39115 -26.779049 1 100000000.0
HERATEST22 58.27125 -26.765736 1 100000000.0
HERATEST23 57.15150 -26.743624 1 100000000.0
HERATEST30 59.40120 -25.779269 1 100000000.0
HERATEST31 57.18090 -25.744495 1 100000000.0
HERATEST39 59.41035 -24.779242 1 100000000.0
HERATEST40 58.30965 -24.766704 1 100000000.0
HERATEST41 57.20820 -24.744905 1 100000000.0
44 changes: 44 additions & 0 deletions src/pyuvsim/data/test_catalogs/mock_catalog_heratext_2458098.38824015.txt
@@ -0,0 +1,44 @@
SOURCE_ID RA_ICRS [deg] Dec_ICRS [deg] Flux [Jy] Frequency [Hz]
HERATEST0 68.48535 -28.559917 1 100000000.0
HERATEST1 66.21075 -28.669444 1 100000000.0
HERATEST2 63.93300 -28.742866 1 100000000.0
HERATEST3 62.79210 -28.76516 1 100000000.0
HERATEST4 61.65180 -28.779055 1 100000000.0
HERATEST5 59.37045 -28.778843 1 100000000.0
HERATEST6 57.08925 -28.74223 1 100000000.0
HERATEST7 54.81165 -28.668388 1 100000000.0
HERATEST8 52.53720 -28.558443 1 100000000.0
HERATEST9 68.41275 -27.564489 1 100000000.0
HERATEST10 66.15885 -27.671835 1 100000000.0
HERATEST11 63.90090 -27.743365 1 100000000.0
HERATEST12 59.38125 -27.778828 1 100000000.0
HERATEST13 58.25100 -27.765359 1 100000000.0
HERATEST14 54.86385 -27.670802 1 100000000.0
HERATEST15 52.60995 -27.563048 1 100000000.0
HERATEST16 68.34300 -26.568897 1 100000000.0
HERATEST17 67.22640 -26.625843 1 100000000.0
HERATEST18 66.10875 -26.674063 1 100000000.0
HERATEST19 63.87120 -26.744231 1 100000000.0
HERATEST20 62.75160 -26.766141 1 100000000.0
HERATEST21 59.39115 -26.779049 1 100000000.0
HERATEST22 58.27125 -26.765736 1 100000000.0
HERATEST23 57.15150 -26.743624 1 100000000.0
HERATEST24 54.91395 -26.673054 1 100000000.0
HERATEST25 53.79645 -26.624634 1 100000000.0
HERATEST26 52.67985 -26.56749 1 100000000.0
HERATEST27 68.27595 -25.573194 1 100000000.0
HERATEST28 66.06075 -25.676232 1 100000000.0
HERATEST29 63.84210 -25.745088 1 100000000.0
HERATEST30 59.40120 -25.779269 1 100000000.0
HERATEST31 57.18090 -25.744495 1 100000000.0
HERATEST32 54.96225 -25.675246 1 100000000.0
HERATEST33 52.74720 -25.571818 1 100000000.0
HERATEST34 68.21160 -24.577407 1 100000000.0
HERATEST35 66.01455 -24.678446 1 100000000.0
HERATEST36 63.81510 -24.745485 1 100000000.0
HERATEST37 62.71350 -24.76709 1 100000000.0
HERATEST38 61.61280 -24.779435 1 100000000.0
HERATEST39 59.41035 -24.779242 1 100000000.0
HERATEST40 58.30965 -24.766704 1 100000000.0
HERATEST41 57.20820 -24.744905 1 100000000.0
HERATEST42 53.90925 -24.630704 1 100000000.0
@@ -0,0 +1,3 @@
SOURCE_ID RA_ICRS [deg] Dec_ICRS [deg] Flux [Jy] Frequency [Hz]
TWOPOINT0 108.05159 -0.9894784 1 100000000.0
TWOPOINT1 291.49730 -1.0600652 1 100000000.0
20 changes: 20 additions & 0 deletions src/pyuvsim/data/test_config/obsparam_ref_1.1_gauss.yaml
@@ -0,0 +1,20 @@
filing:
  outdir: '.'
  outfile_name: 'ref_1.1_gauss'
  output_format: 'uvh5'
freq:
  Nfreqs: 1
  channel_width: 80000.0
  start_freq: 100000000.0
sources:
  catalog: '../test_catalogs/mock_catalog_heratext_2458098.38824015.txt'
telescope:
  array_layout: '../mwa_nocore_layout.csv'
  telescope_config_name: '../mwa88_nocore_config_gauss.yaml'
time:
  Ntimes: 1
  integration_time: 11.0
  start_time: 2458098.38824015
ordering:
  conjugation_convention: ant1<ant2
  blt_order: [time, baseline]