1.1 ref sim ci workflow (#514)
* added catalog file for 1.1 reference simulations to /data

* possible approach to packaging reference simulation data files

* moved files to better location

* fixed up filepaths for 1.1_uniform reference simulation in /data

* attempting to add benchmarks and updated ref_sim_1.1 in /data

* benchmark needs more work but almost there

* re-commented line

* preliminary commit for benchmark workflow. Still need to set up 1.1_uniform for remote downloading.

* pytest should now hopefully work remotely -- file on gdrive

* trying slight change to see if all other tests run to completion properly

* updated testsuite to install requests

* test change

* change and mpiexec

* tried possibly fixing caching

* benchmark action stuff

* minor updates to see if alert works and if -np 4 speeds up run

* test 3

* just alert

* test failed assert

* swap back to ==

* swapping back

* small TODO

* started adding new ref sim tests

* formatting

* added 1.1_gauss

* got 1.1 uniform and gauss working, and resolved warnings

* cosmetic update to testsuite

* slight changes to test_run_ref.py

* swapped to no longer writing the UVData objects and just returning one

* changed gaussian beam to have proper syntax as well

* preliminary attempt at sequential execution using matrix-lock

* hopefully fixes the issue

* 2nd attempt

* had bad version of matrix-lock

* removed matrix-lock

* attempt to fix up caching

* dummy commit

* swapped back to triplicate for tests -- no longer doing command line input to pytest

* added 1.3 sim files to data

* added 1.2 simulations to data

* google drive links

* swapped workflow simulation run order

* Swapped to downloading files from the BDR -- will trim workflow runtime down to ~1 hour using pedantic. Added mwa uvbeam sim files to data; not yet tested.

* figured out approach to parametrize all reference simulation tests using workflow. Still need to integrate mwa sims. Added pedantic benchmarking. Need to determine best approach to setting up workflow matrix.

* filled out workflow refsim names, and added 1.1_mwa reference simulation to pytest and workflow.

* changed the Brown Digital Repository file downloading to use a collection approach, added some print statements, added a line for formatting

* removed 1.2_mwa files, minor comments change

* Intermediate commit while attempting to switch benchmarking approach to use artifacts. Need to re-integrate the benchmark action, and create structure for concatenating benchmark output and uploading it.

* fixed syntax error

* commented out line to be re-added later

* failed to comment out another line that skipped second part of workflow

* test for python script which concatenates benchmarks

* intermediate commit

* first attempt at gh-pages approach

* dummy change

* preliminary approach to only pushing results of benchmark if push to 'main', and running only if Tests finishes successfully

* removed dependence on Tests as that workflow seems to be failing independently

* hopefully fixed yaml syntax

* added initial output statistics to the reference simulation comparisons. Currently only asserts '=='; could implement other checks or even an absolute tolerance

* re-added setting history to be equal

* fix

* all current ref sims should run now, and implemented hopefully more robust downloading

* commented out the 0 tolerance sim comparison check

* added dummy counter (#513)

* added dummy counter

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* only one TODO left to resolve in test_run_ref (determining how strict object comparison should be)

* cleaned up compare_post_benchmark.yaml a bit. now need to test running compare_post_benchmark using completion of another workflow (pull request and push)

* updated approach to computing num_mismatched and fixed style

* swapped compare_post_benchmark to run after Tests

* minor edits to compare_post_benchmark so hopefully it runs

* not sure why linking to tests isn't working -- swapping back

* edited README / environment.yaml to discuss/require requests and pytest-benchmark

* edited README

* swapping to have defaults expected for pull request

* changed underscore to hyphen to match style

* Tentative README update -- should probably add a section on regression testing / CI in the developer section of the docs, and amend the README to link to it

* made data comparison same as np.testing.assert_allclose defaults, removed some commented out code and comments

* fixed typos in ci workflow

* fixed formatting for a line

* Further updated the README

* switching back to multiple ids

* refactored job matrix

* swapped discussion to docs for pytest regression testing

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
burdorfmitchell and pre-commit-ci[bot] authored Dec 19, 2024
1 parent f5e5228 commit 8cb4e68
Showing 21 changed files with 773 additions and 0 deletions.
202 changes: 202 additions & 0 deletions .github/workflows/compare-post-benchmark.yaml
@@ -0,0 +1,202 @@
name: "Run Compare Post Benchmark"
on: [push, pull_request]
#on:
# workflow_run:
# workflows: [Tests]
# types:
# - in_progress

concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
cancel-in-progress: true

jobs:
benchmark:
# Job that clones pyuvsim@main, sets up a conda environment with the necessary dependencies,
# then locally installs pyuvsim and additionally installs pytest-benchmark and requests from
# PYPI. Runs in parallel as a matrix with input individual reference simulations. The "id"
# input is passed as a flag to pytest which parametrizes the reference simulation test
# function. pytest-benchmark output is saved as an artifact with its current workflow run
# and attempt as part of name key.
#
# Link to discussion of artifacts
# https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/storing-and-sharing-data-from-a-workflow#about-workflow-artifacts
# https://github.com/actions/upload-artifact
#
# uncomment when Tests is working and done testing
#if: github.event.workflow_run.conclusion == 'success'
name: Performance Benchmark
env:
ENV_NAME: pyuvsim_tests_mpich
PYTHON: "3.12"
runs-on: ubuntu-latest

strategy:
# all jobs should run in parallel
matrix:
id: [1.1_uniform, 1.1_gauss, 1.1_mwa, 1.2_uniform, 1.2_gauss, 1.3_uniform, 1.3_gauss]

defaults:
run:
# Adding -l {0} helps ensure conda can be found properly.
shell: bash -l {0}
steps:
- uses: actions/checkout@main

- name: Setup Miniforge
uses: conda-incubator/setup-miniconda@v3
with:
miniforge-version: latest
python-version: ${{ env.PYTHON }}
environment-file: ci/${{ env.ENV_NAME }}.yaml
activate-environment: ${{ env.ENV_NAME }}
run-post: false

- name: Conda Info
run: |
conda info -a
conda list
PYVER=`python -c "import sys; print('{:d}.{:d}'.format(sys.version_info.major, sys.version_info.minor))"`
if [[ $PYVER != $PYTHON ]]; then
exit 1;
fi
# pip install benchmark utility and requests from PYPI, and local install pyuvsim
- name: Install
run: |
pip install pytest-benchmark
pip install requests
pip install .
# make the artifacts directory, then run pytest using mpiexec with only 1 node and core, specifying the
# reference simulation to run using the "refsim" flag. Save the pytest benchmark output in artifacts/
# with a sufficiently unique name
- name: Run benchmark
run: |
mkdir artifacts/
mpiexec -n 1 -np 1 pytest --refsim=${{ matrix.id }} --benchmark-only --benchmark-json artifacts/output_${{ matrix.id }}.json -s
# upload the benchmark output as an artifact with name key corresponding to the current
# workflow run and attempt only store artifacts for 1 day
- name: Upload result artifacts
uses: actions/upload-artifact@v4
with:
name: ${{ github.run_id }}-${{ github.run_attempt }}-${{ matrix.id }}
path: artifacts/
if-no-files-found: error
include-hidden-files: true
retention-days: 1

  collate-post-benchmark:
    # Job that loads the saved artifacts corresponding to the specific workflow run and attempt id,
    # then creates a net benchmark output file named output.json with a python script action. The
    # net benchmark file should still be accurate, except that the explicit machine info will be
    # mostly lost. The net benchmark file is then fed to github-action-benchmark, which compares the
    # current benchmark output with the latest data in the gh-pages branch. If the current workflow
    # is a push to main, github-action-benchmark then pushes the current benchmark output to
    # gh-pages. If a performance regression occurs during the benchmark comparison, a comment with
    # the benchmark comparison output is made on the workflow and this job fails.
    #
    # Inspired by this workflow by yewstack/yew and the github-action-benchmark README:
    # https://github.com/yewstack/yew/blob/master/.github/workflows/benchmark.yml
    # https://github.com/yewstack/yew/blob/master/.github/workflows/post-benchmark.yml
    # https://github.com/benchmark-action/github-action-benchmark
    # https://github.com/actions/download-artifact

    name: Concatenate and Post Benchmark Results
    needs: benchmark
    runs-on: ubuntu-latest

    steps:
      # Checkout repo for the github-action-benchmark action
      - uses: actions/checkout@v4

      # setup python
      - uses: actions/setup-python@v4
        with:
          python-version: '3.x'

      # only downloads artifacts from the current workflow run and attempt via the pattern matching;
      # loads the saved benchmark artifacts from running the benchmark matrix into artifacts/
      - name: Download result artifacts
        uses: actions/download-artifact@v4
        with:
          github-token: "${{ secrets.GITHUB_TOKEN }}"
          pattern: ${{ github.run_id }}-${{ github.run_attempt }}-*
          merge-multiple: true
          path: artifacts

      # prints directory info recursively, removable
      # (could maybe swap this to exa or lsd because base ls doesn't do tree)
      - name: Display structure of downloaded files
        run: ls -R

      # approach to putting all the benchmark output in one file, with the machine/run info
      # of only one of the pytest-benchmark runs. Loads all the benchmark output artifact files,
      # then takes the benchmark timing info from the other n-1 files and adds it to the first file.
      # With this approach, the benchmark comparison output is a single table, and we only
      # get one alert comment in the workflow.
      - uses: jannekem/run-python-script-action@v1
        with:
          script: |
            import os
            import json
            # make a list of paths to artifact files, excluding hidden files
            filepath_arr = [os.path.join('artifacts', bench) for bench in os.listdir('artifacts') if not bench.startswith('.')]
            print(filepath_arr)
            output_jsons = []
            # open each filepath in filepath_arr, load it as json, and append it to the list
            for filepath in filepath_arr:
                with open(filepath) as f:
                    output_jsons.append(json.load(f))
            # choose the first json as the one to modify to contain all the benchmark data
            net_json = output_jsons[0]
            # iterate through the other jsons and append their benchmark data to net_json
            for json_out in output_jsons[1:]:
                net_json['benchmarks'].append(json_out['benchmarks'][0])
            # save net_json as output.json in the current working directory
            with open('output.json', 'w') as f:
                json.dump(net_json, f)
      # Print the github event_name and ref_name, and the boolean check for whether gh-pages should be updated
      - name: Print Event, Ref, and Upload Boolean
        run: |
          echo "Event Name: ${{ github.event_name }}"
          echo "Ref Name: ${{ github.ref_name }}"
          echo "Update gh-pages: ${{ github.event_name == 'push' && github.ref_name == 'main' }}"
      # Compares the data from the specified "output-file-path" with the latest data
      # from the gh-pages branch. If a performance regression occurs, fails the job
      # and alerts. Will only comment if a performance regression has occurred.
      # NOTE: it is important that this does not modify gh-pages on pull request
      # https://github.com/benchmark-action/github-action-benchmark?tab=readme-ov-file#caveats
      # This only updates gh-pages if a push to main occurs
      - name: Compare benchmarks
        uses: benchmark-action/github-action-benchmark@v1
        with:
          # What benchmark tool the output.json came from
          tool: 'pytest'
          # Where the output from the benchmark tool is stored
          output-file-path: output.json
          # Where the previous data file is stored
          # should fail consistently
          alert-threshold: "120%"
          # Workflow will fail when an alert happens
          fail-on-alert: true
          # Comment on the PR if the branch is not a fork
          comment-on-alert: true
          # Enable Job Summary for PRs
          summary-always: true
          # Always leave a comment
          #comment-always: true
          github-token: ${{ secrets.GITHUB_TOKEN }}
          # Push and deploy GitHub pages branch automatically
          auto-push: ${{ github.event_name == 'push' && github.ref_name == 'main' }}
          save-data-file: ${{ github.event_name == 'push' && github.ref_name == 'main' }}
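
The workflow above assumes a `--refsim` pytest option that parametrizes the reference simulation test function; the conftest/test code that provides it is part of this commit but not shown on this page. As a hypothetical sketch only (names and layout are assumptions, not the actual pyuvsim code), the plumbing could look like:

```python
# conftest.py -- hypothetical sketch of a repeatable --refsim pytest option


def pytest_addoption(parser):
    # allow e.g. `pytest --refsim=1.1_uniform --refsim=1.2_gauss --benchmark-only`
    parser.addoption(
        "--refsim",
        action="append",
        default=[],
        help="reference simulation id(s) to benchmark",
    )


def pytest_generate_tests(metafunc):
    # parametrize any test that requests a `refsim` argument with the ids passed on the CLI
    if "refsim" in metafunc.fixturenames:
        metafunc.parametrize("refsim", metafunc.config.getoption("refsim"))
```

With a layout like this, each matrix job runs exactly one parametrized test case (presumably wrapping the simulation call with pytest-benchmark, e.g. `benchmark.pedantic(..., rounds=1, iterations=1)` given the "pedantic" commits above), so each `--benchmark-json` artifact holds a single entry -- which is why the collation script can take `json_out['benchmarks'][0]` from every downloaded file.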
4 changes: 4 additions & 0 deletions README.md
@@ -127,6 +127,10 @@ One other package, pytest-xdist, is not required, but can be used to speed up running
the test suite by running tests in parallel. To use it call pytest with the
```-n auto``` option.

Two additional packages, pytest-benchmark and requests, are required if you need to locally run
single-core regression testing of the reference simulations. For more realistic benchmarking at
any level of scale, and for instructions on regression testing with pytest, see [Benchmarking](https://pyuvsim.readthedocs.io/en/latest/developers.html#benchmarking).

One way to ensure you have all the needed packages is to use the included
`environment.yaml` file to create a new environment that will
contain all the optional dependencies along with dependencies required for
30 changes: 30 additions & 0 deletions docs/developers.rst
@@ -31,10 +31,40 @@ For more details, see `reference_simulations/README.md <https://github.com/Radio
Benchmarking
------------

Benchmarking Simulations
~~~~~~~~~~~~~~~~~~~~~~~~

The ``benchmarking`` directory contains tools to test the runtime and memory usage of large simulations. There is no requirement to check benchmarks for pull requests, but it's a good idea to make sure changes don't drastically alter the runtime. The file BENCHMARKS.log keeps a record of performance over time.

The README file in the ``benchmarking`` directory gives more details on how to do benchmarking.

Note that the benchmarking scripts are designed only for SLURM systems.

For more details, see `benchmarking/README.md <https://github.com/RadioAstronomySoftwareGroup/pyuvsim/tree/main/benchmarking>`_.

Running a Reference Simulation with pytest-benchmark
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To run a single-core regression test of the reference simulations, you need to specify a reference
simulation with the ``refsim`` flag and use ``--benchmark-only``. Additionally, you need to use
mpiexec to run pytest as follows:

.. code-block:: bash

    # use mpiexec to run pytest specifying one core
    > mpiexec -n 1 -np 1 pytest --refsim=1.1_uniform --benchmark-only

Here "1.1_uniform" is the specific reference simulation being tested. You can use the ``refsim``
flag multiple times to parametrize multiple reference simulations: ``--refsim=refsim1 --refsim=refsim2``.

We run single-core regression tests of the available reference simulations with pytest and
pytest-benchmark via our GitHub CI workflow on every push or pull request, to ensure output and
runtime consistency. As we only run the simulations with a single core, the benchmarking aspect of
these tests only tracks serial runtime and is not a test of any parallelism.

The available ``refsim`` values are:

* 1.1_uniform
* 1.1_gauss
* 1.1_mwa
* 1.2_uniform
* 1.2_gauss
* 1.3_uniform
* 1.3_gauss
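
The regression test itself is not shown in this view, but based on the commit history above (returning a `UVData` object rather than writing it, equalizing `history`, and using the `np.testing.assert_allclose` defaults), the comparison it performs is roughly of the following form. This is a sketch only -- the reference filename and paths are assumed, and it must be run under mpiexec as described in the docs:

```python
# Sketch (not the actual test_run_ref.py) of a reference-simulation comparison.
import numpy as np
from pyuvdata import UVData

import pyuvsim

# run the simulation in memory instead of writing the output to disk
uv_new = pyuvsim.run_uvsim("obsparam_ref_1.1_gauss.yaml", return_uv=True)

# reference output previously downloaded, e.g. from the Brown Digital Repository
uv_ref = UVData.from_file("ref_1.1_gauss.uvh5")  # assumed filename

# histories legitimately differ between runs, so equalize them before comparing
uv_ref.history = uv_new.history

# same defaults as np.testing.assert_allclose (rtol=1e-7, atol=0)
np.testing.assert_allclose(uv_ref.data_array, uv_new.data_array)
```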
2 changes: 2 additions & 0 deletions environment.yml
@@ -15,6 +15,8 @@ dependencies:
- pytest
- pytest-cov>=5.0.0
- pytest-xdist
- pytest-benchmark
- requests
- pyuvdata>=3.1.2
- pyyaml>=5.4.1
- scipy>=1.8
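
`requests` is added to the environment because the reference simulation tests download their comparison files from the Brown Digital Repository (per the commit history above). The actual download helper is not shown in this diff; a minimal sketch of streaming a large file to disk with `requests` (the URL and function name are placeholders) might be:

```python
# Hypothetical sketch of fetching a reference simulation file with requests;
# not the actual pyuvsim download code.
import requests


def download_reference_file(url: str, dest: str, chunk_size: int = 1024 * 1024) -> None:
    # stream the response to disk so large UVH5 files are never held fully in memory
    with requests.get(url, stream=True, timeout=60) as resp:
        resp.raise_for_status()
        with open(dest, "wb") as f:
            for chunk in resp.iter_content(chunk_size=chunk_size):
                f.write(chunk)
```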
6 changes: 6 additions & 0 deletions src/pyuvsim/data/baseline_lite.csv
@@ -0,0 +1,6 @@
Name Number BeamID E N U

ANT1 0 0 0.0000 0.0000 0.0000
ANT2 1 0 50.000 0.0000 0.0000
ANT3 2 0 0.0000 -50.00 0.0000
ANT4 3 0 26.000 -26.00 0.0000
6 changes: 6 additions & 0 deletions src/pyuvsim/data/bl_lite_gauss.yaml
@@ -0,0 +1,6 @@
beam_paths:
  0 : !AnalyticBeam
    class: GaussianBeam
    sigma: 0.08449
telescope_location: (-30.72152777777791, 21.428305555555557, 1073.0000000093132)
telescope_name: BLLITE
5 changes: 5 additions & 0 deletions src/pyuvsim/data/bl_lite_uniform.yaml
@@ -0,0 +1,5 @@
beam_paths:
  0 : !AnalyticBeam
    class: UniformBeam
telescope_location: (-30.72152777777791, 21.428305555555557, 1073.0000000093132)
telescope_name: BLLITE
7 changes: 7 additions & 0 deletions src/pyuvsim/data/mwa88_nocore_config_MWA.yaml
@@ -0,0 +1,7 @@
beam_paths:
  0: !UVBeam
    filename: mwa_full_embedded_element_pattern.h5
    path_variable: pyuvsim.data.DATA_PATH
    pixels_per_deg: 1
telescope_location: (-30.72152777777791, 21.428305555555557, 1073.0000000093132)
telescope_name: MWA
6 changes: 6 additions & 0 deletions src/pyuvsim/data/mwa88_nocore_config_gauss.yaml
@@ -0,0 +1,6 @@
beam_paths:
  0 : !AnalyticBeam
    class: GaussianBeam
    sigma: 0.08449
telescope_location: (-30.72152777777791, 21.428305555555557, 1073.0000000093132)
telescope_name: MWA
13 changes: 13 additions & 0 deletions src/pyuvsim/data/test_catalogs/letter_R_12pt_2458098.38824015.txt
@@ -0,0 +1,13 @@
SOURCE_ID RA_ICRS [deg] Dec_ICRS [deg] Flux [Jy] Frequency [Hz]
HERATEST5 59.37045 -28.778843 1 100000000.0
HERATEST6 57.08925 -28.74223 1 100000000.0
HERATEST12 59.38125 -27.778828 1 100000000.0
HERATEST13 58.25100 -27.765359 1 100000000.0
HERATEST21 59.39115 -26.779049 1 100000000.0
HERATEST22 58.27125 -26.765736 1 100000000.0
HERATEST23 57.15150 -26.743624 1 100000000.0
HERATEST30 59.40120 -25.779269 1 100000000.0
HERATEST31 57.18090 -25.744495 1 100000000.0
HERATEST39 59.41035 -24.779242 1 100000000.0
HERATEST40 58.30965 -24.766704 1 100000000.0
HERATEST41 57.20820 -24.744905 1 100000000.0
44 changes: 44 additions & 0 deletions src/pyuvsim/data/test_catalogs/mock_catalog_heratext_2458098.38824015.txt
@@ -0,0 +1,44 @@
SOURCE_ID RA_ICRS [deg] Dec_ICRS [deg] Flux [Jy] Frequency [Hz]
HERATEST0 68.48535 -28.559917 1 100000000.0
HERATEST1 66.21075 -28.669444 1 100000000.0
HERATEST2 63.93300 -28.742866 1 100000000.0
HERATEST3 62.79210 -28.76516 1 100000000.0
HERATEST4 61.65180 -28.779055 1 100000000.0
HERATEST5 59.37045 -28.778843 1 100000000.0
HERATEST6 57.08925 -28.74223 1 100000000.0
HERATEST7 54.81165 -28.668388 1 100000000.0
HERATEST8 52.53720 -28.558443 1 100000000.0
HERATEST9 68.41275 -27.564489 1 100000000.0
HERATEST10 66.15885 -27.671835 1 100000000.0
HERATEST11 63.90090 -27.743365 1 100000000.0
HERATEST12 59.38125 -27.778828 1 100000000.0
HERATEST13 58.25100 -27.765359 1 100000000.0
HERATEST14 54.86385 -27.670802 1 100000000.0
HERATEST15 52.60995 -27.563048 1 100000000.0
HERATEST16 68.34300 -26.568897 1 100000000.0
HERATEST17 67.22640 -26.625843 1 100000000.0
HERATEST18 66.10875 -26.674063 1 100000000.0
HERATEST19 63.87120 -26.744231 1 100000000.0
HERATEST20 62.75160 -26.766141 1 100000000.0
HERATEST21 59.39115 -26.779049 1 100000000.0
HERATEST22 58.27125 -26.765736 1 100000000.0
HERATEST23 57.15150 -26.743624 1 100000000.0
HERATEST24 54.91395 -26.673054 1 100000000.0
HERATEST25 53.79645 -26.624634 1 100000000.0
HERATEST26 52.67985 -26.56749 1 100000000.0
HERATEST27 68.27595 -25.573194 1 100000000.0
HERATEST28 66.06075 -25.676232 1 100000000.0
HERATEST29 63.84210 -25.745088 1 100000000.0
HERATEST30 59.40120 -25.779269 1 100000000.0
HERATEST31 57.18090 -25.744495 1 100000000.0
HERATEST32 54.96225 -25.675246 1 100000000.0
HERATEST33 52.74720 -25.571818 1 100000000.0
HERATEST34 68.21160 -24.577407 1 100000000.0
HERATEST35 66.01455 -24.678446 1 100000000.0
HERATEST36 63.81510 -24.745485 1 100000000.0
HERATEST37 62.71350 -24.76709 1 100000000.0
HERATEST38 61.61280 -24.779435 1 100000000.0
HERATEST39 59.41035 -24.779242 1 100000000.0
HERATEST40 58.30965 -24.766704 1 100000000.0
HERATEST41 57.20820 -24.744905 1 100000000.0
HERATEST42 53.90925 -24.630704 1 100000000.0
@@ -0,0 +1,3 @@
SOURCE_ID RA_ICRS [deg] Dec_ICRS [deg] Flux [Jy] Frequency [Hz]
TWOPOINT0 108.05159 -0.9894784 1 100000000.0
TWOPOINT1 291.49730 -1.0600652 1 100000000.0
20 changes: 20 additions & 0 deletions src/pyuvsim/data/test_config/obsparam_ref_1.1_gauss.yaml
@@ -0,0 +1,20 @@
filing:
  outdir: '.'
  outfile_name: 'ref_1.1_gauss'
  output_format: 'uvh5'
freq:
  Nfreqs: 1
  channel_width: 80000.0
  start_freq: 100000000.0
sources:
  catalog: '../test_catalogs/mock_catalog_heratext_2458098.38824015.txt'
telescope:
  array_layout: '../mwa_nocore_layout.csv'
  telescope_config_name: '../mwa88_nocore_config_gauss.yaml'
time:
  Ntimes: 1
  integration_time: 11.0
  start_time: 2458098.38824015
ordering:
  conjugation_convention: ant1<ant2
  blt_order: [time, baseline]