Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JP-3755: Remove unused options and add unit tests to SOSS extraction algorithm #9000

Draft
wants to merge 69 commits into
base: main
Choose a base branch
from

Conversation

emolter
Copy link
Collaborator

@emolter emolter commented Dec 10, 2024

Resolves JP-3755

Closes #8804

This PR addresses reducing the technical debt of the SOSS ATOCA algorithm. There are a lot of changed lines, so I'm annotating this PR with comments during a self-review. The high-level changes are:

  • Removed many unused functions, i.e., ones that were never called during a pipeline run
  • Removed many optional (and occasionally required) parameters from functions when they were either never changed from their default value, or always set to the same value (AND were not set-able by the user or by parameter reference files)
  • Added a unit testing suite, including a detector model (wave grid, trace profiles, kernels, throughput functions, mock data, masks, etc) scaled down by a factor of 10 in each spatial dimension. Added unit tests for most helper functions.
  • Fixed various bugs, usually found when unit-testing function edge cases. I'm attempting to annotate all of these, so it's clear where actual changes to output have occurred. (These obviously require the most scrutiny).
  • Many small refactors for clarity

There are several outstanding items to complete before this can be merged:

  • Validate that changes in behavior (e.g., differences in retrieved Tikhonov regularization factor) are ok, i.e., not adversely affecting science results.
  • Get several questions about intended / current behavior answered by instrument team members.
  • Check whether some removed functions should be retained in the pipeline version of ATOCA, e.g., because they are particularly helpful for INS work on SOSS data (even though not used during pipeline runs). edit: Joe is unaware of any usages like this- as far as he knows, everyone uses the pipeline-level interface only.
  • Add documentation of the algorithm to the pipeline readthedocs page. edit: Nadia suggests this should be done by INS folks, and that it's fine to merge this without the documentation updates.

Tasks

  • request a review from someone specific, to avoid making the maintainers review every PR
  • add a build milestone, i.e. Build 11.3 (use the latest build if not sure)
  • Does this PR change user-facing code / API? (if not, label with no-changelog-entry-needed)
    • write news fragment(s) in changes/: echo "changed something" > changes/<PR#>.<changetype>.rst (see below for change types)
    • update or add relevant tests
    • update relevant docstrings and / or docs/ page
    • start a regression test and include a link to the running job (click here for instructions)
      • Do truth files need to be updated ("okified")?
        • after the reviewer has approved these changes, run okify_regtests to update the truth files
  • if a JIRA ticket exists, make sure it is resolved properly
news fragment change types...
  • changes/<PR#>.general.rst: infrastructure or miscellaneous change
  • changes/<PR#>.docs.rst
  • changes/<PR#>.stpipe.rst
  • changes/<PR#>.datamodels.rst
  • changes/<PR#>.scripts.rst
  • changes/<PR#>.fits_generator.rst
  • changes/<PR#>.set_telescope_pointing.rst
  • changes/<PR#>.pipeline.rst

stage 1

  • changes/<PR#>.group_scale.rst
  • changes/<PR#>.dq_init.rst
  • changes/<PR#>.emicorr.rst
  • changes/<PR#>.saturation.rst
  • changes/<PR#>.ipc.rst
  • changes/<PR#>.firstframe.rst
  • changes/<PR#>.lastframe.rst
  • changes/<PR#>.reset.rst
  • changes/<PR#>.superbias.rst
  • changes/<PR#>.refpix.rst
  • changes/<PR#>.linearity.rst
  • changes/<PR#>.rscd.rst
  • changes/<PR#>.persistence.rst
  • changes/<PR#>.dark_current.rst
  • changes/<PR#>.charge_migration.rst
  • changes/<PR#>.jump.rst
  • changes/<PR#>.clean_flicker_noise.rst
  • changes/<PR#>.ramp_fitting.rst
  • changes/<PR#>.gain_scale.rst

stage 2

  • changes/<PR#>.assign_wcs.rst
  • changes/<PR#>.badpix_selfcal.rst
  • changes/<PR#>.msaflagopen.rst
  • changes/<PR#>.nsclean.rst
  • changes/<PR#>.imprint.rst
  • changes/<PR#>.background.rst
  • changes/<PR#>.extract_2d.rst
  • changes/<PR#>.master_background.rst
  • changes/<PR#>.wavecorr.rst
  • changes/<PR#>.srctype.rst
  • changes/<PR#>.straylight.rst
  • changes/<PR#>.wfss_contam.rst
  • changes/<PR#>.flatfield.rst
  • changes/<PR#>.fringe.rst
  • changes/<PR#>.pathloss.rst
  • changes/<PR#>.barshadow.rst
  • changes/<PR#>.photom.rst
  • changes/<PR#>.pixel_replace.rst
  • changes/<PR#>.resample_spec.rst
  • changes/<PR#>.residual_fringe.rst
  • changes/<PR#>.cube_build.rst
  • changes/<PR#>.extract_1d.rst
  • changes/<PR#>.resample.rst

stage 3

  • changes/<PR#>.assign_mtwcs.rst
  • changes/<PR#>.mrs_imatch.rst
  • changes/<PR#>.tweakreg.rst
  • changes/<PR#>.skymatch.rst
  • changes/<PR#>.exp_to_source.rst
  • changes/<PR#>.outlier_detection.rst
  • changes/<PR#>.tso_photometry.rst
  • changes/<PR#>.stack_refs.rst
  • changes/<PR#>.align_refs.rst
  • changes/<PR#>.klip.rst
  • changes/<PR#>.spectral_leak.rst
  • changes/<PR#>.source_catalog.rst
  • changes/<PR#>.combine_1d.rst
  • changes/<PR#>.ami.rst

other

  • changes/<PR#>.wfs_combine.rst
  • changes/<PR#>.white_light.rst
  • changes/<PR#>.cube_skymatch.rst
  • changes/<PR#>.engdb_tools.rst
  • changes/<PR#>.guider_cds.rst

@melanieclarke melanieclarke added this to the Build 11.3 milestone Dec 12, 2024
jwst/extract_1d/soss_extract/atoca.py Show resolved Hide resolved
jwst/extract_1d/soss_extract/atoca.py Show resolved Hide resolved
jwst/extract_1d/soss_extract/atoca.py Show resolved Hide resolved
jwst/extract_1d/soss_extract/atoca.py Show resolved Hide resolved
jwst/extract_1d/soss_extract/atoca.py Show resolved Hide resolved
jwst/extract_1d/soss_extract/soss_syscor.py Show resolved Hide resolved
jwst/extract_1d/soss_extract/soss_utils.py Show resolved Hide resolved
jwst/extract_1d/soss_extract/soss_utils.py Show resolved Hide resolved
jwst/extract_1d/soss_extract/soss_utils.py Show resolved Hide resolved
Comment on lines +110 to +117
# In both orders, the errors on the replaced pixels are roughly
# half of the errors on the original good pixels
# There are enough replaced pixels here (~30) that small-number statistics cannot account for this
# The reason is because the code chooses the lower error between the two nearest-flux
# data points, and since the errors in our tests are uncorrelated with the flux values,
# this leads to a factor-of-2 decrease
# It's not clear to me that picking the smaller of two error values is the right thing to do
# but that behavior is documented
Copy link
Collaborator Author

@emolter emolter Dec 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering if this is actually the intended behavior or not, because it seems like a questionable choice to me (although admittedly I don't know very much). Any opinions @hover2pi ?

@emolter emolter requested review from hover2pi and tapastro December 17, 2024 20:25
@emolter
Copy link
Collaborator Author

emolter commented Dec 20, 2024

diff-3755-release-0
diff-3755-release-1
diff-3755-release-2
diff-3755-release-3
diff-3755-release-4
diff-3755-release-5

@hover2pi I'm attaching the first of a few comparisons I plan to make between the output spectra from this PR branch vs the current release branch (jwst==1.16.1, which corresponds to DMS build 11.1.1). The test data here were some SOSS data I had lying around from fixing #8166. The attached 6 spectra show three integrations (indices 71, 72, and 73, where 72 contains the data causing the problem from the aforementioned issue). So the filename diff-3755-release-0 is the order 1 spectrum for integration 71, diff-3755-release-1 is the order 2 spectrum for integration 71, diff-3755-release-2 is the order 1 spectrum for integration 72, etc.

The upshot here is that the typical differences between my PR branch and the current main branch are typically of order parts per ten thousand. The residuals are a bit larger at very long wavelengths (beyond 2.5 um) and it's not initially obvious to me why, but the differences are still what I would consider very small. Let me know your thoughts, and whether you think I should investigate anything in these residuals further or just deem them "close enough".

I plan to try this with the SUBSTRIP96 data you mentioned on #8983 next.

Random side note, the regularization factors (both initial guess and best-fitting solution) are larger in the PR branch by around half an order of magnitude and sometime 1 order of magnitude. I don't think the results should be sensitive to the specific choice of regularization factor; the code tests in 1/4 order-of-mag increments, and the likelihoods seem very similar between adjacent factors. This was what I was concerned about initially RE numerical differences, but as shown above it doesn't seem to matter much for the outputs.

@emolter
Copy link
Collaborator Author

emolter commented Dec 20, 2024

substrip96-spec0
substrip96-spec1
substrip96-spec2
substrip96-spec3

Same plots for some substrip96 data, these are the first two integrations from jw03596001001_03102_00001-seg001. They should be interpreted the same way; first plot shows integration 0 order 1, second plot shows integration 0 order 2, third shows int 1 order 1, 4th shows int 1 order 2.

For order 1, same story as above: the differences are very small, except for the flux drop that is still present for jwst==1.16.1 but was fixed by your previous PR.

The differences for order 2 are much, much more substantial. It's not clear which of these looks more like the expected spectrum. I did make one change to the "Model the remaining part of order 2" portion of the code that may have caused this, which set some 0-valued pixels to NaN. I'll look into this further of course, but perhaps that was the wrong thing to do.

Copy link
Contributor

@hover2pi hover2pi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@emolter Everything looks good to me! I tried to respond to all your comments that asked questions. I do need to do a little reading on how the ATOCA algorithm works from Antoine's paper and then may have better answers. Ultimately, if the results are within our tolerances after removing a whole bunch of stuff, I think you were successful. We'll be revisiting several aspects of the code in the coming year, particularly by replacing the specprofile and speckernel reference files with pastasoss functionality. As it stands, I think it's in great shape. Thanks!

@hover2pi
Copy link
Contributor

hover2pi commented Jan 9, 2025

@emolter Regarding the differences in the extracted order 2 spectra from the SUBSTRIP96 data between the two branches, I'll definitely take a closer look. It would be great to understand why it changed but ultimately order 2 is not really used from that data. Between the heavy contamination from order 1 at long wavelengths and the order 2 throughput dropping to almost 0 around 1.2um, it's not too worrisome.

@emolter
Copy link
Collaborator Author

emolter commented Jan 10, 2025

@@ -277,7 +276,7 @@ def find_spectral_order_index(refmodel, order):
return -1


def get_soss_traces(refmodel, pwcpos, order, subarray, interp=True):
def _get_soss_traces(refmodel, pwcpos, order, subarray):
Copy link
Contributor

@tapastro tapastro Jan 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hover2pi Is there any expectation that external users may wish to use pastasoss but not ATOCA? If so, would they use this function, or is the NIRISS team still planning to direct users to the pastasoss standalone package?

If there is a chance users may wish to call this function independently, we may want to reconsider the name change.

@tapastro
Copy link
Contributor

Pending some small discussions and clean regression tests, this looks good to me! 👏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reduce technical debt of ATOCA/SOSS extraction code
4 participants