DM-41955: Remove unphysical diaSources from the output of detectAndMeasure #287

isullivan · 2023-12-01T17:31:40Z

No description provided.

parejkoj

You need some tests of this code. It won't do what you think, because of local argument scoping in the function.

I think this is ok here as a quick fix, but I still think we should consider removing these sources higher up (as an early measurement plugin, preferably), so that we're not doing any further math on them.

python/lsst/ip/diffim/detectAndMeasure.py

parejkoj · 2023-12-05T22:53:01Z

python/lsst/ip/diffim/detectAndMeasure.py

+        default=("base_PixelFlags_flag_offimage",
+                 "base_PixelFlags_flag_interpolatedCenterAll",
+                 "base_PixelFlags_flag_saturatedCenterAll",
+                 "base_PixelFlags_flag_crCenterAll",


I'm not sure about removing CR flagged sources; is our CR removal code good enough to leave a real source relatively ok? What about if the CR was in one of two snaps? I think we should be conservative on this one.

I removed all of the CenterAll flags for now pending further investigation to unblock this ticket.

python/lsst/ip/diffim/detectAndMeasure.py

parejkoj · 2023-12-05T23:00:34Z

python/lsst/ip/diffim/detectAndMeasure.py

+        nPurged = np.sum(~flags)
+        if nPurged > 0:
+            self.log.warning(f"Found and purged {nPurged} unphysical sources.")
+        diaSources = diaSources[flags].copy(deep=True)


This doesn't do what you think it does: the function argument is locally scoped, so this doesn't modify it outside the function. You have to return it and assign in the caller.

Good catch! That fixes the remaining APDB collisions (without needing Eric's and my validity_start change). I will update this, and write a unit test for the renamed purgeSources method.
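A minimal sketch of the scoping issue raised in this thread. Plain Python lists stand in for the source catalog, and the function names `purge_in_place_attempt` and `purge_and_return` are hypothetical illustrations, not code from `detectAndMeasure.py`.

```python
def purge_in_place_attempt(sources, keep):
    """Buggy pattern: rebinding the argument only changes the local name."""
    sources = [s for s, k in zip(sources, keep) if k]
    # The filtered list is discarded when the function returns;
    # the caller's catalog is untouched.


def purge_and_return(sources, keep):
    """Fixed pattern: return the filtered copy so the caller can assign it."""
    return [s for s, k in zip(sources, keep) if k]


catalog = ["src0", "src1", "src2"]
keep = [True, False, True]

purge_in_place_attempt(catalog, keep)
unchanged = list(catalog)  # snapshot: the buggy call removed nothing

catalog = purge_and_return(catalog, keep)  # the caller must assign the result
```

The same reasoning applies to `diaSources = diaSources[flags].copy(deep=True)`: the assignment creates a new local object, so the filtered catalog has to be returned and assigned in the caller.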

parejkoj

Suggestion on how to rework the test to catch the bug you'd had before, and some questions about squashing exceptions and logging.

python/lsst/ip/diffim/detectAndMeasure.py

parejkoj · 2023-12-08T23:13:10Z

python/lsst/ip/diffim/detectAndMeasure.py

+            try:
+                flags &= ~diaSources[flag]
+            except Exception as e:
+                self.log.warning("Could not apply source flag: %s", e)


I don't like this try/except; if you want to not raise on missing flags, you should only catch KeyError. More broadly, I think a missing flag should just raise: if someone configures their pipeline to expect a certain flag, it should be an error if it doesn't appear in the schema, otherwise it's hiding unintended behavior, like a typo in the flags list.

I understand your objection to a try/except here, and can certainly change this to a KeyError, but I would strongly prefer keeping it. My reasoning is that this should ideally not be finding and removing any bad sources, so I would rather it shout at people with a warning than have it break if the configured flag is missing.

I very strongly disagree. An error that occurs on every single quantum (which is what will happen if the config doesn't match the schema) should be an actual exception raised up the chain, not a warning that will get lost. If it's an error, processing will stop early and we'll know to fix the misconfiguration. If it's a warning, we could process a whole data release without knowing that we meant to filter on something but didn't.

I'm not sure what you mean by "should ideally not be finding and removing any bad sources"? We know there will be "bad sources" because difference imaging will produce totally spurious detections due to interpolation. Unless we remove those sources much earlier (which is hard to do, given the design of the plugin system), we know they'll show up here and have to be removed.

There's actually a special exception type for configuration/consistency errors that can only be caught during execution (whenever possible we want to catch them in config validation or task construction, of course): lsst.pipe.base.InvalidQuantumError. I don't know the context well enough to say for certain whether it's the right call, but it seems potentially relevant.

After a side discussion with @parejkoj , lsst.pipe.base.InvalidQuantumError in the task __init__ looks like the right solution here.
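A rough sketch of the agreed approach: check the configured flags against the schema once, at task construction, and raise instead of warning on every quantum. To stay runnable without the LSST stack, `InvalidQuantumErrorStandIn` is a hypothetical stand-in for `lsst.pipe.base.InvalidQuantumError`, and `DetectAndMeasureSketch`, its arguments, and the schema contents are assumptions made for illustration only.

```python
class InvalidQuantumErrorStandIn(Exception):
    """Hypothetical stand-in for lsst.pipe.base.InvalidQuantumError."""


class DetectAndMeasureSketch:
    """Toy task: only the config/schema consistency check is modelled."""

    def __init__(self, bad_source_flags, schema_names):
        # Fail at construction if a configured flag is absent from the schema,
        # rather than logging a warning that could be lost during processing.
        missing = [f for f in bad_source_flags if f not in schema_names]
        if missing:
            raise InvalidQuantumErrorStandIn(
                f"Configured bad-source flags missing from schema: {missing}"
            )
        self.bad_source_flags = tuple(bad_source_flags)


# Illustrative schema field names (assumed for this sketch).
schema = {"base_PixelFlags_flag_offimage", "base_PixelFlags_flag_edge"}

ok_task = DetectAndMeasureSketch(["base_PixelFlags_flag_offimage"], schema)

try:
    # A typo in the configured flag name is caught immediately.
    DetectAndMeasureSketch(["base_PixelFlags_flag_offimge"], schema)
    typo_detected = False
except InvalidQuantumErrorStandIn:
    typo_detected = True
```

Raising here stops processing early on a misconfiguration, which is the behavior argued for above.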

python/lsst/ip/diffim/detectAndMeasure.py

tests/test_detectAndMeasure.py

parejkoj · 2023-12-08T23:52:49Z

tests/test_detectAndMeasure.py

+        badDiaSrc1 = ~bbox.contains(diaSources.getX(), diaSources.getY())
+        nBad1 = np.count_nonzero(badDiaSrc1)
+        self.assertEqual(nBad1, nSetBad)
+        diaSources2 = detectionTask.removeBadSources(diaSources)


This test wouldn't have caught the bug I described before (local scoping of the catalog), since it's not looking at the output of run. What if instead you set all of difference.mask to BAD, add the badCenterAll flag to the bad flags list, and then just call detectionTask.run and confirm that the source is not included in the output? It's crude, but your other tests should be checking that "normal" sources behave correctly, and we don't really care why a given bad flag is set, just that it is.

I disagree. If this test were implemented as detectionTask.removeBadSources(diaSources) then it would fail because the flagged sources are not removed from diaSources in this scope, as you pointed out. Now that the API is for .removeBadSources to return a source catalog, it seems appropriate to use that here.

If the goal is to test removeBadSources by itself as a unit test, then you don't need any of the other infrastructure here. All you need is to create a trivial SourceCatalog with a couple of flags (test_flag_1, etc.), call removeBadSources on that and check that the things you didn't want aren't there. There's no need to create an image or run detection or anything.

If the goal is to test that run behaves correctly, then you need to actually test run itself. This test as written would not catch run not assigning the output of removeBadSources to something that gets returned. If this is the goal of this test you don't need to run detectAndMeasure in a way that doesn't remove anything: you're already testing that with the other tests in this class.

The code comment here was out of date. With 20 sky sources in the test, there are some with unphysical coordinates which are removed in the first half of the test. I will fix the code comment, and I can also add an additional step that shows that some of the resulting diaSources are unphysical if badSourceFlags=[].
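The two testing goals distinguished in this thread can be sketched as follows. A list of dicts stands in for a trivial SourceCatalog, and `remove_bad_sources` and `run` are simplified, hypothetical stand-ins for the task methods, not the real implementations.

```python
def remove_bad_sources(sources, bad_flags):
    """Return only the sources with none of ``bad_flags`` set."""
    return [s for s in sources if not any(s[f] for f in bad_flags)]


def run(sources, bad_flags):
    """Toy ``run``: the returned catalog must be the filtered one."""
    dia_sources = remove_bad_sources(sources, bad_flags)
    return dia_sources


sources = [
    {"id": 1, "test_flag_1": False, "test_flag_2": False},
    {"id": 2, "test_flag_1": True, "test_flag_2": False},
    {"id": 3, "test_flag_1": False, "test_flag_2": True},
]

# Unit test of the helper alone: no image or detection step is needed.
kept = remove_bad_sources(sources, ["test_flag_1", "test_flag_2"])
kept_ids = [s["id"] for s in kept]

# Test through run: this is the check that would catch run failing to
# assign or return the output of the helper.
run_ids = [s["id"] for s in run(sources, ["test_flag_1"])]
```

Only the second check exercises the path from the helper to the task output, which is where the earlier scoping bug lived.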

Downstream code will break in multiple places if there is a single diaSource with a NaN value from .getCentroid()

parejkoj · 2023-12-21T23:08:20Z

python/lsst/ip/diffim/detectAndMeasure.py

@@ -376,6 +383,45 @@ def processResults(self, science, matchedTemplate, difference, sources, table,

        return measurementResults

+    def removeBadSources(self, diaSources):


Oh, we should probably make this _removeBadSources: we're trying to make the "private" parts of Tasks more explicitly so.

parejkoj · 2023-12-21T23:12:02Z

python/lsst/ip/diffim/detectAndMeasure.py

+        # Use slot_Centroid_x/y here instead of getX() method, since the former
+        #  works on non-contiguous source tables and the latter does not.
+        centroidFlag = np.isfinite(diaSources["slot_Centroid_x"]) & np.isfinite(diaSources["slot_Centroid_y"])
+        nBad = np.count_nonzero(~centroidFlag)
+        if nBad > 0:
+            self.log.info("Found and removed %d unphysical sources with non-finite centroid.", nBad)
+            self.metadata.add("nRemovedBadCentroidSources", nBadTotal)
+            nBadTotal += nBad
+            selector &= centroidFlag


You should be able to replace this entire thing with adding slot_Centroid_flag to the bad flags list. See DM-7102: that flag should be catching all cases where the centroid positions are non-finite.

I want to explicitly catch non-finite values of diaSources["slot_Centroid_x"] (and y) here since they break downstream code. As @parejkoj and I discussed, this check should be unnecessary after DM-42313 merges since these NaN centroids would then be picked up by the flag base_PixelFlags_flag_offimage.
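For reference, a minimal sketch of the finite-centroid check under discussion. The diff applies `np.isfinite` to catalog columns; this version uses `math.isfinite` on plain lists so that it runs without NumPy or the LSST stack, and the helper name `finite_centroid_mask` is hypothetical.

```python
import math


def finite_centroid_mask(xs, ys):
    """True where both centroid coordinates are finite (not NaN or inf)."""
    return [math.isfinite(x) and math.isfinite(y) for x, y in zip(xs, ys)]


# Illustrative centroid values: one good source and three unphysical ones.
xs = [10.5, float("nan"), 3.0, float("inf")]
ys = [20.0, 5.0, float("nan"), 1.0]

mask = finite_centroid_mask(xs, ys)
n_bad = mask.count(False)  # number of sources that would be removed
```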

parejkoj · 2023-12-22T02:11:32Z

Thinking about this more, why can't we just use a ScienceSourceSelector, like Calibrate does, with flags.bad and requireFiniteRaDec?

isullivan · 2023-12-22T22:58:47Z

ScienceSourceSelector could be configured and used here to remove the flagged and unphysical sources, but I do not think it is a good fit. It includes multiple types of selections (such as on flux, signal to noise, isolation, etc.) that we do not want people to accidentally configure here, and it would not be obvious to the user that they were being used to throw out good sources. I think the log message it prints would also be unhelpful: "Selected 102/104 sources" does not tell the user what is being selected or why.

With DM-42313 merged, these should be caught by `base_PixelFlags_flag_offimage`

parejkoj

Thanks for the cleanups!

parejkoj requested changes Dec 5, 2023

isullivan force-pushed the tickets/DM-41955 branch 2 times, most recently from ab0b980 to 31bc42c Compare December 8, 2023 00:02

parejkoj requested changes Dec 8, 2023

Remove unphysical diaSources from the output of detectAndMeasure

dc7c7af

isullivan force-pushed the tickets/DM-41955 branch from dfe3531 to fa1262a Compare December 15, 2023 19:54

isullivan added 2 commits December 15, 2023 12:08

Respond to review

2e33967

Remove diaSources with NaN centroids

e60df45

Downstream code will break in multiple places if there is a single diaSource with a NaN value from .getCentroid()

isullivan force-pushed the tickets/DM-41955 branch from fa1262a to e60df45 Compare December 15, 2023 20:08

parejkoj reviewed Dec 21, 2023

isullivan added 3 commits December 22, 2023 17:47

Clean up variable and method names

2d89abf

Add consistency check of schema and config in Task init

7ab489c

Remove extra check for NaN centroids.

3c9f4bd

With DM-42313 merged, these should be caught by `base_PixelFlags_flag_offimage`

parejkoj approved these changes Jan 4, 2024

isullivan merged commit 8096f4f into main Jan 4, 2024
2 checks passed

isullivan deleted the tickets/DM-41955 branch January 4, 2024 00:17
