Prevent Curation from re-adding an existing sorting key with a new cu… #670

samuelbray32 · 2023-10-27T17:41:34Z

Description

Calling spikesorting_pipeline_populator() again with the same parameters after an earlier run created many duplicate entries. This happened because insert_curation(), when called with an existing key, creates a new entry with an incremented curation_id. This new entry then runs through all subsequent steps of the pipeline, using up compute time and memory.

This PR avoids this by checking for existing entries in Curation for the sort key before calling insert_curation().
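
To illustrate the duplication and the guard described above, here is a minimal in-memory sketch. It does not use the real Curation table or the actual insert_curation() signature; the list-of-dicts table, the `sort_key` field, and the helper `insert_curation_if_absent` are all hypothetical stand-ins.

```python
# Hypothetical stand-in for the Curation table: a plain list of row dicts.
# None of this is the Spyglass or DataJoint API.


def insert_curation(table, sort_key):
    """Mimic the described behavior: always add a row with the next curation_id."""
    existing_ids = [row["curation_id"] for row in table if row["sort_key"] == sort_key]
    new_id = max(existing_ids, default=-1) + 1
    table.append({"sort_key": sort_key, "curation_id": new_id})
    return new_id


def insert_curation_if_absent(table, sort_key):
    """Guarded variant: reuse the existing entry instead of adding a new one."""
    for row in table:
        if row["sort_key"] == sort_key:
            return row["curation_id"]
    return insert_curation(table, sort_key)


if __name__ == "__main__":
    unguarded = []
    insert_curation(unguarded, "sort_a")
    insert_curation(unguarded, "sort_a")  # second call adds curation_id 1
    print(len(unguarded))  # 2

    guarded = []
    insert_curation_if_absent(guarded, "sort_a")
    insert_curation_if_absent(guarded, "sort_a")  # second call is a no-op
    print(len(guarded))  # 1
```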

Checklist:

This PR should be accompanied by a release: (yes/no/unsure)
(If release) I have updated the CITATION.cff
I have updated the CHANGELOG.md
I have added/edited docs/notebooks to reflect the changes

CBroz1 · 2023-10-27T19:56:23Z

It looks like this check is already present in the insert curation func, which we're calling in the populator with the default parent curation id. Does this new check catch cases the sub-func doesn't?

samuelbray32 · 2023-10-27T21:26:41Z

> It looks like this check is already present in the insert curation func, which we're calling in the populator with the default parent curation id. Does this new check catch cases the sub-func doesn't?

Thanks for the catch. I'll start digging again.

samuelbray32 · 2023-10-27T22:08:00Z

Next idea:

Populating AutomaticCuration creates a new curation_id. If this function is run again, that curation_id will be included in

    curation_keys = [
        {**k, "waveform_params_name": waveform_params_name}
        for k in (Curation() & sort_dict).fetch("KEY")
    ]

and will then be run through the remaining pipeline steps.

Propose restricting this key list to curation_id=0 to avoid this. (My todo).
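
A rough sketch of that restriction, written against plain dictionaries rather than the real tables. The function name `initial_curation_keys` and the sample keys are hypothetical. In the populator itself the same effect would presumably come from adding a `curation_id` condition to the Curation query, but that is an assumption and not taken from the merged diff.

```python
# Hypothetical illustration: keep only the initial curation (curation_id == 0)
# when building the keys passed on to later pipeline steps.


def initial_curation_keys(fetched_keys, waveform_params_name):
    """Attach waveform_params_name to each key whose curation_id is 0."""
    return [
        {**key, "waveform_params_name": waveform_params_name}
        for key in fetched_keys
        if key["curation_id"] == 0
    ]


if __name__ == "__main__":
    # Stand-in for what (Curation() & sort_dict).fetch("KEY") might return
    # after AutomaticCuration has already added curation_id 1.
    fetched = [
        {"sort_name": "example_sort", "curation_id": 0},
        {"sort_name": "example_sort", "curation_id": 1},
    ]
    for key in initial_curation_keys(fetched, "example_waveform_params"):
        print(key)
```

With this filter, a second run of the populator would pass on only the curation_id 0 entry, so the automatically generated curation is not fed back through the pipeline.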

edeno · 2023-12-14T19:46:26Z

Should this be closed or is this still being worked on?

samuelbray32 · 2023-12-14T22:11:39Z

> Should this be closed or is this still being worked on?

The last commit fixes the initial issue. This isn't updated for the v1 spikesorting pipeline, though. If that doesn't need to happen in the current PR, this is good to go.

Prevent Curation from re-adding an existing sorting key with a new cu…

1919427

…ration_id

samuelbray32 requested a review from CBroz1 October 27, 2023 17:41

samuelbray32 marked this pull request as draft October 27, 2023 23:21

Only run populator on initial curation entries

2c86a5b

edeno marked this pull request as ready for review December 14, 2023 23:06

edeno approved these changes Dec 14, 2023

edeno merged commit c3d9c6f into master Dec 14, 2023

edeno deleted the fix_spikesorting_populator_redundancies branch December 14, 2023 23:07
