Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent Curation from re-adding an existing sorting key with a new cu… #670

Merged
merged 2 commits into from
Dec 14, 2023

Conversation

samuelbray32
Copy link
Collaborator

Description

Calling the spikesorting_pipeline_populator() again with the same parameters after already running resulted in many duplicate entries. This was because insert_curation() called with an existing key will create a new entry with a iterated curation_id. This new entry then runs in all susequent steps of the pipeline, using up compute time and memory.

This PR avoids this by checking for existing entries in Curation for the sort key before calling insert_curation()

Checklist:

  • This PR should be accompanied by a release: (yes/no/unsure)
  • (If release) I have updated the CITATION.cff
  • I have updated the CHANGELOG.md
  • I have added/edited docs/notebooks to reflect the changes

@samuelbray32 samuelbray32 requested a review from CBroz1 October 27, 2023 17:41
@CBroz1
Copy link
Member

CBroz1 commented Oct 27, 2023

It looks like this check is already present in the insert curation func, which we're calling in the populator with the default parent curation id. Does this new check catch cases the sub-func doesn't?

@samuelbray32
Copy link
Collaborator Author

It looks like this check is already present in the insert curation func, which we're calling in the populator with the default parent curation id. Does this new check catch cases the sub-func doesn't?

Thanks for the catch. I'll start digging again

@samuelbray32
Copy link
Collaborator Author

Next idea:

Populating AutomaticCuration creates a new curation_id. If this function is run again, that curation_id will be incorporated in
curation_keys = [ {**k, "waveform_params_name": waveform_params_name} for k in (Curation() & sort_dict).fetch("KEY") ]

And then get run through the remaining pipeline steps.

Propose restricting this key list to curation_id=0 to avoid this. (My todo).

@samuelbray32 samuelbray32 marked this pull request as draft October 27, 2023 23:21
@edeno
Copy link
Collaborator

edeno commented Dec 14, 2023

Should this be closed or is this still being worked on?

@samuelbray32
Copy link
Collaborator Author

Should this be closed or is this still being worked on?

The last commit fixes the initial issue. This isn't updated for the v1 spikesorting pipeline though. If that doesn't need to happen in the current PR this is good to go

@edeno edeno marked this pull request as ready for review December 14, 2023 23:06
@edeno edeno merged commit c3d9c6f into master Dec 14, 2023
@edeno edeno deleted the fix_spikesorting_populator_redundancies branch December 14, 2023 23:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants