Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix merging error #91

Merged
merged 24 commits into from
Apr 18, 2024
Merged

Fix merging error #91

merged 24 commits into from
Apr 18, 2024

Conversation

danilexn
Copy link
Member

@danilexn danilexn commented Jan 18, 2024

This PR fixes #69.

There were several bugs contributing to the MissingInputException when trying to run sample merging:

  • in assert_valid from project_df.py, SpacemakeError was thrown when R1 and R2 was empty, even for merged samples (where these will be empty by default)
  • creation of puck collections was not properly handling merging - count of barcodes and filtering of the project_df was not working
  • as well, creation of puck collections did not support creation from non-meshed individual pucks/tiles, which is now supported

Additionally, these features/fixes were implemented:

  • new spatial test data <1MB, for quick unit testing (instead of the larger 150MB subset of a real dataset). There are three spatial tiles, two of them are covered in the read files, and the third is just to test the automatic puck filtering. Also, the genome fa and annotation are provided (taken from the drop-dropseq branch, it is the chr22 data). The puck coordinate system is provided as well.
  • calculation of puck width and height is now performed considering max-min, instead of max
  • now, an warning is printed for samples where the adata file is empty (i.e., adata.X.sum() == 0).

I tested this branch with the tiny test data, with and without meshing, with and without puck collection - using all defaults. Also, with in-house (Open-ST) mouse data.

@danilexn danilexn requested a review from nukappa January 18, 2024 15:55
@danilexn
Copy link
Member Author

There's a very ugly chunk of code (https://github.com/danilexn/spacemake/blob/0a10d9c15c8b83ee47c983c31817822c17a5caad/spacemake/snakemake/main.smk#L590-L627), which I had to add such that the puck collection can be generated for meshed and non-meshed data. I'm sure there are more elegant alternatives, but I think this would require changing some of the variables/wildcards.

Please feel free to commit some code if you find an alternative, thanks! 😄

@nukappa nukappa self-assigned this Jan 18, 2024
@nukappa
Copy link
Member

nukappa commented Jan 22, 2024

Tested locally with real openst data, works like a charm, also for meshed. (Just pump the version if you haven't :) )

@danilexn
Copy link
Member Author

danilexn commented Jan 24, 2024

@nukappa I have noticed that when using two run modes (e.g., one that meshes and other that doesn't) within the same sample, the QC reports fail. I am debugging this (the fix will also apply to #95), then I will let you know. This only happens for the puck_collection files, the others are fine.

I would ask you to run a test for such use-case when I commit the fix, thanks in advance! After this, we can merge this PR.

@marvin-jens
Copy link
Member

Dear Daniel, the code looks very nice to me. Thanks for fixing these issues. Regarding the test-data, I am somewhat hesitant to pull 10MB of genome and annotation, when we already have small test genomes and annotation. Can we perhaps re-use that? Also, I would highly appreciate if we could have a unit test in sth like tests/test_spatial_merge.py or wherever you prefer. Pls have a look at the existing tests in tests/test_cmdline.py and let me know if I can help with that.

@nukappa
Copy link
Member

nukappa commented Mar 7, 2024

Re-tested this with a fresh env/install and the QC sheets didn't compile. It's not that the jobs failed, but there were no rules to begin with. Will investigate and report back.

@nukappa nukappa added the bug Something isn't working label Apr 16, 2024
@nukappa
Copy link
Member

nukappa commented Apr 16, 2024

Update: issue might be due to different snakemake versions. @danilexn will modify and test. We already pin snakemake in the env.

@danilexn
Copy link
Member Author

danilexn commented Apr 17, 2024

Dear Daniel, the code looks very nice to me. Thanks for fixing these issues. Regarding the test-data, I am somewhat hesitant to pull 10MB of genome and annotation, when we already have small test genomes and annotation. Can we perhaps re-use that? Also, I would highly appreciate if we could have a unit test in sth like tests/test_spatial_merge.py or wherever you prefer. Pls have a look at the existing tests in tests/test_cmdline.py and let me know if I can help with that.

Hi Marvin! Good point, I am updating to use the small genome+annotation, and writing some additional tests. I will keep the tests for the fast-cmdline branch, because this is just a bugfix against the current master.

@danilexn
Copy link
Member Author

I've updated the test data (so it uses the tiny genome now), it runs fine (except for some sporadic issues in QC sheets, because there's only 1 spatial cell and there are problems with the x/y limits -> maybe open a separate issue?).

Also, works Open-ST mouse data (one tile, multiple tiles, multiple tiles + non-matching tiles, without tiles to simulate sc data).

I added some additional validations for the project_df, so it is more verbose when the merging columns are not properly formatted (before, it was giving a MissingInputException, which was difficult to relate to badly formatted project_df)

@nukappa
Copy link
Member

nukappa commented Apr 17, 2024

Testing locally with a small set of 10mln + 10mln reads in a sample and sample_reseq fashion. Testing for:

  • Processing the samples independently, then merging them, and processing the merged sample.
  • Adding the merged sample from the beginning and processing everything together.

@danilexn
Copy link
Member Author

Nikos and I re-tested this, and the comments on test data were addressed. Unit testing will be added into the fast-cmdline branch, so I merge this PR that implements the bugfix, as merging does not work as advertised in current spacemake.

@danilexn danilexn merged commit 53fccd3 into rajewsky-lab:master Apr 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MissingInputException when merging with 0.7.2
3 participants