Explore use of loom files for output #122

Closed
allyhawkins opened this issue Jul 29, 2021 · 8 comments
Labels
future thoughts: This issue to be closed has some good ideas that we may want to revisit

Comments

@allyhawkins
Member

There has been some discussion about the potential of using loom files instead of an RDS file containing a SingleCellExperiment object. See #107.

We should test generating a loom file to see whether we want to switch to that format or continue to use an RDS file.

Including some more links for information on loom files here:
http://loompy.org/
https://bioconductor.org/packages/release/bioc/vignettes/LoomExperiment/inst/doc/LoomExperiment.html

@jashapiro
Member

jashapiro commented Aug 31, 2021

I just did a quick test with exporting and importing loom data.

Unfortunately, there were a number of downsides.

  • It was extremely slow to write (>10 minutes on my machine, vs. <20 seconds for writing the compressed RDS file)
  • The file was bigger: 150MB vs. 62MB (but it compressed a bit smaller, probably because...)
  • The altExp was not included. We would need to create a separate file for any CITE-seq data. This is not the worst thing in the world, but it is annoying. The steps to reconstitute the original data structure would be a bit more involved.

None of these are insurmountable (aside from export time, which is pretty bad!), but they do make me less excited to use the format.

Another option is the h5ad format that seems a bit more native to scanpy, but I fear it will have many of the same issues. From the docs for a package that writes this format (https://theislab.github.io/zellkonverter/articles/zellkonverter.html) it is unclear if altExperiments are part of the export or not.
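
For anyone repeating this comparison, a minimal sketch of timing a write and recording the resulting file size is below. The two writers are stdlib stand-ins (gzip-compressed pickle vs. uncompressed pickle), not the actual RDS or loom exporters, and `benchmark_write` is a hypothetical helper name; the real export calls would need to be swapped in to reproduce the numbers above.

```python
import gzip
import os
import pickle
import tempfile
import time


def benchmark_write(write_fn, obj, path):
    """Time a single write and return (seconds, bytes on disk)."""
    start = time.perf_counter()
    write_fn(obj, path)
    elapsed = time.perf_counter() - start
    return elapsed, os.path.getsize(path)


# Stand-in writers: placeholders for the real RDS / loom export calls.
def write_compressed(obj, path):
    with gzip.open(path, "wb") as handle:
        pickle.dump(obj, handle)


def write_uncompressed(obj, path):
    with open(path, "wb") as handle:
        pickle.dump(obj, handle)


# Toy, highly repetitive "count matrix" so that compression has an effect.
toy_counts = [[0] * 500 for _ in range(200)]

results = {}
with tempfile.TemporaryDirectory() as tmpdir:
    for label, writer in [("compressed", write_compressed),
                          ("uncompressed", write_uncompressed)]:
        path = os.path.join(tmpdir, label + ".bin")
        results[label] = benchmark_write(writer, toy_counts, path)

for label, (seconds, size) in results.items():
    print(f"{label}: {seconds:.4f} s, {size} bytes")
```

Reporting both numbers side by side matters here because the trade-off in question is write time against size on disk, and a single run on one object is only a rough indication.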

@jaclyn-taroni
Member

jaclyn-taroni commented Aug 31, 2021

Okay, what if we have an FAQ entry along the lines of "How do I use the RDS files with Python?" that includes several lines of code, which may involve using R to split out the CITE-seq data and save loom objects? To me, this seems like a good "soft launch" goal, and we might consider offering multiple formats in the future.

@allyhawkins
Member Author

10 minutes!! That doesn't sound like an ideal situation... Based on this, I also would say that maybe loom files aren't as appealing.

Based on the link you sent, it looks like we would be able to write the sce out to an h5ad file without changing anything about its contents, including an altExperiment. I did a quick check by writing one of my sces out to an h5ad file, and it also took ~10 minutes for one sce rather than a matter of seconds.

I think if the only concern about RDS files is compatibility with Python, then I agree with Jackie about providing some code on how to do the conversion yourself (and then providing multiple formats in the future). That said, based on my brief interaction with it, I would argue that the h5ad format may be more straightforward than loom, since it looks like you wouldn't have to separate out the CITE-seq data.

@jashapiro
Member

@allyhawkins Did you read the file back in and check the contents? With loom, the failure mode was silent (my favorite kind of failure): there was no warning or error, but the AltExp was simply not there when reading the file.

I also ran into trouble getting the h5ad support with zellkonverter to work: it uses conda in an... interesting... way and I suspect there was some incompatibility with my base conda install. Didn't want to spend time debugging the install. The installation happens on the first call to write an h5ad file, which might explain part of the time it took? If you repeat the export, is it faster?

But mostly, I am totally fine with RDS files + explanation of conversion in the docs. I'm actually not sure how Seurat handles AltExp, so we may want docs on that front as well.
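
One way to separate a one-time setup cost from the export itself is to time the same call more than once. A minimal sketch follows; `export` is a hypothetical stand-in whose first call simulates an environment installation with a short sleep, and it is not the zellkonverter API.

```python
import time

_env_ready = False  # module-level flag: has the one-time setup run yet?


def export(data):
    """Hypothetical exporter that performs a slow setup on first use only."""
    global _env_ready
    if not _env_ready:
        time.sleep(0.2)  # simulated one-time environment installation
        _env_ready = True
    return len(data)  # placeholder for the actual write


def time_calls(fn, arg, repeats=3):
    """Return the wall-clock duration of each successive call."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(arg)
        durations.append(time.perf_counter() - start)
    return durations


durations = time_calls(export, [1, 2, 3])
print(["%.3f s" % d for d in durations])
```

If the second and third durations are much shorter than the first, the slow part was setup rather than the write; if all three are similar, the export itself is slow.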

@jaclyn-taroni
Member

Filed a docs ticket: AlexsLemonade/scpca-docs#13

@allyhawkins
Member Author

> Did you read the file back in and check the contents? With loom, the failure mode was silent (my favorite kind of failure): there was no warning or error, but the AltExp was simply not there when reading the file.

When I read it back in, the contents were relatively similar, but the assay name was changed from counts to X. I didn't have an sce with CITE-seq data loaded to test with, but there is an empty slot for altExpNames when I load it back in, and it looks like a typical sce.

I also had trouble at first with the conda issues, but then I just updated conda and that seemed to solve them. Regardless, I would agree that .rds seems to be the way to go right now.

@jashapiro
Member

> When I read it back in the contents were relatively similar, but the assay name was changed from counts to X. I didn't have a sce loaded with CITE-seq that I tested with, but there is an empty slot for altExpNames

Any SCE should have that, so I don't think that really answers the question, unfortunately. We'll need to test more directly.
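
A more direct test would compare what went into the export against what comes back out, and fail loudly on anything missing. A minimal sketch, using plain dictionaries as stand-ins for the SCE before and after the round trip (the `missing_components` helper and the `adt` alternative experiment name are hypothetical):

```python
def missing_components(original, roundtrip):
    """List "slot:name" entries found in `original` but not in `roundtrip`.

    Both arguments map a slot name (e.g. "assays", "altExps") to a dict of
    named entries. A slot that exists but is empty preserves nothing, so
    every entry it held originally is reported.
    """
    missing = []
    for slot, entries in original.items():
        returned = roundtrip.get(slot, {})
        for name in entries:
            if name not in returned:
                missing.append(f"{slot}:{name}")
    return missing


# Stand-in for an SCE with CITE-seq data stored as an alternative experiment.
before = {
    "assays": {"counts": [[1, 0], [0, 2]]},
    "altExps": {"adt": [[5, 3]]},
}

# Stand-in for what came back: assay renamed, altExps slot present but empty.
after = {
    "assays": {"X": [[1, 0], [0, 2]]},
    "altExps": {},
}

problems = missing_components(before, after)
if problems:
    print("Round trip dropped or renamed:", ", ".join(problems))
```

The point of checking names rather than slots is that an empty `altExps` slot looks identical whether or not the input ever had CITE-seq data, so the comparison has to start from an input that is known to contain it.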

@jaclyn-taroni
Member

Discussed in Slack, but to reiterate: exploring the use of loom files for output has been accomplished. We want to stick with RDS for now and add an FAQ to help Python users (AlexsLemonade/scpca-docs#13). We have some questions about Seurat to explore that are now tracked separately (#129).

So I'm going to close this and label it with future thoughts to reflect that there are some ideas about testing h5ad in here that could be handy later.

@jaclyn-taroni jaclyn-taroni added the future thoughts This issue to be closed has some good ideas that we may want to revisit label Aug 31, 2021