Update documentation to include recommended processing for 10x scRNA 5' V2 #118

Open
jeremymsimon opened this issue Aug 2, 2023 · 3 comments


@jeremymsimon

Hey @rob-p and @Gaura -
I've started working with 10x 5' V2 data and wanted to use alevin-fry to process the raw data. I initially found some discussions from @k3yavi on the alevin repo describing how the only change needed to handle this data type was to switch alevin's -l ISR to -l ISF (e.g. here). However, I ended up with far fewer cells detected for each sample, and the results made much less sense than the same data processed via cellranger.

Digging deeper, I found a very helpful discussion here, where the conclusion was effectively to run alevin with -l ISR as normal but switch the expected orientation in alevin-fry generate-permit-list from -d fw to -d rc. This made a huge difference both in the example data analyzed by @allyhawkins here and in my own data: after making this change, I detected almost 60% more cells that passed QC filters and got a totally different (and much more sensible) set of clusters and markers characterizing them.
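
For anyone landing here, the change described above can be sketched roughly as follows. This is a minimal sketch, not an official recipe: it only assembles and prints the commands, so nothing is run. The file and directory names are placeholders, and several choices are assumptions that do not come from this thread: `--chromium` (assuming a 16 bp barcode + 10 bp UMI read layout), `--sketch`, `-k` knee filtering, and `cr-like` resolution. The only parts taken from the discussion are `-l ISR` and `-d rc`.

```python
import shlex


def alevin_fry_commands(index, r1, r2, t2g, out_dir, expected_ori="rc", threads=8):
    """Assemble (without running) the commands for one 10x 5' V2 sample."""
    map_dir = f"{out_dir}/map"
    quant_dir = f"{out_dir}/quant"
    return [
        # Mapping: the library type stays ISR, as described in the thread.
        # --chromium and --sketch are assumptions; check them against your kit and setup.
        ["salmon", "alevin", "-i", index, "-p", str(threads), "-l", "ISR",
         "--chromium", "--sketch", "-1", r1, "-2", r2,
         "-o", map_dir, "--tgMap", t2g],
        # Permit list: the one change from the thread, -d rc instead of -d fw.
        # -k (knee filtering) is an assumption; use whichever filter you normally use.
        ["alevin-fry", "generate-permit-list", "-d", expected_ori, "-k",
         "-i", map_dir, "-o", quant_dir],
        # Collate and quantify as usual (resolution choice is an assumption).
        ["alevin-fry", "collate", "-t", str(threads), "-i", quant_dir, "-r", map_dir],
        ["alevin-fry", "quant", "-t", str(threads), "-i", quant_dir,
         "-o", f"{out_dir}/quant_res", "--tg-map", t2g,
         "--resolution", "cr-like", "--use-mtx"],
    ]


# Placeholder names; print the commands for inspection instead of running them.
for cmd in alevin_fry_commands("index_dir", "R1.fastq.gz", "R2.fastq.gz", "t2g.tsv", "sample1"):
    print(shlex.join(cmd))
```

Passing `expected_ori="fw"` gives the usual settings for comparison, which makes it easy to process the same sample both ways and compare cell counts.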

For others interested in processing data of this type, and assuming this isn't already documented somewhere I've missed, it might be helpful to elevate the cellranger vs alevin-fry comparison doc linked above to a polished vignette here, and/or to mention this more clearly in the main alevin-fry docs or on the generate-permit-list page. I'm also curious how this approach would or could be handled within the simpleaf and nf-core/scrnaseq frameworks.

Thanks as always!

@crazyhottommy

crazyhottommy commented Sep 11, 2023

Hey @jeremymsimon @rob-p, I was processing some 10x 5' V2 data last week, and the number of reads per cell was much lower than in the cellranger output. I then found this issue, and indeed changing to -d rc made a difference.

Thanks!
Tommy

@rob-p
Contributor

rob-p commented Sep 11, 2023

Thanks, @crazyhottommy! @DongzeHE : We should figure out the best way to add this information to the documentation — something like a table of protocols with notes or some such.
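
As a starting point for such a table, here is a rough sketch containing only the two cases that come up in this thread. The 3' row is an assumption based on the "as normal" settings mentioned in the first post, and the dictionary layout and function name are purely illustrative. Other protocols are left out rather than guessed.

```python
# Only the two cases discussed in this thread; other protocols are left out on purpose.
PROTOCOL_NOTES = {
    "10x 3'": {"libtype": "ISR", "expected_ori": "fw",
               "note": "the usual settings referred to in this thread"},
    "10x 5' V2": {"libtype": "ISR", "expected_ori": "rc",
                  "note": "keep -l ISR; switch generate-permit-list to -d rc"},
}


def format_notes(notes):
    """Render the notes as aligned plain-text rows."""
    header = ("protocol", "salmon alevin -l", "generate-permit-list -d", "note")
    rows = [header] + [
        (name, s["libtype"], s["expected_ori"], s["note"]) for name, s in notes.items()
    ]
    # Column width = longest cell in that column.
    widths = [max(len(row[i]) for row in rows) for i in range(len(header))]
    return "\n".join(
        "  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip() for row in rows
    )


print(format_notes(PROTOCOL_NOTES))
```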

@wmacnair

Just supporting the previous comments that (1) changing to -d rc made a huge difference to my 5' data 🥳, and (2) it would be great for users to have this made clearer in the documentation.

From my user point of view, I actually think the highest priority for the alevin ecosystem should be a unified documentation page. Maybe something a bit like the scanpy docs, i.e. tutorials and API in one obvious and natural place (I think scvi also has nice docs).

A follow-up question on usage of -d/--expected-ori - are there ever circumstances where you would recommend using both? Assuming you know what the chemistry is. Thanks!
