Simulate distribution #89

Closed
khkk378 opened this issue May 20, 2021 · 1 comment

khkk378 commented May 20, 2021

As it is now, you sample random fractions for each cell type. I wonder if it would be more efficient to sample based on the prior distributions in the training data, or at least to have this as a setting. Maybe just draw from a normal distribution around the cell type proportions observed in the training data. In most cases the training data would be rather similar to the bulk: a large fraction of kidney cells would be tubular, liver cells would be hepatocytes, heart cells would be cardiomyocytes, and so on. Just a suggestion :)
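
To make the suggestion concrete, here is a minimal sketch of what such a setting could look like. It is not code from this project: the function name `sample_fractions`, its parameters, and the default noise level `sd=0.1` are all hypothetical, and "random fractions" is assumed here to mean uniform draws normalized to sum to one. With a prior, fractions are drawn from a normal distribution around the given proportions, clipped at zero, and renormalized.

```python
import numpy as np


def sample_fractions(n_samples, n_types, prior=None, sd=0.1, rng=None):
    """Draw cell type fractions for simulated bulk samples (hypothetical helper).

    prior=None  -> random fractions: uniform draws, normalized per sample.
    prior=array -> normal noise around the given proportions, clipped at 0
                   and renormalized so that each sample sums to 1.
    """
    rng = np.random.default_rng() if rng is None else rng

    if prior is None:
        # Assumed baseline: no preference for any cell type.
        raw = rng.random((n_samples, n_types))
    else:
        prior = np.asarray(prior, dtype=float)
        if prior.shape != (n_types,):
            raise ValueError("prior must have one entry per cell type")
        # Normal distribution centered on the prior proportions.
        raw = rng.normal(loc=prior, scale=sd, size=(n_samples, n_types))
        # Fractions cannot be negative.
        raw = np.clip(raw, 0.0, None)

    totals = raw.sum(axis=1, keepdims=True)
    empty = totals[:, 0] == 0
    if empty.any():
        # If every entry of a sample was clipped to 0, fall back to equal fractions.
        raw[empty] = 1.0
        totals = raw.sum(axis=1, keepdims=True)

    return raw / totals
```

A call such as `sample_fractions(1000, 3, prior=[0.7, 0.2, 0.1], sd=0.05)` would give samples dominated by the first cell type, while omitting `prior` keeps the random behaviour. The proportions in that call are made up for illustration.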

KevinMenden (Owner) commented

Hi @khkk378,

Yes, that might make sense as an additional option. We intentionally didn't do it because it would of course introduce some bias into the training set. If you have only one dataset for data simulation and its cell type distribution is somewhat unusual, that could be problematic. Also, scRNA-seq data is not the best tool for estimating cell type fractions, and in some datasets the cells have also been selected, so the observed proportions may not reflect the tissue.

So it would be an interesting thing to try as an option, but I believe the default should still be random fractions.

But if you want to cook up a PR, I would be happy to include that :)
