Simulate distribution #89

khkk378 · 2021-05-20T19:38:19Z

As it is now, you sample random fractions for each cell type. I wonder if it would be more efficient to sample based on the prior distribution of cell types in the training data, or at least to have this as a setting. Maybe just sample from a normal distribution around the observed proportions. In most cases the training data would be rather similar to the bulk: a large fraction of kidney cells would be tubular, liver would be mostly hepatocytes, heart mostly cardiomyocytes, and so on. Just a suggestion :)
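
A rough sketch of the idea, in Python with NumPy. The function names `prior_from_labels` and `sample_fractions`, the `sd` parameter, and the toy kidney labels are all made up for illustration; none of them come from the existing simulation code. It assumes the per-cell labels of the training data are available as a flat list, and that clipping negative draws and renormalizing each row is an acceptable way to keep the fractions valid.

```python
import numpy as np


def prior_from_labels(labels):
    """Estimate cell type proportions from per-cell labels (hypothetical helper)."""
    cell_types, counts = np.unique(labels, return_counts=True)
    return cell_types, counts / counts.sum()


def sample_fractions(prior, n_samples, sd=0.1, rng=None):
    """Draw fractions from a normal distribution centred on `prior`.

    `sd` controls how far the simulated samples may drift from the
    proportions observed in the training data (an assumed tuning knob).
    """
    rng = np.random.default_rng(rng)
    prior = np.asarray(prior, dtype=float)
    prior = prior / prior.sum()
    # One row per simulated sample, one column per cell type.
    draws = rng.normal(loc=prior, scale=sd, size=(n_samples, prior.size))
    # Fractions cannot be negative; the tiny floor keeps every row sum positive.
    draws = np.clip(draws, 1e-8, None)
    # Renormalize so that each simulated sample sums to 1.
    return draws / draws.sum(axis=1, keepdims=True)


# Toy example: a kidney-like dataset dominated by tubular cells.
labels = ["tubular"] * 70 + ["endothelial"] * 20 + ["immune"] * 10
cell_types, prior = prior_from_labels(labels)
fractions = sample_fractions(prior, n_samples=1000, sd=0.05, rng=0)
```

Clipping and renormalizing slightly distorts the distribution for rare cell types, so a larger `sd` moves the result back toward something closer to unconstrained random fractions.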

KevinMenden · 2021-05-21T06:12:34Z

Hi @khkk378,

Yes, that might make sense as an additional option. We intentionally didn't do it because it of course introduces some bias into the training set. If you have only one dataset for data simulation and it is somewhat weirdly distributed, that could be problematic. scRNA-seq data is also not the best tool for estimating cell type fractions, and in some datasets the cells have been selected as well.

So it would be an interesting thing to try as an option, but I believe that the default should still be random fractions.

But if you want to cook up a PR, I would be happy to include that :)

KevinMenden closed this as completed Sep 19, 2022
