Could the detection of oligodendrocytes be possibly improved? #33

Open
LASeeker opened this issue Jun 21, 2023 · 3 comments

@LASeeker

Hi Aleksandr,
I just tested your sc-type method on my dataset (https://pubmed.ncbi.nlm.nih.gov/37217978/), and it works really nicely for most cell types, so thank you for that! Below I show my first rough annotation (the "unknown" cells turned out to be immune cells), followed by the annotation from sc-type.

You will see that sc-type performed very well; however, it did not recognise oligodendrocytes (which happen to be the main focus of our lab). Would it be possible to extend the gene database to improve the detection of oligodendrocytes? We would be happy to suggest additional marker genes; PLP1, for example, may be a good one.
Also, the detection of cerebellar granule cells (RELN+) was not perfect.

Cool tool, thank you!

[Images: the first rough annotation and the sc-type annotation]
@IanevskiAleksandr
Owner

We are planning to release sctype 2.0 early next year (though it may appear on GitHub much earlier), adding many new cell types and analysis options. Thanks for the suggestions. Please send more markers, with corresponding references, if you have any.

@pedriniedoardo

pedriniedoardo commented Sep 8, 2023

Interestingly, we had a similar situation in our lab. We noticed that the issue originated from the fact that not all marker genes made the cut of the highly variable genes (HVGs).
To keep the object slim, we do not scale all the features in the object. Since the tool relies on the scale.data slot, marker genes that are absent from it cannot contribute to the score, so the scoring is affected. Indeed, when we explored the scale.data slot, only a few of the genes from ScTypeDB_full.xlsx were present.

For the positive markers:

lapply(gs_list$gs_positive,function(x){
  sum(rownames(scobj[["RNA"]]@scale.data) %in% x)
})
$Astrocytes
[1] 8

$`Cholinergic neurons`
[1] 0

$`Dopaminergic neurons`
[1] 1

$`Endothelial cells`
[1] 10

$`GABAergic neurons`
[1] 3

$`Glutamatergic neurons`
[1] 2

$`Immature neurons`
[1] 0

$`Immune system cells`
[1] 0

$`Mature neurons`
[1] 2

$`Microglial cells`
[1] 7

$`Myelinating Schwann cells`
[1] 0

$`Neural Progenitor cells`
[1] 0

$`Neural stem cells`
[1] 0

$Neuroblasts
[1] 0

$`Neuroepithelial cells`
[1] 1

$`Non myelinating Schwann cells`
[1] 0

$`Oligodendrocyte precursor cells`
[1] 4

$Oligodendrocytes
[1] 0

$`Radial glial cells`
[1] 5

$`Schwann precursor cells`
[1] 0

$`Serotonergic neurons`
[1] 0

$Tanycytes
[1] 0

$`Cancer cells`
[1] 1

$`Cancer stem cells`
[1] 0
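To see which specific markers are missing, rather than only how many are present, a small helper along these lines could be used. This is a minimal sketch: `marker_coverage()` is a hypothetical helper, not part of sc-type, and it only assumes a named list of marker vectors (such as `gs_list$gs_positive`) and a character vector of available gene names (such as `rownames(scobj[["RNA"]]@scale.data)`).

```r
# Hypothetical helper (not part of sc-type): for each cell type, report how
# many of its markers are among the available genes and which ones are missing.
marker_coverage <- function(markers, available_genes) {
  res <- lapply(names(markers), function(ct) {
    genes <- unique(markers[[ct]])
    genes <- genes[!is.na(genes) & genes != ""]  # drop empty entries
    present <- genes %in% available_genes
    data.frame(
      cell_type = ct,
      n_markers = length(genes),
      n_present = sum(present),
      missing = paste(genes[!present], collapse = ","),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, res)
}

# Toy input standing in for gs_list$gs_positive and the scale.data rownames
toy_markers <- list(
  Oligodendrocytes = c("PLP1", "MBP", "MOG"),
  Astrocytes = c("GFAP", "AQP4")
)
toy_scaled_genes <- c("GFAP", "AQP4", "MBP")
marker_coverage(toy_markers, toy_scaled_genes)
```

On the object from this thread, the call would be `marker_coverage(gs_list$gs_positive, rownames(scobj[["RNA"]]@scale.data))`; the `missing` column then names the genes that would need to be added to the scaled features.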

The quick-and-dirty solution we used was to run ScaleData again, specifying the features of interest (the marker genes).

# -------------------------------------------------------------------------
library(Seurat)
# gs_list (from sc-type's gene_sets_prepare()) and sctype_score() are assumed to be loaded already

# run an ad hoc scaling restricted to the marker genes used for the cell type annotation
marker_genes <- unique(unlist(gs_list))
scobj_test <- ScaleData(scobj,
                        features = marker_genes,
                        vars.to.regress = c("percent.mt.harmony", "nCount_RNA.harmony", "S.Score",
                                            "G2M.Score", "origin", "facility"),
                        verbose = TRUE)

# check the dimensions of the new scale.data slot (genes x cells)
dim(scobj_test@assays$RNA@scale.data)

# score the cells using the rescaled matrix
es.max <- sctype_score(scRNAseqData = scobj_test[["RNA"]]@scale.data, scaled = TRUE,
                       gs = gs_list$gs_positive, gs2 = gs_list$gs_negative)
# -------------------------------------------------------------------------

After rescaling, the pool of oligodendrocyte marker genes was much better represented (11 positive markers instead of 0).
For the positive markers:

lapply(gs_list$gs_positive,function(x){
  sum(rownames(scobj_test[["RNA"]]@scale.data) %in% x)
})
$Astrocytes
[1] 15

$`Cholinergic neurons`
[1] 2

$`Dopaminergic neurons`
[1] 8

$`Endothelial cells`
[1] 12

$`GABAergic neurons`
[1] 6

$`Glutamatergic neurons`
[1] 7

$`Immature neurons`
[1] 6

$`Immune system cells`
[1] 9

$`Mature neurons`
[1] 9

$`Microglial cells`
[1] 26

$`Myelinating Schwann cells`
[1] 4

$`Neural Progenitor cells`
[1] 14

$`Neural stem cells`
[1] 4

$Neuroblasts
[1] 6

$`Neuroepithelial cells`
[1] 7

$`Non myelinating Schwann cells`
[1] 4

$`Oligodendrocyte precursor cells`
[1] 6

$Oligodendrocytes
[1] 11

$`Radial glial cells`
[1] 11

$`Schwann precursor cells`
[1] 6

$`Serotonergic neurons`
[1] 4

$Tanycytes
[1] 1

$`Cancer cells`
[1] 3

$`Cancer stem cells`
[1] 6

@LASeeker
Author

LASeeker commented Sep 8, 2023

Hi, it is amazing to hear, @IanevskiAleksandr, that you are working on further improving sctype! I don't think rescaling would help in my case, because all genes were already present in the scale.data slot. I also noticed that when I run sctype on a randomly subsampled dataset (the same number of nuclei per manually annotated cell type), it usually performs better and does detect oligodendrocytes. So I think this is not an oligodendrocyte problem per se but something else. Could it be related to oligodendrocytes being the most abundant cell type in the complete dataset? Interesting, @pedriniedoardo, that you have seen something similar. It would be great to hear from the community whether this happens with other cell types, too.

3 participants