-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathinstructionsQC.Rmd
104 lines (72 loc) · 4.21 KB
/
instructionsQC.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
---
title: "QUALITY CONTROL AND PREPROCESSING, SINGLE CELL EXPERIMENTS"
output:
pdf_document: default
---
## Introduction
The script 'QC_plus_doubletsdet.R' located in this repository is valid for 10X-format raw counts.
Single Cell RNA sequencing (scRNAseq) projects in our setting start with cell isolation from skeletal muscle and sorting, followed by Chromium transcriptome 3' protocol (10X Genomics) which includes barcoded gel beads library preparation, unique molecular identifier (UMI) linking, sequencing, alignment and Cell Ranger filtering. The 10X protocol is performed by a third party laboratory, then resulting raw count matrix and metadata are received for our downstream analysis. Even in this scenario, rigorous pre-processing is mandatory.
The present document is only for practical instructions regarding the QC script. Concepts and details about scRNAseq and quality control are found in INMG_SingleCell/scRNAseqMuscleNiche.pdf.
## Getting started
*IMPORTANT* Use **one single** batch data when running this script, as doublet detection procedures should only be applied to libraries generated in the same experimental batch.
The input is the experiment in the form of a folder (here for our illustration 'dorsowt2'), containing the raw count matrix and its respective metadata, organized as follows:
\begin{itemize}
\item data/
\begin{itemize}
\item dorsowt2/
\begin{itemize}
\item barcodes.tsv.gz
\item features.tsv.gz
\item matrix.mtx.gz
\end{itemize}
\end{itemize}
\end{itemize}
Within the R code, change 'exper' variable accordingly, check also working directory ('prloc' variable). I recommend to stick to default location (HOME) for 'QC_single_cell'. Launch from RStudio, and if any difficulties are encountered check 'results/outputsfile.txt' to see at which level error occurs.
An executable version will be available to be able to run into a bash loop, taking as argument the experiment's raw matrix folder name.
'QC_plus_doubletsdet.R' must be used if features are present as gene symbols ("Gsn"). One version for features in the form of Ensembl identifiers ("ENSMUSG00000026879") will be soon available.
## Brief Illustration
```{r stuff0, include=FALSE, results='hide'}
listpackages <- c( "ggplot2", "dplyr", "stringr", "tidyverse",
"BSgenome", "GenomeInfoDb", "Seurat",
"lubridate", # right color
"simpleSingleCell", # for scRNA storage & manipulation
"scater", # for QC control
"scran", # analysis pipeline
"uwot", # UMAP dim-red
"DropletUtils", #utility functions for handling single-cell (RNA-seq)
"AnnotationHub", # ensbl query
"AnnotationDbi", # ensbl query
"sctransform", "SingleCellExperiment", "Matrix" )
lapply(listpackages,require,character.only=TRUE, quietly=T)
prloc = "~/QC_single_cell"
setwd(prloc)
exper="dorsowt2" # TODO set accordingly
resdir="results/"
```
Here we load an '_END.RData' already generated after running 'QC_plus_doubletsdet.R' on publicly available 10X data in mouse model from GEO (accession code GSM3614993).
Lets see SingleCellExperiment object and dimensions:
```{r stuff2}
load(paste0("rdatas/",exper,"_END.RData"))
sce
```
We expect only M musculus but we found out gene symbols from H sapiens (this is negligeable anyway, and metrics did not show relevant contamination):
```{r stuff3}
table(rowData(sce)$species)
tail(rowData(sce)[rowData(sce)$species %in% "Homo sapiens",])
```
## Doublets detection
```{r stuff4}
tsnepl <- plotTSNE(sce[rowData(sce)$expressed,sce$is_cell], colour_by="doublet_score")
detfeat <- scater::plotColData(sce, x="sum",y="detected",colour_by="doublet_score")
tsnepl + detfeat
```
#### output
All figures in .pdf format are saved in 'results/' whereas 'rdatas/_END.RData' file corresponds to filtered sce object.
## Acknowledgements
**Many thanks to Dr. L Modolo for most of this code**
## sources
I strongly encourage the user/developer to read:
<http://perso.ens-lyon.fr/laurent.modolo/scRNA/>
<https://bioconductor.org/packages/release/bioc/vignettes/scater/inst/doc/overview.html>
#### author
Johanna Galvis-Lascroux, 2020