-
Notifications
You must be signed in to change notification settings - Fork 5
1. Phenotypic diversity analysis
Author: Ruben Props
Make sure to have the devtools
package loaded.
install_github("rprops/Phenoflow_package")
library("Phenoflow") # for fingerprinting
library("flowViz") # for plotting
library("ggplot2") # for plotting
library("flowAI") # for denoising
Set a fixed seed to ensure reproducible analysis
set.seed(777)
Load the test data or insert the path to your own data folder, for example test_data for the tutorial data.
data(flowData)
#path = "test_data"
#flowData <- read.flowSet(path = path, pattern=".fcs")
Lets first take a quick look at the dimensions of our flowData using the attributes
function. Here we can see that we have 12 parameters (+ Width/Time) which we can use in our analyses.
attributes(flowData)
$frames
<environment: 0x000000003521c8d8>
$phenoData
An object of class 'AnnotatedDataFrame'
rowNames: CYCLUS1_BEKKEN_-1_A_SYBR_PRECYCLE_rep1.fcs CYCLUS1_BEKKEN_-3_A_SYBR_PRECYCLE_rep1.fcs ...
CYCLUS1_BEKKEN_9_A_SYBR_OPERATION_rep1.fcs (41 total)
varLabels: name
varMetadata: labelDescription
$colnames
[1] "FSC-A" "SSC-A" "FL1-A" "FL2-A" "FL3-A" "FL4-A" "FSC-H" "SSC-H" "FL1-H" "FL2-H" "FL3-H" "FL4-H" "Width" "Time"
$class
[1] "flowSet"
attr(,"package")
[1] "flowCore"
Next, we select the phenotypic features of interest and transform their intensity values according to the hyperbolic arcsin. In this case we chose two fluorescent parameters and two scatter parameters in their height format (-H). Depending on the FCM, the resolution may increase by using the area values (-A) since many detectors have a higher signal resolution for area values. For transparency we store the transformed data in a new object, called flowData_transformed
. We also define the parameters of interest (e.g. c("FL1-H", "FL3-H","SSC-H","FSC-H")
)
# Select phenotypic features of interest and transform parameters
flowData_transformed <- transform(flowData,`FL1-H`=asinh(`FL1-H`),
`SSC-H`=asinh(`SSC-H`),
`FL3-H`=asinh(`FL3-H`),
`FSC-H`=asinh(`FSC-H`))
param=c("FL1-H", "FL3-H","SSC-H","FSC-H")
Now that the data has been formatted, we need to discard all the signals detected by the FCM which correspond to instrument noise and (in)organic background. This is done by selecting the cells in a scatterplot on the primary fluorescence or scatter signals. For SYBR Green I, this is done based on the FL1-H
and FL3-H
parameters. For this example, an initial polygon gate (polyGate1
) is created and adjusted based on the sample type in question. For each specific experiment, it is advised to use identical gating for each sample. The gating strategy is evaluated on the xyplot
and adjusted if necessary. A more detailed guideline for gating or denoising aquatic microbial samples can be found here (p.6):
### Create a PolygonGate for denoising the dataset
### Define coordinates for gate in sqrcut1 in format: c(x,x,x,x,y,y,y,y)
sqrcut1 <- matrix(c(8.75,8.75,14,14,3,7.5,14,3),ncol=2, nrow=4)
colnames(sqrcut1) <- c("FL1-H","FL3-H")
polyGate1 <- polygonGate(.gate=sqrcut1, filterId = "Total Cells")
### Gating quality check
xyplot(`FL3-H` ~ `FL1-H`, data=flowData_transformed[1], filter=polyGate1,
scales=list(y=list(limits=c(0,14)),
x=list(limits=c(6,16))),
axis = axis.default, nbin=125,
par.strip.text=list(col="white", font=2, cex=2), smooth=FALSE)
Here is an example of a good and bad filtering approach:
When the optimal gate has been chosen, the data can be denoised using the Subset
function. Next we will gather some metadata from the sample names.
### Isolate only the cellular information based on the polyGate1
flowData_transformed <- Subset(flowData_transformed, polyGate1)
### Extract metadata from sample names
metadata <- data.frame(do.call(rbind, lapply(strsplit(flowCore::sampleNames(flowData),"_"), rbind)))
colnames(metadata) <- c("Cycle_nr", "Location", "day", "timepoint", "Staining", "Reactor_phase", "replicate")
Next, the phenotypic intensity values of each cell are normalized to the [0,1] range. This is required for using a bandwidth of 0.01 in the fingerprint calculation. Each parameter is normalized based on the maximum FL1-H intensity value over the data set.
summary <- fsApply(x = flowData_transformed, FUN = function(x) apply(x, 2, max), use.exprs = TRUE)
maxval <- max(summary[,"FL1-H"]) #Replace with the column representing the green fluorescence channel (e.g. "FITC-H")
mytrans <- function(x) x/maxval
flowData_transformed <- transform(flowData_transformed,`FL1-H`=mytrans(`FL1-H`),
`FL3-H`=mytrans(`FL3-H`),
`SSC-H`=mytrans(`SSC-H`),
`FSC-H`=mytrans(`FSC-H`))
We have seen that depending on the max value of the primary fluorescence parameter (FL1-H in this case), large differences in observed alpha/beta diversity may occur. This is most likely related to the changing resolution of the grid used in the kernel density estimation. If you observe these variations, please try increasing the grid resolution of the fingerprints from `128` to `256`. We are actively investigating the sensitivity of the method in order to make it more robust to these fluctuations. Feel free to make an issue on this github or to contact the developers should you have any questions or concerns.
The denoised data can now be used for calculating the phenotypic fingerprint using the flowBasis
function. Changing nbin
increases the grid resolution of the density estimation but also steeply increases the computation time. The most determining factor for getting an accurate kernel density estimation is the number of cells counted in polyGate1
. In general, 1,000 cells will give a good estimation of the mean alpha-diversity (D2) but in order to reduce the variance on this estimate I would recommend a cell count of 10,000 cells or more. You can find more details here.
If desired, you can randomly resample your samples to the lowest sample size or any user specified sample size using the FCS_resample()
function (standard without replacement, but can adjust by changing replace
). Samples with a sample size that is equal to 0 or that is lower than the specified size will be discarded.
### Randomly resample to the lowest sample size
#flowData_transformed <- FCS_resample(flowData_transformed, replace=TRUE)
### Calculate fingerprint with bw = 0.01
fbasis <- flowBasis(flowData_transformed, param, nbin=128,
bw=0.01,normalize=function(x) x)
Your samples were randomly subsampled to 9143 cells
In this case we will continue with the samples without randomly subsampling. From the phenotypic fingerprint, alpha diversity metrics can be calculated. Bootstrapping has been implemented for the Diversity function, so errors are calculated for each sample separately. R
is the number of bootstraps. d
is a rounding factor which is used to remove spurious density values from the dataset. Different rounding factors usually only scale the diversity estimates by a fixed factor and do not affect temporal trends or comparative analysis.
### Calculate Diversity from normalized fingerprint
Diversity.fbasis <- Diversity(fbasis,d=3,plot=FALSE, R=999)
Alpha diversity metrics (D1,D2) have been computed after 999 bootstraps
Add the argument plot=TRUE
in case a quick plot of the D2 diversity values with their errors is desired. A bit more tidied up figure can be easily achieved as such:
p1 <- ggplot(data = Diversity.fbasis, aes(x = as.numeric(as.character(metadata$day)), y = D2, color = metadata$Reactor_phase))+
geom_point(size = 8, alpha = 0.7)+
geom_line()+
scale_color_manual(values = c("#a65628", "red",
"#ffae19", "#4daf4a", "#1919ff", "darkorchid3", "magenta"))+
theme_bw()+
labs(color = "Reactor phase", y = "Phenotypic diversity (D2)", x = "Days")+
geom_errorbar(aes(ymin=D2-sd.D2, ymax=D2+sd.D2), width=0.05, color="black")
print(p1)
Typically for FCM data, the bootstrap replication results in an approximate normal distribution:
This function allows more accurate diversity and error estimation by bootstrapping from the initial FCS files, as well as denoising through flowAI: automatic and interactive anomaly discerning tools for flow cytometry data. It can also easily be parallelized by setting parallel = TRUE
and defining the number of cores with ncores
. On Windows, you may have to give administrator permission the first time you try to parallelize (and have library foreach
loaded). The input is a non-subsetted flowSet object (i.e. no parameters were removed - flowAI crashes otherwise) in case automated denoising is desired (cleanFCS = TRUE
). Otherwise a normal subsetted flowSet object is enough. R
is the number of bootstraps that should be taken from the initial FCS file, R.b
is the number of bootstraps that should be taken from the fingerprint object.
# Diversity assessment with cleaning
Diversity.clean <- Diversity_rf(flowData_transformed, param = param, R = 3, R.b = 3,
cleanFCS = TRUE)
Similarly as before, we can plot these diversity calculations as well. Note that we only took 3 bootstraps here. It is recommended to increase R
and R.b
to 100.
p2 <- ggplot(data = Diversity.clean, aes(x = as.numeric(as.character(metadata$day)), y = D2, color = metadata$Reactor_phase))+
geom_point(size = 8, alpha = 0.7)+
geom_line()+
scale_color_manual(values = c("#a65628", "red",
"#ffae19", "#4daf4a", "#1919ff", "darkorchid3", "magenta"))+
theme_bw()+
labs(color = "Reactor phase", y = "Phenotypic diversity (D2)", x = "Days")+
geom_errorbar(aes(ymin=D2-sd.D2, ymax=D2+sd.D2), width=0.05, color="black")
print(p2)
Alpha diversity analysis has completed: time to export all the data to your working directory. If you are not sure where this is, type getwd()
.
### Export diversity estimates to .csv file in the chosen directory
write.csv2(file="results.metrics.csv", Diversity.clean)
Optionally, you can also perform a beta diversity analysis using Non-metric Multidimensional Scaling (NMDS) or PCoA from the vegan
package. The beta_div_fcm
function does the calculation (accepts all dissimilarity metrics of the vegdist
function under the dist
argument) and the plot_beta_fcm
plots the ordination. Additional factorial arguments to be given to the plot function are: color
and shape
which can be used for visualization (color
and shape
have to be factor vectors with length equal to number of samples). The title of the legends for these factors can be specified in the labels
argument. In case no legend has to be displayed, legend.pres
can be put to FALSE
.
Attention: this dimension reduction utilizes the density values as opposed to the count values which are used for beta-diversity analysis in community 16S data sets
# Beta-diversity assessment of fingerprint
beta.div <- beta_div_fcm(fbasis, ord.type="PCoA")
# Plot ordination
plot_beta_fcm(beta.div, color = metadata$Reactor_phase, labels="Reactor phase") +
theme_bw() +
geom_point(size = 8, alpha = 0.5)
It is often also useful to know the exact cell densities of your sample. This is performed by the following code. Additionally it quantifies the amount of High Nucleic Acid (HNA) and Low Nucleic Acid (LNA) bacteria as defined by Prest et al. (2013).
Important: for defining the filter for the HNA bacterial cells (rGate_HNA
) we need the value maxval
as we've normalized the data for previous diversity calculations. If you've skipped the diversity calculations and are only interested in the cell counts delete the maxval
parameter in the below rectangleGate
function. But make sure the data is at least asinh
transformed. Otherwise the scatterplot and subsequent calculations will be weird and wrong.
Warning: the HNA/LNA partition is only valid for data gathered on a BD Accuri C6 flow cytometer. For other flow cytometers the HNA/LNA cutoff (FL1-H = 20 000) should be adjusted according to the appropriate reference samples.
### Creating a rectangle gate for counting HNA and LNA cells
rGate_HNA <- rectangleGate("FL1-H"=c(asinh(20000), 20)/maxval,"FL3-H"=c(0,20)/maxval,
filterId = "HNA bacteria")
### Normalize total cell gate
sqrcut1 <- matrix(c(8.75,8.75,14,14,3,7.5,14,3)/maxval,ncol=2, nrow=4)
colnames(sqrcut1) <- c("FL1-H","FL3-H")
polyGate1 <- polygonGate(.gate=sqrcut1, filterId = "Total Cells")
### Check if rectangle gate is correct, if not, adjust rGate_HNA
xyplot(`FL3-H` ~ `FL1-H`, data=flowData_transformed[1], filter=rGate_HNA,
scales=list(y=list(limits=c(0,1)),
x=list(limits=c(0.4,1))),
axis = axis.default, nbin=125, par.strip.text=list(col="white", font=2,
cex=2), smooth=FALSE)
### Extract the cell counts
a <- flowCore::filter(flowData_transformed, rGate_HNA)
HNACount <- summary(a);HNACount <- toTable(HNACount)
s <- flowCore::filter(flowData_transformed, polyGate1)
TotalCount <- summary(s);TotalCount <- toTable(TotalCount)
### Extract the volume
vol <- as.numeric(flowCore::fsApply(flowData_transformed, FUN = function(x) x@description$`$VOL`))/1000
### Store the data
results_counts <- data.frame(Samples=flowCore::sampleNames(flowData_transformed),
Total.cells = TotalCount$true/vol, HNA.cells = HNACount$true/vol)
Finally, you can export the count data if you so desire.
### Exporting cell counts to .csv file to working directory
write.csv2(file="results.counts.csv", results_counts)
### Plot cell density
ggplot(data = results_counts, aes(x = as.numeric(as.character(metadata$day)), y = Total.cells, color = metadata$Reactor_phase))+
geom_point(size = 8, alpha = 0.9)+
scale_color_manual(values = c("#a65628", "red",
"#ffae19", "#4daf4a", "#1919ff", "darkorchid3", "magenta"))+
theme_bw()+
labs(color = "Reactor phase", y = "Total cell density (cells/µL)", x = "Days")
===============