diff --git a/abundance-measurement.Rmd b/abundance-measurement.Rmd index 281450e..d94ca74 100644 --- a/abundance-measurement.Rmd +++ b/abundance-measurement.Rmd @@ -142,7 +142,7 @@ Total-abundance measurements recently used for this purpose include counting cel Importantly, these methods of measuring total abundance are themselves subject to taxonomic bias. Flow cytometry may, for example, yield lower cell counts for species whose cells tend to clump together or are prone to lysis during steps involved in sample collection, storage, and preparation. -Marker-gene concentrations measured by qPCR are affected by variation among species in extraction efficiency, marker-gene copy number, and PCR binding and amplification efficiency. +Marker-gene concentrations measured by qPCR are affected by variation among species in extraction efficiency, marker-gene copy number, and PCR binding and amplification efficiency (@lloyd2013meta). We can easily understand the impact of taxonomic bias on total-abundance measurement under simplifying assumptions analogous to those in our MGS model. Suppose that each species $i$ has an _absolute efficiency_ $B_{i}^{\mtot}$ for the total-abundance measurement that is constant across samples. Further, let $\bar B^{\mtot (a)}$ be the abundance-weighted average of these efficiencies in sample $a$---that is, the mean efficiency of the total-abundance measurement. @@ -185,12 +185,12 @@ The effect on $\tilde A_i^{(a)}$ of taxonomic bias in the MGS measurement can be \end{align} The FE in $\tilde A_i^{(a)}$ consists of two terms: the relative efficiency of species $i$ to species $r$ in the MGS measurement (${B_i}/{B_r}$) and the FE in the reference species' abundance (${\tilde A_r^{(a)}}/{A_r^{(a)}}$). -A common application of this approach involves adding a 'spike-in' (as described above) in a known (and typically constant) abundance across samples (@stammler2016adju, @ji2019quan, @harrison2021theq, REFs). 
+A common application of this approach involves adding a 'spike-in' (as described above) in a known (and typically constant) abundance across samples (@stammler2016adju, @ji2019quan, @tkacz2018abso, @harrison2021theq, @rao2021mult). In this case, the reference abundance $\tilde A_r^{(a)}$ is determined from the concentration of the spike-in stock multiplied by the ratio of the spike-in to sample volumes. -(In practice, researchers in spike-in experiments have typically used a more indirect calculation to determine species abundances, but which yields identical results to \@ref(eq:density-ratio-meas); Appendix REF). +(In practice, researchers in spike-in experiments have typically used a more indirect calculation to determine species abundances, one that nonetheless yields results identical to Equation \@ref(eq:density-ratio-meas); Appendix TODO). -Others have instead sought to determine naturally-occurring species that are thought to be constant across samples; we refer to such species as _housekeeping species_ by analogy with the housekeeping genes used for absolute-abundance conversion in gene-expression studies (REF). -Housekeeping species can sometimes be identified using prior scientific knowledge; for example, in shotgun sequencing experiments, researchers have used sequencing reads from the plant or animal host as a reference (REFs). +Others have instead sought to identify naturally occurring species that are thought to be constant across samples; we refer to such species as _housekeeping species_ by analogy with the housekeeping genes used for absolute-abundance conversion in gene-expression studies (@silver2006sele). +Housekeeping species can sometimes be identified using prior scientific knowledge; for example, in shotgun sequencing experiments, researchers have used sequencing reads from the plant or animal host as a reference (@karasov2020ther, @regalado2020comb, @wallace2021thed). 
A related approach involves computationally identifying species that are constant between pairs of samples (@david2014host) or between sample conditions (@mandal2015anal, @kumar2018anal). The abundance of a housekeeping species is typically unknown; therefore, to estimate the abundances of other species, we simply set $\tilde A_r^{(a)}$ to 1 in Equation \@ref(eq:density-ratio-meas). The resulting abundance measurements have unknown but fixed units, which is sufficient for measuring fold changes across samples. @@ -199,6 +199,6 @@ We suggest an additional way of using the reference-species strategy even in the Performing targeted measurements of the absolute abundance of one or more naturally occurring species. These species can then be used as reference species in Equation \@ref(eq:density-ratio-meas) to measure the absolute abundances of all species. The most common form of targeted measurement involves using qPCR or ddPCR to measure the concentration of a marker-gene in the extracted DNA. -It is also possible to directly measure cell concentration by performing ddPCR prior to DNA extraction (REFs), flow cytometry with species-specific florescent probes, or CFU counting on selective media. -Appendix [REF] describes how using multiple reference species and/or statistical modeling can address the fact that any one native reference species is unlikely to be found in all samples. +It is also possible to directly measure cell concentration by performing ddPCR prior to DNA extraction (@morella2018rapi), flow cytometry with species-specific fluorescent probes, or CFU counting on selective media. +Appendix TODO describes how using multiple reference species and/or statistical modeling can address the fact that any one native reference species is unlikely to be found in all samples. 
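Note on the reference-species passage in abundance-measurement.Rmd above: the calculation it describes can be illustrated with a minimal Python sketch. The diff does not show Equation `eq:density-ratio-meas` itself, so the form used here (estimated abundance = reference abundance × reads of species i / reads of reference species r) is an assumption inferred from the surrounding text. The function name `reference_abundances`, the species labels, and all read counts, concentrations, and volumes are hypothetical.

```python
def reference_abundances(reads, ref, ref_abundance=1.0):
    """Estimate per-species abundances in one sample from read counts.

    Assumed form of the density-ratio measurement (not shown in the diff):
        A_i = A_r * reads_i / reads_r
    With ref_abundance=1.0 (the housekeeping-species case), the results are
    in unknown but fixed units.
    """
    ref_reads = reads[ref]
    if ref_reads <= 0:
        raise ValueError("reference species has no reads in this sample")
    return {sp: ref_abundance * n / ref_reads for sp, n in reads.items()}


# Housekeeping-style use: reference abundance set to 1 in both samples.
sample_a = {"ref": 100, "sp1": 400, "sp2": 50}
sample_b = {"ref": 200, "sp1": 400, "sp2": 300}
est_a = reference_abundances(sample_a, "ref")
est_b = reference_abundances(sample_b, "ref")

# Fold changes from sample a to sample b for the non-reference species.
fold_change = {sp: est_b[sp] / est_a[sp] for sp in ("sp1", "sp2")}

# Spike-in use: reference abundance = stock concentration multiplied by the
# ratio of spike-in volume to sample volume (hypothetical numbers).
stock_concentration = 1e6
spike_volume, sample_volume = 10.0, 200.0
spike_abundance = stock_concentration * spike_volume / sample_volume
est_spike = reference_abundances(sample_a, "ref", spike_abundance)
```

Under the text's assumptions, each estimate still carries the relative efficiency of species i to species r; the fold changes across samples are unaffected as long as that ratio and the reference abundance are constant across samples.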
diff --git a/appendix-old.Rmd b/appendix-old.Rmd index 6a6aa8a..9fd4d07 100644 --- a/appendix-old.Rmd +++ b/appendix-old.Rmd @@ -144,7 +144,7 @@ By accounting for the contribution of unknown species when computing proportions ## Using reference species for total-density normalization {#total-density-ref} Constant reference species are sometimes used to measure total density of $S$ by the ratio of $S$ reads to $R$ reads. -For example, a study of *Arabidopsis* microbiomes used the ratio of bacterial to host reads in shotgun sequencing as a proxy for total bacterial density, which they then used for total-community normalization of 16S amplicon sequencing measurements (@karasov2020ther, @regalado2019comb). +For example, a study of *Arabidopsis* microbiomes used the ratio of bacterial to host reads in shotgun sequencing as a proxy for total bacterial density, which they then used for total-community normalization of 16S amplicon sequencing measurements (@karasov2020ther, @regalado2020comb). @chng2020meta similarly used the ratio of bacterial to host or diet reads in shotgun sequencing of mouse fecal samples as a proxy for total bacterial density (though they did not use this measurement for community normalization). @smets2016amet similarly used the ratio of non-spike-in to spike-in reads to estimate total density. @@ -247,7 +247,7 @@ We use _housekeeping species_ (by analogy with housekeeping genes used for norma Housekeeping species can sometimes be identified from prior scientific knowledge. Several studies that have employed shotgun sequencing of host-associated microbiomes have use the plant or animal host for this purpose. -A study of *Arabidopsis* microbiomes used the ratio of bacterial to host reads in shotgun sequencing as a proxy for total bacterial density, which they then used for total-community normalization of 16S amplicon sequencing measurements (@karasov2020ther, @regalado2019comb). 
+A study of *Arabidopsis* microbiomes used the ratio of bacterial to host reads in shotgun sequencing as a proxy for total bacterial density, which they then used for total-community normalization of 16S amplicon sequencing measurements (@karasov2020ther, @regalado2020comb). @chng2020meta similarly used the ratio of bacterial to host reads in shotgun sequencing of mouse fecal samples as a proxy for total bacterial density (though they did not use this measurement for community normalization). They also use reads from dietary plants for the same purpose. @wallace2021thed used shotgun sequencing to study the virome of _Drosophila_, and normalized virus reads to _Drosophila_ reads to measure viral abundance per fly. diff --git a/main.bib b/main.bib index 05914d8..99e1ab5 100644 --- a/main.bib +++ b/main.bib @@ -508,6 +508,19 @@ @article{lofgren2019geno volume = {28}, year = {2019} } +@article{love2014mode, +author = {Love, Michael I and Huber, Wolfgang and Anders, Simon}, +doi = {10.1186/s13059-014-0550-8}, +journal = {Genome Biol.}, +month = {dec}, +number = {12}, +pages = {550}, +publisher = {BioMed Central}, +title = {{Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2}}, +url = {http://genomebiology.biomedcentral.com/articles/10.1186/s13059-014-0550-8}, +volume = {15}, +year = {2014} +} @article{lozupone2013meta, author = {Lozupone, Catherine A and Stombaugh, Jesse and Gonzalez, Antonio and Ackermann, Gail and Jansson, Janet K and Gordon, Jeffrey I and Wendel, Doug and Va, Yoshiki and Knight, Rob}, doi = {10.1101/gr.151803.112}, @@ -541,6 +554,18 @@ @article{martin2020mode volume = {14}, year = {2020} } +@article{martin2020mode, +author = {Martin, Bryan D. and Witten, Daniela and Willis, Amy D.}, +doi = {10.1214/19-AOAS1283}, +journal = {Ann. Appl. 
Stat.}, +month = {mar}, +number = {1}, +pages = {94--115}, +title = {{Modeling microbial abundances and dysbiosis with beta-binomial regression}}, +url = {https://projecteuclid.org/euclid.aoas/1587002666}, +volume = {14}, +year = {2020} +} @article{martino2019anov, author = {Martino, Cameron and Morton, James T. and Marotz, Clarisse A. and Thompson, Luke R. and Tripathi, Anupriya and Knight, Rob and Zengler, Karsten}, doi = {10.1128/mSystems.00016-19}, @@ -658,7 +683,7 @@ @article{rao2021mult volume = {591}, year = {2021} } -@article{regalado2019comb, +@article{regalado2020comb, author = {Regalado, Julian and Lundberg, Derek S. and Deusch, Oliver and Kersten, Sonja and Karasov, Talia and Poersch, Karin and Shirsekar, Gautam and Weigel, Detlef}, doi = {10.1038/s41396-020-0665-8}, journal = {ISME J.}, @@ -716,6 +741,23 @@ @Misc{rstanarm year = {2020}, url = {https://mc-stan.org/rstanarm}, } +@article{silver2006sele, + title = "Selection of housekeeping genes for gene expression studies in + human reticulocytes using real-time {PCR}", + author = "Silver, Nicholas and Best, Steve and Jiang, Jie and Thein, Swee + Lay", + journal = "BMC Mol. Biol.", + volume = 7, + pages = "33", + month = oct, + year = 2006, + url = "http://dx.doi.org/10.1186/1471-2199-7-33", + language = "en", + issn = "1471-2199", + pmid = "17026756", + doi = "10.1186/1471-2199-7-33", + pmc = "PMC1609175" +} @article{silverman2017aphy, author = {Silverman, Justin D and Washburne, Alex D and Mukherjee, Sayan and David, Lawrence A}, doi = {10.7554/eLife.21887}, @@ -728,6 +770,19 @@ @article{silverman2017aphy volume = {6}, year = {2017} } +@article{silverman2018dyna, +author = {Silverman, Justin D. and Durand, Heather K. and Bloom, Rachael J. 
and Mukherjee, Sayan and David, Lawrence A.}, +doi = {10.1186/s40168-018-0584-3}, +journal = {Microbiome}, +month = {dec}, +number = {1}, +pages = {202}, +publisher = {BioMed Central}, +title = {{Dynamic linear models guide design and analysis of microbiota studies within artificial human guts}}, +url = {https://microbiomejournal.biomedcentral.com/articles/10.1186/s40168-018-0584-3}, +volume = {6}, +year = {2018} +} @article{smets2016amet, author = {Smets, Wenke and Leff, Jonathan W. and Bradford, Mark A. and McCulley, Rebecca L. and Lebeer, Sarah and Fierer, Noah}, doi = {10.1016/j.soilbio.2016.02.003}, @@ -766,6 +821,20 @@ @article{stammler2016adju volume = {4}, year = {2016} } +@article{stein2013ecol, +author = {Stein, Richard R. and Bucci, Vanni and Toussaint, Nora C. and Buffie, Charlie G. and R{\"{a}}tsch, Gunnar and Pamer, Eric G. and Sander, Chris and Xavier, Jo{\~{a}}o B.}, +doi = {10.1371/journal.pcbi.1003388}, +editor = {von Mering, Christian}, +journal = {PLoS Comput. Biol.}, +month = {dec}, +number = {12}, +pages = {e1003388}, +publisher = {Public Library of Science}, +title = {{Ecological Modeling from Time-Series Inference: Insight into Dynamics and Stability of Intestinal Microbiota}}, +url = {http://dx.plos.org/10.1371/journal.pcbi.1003388}, +volume = {9}, +year = {2013} +} @article{tettamantiboshier2020comp, author = {{Tettamanti Boshier}, Florencia A. and Srinivasan, Sujatha and Lopez, Anthony and Hoffman, Noah G. and Proll, Sean and Fredricks, David N. 
and Schiffer, Joshua T.}, doi = {10.1128/mSystems.00777-19}, @@ -981,55 +1050,3 @@ @incollection{zhao2021alog url = {https://link.springer.com/10.1007/978-3-030-73351-3{\_}9}, year = {2021} } -@article{love2014mode, -author = {Love, Michael I and Huber, Wolfgang and Anders, Simon}, -doi = {10.1186/s13059-014-0550-8}, -journal = {Genome Biol.}, -month = {dec}, -number = {12}, -pages = {550}, -publisher = {BioMed Central}, -title = {{Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2}}, -url = {http://genomebiology.biomedcentral.com/articles/10.1186/s13059-014-0550-8}, -volume = {15}, -year = {2014} -} -@article{martin2020mode, -author = {Martin, Bryan D. and Witten, Daniela and Willis, Amy D.}, -doi = {10.1214/19-AOAS1283}, -journal = {Ann. Appl. Stat.}, -month = {mar}, -number = {1}, -pages = {94--115}, -title = {{Modeling microbial abundances and dysbiosis with beta-binomial regression}}, -url = {https://projecteuclid.org/euclid.aoas/1587002666}, -volume = {14}, -year = {2020} -} -@article{silverman2018dyna, -author = {Silverman, Justin D. and Durand, Heather K. and Bloom, Rachael J. and Mukherjee, Sayan and David, Lawrence A.}, -doi = {10.1186/s40168-018-0584-3}, -journal = {Microbiome}, -month = {dec}, -number = {1}, -pages = {202}, -publisher = {BioMed Central}, -title = {{Dynamic linear models guide design and analysis of microbiota studies within artificial human guts}}, -url = {https://microbiomejournal.biomedcentral.com/articles/10.1186/s40168-018-0584-3}, -volume = {6}, -year = {2018} -} -@article{stein2013ecol, -author = {Stein, Richard R. and Bucci, Vanni and Toussaint, Nora C. and Buffie, Charlie G. and R{\"{a}}tsch, Gunnar and Pamer, Eric G. and Sander, Chris and Xavier, Jo{\~{a}}o B.}, -doi = {10.1371/journal.pcbi.1003388}, -editor = {von Mering, Christian}, -journal = {PLoS Comput. 
Biol.}, -month = {dec}, -number = {12}, -pages = {e1003388}, -publisher = {Public Library of Science}, -title = {{Ecological Modeling from Time-Series Inference: Insight into Dynamics and Stability of Intestinal Microbiota}}, -url = {http://dx.plos.org/10.1371/journal.pcbi.1003388}, -volume = {9}, -year = {2013} -}