RasperGade: Reconstructing Ancestral State under Pulsed Evolution in R by Gaussian Decomposition

RasperGade is an R package that predicts ancestral and hidden states while accounting for pulsed evolution and time-independent variation. It has 3 major functions:

fitting the pulsed evolution model given extant trait value and a phylogeny
reconstructing ancestral states or predicting hidden states
evaluating the quality of predicted ancestral and hidden states

Detailed algorithm and analyses are described in

Yingnan Gao, Martin Wu, Modeling Pulsed Evolution and Time-Independent Variation Improves the Confidence Level of Ancestral and Hidden State Predictions, Systematic Biology, 2022;, syac016, https://doi.org/10.1093/sysbio/syac016

System requirements

RasperGade are built on R 3.6.3. It requires the following R packages and their dependencies for fitting the pulsed evolution model and ancestral state reconstruction:

ape, bbmle, castor,digest, parallel,pracma

The following R packages are suggested for plotting diagnostic graphs:

ggplot2, ggpubr

Installation

The required packages can be installed from CRAN by running the following lines in R:

  install.packages("ape")
  install.packages("bbmle")
  install.packages("castor")
  install.packages("digest")
  install.packages("parallel")
  install.packages("pracma")

The suggested packages can also be installed from CRAN:

  install.packages("ggplot2")
  install.packages("ggpubr")

RasperGade can be installed from Github using the devtools package:

if (!requireNamespace("devtools", quietly = TRUE))
    install.packages("devtools")
devtools::install_github(repo = "wu-lab-uva/RasperGade")

Data format

RasperGade takes two basic input data: the phylogeny and the extant trait values, to perform a typical ancestral state reconstruction or hidden state prediction.

Phylogenies are expected to be phylo-class objects as defined by the R package ape.

Extant trait values are expected to be a named numeric vector, and the names should correspond to tip labels of the phylogeny. Hidden states to be predicted are either absent or marked by NA values in the named vector.

Tips should have unique labels, and if nodes have labels, they should be unique.

Demo

A small demo is provided under the folder inst/extdata/Demo Once RasperGade is installed, the location of the demo files can also be accessed by the following command in R

demo.tree.file = system.file("extdata/Demo/demo.tree",package = "RasperGade",mustWork = TRUE)
demo.trait.file = system.file("extdata/Demo/demo.trait.txt",package = "RasperGade",mustWork = TRUE)

To run a typical leave-one-out cross-validation analysis

library(RasperGade)

# read the demo tree and trait values
tree = read.tree(demo.tree.file)
trait = read.table(demo.trait.file,sep = "\t",stringsAsFactors = FALSE)[,2]
names(trait) = read.table(demo.trait.file,sep = "\t",stringsAsFactors = FALSE)[,1]

# fit the pulsed evolution model
pe.model = fitPE.phy(phy = tree,x = trait[tree$tip.label],laplace = FALSE)
print(pe.model$params)
##   lambda      size     sigma   epsilon 
## 1.172651 58.925763  6.760760  2.166998 

# conducting leave-one-out cross-validation under different models
bm.cv = LOO_CV_with_BM(tree,trait)
pe.fmr = fullMarginalReconstructionWithPE(phy = tree,x = trait[tree$tip.label],params = pe.model$params,laplace = FALSE)
pe.cv = LOO_CV_with_PE(FMR = pe.fmr)

To evaluate the performance of CV

# get the residual errors
bm.error = trait[tree$tip.label] - bm.cv$summary$x
pe.error = trait[tree$tip.label] - pe.cv$summary$x

# compare the error and bias
## bias
print(c(BM=mean(bm.error),PE=mean(pe.error)))
###         BM          PE 
### 0.01810704 -0.08808421 

## error
print(c(BM=mean(abs(bm.error)),PE=mean(abs(pe.error))))
###       BM       PE 
### 3.258241 3.156028 

# get the residual Z-scores
bm.pZ = pseudo.Z.score(trait=trait[tree$tip.label],pred=bm.cv$summary$x,error=bm.cv$error)
bm.pZ$se = sqrt(bm.cv$summary$var)
pe.pZ = pseudo.Z.score(trait=trait[tree$tip.label],pred=pe.cv$summary$x,error=pe.cv$error)
pe.pZ$se = sqrt(pe.cv$summary$var)

# calculate diagnostic statistics
diag.stats = lapply(list(BM=bm.pZ,PE=pe.pZ),function(x){
  this.test = ks.test(x$q,pnorm)
  this.D = this.test$statistic
  this.H = calculateHeteroscedasticity(x = x$q,y = x$se,bin = 20)[1]
  this.RSS = calculateBinomRSS(p = rep(0.95,dim(x)[1]),
                               obs = sapply(x$p,function(pp){
                                 min(pp,1-pp)>=0.025}),group = log(x$se),bin = 10,equal.bin = FALSE)
  data.frame(D = unname(this.D),H = unname(this.H),RSS = unname(this.RSS))
})
print(diag.stats)

Name		Name	Last commit message	Last commit date
Latest commit History 138 Commits
R		R
inst/extdata/Demo		inst/extdata/Demo
man		man
DESCRIPTION		DESCRIPTION
NAMESPACE		NAMESPACE
README.md		README.md
RasperGade.Rproj		RasperGade.Rproj

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

RasperGade: Reconstructing Ancestral State under Pulsed Evolution in R by Gaussian Decomposition

System requirements

Installation

Data format

Demo

About

Releases

Packages

Contributors 2

Languages

wu-lab-uva/RasperGade

Folders and files

Latest commit

History

Repository files navigation

RasperGade: Reconstructing Ancestral State under Pulsed Evolution in R by Gaussian Decomposition

System requirements

Installation

Data format

Demo

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages