-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathscratchwork-AMS.qmd
754 lines (601 loc) · 23.3 KB
/
scratchwork-AMS.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
---
title: "Annakate scratchwork"
description: "Intro R-phylogenetics tutorial using Osmondi et al 2020 data"
format:
html:
df-print: kable
code-fold: show
code-summary: "Hide code"
code-overflow: wrap
toc-title: Page Contents
toc: true
toc-depth: 2
toc-location: right
number-sections: false
html-math-method: katex
smooth-scroll: true
editor: source
editor_options:
chunk_output_type: console
---
## References and External Resources
>[Osmondi et al 2020](https://onlinelibrary.wiley.com/doi/10.1111/tbed.13573)(2020)
>The role of African buffalo in the epidemiology of foot-and-mouth disease in sympatric cattle and buffalo populations in Kenya. https://doi.org/10.1111/tbed.13573
GenBank [PopSet: 1685824549] (https://www.ncbi.nlm.nih.gov/popset?LinkName=nuccore_popset&from_uid=1685824549)
## Libraries
A few packages are needed. First, these help with directory management, visualization, and data wrangling.
```{r, warning=FALSE, message=FALSE}
library(here) # directory management
library(tidyverse) # ggplot, lubridate, and the like
options(dplyr.summarise.inform = FALSE) # don't render data default summaries
library(ggmap) # maps
library(ggspatial) # spatial plots
library(pals) # color pallets
library(gt) # pretty tables
library(coda) # mcmc summaries/tools
```
Next, several genetics specific packages are recommended. The *BioManager* packages may take a few minutes to compile.
```{r, warning=FALSE, message=FALSE}
library(ape) #Analyses of Phylogenetics and Evolution (APE)
library(phangorn) # phylogenetic trees and networks
library(rentrez) # R interface to the NCBI - GenBank
# these next 4 pieces of code check each package, then installs them if not already installed.
# if (!requireNamespace("BiocManager", quietly = TRUE)) {
# install.packages("BiocManager")
# }
#
# if (!requireNamespace("Biostrings", quietly = TRUE)) {
# BiocManager::install("Biostrings")
# }
#
# if (!requireNamespace("msa", quietly = TRUE)) {
# BiocManager::install("msa")
# }
#
# if (!requireNamespace("ggtree", quietly = TRUE)) {
# BiocManager::install("ggtree")
# }
# if now installed, load the packages
library(Biostrings) # sequence wrangling
library(msa) # Multiple Sequence Alignment (MSA) algorithms
library(ggtree) # tree visualization and annotation
```
Custom functions created for this demo.
```{r, warning=FALSE, message=FALSE}
source(here("R/utilities.R"))
source_dir(here("R"))
```
# -------------------------------------------------------------------------------------------------
# accession numbers --> ids and metadata --> sequences
# -------------------------------------------------------------------------------------------------
## Query GenBank
The Osmondi paper provided a listing of accession numbers as a supplemental. This was attached as a Word Doc, which I copy-and-pasted into `osmondi_2020_supplemental.csv`.
```{r}
osmondi_seqs <- read_csv(here("data_AMS/osmondi_2020_supplemental.csv"))
head(osmondi_seqs)
search_term <- paste(osmondi_seqs$Accession, collapse = " OR ")
genbank_return <- entrez_search(
db = "nucleotide",
term = search_term,
retmax = 100 # the default is 20 records, our list is 95
)
```
Alternatively, search the database for the *popset* of interest and return all metadata. This is simpler, but not always available and GenBank may stop using PopSet numbers altogether next year.
```{r}
popset_id <- "1685824549" # "population set" from GenBank page
genbank_return <-
entrez_search(db = "nucleotide",
term = paste0("POPSET:", popset_id), # search by popset number
retmax = 100)
```
### Check Contents
No actual sequences yet, only metadata. Even if you can easily access the sequences, the expanded retrieval process in this script is often needed to pull additional info, like collection dates, isolate names/labels, or geographic coordinates.
```{r}
genbank_return # check results
class(genbank_return)
str(genbank_return)
```
### Samples Table
Using the metadata for each record, the desired data is pulled one at a time, then organized as a data frame. Lots of character string wrangling, yuk!
First, create an empty data frame to hold results.
```{r}
seq_meta <- data.frame(Accession=character(),
Collection=character(),
Host=character(),
Isolate=character(),
stringsAsFactors=FALSE)
```
Then, loop through each record and pull desired data. In this case, searching the metadata for accession numbers, collection dates, host type, and the more detailed isolate names.
```{r}
for (id in genbank_return$ids) {
try({
record <- entrez_fetch(db="nucleotide", id=id, rettype="gb", retmode="text")
accession <- sub("^.*?ACCESSION\\s+([^\n]+).*", "\\1", record)
Collection <- ifelse(grepl("/collection_date=", record),
sub("^.*?/collection_date=\"([^\"]+)\".*", "\\1", record), NA)
host <- ifelse(grepl("/host=", record),
sub("^.*?/host=\"([^\"]+)\".*", "\\1", record), NA)
isolate <- ifelse(grepl("/isolate=", record),
sub("^.*?/isolate=\"([^\"]+)\".*", "\\1", record), NA)
# add to data frame
seq_meta <- rbind(seq_meta, data.frame(Accession=accession,
Collection=Collection,
Host=host,
Isolate=isolate,
stringsAsFactors=FALSE)) %>%
as.data.frame()
# delay to prevent overwhelming the API server
Sys.sleep(0.5) # this gives a 0.5 second gap
}, silent = TRUE) # continue if an error
}
```
### Data Table
Examine what was retrived.
```{r}
dim(seq_meta)
head(seq_meta)
# using trimws due to an extra space in the numbers
seq_meta$Accession <- trimws(seq_meta$Accession)
# add a couple more columns
seq_meta$Serotype <- sub("/.*", "", seq_meta$Isolate)
seq_meta$Animal <- sub("^.*/([^/]+)/[^/]+$", "\\1", seq_meta$Isolate)
seq_meta %>%
gt() %>%
tab_header(
title = md("Kenya Sequences Metadata")) %>%
cols_width(starts_with("Accession") ~ px(90),
starts_with("Collection") ~ px(80),
starts_with("Host") ~ px(100),
starts_with("Isolate") ~ px(180),
starts_with("Serotype") ~ px(80),
starts_with("Animal") ~ px(80),
everything() ~ px(95)) %>%
tab_options(table.font.size = "small",
row_group.font.size = "small",
stub.font.size = "small",
column_labels.font.size = "medium",
heading.title.font.size = "large",
data_row.padding = px(2),
heading.title.font.weight = "bold",
column_labels.font.weight = "bold") %>%
opt_stylize(style = 6, color = 'gray')
```
### Samples by Group
```{r}
seq_meta %>%
group_by(Host, Serotype) %>%
summarise(Count = length(Accession))
```
```{r save}
save(genbank_return, osmondi_seqs, seq_meta, popset_id, search_term,
file="data_AMS/prep-for-seqs.Rda")
```
## Retrieve Sequences
Now, query GenBank for the actual sequences. Example here using SAT1 as an example.
```{r}
# load("data_AMS/prep-for-seqs.Rda")
sat1_df <- seq_meta %>%
filter(Serotype == "SAT1")
# function to get sequences
get_sequences <- function(accessions) {
sequences <- sapply(accessions, function(acc) {
entrez_fetch(db = "nuccore", id = acc, rettype = "fasta")
})
return(sequences)
}
# run function
sat1_sequences <- get_sequences(sat1_df$Accession)
# remove special characters
sat1_sequences <- gsub("[^ATCG]", "", sat1_sequences)
# save to text file - fasta format
writeLines(sat1_sequences, here("data_AMS/sat1-sequences.fasta"))
```
```{r SAT2}
# load("data_AMS/prep-for-seqs.Rda")
sat2_df <- seq_meta %>%
filter(Serotype == "SAT2")
# function to get sequences
get_sequences <- function(accessions) {
sequences <- sapply(accessions, function(acc) {
entrez_fetch(db = "nuccore", id = acc, rettype = "fasta")
})
return(sequences)
}
# run function
sat2_sequences <- get_sequences(sat2_df$Accession)
# remove special characters
sat2_sequences <- gsub("[^ATCG]", "", sat2_sequences)
# save to text file - fasta format
writeLines(sat2_sequences, here("data_AMS/sat2-sequences.fasta"))
```
# -------------------------------------------------------------------------------------------------
# alignment and substitution model
# -------------------------------------------------------------------------------------------------
## Alignment
```{r}
# ensure all is named correctly
unique_names <- make.unique(names(sat1_sequences))
names(sat1_sequences) <- unique_names
# convert to a DNAStringSet object, needed for the msa package
dna_sequences <- DNAStringSet(sat1_sequences)
# MUSCLE alignment
alignment <- msa(dna_sequences, method = "Muscle")
alignment <- as(alignment, "DNAStringSet")
# save the aligned sequences to a text file
writeXStringSet(alignment, filepath = here("data_AMS/sat1-aligned.fasta"))
```
```{r SAT2}
# ensure all is named correctly
unique_names2 <- make.unique(names(sat2_sequences))
names(sat2_sequences) <- unique_names2
# convert to a DNAStringSet object, needed for the msa package
dna_sequences2 <- DNAStringSet(sat2_sequences)
# MUSCLE alignment
alignment2 <- msa(dna_sequences2, method = "Muscle")
alignment2 <- as(alignment2, "DNAStringSet")
# save the aligned sequences to a text file
writeXStringSet(alignment2, filepath = here("data_AMS/sat2-aligned.fasta"))
```
### View Alignment
A plot to view the alignment. These are very clean, hardly any breaks or missingness.
```{r, fig.width=10, fig.height=6}
plot_alignment(alignment)
```
```{r SAT2, fig.width=10, fig.height=6}
plot_alignment(alignment2)
```
## Substitution Model
Read in the saved alignment. Do this rather than using the version already in the environment, because the classes are different.
```{r, message=FALSE}
alignment <- read.dna(here("data_AMS/sat1-aligned.fasta"),
format="fasta", as.matrix=TRUE)
# convert again for modelTest (phangorn pkg)
aligned_phyDat <- as.phyDat(alignment)
# run the test, compare the models
mt <- modelTest(aligned_phyDat)
mt %>%
arrange(AIC) %>%
slice_head(n=5) %>%
gt()
# use the next to best
env <- attr(mt, "env")
best_mod <- eval(get("GTR+G(4)+I", env), env)
best_mod
```
```{r SAT2, message=FALSE}
alignment2 <- read.dna(here("data_AMS/sat2-aligned.fasta"),
format="fasta", as.matrix=TRUE)
# convert again for modelTest (phangorn pkg)
aligned_phyDat2 <- as.phyDat(alignment2)
# run the test, compare the models
mt2 <- modelTest(aligned_phyDat2)
mt2 %>%
arrange(AIC) %>%
slice_head(n=5) %>%
gt()
# use the best (same one we picked for SAT1)
env2 <- attr(mt2, "env")
best_mod2 <- eval(get("GTR+G(4)+I", env2), env2)
best_mod2
```
```{r save}
save(mt, best_mod,
file="data_AMS/sat1.Rda")
save(mt2, best_mod2,
file="data_AMS/sat2.Rda")
```
# -------------------------------------------------------------------------------------------------
# tree checks
# -------------------------------------------------------------------------------------------------
## Maximum Likelihood Tree
Quick tree to see if there's any craziness happening. Also an opportunity to check out [**ggtree**](https://yulab-smu.top/treedata-book/).
### Optimize model
This optimization process is rather specific to this algorithm; it's OK for quick checks, but you'll want to use other methods for publishable results.
```{r}
# optimize model parameters without fitting edges
fit1 <- optim.pml(best_mod, # best model
optNni = FALSE, optBf = TRUE,
optQ = TRUE, optInv = TRUE,
optGamma = TRUE, optEdge = FALSE,
optRate = TRUE,
control = pml.control(epsilon = 1e-08,
maxit = 10, trace = 0))
#Fix substitution model and fit tree
fit2 <- optim.pml(fit1,
optNni = TRUE, optBf = FALSE,
optQ = FALSE, optInv = FALSE,
optGamma = FALSE, optEdge = TRUE,
control = pml.control(epsilon = 1e-08,
maxit = 10, trace = 0))
#Fine tune
fit3 <- optim.pml(fit2,
optNni = TRUE, optBf = TRUE,
optQ = TRUE, optInv = TRUE,
optGamma = TRUE, optEdge = TRUE,
optRate = FALSE,
control = pml.control(epsilon = 1e-08,
maxit = 10, trace = 0))
```
```{r SAT2}
# optimize model parameters without fitting edges
fit1.2 <- optim.pml(best_mod2, # best model
optNni = FALSE, optBf = TRUE,
optQ = TRUE, optInv = TRUE,
optGamma = TRUE, optEdge = FALSE,
optRate = TRUE,
control = pml.control(epsilon = 1e-08,
maxit = 10, trace = 0))
#Fix substitution model and fit tree
fit2.2 <- optim.pml(fit1.2,
optNni = TRUE, optBf = FALSE,
optQ = FALSE, optInv = FALSE,
optGamma = FALSE, optEdge = TRUE,
control = pml.control(epsilon = 1e-08,
maxit = 10, trace = 0))
#Fine tune
fit3.2 <- optim.pml(fit2.2,
optNni = TRUE, optBf = TRUE,
optQ = TRUE, optInv = TRUE,
optGamma = TRUE, optEdge = TRUE,
optRate = FALSE,
control = pml.control(epsilon = 1e-08,
maxit = 10, trace = 0))
```
## Bootstrap Values
Only running 100 trees as an example, although a small number, this might take a few minutes...
```{r}
set.seed(1976)
boots <- bootstrap.pml(fit3,
bs = 100,
optNni = TRUE,
control = pml.control(trace = 0))
```
```{r SAT2}
set.seed(1976)
boots2 <- bootstrap.pml(fit3.2,
bs = 100,
optNni = TRUE,
control = pml.control(trace = 0))
```
```{r save}
save(mt, best_mod, boots,
file="data_AMS/sat1.Rda")
save(mt2, best_mod2, boots2,
file="data_AMS/sat2.Rda")
```
## Tree Plots
Using **phangorn**.
```{r, fig.width=8, fig.height=10}
# get the best tree from optimization
ml_tree <- fit3$tree
# Or use a consensus tree
# consensus_tree <- consensus(boots, p = 0.5)
# phangorn specific plots
plotBS(midpoint(ml_tree), boots,
type="p", cex=0.4,
bs.adj = c(1.25, 1.25),
bs.col = "black")
add.scale.bar()
title("Maximum Likelihood")
```
```{r SAT2, fig.width=8, fig.height=10}
# get the best tree from optimization
ml_tree2 <- fit3.2$tree
# Or use a consensus tree
# consensus_tree2 <- consensus(boots2, p = 0.5)
# phangorn specific plots
plotBS(midpoint(ml_tree2), boots2,
type="p", cex=0.4,
bs.adj = c(1.25, 1.25),
bs.col = "black")
add.scale.bar()
title("Maximum Likelihood")
```
Could also extract the bootstrap values for use in other plotting tools.
```{r}
bootstrap_matrix <- sapply(boots, function(tree) tree$node.label)
bootstrap_matrix <- apply(bootstrap_matrix, 2, as.numeric)
bootstrap_summarized <- apply(bootstrap_matrix, 1, mean, na.rm = TRUE)
ml_tree$node.label <- round(bootstrap_summarized, 2)
# root on oldest sequences, Jan 2014
ml_tree = root(ml_tree, c("MH882578", "MH882579", "MH882580"),
resolve.root = TRUE)
```
```{r SAT2}
bootstrap_matrix2 <- sapply(boots2, function(tree) tree$node.label)
#The bootstrap trees for SAT2 appear to differ in number of nodes/labels,
#unlike those for SAT1.
#As a result, the line above can't return a neat matrix (# of rows conflict)
#and instead returns a list of individual vectors, one per tree.
#I'm not sure how to sync up those vectors correctly
#or exactly what the different lengths mean,
#so I'm skipping SAT2 ggtree plotting.
```
Using **ggtree**. This basic plot isn't much prettier than the above, but offers more flexibility for customization.
```{r, fig.width=8, fig.height=10}
ggtree(ml_tree, size=0.5, col = "gray30",
ladderize=TRUE) +
geom_text2(aes(subset = !isTip, label=label),
hjust=-0.4,
size=3,
color="black") +
geom_tiplab(col="gray40", size=3,
align=FALSE, offset = 0.025, hjust = 1.4) +
geom_treescale(width=0.02)
```
Here's a more fancy-fied version:
```{r, fig.width=8, fig.height=10}
# get labels
tree_df <- as.data.frame(
ml_tree$tip.label
)
names(tree_df) <- "label"
# match to isolate names, more info
tree_df$isolate <- with(seq_meta,
Isolate[match(
tree_df$label,
Accession)])
# match to host type
tree_df$host <- with(seq_meta,
Host[match(
tree_df$label,
Accession)])
# add data to tree
tmp_tree <- full_join(ml_tree, tree_df, by = 'label')
ggtree(tmp_tree, size=0.5, col = "gray30",
ladderize=TRUE) +
geom_tiplab(aes(label = isolate), col="gray40", size=3,
align=FALSE, offset = 0.025, hjust = .5) +
geom_text2(aes(subset = !isTip, label=label),
hjust=-0.4,
size=3,
color="darkred") +
geom_tippoint(aes(colour=host, shape=host), size = 4) +
scale_color_manual(values = c("Syncerus_caffer" = "red", "Bos_taurus" = "blue")) +
scale_shape_manual(values = c("Syncerus_caffer" = 16, "Bos_taurus" = 17)) +
theme(plot.margin = unit(c(1,8,1,0.1), "mm"),
axis.title.x = element_text(size=24, face="bold"),
axis.title.y = element_blank(),
axis.text.x = element_blank(),
axis.text.y = element_blank(),
axis.ticks=element_blank(),
axis.line=element_blank(),
legend.direction = "vertical",
legend.position= "inside",
legend.position.inside = c(0.2, 0.8),
strip.text = element_blank(),
strip.background = element_blank(),
legend.key.size = unit(2,"line"),
legend.key.width = unit(3,"line"),
legend.text = element_text(size=16, face="bold"),
legend.title = element_text(size=16, face="bold"),
plot.title = element_text(size=28, face="bold")) +
geom_treescale(width=0.02) +
labs(colour = "Host", shape = "Host") +
guides(colour = guide_legend(override.aes = list(size=4)))
```
# -------------------------------------------------------------------------------------------------
# BEAST
# -------------------------------------------------------------------------------------------------
### Create a date file
```{r}
sat1_dates <- sat1_df %>%
select(Accession, Collection)
unique(sat1_dates$Collection)
sat1_dates <- sat1_dates %>%
mutate(samp_date = case_when(
Collection == "2016-01" ~ as_date("2016-01-01", format = "%Y-%m-%d"),
Collection == "2016-10" ~ as_date("2016-10-01", format = "%Y-%m-%d"),
Collection == "2014-01" ~ as_date("2014-01-01", format = "%Y-%m-%d"),
)) %>%
select(-Collection)
write.table(sat1_dates, file = here("beast_AMS/sat1-dates.txt"),
sep = "\t", row.names = FALSE, col.names = FALSE, quote = FALSE)
```
```{r SAT2}
sat2_dates <- sat2_df %>%
select(Accession, Collection)
unique(sat2_dates$Collection)
sat2_dates <- sat2_dates %>%
mutate(samp_date = case_when(
Collection == "2016-01" ~ as_date("2016-01-01", format = "%Y-%m-%d"),
Collection == "2015-03" ~ as_date("2015-03-01", format = "%Y-%m-%d"),
Collection == "2014-09" ~ as_date("2014-09-01", format = "%Y-%m-%d"),
)) %>%
select(-Collection)
write.table(sat2_dates, file = here("beast_AMS/sat2-dates.txt"),
sep = "\t", row.names = FALSE, col.names = FALSE, quote = FALSE)
```
1. Beauti walk through (produces .xml)
2. Write shell script (save as .sh)
3. Login to HPC (Ceres has Beast, Atlas has Beast2, or install preferred version on either)
4. Upload data (.xml, .sh) to a folder ("phylo-demo")
5. Open shell, switch to folder
6. Run shell script (.sh) as below
```
$ cd phylo-demo
$ dos2unix sat*-**.sh
$ sbatch sat*-**.sh
$ squeue -u annakate.schatz
```
7. Download BEAST output files (.log, .trees)
### HPC run results
SAT1 files
- Ceres: shell file runs and returns Beast files, but Beagle fails to load
- Atlas: ditto
SAT2 files
- Ceres:
- Atlas:
# -------------------------------------------------------------------------------------------------
# post-BEAST
# -------------------------------------------------------------------------------------------------
## Tracer Type Stats
```{r}
sat1_stats <- get_tracer_stats(here("beast_AMS/sat1-tracelog-AMS.log"))
sat1_stats %>%
gt() %>%
tab_header(
title = md("Simple SAT1 Tree")) %>%
cols_width(everything() ~ px(95)) %>%
tab_options(table.font.size = "small",
row_group.font.size = "small",
stub.font.size = "small",
column_labels.font.size = "medium",
heading.title.font.size = "large",
data_row.padding = px(2),
heading.title.font.weight = "bold",
column_labels.font.weight = "bold") %>%
opt_stylize(style = 6, color = 'gray')
```
```{r SAT2}
sat1_stats <- get_tracer_stats(here("beast_AMS/sat2-tracelog-AMS.log"))
sat1_stats %>%
gt() %>%
tab_header(
title = md("Simple SAT2 Tree")) %>%
cols_width(everything() ~ px(95)) %>%
tab_options(table.font.size = "small",
row_group.font.size = "small",
stub.font.size = "small",
column_labels.font.size = "medium",
heading.title.font.size = "large",
data_row.padding = px(2),
heading.title.font.weight = "bold",
column_labels.font.weight = "bold") %>%
opt_stylize(style = 6, color = 'gray')
```
## Maximum Clade Credibility Tree
### Ceres
```
$ module load java
$ module load beast2
$ beast
$ treeannotator -burnin 10000 -heights median sat1-treelog-AMS.trees sat1-mcc.tree
```
ERROR ($ beast) : "/software/el9/apps/beast2/2.7.3/bin/beast: line 67: /software/el9/apps/java/17/bin/java: Operation not permitted"
ISSUE: not sure how to load downloaded version of Beast (Beast2 2.7.7)
### Atlas
```
$ module load openjdk
$ module load beast2/2.7.7
$ beast
$ treeannotator -burnin 10000 -heights median sat1-treelog-AMS.trees sat1-mcc.tree
```
ERROR ($ treeannotator ...) : "/home/annakate.schatz/beast/bin/treeannotator: line 33: /apps/spack-managed/gcc-11.3.1/beast2-2.7.7-jiva66x7qqmcbrniff4jtdqspeg7ao2q/jre/bin/java: No such file or directory"
- appears to be a Java problem?
- notes about Java also come up when initializing Beast
java.lang.reflect.InvocationTargetException
at (multiple lines of paths)
Caused by: java.awt.HeadlessException:
(and more)
ISSUE: I think this is loading Atlas's Beast2, not the one I downloaded, but that might not matter since both are 2.7.7
### Visualization
```{r}
options(ignore.negative.edge=TRUE)
sat_mcc.tree <- read.nexus(here("beast_AMS/sat1-mcc.tree"))
ggtree(sat_mcc.tree, size=0.5, col = "gray30",
ladderize=TRUE) +
geom_tiplab(col="gray40", size=3,
align=FALSE, offset = 0.025, hjust = 2.25)
```
# -------------------------------------------------------------------------------------------------