-
Notifications
You must be signed in to change notification settings - Fork 1
/
PhyloCompMethodsMaterial.Rmd
1492 lines (1025 loc) · 57.2 KB
/
PhyloCompMethodsMaterial.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
---
title: "`Phylogenetic Comparative Methods` in R"
author: "Pedro Henrique Pereira Braga"
abstract: "MSc. in Ecology and Evolution; PhD Candidate at Concordia University, Montreal"
date: "`r Sys.Date()`"
email: "[email protected]"
output:
rmdformats::html_clean:
highlight: kate
thumbnails: true
lightbox: true
gallery: true
fig_caption: TRUE
---
```{r knitr_init, echo=FALSE, results="asis", cache=FALSE}
library(knitr)
library(rmdformats)
## Global options
options(max.print = "75")
opts_chunk$set(echo = FALSE,
cache = TRUE,
prompt = FALSE,
tidy = FALSE,
comment = NA,
message = FALSE,
warning = FALSE,
strip.white = TRUE,
fig.align = 'center',
fig.width = 9,
fig.height = 9)
opts_knit$set(width = 75)
```
# What is this?
This is a short course/tutorial I developed to help researchers cover some of the families of phylogenetic comparative analyses of trait evolution and correlation, diversification rates, as well as community structure. My objective is to provide a short background in statistical and comparative phylogenetic methods, to further allow researchers to explore their research questions on their own.
A big part of the inspiration for this workshop came from the book *Phylogenies in Ecology*, by *Marc W. Cadotte* and *Jonathan Davies*, as well as from courses and workshops on ecophylogenetics taught by Dr. *Will Pearse*, Dr. *Jonathan Davies* and Dr. *Steven Kembel*.
I am progressively adding more material to this document. You can access all topics addressed here in the outline on the right side of this page.
If you would like to provide feedback on this material, do not hesitate to get in touch!
I will highly appreciate your comments!
Kind regards,
Pedro Henrique P. Braga
**pedrohbraga.github.io**
**ph.pereirabraga [at] gmail [dot] com**.
## Preparation for this tutorial
I kindly ask you to run the following code to download the datasets, and install and load the libraries we will use through this course:
```{r dataset-download, eval=FALSE, echo=TRUE, message=FALSE, warning=FALSE, paged.print=FALSE}
# Check for needed packages, and install the missing ones
required.libraries <- c("ape", "picante", "pez", "phytools",
"vegan", "adephylo", "phylobase",
"geiger", "mvMORPH", "OUwie",
"hisse", "BAMMtools",
"phylosignal", "Biostrings",
"devtools",
"ggplot2",
"fulltext")
needed.libraries <- required.libraries[!(required.libraries %in% installed.packages()[,"Package"])]
if(length(needed.libraries)) install.packages(needed.libraries)
# Load all required libraries at once
lapply(required.libraries, require, character.only = TRUE)
### Install missing libraries
source("https://bioconductor.org/biocLite.R")
biocLite("ggtree")
###
set.seed(1)
# Download files from Marc Cadotte's and Jonathan Davies' book:
temp <- tempfile()
download.file("http://www.utsc.utoronto.ca/~mcadotte/ewExternalFiles/resources.zip", temp)
dir.create("data/Jasper")
JasperData <- c("jasper_tree.phy", "jasper_data.csv")
lapply(JasperData,
FUN = function(x)
unzip(zipfile = temp,
files = paste("resources/data/Jasper/", x, sep = ""),
exdir = "data/Jasper"))
# Download the tree files from this paper
download.file("https://onlinelibrary.wiley.com/action/
downloadSupplement?doi=10.1111%2Fj.1461-0248.2009.01307.x&attachmentId=179085359",
destfile = "data/ele_1307_sm_sa1.tre")
# If the download of the above file does not work,
# download the first supplementary material from the following paper
# and place it within your 'data' directory:
# https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1461-0248.2009.01307.x
```
# Why are phylogenies important?
Since many decades we recognize that many ecolgical patterns are processes are not independent of the evolution of the lineages involved in generating these patterns.
Here, we are taking it seriously what Theodosius Dobzhansky said in 1973: *“nothing in biology makes sense except in the light of evolution”*!
# A brief introduction to phylogenies
**Phylogenetic trees** (*i.e.*, **evolutionary tree** or **cladogram**) are branching diagrams illustrating the evolutionary relationships among taxa.
Phylogenetic trees can be constructed based on species’ genes and/or on traits. Phylogenetic reconstruction through analyses of molecular sequences usually consists of three distinct procedures: (i) sequence alignment; (ii) character coding; and (iii) tree building. There is a range of available options for each of these steps, with no simple objective method for choosing between them.
<center>![The root of a phylogenetic tree indicates that an ancestral lineage gave rise to all organisms on the tree. A branch point indicates where two lineages diverged. A lineage that evolved early and remains unbranched is a basal taxon. When two lineages stem from the same branch point, they are sister taxa. A branch with more than two lineages is a polytomy. Extracted from: https://courses.lumenlearning.com/wm-biology1/chapter/reading-structure-of-phylogenetic-trees/ \label{figurelabel}](figures/Figure_20_01_02.jpg) </center>
In this course, we will not cover the steps necessary to reconstruct phylogenies. We will thus assume that a phylogenetic relationship is already available for the taxa of interest.
## `ape` and the `phylo` object in R
The core of phylogenetics analyses in R is the library `ape` - an acronym that stands for **A**nalysis of **P**hylogenetics and **E**volution. You can run the following commands to install, load and call help for `ape`:
```{r ape, echo=TRUE, eval=TRUE}
# install.packages("ape")
library(ape)
# help(package = ape)
```
From `ape`, we depict two important *functions* to read and load trees into `R`: `read.nexus` and `read.tree`:
`read.nexus` reads NEXUS formatted trees, which hasis organized into major units known as *blocks* and usually have the following basic structure:
```{}
#NEXUS
[!This is the data file used for my dissertation]
BEGIN TAXA;
TaxLabels Scarabaeus Drosophila Aranaeus;
END;
BEGIN TREES;
Translate beetle Scarabaeus, fly Drosophila, spider Aranaeus;
Tree tree1 = ((1,2),3);
Tree tree2 = ((beetle,fly),spider);
Tree tree3 = ((Scarabaeus,Drosophila),Aranaeus);
END;
```
`read.tree` reads Newick formatted trees, which commonly have the following format:
```{}
(Bovine:0.69395,(Hylobates:0.36079,(Pongo:0.33636,(G._Gorilla:0.17147, (P._paniscus:0.19268,H._sapiens:0.11927):0.08386):0.06124):0.15057):0.54939, Rodent:1.21460)
```
When phylogenies are created, loaded and handled in `ape`, their objects are represented as a list of *class* `phylo`. Let's try reading the following text string of a tree (*is it in Newick or in NEXUS format?*) and see what `R` tells us when we call it:
```{r phylo-primates, echo=TRUE, eval=TRUE}
tree.primates <- read.tree(text="((((Homo:0.21, Pongo:0.21):0.28, Macaca:0.49):0.13,
Ateles:0.62):0.38, Galago:1.00);")
tree.primates
```
A class `phylo` object distributes the information of phylogenetic trees into six main components:
```{r phylo-object-str, echo=TRUE, eval=TRUE}
str(tree.primates)
```
Since `phylo` is a list, we can access its elements by using the `$` symbol. We can access the *matrix* `edge` by running `phylo_object$edge`, which hasthe beginning and ending node number for all the nodes and tips of a given tree. In summary, this allows us to keep track of the internal and external nodes of the tree:
```{r phylo-object-edge, echo=TRUE, eval=TRUE}
tree.primates$edge
```
The *vector* `tip.label` element contains all the labels for all tips in the tree (*i.e.* the names of your OTUs):
```{r phylo-object-tiplabel, echo=TRUE, eval=TRUE}
tree.primates$tip.label
```
There is a little error in the tip labels, right? We can easily rewrite the labels with this:
```{r phylo-object-tiplabel-2, echo=TRUE, eval=TRUE}
tree.primates$tip.label <- c("Homo", "Pongo", "Macaca", "Ateles", "Galago")
```
Another important element is the vector `edge.length`, which indicates the length of each edge with relation to the root or base of the tree:
```{r phylo-object-edges, echo=TRUE, eval=TRUE}
tree.primates$edge.length
```
Finally, `Nnode` contains the number of internal nodes in the tree:
```{r phylo-object-Nnode, echo=TRUE, eval=TRUE}
tree.primates$Nnode
```
One can see all the important information a `phylo` object carries from a tree by plotting it with all its elements:
```{r phylo-object-plotting, echo=TRUE, eval=TRUE}
plot(tree.primates,
edge.width = 2,
label.offset = 0.05,
type = "cladogram")
nodelabels()
tiplabels()
add.scale.bar()
```
There are other classes that store trees in R, such as `phylo4` and `phylo4d` objects. We will talk about them later in the course.
Sometimes, you may be interested in simulating trees. There are various ways of creating trees through simulations in `R`. Let's take a look at `phytools`'s `pbtree`:
```{r phylo-object, echo=TRUE, eval=TRUE}
sampleTree <- pbtree(n = 20, nsim = 1)
plot(sampleTree)
```
## Handling and preparing phylogenies
Now that we know the main characteristics of a phylogenetic tree and of a `phylo` object, let's dive into more details on the structure of phylogenetic trees.
### Dicotomous and polytomous trees
While some phylogenetic analyses are able to handle polytomies (*i.e.* more than two OTUs originating from the same node in a tree; or an unresolved clade), one may be interesting in solving the polytomies to produce a binary tree.
```{r polytomy, echo=TRUE, eval=TRUE}
t1 <- read.tree(text = "((A,B,C),D);")
# See how A, B and C are within the same round brackets,
# *i.e.* departing from the same node.
plot(t1,
type = "cladogram")
```
One may check if a tree is binary by using `is.binary.tree` function and, in case it is `FALSE`, randomly resolve polytomies using the `multi2d` function:
```{r is-binary, echo=TRUE, eval=TRUE}
is.binary.tree (t1)
```
```{r multi2di-tree, echo=TRUE, eval=TRUE}
t2 <- multi2di(t1)
plot(t2,
type = "cladogram")
```
### Ultrametric, non-ultrametric and uninformative trees
*Ultrametric trees* (or approximates of them) can be used to infer both braching and temporal patterns of evolutionary history for a given clade. For a tree to be ultrametric, its matrix has to follow the following strong assumptions:
1. Each tip has to be uniquely labelled;
2. Each internal node of the tree diverges towards at least two "children", *i.e.* is dicotomous;
3. Along any path from the root to a tip (or leaf), the numbers labelling internal nodes strictly decrease, *i.e.* time must strictly increase along the path from the root;
4. The distance from the root to all tips of the tree is constant for all clades.
In many ultrametric trees, one can assume that all extant species are sampled at the present time, and branch lengths represent time calibrated in millions of years.
This information is obtained when we infer plausible history from data that reflects time since divergence (*i.e.* reconstruct evolutionary history based on the *molecular clock theory*).
When a tree violates one of the above assumptions (especially, number 4), a tree is then considered to be *non-ultrametric*. For instance, a non-ultrametric tree will have branches with different distances towards the tips. While this information reflects different rates of evolution across the phylogeny, many analyses will require an ultrametric tree (*i.e.* where models assume constant variance and equal means at the tips).
In these cases, one may be interested in "smoothing out" these differences across the evolutionary history to transform a non-ultrametric tree into a ultrametric tree.
One method is to use the *mean path length* (MPL) that assumes random mutation and a fixed molecular clock to calculate the age of a node as the mean of all the distances from this node to all tips diverging from it. To run this, we use the `ape` function `chronoMPL`. MPL sometimes returns negative branch lengths, meaning that it should be used with caution.
Another function we can use is `chronos`, also from the `ape` package. `chronos` uses the penalized maximum likelihood method to estimate divergences times.
Let's try these methods by first creating a random tree, then tansforming its edge lengths using both `chronoMPL` and `chronos` functions, and plotting them side-by-side:
```{r ultrametric-transformation, eval=TRUE, echo=TRUE, fig.height=6, fig.width=18, fig.align='center', message=TRUE}
tree.NonUlt <- rtree(7)
tree.Ult.MPL <- chronoMPL(tree.NonUlt)
tree.Ult.S <- chronos(tree.NonUlt)
par(mfrow=c(1,3))
plot(tree.NonUlt, edge.width = 2,
cex = 2,
main = "A) Non-ultrametric",
cex.main = 2)
plot(tree.Ult.MPL, edge.width = 2,
cex = 2,
main = "B) Ultrametric chronoMPL",
cex.main = 2)
plot(tree.Ult.S, edge.width = 2,
cex = 2,
main = "C) Ultrametric chronos",
cex.main = 2)
```
You may verify if any tree is ultrametric using `is.ultrametric`:
```{r ultrametric-test, echo=TRUE, eval=TRUE}
is.ultrametric(tree.NonUlt)
is.ultrametric(tree.Ult.MPL)
is.ultrametric(tree.Ult.S)
```
Sometimes, trees may have no branch lengths available, but only the divergence relationships. We often refer to these trees as being *non-informative* trees (despite the fact that they can still give us some information about branching events):
```{r non-informative-tree, echo=TRUE, eval=TRUE}
tree.NonInf <- rtree(7)
tree.NonInf$edge.length <- NULL
plot(tree.NonInf, edge.width = 2,
cex = 2,
main = "D) Uninformative tree",
cex.main = 2)
```
### Edge length and rate transformations
Very oten, we may also be interested in altering the edge lengths of a given tree for a number of reasons. We may be interested, for example, in chaning the edge distance from molecular distances to time. Alternatively, we may want to change edge lengths to meet specific evolutionary models - let's say, we may seek to assess whether a given ecological pattern might be explained by recent evolutionary changes [*i.e.* resembling a Pagel's $\delta$ time-dependent model, (Pagel 1999)], by branching events (*i.e.*, diversification rates; by changing Pagel's $\kappa$) or by the phylogenetic structure of a tree (by rescaling a tree using Pagel's $\lambda$).
We can use the function `rescale` to perform changes in the branches of a phylogenetic tree.
Pagel's $\delta$ time-dependent model rescales a tree by raising all node depths to an estimated power greater than 1 ($\delta$). A value greater than 1 means that recent evolution has been relatively fast; and if delta is less than 1, recent evolution has been comparatively slow.
We can try the $\delta$ transformation in a random ultrametric tree:
```{r rescaling-trees-delta, echo=TRUE, eval=TRUE, fig.width = 18, fig.height = 6}
library(geiger)
tree.Ult1 <- rcoal(25)
tree.Ult1.Delta.01 <- rescale(tree.Ult1,
model = "delta", 0.1) # Setting power of 0.1
tree.Ult1.Delta.1 <- rescale(tree.Ult1,
model = "delta", 1) # with power 1; compare this tree with the original tree
tree.Ult1.Delta.10 <- rescale(tree.Ult1,
model = "delta", 10) # with power 10
# Representing all three trees
par(mfrow = c(1,3), cex.main = 2)
plot(tree.Ult1.Delta.01,
show.tip.label = FALSE,
edge.width = 2,
cex = 2,
main = expression(paste("A) ", delta," = 0.1")))
plot(tree.Ult1.Delta.1,
show.tip.label = FALSE,
edge.width = 2,
cex = 2,
main = expression(paste("B) ", delta," = 1")))
plot(tree.Ult1.Delta.10,
show.tip.label = FALSE,
edge.width = 2,
cex = 2,
main = expression(paste("C) ", delta," = 10")))
```
*Note how $\delta$ = 1 returns the original tree, $\delta$ < 1 extends the branch lengths of the edges, and $\delta$ > 1 shortens the edge lengths of the tree.*
Pagel's $\kappa$ punctuational (speciational) model of trait evolution focus on the number of speciation events between species to assess whether different rates of evolution are associated with branching events. This transformation raises all branch lengths to an estimated power.
```{r rescaling-trees-kappa, echo=TRUE, eval=TRUE, fig.width = 18, fig.height = 6}
tree.Ult1.kappa.01 <- rescale(tree.Ult1,
model = "kappa", 0.1) # Kappa of 0.1
tree.Ult1.kappa.1 <- rescale(tree.Ult1,
model = "kappa", 1) # Kappa of 1
tree.Ult1.kappa.2 <- rescale(tree.Ult1,
model = "kappa", 2) # Kappa of 2
# Represent these trees
par(mfrow = c(1,3),
cex.main = 2)
plot(tree.Ult1.kappa.01,
show.tip.label = FALSE,
edge.width = 2,
cex = 2,
main = expression(paste("D) ", kappa," = 0.1")))
plot(tree.Ult1.kappa.1,
show.tip.label = FALSE,
edge.width = 2,
cex = 2,
main = expression(paste("E) ", kappa," = 1")))
plot(tree.Ult1.kappa.2,
show.tip.label = FALSE,
edge.width = 2,
cex = 2,
main = expression(paste("F) ", kappa," = 2")))
```
Finally, by rescaling a tree using Pagel's $\lambda$, one may change the structure of a phylogenetic tree to reduce the overall contribution of internal edges relative to the terminal edges of a given tree.
A Pagel's $\lambda$ of closer to 0 removes such structure from the tree, making it resemble a "star" phylogeny; while the closest $\lambda$ is to 1, the more the rescaled tree resembles the original. Let's try it:
```{r rescaling-trees-lambda, echo=TRUE, eval=TRUE, fig.width = 18, fig.height = 6}
tree.Ult1.lambda.01 <- rescale(tree.Ult1, model = "lambda", 0.1)
tree.Ult1.lambda.05 <- rescale(tree.Ult1, model = "lambda", 0.5)
tree.Ult1.lambda.1 <- rescale(tree.Ult1, model = "lambda", 1)
# Represent the above trees
par(mfrow = c(1,3),
cex.main = 2)
plot(tree.Ult1.lambda.01,
show.tip.label = FALSE,
edge.width = 2,
cex = 2,
main = expression(paste("G) ", lambda," = 0.1")))
plot(tree.Ult1.lambda.05,
show.tip.label = FALSE,
edge.width = 2,
cex = 2,
main = expression(paste("H) ", lambda," = 0.5")))
plot(tree.Ult1.lambda.1,
show.tip.label = FALSE,
edge.width = 2,
cex = 2,
main = expression(paste("I) ", lambda," = 1")))
```
*The above examples are our opening door to start understanding three key features of evolutionary change: the tempo (δ), mode (κ) and the phylogenetic signal (λ) of the evolution (of a trait) (Pagel, 1994, 1999).*
You can compare all our transformations with the figure below:
```{r rescaling-trees-all, echo=FALSE, fig.height=18, fig.width=18}
# Set our figure facet grid
par(mfrow = c(3,3),
cex.main = 2)
# Pagel's delta
plot(tree.Ult1.Delta.01,
show.tip.label = FALSE,
edge.width = 2,
cex = 2,
main = expression(paste("A) ", delta," = 0.1")))
plot(tree.Ult1.Delta.1,
show.tip.label = FALSE,
edge.width = 2,
cex = 2,
main = expression(paste("B) ", delta," = 1")))
plot(tree.Ult1.Delta.10,
show.tip.label = FALSE,
edge.width = 2,
cex = 2,
main = expression(paste("C) ", delta," = 10")))
# Pagel's kappa
plot(tree.Ult1.kappa.01,
show.tip.label = FALSE,
edge.width = 2,
cex = 2,
main = expression(paste("D) ", kappa," = 0.1")))
plot(tree.Ult1.kappa.1,
show.tip.label = FALSE,
edge.width = 2,
cex = 2,
main = expression(paste("E) ", kappa," = 1")))
plot(tree.Ult1.kappa.2,
show.tip.label = FALSE,
edge.width = 2,
cex = 2,
main = expression(paste("F) ", kappa," = 2")))
# Pagel's lambda
plot(tree.Ult1.lambda.01,
show.tip.label = FALSE,
edge.width = 2,
cex = 2,
main = expression(paste("G) ", lambda," = 0.1")))
plot(tree.Ult1.lambda.05,
show.tip.label = FALSE,
edge.width = 2,
cex = 2,
main = expression(paste("H) ", lambda," = 0.5")))
plot(tree.Ult1.lambda.1,
show.tip.label = FALSE,
edge.width = 2,
cex = 2,
main = expression(paste("I) ", lambda," = 1")))
```
## Representing trees and other class objects
There are multiple ways for *representing trees* in R. You can access the help page of any R function we are using by running the function name preceeded by question mark (`?`), *e.g.* `?plot.phylo`.
```{r representingTrees, echo=TRUE}
treeVertebrates <- read.tree(text = "(((((((Cow, Pig), Whale),
(Bat,(Lemur, Human))),
(Robin, Iguana)), Coelacanth),
Gold fish), Shark);")
par(mfrow = c(2,2))
# Using the plot function, which is based on plot.phylo
plot(treeVertebrates, no.margin = TRUE,
edge.width = 2)
# Plotting with roundPhylogram
roundPhylogram(treeVertebrates)
# Removing the root of the tree
plot(unroot(treeVertebrates),
type="unrooted",
no.margin = TRUE, lab4ut = "axial",
edge.width=2)
# Posting a type fan tree
plotTree(treeVertebrates,
type="fan",
fsize=0.7,
lwd=1,
ftype="i")
```
One package that allows a fast and easily changeable visualization of phylogenetic trees is the `ggtree` package, which is based on the popular `ggplot2` package. To better work with `ggtree`, you should have a basic familiarity with the *Grammar of Graphics* and with `ggplot2`. I recommend the QCBS R Workshop material on [`ggplot2`](http://qcbs.ca/wiki/r_workshop3), which helped me acquire a general knowledge on this package.
You can access the documentation for `ggtree` [here](https://guangchuangyu.github.io/software/ggtree/).
We can use `ggtree`, for example, to attribute colors to different taxonomic groups (in this case, *genera* of bats).
```{r ChiropteraTree, echo=TRUE}
library(ggtree)
data(chiroptera, package="ape")
# Separate the genus names from species names
groupInfo <- split(chiroptera$tip.label,
gsub("_\\w+", "", chiroptera$tip.label))
head(groupInfo)
# Group species accordingly to their genera
chiroptera <- groupOTU(chiroptera,
groupInfo)
# Plot tree using ggtree
ggtree(chiroptera,
aes(color = group),
layout='circular') +
geom_tiplab(size = 1,
aes(angle = angle))
```
## The `picante` package
One of the core packages for hypothesis testing in ecophylogenetics is `picante` (Kembel et a. 2010). `picante` mainly works with three types of objects: the phylogeny data as a `phylo` class object; the community presence-absence (binary) or abundance matrix; and a species-trait matrix.
## The `pez` package
`pez` is a package that contains many novel and wrappers functions for phylogenetic analyses in community data (Pearse et al. 2015). It has useful cuntions that will help us keep the different datasets we will be working on matching.
## Other classes of objects
`phylo4d` and `phylo4` are recent classes developed based on the `phylo` class that seeks to standardize and unify the representation of comparative data and phylogenetic trees in `R`.
## Exercises
*We are short in time! I will add a few exercises here later.*
# Community Phylogenetic Patterns
Ecologists share the goal of understanding and describing the processes that create the uneven distribution patterns of species, which create the variation in abundance, identity and diversity we know today.
Among many factors that determine the presence of taxa in communities are the abiotic and biotic conditions of such communities (e.g., Diamond 1975; Westoby and Wright 2006; but see Hubbell 2001).
Since taxa that recently diverged tend to be ecologically similar (Darwin 1859; Lord et al. 1995; Wiens and Graham 2005), a direct link may exist between the evolutionary relatedness of organisms in a community, the characters they possess, and the ecological processes that determine their distribution and abundance.
In fact, many communities exhibit nonrandom patterns of evolutionary relatedness among co-occurring species (reviewed in Webb et al. 2002): a phenomenon we refer to as **phylogenetic community structure**.
## Preparing data for community phylogenetic analyses
For this analysis, we will use the data on the Jasper Ridge plant communities available in the book "**Phylogenies in Ecology**", written by Marc Cadotte and Jonathan Davies (2016).
```{r JasperData, echo=TRUE, eval = TRUE}
JasperPlants.tree <- read.tree("data/Jasper/resources/data/Jasper/jasper_tree.phy")
JasperPlants.comm <- read.csv("data/Jasper/resources/data/Jasper/jasper_data.csv",
row.names = 1)
```
Before proceeding to our analyses, we have to make sure that both the community and phylogenetic data match. For this, the names of the tips should match the species names in the dataset. This also means that species present in one dataset should not be absent in the other.
(*You thought you were done preparing your dataset, eh?*)
For this, we can use the function `drop.tip` from the `ape` package:
```{r JasperData-cleaning1, echo=TRUE, eval = TRUE}
JasperPlants.cleanTree <- drop.tip(phy = JasperPlants.tree,
tip = setdiff(JasperPlants.tree$tip.label,
colnames(JasperPlants.comm)))
```
You can also match your datasets using the function `match.phylo.comm` from `picante`, or the function `comparative.comm` from `pez`.
```{r JasperData-cleaning2, echo=TRUE, eval = TRUE}
# Using picante
JasperPlants.picCleanTree <- match.phylo.comm(phy = JasperPlants.tree,
comm = JasperPlants.comm)$phy
JasperPlants.picCleanComm <- match.phylo.comm(phy = JasperPlants.tree,
comm = JasperPlants.comm)$comm
# Do the same using pez
# JasperPlants.pezCleanComm <- comparative.comm(phy = JasperPlants.tree,
# comm = as.matrix(JasperPlants.comm))
```
```{r JasperData-plotting, echo=TRUE, eval = TRUE}
plot(JasperPlants.picCleanTree,
cex = 0.4)
JasperPlants.picCleanComm[1:4, 1:4]
```
## Phylogenetic $\alpha$ diversity (PD)
One simple way to measure evolutionary diversity within a community is through Daniel Faith's PD metric (Faith, 1992; http://goo.gl/wM08Oy).
PD sums the branch lenghts of all co-occurring species in a given site, from the tips to the root of the phylogenetic tree.
Higher PD values are given to communities that has more evolutionary divergent taxa and older history, while lower PD values represent assemblages that have taxa with more recent evolutionar history.
Faith's PD can be implemented using the `pd` function in the `picante` package.
In addition to Faith's PD, the `pd` function also returns species richness (SR).
SR is the same as observed richness ($S_{obs}$) or $\alpha$ diversity.
```{r JasperData-PD, echo=TRUE, eval = TRUE}
Jasper.PD <- pd(JasperPlants.picCleanComm, JasperPlants.picCleanTree,
include.root = FALSE)
head(Jasper.PD) # See the first six rows of the resulting matrix
```
And how PD estimates compares with SR?
```{r JasperData-PD-cor, echo=TRUE, eval = TRUE}
cor.test(Jasper.PD$PD, Jasper.PD$SR)
plot(Jasper.PD$PD,
Jasper.PD$SR,
xlab = "Phylogenetic Diversity", ylab = "Species Richness",
pch = 16)
```
## Randomizations and null models
We are often interested in assessing whether or not observed patterns differ from null expectation. `picante` has many functions that allow us to test hypotheses under different types of *null models*:
```{r, results = "asis", echo = FALSE, message = FALSE}
tex2markdown <- function(texstring) {
writeLines(text = texstring,
con = myfile <- tempfile(fileext = ".tex"))
texfile <- pandoc(input = myfile, format = "html")
cat(readLines(texfile), sep = "\n")
unlink(c(myfile, texfile))
}
textable <- "
\\begin{center}
\\begin{tabular}{ m{5cm} m{10cm} }
\\textbf{Null Model} & \\textbf{Description} \\\\
\\hline \\hline \\\\
\\textbf{taxa.labels} &
Shuffles taxa labels across tips of phylogeny (across all taxa included in phylogeny) \\\\
\\\\
\\textbf{richness} &
Randomizes community data matrix abundances within samples (maintains sample species richness) \\\\
\\\\
\\textbf{frequency} &
Randomizes community data matrix abundances within species (maintains species occurence frequency) \\\\
\\\\
\\textbf{sample.pool} &
Randomizes community data matrix by drawing species from pool of species occurring in at least one community (sample pool) with equal probability \\\\
\\\\
\\textbf{phylogeny.pool} &
Randomize community data matrix by drawing species from pool of species occurring in at least one community (sample pool) with equal probability \\\\
\\\\
\\textbf{independentswap} &
Randomizes community data matrix with the independent swap algorithm (Gotelli 2000) maintaining species occurrence frequency and sample species richness \\\\
\\\\
\\textbf{trialswap} &
Randomizes community data matrix with the trial-swap algorithm (Miklos \\& Podani 2004) maintaining species occurrence frequency and sample species richness \\\\
\\\\
\\hline
\\end{tabular}
\\end{center}"
tex2markdown(textable)
```
We will start using null models to calculate the *standardized effect size* (*ses*) of PD for each community. For this, we can use the `ses.pd` function in `picante`:
```{r JasperData-SESPD, echo=TRUE, eval = TRUE}
# This is a computationally intensive process and take some time to run
Jasper.ses.pd <- ses.pd(samp = JasperPlants.picCleanComm,
tree = JasperPlants.picCleanTree,
null.model = "taxa.labels",
runs = 1000)
head(Jasper.ses.pd)
```
## Phylogenetic community structure
Another way of looking into the phylogenetic structure of communities is to quantify at what degree closely related taxa co-occur.
The basis for patterns of phylogenetic community structure can lie in a simple framework (table 1; Webb et al. 2002): when species traits responsible for their physiological tolerances are conserved, an environmental filtering that limits the range of viable ecological strategies at a given site is expected to select co-occurring species more related than expected by chance, *i.e.* generate a pattern known as *phylogenetic clustering*.
On the other hand, competitive exclusion can limit the ecological similarity of co-occurring species, generating a pattern of *phylogenetic overdispersion* or *phylogenetic evenness*.
Finally, when traits are diverging faster across the evolutionary time, the effects of habitat filtering should be weaker, producing evenly dispersed patterns of relatedness; while competition or limiting similarity is expected to produce random or clustered patterns.
If communities are assembled independently with respect to traits (e.g., Hubbell 2001), then patterns of relatedness should be resemble random expectation.
Over the recent years, other processes were shown to drive phylogenetic patterns of community structure, such as, isolation, time since isolation, diversification, scale-dependent assembly and local and regional composition and richness.
```{r}
library(knitr)
library(kableExtra)
text_tbl <- data.frame(
"Assembly process"= c("Limiting similarity",
"Habitat filtering",
"Neutral assembly"),
"Under niche conservatism" = c("Even dispersion",
"Clustered dispersion",
"Random dispersion"),
"Under niche divergence"= c("Random or clustered dispersion",
"Even dispersion",
"Random dispersion"))
kable(text_tbl, "html", align = "c") %>%
kable_styling(full_width = F)
```
### Phylogenetic resemblance matrix
An important step prior to the estimation of the phylogenetic structure of communities is to create a *phylogenetic resemblance matrix*, or a *phylogenetic variance-covariance matrix*. A *phylogenetic variance-covariance matrix* is nothing more than the calculated distances between taxa in a tree.
In R, we commonly refer to this matrix as a matrix of *cophenetic distances among taxa* and we can use the function `cophenetic.phylo` from `picante` to calculate it:
```{r CopheneticDist, echo=TRUE}
# Create a Phylogenetic Distance Matrix {picante}
JasperPlants.cophenDist <- cophenetic.phylo(JasperPlants.picCleanTree)
```
## Net relatedness index (NRI)
One common metric to evaluate the phylogenetic structure of commmunities is the **net relatedness index** (**NRI**), which is based on the **mean phylogenetic distances** (**MPD**) calculated from the pairwise cophenetic branch lengths distances of the co-occurring taxa in each community.
NRI is obtained by multiplying the *standardized effect size mean phylogenetic distances* (calculated via a null model of interest) by $-1$.
Negative NRI values indicate *phylogenetic overdispersion*, *i.e.* co-occurring taxa is less closely related than expected by a given null model; while positive NRI values represent *phylogenetic clustering*, *i.e.* co-occurring taxa are more closely related than expected by a null model.
In R, we will use the function `ses.mpd` from `picante` to calculate the *NRI* of our communities:
```{r sesMPD, echo=TRUE}
# Estimate Standardized Effect Size of the MPD on the cophenetic matrix
JasperPlants.ses.mpd <- ses.mpd(JasperPlants.picCleanComm,
JasperPlants.cophenDist,
null.model = "taxa.labels",
abundance.weighted = FALSE,
runs = 100)
head(JasperPlants.ses.mpd)
# Calculate NRI
JasperPlants.NRI <- as.matrix(-1 *
((JasperPlants.ses.mpd[,2] - JasperPlants.ses.mpd[,3]) /
JasperPlants.ses.mpd[,4]))
rownames(JasperPlants.NRI) <- row.names(JasperPlants.ses.mpd)
colnames(JasperPlants.NRI) <- "NRI"
head(JasperPlants.NRI)
```
## Nearest taxon index (NTI)
Another way to assess the phylogenetic structure of communities is through the calculation of the **nearest taxon index** (**NTI**). NTI differs from NRI by being based on the **mean nearest neighbour phylogenetic distance** (**MNTD**), instead of the overall mean pairwise distances (**MPD**).
MNTD is obtained by calculating the mean phylogenetic distance of each taxon of a given community to its nearest neighbour in a tree. By doing this, one can observe that NTI is more sensitive to the clustering or overdispersion patterns happening closer to the tips of a tree, while NRI represents an overall pattern of structuring across the whole phylogenetic tree.
As in NRI, higher values of NRI indicate phylogenetic clustering and lower values of NRI indicate phylogenetic overdispersion.
```{r sesMNTD, echo=TRUE}
# Estimate Standardized Effect Size of the MNTD on the cophenetic matrix
JasperPlants.ses.mntd <- ses.mntd(JasperPlants.picCleanComm,
JasperPlants.cophenDist,
null.model = "taxa.labels",
abundance.weighted = FALSE,
runs = 100)
head(JasperPlants.ses.mntd)
# Calculate NTI
JasperPlants.NTI <- as.matrix(-1 *
((JasperPlants.ses.mntd[,2] - JasperPlants.ses.mntd[,3]) /
JasperPlants.ses.mntd[,4]))
rownames(JasperPlants.NTI) <- row.names(JasperPlants.ses.mntd)
colnames(JasperPlants.NTI) <- "NTI"
head(JasperPlants.NTI)
```
## Phylogenetic beta diversity (PBD)
One may not be interested only in the patterns of $\alpha$ diversity, but in understanding how communities are evolutionary distinct from each other, *i.e.* knowing the pairwise phylogenetic $\beta$ diversity.
Many metrics are available to assess the phylogenetic $\beta$ diversity of communities. Here, we will do this by calculating the *PhyloSor* index, which is the phylogenetic analog of the Sørensen index (Bryant et al., 2008; Swenson, 2011; Feng et.al., 2012). *PhyloSor* can be calculated using the function `phylosor` from `picante`:
```{r JasperData-PhyloSor, echo=TRUE, eval = TRUE}
Jasper.pbd <- phylosor(samp = JasperPlants.picCleanComm,
tree = JasperPlants.picCleanTree)
as.matrix(Jasper.pbd)[1:7,1:7]
```
We can go one step further and test whether the observed PBD values differ from null expectation under different *null models*. First, we will use the function `phylosor.rnd` to obtain PhyloSor values of phylogenetic beta-diversity obtained by randomization:
```{r JasperData-PhyloSorRnd, echo=TRUE, eval = TRUE}
Jasper.pbd.rndTaxa <- phylosor.rnd(samp = JasperPlants.picCleanComm,
tree = JasperPlants.picCleanTree,
cstSor = TRUE,
null.model = "taxa.labels", # Pay attention to the null models!
runs = 100)
```
Apparently, the a *standardized effect size* function is not available within `picante`. Thus, we will create our own null model to test our hypothesis:
```{r JasperData-PhyloSorSES, echo=TRUE, message=TRUE, warning=TRUE, paged.print=TRUE}
# I first ran ses.pd to build this in a more-or-less similar format
ses.phylosor <- function(obs, rand){
rand <- t(as.data.frame(lapply(rand, as.vector)))
phySor.obs <- as.numeric(obs)
phySor.mean <- apply(rand, MARGIN = 2,
FUN = mean,
na.rm = TRUE)
phySor.sd <- apply(rand,
MARGIN = 2,
FUN = sd,
na.rm = TRUE)
phySor.ses <- (phySor.obs - phySor.mean)/phySor.sd
phySor.obs.rank <- apply(X = rbind(phySor.obs, rand),
MARGIN = 2,
FUN = rank)[1, ]
phySor.obs.rank <- ifelse(is.na(phySor.mean), NA, phySor.obs.rank)
data.frame(phySor.obs,
phySor.mean,
phySor.sd,
phySor.obs.rank,
phySor.ses,
phySor.obs.p = phySor.obs.rank/(dim(rand)[1] + 1))
}
JasperPlants.ses.pbd <- ses.phylosor(Jasper.pbd,
Jasper.pbd.rndTaxa)
# Project the table using knitr::kable
round(JasperPlants.ses.pbd[45:55,], 3) %>%
kable("html") %>%
kable_styling()
```
## Decomposing phylogenetic beta diversity
Our questions about phylogenetic can become even more complex!
One may also be interested in decomposing the turnover and nestedness components of beta diversity (Baselga, 2010; Leprieur et al., 2012).
Baselga (2010) proposed an additive partitioning framework that consists in decomposing the pair-wise Sørensen's dissimilarity index into two additive components accounting for (i) ‘true’ turnover of species and (ii) richness differences between nested communities. In 2012, Leprieur et al. (2012) extended Baselga's (2010) framework to decompose PhyloSor into two components accounting for ‘true’ phylogenetic turnover and phylogenetic diversity gradients, respectively.
*Still not covered in this tutorial!*
# Trait Evolution
One central concept in ecophylogenetics is the non‐independence among species traits because of their phylogenetic relatedness (Felsenstein 1985; Revell, Harmon & Collar 2008).
For this part of the workshop, we will use our newly acquired knowledge in the *Edge length and rate transformations* section about **Pagel's $\lambda$**, **$\kappa$** and **$\delta$**. We will also see about alternative evolutionary models and learn how to test whether **phylogenetic signal** is present in species traits, as well as, **fit traits to different models of evolution**.
## Dataset preparation
We are now introducing a third-type of data in our phylogenetic analyses: traits. Here we will use a different dataset.
For this part of the course, we will fetch our dataset from papers using a very useful R package: `fulltext`. `fulltext` is maintained by Scott Chamberlain, Will Pearse and Katrin Leinweber, and is able to get files directly from the supplementary material of published papers, with only the DOI references!
```{r willPearseFiles, echo=TRUE, message=TRUE}
# The idea of using this dataset was borrowed from Will Pearse's minicourse on ecophylogenetics.
# Nevertheless, we will use it for further analyses here!
library('fulltext')
MCDB.comm <- read.csv(ft_get_si("E092-201", "MCDB_communities.csv", "esa_data_archives"))
MCDB.species <- read.csv(ft_get_si("E092-201", "MCDB_species.csv", "esa_data_archives"))
Mammalian.tree <- read.nexus("data/ele_1307_sm_sa1.tre")[[1]] # fulltext is not fetching files from Wiley
PanTHERIA.traits <- read.delim(ft_get_si("E090-184", "PanTHERIA_1-0_WR05_Aug2008.txt", "esa_archives"))
```
*What have we just done?*
In `MCDB.comm`, we have loaded 7977 records of species composition and abundance of mammalian communities (excluding bats) for 1000 sites throughout the world, which was compiled from published literature.
`MCDB.species` contains 700 records that includes the name of species and their taxonomic information.
This data is part of the ecological archive of the paper by Thibault et al. (2011)
(*Katherine M. Thibault, Sarah R. Supp, Mikaelle Giffin, Ethan P. White, and S. K. Morgan Ernest. 2011. Species composition and abundance of mammalian communities. Ecology 92:2316.*)
`Mammalian.tree` loaded the first tree described in the published paper by Fritz et al. 2009
(*Fritz, S. A., Bininda-Emonds, O. R. P. and Purvis, A. (2009), Geographical variation in predictors of mammalian extinction risk: big is bad, but only in the tropics. Ecology Letters 2(6):538–549. doi: 10.1111/J.1461-0248.2009.01307*)
And, finally, `PanTHERIA.traits` contain a species-level data set of life history, ecology, and geography of extant and recently extinct mammals. The metadata for this dataset can be consulted in its [website](http://esapubs.org/archive/ecol/e090/184/metadata.htm).
Now that we know about our dataset, we can explore each one of the loaded objects:
```{r exploreFiles, echo=TRUE, message=TRUE}
head(MCDB.comm)
head(MCDB.species)
str(Mammalian.tree)
str(PanTHERIA.traits)
```
Now, let's *treat* our dataset:
(*Yes, we spend a lot of time making our dataset ready for analysis!*)
```{r dataSetComm, echo=TRUE, message=TRUE}
# Starting with the community file:
# Let's make a species-site matrix
MCDB.comm$Abundance <- as.numeric(as.character(MCDB.comm$Abundance))
head(MCDB.comm$Abundance)
```
```{r dataSetSpecies, echo=TRUE, message=TRUE}
# Add a new column with real species names, extracted and matched from the MCDB.species dataset