PhyloCompMethodsMaterial.Rmd

---
title: "`Phylogenetic Comparative Methods` in R"
author: "Pedro Henrique Pereira Braga"
abstract: "MSc. in Ecology and Evolution; PhD Candidate at Concordia University, Montreal" 
date: "`r Sys.Date()`"
email: "ph.pereirabraga@gmail.com"
output:
  rmdformats::html_clean:
    highlight: kate
    thumbnails: true
    lightbox: true
    gallery: true
    fig_caption: TRUE
---


```{r knitr_init, echo=FALSE, results="asis", cache=FALSE}
library(knitr)
library(rmdformats)

## Global options
options(max.print = "75")
opts_chunk$set(echo = FALSE,
	             cache = TRUE,
               prompt = FALSE,
               tidy = FALSE,
               comment = NA,
               message = FALSE,
               warning = FALSE,
               strip.white = TRUE,
               fig.align  = 'center',
               fig.width = 9,
               fig.height = 9)
opts_knit$set(width = 75)
```

# What is this?

This is a short course/tutorial I developed to help researchers cover some of the families of phylogenetic comparative analyses of trait evolution and correlation, diversification rates, as well as community structure. My objective is to provide a short background in statistical and comparative phylogenetic methods, to further allow researchers to explore their research questions on their own.

A big part of the inspiration for this workshop came from the book *Phylogenies in Ecology*, by *Marc W. Cadotte* and *Jonathan Davies*, as well as from courses and workshops on ecophylogenetics taught by Dr. *Will Pearse*, Dr. *Jonathan Davies* and Dr. *Steven Kembel*. 

I am progressively adding more material to this document. You can access all topics addressed here in the outline on the right side of this page.

If you would like to provide feedback on this material, do not hesitate to get in touch!  

I will highly appreciate your comments!

Kind regards,

Pedro Henrique P. Braga 


**pedrohbraga.github.io**

**ph.pereirabraga [at] gmail [dot] com**.


## Preparation for this tutorial

I kindly ask you to run the following code to download the datasets, and install and load the libraries we will use through this course:

```{r dataset-download, eval=FALSE, echo=TRUE, message=FALSE, warning=FALSE, paged.print=FALSE}
# Check for needed packages, and install the missing ones
required.libraries <- c("ape", "picante", "pez", "phytools",
                        "vegan", "adephylo", "phylobase",
                        "geiger", "mvMORPH", "OUwie", 
                        "hisse", "BAMMtools",
                        "phylosignal", "Biostrings",
                        "devtools",
                        "ggplot2",
                        "fulltext")

needed.libraries <- required.libraries[!(required.libraries %in% installed.packages()[,"Package"])]

if(length(needed.libraries)) install.packages(needed.libraries)

# Load all required libraries at once
lapply(required.libraries, require, character.only = TRUE)

### Install missing libraries

source("https://bioconductor.org/biocLite.R")
biocLite("ggtree")

###

set.seed(1)

# Download files from Marc Cadotte's and Jonathan Davies' book:

temp <- tempfile()
download.file("http://www.utsc.utoronto.ca/~mcadotte/ewExternalFiles/resources.zip", temp)
dir.create("data/Jasper")
JasperData <- c("jasper_tree.phy", "jasper_data.csv")
lapply(JasperData, 
       FUN = function(x) 
         unzip(zipfile = temp, 
               files = paste("resources/data/Jasper/", x, sep = ""), 
               exdir = "data/Jasper"))

# Download the tree files from this paper

download.file("https://onlinelibrary.wiley.com/action/
downloadSupplement?doi=10.1111%2Fj.1461-0248.2009.01307.x&attachmentId=179085359", 
              destfile = "data/ele_1307_sm_sa1.tre")

# If the download of the above file does not work, 
# download the first supplementary material from the following paper
# and place it within your 'data' directory:

# https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1461-0248.2009.01307.x
```

# Why are phylogenies important?

Since many decades we recognize that many ecolgical patterns are processes are not independent of the evolution of the lineages involved in generating these patterns.

Here, we are taking it seriously what Theodosius Dobzhansky said in 1973: *“nothing in biology makes sense except in the light of evolution”*!

# A brief introduction to phylogenies

**Phylogenetic trees** (*i.e.*, **evolutionary tree** or **cladogram**) are branching diagrams illustrating the evolutionary relationships among taxa. 

Phylogenetic trees can be constructed based on species’ genes and/or on traits. Phylogenetic reconstruction through analyses of molecular sequences usually consists of three distinct procedures: (i) sequence alignment; (ii) character coding; and (iii) tree building. There is a range of available options for each of these steps, with no simple objective method for choosing between them. 

 <center>![The root of a phylogenetic tree indicates that an ancestral lineage gave rise to all organisms on the tree. A branch point indicates where two lineages diverged. A lineage that evolved early and remains unbranched is a basal taxon. When two lineages stem from the same branch point, they are sister taxa. A branch with more than two lineages is a polytomy. Extracted from: https://courses.lumenlearning.com/wm-biology1/chapter/reading-structure-of-phylogenetic-trees/ \label{figurelabel}](figures/Figure_20_01_02.jpg) </center>

In this course, we will not cover the steps necessary to reconstruct phylogenies. We will thus assume that a phylogenetic relationship is already available for the taxa of interest.

## `ape` and the `phylo` object in R

The core of phylogenetics analyses in R is the library `ape` - an acronym that stands for **A**nalysis of **P**hylogenetics and **E**volution. You can run the following commands to install, load and call help for `ape`:

```{r ape, echo=TRUE, eval=TRUE}
# install.packages("ape")

library(ape)

# help(package = ape)
```

From `ape`, we depict two important *functions* to read and load trees into `R`: `read.nexus` and `read.tree`: 

`read.nexus` reads NEXUS formatted trees, which hasis organized into major units known as *blocks* and usually have the following basic structure:

```{}
#NEXUS
[!This is the data file used for my dissertation]
BEGIN TAXA;
    TaxLabels Scarabaeus Drosophila Aranaeus;
END;

BEGIN TREES;
    Translate beetle Scarabaeus, fly Drosophila, spider Aranaeus;
    Tree tree1 = ((1,2),3);
    Tree tree2 = ((beetle,fly),spider);
    Tree tree3 = ((Scarabaeus,Drosophila),Aranaeus);
END;
```

`read.tree` reads Newick formatted trees, which commonly have the following format:

```{}
(Bovine:0.69395,(Hylobates:0.36079,(Pongo:0.33636,(G._Gorilla:0.17147, (P._paniscus:0.19268,H._sapiens:0.11927):0.08386):0.06124):0.15057):0.54939, Rodent:1.21460)
```

When phylogenies are created, loaded and handled in `ape`, their objects are represented as a list of *class* `phylo`. Let's try reading the following text string of a tree (*is it in Newick or in NEXUS format?*) and see what `R` tells us when we call it:  

```{r phylo-primates, echo=TRUE, eval=TRUE}
tree.primates <- read.tree(text="((((Homo:0.21, Pongo:0.21):0.28, Macaca:0.49):0.13, 
                           Ateles:0.62):0.38, Galago:1.00);")

tree.primates
```

A class `phylo` object distributes the information of phylogenetic trees into six main components:

```{r phylo-object-str, echo=TRUE, eval=TRUE}
str(tree.primates)
```

Since `phylo` is a list, we can access its elements by using the `$` symbol. We can access the *matrix* `edge` by running `phylo_object$edge`, which hasthe beginning and ending node number for all the nodes and tips of a given tree. In summary, this allows us to keep track of the internal and external nodes of the tree:

```{r phylo-object-edge, echo=TRUE, eval=TRUE}
tree.primates$edge
```

The *vector* `tip.label` element contains all the labels for all tips in the tree (*i.e.* the names of your OTUs):

```{r phylo-object-tiplabel, echo=TRUE, eval=TRUE}
tree.primates$tip.label
```

There is a little error in the tip labels, right? We can easily rewrite the labels with this:
```{r phylo-object-tiplabel-2, echo=TRUE, eval=TRUE}
tree.primates$tip.label <- c("Homo", "Pongo", "Macaca", "Ateles", "Galago")
```

Another important element is the vector `edge.length`, which indicates the length of each edge with relation to the root or base of the tree: 

```{r phylo-object-edges, echo=TRUE, eval=TRUE}
tree.primates$edge.length
```

Finally, `Nnode` contains the number of internal nodes in the tree:

```{r phylo-object-Nnode, echo=TRUE, eval=TRUE}
tree.primates$Nnode
```

One can see all the important information a `phylo` object carries from a tree by plotting it with all its elements:

```{r phylo-object-plotting, echo=TRUE, eval=TRUE}
plot(tree.primates, 
     edge.width = 2, 
     label.offset = 0.05, 
     type = "cladogram")

nodelabels()
tiplabels()
add.scale.bar()
```

There are other classes that store trees in R, such as `phylo4` and `phylo4d` objects. We will talk about them later in the course.

Sometimes, you may be interested in simulating trees. There are various ways of creating trees through simulations in `R`. Let's take a look at `phytools`'s `pbtree`:  

```{r phylo-object, echo=TRUE, eval=TRUE}
sampleTree <- pbtree(n = 20, nsim = 1)

plot(sampleTree)
```

## Handling and preparing phylogenies

Now that we know the main characteristics of a phylogenetic tree and of a `phylo` object, let's dive into more details on the structure of phylogenetic trees.

### Dicotomous and polytomous trees

While some phylogenetic analyses are able to handle polytomies (*i.e.* more than two OTUs originating from the same node in a tree; or an unresolved clade), one may be interesting in solving the polytomies to produce a binary tree.

```{r polytomy, echo=TRUE, eval=TRUE}
t1 <- read.tree(text = "((A,B,C),D);") 
# See how A, B and C are within the same round brackets, 
# *i.e.* departing from the same node.

plot(t1, 
     type = "cladogram")
```

One may check if a tree is binary by using `is.binary.tree` function and, in case it is `FALSE`, randomly resolve polytomies using the `multi2d` function:

```{r is-binary, echo=TRUE, eval=TRUE}
is.binary.tree (t1)
```

```{r multi2di-tree, echo=TRUE, eval=TRUE}
t2 <- multi2di(t1)

plot(t2, 
     type = "cladogram")
```

### Ultrametric, non-ultrametric and uninformative trees

*Ultrametric trees* (or approximates of them) can be used to infer both braching and temporal patterns of evolutionary history for a given clade. For a tree to be ultrametric, its matrix has to follow the following strong assumptions:

1. Each tip has to be uniquely labelled;
2. Each internal node of the tree diverges towards at least two "children", *i.e.* is dicotomous;
3. Along any path from the root to a tip (or leaf), the numbers labelling internal nodes strictly decrease, *i.e.* time must strictly increase along the path from the root;
4. The distance from the root to all tips of the tree is constant for all clades.

In many ultrametric trees, one can assume that all extant species are sampled at the present time, and branch lengths represent time calibrated in millions of years.

This information is obtained when we infer plausible history from data that reflects time since divergence (*i.e.* reconstruct evolutionary history based on the *molecular clock theory*).

When a tree violates one of the above assumptions (especially, number 4), a tree is then considered to be *non-ultrametric*. For instance, a non-ultrametric tree will have branches with different distances towards the tips. While this information reflects different rates of evolution across the phylogeny, many analyses will require an ultrametric tree (*i.e.* where models assume constant variance and equal means at the tips).

In these cases, one may be interested in "smoothing out" these differences across the evolutionary history to transform a non-ultrametric tree into a ultrametric tree. 

One method is to use the *mean path length* (MPL) that assumes random mutation and a fixed molecular clock to calculate the age of a node as the mean of all the distances from this node to all tips diverging from it. To run this, we use the `ape` function `chronoMPL`. MPL sometimes returns negative branch lengths, meaning that it should be used with caution.

Another function we can use is `chronos`, also from the `ape` package. `chronos` uses the penalized maximum likelihood method to estimate divergences times. 

Let's try these methods by first creating a random tree, then tansforming its edge lengths using both `chronoMPL` and `chronos` functions, and plotting them side-by-side:

```{r ultrametric-transformation, eval=TRUE, echo=TRUE, fig.height=6, fig.width=18, fig.align='center', message=TRUE}
tree.NonUlt <- rtree(7)
tree.Ult.MPL <- chronoMPL(tree.NonUlt)
tree.Ult.S <- chronos(tree.NonUlt)

par(mfrow=c(1,3))
plot(tree.NonUlt, edge.width = 2,
     cex = 2,
     main = "A) Non-ultrametric",
     cex.main = 2)

plot(tree.Ult.MPL, edge.width = 2,
     cex = 2,
     main = "B) Ultrametric chronoMPL",
     cex.main = 2)

plot(tree.Ult.S, edge.width = 2,
     cex = 2,
     main = "C) Ultrametric chronos",
     cex.main = 2)
```

You may verify if any tree is ultrametric using `is.ultrametric`:

```{r ultrametric-test, echo=TRUE, eval=TRUE}
is.ultrametric(tree.NonUlt) 
is.ultrametric(tree.Ult.MPL) 
is.ultrametric(tree.Ult.S)
```

Sometimes, trees may have no branch lengths available, but only the divergence relationships. We often refer to these trees as being *non-informative* trees (despite the fact that they can still give us some information about branching events):

```{r non-informative-tree, echo=TRUE, eval=TRUE}
tree.NonInf <- rtree(7)
tree.NonInf$edge.length <- NULL

plot(tree.NonInf, edge.width = 2,
     cex = 2,
     main = "D) Uninformative tree",
     cex.main = 2)
```


### Edge length and rate transformations

Very oten, we may also be interested in altering the edge lengths of a given tree for a number of reasons. We may be interested, for example, in chaning the edge distance from molecular distances to time. Alternatively, we may want to change edge lengths to meet specific evolutionary models - let's say, we may seek to assess whether a given ecological pattern might be explained by recent evolutionary changes [*i.e.* resembling a Pagel's $\delta$ time-dependent model, (Pagel 1999)], by branching events (*i.e.*, diversification rates; by changing Pagel's $\kappa$) or by the phylogenetic structure of a tree (by rescaling a tree using Pagel's $\lambda$).

We can use the function `rescale` to perform changes in the branches of a phylogenetic tree.

Pagel's $\delta$ time-dependent model rescales a tree by raising all node depths to an estimated power greater than 1 ($\delta$). A value greater than 1 means that recent evolution has been relatively fast; and if delta is less than 1, recent evolution has been comparatively slow.

We can try the $\delta$ transformation in a random ultrametric tree:

```{r rescaling-trees-delta, echo=TRUE, eval=TRUE, fig.width = 18, fig.height = 6}
library(geiger)

tree.Ult1 <- rcoal(25)
tree.Ult1.Delta.01 <- rescale(tree.Ult1, 
                              model = "delta", 0.1) # Setting power of 0.1
tree.Ult1.Delta.1 <- rescale(tree.Ult1, 
                             model = "delta", 1) # with power 1; compare this tree with the original tree
tree.Ult1.Delta.10 <- rescale(tree.Ult1, 
                              model = "delta", 10) # with power 10

# Representing all three trees

par(mfrow = c(1,3), cex.main = 2)

plot(tree.Ult1.Delta.01, 
     show.tip.label = FALSE,
     edge.width = 2,
     cex = 2,
     main = expression(paste("A) ", delta," = 0.1")))

plot(tree.Ult1.Delta.1, 
     show.tip.label = FALSE,
     edge.width = 2,
     cex = 2,
     main = expression(paste("B) ", delta," = 1")))

plot(tree.Ult1.Delta.10, 
     show.tip.label = FALSE,
     edge.width = 2,
     cex = 2,
     main = expression(paste("C) ", delta," = 10")))
```

*Note how $\delta$ = 1 returns the original tree, $\delta$ < 1 extends the branch lengths of the edges, and $\delta$ > 1 shortens the edge lengths of the tree.*


Pagel's $\kappa$ punctuational (speciational) model of trait evolution focus on the number of speciation events between species to assess whether different rates of evolution are associated with branching events. This transformation raises all branch lengths to an estimated power.

```{r rescaling-trees-kappa, echo=TRUE, eval=TRUE, fig.width = 18, fig.height = 6}
tree.Ult1.kappa.01 <- rescale(tree.Ult1, 
                              model = "kappa", 0.1) # Kappa of 0.1
tree.Ult1.kappa.1 <- rescale(tree.Ult1, 
                             model = "kappa", 1) # Kappa of 1
tree.Ult1.kappa.2 <- rescale(tree.Ult1, 
                             model = "kappa", 2) # Kappa of 2

# Represent these trees

par(mfrow = c(1,3), 
    cex.main = 2)

plot(tree.Ult1.kappa.01, 
     show.tip.label = FALSE,
     edge.width = 2,
     cex = 2,
     main = expression(paste("D) ", kappa," = 0.1")))

plot(tree.Ult1.kappa.1, 
     show.tip.label = FALSE,
     edge.width = 2,
     cex = 2,
     main = expression(paste("E) ", kappa," = 1")))

plot(tree.Ult1.kappa.2, 
     show.tip.label = FALSE,
     edge.width = 2,
     cex = 2,
     main = expression(paste("F) ", kappa," = 2")))
```


Finally, by rescaling a tree using Pagel's $\lambda$, one may change the structure of a phylogenetic tree to reduce the overall contribution of internal edges relative to the terminal edges of a given tree. 

A Pagel's $\lambda$ of closer to 0 removes such structure from the tree, making it resemble a "star" phylogeny; while the closest $\lambda$ is to 1, the more the rescaled tree resembles the original. Let's try it:

```{r rescaling-trees-lambda, echo=TRUE, eval=TRUE, fig.width = 18, fig.height = 6}
tree.Ult1.lambda.01 <- rescale(tree.Ult1, model = "lambda", 0.1)
tree.Ult1.lambda.05 <- rescale(tree.Ult1, model = "lambda", 0.5)
tree.Ult1.lambda.1 <- rescale(tree.Ult1, model = "lambda", 1)

# Represent the above trees

par(mfrow = c(1,3), 
    cex.main = 2)

plot(tree.Ult1.lambda.01, 
     show.tip.label = FALSE,
     edge.width = 2,
     cex = 2,
     main = expression(paste("G) ", lambda," = 0.1")))

plot(tree.Ult1.lambda.05, 
     show.tip.label = FALSE,
     edge.width = 2,
     cex = 2,
     main = expression(paste("H) ", lambda," = 0.5")))

plot(tree.Ult1.lambda.1, 
     show.tip.label = FALSE,
     edge.width = 2,
     cex = 2,
     main = expression(paste("I) ", lambda," = 1")))
```


*The above examples are our opening door to start understanding three key features of evolutionary change: the tempo (δ), mode (κ) and the phylogenetic signal (λ) of the evolution (of a trait) (Pagel, 1994, 1999).*

You can compare all our transformations with the figure below:

```{r rescaling-trees-all, echo=FALSE, fig.height=18, fig.width=18}
# Set our figure facet grid

par(mfrow = c(3,3), 
    cex.main = 2)

# Pagel's delta

plot(tree.Ult1.Delta.01, 
     show.tip.label = FALSE,
     edge.width = 2,
     cex = 2,
     main = expression(paste("A) ", delta," = 0.1")))

plot(tree.Ult1.Delta.1, 
     show.tip.label = FALSE,
     edge.width = 2,
     cex = 2,
     main = expression(paste("B) ", delta," = 1")))

plot(tree.Ult1.Delta.10, 
     show.tip.label = FALSE,
     edge.width = 2,
     cex = 2,
     main = expression(paste("C) ", delta," = 10")))

# Pagel's kappa

plot(tree.Ult1.kappa.01, 
     show.tip.label = FALSE,
     edge.width = 2,
     cex = 2,
     main = expression(paste("D) ", kappa," = 0.1")))

plot(tree.Ult1.kappa.1, 
     show.tip.label = FALSE,
     edge.width = 2,
     cex = 2,
     main = expression(paste("E) ", kappa," = 1")))

plot(tree.Ult1.kappa.2, 
     show.tip.label = FALSE,
     edge.width = 2,
     cex = 2,
     main = expression(paste("F) ", kappa," = 2")))

# Pagel's lambda

plot(tree.Ult1.lambda.01, 
     show.tip.label = FALSE,
     edge.width = 2,
     cex = 2,
     main = expression(paste("G) ", lambda," = 0.1")))

plot(tree.Ult1.lambda.05, 
     show.tip.label = FALSE,
     edge.width = 2,
     cex = 2,
     main = expression(paste("H) ", lambda," = 0.5")))

plot(tree.Ult1.lambda.1, 
     show.tip.label = FALSE,
     edge.width = 2,
     cex = 2,
     main = expression(paste("I) ", lambda," = 1")))
```


## Representing trees and other class objects

There are multiple ways for *representing trees* in R. You can access the help page of any R function we are using by running the function name preceeded by question mark (`?`), *e.g.* `?plot.phylo`.  

```{r representingTrees, echo=TRUE}
treeVertebrates <- read.tree(text = "(((((((Cow, Pig), Whale), 
                             (Bat,(Lemur, Human))), 
                             (Robin, Iguana)), Coelacanth), 
                             Gold fish), Shark);")

par(mfrow = c(2,2))

# Using the plot function, which is based on plot.phylo
plot(treeVertebrates, no.margin = TRUE, 
     edge.width = 2)

# Plotting with roundPhylogram
roundPhylogram(treeVertebrates)

# Removing the root of the tree
plot(unroot(treeVertebrates), 
     type="unrooted", 
     no.margin = TRUE, lab4ut = "axial", 
     edge.width=2)

# Posting a type fan tree
plotTree(treeVertebrates, 
         type="fan", 
         fsize=0.7, 
         lwd=1,
    ftype="i")
```

One package that allows a fast and easily changeable visualization of phylogenetic trees is the `ggtree` package, which is based on the popular `ggplot2` package. To better work with `ggtree`, you should have a basic familiarity with the *Grammar of Graphics* and with `ggplot2`. I recommend the QCBS R Workshop material on [`ggplot2`](http://qcbs.ca/wiki/r_workshop3), which helped me acquire a general knowledge on this package.

You can access the documentation for `ggtree` [here](https://guangchuangyu.github.io/software/ggtree/).

We can use `ggtree`, for example, to attribute colors to different taxonomic groups (in this case, *genera* of bats).

```{r ChiropteraTree, echo=TRUE}
library(ggtree)

data(chiroptera, package="ape")

# Separate the genus names from species names
groupInfo <- split(chiroptera$tip.label, 
                   gsub("_\\w+", "", chiroptera$tip.label))

head(groupInfo)

# Group species accordingly to their genera 
chiroptera <- groupOTU(chiroptera, 
                       groupInfo)

# Plot tree using ggtree
ggtree(chiroptera, 
       aes(color = group), 
       layout='circular') + 
  geom_tiplab(size = 1, 
              aes(angle = angle))
```

## The `picante` package

One of the core packages for hypothesis testing in ecophylogenetics is `picante` (Kembel et a. 2010). `picante` mainly works with three types of objects: the phylogeny data as a `phylo` class object; the community presence-absence (binary) or abundance matrix; and a species-trait matrix.

## The `pez` package

`pez` is a package that contains many novel and wrappers functions for phylogenetic analyses in community data (Pearse et al. 2015). It has useful cuntions that will help us keep the different datasets we will be working on matching.

## Other classes of objects

`phylo4d` and `phylo4` are recent classes developed based on the `phylo` class that seeks to standardize and unify the representation of comparative data and phylogenetic trees in `R`.

## Exercises

*We are short in time! I will add a few exercises here later.*

# Community Phylogenetic Patterns

Ecologists share the goal of understanding and describing the processes that create the uneven distribution patterns of species, which create the variation in abundance, identity and diversity we know today. 

Among many factors that determine the presence of taxa in communities are the abiotic and biotic conditions of such communities (e.g., Diamond 1975; Westoby and Wright 2006; but see Hubbell 2001). 

Since taxa that recently diverged tend to be ecologically similar (Darwin 1859; Lord et al. 1995; Wiens and Graham 2005), a direct link may exist between the evolutionary relatedness of organisms in a community, the characters they possess, and the ecological processes that determine their distribution and abundance. 

In fact, many communities exhibit nonrandom patterns of evolutionary relatedness among co-occurring species (reviewed in Webb et al. 2002): a phenomenon we refer to as **phylogenetic community structure**.

## Preparing data for community phylogenetic analyses

For this analysis, we will use the data on the Jasper Ridge plant communities available in the book "**Phylogenies in Ecology**", written by Marc Cadotte and Jonathan Davies (2016). 

```{r JasperData, echo=TRUE, eval = TRUE}
JasperPlants.tree <- read.tree("data/Jasper/resources/data/Jasper/jasper_tree.phy")

JasperPlants.comm <- read.csv("data/Jasper/resources/data/Jasper/jasper_data.csv", 
                   row.names = 1)
```
 
Before proceeding to our analyses, we have to make sure that both the community and phylogenetic data match. For this, the names of the tips should match the species names in the dataset. This also means that species present in one dataset should not be absent in the other.

(*You thought you were done preparing your dataset, eh?*)

For this, we can use the function `drop.tip` from the `ape` package:

```{r JasperData-cleaning1, echo=TRUE, eval = TRUE}
JasperPlants.cleanTree <- drop.tip(phy = JasperPlants.tree, 
                              tip = setdiff(JasperPlants.tree$tip.label,
                                      colnames(JasperPlants.comm)))

```

You can also match your datasets using the function `match.phylo.comm` from `picante`, or the function `comparative.comm` from `pez`. 

```{r JasperData-cleaning2, echo=TRUE, eval = TRUE}
# Using picante
JasperPlants.picCleanTree <- match.phylo.comm(phy = JasperPlants.tree, 
                                              comm = JasperPlants.comm)$phy

JasperPlants.picCleanComm <- match.phylo.comm(phy = JasperPlants.tree, 
                                              comm = JasperPlants.comm)$comm


# Do the same using pez
# JasperPlants.pezCleanComm <- comparative.comm(phy = JasperPlants.tree, 
#                                               comm = as.matrix(JasperPlants.comm))

```

```{r JasperData-plotting, echo=TRUE, eval = TRUE}
plot(JasperPlants.picCleanTree,
     cex = 0.4)

JasperPlants.picCleanComm[1:4, 1:4]
```


## Phylogenetic $\alpha$ diversity (PD)

One simple way to measure evolutionary diversity within a community is through Daniel Faith's PD metric (Faith, 1992; http://goo.gl/wM08Oy).

PD sums the branch lenghts of all co-occurring species in a given site, from the tips to the root of the phylogenetic tree.

Higher PD values are given to communities that has more evolutionary divergent taxa and older history, while lower PD values represent assemblages that have taxa with more recent evolutionar history.

Faith's PD can be implemented using the `pd` function in the `picante` package.

In addition to Faith's PD, the `pd` function also returns species richness (SR).

SR is the same as observed richness ($S_{obs}$) or $\alpha$ diversity. 

```{r JasperData-PD, echo=TRUE, eval = TRUE}
Jasper.PD <- pd(JasperPlants.picCleanComm, JasperPlants.picCleanTree,
                include.root = FALSE)

head(Jasper.PD) # See the first six rows of the resulting matrix
```


And how PD estimates compares with SR?

```{r JasperData-PD-cor, echo=TRUE, eval = TRUE}
cor.test(Jasper.PD$PD, Jasper.PD$SR)

plot(Jasper.PD$PD, 
     Jasper.PD$SR, 
     xlab = "Phylogenetic Diversity", ylab = "Species Richness", 
     pch = 16)
```

## Randomizations and null models

We are often interested in assessing whether or not observed patterns differ from null expectation. `picante` has many functions that allow us to test hypotheses under different types of *null models*:
```{r, results = "asis", echo = FALSE, message = FALSE}
tex2markdown <- function(texstring) {
  writeLines(text = texstring,
             con = myfile <- tempfile(fileext = ".tex"))
  texfile <- pandoc(input = myfile, format = "html")
  cat(readLines(texfile), sep = "\n")
  unlink(c(myfile, texfile))
}

textable <- "
\\begin{center}
\\begin{tabular}{ m{5cm}  m{10cm} }
  \\textbf{Null Model} & \\textbf{Description} \\\\
  \\hline \\hline \\\\
  \\textbf{taxa.labels} & 
  Shuffles taxa labels across tips of phylogeny (across all taxa included in phylogeny) \\\\
  \\\\
  \\textbf{richness} & 
  Randomizes community data matrix abundances within samples (maintains sample species richness) \\\\
  \\\\  
  \\textbf{frequency} & 
  Randomizes community data matrix abundances within species (maintains species occurence frequency) \\\\
  \\\\   
  \\textbf{sample.pool} & 
  Randomizes community data matrix by drawing species from pool of species occurring in at least one community (sample pool) with equal probability \\\\
  \\\\  
    \\textbf{phylogeny.pool} & 
  Randomize community data matrix by drawing species from pool of species occurring in at least one community (sample pool) with equal probability \\\\
  \\\\
  \\textbf{independentswap} & 
  Randomizes community data matrix with the independent swap algorithm (Gotelli 2000) maintaining species occurrence frequency and sample species richness \\\\
  \\\\
  \\textbf{trialswap} & 
  Randomizes community data matrix with the trial-swap algorithm (Miklos \\& Podani 2004) maintaining species occurrence frequency and sample species richness \\\\
  \\\\
    \\hline
\\end{tabular}
\\end{center}"

tex2markdown(textable)
```


We will start using null models to calculate the *standardized effect size* (*ses*) of PD for each community. For this, we can use the `ses.pd` function in `picante`:

```{r JasperData-SESPD, echo=TRUE, eval = TRUE}
# This is a computationally intensive process and take some time to run
Jasper.ses.pd <- ses.pd(samp = JasperPlants.picCleanComm, 
                        tree = JasperPlants.picCleanTree, 
                        null.model = "taxa.labels", 
                        runs = 1000)

head(Jasper.ses.pd)
```


## Phylogenetic community structure

Another way of looking into the phylogenetic structure of communities is to quantify at what degree closely related taxa co-occur.

The basis for patterns of phylogenetic community structure can lie in a simple framework (table 1; Webb et al. 2002): when species traits responsible for their physiological tolerances are conserved, an environmental filtering that limits the range of viable ecological strategies at a given site is expected to select co-occurring species more related than expected by chance, *i.e.* generate a pattern known as *phylogenetic clustering*.

On the other hand, competitive exclusion can limit the ecological similarity of co-occurring species, generating a pattern of *phylogenetic overdispersion* or *phylogenetic evenness*.

Finally, when traits are diverging faster across the evolutionary time, the effects of habitat filtering should be weaker, producing evenly dispersed patterns of relatedness; while competition or limiting similarity is expected to produce random or clustered patterns. 

If communities are assembled independently with respect to traits (e.g., Hubbell 2001), then patterns of relatedness should be resemble random expectation.

Over the recent years, other processes were shown to drive phylogenetic patterns of community structure, such as, isolation, time since isolation, diversification, scale-dependent assembly and local and regional composition and richness.

```{r}
library(knitr)
library(kableExtra)

text_tbl <- data.frame(
    "Assembly process"= c("Limiting similarity", 
                          "Habitat filtering", 
                          "Neutral assembly"),
    "Under niche conservatism" = c("Even dispersion", 
                                   "Clustered dispersion", 
                                   "Random dispersion"),
    "Under niche divergence"= c("Random or clustered dispersion",
                                "Even dispersion",
                                "Random dispersion"))

kable(text_tbl, "html", align = "c") %>%
  kable_styling(full_width = F)
```

### Phylogenetic resemblance matrix 

An important step prior to the estimation of the phylogenetic structure of communities is to create a *phylogenetic resemblance matrix*, or a *phylogenetic variance-covariance matrix*. A *phylogenetic variance-covariance matrix* is nothing more than the calculated distances between taxa in a tree.

In R, we commonly refer to this matrix as a matrix of *cophenetic distances among taxa* and we can use the function `cophenetic.phylo` from `picante` to calculate it:

```{r CopheneticDist, echo=TRUE}
# Create a Phylogenetic Distance Matrix {picante}

JasperPlants.cophenDist <- cophenetic.phylo(JasperPlants.picCleanTree)
```


## Net relatedness index (NRI)

One common metric to evaluate the phylogenetic structure of commmunities is the **net relatedness index** (**NRI**), which is based on the **mean phylogenetic distances** (**MPD**) calculated from the pairwise cophenetic branch lengths distances of the co-occurring taxa in each community.

NRI is obtained by multiplying the *standardized effect size mean phylogenetic distances* (calculated via a null model of interest) by $-1$.

Negative NRI values indicate *phylogenetic overdispersion*, *i.e.* co-occurring taxa is less closely related than expected by a given null model; while positive NRI values represent *phylogenetic clustering*, *i.e.* co-occurring taxa are more closely related than expected by a null model.

In R, we will use the function `ses.mpd` from `picante` to calculate the *NRI* of our communities:

```{r sesMPD, echo=TRUE}
# Estimate Standardized Effect Size of the MPD on the cophenetic matrix 
JasperPlants.ses.mpd <- ses.mpd(JasperPlants.picCleanComm, 
                   JasperPlants.cophenDist, 
                   null.model = "taxa.labels", 
                   abundance.weighted = FALSE, 
                   runs = 100) 

head(JasperPlants.ses.mpd)

# Calculate NRI
JasperPlants.NRI <- as.matrix(-1 * 
                                ((JasperPlants.ses.mpd[,2] - JasperPlants.ses.mpd[,3]) /
                                   JasperPlants.ses.mpd[,4]))
rownames(JasperPlants.NRI) <- row.names(JasperPlants.ses.mpd)
colnames(JasperPlants.NRI) <- "NRI"


head(JasperPlants.NRI)
```


## Nearest taxon index (NTI)

Another way to assess the phylogenetic structure of communities is through the calculation of the **nearest taxon index** (**NTI**). NTI differs from NRI by being based on the **mean nearest neighbour phylogenetic distance** (**MNTD**), instead of the overall mean pairwise distances (**MPD**). 

MNTD is obtained by calculating the mean phylogenetic distance of each taxon of a given community to its nearest neighbour in a tree. By doing this, one can observe that NTI is more sensitive to the clustering or overdispersion patterns happening closer to the tips of a tree, while NRI represents an overall pattern of structuring across the whole phylogenetic tree.

As in NRI, higher values of NRI indicate phylogenetic clustering and lower values of NRI indicate phylogenetic overdispersion.

```{r sesMNTD, echo=TRUE}
# Estimate Standardized Effect Size of the MNTD on the cophenetic matrix 
JasperPlants.ses.mntd <- ses.mntd(JasperPlants.picCleanComm, 
                   JasperPlants.cophenDist, 
                   null.model = "taxa.labels", 
                   abundance.weighted = FALSE, 
                   runs = 100) 

head(JasperPlants.ses.mntd)

# Calculate NTI
JasperPlants.NTI <- as.matrix(-1 * 
                                ((JasperPlants.ses.mntd[,2] - JasperPlants.ses.mntd[,3]) / 
                                   JasperPlants.ses.mntd[,4]))
rownames(JasperPlants.NTI) <- row.names(JasperPlants.ses.mntd)
colnames(JasperPlants.NTI) <- "NTI"


head(JasperPlants.NTI)
```

## Phylogenetic beta diversity (PBD)

One may not be interested only in the patterns of $\alpha$ diversity, but in understanding how communities are evolutionary distinct from each other, *i.e.* knowing the pairwise phylogenetic $\beta$ diversity.

Many metrics are available to assess the phylogenetic $\beta$ diversity of communities. Here, we will do this by calculating the *PhyloSor* index, which is the phylogenetic analog of the Sørensen index (Bryant et al., 2008; Swenson, 2011; Feng et.al., 2012). *PhyloSor* can be calculated using the function `phylosor` from `picante`:

```{r JasperData-PhyloSor, echo=TRUE, eval = TRUE}
Jasper.pbd <- phylosor(samp = JasperPlants.picCleanComm, 
                        tree = JasperPlants.picCleanTree)

as.matrix(Jasper.pbd)[1:7,1:7]
```

We can go one step further and test whether the observed PBD values differ from null expectation under different *null models*. First, we will use the function `phylosor.rnd` to obtain PhyloSor values of phylogenetic beta-diversity obtained by randomization:  

```{r JasperData-PhyloSorRnd, echo=TRUE, eval = TRUE}
Jasper.pbd.rndTaxa <- phylosor.rnd(samp = JasperPlants.picCleanComm, 
                        tree = JasperPlants.picCleanTree,
                        cstSor = TRUE,
                        null.model = "taxa.labels", # Pay attention to the null models!
                        runs = 100)

```

Apparently, the a *standardized effect size* function is not available within `picante`. Thus, we will create our own null model to test our hypothesis:

```{r JasperData-PhyloSorSES, echo=TRUE, message=TRUE, warning=TRUE, paged.print=TRUE}
# I first ran ses.pd to build this in a more-or-less similar format

ses.phylosor <- function(obs, rand){
  rand <- t(as.data.frame(lapply(rand, as.vector)))
  
  phySor.obs <- as.numeric(obs)
  
  phySor.mean <- apply(rand, MARGIN = 2, 
                      FUN = mean, 
                      na.rm = TRUE)
  
  phySor.sd <- apply(rand, 
                    MARGIN = 2, 
                    FUN = sd, 
                    na.rm = TRUE)
  
  phySor.ses <- (phySor.obs - phySor.mean)/phySor.sd
  
  phySor.obs.rank <- apply(X = rbind(phySor.obs, rand), 
                          MARGIN = 2, 
                          FUN = rank)[1, ]
  
  phySor.obs.rank <- ifelse(is.na(phySor.mean), NA, phySor.obs.rank)
  
  data.frame(phySor.obs, 
             phySor.mean, 
             phySor.sd, 
             phySor.obs.rank, 
             phySor.ses, 
             phySor.obs.p = phySor.obs.rank/(dim(rand)[1] + 1))
}

  
JasperPlants.ses.pbd  <- ses.phylosor(Jasper.pbd,
                                      Jasper.pbd.rndTaxa)

# Project the table using knitr::kable
round(JasperPlants.ses.pbd[45:55,], 3) %>%
  kable("html") %>%
  kable_styling()
```

## Decomposing phylogenetic beta diversity

Our questions about phylogenetic can become even more complex!

One may also be interested in decomposing the turnover and nestedness components of beta diversity (Baselga, 2010; Leprieur et al., 2012). 
Baselga (2010) proposed an additive partitioning framework that consists in decomposing the pair-wise Sørensen's dissimilarity index into two additive components accounting for (i) ‘true’ turnover of species and (ii) richness differences between nested communities. In 2012, Leprieur et al. (2012) extended Baselga's (2010) framework to decompose PhyloSor into two components accounting for ‘true’ phylogenetic turnover and phylogenetic diversity gradients, respectively.

*Still not covered in this tutorial!*

# Trait Evolution

One central concept in ecophylogenetics is the non‐independence among species traits because of their phylogenetic relatedness (Felsenstein 1985; Revell, Harmon & Collar 2008).

For this part of the workshop, we will use our newly acquired knowledge in the *Edge length and rate transformations* section about **Pagel's $\lambda$**, **$\kappa$** and **$\delta$**. We will also see about alternative evolutionary models and learn how to test whether **phylogenetic signal** is present in species traits, as well as, **fit traits to different models of evolution**. 

## Dataset preparation

We are now introducing a third-type of data in our phylogenetic analyses: traits. Here we will use a different dataset.

For this part of the course, we will fetch our dataset from papers using a very useful R package: `fulltext`. `fulltext` is maintained by Scott Chamberlain, Will Pearse and Katrin Leinweber, and is able to get files directly from the supplementary material of published papers, with only the DOI references!

```{r willPearseFiles, echo=TRUE, message=TRUE}
# The idea of using this dataset was borrowed from Will Pearse's minicourse on ecophylogenetics.
# Nevertheless, we will use it for further analyses here!

library('fulltext')

MCDB.comm <- read.csv(ft_get_si("E092-201", "MCDB_communities.csv", "esa_data_archives"))
MCDB.species <- read.csv(ft_get_si("E092-201", "MCDB_species.csv", "esa_data_archives"))
Mammalian.tree <- read.nexus("data/ele_1307_sm_sa1.tre")[[1]] # fulltext is not fetching files from Wiley
PanTHERIA.traits <- read.delim(ft_get_si("E090-184", "PanTHERIA_1-0_WR05_Aug2008.txt", "esa_archives"))
```

*What have we just done?* 

In `MCDB.comm`, we have loaded 7977 records of species composition and abundance of mammalian communities (excluding bats) for 1000 sites throughout the world, which was compiled from published literature.

`MCDB.species` contains 700 records that includes the name of species and their taxonomic information.

This data is part of the ecological archive of the paper by Thibault et al. (2011) 

(*Katherine M. Thibault, Sarah R. Supp, Mikaelle Giffin, Ethan P. White, and S. K. Morgan Ernest. 2011. Species composition and abundance of mammalian communities. Ecology 92:2316.*)

`Mammalian.tree` loaded the first tree described in the published paper by Fritz et al. 2009 

(*Fritz, S. A., Bininda-Emonds, O. R. P. and Purvis, A. (2009), Geographical variation in predictors of mammalian extinction risk: big is bad, but only in the tropics. Ecology Letters 2(6):538–549. doi: 10.1111/J.1461-0248.2009.01307*)

And, finally, `PanTHERIA.traits` contain a species-level data set of life history, ecology, and geography of extant and recently extinct mammals. The metadata for this dataset can be consulted in its [website](http://esapubs.org/archive/ecol/e090/184/metadata.htm).

Now that we know about our dataset, we can explore each one of the loaded objects:

```{r exploreFiles, echo=TRUE, message=TRUE}
head(MCDB.comm)

head(MCDB.species)

str(Mammalian.tree)

str(PanTHERIA.traits)
```

Now, let's *treat* our dataset:

(*Yes, we spend a lot of time making our dataset ready for analysis!*)

```{r dataSetComm, echo=TRUE, message=TRUE}
# Starting with the community file:
# Let's make a species-site matrix

MCDB.comm$Abundance <- as.numeric(as.character(MCDB.comm$Abundance))

head(MCDB.comm$Abundance)
```

```{r dataSetSpecies, echo=TRUE, message=TRUE}
# Add a new column with real species names, extracted and matched from the MCDB.species dataset

MCDB.comm$Species <- with(MCDB.species, 
                          paste(Genus, Species)[match(MCDB.comm$Species_ID, Species_ID)])

# Have you seen "with" before? Click on the link below 
# and see how this could have been done without "with". 
# Tell us the difference!

MCDB.comm$Species <- gsub("  ", " ", MCDB.comm$Species)

head(MCDB.comm)

```

*Looking good?* It still does not look like a *species by site matrix*, right? Let's transform the sample dataset into a matrix using `picante`'s `sample2matrix` function:

```{r dataSetSpecSite, echo=TRUE, message=TRUE}
MCDB.comm <- with(MCDB.comm,
                  as.matrix(sample2matrix(
                    data.frame(Site_ID, # This will be our row.names
                               as.numeric(as.character(Abundance)),
                               Species))))

# How is this looking now?
MCDB.comm[1:6, 1:6]
```

Now, let's go to the *phylogenetic tree*:

```{r dataSetTree, echo=TRUE, message=TRUE}
# We have noticed that the labels of the phylogenetic tree 
# has a different separator than the community dataset

# Let's replace the "_" in the phylogenetic tree by " "

Mammalian.tree$tip.label <- gsub("_", " ", Mammalian.tree$tip.label)

# Solve potential polytomies

Mammalian.tree <- multi2di(Mammalian.tree)

Mammalian.tree
```

And, now, the *trait dataset*. Let's select only a few characters of interest:

```{r dataSetTrait, echo=TRUE, message=TRUE, warning=TRUE}
PanTHERIA.traits <- with(PanTHERIA.traits,
                         data.frame(BodyMass = log10(X5.1_AdultBodyMass_g),
                                    HeadLength = (X13.1_AdultHeadBodyLen_mm),
                                    LatitudinalRange = (X26.4_GR_MidRangeLat_dd),
                                    mPrecipRange = (X28.1_Precip_Mean_mm),
                                    mTempRange = (X28.2_Temp_Mean_01degC),
                                    HomeRange = (X22.1_HomeRange_km2),
                                    Terrestriality = (X12.2_Terrestriality),
                                    row.names = MSW05_Binomial))

str(PanTHERIA.traits)
```

```{r dataSetTraitNaN, echo=TRUE, message=TRUE, warning=TRUE}
# Remove "-999"" and "NaN" values

PanTHERIA.traits[PanTHERIA.traits == -999] <- NA
PanTHERIA.traits <- na.omit(PanTHERIA.traits)

str(PanTHERIA.traits)
```

Finally, let's assure that all datasets are matching in species number and names. This time, instead of using the `match.phylo.data` and `match.phylo.comm` functions from `picante`, we will use the `comparative.comm` from `pez`:

```{r comparativeComm, echo=TRUE, message=TRUE, warning=TRUE}
Mammal.CompData <- comparative.comm(Mammalian.tree, 
                    MCDB.comm, 
                    PanTHERIA.traits)

# Also, for the moment, there is no reason to have sites 
# that have no species recorded in them

Mammal.CompData <- Mammal.CompData[rowSums(comm(Mammal.CompData)) > 0, ]
Mammal.CompData <- Mammal.CompData[,colSums(comm(Mammal.CompData)) > 0]

str(Mammal.CompData)
```

## Phylogenetic signal

One way to assess the existence and degree of phylogenetic non‐independence in species traits is through the calculation of the *phylogenetic signal* of a given trait. **Phylogenetic signal** can be defined as the **tendency for related species to resemble each other more than they resemble species drawn at a null or alternative evolutionary model** (Blomberg & Garland 2002). 

*Phylogenetic signal* has been commonly used to investigate questions in a wide range of research areas:  How strongly are certain traits correlated with each other (Felsenstein 1985)?  Which processes drive community assembly (Webb et al. 2002)? Are niches conserved along phylogenies (Losos 2008) and is vulnerability to climate change clustered in the phylogeny (Thuiller et al. 2011)?

Let's learn about a few ways of measuring phylogenetic signal:

### Moran's I and Moran's Correlogram

One simple way of measuring phylogenetic signal is through **Moran's I** (Gittleman and Kot, 1990; Moran, 1950), which estimates the existing autocorrelation between species and their traits as a function of their phylogenetic relatedness.

Positive Moran's $I$ values indicate positive phylogenetic correlation between species; negative values indicate that closely related species are more different than distantly related species; and an $I$ of $0$ indicates absence of autocorrelation.

For this, we will use `geiger`'s `Morain.I` function. See an example below:

```{r MoransI, echo=TRUE}
# We will first calculate a proximity matrix, where we can 
# change the weight of distantly related species by changing 
# the parameter alpha

alphaI <- 1
wI <- 1/cophenetic(Mammal.CompData$phy)^alphaI
diag(wI) <- 0

wI[1:5, 1:5]

# We can then calculate Moran's I

Mammal.BodyMass.I <- Moran.I(x = Mammal.CompData$data$BodyMass,
                             weight = wI,
                             scaled = TRUE)

Mammal.BodyMass.I
```

*What is the meaning of this?*

We have just calculated the **global Moran's I**, which allowed us to assess the existence of overall phylogenetic autocorrelation for a given trait. However, we also know that the degree of autocorrelation can change across different phylogenetic distances.

We can build a **Moran's correlogram** to assess how phylogenetic autocorrelation changes across different phylogenetic distance classes.

For this, we will use the functions `phyloCorrelogram` and `plot.phylocorrelogram` from `adephylo`:

```{r phyloCorrel, echo=TRUE}
# Calculate the correlogram under a phylo4d object
Mammal.BodyMass.MoranCor <- phyloCorrelogram(phylo4d(x = Mammal.CompData$phy, 
                                 tip.data = Mammal.CompData$data$BodyMass), 
                           n.points = 10, 
                           ci.bs = 10)

# Represent the correlogram
plot.phylocorrelogram(Mammal.BodyMass.MoranCor)
```

*How do you interprete this correlogram?*

### Blomberg's K

A common way to measure phylogenetic signal is through **Blomberg's K statistic** (Blomberg et al. 2003), which compares the observed signal in a trait to its expected signal under multiple estimations under a Brownian motion model of trait evolution on a phylogeny.

*K* works as a mean square ratio (MSE), where the numerator is the error assuming that the trait evolves independently from phylogenetic structure, and the denominator is corrected by the phylogenetic covariances.


*K* values equal to 1 correspond to a Brownian motion process. *K* values greater than 1 indicate strong phylogenetic signal and that traits are being conserved through the evolutionary time. Alternatively, *K* values closer to zero correspond to a random or convergent pattern of evolution.


```{r KBodyMass, echo=TRUE, message=TRUE, warning=TRUE}
(Mammal.BodyMass.K <- phylosig(Mammal.CompData$phy, 
               Mammal.CompData$data$BodyMass, 
               method = "K", test = TRUE))
```

How about other traits?

```{r KHeadLengthmTempLength, echo=TRUE, message=TRUE, warning=TRUE}
(Mammal.HeadLength.K <- phylosig(Mammal.CompData$phy, 
               Mammal.CompData$data$HeadLength, 
               method = "K", test = TRUE))

(Mammal.mTempRange.K <- phylosig(Mammal.CompData$phy, 
               Mammal.CompData$data$mTempRange, 
               method = "K", test = TRUE))
```

*Look at how can we interprete K in the text above the code, and tell how if there is phylogenetic signal for each one of the above traits.*

## Evolutionary models

### Brownian motion model

For continuous traits, we commonly assume a **Brownian motion (BM) model** of evolutionary change. This model was proposed by Felsenstein (1985) and represents traits divergence independently across the evolutionary time, due to genetic drift (and sometimes other processes, such as natural selection), in resemblance to a random walk (or to a *drunkard's walk*).

The BM model assumes that the correlation among trait values is proportional to the extent of shared ancestry for pairs of species.

It is a stochastic model in which changes from one time to the next are random draws from a normal distribution with mean $0$ and variance $σ^2 × Δt$. This means that variance under BM motion increases linearly through time with instantaneous rate $σ^2$.

BM is a model that can be easily simulated:

```{r BMsim1, echo=TRUE, warning=TRUE}
# For the moment, this is a wrapper of 
# Liam Revell's tutorial on Brownian Motion simulation 
# http://www.phytools.org/eqg/Exercise_4.1/

t <- 0:100  # time
sig2 <- 0.01

## first, simulate a set of random deviates
x <- rnorm(n = length(t) - 1, 
           sd = sqrt(sig2))

## now compute their cumulative sum
x <- c(0, cumsum(x))
plot(t, x, 
     type = "l", 
     ylim = c(-2, 2))
```

Now, with multiple simulations and changing $\sigma^2$:

```{r BMsim2, echo=TRUE, fig.height=6, fig.width=18, warning=TRUE}
nsim <- 100

X <- matrix(rnorm(n = nsim * (length(t) - 1), 
                  sd = sqrt(sig2)), 
            nsim, 
            length(t) - 1)
X <- cbind(rep(0, nsim), 
           t(apply(X, 1, cumsum)))

par(mfrow = c(1,3))

plot(t, X[1, ], 
     xlab = "time", 
     ylab = "phenotype", 
     ylim = c(-2, 2), 
     type = "l")

apply(X[2:nsim, ], 1, 
      function(x, t) 
        lines(t, x), t = t)

##

X <- matrix(rnorm(n = nsim * (length(t) - 1), 
                  sd = sqrt(sig2/10)), 
            nsim, 
            length(t) - 1)

X <- cbind(rep(0, nsim), 
           t(apply(X, 1, cumsum)))

plot(t, X[1, ], 
     xlab = "time", ylab = "phenotype", 
     ylim = c(-2, 2), 
     type = "l")

apply(X[2:nsim, ], 1, 
      function(x, t) lines(t, x), t = t)

##

nsim = 100

X <- matrix(rnorm(mean = 0.02, 
                  n = nsim * (length(t) - 1), sd = sqrt(sig2/4)), 
    nsim, length(t) - 1)
X <- cbind(rep(0, nsim), 
           t(apply(X, 1, cumsum)))

plot(t, X[1, ], 
     xlab = "time", ylab = "phenotype", 
     ylim = c(-1, 3), type = "l")

apply(X[2:nsim, ], 1, 
      function(x, t) lines(t, x), t = t)
```

If we want to think on the perspective of a tree:

```{r phyloBM, echo=TRUE}
t <- 100  # total time
n <- 30  # total taxa
b <- (log(n) - log(2))/t
tree <- pbtree(b = b, n = n, t = t, type = "discrete")  

## simulate Brownian evolution on a tree with fastBM
x <- fastBM(tree, sig2 = sig2, internal = TRUE)
## visualize Brownian evolution on a tree

phenogram(tree, x, spread.labels = TRUE, spread.cost = c(1, 0))
```

### Ornstein-Uhlenbeck model

The **Ornstein-Uhlenbeck (OU) evolutionary model** is based on stabilizing selection (Hansen, 1997; but see Butler and King 2004), where a given species trait is determined by its common ancestral state ($a$) and the strengh of the selection ($\alpha$) that attracts traits towards its optimal value ($\theta$).

```{r OUsim, echo=TRUE}
#Ornstein-Uhlenbeck (OU)
nsim=50
time.steps = 50

ou<-matrix(ncol=nsim,nrow=time.steps)
ou[1,]<-0

alpha=-0.05

for(i in 2:time.steps){
    ou[i,] = ou[i-1,] + alpha*(ou[i-1,]) + rnorm(nsim, mean=0, sd=1) 
}

plot(1:time.steps, ou[,1],
     xlim=c(0,time.steps),
     ylim=c(-max(abs(ou)),
            max(abs(ou))),
     type="n",
     xlab="Time", ylab="Trait value",
     bty="n")

matlines(1:time.steps,
         ou,
         lty=1,
         col="#FF303080", 
         lwd=2)

```


We can again use `rescale` on a tree to see how it stretches to an *OU* model:

```{r echo=TRUE}
tree <- rcoal(10)

par1 <- par(mfrow = c(1, 2))

plot(tree)

tree_scaled <- rescale(tree, model = "OU")
plot(tree_scaled(2))
par(par1)
```

### Early-burst model

The **early-burst evolutionary model**, also known as **ACDC model** (**accelerating-decelerating**; Blomberg et al. 2003) simulates scenarios where evolutionary rates increase and decrease exponentially through time, under the model $r[t] = r[0] * exp(a * t)$, where $r[0]$ is the initial rate, $a$ is the rate change parameter, and $t$ is time. 


```{r EBsim}
nsim=50
time.steps = 50

#Early-Burst (aka ACDC)
eb <- matrix(ncol= nsim, nrow=time.steps)
eb[1,] <- 0

r = -0.2

for(i in 2:time.steps){
    eb[i,] = eb[(i-1),] +  (exp(r*i)*  rnorm(nsim,mean=0,sd=1))
    }

plot(1:time.steps, eb[,1],
     xlim=c(0, time.steps),
     ylim=c(-max(abs(eb)),
            max(abs(eb))),
     type="n",
     xlab="Time", ylab="Trait value",
     bty="n")

matlines(1:time.steps, 
         eb,
         lty=1,
         col="#00BFFF70", lwd=2)
```

Finally, we can rescale a tree to see an EB model of evolution:

```{r}
par1 <- par(mfrow = c(1, 3))

plot(tree)

tree_scaled <- rescale(tree, model = "EB")
plot(tree_scaled(-5))

tree_scaled <- rescale(tree, model = "EB")
plot(tree_scaled(5))

par(par1)
```


We can see all three simulations together with this:

```{r allImages, echo=FALSE}

nsim=50
time.steps = 50

#Brownian Motion (BM)
bm<-matrix(ncol=nsim,nrow=time.steps)
bm[1,]<-0

for(i in 2:time.steps){
    bm[i,] = bm[(i-1),] + rnorm(nsim,mean=0,sd=1)    
    }

plot(1:time.steps,bm[,1],xlim=c(0,time.steps),ylim=c(-max(abs(bm)),max(abs(bm))),type="n",xlab="Time",ylab="Trait value",bty="n")
matlines(1:time.steps,bm,lty=1,col="#70707080", lwd=2)

#Ornstein-Uhlenbeck (OU)
ou<-matrix(ncol=nsim,nrow=time.steps)
ou[1,]<-0

alpha=-0.05

for(i in 2:time.steps){
    ou[i,] = ou[i-1,] + alpha*(ou[i-1,]) + rnorm(nsim,mean=0,sd=1) 
    }

matlines(1:time.steps,ou,lty=1,col="#FF303080", lwd=2)


#Early-Burst (aka ACDC)
eb<-matrix(ncol=nsim,nrow=time.steps)
eb[1,]<-0

r=-.2

for(i in 2:time.steps){
    eb[i,] = eb[(i-1),] +  (exp(r*i)*  rnorm(nsim,mean=0,sd=1))
    }

matlines(1:time.steps,eb,lty=1,col="#00BFFF70", lwd=2)
```


## Fitting evolutionary models to trait data

Knowing if there is phylogenetic signal in traits is nice, but it is just *knowing*, right? Perhaps, a more informative way of assessing the phylogenetic non-independence in species traits is to properly test how traits fit to different models of trait evolution.

Here, we will use the functions `fitContinuous` from `geiger` to compare the relative fit of our data to different evolutionary models: 

```{r fitContinous, echo=TRUE, message=TRUE, warning=TRUE}
EvoModels <- c("BM", 
               "OU", 
               "EB")

fitEvoModels <- lapply(EvoModels,
           FUN = function(x) fitContinuous(multi2di(Mammal.CompData$phy), 
                                           Mammal.CompData$data[, "BodyMass", drop = FALSE], 
                                           model = x, 
                                           ncores = 2))
```   

Let's take sometime to explore the results:

```{r fitContinous-BM, echo=TRUE, message=TRUE, warning=TRUE}
# checking the Brownian motion fit
fitEvoModels[[1]]

head(fitEvoModels[[1]]$res)
```

```{r fitContinous-EB, echo=TRUE, message=TRUE, warning=TRUE}
# checking the Brownian motion fit
fitEvoModels[[2]]

head(fitEvoModels[[2]]$res)
```

```{r fitContinous-OU, echo=TRUE, message=TRUE, warning=TRUE}
# checking the Brownian motion fit
fitEvoModels[[3]]

head(fitEvoModels[[3]]$res)
```

```{r fitContinous-BM2, echo=TRUE, message=TRUE, warning=TRUE}
AICs <- sapply(fitEvoModels, 
               FUN = function(x) x$opt$aicc)

names(AICs) <- EvoModels

# Which one is the best model?
(AICtable <- aicw(AICs))
```   

# Diversification rates


```{r eval=FALSE, include=FALSE}
if (.Platform$OS.type == "windows") {
    windowsFonts(
        Impact = windowsFont("Impact"),
        Courier = windowsFont("Courier")
    )
}

library(meme)

u <- system.file("success.jpg", package="meme")
meme(u, "Time to eat!", "YAY!")
```