\documentclass[10pt,a4paper]{article}
\usepackage[left=2cm,right=2cm,top=3cm,bottom=3cm]{geometry}
\usepackage{hyperref}
\usepackage{alltt}
\usepackage{supertabular}
\setcounter{table}{0}
\renewcommand{\thetable}{S\arabic{table}}%
\setcounter{figure}{0}
\renewcommand{\thefigure}{S\arabic{figure}}%
\begin{document}
\title{Supplementary Material for \emph{BEASTling: a software tool for linguistic phylogenetics using BEAST 2}}
\author{Luke Maurits, Robert Forkel, Gereon A. Kaiping, Quentin D. Atkinson}
\maketitle
\section{Indo-European example configuration}
\begin{alltt}
\input{examples/indoeuropean/indoeuropean.conf}
\end{alltt}
\section{Categorisation of Indo-European meaning slots}
Table \ref{tab:categories} shows how the 200 meaning slots in the Indo-European cognate data set used for our first example analysis were categorised to produce the rate variation distributions in Figure 3. Note that the categories are disjoint: each meaning slot is assigned to one category only. So, the category of nouns should be thought of as ``all those nouns which are not pronouns or body-parts'', and the category of adjectives should be thought of as ``adjectives other than colours''. Two meanings were excluded for the purposes of this figure: \emph{not}, which is the only adverb in the data set, and \emph{one}, which is the only numeral.
\begin{table}[]
\begin{center}
\small
\input{examples/indoeuropean/supp_meaning_table.tex}
\end{center}
\caption{\textbf{Categorisation of the meaning slots in the Indo-European example analysis used to display rate variation}.}
\label{tab:categories}
\end{table}
\section{Austronesian example configuration}
\begin{alltt}
\input{examples/austronesian/austronesian.conf}
\end{alltt}
\section{Austronesian language set and tree derivation}
The maximum clade credibility tree from Gray et al.'s 2009 publication contains 400 Austronesian languages, identified by written names such as ``Sediq'', ``Bunan'', ``CentralAmis''. Along with the tree, Simon Greenhill kindly provided us with a file mapping the 400 unique names to ISO codes, to facilitate attaching WALS data to the tree (the majority of languages in WALS have ISO codes assigned to them). However, some of the languages in the reference tree had no corresponding ISO code, and some ISO codes were used for multiple named languages. Similarly, many distinct languages in WALS are assigned the same ISO code. Achieving a unique mapping between languages in WALS and languages in the reference tree is therefore not a straightforward process. The final set of languages used for our example analysis, and their association with languages in WALS, was derived programmatically as follows.
First, all named languages in the 400-taxon tree which were not mapped to ISO codes were discarded. After this process, 395 of the original 400 languages remained in the tree.
Next, all ISO codes which mapped to multiple named Austronesian languages were identified. For each of these codes, we compared the list of all the names associated with that code in the Austronesian tree with the list of all the names of languages in WALS associated with that code. After converting all names to lowercase, if there was exactly one name which was common to both lists, then the WALS language with that name was associated with the language on the Austronesian tree associated with that name, and all other languages on the Austronesian tree associated with that ISO code were removed. If there was no such matching name, then \emph{all} languages on the Austronesian tree corresponding to that ISO code were removed. After this process, 332 of the original 400 languages remained in the tree, with 332 unique ISO codes.
These 332 ISO codes map to a set of languages in WALS, and some ISO codes map to multiple languages. We resolved these duplicates in a similar manner to our resolution for duplicated ISO codes in the Austronesian tree. If an ISO code was associated with multiple WALS languages, but exactly one WALS language had a name which matched the (unique) name associated with that ISO code in the Austronesian tree, then that WALS language was retained and all others sharing its ISO code were removed. If no such match could be found, all languages with that ISO code were removed. After this process, 169 WALS languages were mapped to the tree via unique ISO codes.
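
The name-matching rule applied in the two preceding steps can be summarised compactly. The notation here is introduced purely for illustration and is not taken from the BEASTling source code. For an ISO code $c$, let $T_c$ denote the set of lowercased names of the languages on the Austronesian tree that carry code $c$, and let $W_c$ denote the set of lowercased names of the WALS languages that carry code $c$. Whenever $T_c$ or $W_c$ contains more than one name, the outcome for $c$ is
\[
\mathrm{keep}(c) = \left\{ \begin{array}{ll}
\{n\} & \mbox{if } T_c \cap W_c = \{n\} \mbox{ for a single name } n, \\
\emptyset & \mbox{otherwise.}
\end{array} \right.
\]
If $\mathrm{keep}(c) = \{n\}$, the tree language and the WALS language named $n$ are paired and all other languages carrying code $c$ are removed; if $\mathrm{keep}(c) = \emptyset$, every language carrying code $c$ is removed. We assume here that the case of two or more shared names, which the description above does not address explicitly, is treated like the case of no shared name.

As a purely hypothetical illustration, if $T_c = \{\texttt{a}, \texttt{b}\}$ and $W_c = \{\texttt{b}, \texttt{d}\}$, then $T_c \cap W_c = \{\texttt{b}\}$, so the two languages named \texttt{b} are paired. The tree language \texttt{a} is removed in the first of the two steps and the WALS language \texttt{d} in the second.
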
The final languages' ISO codes and their names in the original Austronesian tree file are shown in Table \ref{tab:langs}. The WALS features for which these languages have data are shown in Table \ref{tab:features}.
\begin{table}[]
\begin{center}
\tiny
\input{examples/austronesian/supp_language_table.tex}
\end{center}
\caption{\textbf{Languages in the Austronesian example analysis, including their ISO code and unique names as used by the Gray et al.\ study and by WALS}.}
\label{tab:langs}
\end{table}
\begin{table}[ht]
\begin{center}
\input{examples/austronesian/supp_feature_table.tex}
\end{center}
\caption{\textbf{WALS features included in the Austronesian example analysis}.}
\label{tab:features}
\end{table}
\end{document}