% beastling.tex at commit 1eabd94 (parent e6134ff), committed by lmaurits on Mar 10, 2017;
% repository archived Jul 28, 2020.
% Commit message: "Minor changes to bring numerical values etc. into line with official runs of example analyses."
% [... lines not shown; context: subsection ``Estimating Indo-European family tree from cognate data'']

Our first example infers a phylogenetic tree for the Indo-European language family from cognate data, using the binary Covarion model. The dataset\cite{List2014a}, prepared by List\cite{List2014}, uses material from the ``Tower of Babel'' project\cite{Starostin2008} and is comparatively small, containing 19 languages and 110 features, each of which corresponds to a word meaning. The datapoints are cognate class assignments, coded as integers: two languages have the same integer for a given meaning if their words for that meaning are cognate. Known borrowings are indicated by negative values, i.e.\ a datapoint of $-4$ indicates that the language's word for that meaning is a known borrowing from cognate class 4. Before running the analysis, we replace all known borrowings with question marks, so that BEAST treats them as missing data. BEASTling automatically removes seven meanings from the dataset because they are constant across the 19 languages and thus provide no information about the tree topology (these meanings are \emph{claw, name, new, salt, two, what} and \emph{who}). Because the binary Covarion model was specified, BEASTling automatically recodes the cognate data for the 103 remaining meanings into binary form, resulting in 645 binary features. Because the languages in the data file are identified by English names (``Dutch'', ``Swedish'', ``English'', etc.) rather than ISO codes or Glottocodes, BEASTling cannot automatically impose monophyly constraints, so this feature is disabled. No calibration dates are provided, and rate variation across features is enabled.
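The three preprocessing steps described above (masking borrowings, dropping constant meanings, binary recoding) can be sketched in a few lines of Python. This is not BEASTling's own code: the dictionary layout (language $\to$ meaning $\to$ integer cognate class), the hypothetical helper names \texttt{mask\_borrowings}, \texttt{constant\_meanings} and \texttt{binarise}, and the rule that a meaning counts as constant when it has at most one distinct known value are all assumptions made for illustration. The recoding shown, one presence/absence feature per cognate class of each meaning, is one common way of preparing cognate data for binary models.

```python
from typing import Dict, Optional, Set

# Hypothetical layout: language -> meaning -> cognate class (None = missing, "?").
Table = Dict[str, Dict[str, Optional[int]]]


def mask_borrowings(table: Table) -> Table:
    """Replace known borrowings (negative codes) with missing data."""
    return {
        lang: {m: (None if v is not None and v < 0 else v) for m, v in row.items()}
        for lang, row in table.items()
    }


def constant_meanings(table: Table) -> Set[str]:
    """Meanings with at most one distinct known cognate class carry no topological signal."""
    meanings = next(iter(table.values())).keys()
    return {
        m for m in meanings
        if len({row[m] for row in table.values() if row[m] is not None}) <= 1
    }


def binarise(table: Table, drop: Set[str]) -> Table:
    """One binary feature per (meaning, cognate class): 1 = member, 0 = other class, None = missing."""
    out: Table = {lang: {} for lang in table}
    for m in sorted(next(iter(table.values())).keys()):
        if m in drop:
            continue
        classes = sorted({row[m] for row in table.values() if row[m] is not None})
        for c in classes:
            for lang, row in table.items():
                v = row[m]
                out[lang][f"{m}_{c}"] = None if v is None else int(v == c)
    return out


# Toy data (invented, not from the List dataset): -3 marks a known borrowing.
toy = {
    "Dutch": {"dog": 2, "hand": 1, "two": 1},
    "English": {"dog": -3, "hand": 1, "two": 1},
    "French": {"dog": 4, "hand": 2, "two": 1},
}
masked = mask_borrowings(toy)
drop = constant_meanings(masked)
binary = binarise(masked, drop)
print(sorted(drop))             # ['two']
print(sorted(binary["Dutch"]))  # ['dog_2', 'dog_4', 'hand_1', 'hand_2']
```

Masking happens before the constancy check and the recoding, so a cognate class attested only through a borrowing does not produce a binary feature of its own.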

The maximum clade credibility tree produced by this analysis is shown in Figure \ref{fig:ie}. Despite the lack of monophyly constraints, the tree agrees well with conventional wisdom on Indo-European history. The Slavic, Germanic and Romance sub-families are each correctly grouped into their own clade. The Slavic clade is correctly divided into East, South and West Slavic, and the Germanic clade is correctly divided into North and West Germanic. The order in which Armenian, Greek and Hindi branch off differs from previous analyses\cite{Gray2003,Bouckaert2012}, which may be at least partially due to the small number of languages in the dataset (Hindi is the only representative of the Indo-Iranian subfamily). The close relationship between Romanian and French is also unexpected, and may be due to the influence of an erroneous cognate judgement in the dataset\cite{Geisler2010} as well as efforts at ``purification'' of Romanian\cite{Nelson-Sathi2010}. It is important to understand that the tree shown in Figure \ref{fig:ie} is a single tree from a posterior sample of 10,000 trees, namely the tree which best represents the relationships most strongly supported across the whole sample. Different parts of the tree topology are more or less strongly supported, and this is indicated graphically in the figure by the solidity of the branches. While the well-established Slavic, Germanic and Romance sub-families have posterior probabilities of 1.0, the more questionable Romanian-French clade has a posterior probability of 0.60 and the Armenian-Greek clade has a posterior probability of just 0.29.
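The posterior probability of a clade is simply the fraction of sampled trees that contain it, and the maximum clade credibility tree is usually defined as the sampled tree that maximises the product of the posterior probabilities of its clades. The Python sketch below illustrates both ideas on topology alone. The encoding of a tree as a set of clades (each clade a set of leaf names), the helper names \texttt{clade}, \texttt{clade\_probabilities} and \texttt{mcc\_tree}, and the four toy trees are all hypothetical and invented for illustration.

```python
import math
from collections import Counter
from typing import Dict, FrozenSet, List

# Hypothetical encoding: a rooted topology is the set of its clades,
# and each clade is the set of leaf names below it.
Clade = FrozenSet[str]
Tree = FrozenSet[Clade]


def clade(*leaves: str) -> Clade:
    return frozenset(leaves)


def clade_probabilities(sample: List[Tree]) -> Dict[Clade, float]:
    """Posterior probability of a clade = share of sampled trees containing it."""
    counts = Counter(c for tree in sample for c in tree)
    return {c: n / len(sample) for c, n in counts.items()}


def mcc_tree(sample: List[Tree]) -> Tree:
    """Sampled tree maximising the product of its clades' probabilities."""
    probs = clade_probabilities(sample)
    # Summing logs is equivalent to multiplying probabilities but avoids underflow.
    return max(sample, key=lambda tree: sum(math.log(probs[c]) for c in tree))


# Four toy trees on leaves A-D (invented, not the Indo-European sample).
t1 = frozenset({clade("A", "B"), clade("A", "B", "C")})
t2 = frozenset({clade("A", "B"), clade("C", "D")})
t3 = frozenset({clade("A", "C"), clade("A", "B", "C")})
sample = [t1, t1, t2, t3]

probs = clade_probabilities(sample)
print(probs[clade("A", "B")])  # 0.75
print(mcc_tree(sample) == t1)  # True
```

In practice this summary is computed from BEAST's tree log by tools such as TreeAnnotator, which also summarise branch lengths; the sketch covers only the topological part.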

\begin{figure}[t]
\begin{center}
% [... lines not shown in this diff ...]
\label{fig:ie}
\end{figure}

In addition to a posterior sample of trees, the analysis logs posterior distributions over the relative substitution rate parameters for the 103 meaning slots. A considerable amount of rate variation is inferred, with the fastest meaning changing 20 times faster than the slowest. Table \ref{tab:ie} shows the meanings with the ten highest and ten lowest rates, while Figure \ref{fig:ie_rates} shows how the distribution over rates varies across parts of speech (see Supplementary Material for part-of-speech assignments). Verbs and nouns both have median rates well below the average of 1.0, with long tails toward higher rates. In contrast, adjectives have a median rate very close to average, with symmetric tails toward lower and higher rates. Words for body parts evolve somewhat more slowly than other nouns, and pronouns have a tight rate distribution with only a single outlier above the average rate, consistent with previous accounts of Indo-European pronouns showing little evidence of borrowing or grammaticalisation\cite{Muysken2008}. Similarly, colour terms are markedly more stable than other adjectives, with no colour term having a faster-than-average rate.
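Because the rates are relative to an average of 1.0, summaries such as ``20 times faster'' and per-category medians reduce to simple arithmetic on rescaled rate estimates. The Python sketch below assumes one point estimate per meaning (for example a posterior mean) and uses invented numbers, hypothetical feature names (\texttt{n1}, \texttt{v1}, \ldots) and hypothetical helpers \texttt{relative\_rates} and \texttt{summarise}; it does not reproduce how BEASTling or BEAST compute or log these quantities.

```python
from collections import defaultdict
from statistics import mean, median
from typing import Dict, List, Tuple


def relative_rates(raw: Dict[str, float]) -> Dict[str, float]:
    """Rescale rates so that their average across all features is exactly 1.0."""
    avg = mean(raw.values())
    return {f: r / avg for f, r in raw.items()}


def summarise(rates: Dict[str, float], pos: Dict[str, str]) -> Tuple[float, Dict[str, float]]:
    """Return the fastest/slowest ratio and the median rate for each part of speech."""
    ratio = max(rates.values()) / min(rates.values())
    groups: Dict[str, List[float]] = defaultdict(list)
    for feature, rate in rates.items():
        groups[pos[feature]].append(rate)
    return ratio, {p: median(rs) for p, rs in groups.items()}


# Invented rates for six hypothetical meanings (arbitrary units before rescaling).
raw = {"n1": 0.5, "n2": 1.0, "v1": 4.0, "v2": 2.5, "a1": 1.0, "a2": 3.0}
pos = {"n1": "noun", "n2": "noun", "v1": "verb", "v2": "verb", "a1": "adj", "a2": "adj"}

rates = relative_rates(raw)
ratio, medians = summarise(rates, pos)
print(ratio)    # 8.0
print(medians)  # {'noun': 0.375, 'verb': 1.625, 'adj': 1.0}
```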

\begin{table}[t]
\begin{center}
\input{examples/indoeuropean/table.tex}
\end{center}
\caption{\textbf{Table \ref{tab:ie}}. \textbf{Relative substitution rates of the ten slowest and fastest changing meaning slots in our example analysis of Indo-European cognate data}. Rates are relative to the average across all features, e.g. \emph{tooth} evolves almost 10 times more slowly than average, while \emph{know} evolves at just over twice the average rate. Note that many of the slowest meanings are body parts.}
\label{tab:ie}
\end{table}

% [... lines not shown; context: subsection ``Fitting substitution rates to WALS features using a fixed Austronesi...'']

We label the leaves of the reference tree with ISO codes, and BEASTling automatically prunes the tree to include only those languages whose ISO codes are present in the WALS database. We configure BEASTling to exclude any features which have known values for fewer than 25\% of languages. We also manually exclude three WALS features (IDs 95A, 96A and 97A) which are not features in their own right, but instead encode the relationship between other features (these exclusions are specified in the BEASTling configuration file and do not require editing the data file). The final analysis involves 169 Austronesian languages and 25 WALS features (see Supplementary Material for a full discussion of the languages and features involved). A Lewis Mk model is specified for the data, with rate variation across features enabled, and the inferred per-feature substitution rates are the quantities of interest. Since the tree is fixed rather than estimated, BEASTling automatically disables tree logging to save disk space.
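The language and feature selection described above amounts to pruning the reference tree to languages present in the data and then filtering features by coverage. The Python sketch below is an illustration of that logic only, not BEASTling's implementation. It assumes hypothetical encodings (a tree as nested tuples of ISO codes with branch lengths ignored; WALS data as a dictionary from ISO code to known feature values), placeholder codes and feature IDs such as \texttt{aaa} and \texttt{F1}, hypothetical helpers \texttt{leaves}, \texttt{prune} and \texttt{select\_features}, and that coverage is measured over the languages remaining after pruning.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Union

# Hypothetical encodings: a tree is a leaf label (str) or a tuple of subtrees;
# WALS data maps ISO code -> {feature ID: value}, with missing values absent.
Tree = Union[str, tuple]


def leaves(tree: Tree) -> Set[str]:
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def prune(tree: Tree, keep: Set[str]) -> Optional[Tree]:
    """Drop leaves not in `keep`, collapsing nodes left with a single child."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    children = [p for p in (prune(child, keep) for child in tree) if p is not None]
    if not children:
        return None
    return children[0] if len(children) == 1 else tuple(children)


def select_features(
    wals: Dict[str, Dict[str, int]],
    langs: List[str],
    min_coverage: float = 0.25,
    exclude: FrozenSet[str] = frozenset({"95A", "96A", "97A"}),
) -> List[str]:
    """Keep features known for at least `min_coverage` of `langs`, minus manual exclusions."""
    features = sorted({f for lang in langs for f in wals[lang]})
    return [
        f for f in features
        if f not in exclude
        and sum(f in wals[lang] for lang in langs) / len(langs) >= min_coverage
    ]


# Toy tree and data with placeholder codes (invented for illustration).
tree = ((("aaa", "bbb"), "ccc"), (("ddd", "eee"), ("fff", "ggg")))
wals = {
    "aaa": {"F1": 1, "F2": 2, "95A": 1},
    "ccc": {"F1": 2},
    "ddd": {"F1": 1},
    "fff": {"F1": 3},
    "ggg": {"F1": 1, "95A": 2},
    "xxx": {"F1": 1, "F2": 1},  # in the data but not on the tree
}
keep = leaves(tree) & set(wals)
pruned = prune(tree, keep)
features = select_features(wals, sorted(keep))
print(pruned)    # (('aaa', 'ccc'), ('ddd', ('fff', 'ggg')))
print(features)  # ['F1']
```

Here \texttt{F2} is dropped because it is known for only one of the five remaining languages (20\%), while \texttt{95A} is removed by the manual exclusion list even though it would pass the coverage threshold.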

The inferred rates of change of these typological features vary more widely than the lexical rates of change in the Indo-European example above: the fastest-changing feature has a rate around 27 times higher than the slowest-changing feature. Table \ref{tab:austro} shows the ten slowest and ten fastest changing features, while Figure \ref{fig:austro} shows a histogram and fitted distribution of the relative substitution rates across WALS features, which indicates that most features have a rate close to the average, while below-average rates are more common than above-average rates. Many of the slowest features are categorised by WALS as word order features, consistent with a previous finding that these are some of the most stable structural features\cite{Dediu2013}.
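Which distribution is fitted in Figure \ref{fig:austro} is not stated in this excerpt; purely for illustration, the Python sketch below assumes a gamma distribution fitted by the method of moments (shape $= \bar{r}^2/s^2$, scale $= s^2/\bar{r}$). When relative rates are normalised to a mean of 1.0, a distribution with a long upper tail typically places more than half of the features below the average; for a gamma distribution the median always lies below the mean. The rates and the helper names \texttt{fit\_gamma\_moments} and \texttt{fraction\_below\_average} are invented.

```python
from statistics import fmean, pvariance
from typing import List, Tuple


def fit_gamma_moments(rates: List[float]) -> Tuple[float, float]:
    """Method-of-moments gamma fit: shape = mean^2 / var, scale = var / mean."""
    m = fmean(rates)
    v = pvariance(rates, m)
    return m * m / v, v / m


def fraction_below_average(rates: List[float]) -> float:
    """Share of features whose rate is strictly below the mean rate."""
    m = fmean(rates)
    return sum(r < m for r in rates) / len(rates)


# Invented relative rates with mean exactly 1.0 and a long right tail.
rates = [0.25, 0.5, 0.5, 0.75, 0.75, 1.25, 1.5, 2.5]
shape, scale = fit_gamma_moments(rates)
print(fraction_below_average(rates))     # 0.625
print(round(shape, 3), round(scale, 3))  # 2.133 0.469
```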

\begin{table*}[ht]
\begin{center}
\input{examples/austronesian/table.tex}
\end{center}
\caption{\textbf{Table \ref{tab:austro}}. \textbf{Relative substitution rates of the ten slowest and fastest changing features in our example analysis of Austronesian typological data}. Rates are relative to the average across all features. Note that many of the slowest features relate to word order.}
\label{tab:austro}
\end{table*}
