Finding Cognate Groups Using Phylogenies

A central problem in historical linguistics is the identification of historically related cognate words.

A crowning achievement of historical linguistics is the comparative method (Ohala, 1993), wherein linguists use word similarity to elucidate the hidden phonological and morphological processes which govern historical descent.

In this section, we describe a new generative model for vocabulary lists in multiple related languages given the phylogenetic relationship between the languages (their family tree).

In this section, we discuss the inference method for determining cognate assignments under fixed parameters go.

So far we have only addressed searching for Viterbi alignments 7r under fixed parameters.

In our model, it is not just the edit distances that are finite state machines.

Recall that in inference, when summing out interior nodes 212,- we calculated the product over incoming messages MHz-(mi) (Equation 1), and that these products are calculated using transducer composition.

We conduct three experiments.

We presented a new generative model of word lists that automatically finds cognate groups from scrambled vocabulary lists.

Appears in 21 sentences as: unigram (11) Unigrams (10) unigrams (3) unigrams.” (1)

In *Finding Cognate Groups Using Phylogenies*

- To find this maximizer for any given 7m, we need to find a marginal distribution over the edges connecting any two languages a and d. With this distribution, we calculate the expected “alignment unigrams.” That is, for each pair of phonemes cc and y (or empty phoneme 5), we need to find the quantity:Page 5, “Learning”
- In the context of transducers, previous authors have focused on a combination of n-best lists and unigram back-off models (Dreyer and Eisner, 2009), a schematic diagram of which is in Figure 2(d).Page 6, “Message Approximation”
- Figure 2: Various topologies for approximating topologies: (a) a unigram model, (b) a bigram model, (c) the anchored uni gram model, and (d) the n-best plus backoff model used in Dreyer and Eisner (2009).Page 6, “Message Approximation”
- Another is to choose 7'(w) to be a unigram language model over the language in question with a geometric probability over lengths.Page 7, “Message Approximation”
- The first is a plain unigram model, the second is a bigram model, and the third is an anchored unigram topology: a position-specific unigram model for each position up to some maximum length.Page 7, “Message Approximation”
- The first we consider is a standard unigram model, which is illustrated in Figure 2(a).Page 7, “Message Approximation”
- It is similar to the unigram topology except that, instead of a single state, we have a state for each phoneme in 2, along with a special start state.Page 7, “Message Approximation”
- Normalization is similar to the unigram case, except that we normalize the transitions from each state.Page 7, “Message Approximation”
- The final topology we consider is the positional unigram model in Figure 2(c).Page 7, “Message Approximation”
- Namely, for each position (up to some maximum position), we have a unigram model over phonemes emitted at that position, along with the probability of stopping at that position (i.e.Page 7, “Message Approximation”
- Besides the heuristic baseline, we tried our model-based approach using Unigrams, Bigrams and Anchored Unigrams , with and without learning the parametric edit distances.Page 8, “Experiments”

See all papers in *Proc. ACL 2010* that mention unigram.

See all papers in *Proc. ACL* that mention unigram.

Back to top.

Appears in 14 sentences as: edit distance (6) edit distances (8)

In *Finding Cognate Groups Using Phylogenies*

- In this paper, each (pg takes the form of a parameterized edit distance similar to the standard Levenshtein distance.Page 2, “Model”
- practice, it is important to estimate better parametric edit distances cpg and survival variables 85.Page 4, “Learning”
- However, a naively constructed edit distance , which for example might penalize vowel substitutions lightly, would fail to learn that Latin words that are borrowed into English would not undergo the sound change /I/—>/eI/.Page 4, “Learning”
- For the transducers go, we learn parameterized edit distances that model the probabilities of different sound changes.Page 4, “Learning”
- For each (pg we fit a nonuniform substitution, insertion, and deletion matrix 0(30, These edit distances define a condi-Page 4, “Learning”
- In our model, it is not just the edit distances that are finite state machines.Page 5, “Transducers and Automata”
- Besides the heuristic baseline, we tried our model-based approach using Unigrams, Bigrams and Anchored Unigrams, with and without learning the parametric edit distances .Page 8, “Experiments”
- When we did not use learning, we set the parameters of the edit distance to (0, -3, -4) for matches, substitutions, and deletions/insertions, respectively.Page 8, “Experiments”
- Levenshtein refers to fixed parameter edit distance transducer.Page 8, “Experiments”
- Learned refers to automatically learned edit distances .Page 8, “Experiments”
- This makes sense because the default edit distance parameter settings strongly favor exact matches, while the learned transducers learn more realistic substitution and deletion matrices, at the expense of making more mistakes.Page 9, “Experiments”

See all papers in *Proc. ACL 2010* that mention edit distances.

See all papers in *Proc. ACL* that mention edit distances.

Back to top.

Appears in 9 sentences as: bigram (4) Bigrams (3) bigrams (3)

In *Finding Cognate Groups Using Phylogenies*

- Figure 2: Various topologies for approximating topologies: (a) a unigram model, (b) a bigram model, (c) the anchored uni gram model, and (d) the n-best plus backoff model used in Dreyer and Eisner (2009).Page 6, “Message Approximation”
- The first is a plain unigram model, the second is a bigram model, and the third is an anchored unigram topology: a position-specific unigram model for each position up to some maximum length.Page 7, “Message Approximation”
- The second topology we consider is the bigram topology, illustrated in Figure 2(b).Page 7, “Message Approximation”
- We follow prior work and use sets of bigrams within words.Page 8, “Experiments”
- In our case, during bipartite matching the set X is the set of bigrams in the language being re-permuted, and Y is the union of bigrams in the other languages.Page 8, “Experiments”
- Besides the heuristic baseline, we tried our model-based approach using Unigrams, Bigrams and Anchored Unigrams, with and without learning the parametric edit distances.Page 8, “Experiments”
- Levenshtein Unigrams 37.2 26.2 Levenshtein Bigrams 43.0 26.5 Levenshtein Anch.Page 8, “Experiments”
- Both Unigrams and Bigrams significantly under-perform the baseline, while Anchored Unigrams easily outperforms it both with and without learning.Page 8, “Experiments”
- With the remainder, we executed a bipartite matching based on bigram overlap.Page 9, “Experiments”

See all papers in *Proc. ACL 2010* that mention bigram.

See all papers in *Proc. ACL* that mention bigram.

Back to top.

Appears in 4 sentences as: generative model (4)

In *Finding Cognate Groups Using Phylogenies*

- In this paper, we present a new generative model for the automatic induction of cognate groups given only (1) a known family tree of languages and (2) word lists from those languages.Page 2, “Introduction”
- In this section, we describe a new generative model for vocabulary lists in multiple related languages given the phylogenetic relationship between the languages (their family tree).Page 2, “Model”
- Figure 1(a) graphically describes our generative model for three Romance languages: Italian, Portuguese, and Spanish.1 In each cognate group, each word W5 is generated from its parent according to a conditional distribution with parameter (pg, which is specific to that edge in the tree, but shared between all cognate groups.Page 2, “Model”
- We presented a new generative model of word lists that automatically finds cognate groups from scrambled vocabulary lists.Page 9, “Conclusion”

See all papers in *Proc. ACL 2010* that mention generative model.

See all papers in *Proc. ACL* that mention generative model.

Back to top.

Appears in 4 sentences as: probability distribution (2) probability distributions (2)

In *Finding Cognate Groups Using Phylogenies*

- Given the expected counts, we now need to normalize them to ensure that the transducer represents a conditional probability distribution (Eisner, 2002; Oncina and Sebban, 2006).Page 5, “Learning”
- An alternative approach might be to simply treat messages as unnormalized probability distributions , and to minimize the KL divergence be-Page 6, “Message Approximation”
- tween some approximating message mm) and the true message However, messages are not always probability distributions and — because the number of possible strings is in principle infinite —they need not sum to a finite number.5 Instead, we propose to minimize the KL divergence between the “expected” marginal distribution and the approximated “expected” marginal distribution:Page 6, “Message Approximation”
- The procedure for calculating these statistics is described in Li and Eisner (2009), which amounts to using an expectation semiring (Eisner, 2001) to compute expected transitions in 7' o 71* under the probability distribution 7' o ,u.Page 7, “Message Approximation”

See all papers in *Proc. ACL 2010* that mention probability distribution.

See all papers in *Proc. ACL* that mention probability distribution.

Back to top.

Appears in 3 sentences as: alignment model (2) alignment models (1)

In *Finding Cognate Groups Using Phylogenies*

- Finally, an alignment model maps the flat word lists to cognate groups.Page 2, “Introduction”
- Inference requires a combination of message-passing in the evolutionary model and iterative bipartite graph matching in the alignment model .Page 2, “Introduction”
- This task highlights the evolution and alignment models .Page 7, “Experiments”

See all papers in *Proc. ACL 2010* that mention alignment model.

See all papers in *Proc. ACL* that mention alignment model.

Back to top.

Appears in 3 sentences as: conditional probabilities (2) conditional probability (1)

In *Finding Cognate Groups Using Phylogenies*

- The edge affinities afl' between these nodes are the conditional probabilities of each word wg belonging to each cognate group 9:Page 4, “Inference of Cognate Assignments”
- Because we enforce that a word in language d must be dead if its parent word in language a is dead, we just need to learn the conditional probabilities p(Sd = dead|Sa = alive).Page 4, “Learning”
- Given the expected counts, we now need to normalize them to ensure that the transducer represents a conditional probability distribution (Eisner, 2002; Oncina and Sebban, 2006).Page 5, “Learning”

See all papers in *Proc. ACL 2010* that mention conditional probabilities.

See all papers in *Proc. ACL* that mention conditional probabilities.

Back to top.