Unsupervised Consonant-Vowel Prediction over Hundreds of Languages
Kim, Young-Bum and Snyder, Benjamin

Article Structure

Abstract

In this paper, we present a solution to one aspect of the decipherment task: the prediction of consonants and vowels for an unknown language and alphabet.

Introduction

Over the past centuries, dozens of lost languages have been deciphered through the painstaking work of scholars, often after decades of slow progress and dead ends.

Related Work

The most direct precedent to the present work is a section in Knight et al.

Model

Our generative Bayesian model over the observed vocabularies of hundreds of languages is

Inference

In this section we detail the inference procedure we followed to make predictions under our model.

Experiments

To test our model, we apply it to a corpus of 503 languages for two decipherment tasks.

Results

The results of our experiments are shown in Table 2.

Analysis

To further compare our model to the EM baseline, we show confusion matrices for the three-way classification task in Figure 3.

Conclusion

In this paper, we presented a successful solution to one aspect of the decipherment task: the prediction of consonants and vowels for an unknown language and alphabet.

Topics

hyperparameters

Appears in 8 sentences as: hyperparameter (2) hyperparameters (7)
In Unsupervised Consonant-Vowel Prediction over Hundreds of Languages
  1. The second term is the tag transition predictive distribution given Dirichlet hyperparameters, yielding a familiar Polya urn scheme form.
    Page 5, “Inference” (see the sketch after this list)
  2. Finally, we tackle the third term, Equation 7, corresponding to the predictive distribution of emission observations given Dirichlet hyperparameters .
    Page 5, “Inference”
  3. To sample the Dirichlet hyperparameter for cluster k and transition t → t’, we need to compute:
    Page 5, “Inference”
  4. Sampling β: To sample the Dirichlet hyperparameter for cluster k and tag t, we need to compute:
    Page 6, “Inference”
  5. The simplest version, SYMM, disregards all information from other languages, using simple symmetric hyperparameters on the transition and emission Dirichlet priors (all hyperparameters set to 1).
    Page 7, “Experiments”
  6. Next we examine the transition Dirichlet hyperparameters learned by our model.
    Page 8, “Analysis”
  7. As we can see, the learned hyperparameters yield highly asymmetric priors over transition distributions.
    Page 8, “Analysis”
  8. Figure 4 shows MAP transition Dirichlet hyperparameters of the CLUST model, when trained with a bigram HMM with four language clusters.
    Page 8, “Analysis”
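
The “Polya urn scheme form” in sentence 1 is the standard predictive distribution of a Dirichlet-multinomial. As a hedged sketch (the counts n(t) and n(t, t’) follow the definitions quoted under the bigram and unigram topics; the cluster subscript k on the hyperparameters is an assumption, not notation taken from the paper), the transition predictive has the form

    P(t' \mid t, n, \alpha_k) = \frac{n(t, t') + \alpha_{k, t, t'}}{n(t) + \sum_{t''} \alpha_{k, t, t''}}

Because the α values act as pseudo-counts, the asymmetric learned hyperparameters described in sentences 6-8 bias transition distributions toward cross-linguistically common patterns even before language-specific counts accumulate.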

CLUST

Appears in 6 sentences as: CLUST (8)
In Unsupervised Consonant-Vowel Prediction over Hundreds of Languages
  1. Table 2 (two accuracy columns per model, three row groups):
       EM 93.37 74.59 | SYMM 95.99 80.72 | MERGE 97.14 86.13 | CLUST 98.85 89.37
       EM 94.50 74.53 | SYMM 96.18 78.13 | MERGE 97.66 86.47 | CLUST 98.55 89.07
       EM 92.93 78.26 | SYMM 95.90 79.04 | MERGE 96.06 83.78 | CLUST 97.03 85.79
    Page 7, “Experiments”
  2. Finally, we consider the full version of our model, CLUST , with 20 language clusters.
    Page 7, “Experiments”
  3. Figure 3: Confusion matrix for CLUST (left) and EM (right).
    Page 8, “Results”
  4. Both MERGE and CLUST break symmetries over tags by way of the asymmetric posterior over transition Dirichlet parameters.
    Page 8, “Results”
  5. Figure 4 shows MAP transition Dirichlet hyperparameters of the CLUST model, when trained with a bigram HMM with four language clusters.
    Page 8, “Analysis”
  6. Finally, we examine the relationship between the induced clusters and language families in Table 3, for the trigram consonant vs. vowel CLUST model with 20 clusters.
    Page 9, “Analysis”

bigram

Appears in 4 sentences as: bigram (5)
In Unsupervised Consonant-Vowel Prediction over Hundreds of Languages
  1. We note that in practice, we implemented a trigram version of the model, but we present the bigram version here for notational clarity.
    Page 3, “Model”
  2. where n(t) and n(t, t’) are, respectively, unigram and bigram tag counts excluding those containing character w. Conversely, n’(t) and n’(t, t’) are, respectively, unigram and bigram tag counts only including those containing character w. The notation a^(n) denotes the ascending factorial: a(a + 1) ... (a + n - 1).
    Page 5, “Inference” (see the sketch after this list)
  3. where n(j, k, t) and n(j, k, t, t’) are the numbers of languages currently assigned to cluster k which have more than j occurrences of unigram (t) and bigram (t, t’), respectively.
    Page 6, “Inference”
  4. Figure 4 shows MAP transition Dirichlet hyperparameters of the CLUST model, when trained with a bigram HMM with four language clusters.
    Page 9, “Analysis”
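
Sentence 2 defines the ascending factorial that appears inside these collapsed predictive terms. A minimal, self-contained sketch in Python (illustration only, not the authors' code):

    def ascending_factorial(a, n):
        """Compute the ascending factorial a^(n) = a * (a + 1) * ... * (a + n - 1)."""
        result = 1.0
        for i in range(n):
            result *= a + i
        return result

    # Example: 2^(3) = 2 * 3 * 4 = 24.
    assert ascending_factorial(2, 3) == 24.0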

Gibbs sampling

Appears in 4 sentences as: Gibbs sampler (1) Gibbs Sampling (1) Gibbs sampling (2)
In Unsupervised Consonant-Vowel Prediction over Hundreds of Languages
  1. 4.2 Gibbs Sampling
    Page 5, “Inference”
  2. To sample values (t, z, α, β) from their posterior (the integrand of Equation 1), we use Gibbs sampling, a Monte Carlo technique that constructs a Markov chain over a high-dimensional sample space by iteratively sampling each variable conditioned on the currently drawn sample values for the others, starting from a random initialization.
    Page 5, “Inference” (see the toy example after this list)
  3. our Gibbs sampling inference method for the type-based HMM, even in the absence of multilingual priors.
    Page 7, “Experiments”
  4. Simply using our Gibbs sampler with symmetric priors boosts the performance up to 96%.
    Page 7, “Results”
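
Sentence 2 describes the generic recipe: repeatedly redraw each variable from its conditional distribution given the current values of all the others. The following is a toy, runnable illustration of that technique on a two-variable discrete joint; it is not the model or sampler from the paper:

    import random
    from collections import Counter

    # Toy joint distribution over two binary variables (x, y).
    JOINT = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

    def sample_x_given_y(y):
        # Draw x from P(x | y), proportional to the joint with y held fixed.
        w0, w1 = JOINT[(0, y)], JOINT[(1, y)]
        return 0 if random.random() * (w0 + w1) < w0 else 1

    def sample_y_given_x(x):
        # Draw y from P(y | x), proportional to the joint with x held fixed.
        w0, w1 = JOINT[(x, 0)], JOINT[(x, 1)]
        return 0 if random.random() * (w0 + w1) < w0 else 1

    def gibbs(num_sweeps=20000):
        x, y = random.randint(0, 1), random.randint(0, 1)  # random initialization
        draws = Counter()
        for _ in range(num_sweeps):
            x = sample_x_given_y(y)  # resample x given the current y
            y = sample_y_given_x(x)  # resample y given the freshly drawn x
            draws[(x, y)] += 1
        return draws

    # Empirical frequencies approach the joint distribution defined above.
    print(gibbs())

In the paper's setting the resampled variables are the character tags, the language cluster assignments, and the Dirichlet hyperparameters, but the sweep structure is the same.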

classification task

Appears in 3 sentences as: classification task (3)
In Unsupervised Consonant-Vowel Prediction over Hundreds of Languages
  1. On a three-way classification task between vowels, nasals, and non-nasal consonants, our model yields unsupervised accuracy of 89% across the same set of languages.
    Page 1, “Abstract”
  2. To further compare our model to the EM baseline, we show confusion matrices for the three-way classification task in Figure 3.
    Page 8, “Analysis”
  3. We further experimented on a three-way classification task involving nasal characters, achieving nearly 90% accuracy.
    Page 9, “Conclusion”

unigram

Appears in 3 sentences as: unigram (5)
In Unsupervised Consonant-Vowel Prediction over Hundreds of Languages
  1. where n(t) and n(t, t’) are, respectively, unigram and bigram tag counts excluding those containing character w. Conversely, n’(t) and n’(t, t’) are, respectively, unigram and bigram tag counts only including those containing character w. The notation a^(n) denotes the ascending factorial: a(a + 1) ... (a + n - 1).
    Page 5, “Inference” (see the count sketch after this list)
  2. where n(w) is the unigram count of character w, and n(t’) is the unigram count of tag t’, over all character tokens (including w).
    Page 5, “Inference”
  3. where n(j, k, t) and n(j, k, t, t’) are the numbers of languages currently assigned to cluster k which have more than j occurrences of unigram (t) and bigram (t, t’), respectively.
    Page 6, “Inference”
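
Sentence 1 splits tag counts into those that exclude and those that only include a given character w, which is the bookkeeping needed when the tag of a single character type is resampled. A minimal sketch of one plausible reading of that bookkeeping, with assumed data structures (vocab as a list of word types, tag_of as the current tag of each character); this is an illustration, not the authors' code:

    from collections import Counter

    def split_counts(vocab, tag_of, w):
        """Unigram/bigram tag counts excluding vs. only including character w."""
        n_uni, n_bi = Counter(), Counter()      # counts with w's positions removed
        n_uni_w, n_bi_w = Counter(), Counter()  # counts contributed by w's positions
        for word in vocab:
            tags = [tag_of[c] for c in word]
            for i, c in enumerate(word):
                target = n_uni_w if c == w else n_uni
                target[tags[i]] += 1
            for i in range(len(word) - 1):
                target = n_bi_w if w in (word[i], word[i + 1]) else n_bi
                target[(tags[i], tags[i + 1])] += 1
        return n_uni, n_bi, n_uni_w, n_bi_w

    # Tiny example with consonant/vowel tags.
    current_tags = {'b': 'C', 'a': 'V', 'd': 'C'}
    print(split_counts(['bad', 'ab'], current_tags, 'a'))

Note that this sketch ignores word-boundary transitions, which a full HMM implementation would also need to track.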
