Article Structure
Abstract
In this paper, we present a solution to one aspect of the decipherment task: the prediction of consonants and vowels for an unknown language and alphabet.
Introduction
Over the past centuries, dozens of lost languages have been deciphered through the painstaking work of scholars, often after decades of slow progress and dead ends.
Related Work
The most direct precedent to the present work is a section in Knight et al.
Model
Our generative Bayesian model over the observed vocabularies of hundreds of languages is
Inference
In this section we detail the inference procedure we followed to make predictions under our model.
Experiments
To test our model, we apply it to a corpus of 503 languages for two decipherment tasks.
Results
The results of our experiments are shown in Table 2.
Analysis
To further compare our model to the EM baseline, we show confusion matrices for the three-way classification task in Figure 3.
Conclusion
In this paper, we presented a successful solution to one aspect of the decipherment task: the prediction of consonants and vowels for an unknown language and alphabet.
Topics
hyperparameters
Appears in 8 sentences as: hyperparameter (2) hyperparameters (7)
In Unsupervised Consonant-Vowel Prediction over Hundreds of Languages
- The second term is the tag transition predictive distribution given Dirichlet hyperparameters, yielding a familiar Polya urn scheme form.
Page 5, “Inference”
- Finally, we tackle the third term, Equation 7, corresponding to the predictive distribution of emission observations given Dirichlet hyperparameters.
Page 5, “Inference”
- To sample the Dirichlet hyperparameter for cluster k and transition t → t', we need to compute:
Page 5, “Inference”
- Sampling β: To sample the Dirichlet hyperparameter for cluster k and tag t, we need to compute:
Page 6, “Inference”
- The simplest version, SYMM, disregards all information from other languages, using simple symmetric hyperparameters on the transition and emission Dirichlet priors (all hyperparameters set to 1).
Page 7, “Experiments”
- Next we examine the transition Dirichlet hyperparameters learned by our model.
Page 8, “Analysis”
- As we can see, the learned hyperparameters yield highly asymmetric priors over transition distributions.
Page 8, “Analysis”
- Figure 4 shows MAP transition Dirichlet hyperparameters of the CLUST model, when trained
Page 8, “Analysis”
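The Polya urn predictive form mentioned in these excerpts has a simple closed form: under a Dirichlet prior, the predictive probability of an outcome is its count smoothed by the corresponding hyperparameter. A minimal sketch with illustrative names (not the paper's code):

```python
def dirichlet_predictive(count_k, total, alpha_k, alpha_sum):
    """Polya urn / Dirichlet-multinomial predictive probability:
    P(next outcome = k | counts, alpha) = (n_k + alpha_k) / (n + sum_j alpha_j).
    With all alpha_j = 1 (the SYMM setting), this reduces to add-one smoothing."""
    return (count_k + alpha_k) / (total + alpha_sum)

# Example: 2 of 5 observed transitions went to tag k,
# symmetric alpha = 1 over 3 tags -> (2 + 1) / (5 + 3) = 0.375
p = dirichlet_predictive(2, 5, 1.0, 3.0)
```

Asymmetric hyperparameters (as learned by MERGE and CLUST) shift these predictive probabilities toward tags favored across languages, which is exactly how the model breaks the consonant/vowel label symmetry.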
CLUST
Appears in 6 sentences as: CLUST (8)
In Unsupervised Consonant-Vowel Prediction over Hundreds of Languages
- EM 93.37 74.59 · SYMM 95.99 80.72 · MERGE 97.14 86.13 · CLUST 98.85 89.37
  EM 94.50 74.53 · SYMM 96.18 78.13 · MERGE 97.66 86.47 · CLUST 98.55 89.07
  EM 92.93 78.26 · SYMM 95.90 79.04 · MERGE 96.06 83.78 · CLUST 97.03 85.79
Page 7, “Experiments”
- Finally, we consider the full version of our model, CLUST , with 20 language clusters.
Page 7, “Experiments”
- Figure 3: Confusion matrix for CLUST (left) and EM (right).
Page 8, “Results”
- Both MERGE and CLUST break symmetries over tags by way of the asymmetric posterior over transition Dirichlet parameters.
Page 8, “Results”
- Figure 4 shows MAP transition Dirichlet hyperparameters of the CLUST model, when trained
Page 8, “Analysis”
- Finally, we examine the relationship between the induced clusters and language families in Table 3, for the trigram consonant vs. vowel CLUST model with 20 clusters.
Page 9, “Analysis”
bigram
Appears in 4 sentences as: bigram (5)
In Unsupervised Consonant-Vowel Prediction over Hundreds of Languages
- We note that in practice, we implemented a trigram version of the model, but we present the bigram version here for notational clarity.
Page 3, “Model”
- where n(t) and n(t, t') are, respectively, unigram and bigram tag counts excluding those containing character w. Conversely, n'(t) and n'(t, t') are, respectively, unigram and bigram tag counts only including those containing character w. The notation a^(n) denotes the ascending factorial: a(a + 1) ... (a + n - 1).
Page 5, “Inference”
- where n(j, k, t) and n(j, k, t, t') are the numbers of languages currently assigned to cluster k which have more than j occurrences of unigram t and bigram (t, t'), respectively.
Page 6, “Inference”
- with a bigram HMM with four language clusters.
Page 9, “Analysis”
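The ascending factorial a^(n) = a(a + 1) ... (a + n - 1) that appears in these count-based Dirichlet-multinomial terms is straightforward to compute directly. A sketch, with illustrative names rather than the paper's implementation:

```python
def ascending_factorial(a, n):
    """Ascending (rising) factorial a^(n) = a * (a+1) * ... * (a+n-1),
    with the empty-product convention a^(0) = 1. This factor appears when
    Dirichlet-multinomial marginals are written in closed form."""
    result = 1.0
    for i in range(n):
        result *= a + i
    return result

# Example: 2^(3) = 2 * 3 * 4 = 24
v = ascending_factorial(2, 3)
```

For large counts one would work in log space (summing log(a + i)) to avoid overflow, but the direct product shows the structure of the term.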
Gibbs sampling
Appears in 4 sentences as: Gibbs sampler (1) Gibbs Sampling (1) Gibbs sampling (2)
In Unsupervised Consonant-Vowel Prediction over Hundreds of Languages
- 4.2 Gibbs Sampling
Page 5, “Inference”
- To sample values (t, z, α, β) from their posterior (the integrand of Equation 1), we use Gibbs sampling, a Monte Carlo technique that constructs a Markov chain over a high-dimensional sample space by iteratively sampling each variable conditioned on the currently drawn sample values for the others, starting from a random initialization.
Page 5, “Inference”
- our Gibbs sampling inference method for the type-based HMM, even in the absence of multilingual priors.
Page 7, “Experiments”
- Simply using our Gibbs sampler with symmetric priors boosts the performance up to 96%.
Page 7, “Results”
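The iterate-and-resample loop described above can be sketched generically for the type-based setting, where a single tag is resampled per character type rather than per token. Here `conditional_prob` stands in for the model's full conditional, and all names are illustrative:

```python
import random

def gibbs_sweep(tags, char_types, conditional_prob, num_tags=2):
    """One Gibbs sweep over character *types*: resample each type's tag
    from its (unnormalized) conditional given all other current assignments.
    `tags` maps character type -> current tag id and is updated in place."""
    for w in char_types:
        weights = [conditional_prob(w, t, tags) for t in range(num_tags)]
        z = sum(weights)
        # Draw a tag proportionally to the unnormalized conditional weights.
        r = random.random() * z
        acc = 0.0
        for t, wt in enumerate(weights):
            acc += wt
            if r <= acc:
                tags[w] = t
                break
    return tags
```

Running many such sweeps (after a random initialization) yields samples from the posterior over tag assignments; a full implementation would also interleave resampling of the cluster assignments and the Dirichlet hyperparameters.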
classification task
Appears in 3 sentences as: classification task (3)
In Unsupervised Consonant-Vowel Prediction over Hundreds of Languages
- On a three-way classification task between vowels, nasals, and non-nasal consonants, our model yields unsupervised accuracy of 89% across the same set of languages.
Page 1, “Abstract”
- To further compare our model to the EM baseline, we show confusion matrices for the three-way classification task in Figure 3.
Page 8, “Analysis”
- We further experimented on a three-way classification task involving nasal characters, achieving nearly 90% accuracy.
Page 9, “Conclusion”
unigram
Appears in 3 sentences as: unigram (5)
In Unsupervised Consonant-Vowel Prediction over Hundreds of Languages
- where n(t) and n(t, t') are, respectively, unigram and bigram tag counts excluding those containing character w. Conversely, n'(t) and n'(t, t') are, respectively, unigram and bigram tag counts only including those containing character w. The notation a^(n) denotes the ascending factorial: a(a + 1) ... (a + n - 1).
Page 5, “Inference”
- where n(w) is the unigram count of character w, and n(t') is the unigram count of tag t', over all character tokens (including w).
Page 5, “Inference”
- where n(j, k, t) and n(j, k, t, t') are the numbers of languages currently assigned to cluster k which have more than j occurrences of unigram t and bigram (t, t'), respectively.
Page 6, “Inference”