Conclusion | We have presented a probabilistic model for bilingual grammar induction that uses raw parallel text to learn tree pairs and their alignments.
Experimental setup | During preprocessing of the corpora, we remove all punctuation marks and special symbols, following the setup of previous grammar induction work (Klein and Manning, 2002).
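As a rough illustration of this preprocessing step, the following Python sketch filters punctuation and special symbols from a whitespace-tokenized parallel corpus; the file handling and the rule that a token made entirely of non-word characters counts as punctuation or a special symbol are our own assumptions, not details taken from the original setup.

    # Sketch of the punctuation/special-symbol filtering step (assumed details:
    # whitespace-tokenized input, one sentence per line, parallel files aligned
    # line by line; a token consisting only of non-word characters is dropped).
    import re

    NON_WORD = re.compile(r"^\W+$", re.UNICODE)

    def strip_punct(sentence):
        # Keep only tokens that contain at least one word character.
        return " ".join(tok for tok in sentence.split() if not NON_WORD.match(tok))

    def preprocess_parallel(src_path, tgt_path):
        # Yield cleaned (source, target) sentence pairs from a parallel corpus.
        with open(src_path, encoding="utf-8") as src, open(tgt_path, encoding="utf-8") as tgt:
            for s, t in zip(src, tgt):
                yield strip_punct(s.strip()), strip_punct(t.strip())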
Introduction | We test the effectiveness of our bilingual grammar induction model on three corpora of parallel text: English-Korean, English-Urdu and English-Chinese. |
Related Work | The unsupervised grammar induction task has been studied extensively, mostly in a monolingual setting (Charniak and Carroll, 1992; Stolcke and Omohundro, 1994; Klein and Manning, 2002; Seginer, 2007). |
Related Work | We know of only one study that evaluates these bilingual grammar formalisms on the task of grammar induction itself (Smith and Smith, 2004).
Related Work | In contrast to this work, our goal is to explore the benefits of multilingual grammar induction in a fully unsupervised setting. |
Introduction | For example, Smith and Eisner (2006) penalized the approximate posterior over dependency structures in a natural language grammar induction task to avoid long-range dependencies between words.
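To make the idea concrete, the sketch below shows one simplified way to bias an edge-factored dependency model against long-range attachments, by exponentially downweighting each edge's score in the length of the dependency; the multiplicative form exp(-delta * |head - dep|) and the value of delta are illustrative assumptions, not the exact parameterization used by Smith and Eisner (2006).

    # Simplified structural-bias sketch: penalize long dependencies by scaling
    # each edge's unnormalized score with exp(-delta * dependency_length).
    # The penalty form and the delta value are illustrative assumptions.
    import math

    def biased_edge_score(base_score, head, dep, delta=0.5):
        # Longer head-dependent distances receive exponentially smaller weight.
        length = abs(head - dep)
        return base_score * math.exp(-delta * length)

    # An attachment spanning 5 positions is weighted exp(-2.5) ~ 0.08,
    # versus exp(-0.5) ~ 0.61 for an adjacent attachment.
    print(biased_edge_score(1.0, head=2, dep=7))
    print(biased_edge_score(1.0, head=2, dep=3))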
Introduction | We show that empirically, injecting prior knowledge improves performance on an unsupervised Chinese grammar induction task. |
Variational Mixtures with Constraints | This is a strict model, reminiscent of the successful application of structural bias to grammar induction (Smith and Eisner, 2006).
Variational Mixtures with Constraints | We demonstrated the effectiveness of the algorithm on a dependency grammar induction task. |