A Probabilistic Model for Phrase Table Extraction | If θ takes the form of a scored phrase table, we can use traditional methods for phrase-based SMT to find P(e|f, θ) and concentrate on creating a model for P(θ|⟨E, F⟩). We decompose this posterior probability using Bayes law into the corpus likelihood and parameter prior probabilities. |
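The decomposition named in this sentence can be written out as follows. This is a sketch in assumed notation: θ stands for the phrase-table parameters and ⟨E, F⟩ for the parallel training corpus; the normalizing constant P(⟨E, F⟩) does not depend on θ, so it is dropped.

```latex
% Bayes' law applied to the parameter posterior (notation assumed):
%   P(<E,F> | theta)  is the corpus likelihood
%   P(theta)          is the parameter prior
\begin{equation*}
P(\theta \mid \langle E, F \rangle)
  = \frac{P(\langle E, F \rangle \mid \theta)\, P(\theta)}{P(\langle E, F \rangle)}
  \;\propto\; P(\langle E, F \rangle \mid \theta)\, P(\theta)
\end{equation*}
```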
Abstract | This allows for a completely probabilistic model that is able to create a phrase table that achieves competitive accuracy on phrase-based machine translation tasks directly from unaligned sentence pairs. |
Abstract | Experiments on several language pairs demonstrate that the proposed model matches the accuracy of the traditional two-step word alignment/phrase extraction approach while reducing the phrase table to a fraction of the original size. |
Flat ITG Model | The traditional flat ITG generative probability for a particular phrase (or sentence) pair P_flat((e, f); θ_x, θ_t) is parameterized by a phrase table θ_t and a symbol distribution θ_x. |
Flat ITG Model | (a) If x = TERM, generate a phrase pair from the phrase table P_t((e, f); θ_t). |
Flat ITG Model | We assign θ_x a Dirichlet prior, and assign the phrase table parameters θ_t a prior using the Pitman-Yor process (Pitman and Yor, 1997; Teh, 2006), which is a generalization of the Dirichlet process prior used in previous research. |
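To illustrate in what sense the Pitman-Yor process generalizes the Dirichlet process, the sketch below computes the standard Chinese-restaurant-style predictive probabilities for a Pitman-Yor process. It is not taken from the excerpted paper: the function name `pitman_yor_predictive` and the parameter names `discount` and `strength` are hypothetical, and the base measure over phrase pairs is left out entirely. Setting `discount` to 0 recovers the Dirichlet process predictive rule.

```python
from typing import List, Tuple


def pitman_yor_predictive(
    counts: List[int], discount: float, strength: float
) -> Tuple[List[float], float]:
    """Predictive probabilities under a Pitman-Yor process (illustrative sketch).

    counts   -- number of customers at each existing table (all > 0)
    discount -- discount parameter d, with 0 <= d < 1 (d = 0 gives the Dirichlet process)
    strength -- strength parameter s, with s > -d

    Returns (probabilities of joining each existing table,
             probability of opening a new table).
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must satisfy 0 <= discount < 1")
    if strength <= -discount:
        raise ValueError("strength must be greater than -discount")
    if any(c <= 0 for c in counts):
        raise ValueError("every existing table needs at least one customer")

    n = sum(counts)   # total customers seated so far
    k = len(counts)   # number of occupied tables
    denom = strength + n

    # Each existing table is discounted by d; the removed mass goes to new tables.
    existing = [(c - discount) / denom for c in counts]
    new_table = (strength + discount * k) / denom
    return existing, new_table


# Example: two tables with 3 and 1 customers.
py_existing, py_new = pitman_yor_predictive([3, 1], discount=0.5, strength=1.0)
dp_existing, dp_new = pitman_yor_predictive([3, 1], discount=0.0, strength=1.0)
```

With a positive discount, probability mass shifts from frequently used tables toward new ones, which gives the heavier-tailed behavior usually cited as the reason to prefer the Pitman-Yor process for language data.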
Introduction | This phrase table is traditionally generated by going through a pipeline of two steps, first generating word (or minimal phrase) alignments, then extracting a phrase table that is consistent with these alignments. |
Introduction | phrase tables that are used in translation. |
Introduction | This makes it possible to directly use probabilities of the phrase model as a replacement for the phrase table generated by heuristic extraction techniques. |
Models 2.1 Baseline Models | Table 1: Morpheme occurrences in the phrase table and in translation. |
Related Work | They use a segmented phrase table and language model along with the word-based versions in the decoder and in tuning a Finnish target. |
Related Work | Habash (2007) provides various methods to incorporate morphological variants of words in the phrase table in order to help recognize out-of-vocabulary words in the source language. |