Flat ITG Model | (a) If x ∈ TERM, generate a phrase pair from the phrase table Pt(⟨e, f⟩; θt). |
Flat ITG Model | (b) If x ∈ REG, a regular ITG rule, generate phrase pairs ⟨e1, f1⟩ and ⟨e2, f2⟩ from Pflat, and concatenate them into a single phrase pair ⟨e1e2, f1f2⟩. |
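The generative story above can be sketched in a few lines of Python. The distributions, probabilities, and helper names below are invented for illustration, and an inverted rule type (standard in ITGs, presumably covered by a later item in the list) is included under the assumption that it swaps the target-side order:

```python
import random

# Hypothetical toy distributions; a real model would learn these.
PHRASE_TABLE = {("hello", "bonjour"): 0.6, ("world", "monde"): 0.4}
SYMBOLS = {"TERM": 0.6, "REG": 0.3, "INV": 0.1}

def sample(dist):
    """Draw one item from a {item: probability} distribution."""
    r, acc = random.random(), 0.0
    for item, p in dist.items():
        acc += p
        if r < acc:
            return item
    return item  # guard against floating-point round-off

def generate_flat():
    """Sketch of the flat ITG generative story: pick a symbol, then
    either emit a phrase pair (TERM) or recurse and concatenate."""
    symbol = sample(SYMBOLS)
    if symbol == "TERM":
        # (a) generate a phrase pair straight from the phrase table
        return sample(PHRASE_TABLE)
    e1, f1 = generate_flat()
    e2, f2 = generate_flat()
    if symbol == "REG":
        # (b) regular rule: concatenate both sides in the same order
        return (e1 + " " + e2, f1 + " " + f2)
    # inverted rule (assumed): swap the order on the target side
    return (e1 + " " + e2, f2 + " " + f1)
```

Because TERM has the highest probability here, the recursion terminates with probability one and each call yields a complete phrase pair.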
Flat ITG Model | While the previous formulation can be used as-is in maximum likelihood training, this leads to a degenerate solution where every sentence is memorized as a single phrase pair. |
Introduction | The training of translation models for phrase-based statistical machine translation (SMT) systems (Koehn et al., 2003) takes unaligned bilingual training data as input, and outputs a scored table of phrase pairs. |
Introduction | The model is similar to previously proposed phrase alignment models based on inversion transduction grammars (ITGs) (Cherry and Lin, 2007; Zhang et al., 2008; Blunsom et al., 2009), with one important change: ITG symbols and phrase pairs are generated in the opposite order. |
Introduction | In traditional ITG models, the branches of a biparse tree are generated from a nonterminal distribution, and each leaf is generated by a word or phrase pair distribution. |
Clustering phrase pairs directly using the K-means algorithm | Using multiple word clusterings simultaneously, each based on a different number of classes, could turn this global, hard tradeoff into a local, soft one, informed by the number of phrase pair instances available for a given granularity. |
Clustering phrase pairs directly using the K-means algorithm | We thus propose to represent each phrase pair instance (including its bilingual one-word contexts) as feature vectors, i.e., points of a vector space. |
Clustering phrase pairs directly using the K-means algorithm | We then use these data points to partition the space into clusters, and subsequently assign each phrase pair instance the cluster of its corresponding feature vector as its label. |
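The clustering step described above can be sketched with a plain K-means implementation. The feature vectors below are toy stand-ins (a real system would encode each phrase pair instance and its bilingual one-word contexts), and the function name and parameters are assumptions for illustration:

```python
import numpy as np

def kmeans(points, k, iters=20, seed=0):
    """Plain K-means: returns a cluster label for every input point."""
    rng = np.random.default_rng(seed)
    # initialize centers on k distinct input points
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # assign each point to its nearest center
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # move each center to the mean of its assigned points
        for c in range(k):
            if (labels == c).any():
                centers[c] = points[labels == c].mean(axis=0)
    return labels

# Toy feature vectors standing in for phrase-pair instances.
feats = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 4.9]])
labels = kmeans(feats, k=2)
```

Each phrase pair instance then receives the label of its feature vector's cluster, e.g. the first two points above land in one cluster and the last two in the other.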
Hard rule labeling from word classes | We use the phrase extraction method of Koehn et al. (2003) to provide us with a set of phrase pairs for each sentence pair in the training corpus, annotated with their respective start and end positions in the source and target sentences.
Hard rule labeling from word classes | We convert each extracted phrase pair, represented by its source span (i, j) and target span (k, l), into an initial rule
Hard rule labeling from word classes | Then (depending on the extracted phrase pairs), the resulting initial rules could be:
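The conversion of a span-annotated phrase pair into an initial rule can be sketched as follows. The rule string format, the generic label "X", and the function name are illustrative assumptions, not the paper's exact notation:

```python
def initial_rule(src_words, tgt_words, i, j, k, l, label="X"):
    """Turn a phrase pair given by source span (i, j) and target span
    (k, l) (inclusive, 0-based word indices) into an initial rule."""
    src = " ".join(src_words[i:j + 1])
    tgt = " ".join(tgt_words[k:l + 1])
    return f"{label} -> <{src} ||| {tgt}>"

# e.g. a French-English phrase pair covering words 1..2 on both sides:
rule = initial_rule(["le", "chat", "noir"], ["the", "black", "cat"],
                    1, 2, 1, 2)
# rule == "X -> <chat noir ||| black cat>"
```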
Introduction | Zollmann and Venugopal (2006) directly extend the rule extraction procedure from Chiang (2005) to heuristically label any phrase pair based on target language parse trees. |
PSCFG-based translation | Chiang (2005) learns a single-nonterminal PSCFG from a bilingual corpus by first identifying initial phrase pairs using the technique from Koehn et al. |
PSCFG-based translation | (2003), and then performing a generalization operation to generate phrase pairs with gaps, which can be viewed as PSCFG rules with generic ‘X’ nonterminal left-hand-sides and substitution sites. |
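The generalization operation described above, replacing an embedded phrase pair with a co-indexed gap, can be sketched as below. The string-based representation and function name are simplifying assumptions; a real extractor works on span indices and enforces Chiang-style constraints (e.g. limits on the number of gaps):

```python
def generalize(rule_src, rule_tgt, sub_src, sub_tgt, index=1):
    """Replace one embedded phrase pair (sub_src, sub_tgt) with a
    co-indexed gap X<index>, turning an initial rule into a
    hierarchical rule with a substitution site."""
    gap = f"X{index}"
    return (rule_src.replace(sub_src, gap, 1),
            rule_tgt.replace(sub_tgt, gap, 1))

# e.g. subtracting the pair ("mange", "eat") from a larger pair:
gapped = generalize("ne mange pas", "does not eat", "mange", "eat")
# gapped == ("ne X1 pas", "does not X1")
```

The resulting pair of gapped strings corresponds to a PSCFG rule with a generic 'X' left-hand side and one substitution site.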
Conclusion | The resulting predictions improve the precision and recall of both alignment links and extracted phrase pairs in Chinese-English experiments.
Experimental Results | In this way, we can show that the bidirectional model improves alignment quality and enables the extraction of more correct phrase pairs . |
Experimental Results | Table 3: Phrase pair extraction accuracy for phrase pairs up to length 5. |
Experimental Results | Possible links are both included and excluded from phrase pairs during extraction, as in DeNero and Klein (2010). |