Head-Driven HPB Translation Model | For rule extraction, we first identify initial phrase pairs on word-aligned sentence pairs by using the same criterion as most phrase-based translation models (Och and Ney, 2004) and Chiang’s HPB model (Chiang, 2005; Chiang, 2007). |
Head-Driven HPB Translation Model | We extract HD-HRs and NRRs based on initial phrase pairs, respectively. |
Head-Driven HPB Translation Model | We look for initial phrase pairs that contain other phrases and then replace sub-phrases with POS tags corresponding to their heads. |
Experiments | We divide the rules into three types: phrase rules, which contain only terminals and are the same as the phrase pairs in a phrase-based system; monotone rules, which contain non-terminals and produce monotone translations; and reordering rules, which also contain non-terminals but change the order of translations. |
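The three-way split above can be sketched in a few lines. This is only an illustration under stated assumptions: the `X1`, `X2` notation for co-indexed non-terminals and the function names are hypothetical, and the source's own rules use POS-tag non-terminals rather than this notation.

```python
import re

# Hypothetical notation: non-terminals are written X1, X2, ... and the
# number links a source-side non-terminal to its target-side counterpart.
NONTERMINAL = re.compile(r"^X(\d+)$")


def nonterminal_order(side):
    """Return the non-terminal indices of one rule side, in surface order."""
    order = []
    for token in side:
        match = NONTERMINAL.match(token)
        if match:
            order.append(int(match.group(1)))
    return order


def classify_rule(source, target):
    """Classify a rule as 'phrase', 'monotone', or 'reordering'."""
    src_order = nonterminal_order(source)
    tgt_order = nonterminal_order(target)
    if not src_order and not tgt_order:
        return "phrase"        # terminals only
    if src_order == tgt_order:
        return "monotone"      # non-terminals keep their relative order
    return "reordering"        # non-terminals are permuted on the target side
```

For example, `classify_rule(["X1", "of", "X2"], ["X2", "de", "X1"])` falls into the reordering class because the two non-terminals swap places.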
Related Work | (2010) introduce a topic model for filtering topic-mismatched phrase pairs. |
Related Work | Similarly, each phrase pair is also assigned one specific topic. |
Related Work | A phrase pair will be discarded if its topic mismatches the document topic. |
Model Description | Lexical Weighting: Lexical weighting features estimate the quality of a phrase pair by combining the lexical translation probabilities of the words in the phrase (Koehn et al., 2003). |
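A minimal sketch of a lexical weighting score in the style of Koehn et al. (2003) is given here: for each target word, the lexical translation probabilities of its aligned source words are averaged, and the per-word averages are multiplied. The function name, the dictionary layout of `w`, and the `"NULL"` token for unaligned words are assumptions made for illustration.

```python
from collections import defaultdict


def lexical_weight(target, source, alignment, w):
    """Lexical weight of `target` given `source` under a word alignment.

    alignment: set of (i, j) pairs linking target index i to source index j.
    w: dict mapping (target_word, source_word) to a lexical translation
       probability (hypothetical layout); unaligned target words are
       scored against the special source token "NULL".
    """
    links = defaultdict(list)
    for i, j in alignment:
        links[i].append(j)

    score = 1.0
    for i, e_word in enumerate(target):
        aligned = links.get(i, [])
        if aligned:
            probs = [w.get((e_word, source[j]), 0.0) for j in aligned]
            score *= sum(probs) / len(probs)   # average over aligned words
        else:
            score *= w.get((e_word, "NULL"), 0.0)
    return score
```

Missing table entries are treated as probability 0 here; a real system would more likely smooth them.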
Model Description | Phrase pair probabilities $p(\bar{e} \mid \bar{f})$ are computed from these as described in Koehn et al. |
Model Description | For example, if topic $k$ is dominant in $T$, $p_k(\bar{e} \mid \bar{f})$ may be quite large, but if $p(k \mid V)$ is very small, then we should steer away from this phrase pair and select a competing phrase pair which may have a lower probability in $T$, but which is more relevant to the test sentence at hand. |
Discussion of Translation Results | • Baseline system translation output: 44.6% • Phrase pairs matching source n-grams: 67.8% |
Inference during Translation Decoding | • Extend a hypothesis with a new phrase pair • Recombine hypotheses with identical states |
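The two decoding steps listed above might look roughly as follows. This is a sketch under assumptions, not the decoder described in the source: the definition of a state (source coverage plus the last target words needed by an n-gram language model) and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    coverage: frozenset   # source positions already translated
    last_words: tuple     # target context kept for the language model
    score: float          # accumulated log-probability

    def state(self):
        # Assumption: two hypotheses are interchangeable for future
        # expansion when coverage and LM context are identical.
        return (self.coverage, self.last_words)


def extend(hyp, src_span, target_words, pair_score, lm_order=3):
    """Extend `hyp` with a phrase pair covering source span [start, end)."""
    start, end = src_span
    new_positions = frozenset(range(start, end))
    if hyp.coverage & new_positions:
        return None                          # span overlaps covered words
    keep = lm_order - 1
    words = hyp.last_words + tuple(target_words)
    context = words[-keep:] if keep > 0 else ()
    return Hypothesis(hyp.coverage | new_positions, context,
                      hyp.score + pair_score)


def recombine(hypotheses):
    """Keep only the best-scoring hypothesis for each distinct state."""
    best = {}
    for hyp in hypotheses:
        key = hyp.state()
        if key not in best or hyp.score > best[key].score:
            best[key] = hyp
    return list(best.values())
```

Recombination is safe under this state definition because any future extension adds the same score to every hypothesis sharing a state, so a lower-scoring one can never overtake the best.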
Related Work | Och (1999) showed a method for inducing bilingual word classes that placed each phrase pair into a two-dimensional equivalence class. |
Abstract | To prevent overfitting, the statistics of phrase pairs from a particular sentence were excluded from the phrase table when aligning that sentence. |
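The exclusion described above can be illustrated with count subtraction. The sketch assumes that phrase table probabilities are relative frequencies over `(source, target)` counts, which the source does not state; the function name and data layout are hypothetical.

```python
from collections import Counter


def leave_one_out_probs(global_counts, sentence_counts):
    """Relative-frequency p(target | source) without one sentence's counts.

    global_counts:   Counter over (source_phrase, target_phrase) pairs
                     collected from the whole corpus.
    sentence_counts: Counter over the pairs extracted from the sentence
                     that is currently being aligned.
    """
    remaining = Counter(global_counts)
    remaining.subtract(sentence_counts)
    remaining = +remaining                 # drop zero and negative counts

    source_totals = Counter()
    for (src, _tgt), count in remaining.items():
        source_totals[src] += count

    return {(src, tgt): count / source_totals[src]
            for (src, tgt), count in remaining.items()}
```

A pair seen only in the held-out sentence disappears from the table entirely, which is what prevents the sentence from being explained by its own statistics.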
Abstract | A set of phrase pairs is extracted from a word-aligned parallel corpus according to phrase extraction rules (Koehn et al., 2003). |
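A simplified sketch of alignment-consistent phrase extraction follows. It keeps a phrase pair only if at least one alignment link lies inside it and no link connects a word inside the pair to a word outside it. Unlike the full procedure, this version takes only the minimal target span and does not grow it over unaligned boundary words; the function name and the `max_len` limit are assumptions.

```python
def extract_phrase_pairs(source, target, alignment, max_len=4):
    """Extract phrase pairs consistent with a word alignment.

    alignment: set of (s, t) pairs linking source index s to target index t.
    Returns a set of (source_phrase, target_phrase) strings.
    """
    pairs = set()
    for s1 in range(len(source)):
        for s2 in range(s1, min(s1 + max_len, len(source))):
            # Target positions linked to the source span [s1, s2].
            linked = [t for (s, t) in alignment if s1 <= s <= s2]
            if not linked:
                continue                      # no link inside the span
            t1, t2 = min(linked), max(linked)
            if t2 - t1 + 1 > max_len:
                continue
            # Consistency: no target word in [t1, t2] may be linked to a
            # source word outside [s1, s2].
            violated = any(t1 <= t <= t2 and not (s1 <= s <= s2)
                           for (s, t) in alignment)
            if violated:
                continue
            pairs.add((" ".join(source[s1:s2 + 1]),
                       " ".join(target[t1:t2 + 1])))
    return pairs
```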
Abstract | We use the word translation table from IBM Model 1 (Brown et al., 1993) and compute the sum over all possible word alignments within a phrase pair without normalizing for length (Quirk et al., 2005). |
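Under IBM Model 1, the sum over all word alignments within a phrase pair factorizes into a product over target words of sums over source words, so it can be computed without enumerating alignments. The sketch below shows that unnormalized score; the direction of the table, the function name, and the optional `"NULL"` source token are assumptions.

```python
def model1_phrase_score(target, source, t_table, use_null=True):
    """Unnormalized Model 1 score of `target` given `source`.

    t_table: dict mapping (target_word, source_word) to a word translation
             probability (hypothetical layout). Summing over every possible
             alignment equals the product, over target words, of the summed
             probabilities of all source words. No length normalization
             is applied.
    """
    src_words = (["NULL"] if use_null else []) + list(source)
    score = 1.0
    for e_word in target:
        score *= sum(t_table.get((e_word, f_word), 0.0)
                     for f_word in src_words)
    return score
```

Because there is no division by the number of source words per target position, longer source phrases tend to receive larger scores, which is the behaviour the unnormalized variant accepts.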
Alignment Methods | Phrasal ITGs are ITGs that allow for non-terminals that can emit phrase pairs with multiple elements on both the source and target sides. |
LookAhead Biparsing | This probability is the combination of the generative probability of each phrase pair $P_t(e_s^t, f_u^v)$ as well as the sum of the probabilities over all shorter spans in straight and inverted order. |
Substring Prior Probabilities | While the Bayesian phrasal ITG framework uses the previously mentioned phrase distribution $P_t$ during search, it also allows for the definition of a phrase pair prior probability $P_{prior}(e_s^t, f_u^v)$, which can efficiently seed the search process with a bias towards phrase pairs that satisfy certain properties. |
Baselines | Whenever a phrase pair does not appear in a component phrase table, we set the corresponding $p_m(\bar{e} \mid \bar{f})$ to a small epsilon value. |
Baselines | Whenever a phrase pair does not appear in a component phrase table, we set the corresponding $p_m(\bar{e} \mid \bar{f})$ to 0; pairs in $\bar{p}(\bar{e}, \bar{f})$ that do not appear in at least one component table are discarded. |
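The two ways of handling missing phrase pairs can be sketched as follows. Which kind of mixture each convention belongs to is an assumption here: a zero default fits a weighted sum of component probabilities, while a small epsilon fits a weighted product, where a zero would wipe out the whole score. Function names, the dictionary layout, and the default epsilon are hypothetical.

```python
def linear_mixture(tables, weights):
    """Weighted sum of component probabilities; missing pairs count as 0."""
    pairs = set().union(*(table.keys() for table in tables))
    return {pair: sum(w * table.get(pair, 0.0)
                      for table, w in zip(tables, weights))
            for pair in pairs}


def log_linear_mixture(tables, weights, epsilon=1e-9):
    """Weighted product of component probabilities (unnormalized).

    A pair missing from a component table gets `epsilon` instead of 0,
    so one absent entry does not force the product to zero.
    """
    pairs = set().union(*(table.keys() for table in tables))
    mixed = {}
    for pair in pairs:
        score = 1.0
        for table, w in zip(tables, weights):
            score *= table.get(pair, epsilon) ** w
        mixed[pair] = score
    return mixed
```

Both functions iterate over the union of the component tables, so a pair that appears in none of them is never produced.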
Ensemble Decoding | probability for each phrase pair $(\bar{e}, \bar{f})$ is given by: |