Abstract | Several attempts have been made to learn phrase translation probabilities for phrase-based statistical machine translation that go beyond pure counting of phrases in word-aligned training data. |
Alignment | Therefore, those long phrases are trained to fit only a few sentence pairs, which strongly overestimates their translation probabilities and fails to generalize. |
Alignment | When using leaving-one-out, we modify the phrase translation probabilities for each sentence pair. |
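The leaving-one-out modification can be sketched as follows: before estimating phrase translation probabilities for a given sentence pair, the counts contributed by that pair itself are subtracted, so long phrases extracted only from this pair no longer receive inflated scores. A minimal Python sketch under that assumption; the function name, count-dictionary layout, and the backoff `floor` are illustrative, not from the paper:

```python
def leave_one_out_prob(pair_counts, tgt_counts, sent_pair_counts, phrase_pair, floor=1e-7):
    """Relative-frequency phrase probability with the current sentence
    pair's own counts removed (leaving-one-out).  `floor` backs off
    phrase pairs seen nowhere else in the data, so they are strongly
    penalized but not forbidden outright."""
    src, tgt = phrase_pair
    # Joint count minus this sentence pair's contribution.
    joint = pair_counts.get(phrase_pair, 0) - sent_pair_counts.get(phrase_pair, 0)
    # Target-side marginal minus this pair's contributions to the same target phrase.
    marg = tgt_counts.get(tgt, 0) - sum(
        c for (s, t), c in sent_pair_counts.items() if t == tgt
    )
    if joint <= 0 or marg <= 0:
        return floor
    return joint / marg
```

A phrase pair observed only in the sentence pair currently being aligned falls back to the floor probability, which is exactly the overfitting case the text describes.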
Introduction | The phrase translation table, which contains the bilingual phrase pairs and the corresponding translation probabilities, is one of the main components of an SMT system. |
Introduction | In this work, we propose to train our phrase models directly by applying a forced alignment procedure: we use the decoder to find a phrase alignment between the source and target sentences of the training data and then update the phrase translation probabilities based on this alignment. |
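The procedure described above can be sketched as an iterative loop: force-align every training sentence pair with the current model, then re-estimate phrase translation probabilities from the resulting phrase alignments. A minimal Python sketch, assuming `force_align` stands in for the decoder constrained to produce the given target sentence (all names are illustrative):

```python
from collections import Counter

def train_phrase_model(corpus, force_align, iterations=3):
    """Viterbi-style training sketch: repeatedly force-align each training
    sentence pair with the current model, then re-estimate phrase
    translation probabilities from the resulting phrase alignments."""
    probs = {}  # (src_phrase, tgt_phrase) -> p(src_phrase | tgt_phrase)
    for _ in range(iterations):
        joint, tgt_marginal = Counter(), Counter()
        for src_sent, tgt_sent in corpus:
            # `force_align` plays the role of the decoder constrained to
            # produce tgt_sent; it returns the phrase pairs of the
            # Viterbi phrase alignment under the current model.
            for src_ph, tgt_ph in force_align(src_sent, tgt_sent, probs):
                joint[(src_ph, tgt_ph)] += 1
                tgt_marginal[tgt_ph] += 1
        # Relative-frequency re-estimation, normalized per target phrase.
        probs = {pair: c / tgt_marginal[pair[1]] for pair, c in joint.items()}
    return probs
```

In a real system the aligner would depend on the current `probs`, so the counts (and hence the model) can change from iteration to iteration.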
Phrase Model Training | We have developed two different models for phrase translation probabilities which make use of the force-aligned training data. |
Phrase Model Training | The simplest of our generative phrase models estimates phrase translation probabilities by their relative frequencies in the Viterbi alignment of the data, similar to the heuristic model but with counts from the phrase-aligned data produced in training rather than computed on the basis of a word alignment. |
Phrase Model Training | The translation probability of a phrase pair (f̃, ẽ) is estimated by relative frequency as p(f̃|ẽ) = C(f̃, ẽ) / Σ_f̃′ C(f̃′, ẽ), where C(·, ·) counts phrase-pair occurrences in the force-aligned training data. |
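The relative-frequency estimate amounts to counting phrase pairs in the phrase-aligned training data and normalizing per target phrase. A minimal sketch (function name and input layout are illustrative):

```python
from collections import Counter

def relative_freq_probs(phrase_alignments):
    """Estimate p(src_phrase | tgt_phrase) by relative frequency from
    phrase-aligned training data: count each extracted phrase pair,
    then normalize by the target phrase's total count."""
    joint, tgt = Counter(), Counter()
    for pairs in phrase_alignments:  # one list of phrase pairs per sentence pair
        for src_ph, tgt_ph in pairs:
            joint[(src_ph, tgt_ph)] += 1
            tgt[tgt_ph] += 1
    return {(s, t): c / tgt[t] for (s, t), c in joint.items()}
```

The only difference from the heuristic model is where the counts come from: here they are read off the phrase alignments produced in training rather than extracted from a word alignment.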
Related Work | That in turn leads to over-fitting, which shows in overly determinized estimates of the phrase translation probabilities. |
Related Work | They show that by applying a prior distribution over the phrase translation probabilities they can prevent over-fitting. |
Substructure Spaces for BTKs | The lexical features with directions are defined as conditional feature functions based on the conditional lexical translation probabilities. |
Substructure Spaces for BTKs | where P(v|u) refers to the lexical translation probability from the source word u to the target word v within the subtree spans, while P(u|v) refers to that from target to source; in(S) refers to the word set for the internal span of the source subtree S, while in(T) refers to that of the target subtree T. |
Substructure Spaces for BTKs | Internal—External Lexical Features: These features are motivated by the fact that lexical translation probabilities within the translational equivalence tend to be high, and that of the non- |
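A span-level lexical feature of the kind defined above can be sketched as follows, assuming a table of lexical translation probabilities P(v|u). The max-over-targets, log-average combination is an assumption for illustration; the paper defines its own feature function over P(v|u):

```python
import math

def internal_lexical_feature(lex_probs, src_internal, tgt_internal, floor=1e-9):
    """Hedged sketch of an internal lexical feature: for each source word u
    in the internal span in(S), take the best lexical translation
    probability P(v|u) over target words v in in(T), and average in log
    space.  `floor` guards against zero probabilities."""
    score = 0.0
    for u in src_internal:
        best = max((lex_probs.get((u, v), 0.0) for v in tgt_internal), default=0.0)
        score += math.log(max(best, floor))
    return score / max(len(src_internal), 1)
```

For a genuinely equivalent span pair the per-word probabilities are high and the feature stays close to zero; for a non-equivalent pair the floor dominates and the feature becomes strongly negative, matching the motivation stated in the text.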
Experiments | Here, s/t represent the source/target part of a rule in PTT, TRS, or PRS; the remaining features are the translation probabilities and lexical weights of rules from PTT, TRS, and PRS. |
Fine-grained rule extraction | Maximum likelihood estimation is used to calculate the translation probabilities of each rule. |
Introduction | Table 1: Bidirectional translation probabilities of rules, shown in brackets, change when voice is attached to “killed”. |
Introduction | We argue that simple POS/phrasal tags are too coarse to reflect the accurate translation probabilities of the translation rules, as our experiments will verify. |
Related Work | (2006) constructed a derivation-forest, in which composed rules were generated, unaligned words of foreign language were consistently attached, and the translation probabilities of rules were estimated by using Expectation-Maximization (EM) (Dempster et al., 1977) training. |