Abstract | As a side effect, the phrase table size is reduced by more than 80%.
Alignment | To be able to perform the re-computation in an efficient way, we store the source and target phrase marginal counts for each phrase in the phrase table.
Alignment | It is then straightforward to compute the phrase counts after leaving-one-out using the phrase probabilities and marginal counts stored in the phrase table.
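The leaving-one-out re-computation described above can be sketched as follows. This is a minimal illustration under assumed conventions: a conditional probability p(f|e) = c(e,f)/c(e) is stored together with the marginal count c(e), so the joint count can be recovered and the current sentence's local contribution subtracted; the function name and arguments are hypothetical, not from the original system.

```python
def leave_one_out_prob(p_f_given_e, marginal_e, local_joint, local_marginal):
    """Recompute p(f|e) with the current sentence pair's counts removed.

    p_f_given_e:    stored phrase probability p(f|e)
    marginal_e:     stored marginal count c(e)
    local_joint:    this sentence's contribution to the joint count c(e,f)
    local_marginal: this sentence's contribution to c(e)
    """
    joint = p_f_given_e * marginal_e              # recover c(e,f) = p(f|e) * c(e)
    new_joint = max(joint - local_joint, 0.0)     # subtract local contribution
    new_marginal = max(marginal_e - local_marginal, 0.0)
    if new_marginal == 0.0:
        return 0.0                                # phrase only occurred here
    return new_joint / new_marginal
```

Storing only the marginals (rather than full counts per sentence) keeps the phrase table compact while still allowing this per-sentence correction.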
Alignment | From the initial phrase table, each of these blocks only loads the phrases that are required for alignment.
Introduction | The most common method for obtaining the phrase table is heuristic extraction from automatically word-aligned bilingual training data (Och et al., 1999). |
Phrase Model Training | 4.3 Phrase Table Interpolation |
Phrase Model Training | As DeNero et al. (2006) reported improvements in translation quality from interpolating the phrase tables produced by the generative and the heuristic model, we adopt this method and also report results using log-linear interpolation of the estimated model with the original model.
Phrase Model Training | When interpolating phrase tables containing different sets of phrase pairs, we retain the intersection of the two. |
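The log-linear interpolation over the intersection of two phrase tables can be sketched as below. This is an illustrative assumption about the data layout (each table as a dict mapping a phrase pair to a probability); the interpolation weight and function name are hypothetical.

```python
import math

def interpolate_tables(table_a, table_b, weight=0.5):
    """Log-linearly interpolate two phrase tables, keeping only phrase
    pairs present in both (the intersection).

    table_a, table_b: dict mapping (src_phrase, tgt_phrase) -> probability.
    """
    out = {}
    for pair in table_a.keys() & table_b.keys():   # intersection of pairs
        # log-linear combination: p_a^w * p_b^(1-w)
        out[pair] = math.exp(weight * math.log(table_a[pair])
                             + (1.0 - weight) * math.log(table_b[pair]))
    return out
```

Restricting the result to the intersection means a phrase pair must survive both the generative training and the heuristic extraction to remain available to the decoder.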
Related Work | Their results show that it cannot reach performance competitive with extracting a phrase table from word alignment by heuristics (Och et al., 1999).
Related Work | In addition, DeNero et al. (2006) found that the trained phrase table shows a highly peaked distribution, in contrast to the flatter distribution resulting from heuristic extraction, leaving the decoder with only a few translation options at decoding time.
Related Work | They observe that due to several constraints and pruning steps, the trained phrase table is much smaller than the heuristically extracted one, while preserving translation quality. |
Abstract | We make use of collocation probabilities, which are estimated from monolingual corpora, in two ways: improving word alignment for various kinds of SMT systems and improving the phrase table for phrase-based SMT.
Conclusion | Then the collocation information was employed to improve BWA for various kinds of SMT systems and to improve the phrase table for phrase-based SMT.
Conclusion | To improve the phrase table, we calculate phrase collocation probabilities based on word collocation probabilities.
Experiments on Phrase-Based SMT | We also investigate the performance of the system employing both the word alignment improvement and phrase table improvement methods. |
Introduction | Then the collocation information is employed to improve Bilingual Word Alignment (BWA) for various kinds of SMT systems and to improve the phrase table for phrase-based SMT.
Introduction | To improve the phrase table, we calculate phrase collocation probabilities based on word collocation probabilities.
Introduction | In Sections 3 and 4, we show how to improve the BWA method and the phrase table using collocation models, respectively.
Error Analysis | After obtaining the MERT parameters, we add the 600 dev sentences back into the training corpus, retrain GIZA, and then estimate a new phrase table on all 5600 sentences.
Previous Work | do not compete with internal phrase tables.
Previous Work | (2008) use a tagger to identify good candidates for transliteration (which are mostly NEs) in input text and add transliterations to the SMT phrase table dynamically such that they can directly compete with translations during decoding. |
Experimental Setup and Results | Phrase table entries for the surface factors, produced by Moses after it aligns the roots, contain the English (e) and Turkish (t) parts of a pair of aligned phrases, along with the probabilities p(e|t), the conditional probability that the English phrase is e given that the Turkish phrase is t, and p(t|e), the conditional probability that the Turkish phrase is t given that the English phrase is e.
Experimental Setup and Results | Among these phrase table entries, those with p(e|t) ≈ p(t|e) and p(t|e) + p(e|t) larger than some threshold can be considered reliable mutual translations, in that they mostly translate to each other and not much to others.
Experimental Setup and Results | from the phrase table those phrases with 0.9 S p(elt)/p(tle) S 1.1 and Mile) + Melt) 2 1.5 and added them to the training data to further bias the alignment process. |