A Generic Phrase Training Procedure | We first train word alignment models and will use them to evaluate the goodness of a phrase and a phrase pair. |
A Generic Phrase Training Procedure | Beginning with a flat lexicon, we train an IBM Model-1 word alignment model with 10 iterations for each translation direction. |
A Generic Phrase Training Procedure | We then train HMM word alignment models (Vogel et al., 1996) in two directions simultaneously by merging statistics collected in the |
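The first stage of this procedure, Model-1 EM training from a flat lexicon, can be sketched in a few lines. The snippet below is a minimal illustration over a toy bitext, not the paper's code: function and variable names are my own, and the NULL source word and uniform initialization follow the usual textbook formulation of Model-1.

```python
from collections import defaultdict

def train_model1(bitext, iterations=10):
    """EM training of IBM Model-1 lexical probabilities t(f|e).

    bitext: list of (source_tokens, target_tokens) pairs.
    A NULL token on the source side lets target words stay unaligned.
    """
    t = defaultdict(lambda: 1.0)          # flat lexicon: uniform start

    for _ in range(iterations):
        count = defaultdict(float)        # expected counts c(f, e)
        total = defaultdict(float)        # expected counts c(e)
        for src, tgt in bitext:
            src = ["NULL"] + src
            for f in tgt:
                z = sum(t[(f, e)] for e in src)   # normalizer for this f
                for e in src:
                    p = t[(f, e)] / z             # posterior of link f-e
                    count[(f, e)] += p
                    total[e] += p
        # M-step: renormalize expected counts into probabilities
        t = defaultdict(float,
                        {(f, e): c / total[e] for (f, e), c in count.items()})
    return t
```

On a classic two-sentence example the repeated co-occurrence of "das" and "the" pulls probability mass onto that pair, which is the behavior the EM iterations are meant to produce.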
Abstract | Experimental results demonstrate consistent and significant improvement over the widely used method that is based on word alignment matrix only. |
Introduction | The most widely used approach derives phrase pairs from the word alignment matrix (Och and Ney, 2003; Koehn et al., 2003). |
Introduction | Other methods do not depend on word alignments only, such as directly modeling phrase alignment in a joint generative way (Marcu and Wong, 2002), pursuing an information extraction perspective (Venugopal et al., 2003), or augmenting with model-based phrase pair posteriors (Deng and Byrne, 2005). |
Introduction | On the other hand, there are valid translation pairs in the training corpus that are not learned due to word alignment errors as shown in Deng and Byrne (2005). |
Abstract | Automatic word alignment is a key step in training statistical machine translation systems. |
Abstract | Despite much recent work on word alignment methods, increases in alignment accuracy often produce little or no improvement in machine translation quality. |
Introduction | The word alignment problem has received much recent attention, but improvements in standard measures of word alignment performance often do not result in better translations. |
Introduction | In this work, we show that by changing the way the word alignment models are trained and |
Introduction | We present extensive experimental results evaluating a new training scheme for unsupervised word alignment models: an extension of the Expectation Maximization algorithm that allows effective injection of additional information about the desired alignments into the unsupervised training process. |
Statistical word alignment | Statistical word alignment (Brown et al., 1994) is the task of identifying which words are translations of each other in a bilingual sentence corpus. |
Statistical word alignment | Figure 2 shows two examples of word alignment of a sentence pair. |
Statistical word alignment | Due to the ambiguity of the word alignment task, it is common to distinguish two kinds of alignments (Och and Ney, 2003). |
Inflection prediction models | ture of English and word alignment information. |
Integration of inflection models with MT systems | Stemming the target sentences is expected to be helpful for word alignment, especially when the stemming operation is defined so that the word alignment becomes more one-to-one (Goldwater and McClosky, 2005). |
Integration of inflection models with MT systems | However, for some language pairs, stemming one language can make word alignment worse, if it leads to more violations in the assumptions of current word alignment models, rather than making the source look more like the target. |
Integration of inflection models with MT systems | Note that it may be better to use the word alignment maintained as part of the translation hypotheses during search, but our solution is more suitable to situations where these cannot be easily obtained. |
Introduction | Evidence for this difficulty is the fact that there has been very little work investigating the use of such independent sub-components, though we started to see some successful cases in the literature, for example in word alignment (Fraser and Marcu, 2007), target language capitalization (Wang et al., 2006) and case marker generation (Toutanova and Suzuki, 2007). |
MT performance results | Finally, we can see that using stemming at the word alignment stage further improved both the oracle and the achieved results. |
MT performance results | pressed in English at the word level, these results are consistent with previous results using stemming to improve word alignment. |
Machine translation systems and data | It uses the same lexicalized-HMM model for word alignment as the treelet system, and uses the standard extraction heuristics to extract phrase pairs using forward and backward alignments. |
Abstract | Incorporating a sparse prior using Variational Bayes biases the models toward generalizable, parsimonious parameter sets, leading to significant improvements in word alignment. |
Abstract | This preference for sparse solutions together with effective pruning methods forms a phrase alignment regimen that produces better end-to-end translations than standard word alignment approaches. |
Bootstrapping Phrasal ITG from Word-based ITG | The scope of iterative phrasal ITG training, therefore, is limited to determining the boundaries of the phrases anchored on the given one-to-one word alignments. |
Bootstrapping Phrasal ITG from Word-based ITG | Second, we do not need to worry about non-ITG word alignments, such as the (2, 4, 1, 3) permutation patterns. |
Bootstrapping Phrasal ITG from Word-based ITG | Figure 3 (a) shows all possible non-compositional phrases given the Viterbi word alignment of the example sentence pair. |
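Whether a permutation such as (2, 4, 1, 3) is reachable by a binary ITG can be checked with a simple shift-reduce pass. The sketch below is my own illustration, not the paper's code: it merges adjacent value ranges on a stack, which succeeds exactly when the permutation can be built from straight and inverted binary combinations.

```python
def itg_reachable(perm):
    """Check whether a permutation of 1..n can be produced by a binary
    ITG (straight/inverted rules only), i.e. is free of the (2,4,1,3)
    and (3,1,4,2) patterns.  Shift-reduce over value intervals: merge
    the top two stack entries whenever they cover adjacent ranges."""
    stack = []
    for v in perm:
        stack.append((v, v))
        while len(stack) >= 2:
            (a, b), (c, d) = stack[-2], stack[-1]
            if b + 1 == c:        # straight combination
                stack[-2:] = [(a, d)]
            elif d + 1 == a:      # inverted combination
                stack[-2:] = [(c, b)]
            else:
                break
    return len(stack) == 1        # one interval covering 1..n
```

For example, `itg_reachable((2, 1, 4, 3))` is true, while the (2, 4, 1, 3) pattern from the text leaves four unmergeable intervals on the stack and is rejected.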
Experiments | 7.1 Word Alignment Evaluation |
Experiments | The output of the word alignment systems (GIZA++ or ITG) was fed to a standard phrase extraction procedure that extracted all phrases of length up to 7 and estimated the conditional probabilities of source given target and target given source using relative frequencies. |
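The standard extraction procedure referenced here can be sketched as follows. This is a minimal version of the usual consistency criterion (it omits the common expansion over unaligned boundary words), with relative-frequency estimation on top; all names are illustrative.

```python
from collections import Counter

def extract_phrases(src, tgt, alignment, max_len=7):
    """Extract phrase pairs consistent with a word alignment: no link
    may connect a word inside the pair to a word outside it, and the
    pair must contain at least one link.  alignment: set of (i, j)."""
    pairs = []
    n = len(src)
    for i1 in range(n):
        for i2 in range(i1, min(n, i1 + max_len)):
            # target positions linked to the source span [i1, i2]
            js = [j for (i, j) in alignment if i1 <= i <= i2]
            if not js:
                continue
            j1, j2 = min(js), max(js)
            if j2 - j1 + 1 > max_len:
                continue
            # consistency: links into [j1, j2] must come from [i1, i2]
            if any(j1 <= j <= j2 and not (i1 <= i <= i2)
                   for (i, j) in alignment):
                continue
            pairs.append((tuple(src[i1:i2 + 1]), tuple(tgt[j1:j2 + 1])))
    return pairs

def relative_freq(pairs):
    """phi(t|s) estimated by relative frequency over extracted pairs."""
    joint = Counter(pairs)
    marginal = Counter(s for s, _ in pairs)
    return {(s, t): c / marginal[s] for (s, t), c in joint.items()}
```

Running this over all sentence pairs and pooling the counts gives the two conditional phrase tables (source given target is the same computation with the roles swapped).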
Introduction | As these word-level alignment models restrict the word alignment complexity by requiring each target word to align to zero or one source words, results are improved by aligning both source-to-target as well as target-to-source, |
Introduction | Finally, the set of phrases consistent with the word alignments are extracted from every sentence pair; these form the basis of the decoding process. |
Introduction | Furthermore, it would obviate the need for heuristic combination of word alignments. |
Phrasal Inversion Transduction Grammar | First we train a lower level word alignment model, then we place hard constraints on the phrasal alignment space using confident word links from this simpler model. |
Experiments | It is not surprising, since Bannard and Callison-Burch (2005) have pointed out that word alignment error is the major factor that influences the performance of the methods learning paraphrases from bilingual corpora. |
Experiments | The LW-based features validate the quality of word alignment and assign low scores to those aligned EC pattern pairs with incorrect alignment. |
Introduction | parsing and English-foreign language word alignment, (2) aligned patterns induction, which produces English patterns along with the aligned pivot patterns in the foreign language, (3) paraphrase patterns extraction, in which paraphrase patterns are extracted based on a log-linear model. |
Proposed Method | We conduct word alignment with Giza++ (Och and Ney, 2000) in both directions and then apply the grow-diag heuristic (Koehn et al., 2005) for symmetrization. |
Proposed Method | where a denotes the word alignment between the two phrases, and n is the number of words in the phrase. |
Cohesive Decoding | showed that a soft cohesion constraint is superior to a hard constraint for word alignment. |
Cohesive Phrasal Output | Previous approaches to measuring the cohesion of a sentence pair have worked with a word alignment (Fox, 2002; Lin and Cherry, 2003). |
Experiments | Word alignments are provided by GIZA++ (Och and Ney, 2003) with grow-diag-final combination, with infrastructure for alignment combination and phrase extraction provided by the shared task. |
Introduction | Fox (2002) showed that cohesion holds in the vast majority of cases for English-French, while Cherry and Lin (2006) have shown it to be a strong feature for word alignment. |
Conclusions and Future Work | In addition, word alignment is a hard constraint in our rule extraction. |
Conclusions and Future Work | We will study direct structure alignments to reduce the impact of word alignment errors. |
Experiments | We used GIZA++ (Och and Ney, 2004) and the “grow-diag-final” heuristic to generate m-to-n word alignments. |
Experiments | (2006) reports that discontinuities are very useful for translational equivalence analysis using binary-branching structures under word alignment and parse tree constraints, while they are of almost no use under word alignment constraints only. |
Automatic Evaluation Metrics | Given a pair of strings to compare (a system translation and a reference translation), METEOR (Banerjee and Lavie, 2005) first creates a word alignment between the two strings. |
Automatic Evaluation Metrics | These word alignments are created incrementally through a series of stages, where each stage only adds alignments between unigrams which have not been matched in previous stages. |
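The staged matching described here can be sketched as below; each stage is a matcher predicate over a unigram pair, and later stages see only unigrams left unmatched by earlier ones. Real METEOR additionally chooses among ambiguous matches so as to minimize alignment crossings, which this greedy sketch ignores, and its stem and synonym stages use a real stemmer and WordNet rather than the stand-ins shown in the test.

```python
def staged_alignment(hyp, ref, stages):
    """METEOR-style incremental unigram alignment.

    hyp, ref: token lists; stages: list of predicates f(h_word, r_word).
    Each stage only adds links between positions not matched by any
    earlier stage (greedy, first available match)."""
    links = []
    used_h, used_r = set(), set()
    for match in stages:
        for i, h in enumerate(hyp):
            if i in used_h:
                continue
            for j, r in enumerate(ref):
                if j in used_r:
                    continue
                if match(h, r):
                    links.append((i, j))
                    used_h.add(i)
                    used_r.add(j)
                    break
    return links
```

A typical stage list would be exact match first, then a stem-level match, then a synonym match, so that looser matchers never steal unigrams that an exact matcher could have linked.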
Introduction | Although a maximum weight bipartite graph was also used in the recent work of Taskar et al. (2005), their focus was on learning supervised models for single word alignment between sentences from a source and target language. |
Model Training and Application 3.1 Training | For the bilingual corpus, we also perform word alignment to get correspondences between source and target words. |
Model Training and Application 3.1 Training | According to word alignment results, we classify |
Model Training and Application 3.1 Training | We ran GIZA++ (Och and Ney, 2000) on the training corpus in both directions with IBM Model 4, and then applied the refinement rule described in Koehn et al. (2003) to obtain a many-to-many word alignment for each sentence pair. |