Index of papers in Proc. ACL 2010 that mention
  • word alignment
Chen, Wenliang and Kazama, Jun'ichi and Torisawa, Kentaro
Abstract
In our method, a target-side tree fragment that corresponds to a source-side tree fragment is identified via word alignment and mapping rules that are automatically learned.
Bilingual subtree constraints
Then we perform word alignment using a word-level aligner (Liang et al., 2006; DeNero and Klein, 2007).
Bilingual subtree constraints
Figure 8 shows an example of a processed sentence pair that has tree structures on both sides and word alignment links.
Bilingual subtree constraints
Then through word alignment links, we obtain the corresponding words of the words of 3758.
Experiments
Word alignments were generated from the Berkeley Aligner (Liang et al., 2006; DeNero and Klein, 2007) trained on a bilingual corpus having approximately 0.8M sentence pairs.
Introduction
Basically, a (candidate) dependency subtree in a source-language sentence is mapped to a subtree in the corresponding target-language sentence by using word alignment and mapping rules that are automatically learned.
Motivation
Suppose that we have an input sentence pair as shown in Figure 1, where the source sentence is in English, the target is in Chinese, the dashed undirected links are word alignment links, and the directed links between words indicate that they have a (candidate) dependency relation.
Motivation
We obtain their corresponding words “肉 (meat)”, “用 (use)”, and “叉子 (fork)” in Chinese via the word alignment links.
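The mapping step these excerpts describe, following alignment links from a set of source words to their target-side counterparts, can be pictured with a minimal sketch. Representing the alignment as a set of (source index, target index) pairs is an assumption for illustration, not the paper's data structure, and this covers only the word-level step of their subtree mapping.

```python
def project_span(src_indices, links):
    """Map a set of source word positions to the target positions they
    align to, with word alignment links given as (src_index, tgt_index)
    pairs. A minimal sketch of the word-level mapping step only."""
    return sorted({j for (i, j) in links if i in src_indices})

# Toy usage: positions of "use" and "fork" mapped to their Chinese counterparts.
links = {(0, 3), (1, 4), (2, 1), (3, 2)}
print(project_span({1, 3}, links))  # -> [2, 4]
```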
word alignment is mentioned in 11 sentences in this paper.
Duan, Xiangyu and Zhang, Min and Li, Haizhou
Abstract
The pipeline of most Phrase-Based Statistical Machine Translation (PB-SMT) systems starts from an automatically word-aligned parallel corpus.
Abstract
But words appear to be too fine-grained in some cases, such as non-compositional phrasal equivalences, where no clear word alignments exist.
Introduction
The pipeline of most Phrase-Based Statistical Machine Translation (PB-SMT) systems starts from an automatically word-aligned parallel corpus generated by word-based models (Brown et al., 1993), proceeds with the induction of a phrase table (Koehn et al., 2003) or a synchronous grammar (Chiang, 2007), and continues with a model-weight tuning step.
Introduction
But this manner has a deficiency: words are too fine-grained in some cases, such as non-compositional phrasal equivalences, where clear word alignments do not exist.
Introduction
No clear word alignments
Searching for Pseudo-words
Then we apply word alignment techniques to build pseudo-word alignments.
word alignment is mentioned in 18 sentences in this paper.
Liu, Zhanyi and Wang, Haifeng and Wu, Hua and Li, Sheng
Abstract
We make use of the collocation probabilities, which are estimated from monolingual corpora, in two aspects, namely improving word alignment for various kinds of SMT systems and improving phrase table for phrase-based SMT.
Abstract
The experimental results show that our method improves the performance of both word alignment and translation quality significantly.
Collocation Model
This method adapts the bilingual word alignment algorithm to the monolingual scenario to extract collocations only from monolingual corpora.
Collocation Model
2.1 Monolingual word alignment
Collocation Model
Then the monolingual word alignment algorithm is employed to align the potentially collocated words in the monolingual sentences.
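As a rough illustration of "monolingual word alignment" in the sense used above, the sketch below greedily links each word to its most strongly collocated partner within the same sentence. The paper adapts an IBM-style bilingual alignment model; this greedy version and the precomputed `colloc_prob` table are stand-ins for illustration only.

```python
def monolingual_align(sent, colloc_prob):
    """Toy monolingual word alignment: link every word to the other
    word in the same sentence with the highest collocation probability.
    The paper's model is an adapted bilingual aligner; this greedy
    sketch only illustrates the within-sentence linking idea."""
    links = []
    for i, w in enumerate(sent):
        candidates = [j for j in range(len(sent)) if j != i]
        if not candidates:
            continue  # one-word sentence: nothing to link
        j = max(candidates, key=lambda j: colloc_prob.get((w, sent[j]), 0.0))
        links.append((i, j))
    return links
```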
Introduction
Statistical bilingual word alignment (Brown et al.
Introduction
Although many methods were proposed to improve the quality of word alignments (Wu, 1997; Och and Ney, 2000; Marcu and Wong, 2002; Cherry and Lin, 2003; Liu et al., 2005; Huang, 2009), the correlation of the words in multi-word alignments is not fully considered.
Introduction
In phrase-based SMT (Koehn et al., 2003), the phrase boundary is usually determined based on the bidirectional word alignments.
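The bidirectional alignments mentioned above are typically symmetrized before phrase boundaries are read off. Below is a simplified grow-diag-style sketch (in the spirit of Koehn et al., 2003): start from the intersection of the two directions, then grow with neighboring links from the union. Actual heuristics vary by implementation, so treat this as an approximation.

```python
def grow_diag(e2f, f2e):
    """Symmetrize two directional word alignments (sets of (i, j) link
    pairs): start from their intersection, then repeatedly adopt union
    links adjacent to an existing link, provided one of the two words
    is still unaligned. A simplified grow-diag heuristic."""
    aligned = e2f & f2e
    union = e2f | f2e
    neighbors = [(-1, 0), (0, -1), (1, 0), (0, 1),
                 (-1, -1), (-1, 1), (1, -1), (1, 1)]
    changed = True
    while changed:
        changed = False
        for (i, j) in sorted(union - aligned):
            src_free = all(x != i for (x, _) in aligned)
            tgt_free = all(y != j for (_, y) in aligned)
            touches = any((i + di, j + dj) in aligned for di, dj in neighbors)
            if touches and (src_free or tgt_free):
                aligned.add((i, j))
                changed = True
    return aligned
```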
word alignment is mentioned in 46 sentences in this paper.
Riesa, Jason and Marcu, Daniel
Abstract
We present a simple yet powerful hierarchical search algorithm for automatic word alignment.
Abstract
We report results on Arabic-English word alignment and translation tasks.
Introduction
Automatic word alignment is generally accepted as a first step in training any statistical machine translation system.
Introduction
has motivated much recent work in discriminative modeling for word alignment (Moore, 2005; Ittycheriah and Roukos, 2005; Liu et al., 2005; Taskar et al., 2005; Blunsom and Cohn, 2006; Lacoste-Julien et al., 2006; Moore et al., 2006).
Introduction
We borrow ideas from both k-best parsing (Klein and Manning, 2001; Huang and Chiang, 2005; Huang, 2008) and forest-based and hierarchical phrase-based translation (Huang and Chiang, 2007; Chiang, 2007), and apply them to word alignment.
Word Alignment as a Hypergraph
Word alignments are built bottom-up on the parse tree.
Word Alignment as a Hypergraph
Initial partial alignments are enumerated and scored at preterminal nodes, each spanning a single column of the word alignment matrix.
Word Alignment as a Hypergraph
Initial alignments: We can construct a word alignment hierarchically, bottom-up, by making use of the structure inherent in syntactic parse trees.
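At the preterminal level described above, each source word owns one column of the alignment matrix, and its partial alignments are the subsets of target cells it may link to. The enumeration below sketches that search space; the `max_links` cap is an assumed pruning parameter, not a figure from the paper.

```python
from itertools import combinations

def column_alignments(tgt_len, max_links=2):
    """Enumerate partial alignments for a single column of the word
    alignment matrix: every subset of target positions (up to max_links
    links, including the empty 'null' alignment) that the source word
    may link to. A sketch of the preterminal search space."""
    for k in range(max_links + 1):
        for subset in combinations(range(tgt_len), k):
            yield frozenset(subset)

# Toy usage: a 3-word target side yields 1 + 3 + 3 = 7 partial alignments.
print(len(list(column_alignments(3))))  # -> 7
```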
word alignment is mentioned in 16 sentences in this paper.
Sun, Xu and Gao, Jianfeng and Micol, Daniel and Quirk, Chris
A Phrase-Based Error Model
Let J be the length of Q, L be the length of C, and A = a_1, ..., a_J be a hidden variable representing the word alignment.
A Phrase-Based Error Model
When scoring a given candidate pair, we further restrict our attention to those S, T, M triples that are consistent with the word alignment, which we denote as B(C, Q, A*).
A Phrase-Based Error Model
Once the word alignment is fixed, the final permutation is uniquely determined, so we can safely discard that factor.
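The consistency restriction in these excerpts can be sketched under the simplifying assumption that A* is a total function a[j] giving, for each position of Q, its aligned position in C. The paper's triples S, T, M and the set B(C, Q, A*) are richer than this; the check below only illustrates the constraint.

```python
def consistent_with_alignment(a, q_span, c_span):
    """Check that a candidate phrase pair is consistent with a word
    alignment a (a[j] = position in C aligned to the j-th word of Q):
    every link from inside the Q span must land inside the C span, and
    no link from outside may land inside it. A simplified sketch of
    the consistency condition behind B(C, Q, A*)."""
    s, t = q_span    # half-open [s, t) over Q
    s2, t2 = c_span  # half-open [s2, t2) over C
    inside_ok = all(s2 <= a[j] < t2 for j in range(s, t))
    outside_ok = all(not (s2 <= a[j] < t2)
                     for j in range(len(a)) if not (s <= j < t))
    return inside_ok and outside_ok
```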
word alignment is mentioned in 11 sentences in this paper.
Sun, Jun and Zhang, Min and Tan, Chew Lim
Abstract
It is suggested that the subtree alignment benefits both phrase and syntax based systems by relaxing the constraint of the word alignment.
Introduction
However, most syntax-based systems construct their syntactic translation rules based on word alignment, which not only suffers from pipeline errors but also fails to effectively utilize syntactic structural features.
Substructure Spaces for BTKs
4.1 Lexical and Word Alignment Features
Substructure Spaces for BTKs
Internal Word Alignment Features: The word alignment links largely account for the co-occurrence of the aligned terms.
Substructure Spaces for BTKs
We define the internal word alignment features as follows:
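The exact feature definitions are in the paper; as an illustrative stand-in, one natural internal word-alignment feature is the fraction of links leaving a source subtree's span that land inside the candidate target subtree's span. The spans-and-links representation below is an assumption for the sketch.

```python
def internal_alignment_ratio(src_span, tgt_span, links):
    """Fraction of word alignment links from the source subtree's span
    that fall inside the target subtree's span; spans are inclusive
    (lo, hi) index pairs, links are (src_index, tgt_index) pairs.
    An illustrative feature, not the paper's exact definition."""
    out = [(i, j) for (i, j) in links if src_span[0] <= i <= src_span[1]]
    if not out:
        return 0.0
    inside = sum(1 for (i, j) in out if tgt_span[0] <= j <= tgt_span[1])
    return inside / len(out)
```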
word alignment is mentioned in 15 sentences in this paper.
Liu, Shujie and Li, Chi-Ho and Zhou, Ming
Basics of ITG
From the viewpoint of word alignment , the terminal unary rules provide the links of word pairs, whereas the binary rules represent the reordering factor.
Basics of ITG
First of all, it imposes a 1-to-1 constraint in word alignment.
Basics of ITG
Secondly, the simple ITG leads to redundancy if word alignment is the sole purpose of applying ITG.
Basics of ITG Parsing
Based on the rules in normal form, ITG word alignment is done in a similar way to chart parsing (Wu, 1997).
Conclusion and Future Work
This paper reviews word alignment through ITG parsing, and clarifies the problem of ITG pruning.
Evaluation
Table 3 lists the word alignment time cost and SMT performance of different pruning methods.
Introduction
For this reason ITG has gained more and more attention recently in the word alignment community (Zhang and Gildea, 2005; Cherry and Lin, 2006; Haghighi et al., 2009).
The DPDI Framework
Discriminative approaches to word alignment use manually annotated alignment for sentence pairs.
The DPDI Framework
However, in reality there are often cases where a foreign word aligns to more than one English word.
word alignment is mentioned in 9 sentences in this paper.
Wuebker, Joern and Mauser, Arne and Ney, Hermann
Alignment
A phrase extraction is performed for each training sentence pair separately using the same word alignment as for the initialization.
Experimental Evaluation
For the heuristic phrase model, we first use GIZA++ (Och and Ney, 2003) to compute the word alignment on TRAIN.
Experimental Evaluation
Next we obtain a phrase table by extracting phrases from the word alignment.
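The extraction step mentioned above is the standard consistent-phrase-pair criterion (Och et al., 1999; Koehn et al., 2003). A compact sketch follows, omitting the usual expansion over unaligned boundary words; the representation of the alignment as a link set is assumed.

```python
def extract_phrases(src, tgt, links, max_len=7):
    """Extract phrase pairs consistent with a word alignment: all links
    from the source span must stay inside the target span and vice
    versa. links is a set of (src_index, tgt_index) pairs. Omits the
    extension over unaligned boundary words for brevity."""
    phrases = set()
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            tps = [j for (i, j) in links if i1 <= i <= i2]
            if not tps:
                continue
            j1, j2 = min(tps), max(tps)
            if j2 - j1 + 1 > max_len:
                continue
            # reject if any link enters the target span from outside the source span
            if any(j1 <= j <= j2 and not (i1 <= i <= i2) for (i, j) in links):
                continue
            phrases.add((" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1])))
    return phrases
```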
Introduction
[Figure: two training pipelines side by side, both producing a phrase translation table: heuristic phrase counts extracted from the Viterbi word alignment of EM-trained word translation models, versus phrase translation probabilities from EM-trained phrase alignment models.]
Phrase Model Training
The simplest of our generative phrase models estimates phrase translation probabilities by their relative frequencies in the Viterbi alignment of the data, similar to the heuristic model but with counts from the phrase-aligned data produced in training rather than computed on the basis of a word alignment .
Related Work
Their results show that it cannot reach performance competitive with extracting a phrase table from the word alignment by heuristics (Och et al., 1999).
Related Work
In addition, we do not restrict the training to phrases consistent with the word alignment, as was done in (DeNero et al., 2006).
Related Work
This allows us to recover from flawed word alignments.
word alignment is mentioned in 9 sentences in this paper.
Jiang, Wenbin and Liu, Qun
Introduction
For dependency projection, the relationship between words in the parsed sentences can be simply projected across the word alignment to words in the unparsed sentences, according to the DCA assumption (Hwa et al., 2005).
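Under the DCA assumption cited above, projection can be pictured as copying each dependency edge through the alignment. Below is a minimal sketch assuming a 1-to-1 word alignment; its silent edge-dropping is exactly the source of the conflicts and incomplete structures described in the next excerpts.

```python
def project_dependencies(src_edges, links):
    """Project dependency edges (head, dependent) from a parsed source
    sentence onto the target side through a 1-to-1 word alignment,
    following the direct-correspondence assumption (Hwa et al., 2005).
    Edges whose head or dependent is unaligned are silently dropped,
    one cause of the incomplete projected structures noted above."""
    a = dict(links)  # source index -> target index
    return [(a[h], a[d]) for (h, d) in src_edges if h in a and d in a]
```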
Introduction
Such a projection procedure suffers greatly from word alignment errors and the syntactic non-isomorphism between languages, which usually lead to conflicting relationship projections and incomplete projected dependency structures.
Introduction
Because of free translation, the syntactic non-isomorphism between languages, and word alignment errors, it is difficult to completely project the dependency structure from one language to another.
Projected Classification Instance
In order to alleviate the effect of word alignment errors, we base the projection on the alignment matrix, a compact representation of multiple GIZA++ (Och and Ney, 2000) results, rather than on the single word alignment used in previous dependency projection works.
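One simple way to picture an alignment matrix built from multiple GIZA++ runs is to average the link indicator matrices of the individual 1-best alignments. The paper's construction may differ, so treat this as a rough stand-in.

```python
import numpy as np

def alignment_matrix(alignments, src_len, tgt_len):
    """Average several 1-best word alignments (each a set of (i, j)
    links) into a soft link-strength matrix with values in [0, 1].
    A rough stand-in for the compact alignment matrix described above."""
    m = np.zeros((src_len, tgt_len))
    for links in alignments:
        for i, j in links:
            m[i, j] += 1.0
    return m / len(alignments)
```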
Projected Classification Instance
Figure 2: The word alignment matrix between a Chinese sentence and its English translation.
Related Works
Because of free translation, word alignment errors, and the heterogeneity between the two languages, it is impractical and less effective to project the dependency tree completely onto the target language sentence.
word alignment is mentioned in 6 sentences in this paper.
Yamangil, Elif and Shieber, Stuart M.
Abstract
These translation tasks are characterized by the relative ability to commit to parallel parse trees and availability of word alignments, yet the unavailability of large-scale data, calling for a Bayesian tree-to-tree formalism.
Conclusion
The future for this work would involve natural extensions such as mixing over the space of word alignments; this would allow application to MT-like tasks where flexible word reordering is allowed, such as abstractive sentence compression and paraphrasing.
Introduction
One approach is to use word alignments (where these can be reliably estimated, as in our testbed application) to align subtrees and extract rules (Och and Ney, 2004; Galley et al., 2004), but this leaves open the question of finding the right level of generality of the rules (how deep the rules should be and how much lexicalization they should involve), necessitating resorting to heuristics such as minimality of rules, and leading to
Introduction
possibility of searching over the infinite space of grammars (and, in machine translation, possible word alignments), thus sidestepping the narrowness problem outlined above as well.
Introduction
This task is characterized by the availability of word alignments, providing a clean testbed for investigating the effects of grammar extraction.
The STSG Model
In particular, we visit every tree pair and each of its source nodes i, and update its alignment by selecting between and within two choices: (a) unaligned, (b) aligned with some target node j or e. The number of possibilities j in (b) is significantly limited, firstly by the word alignment (for instance, a source node dominating a deleted subspan cannot be aligned with a target node), and secondly by the current alignment of other nearby aligned source nodes.
word alignment is mentioned in 6 sentences in this paper.