Abstract | The pipeline of most Phrase-Based Statistical Machine Translation (PB-SMT) systems starts from an automatically word-aligned parallel corpus.
Abstract | But words can be too fine-grained in some cases, such as non-compositional phrasal equivalences, where no clear word alignments exist.
Introduction | The pipeline of most Phrase-Based Statistical Machine Translation (PB-SMT) systems starts from an automatically word-aligned parallel corpus generated by word-based models (Brown et al., 1993), proceeds with the induction of a phrase table (Koehn et al., 2003) or a synchronous grammar (Chiang, 2007), and ends with a model-weight tuning step.
Introduction | But this pipeline has a deficiency: words are too fine-grained in some cases, such as non-compositional phrasal equivalences, where clear word alignments do not exist.
Searching for Pseudo-words | Then we apply word alignment techniques to build pseudo-word alignments. |
Abstract | Two decades after their invention, the IBM word-based translation models, widely available in the GIZA++ toolkit, remain the dominant approach to word alignment and an integral part of many statistical translation systems. |
Abstract | We explain how to implement this extension efficiently for large-scale data (also released as a modification to GIZA++) and demonstrate, in experiments on Czech, Arabic, Chinese, and Urdu to English translation, significant improvements over IBM Model 4 in both word alignment (up to +6.7 F1) and translation quality (up to +1.4 BLEU).
Experiments | We measured the accuracy of word alignments generated by GIZA++ with and without the ℓ0-norm,
Introduction | Automatic word alignment is a vital component of nearly all current statistical translation pipelines.
Introduction | Although state-of-the-art translation models use rules that operate on units bigger than words (like phrases or tree fragments), they nearly always use word alignments to drive extraction of those translation rules.
Introduction | The dominant approach to word alignment has been the IBM models (Brown et al., 1993) together with the HMM model (Vogel et al., 1996). |
Method | We start with a brief review of the IBM and HMM word alignment models, then describe how to extend them with a smoothed ℓ0 prior and how to train them efficiently.
Method | In word alignment, one well-known manifestation of overfitting is that rare words can act as “garbage collectors”.
Method | Previously (Vaswani et al., 2010), we used ALGENCAN, a nonlinear optimization toolkit, but this solution does not scale well to the number of parameters involved in word alignment models. |
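A general-purpose nonlinear solver is not strictly necessary here: each translation distribution lives on a probability simplex, so one scalable alternative is projected gradient ascent with the standard sort-based Euclidean projection onto the simplex. The sketch below is our own illustration under that assumption (function names, step size, and the toy smoothed-ℓ0 penalty are ours, not the authors' implementation):

```python
import math

def project_simplex(v):
    """Euclidean projection of a vector v onto the probability simplex
    (sort-based algorithm, O(n log n))."""
    u = sorted(v, reverse=True)
    s, theta = 0.0, 0.0
    for j, x in enumerate(u, start=1):
        s += x
        if x - (s - 1.0) / j > 0:
            theta = (s - 1.0) / j
    return [max(x - theta, 0.0) for x in v]

def smoothed_l0(theta, beta=0.05):
    """Smoothed l0 penalty: sum_i (1 - exp(-theta_i / beta)) approaches the
    number of nonzero components as beta -> 0 (theta assumed nonnegative)."""
    return sum(1.0 - math.exp(-t / beta) for t in theta)

def projected_gradient_step(theta, grad, step=0.1):
    """One ascent step on the (penalized) objective, then a projection back
    onto the simplex so theta remains a probability distribution."""
    return project_simplex([t + step * g for t, g in zip(theta, grad)])
```

Each EM-style update then alternates expected-count collection with a few such projected steps, keeping every parameter vector a valid distribution without an external constrained-optimization toolkit.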
BLEU and PORT | We use word alignment to compute the two permutations (LRscore also uses word alignment).
BLEU and PORT | The word alignment between the source input and reference is computed using GIZA++ (Och and Ney, 2003) beforehand with the default settings, then is refined with the heuristic grow-diag-final-and; the word alignment between the source input and the translation is generated by the decoder with the help of word alignment inside each phrase pair. |
BLEU and PORT | These encode one-to-one relations but not one-to-many, many-to-one, many-to-many or null relations, all of which can occur in word alignments.
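Reducing a many-to-many alignment to the one-to-one permutation such a metric needs therefore requires a heuristic. The sketch below is one plausible choice (our own illustration, not the paper's exact procedure): map each source position to the earliest target position it links to, let unaligned words inherit the previous position, then rank the positions:

```python
def alignment_to_permutation(links, src_len):
    """Reduce a (possibly many-to-many) alignment to a permutation.
    links: set of (source_pos, target_pos) pairs.
    Each source position gets the earliest target position it links to;
    unaligned positions inherit the previous value (one heuristic of several)."""
    pos, prev = [], -1
    for i in range(src_len):
        tgt = [j for (s, j) in links if s == i]
        if tgt:
            prev = min(tgt)
        pos.append(prev)
    # rank the positions (ties broken left to right) to obtain a permutation
    order = sorted(range(src_len), key=lambda i: (pos[i], i))
    perm = [0] * src_len
    for rank, i in enumerate(order):
        perm[i] = rank + 1
    return perm
```

Different tie-breaking and null-handling choices yield different permutations, which is exactly why such reductions lose information relative to the full alignment.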
Experiments | In order to compute the v part of PORT, we require source-target word alignments for the references and MT outputs. |
Experiments | Also, v depends on source-target word alignments for reference and test sets. |
Experiments | 3.2.5 Robustness to word alignment errors |
Abstract | Finally, we integrate the transliteration module into the GIZA++ word aligner and evaluate it on two word alignment tasks, achieving improvements in both precision and recall measured against gold-standard word alignments.
Experiments | We evaluate our transliteration mining algorithm on three tasks: transliteration mining from Wikipedia InterLanguage Links, transliteration mining from parallel corpora, and word alignment using a word aligner with a transliteration component. |
Experiments | In the word alignment experiment, we integrate a transliteration module, trained on the transliteration pairs extracted by our method, into a word aligner and show a significant improvement.
Experiments | We use the English/Hindi corpus from the shared task on word alignment, organized as part of the ACL 2005 Workshop on Building and Using Parallel Texts (WA05) (Martin et al., 2005).
Introduction | Finally we integrate a transliteration module into the GIZA++ word aligner and show that it improves word alignment quality. |
Introduction | We evaluate our word alignment system on two language pairs using gold standard word alignments and achieve improvements of 10% and 13.5% in precision and 3.5% and 13.5% in recall. |
Introduction | Section 4 describes the evaluation of our mining method through both gold standard evaluation and through using it to improve word alignment quality. |
Experimental Evaluation | We compare the accuracy of our proposed method of joint phrase alignment and extraction using the FLAT, HIER and HLEN models, with a baseline of using word alignments from GIZA++ and heuristic phrase extraction. |
Flat ITG Model | It should be noted that while Model 1 probabilities are used, they are only soft constraints, compared with the hard constraint of choosing a single word alignment used in most previous phrase extraction approaches. |
Hierarchical ITG Model | Because of this, previous research has combined FLAT with heuristic phrase extraction, which exhaustively combines all adjacent phrases permitted by the word alignments (Och et al., 1999). |
Hierarchical ITG Model | Figure 1: A word alignment (a), and its derivations according to FLAT (b) and HIER (c).
Introduction | However, as DeNero and Klein (2010) note, this two-step approach results in word alignments that are not optimal for the final task of generating translations.
Introduction | As a solution to this, they proposed a supervised discriminative model that performs joint word alignment and phrase extraction, and found that joint estimation of word alignments and extraction sets improves both word alignment accuracy and translation results. |
Phrase Extraction | Figure 3: The phrase, block, and word alignments used in heuristic phrase extraction. |
Phrase Extraction | The traditional method for heuristic phrase extraction from word alignments exhaustively enumerates all phrases up to a certain length consistent with the alignment (Och et al., 1999). |
Phrase Extraction | We will call this heuristic extraction from word alignments HEUR-W. |
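The consistency check at the heart of HEUR-W is compact: a phrase pair is extracted only if no alignment link connects a word inside the pair to a word outside it. A minimal sketch (our own rendering, ignoring the usual extension over unaligned boundary words) might look like:

```python
def extract_phrases(links, src_len, max_len=7):
    """Enumerate phrase pairs consistent with a word alignment.
    links: set of (source_pos, target_pos) pairs.
    A pair of spans is consistent iff no link crosses its boundary."""
    phrases = set()
    for i1 in range(src_len):
        for i2 in range(i1, min(i1 + max_len, src_len)):
            # target positions linked to the source span [i1, i2]
            tgt = [j for (i, j) in links if i1 <= i <= i2]
            if not tgt:
                continue
            j1, j2 = min(tgt), max(tgt)
            if j2 - j1 + 1 > max_len:
                continue
            # reject if any link enters [j1, j2] from outside [i1, i2]
            if any(j1 <= j <= j2 and not (i1 <= i <= i2) for (i, j) in links):
                continue
            phrases.add((i1, i2, j1, j2))
    return phrases
```

On a monotone two-word alignment this yields the two single-word pairs plus the combined pair, matching the exhaustive enumeration the text describes.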
Abstract | Unsupervised word alignment is most often modeled as a Markov process that generates a sentence f conditioned on its translation e.
Experimental Results | Extraction-based evaluations of alignment better coincide with the role of word aligners in machine translation systems (Ayan and Dorr, 2006). |
Introduction | Word alignment is the task of identifying corresponding words in sentence pairs. |
Introduction | The standard approach to word alignment employs directional Markov models that align the words of a sentence f to those of its translation e, such as IBM Model 4 (Brown et al., 1993) or the HMM-based alignment model (Vogel et al., 1996).
Model Definition | Our bidirectional model G = (V, D) is a globally normalized, undirected graphical model of the word alignment for a fixed sentence pair (e, f). Each vertex in the vertex set V corresponds to a model variable V_i, and each undirected edge in the edge set D corresponds to a pair of variables (V_i, V_j). Each vertex has an associated potential function ω_i that assigns a real-valued potential to each possible value v_i of V_i. Likewise, each edge has an associated potential function ω_ij(v_i, v_j) that scores pairs of values.
Model Definition | The highest-probability word alignment vector under the model for a given sentence pair (e, f) can be computed exactly using the standard Viterbi algorithm for HMMs in O(|e|^2 · |f|) time.
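That decoding step can be sketched directly: the states are positions of e, the observations are the words of f, and each of the |f| columns takes O(|e|^2) transitions. The lexical and jump tables below are invented toy values, not any paper's parameters:

```python
import math

def viterbi_hmm(f, e, t, d):
    """Viterbi decoding for HMM word alignment in O(|e|^2 * |f|) time.
    t: lexical table {(f_word, e_word): prob};
    d: jump table {signed distance: prob}. NULL alignments omitted."""
    I, J = len(e), len(f)
    NEG = float("-inf")
    delta = [[NEG] * I for _ in range(J)]
    back = [[0] * I for _ in range(J)]
    for i in range(I):
        delta[0][i] = math.log(t.get((f[0], e[i]), 1e-9))
    for j in range(1, J):
        for i in range(I):
            best, arg = NEG, 0
            for ip in range(I):  # best predecessor state
                s = delta[j - 1][ip] + math.log(d.get(i - ip, 1e-9))
                if s > best:
                    best, arg = s, ip
            delta[j][i] = best + math.log(t.get((f[j], e[i]), 1e-9))
            back[j][i] = arg
    i = max(range(I), key=lambda k: delta[J - 1][k])
    a = [i]
    for j in range(J - 1, 0, -1):  # backtrace
        i = back[j][i]
        a.append(i)
    return a[::-1]
```

With a lexical table that prefers the diagonal, the decoder recovers the monotone alignment, as expected.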
Model Definition | An alignment vector a can be converted trivially into a set of word alignment links A: |
Related Work | In addition, supervised word alignment models often use the output of directional unsupervised aligners as features or pruning signals. |
Related Work | A parallel idea that closely relates to our bidirectional model is posterior regularization, which has also been applied to the word alignment problem (Graca et al., 2008). |
Related Work | Another similar line of work applies belief propagation to factor graphs that enforce a one-to-one word alignment (Cromieres and Kurohashi, 2009). |
Abstract | Experiments on Chinese-English translation demonstrated the effectiveness of our approach in enhancing the quality of overall translation, name translation and word alignment over a high-quality MT baseline.
Experiments | Therefore, it is important to use name-replaced corpora for rule extraction to take full advantage of the improved word alignment.
Experiments | 5.4 Word Alignment |
Experiments | It is also important to investigate the impact of our NAMT approach on improving word alignment . |
Introduction | names in parallel corpora, updating word segmentation, word alignment and grammar extraction (Section 3.1). |
Name-aware MT | We pair two entities from two languages if they have the same entity type and are mapped together by word alignment.
Name-aware MT | First, we replace tagged name pairs with their entity types, and then use Giza++ and symmetrization heuristics to regenerate word alignment.
Name-aware MT | Since the name tags appear very frequently, the existence of such tags yields improvement in word alignment quality. |
Abstract | It is suggested that the subtree alignment benefits both phrase and syntax based systems by relaxing the constraint of the word alignment.
Introduction | However, most syntax-based systems construct the syntactic translation rules based on word alignment, which not only suffers from pipeline errors, but also fails to effectively utilize the syntactic structural features.
Substructure Spaces for BTKs | 4.1 Lexical and Word Alignment Features |
Substructure Spaces for BTKs | Internal Word Alignment Features: The word alignment links largely account for the co-occurrence of the aligned terms.
Substructure Spaces for BTKs | We define the internal word alignment features as follows: |
A Phrase-Based Error Model | Let J be the length of Q, L be the length of C, and A = a_1, ..., a_J be a hidden variable representing the word alignment.
A Phrase-Based Error Model | When scoring a given candidate pair, we further restrict our attention to those S, T, M triples that are consistent with the word alignment, which we denote as B(C, Q, A*).
A Phrase-Based Error Model | Once the word alignment is fixed, the final permutation is uniquely determined, so we can safely discard that factor. |
Abstract | We present a simple yet powerful hierarchical search algorithm for automatic word alignment . |
Abstract | We report results on Arabic-English word alignment and translation tasks. |
Introduction | Automatic word alignment is generally accepted as a first step in training any statistical machine translation system. |
Introduction | has motivated much recent work in discriminative modeling for word alignment (Moore, 2005; Ittycheriah and Roukos, 2005; Liu et al., 2005; Taskar et al., 2005; Blunsom and Cohn, 2006; Lacoste-Julien et al., 2006; Moore et al., 2006).
Introduction | We borrow ideas from both k-best parsing (Klein and Manning, 2001; Huang and Chiang, 2005; Huang, 2008) and forest-based and hierarchical phrase-based translation (Huang and Chiang, 2007; Chiang, 2007), and apply them to word alignment.
Word Alignment as a Hypergraph | Word alignments are built bottom-up on the parse tree. |
Word Alignment as a Hypergraph | Initial partial alignments are enumerated and scored at preterminal nodes, each spanning a single column of the word alignment matrix. |
Word Alignment as a Hypergraph | Initial alignments We can construct a word alignment hierarchically, bottom-up, by making use of the structure inherent in syntactic parse trees. |
Abstract | We make use of the collocation probabilities, which are estimated from monolingual corpora, in two aspects, namely improving word alignment for various kinds of SMT systems and improving phrase table for phrase-based SMT. |
Abstract | The experimental results show that our method improves the performance of both word alignment and translation quality significantly. |
Collocation Model | This method adapts the bilingual word alignment algorithm to the monolingual scenario to extract collocations only from monolingual corpora.
Collocation Model | 2.1 Monolingual word alignment |
Collocation Model | Then the monolingual word alignment algorithm is employed to align the potentially collocated words in the monolingual sentences. |
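The key twist in monolingual word alignment is that a sentence plays both the source and target roles, so a word must be forbidden from aligning to itself. A toy greedy version of that constraint is sketched below (our own illustration; the actual method re-estimates an IBM-style model monolingually rather than using a fixed table):

```python
def monolingual_align(sentence, t):
    """Greedy monolingual 'alignment': each word links to its most likely
    collocate in the same sentence, with self-alignment (i == j) forbidden.
    t: collocation table {(word, collocate): prob}; assumes len(sentence) >= 2."""
    a = []
    for j, w in enumerate(sentence):
        best = max((i for i in range(len(sentence)) if i != j),
                   key=lambda i: t.get((w, sentence[i]), 0.0))
        a.append(best)
    return a
```

Words that align to each other in both directions (here "strong" and "tea") are then taken as collocation candidates.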
Introduction | Statistical bilingual word alignment (Brown et al. |
Introduction | Although many methods were proposed to improve the quality of word alignments (Wu, 1997; Och and Ney, 2000; Marcu and Wong, 2002; Cherry and Lin, 2003; Liu et al., 2005; Huang, 2009), the correlation of the words in multi-word alignments is not fully considered. |
Introduction | In phrase-based SMT (Koehn et al., 2003), the phrase boundary is usually determined based on the bidirectional word alignments.
Abstract | In our method, a target-side tree fragment that corresponds to a source-side tree fragment is identified via word alignment and mapping rules that are automatically learned. |
Bilingual subtree constraints | Then we perform word alignment using a word-level aligner (Liang et al., 2006; DeNero and Klein, 2007). |
Bilingual subtree constraints | Figure 8 shows an example of a processed sentence pair that has tree structures on both sides and word alignment links. |
Bilingual subtree constraints | Then through word alignment links, we obtain the corresponding words of the words of 3758. |
Experiments | Word alignments were generated from the Berkeley Aligner (Liang et al., 2006; DeNero and Klein, 2007) trained on a bilingual corpus having approximately 0.8M sentence pairs. |
Introduction | Basically, a (candidate) dependency subtree in a source-language sentence is mapped to a subtree in the corresponding target-language sentence by using word alignment and mapping rules that are automatically learned. |
Motivation | Suppose that we have an input sentence pair as shown in Figure 1, where the source sentence is in English, the target is in Chinese, the dashed undirected links are word alignment links, and the directed links between words indicate that they have a (candidate) dependency relation.
Motivation | We obtain their corresponding Chinese words (glossed as “meat”, “use”, and “fork”) via the word alignment links.
Abstract | In contrast, alignment-based methods use a word alignment model to fulfill this task, which avoids parsing errors because no parsing is required.
Introduction | A word can find its corresponding modifiers by using a word alignment |
Introduction | Furthermore, this paper naturally addresses another question: is it useful for opinion target extraction when we combine syntactic patterns and a word alignment model into a unified model?
Introduction | Then, these partial alignment links can be regarded as the constraints for a standard unsupervised word alignment model.
Opinion Target Extraction Methodology | In the first component, we respectively use syntactic patterns and unsupervised word alignment model (WAM) to capture opinion relations. |
Opinion Target Extraction Methodology | In addition, we employ a partially supervised word alignment model (PSWAM) to incorporate syntactic information into WAM. |
Opinion Target Extraction Methodology | 3.1.2 Unsupervised Word Alignment Model |
Conclusion | We have shown that the word alignment models IBM-3 and IBM-4 can be turned into nondeficient variants.
Introduction | While most people think of the translation and word alignment models IBM-3 and IBM-4 as inherently deficient models (i.e. |
Introduction | The source code of this project is available in our word alignment software RegAligner, version 1.2 and later.
Introduction | Related Work Today’s most widely used models for word alignment are still the models IBM 1-5 of Brown et al. |
The models IBM-3, IBM-4 and IBM-5 | The probability p(f_1^J | e_1^I) of getting the foreign sentence as a translation of the English one is modeled by introducing the word alignment a as a hidden variable:
The models IBM-3, IBM-4 and IBM-5 | For each i = 1, ..., I, decide on the number Φ_i of foreign words aligned to e_i.
The models IBM-3, IBM-4 and IBM-5 | For each i = 1, 2, ..., I and k = 1, ..., Φ_i, decide on (a) the identity f_{i,k} of the next foreign word aligned to e_i.
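This fertility-based generative story can be rendered as a toy sampler (our own hedged sketch: deterministic fertilities, no NULL word, and the distortion/placement step is omitted; all tables are invented):

```python
import random

def sample_ibm3(e, fertility, ttable, seed=0):
    """Toy rendering of the IBM-3 generative story.
    fertility: {e_word: number of foreign words aligned to it};
    ttable: {e_word: list of candidate foreign words}."""
    rng = random.Random(seed)
    f = []
    for word in e:
        # step 1: the number Phi_i of foreign words aligned to e_i
        for _ in range(fertility[word]):
            # step 2: the identity of each aligned foreign word
            f.append(rng.choice(ttable[word]))
    return f  # step 3 (placement / distortion) is omitted in this sketch
```

The omitted placement step is precisely where IBM-3 and IBM-4 spend probability mass on invalid positions, which is the deficiency the surrounding text discusses.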
Training the New Variants | For the task of word alignment, we infer the parameters of the models using maximum likelihood.
Training the New Variants | This task is also needed for the actual task of word alignment (annotating a given sentence pair with an alignment).
Conclusions and Future Work | Although its greater sensitivity to word alignment errors enables SncTSSG to capture additional noncontiguous language phenomena, it also induces many redundant noncontiguous rules.
Experiments | We use the m-to-n word alignments output by GIZA++ to extract the tree sequence pairs.
Experiments | The STSSG or any contiguous translational equivalence based model is unable to attain the corresponding target output for this idiom word via the noncontiguous word alignment and considers it an out-of-vocabulary (OOV) item.
Experiments | On the contrary, the SncTSSG based model can capture the noncontiguous tree sequence pair consistent with the word alignment and further provide a reasonable target translation. |
Introduction | (2006) statistically report that discontinuities are very useful for translational equivalence analysis using binary branching structures under word alignment and parse tree constraints. |
Tree Sequence Pair Extraction | Data structure: p[j1, j2] to store tree sequence pairs covering source span [j1, j2] 1: foreach source span [j1, j2], do 2: find a target span [i1, i2] with minimal length covering all the target words aligned to [j1, j2] 3: if all the target words in [i1, i2] are aligned with source words only in [j1, j2], then 4: Pair each source tree sequence covering [j1, j2] with those in target covering [i1, i2] as a contiguous tree sequence pair
Tree Sequence Pair Extraction | 7: create sub-span set s([i1,i2]) to cover all the target words aligned to [j1,j2] |
Tree Sequence Pair Extraction | 13: find a source span [j1, j2] with minimal length covering all the source words aligned to [i1,i2]
Abstract | However, most previous approaches to bilingual tagging assume word alignments are given as fixed input, which can cause cascading errors. |
Abstract | We observe that NER label information can be used to correct alignment mistakes, and present a graphical model that performs bilingual NER tagging jointly with word alignment, by combining two monolingual tagging models with two unidirectional alignment models.
Abstract | Experiments on the OntoNotes dataset demonstrate that our method yields significant improvements in both NER and word alignment over state-of-the-art monolingual baselines. |
Bilingual NER by Agreement | We also assume that a set of word alignments (A = {(i, j) : e_i ↔ f_j}) is given by a word aligner and remains fixed in our model.
Bilingual NER by Agreement | The assumption in the hard agreement model can also be violated if there are word alignment errors. |
Introduction | In this work, we first develop a bilingual NER model (denoted as BI-NER) by embedding two monolingual CRF-based NER models into a larger undirected graphical model, and introduce additional edge factors based on word alignment (WA). |
Introduction | Our method does not require any manual annotation of word alignments or named entities over the bilingual training data. |
Introduction | The aforementioned BI-NER model assumes fixed alignment input given by an underlying word aligner . |
Joint Alignment and NER Decoding | To capture this intuition, we extend the BI-NER model to jointly perform word alignment and NER decoding, and call the resulting model BI-NER-WA. |
Abstract | This study proposes a word alignment model based on a recurrent neural network (RNN), in which an unlimited alignment history is represented by recurrently connected hidden layers. |
Abstract | The RNN-based model outperforms the feed-forward neural network-based model (Yang et al., 2013) as well as the IBM Model 4 under Japanese-English and French-English word alignment tasks, and achieves comparable translation performance to those baselines for Japanese-English and Chinese-English translation tasks. |
Introduction | Automatic word alignment is an important task for statistical machine translation. |
Introduction | We assume that this property would fit with a word alignment task, and we propose an RNN-based word alignment model. |
Introduction | (2013) trained their model from word alignments produced by traditional unsupervised probabilistic models. |
Related Work | Various word alignment models have been proposed. |
Related Work | As an instance of discriminative models, we describe an FFNN-based word alignment model (Yang et al., 2013), which is our baseline. |
Training | GEN is a subset of all possible word alignments Φ, which is generated by beam search.
Training | We evaluated the alignment performance of the proposed models with two tasks: Japanese-English word alignment with the Basic Travel Expression Corpus (BTEC) (Takezawa et al., 2002) and French-English word alignment with the Hansard dataset (Hansards) from the 2003 NAACL shared task (Mihalcea and Pedersen, 2003).
A Generic Phrase Training Procedure | We first train word alignment models and will use them to evaluate the goodness of a phrase and a phrase pair. |
A Generic Phrase Training Procedure | Beginning with a flat lexicon, we train the IBM Model-1 word alignment model with 10 iterations for each translation direction.
A Generic Phrase Training Procedure | We then train HMM word alignment models (Vogel et al., 1996) in two directions simultaneously by merging statistics collected in the |
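The Model 1 stage of such a pipeline fits only a lexical table t(f|e) by EM and is small enough to sketch in full (variable names are ours; a real implementation would also add a NULL source word):

```python
import collections

def train_model1(bitext, iterations=10):
    """EM training of IBM Model 1 lexical probabilities t(f|e).
    bitext: list of (foreign_words, english_words) sentence pairs."""
    pairs = {(f, e) for fs, es in bitext for f in fs for e in es}
    f_vocab = {f for f, _ in pairs}
    t = {pair: 1.0 / len(f_vocab) for pair in pairs}  # flat lexicon
    for _ in range(iterations):
        count = collections.defaultdict(float)
        total = collections.defaultdict(float)
        for fs, es in bitext:
            for f in fs:
                # E step: distribute each f over candidate English words
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M step: renormalize expected counts
        t = {(f, e): count[(f, e)] / total[e] for (f, e) in pairs}
    return t
```

Even on two sentence pairs, co-occurrence statistics pull the table toward the correct lexical links, which then seed the HMM stage described above.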
Abstract | Experimental results demonstrate consistent and significant improvement over the widely used method that is based on word alignment matrix only. |
Introduction | The most widely used approach derives phrase pairs from word alignment matrix (Och and Ney, 2003; Koehn et al., 2003). |
Introduction | Other methods do not depend on word alignments only, such as directly modeling phrase alignment in a joint generative way (Marcu and Wong, 2002), pursuing information extraction perspective (Venugopal et al., 2003), or augmenting with model-based phrase pair posterior (Deng and Byrne, 2005). |
Introduction | On the other hand, there are valid translation pairs in the training corpus that are not learned due to word alignment errors as shown in Deng and Byrne (2005). |
Abstract | Automatic word alignment is a key step in training statistical machine translation systems. |
Abstract | Despite much recent work on word alignment methods, alignment accuracy increases often produce little or no improvements in machine translation quality. |
Introduction | The word alignment problem has received much recent attention, but improvements in standard measures of word alignment performance often do not result in better translations. |
Introduction | In this work, we show that by changing the way the word alignment models are trained and |
Introduction | We present extensive experimental results evaluating a new training scheme for unsupervised word alignment models: an extension of the Expectation Maximization algorithm that allows effective injection of additional information about the desired alignments into the unsupervised training process. |
Statistical word alignment | Statistical word alignment (Brown et al., 1994) is the task of identifying which words are translations of each other in a bilingual sentence corpus.
Statistical word alignment | Figure 2 shows two examples of word alignment of a sentence pair. |
Statistical word alignment | Due to the ambiguity of the word alignment task, it is common to distinguish two kinds of alignments (Och and Ney, 2003). |
Inflection prediction models | ture of English and word alignment information. |
Integration of inflection models with MT systems | Stemming the target sentences is expected to be helpful for word alignment, especially when the stemming operation is defined so that the word alignment becomes more one-to-one (Goldwater and McClosky, 2005). |
Integration of inflection models with MT systems | However, for some language pairs, stemming one language can make word alignment worse, if it leads to more violations in the assumptions of current word alignment models, rather than making the source look more like the target. |
Integration of inflection models with MT systems | Note that it may be better to use the word alignment maintained as part of the translation hypotheses during search, but our solution is more suitable to situations where these cannot be easily obtained.
Introduction | Evidence for this difficulty is the fact that there has been very little work investigating the use of such independent sub-components, though we started to see some successful cases in the literature, for example in word alignment (Fraser and Marcu, 2007), target language capitalization (Wang et al., 2006) and case marker generation (Toutanova and Suzuki, 2007). |
MT performance results | Finally, we can see that using stemming at the word alignment stage further improved both the oracle and the achieved results. |
MT performance results | pressed in English at the word level, these results are consistent with previous results using stemming to improve word alignment . |
Machine translation systems and data | It uses the same lexicalized-HMM model for word alignment as the treelet system, and uses the standard extraction heuristics to extract phrase pairs using forward and backward alignments. |
Data and Tools | 3.2 Word Alignments |
Data and Tools | In our approach, word alignments for the parallel text are required. |
Data and Tools | We perform word alignments with the open source GIZA++ toolkit.
Experiments | By using IGT data, not only can we obtain more accurate word alignments, but we can also extract useful cross-lingual information for the resource-poor language.
Our Approach | In our scenario, we have a set of aligned parallel data P = {(x_i^s, x_i^t, a_i)}, where a_i is the word alignment for the pair of source-target sentences (x_i^s, x_i^t), and a set of unlabeled sentences of the target language U = {x_i^t}. We also have a trained English parsing model p_λE. Then the K in equation (7) can be divided into two cases, according to whether x_i belongs to the parallel data set P or the unlabeled data set U.
Our Approach | We define the transferring distribution by defining the transferring weight utilizing the English parsing model p_λE via parallel data with word alignments:
Our Approach | By reducing unaligned edges to their delexicalized forms, we can still use those delexicalized features, such as part-of-speech tags, for the unaligned edges, and can address the problem that automatically generated word alignments include errors.
Abstract | Incorporating a sparse prior using Variational Bayes biases the models toward generalizable, parsimonious parameter sets, leading to significant improvements in word alignment.
Abstract | This preference for sparse solutions together with effective pruning methods forms a phrase alignment regimen that produces better end-to-end translations than standard word alignment approaches. |
Bootstrapping Phrasal ITG from Word-based ITG | The scope of iterative phrasal ITG training, therefore, is limited to determining the boundaries of the phrases anchored on the given one-to-one word alignments.
Bootstrapping Phrasal ITG from Word-based ITG | Second, we do not need to worry about non-ITG word alignments, such as the (2, 4, 1, 3) permutation patterns.
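The (2, 4, 1, 3) pattern, together with its mirror (3, 1, 4, 2), is exactly what a binary ITG cannot derive: a permutation is ITG-alignable iff it reduces to a single block by repeatedly merging adjacent blocks whose value ranges are contiguous. A small shift-reduce check captures this (our own sketch of the standard reduction):

```python
def is_itg(perm):
    """Return True iff the permutation (values 1..n) is derivable by a
    binary ITG: it must reduce to one block by merging adjacent blocks
    with contiguous value ranges. (2,4,1,3) and (3,1,4,2) never reduce."""
    stack = []  # each entry is a (lo, hi) value range of a merged block
    for x in perm:
        stack.append((x, x))
        while len(stack) >= 2:
            (a1, b1), (a2, b2) = stack[-2], stack[-1]
            if b1 + 1 == a2 or b2 + 1 == a1:  # straight or inverted merge
                stack[-2:] = [(min(a1, a2), max(b1, b2))]
            else:
                break
    return len(stack) == 1
```

This is why the hard one-to-one anchoring in the text sidesteps the non-ITG cases entirely: any alignment built by composing such reductions is ITG-representable by construction.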
Bootstrapping Phrasal ITG from Word-based ITG | Figure 3 (a) shows all possible non-compositional phrases given the Viterbi word alignment of the example sentence pair. |
Experiments | 7.1 Word Alignment Evaluation |
Experiments | The output of the word alignment systems (GIZA++ or ITG) were fed to a standard phrase extraction procedure that extracted all phrases of length up to 7 and estimated the conditional probabilities of source given target and target given source using relative frequencies. |
Introduction | As these word-level alignment models restrict the word alignment complexity by requiring each target word to align to zero or one source words, results are improved by aligning both source-to-target as well as target-to-source, |
Introduction | Finally, the set of phrases consistent with the word alignments are extracted from every sentence pair; these form the basis of the decoding process. |
Introduction | Furthermore it would obviate the need for heuristic combination of word alignments . |
Phrasal Inversion Transduction Grammar | First we train a lower level word alignment model, then we place hard constraints on the phrasal alignment space using confident word links from this simpler model. |
Abstract | Bidirectional models of word alignment are an appealing alternative to post-hoc combinations of directional word aligners.
Background | The focus of this work is on the word alignment decoding problem. |
Background | Before turning to the model of interest, we first introduce directional word alignment . |
Background | 2.1 Word Alignment |
Introduction | Word alignment is a critical first step for building statistical machine translation systems. |
Introduction | In order to ensure accurate word alignments, most systems employ a post-hoc symmetrization step to combine directional word aligners, such as IBM Model 4 (Brown et al., 1993) or hidden Markov model (HMM) based aligners (Vogel et al., 1996).
Introduction | We begin in Section 2 by formally describing the directional word alignment problem. |
Related Work | Cromieres and Kurohashi (2009) use belief propagation on a factor graph to train and decode a one-to-one word alignment problem. |
Related Work | Graca et al. (2008) use posterior regularization to constrain the posterior probability of the word alignment problem to be symmetric and bijective.
Abstract | In this paper, we explore a novel bilingual word alignment approach based on DNN (Deep Neural Network), which has been proven to be very effective in various machine learning tasks (Collobert et al., 2011). |
Abstract | We describe in detail how we adapt and extend the CD-DNN-HMM (Dahl et al., 2012) method introduced in speech recognition to the HMM-based word alignment model, in which bilingual word embedding is discriminatively learnt to capture lexical translation information, and surrounding words are leveraged to model context information in bilingual sentences.
Abstract | Experiments on a large scale English-Chinese word alignment task show that the proposed method outperforms the HMM and IBM model 4 baselines by 2 points in F-score. |
DNN for word alignment | Our DNN word alignment model extends classic HMM word alignment model (Vogel et al., 1996). |
DNN for word alignment | Given a sentence pair (e, f), HMM word alignment takes the following form: |
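Concretely, the HMM factors the joint probability of f and the alignment a into a product of jump and lexical terms, p(f, a | e) = ∏_j p_d(a_j − a_{j−1}) · t(f_j | e_{a_j}); the DNN variant replaces the count-based tables with network scores but keeps this factorization. A direct transcription of the classic form (toy tables, no NULL state, a_0 fixed to 0) is:

```python
def hmm_joint_prob(f, e, a, t, d):
    """p(f, a | e) = prod_j d(a_j - a_{j-1}) * t(f_j | e_{a_j}),
    with the previous state a_0 conventionally set to 0 here.
    t: lexical table {(f_word, e_word): prob}; d: jump table {distance: prob}."""
    p, prev = 1.0, 0
    for j, i in enumerate(a):
        p *= d.get(i - prev, 0.0) * t.get((f[j], e[i]), 0.0)
        prev = i
    return p
```

Because the score of each link depends only on the previous link, exact inference over a stays tractable even when the individual terms come from a neural network.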
Introduction | Inspired by successful previous works, we propose a new DNN-based word alignment method, which exploits contextual and semantic similarities between words. |
Introduction | Figure 1: Two examples of word alignment |
Introduction | In the rest of this paper, related work on DNN and word alignment is first reviewed in Section 2, followed by a brief introduction of DNN in Section 3.
Related Work | Among the related work on word alignment, the most popular methods are based on generative models such as the IBM Models (Brown et al., 1993) and the HMM (Vogel et al., 1996).
Related Work | Discriminative approaches have also been proposed, using hand-crafted features to improve word alignment.
Abstract | We show that the recovered empty categories not only improve the word alignment quality, but also lead to significant improvements in a large-scale state-of-the-art syntactic MT system. |
Experimental Results | Then we run GIZA++ (Och and Ney, 2000) to generate the word alignment for each direction and apply grow-diag-final (Koehn et al., 2003), the same as in the baseline.
Integrating Empty Categories in Machine Translation | With the preprocessed MT training corpus, an unsupervised word aligner, such as GIZA++, can be used to generate automatic word alignment, as the first step of a system training pipeline.
Integrating Empty Categories in Machine Translation | The effect of inserting ECs is twofold: first, it can impact the automatic word alignment, since it now allows the target-side words, especially the function words, to align to the inserted ECs and fix some errors in the original word alignment; second, new phrases and rules can be extracted from the preprocessed training data.
Integrating Empty Categories in Machine Translation | A few examples of the extracted Hiero rules and tree-to-string rules are also listed, which we would not have been able to extract from the original incorrect word alignment when the *pro* was missing. |
Introduction | In addition, the pro-drop problem can also degrade the word alignment quality in the training data. |
Introduction | A sentence pair observed in the real data is shown in Figure 1 along with the word alignment obtained from an automatic word aligner, where the English subject pronoun
Introduction | Figure 1: Example of incorrect word alignment due to missing pronouns on the Chinese side. |
Conclusion and Future work | The approach also exploits methods from statistical MT (word alignment) and therefore integrates techniques from statistical syntactic parsing, MT, and compositional semantics to produce an effective semantic parser.
Experimental Evaluation | We also evaluated the impact of the word alignment component by replacing Giza++ by gold-standard word alignments manually annotated for the CLANG corpus. |
Experimental Evaluation | The results consistently showed that compared to using gold-standard word alignment, Giza++ produced lower semantic parsing accuracy when given very little training data, but similar or better results when given sufficient training data (> 160 examples).
Experimental Evaluation | This suggests that, given sufficient data, Giza++ can produce effective word alignments, and that imperfect word alignments do not seriously impair our semantic parsers since the disambiguation model evaluates multiple possible interpretations of ambiguous words. |
Introduction | The learning system first employs a word alignment method from statistical machine translation (GIZA++ (Och and Ney, 2003)) to acquire a semantic lexicon that maps words to logical predicates. |
Learning Semantic Knowledge | We use an approach based on Wong and Mooney (2006), which constructs word alignments between NL sentences and their MRs. |
Learning Semantic Knowledge | Normally, word alignment is used in statistical machine translation to match words in one NL to words in another; here it is used to align words with predicates based on a “parallel corpus” of NL sentences and MRs. We assume that each word alignment defines a possible mapping from words to predicates for building a SAPT and a semantic derivation that compose the correct MR. A semantic lexicon and composition rules are then extracted directly from the
Learning Semantic Knowledge | Generation of word alignments for each training example proceeds as follows. |
Learning a Disambiguation Model | Here, unique word alignments are not required, and alternative interpretations compete for the best semantic parse. |
Abstract | In this paper we present a confidence measure for word alignment based on the posterior probability of alignment links. |
Abstract | Based on these measures, we improve the alignment quality by selecting high confidence sentence alignments and alignment links from multiple word alignments of the same sentence pair. |
Abstract | Additionally, we remove low confidence alignment links from the word alignment of a bilingual training corpus, which increases the alignment F-score, improves Chinese-English and Arabic-English translation quality and significantly reduces the phrase translation table size. |
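The link-removal step described in this abstract amounts to thresholding alignment links on their posterior confidence. A minimal sketch of that idea (function name, threshold value, and toy posteriors are all illustrative, not from the paper):

```python
def filter_links(links, posterior, threshold=0.4):
    """Keep only alignment links whose posterior link probability
    meets the confidence threshold (threshold value is illustrative)."""
    return {link for link in links if posterior.get(link, 0.0) >= threshold}

# Toy example: links are (source index, target index) pairs.
links = {(0, 0), (1, 2), (2, 1)}
posterior = {(0, 0): 0.95, (1, 2): 0.15, (2, 1): 0.80}
print(sorted(filter_links(links, posterior)))  # the low-confidence link (1, 2) is dropped
```

Removing such low-confidence links shrinks the set of extractable phrase pairs, which is consistent with the reported reduction in phrase-table size.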
Introduction | Many MT systems, such as statistical phrase-based and syntax-based systems, learn phrase translation pairs or translation rules from large amounts of bilingual data with word alignment.
Introduction | The quality of the parallel data and the word alignment have significant impacts on the learned translation models and ultimately the quality of translation output. |
Introduction | Given the huge amount of bilingual training data, word alignments are automatically generated using various algorithms (Brown et al., 1994; Vogel et al., 1996)
Abstract | Previous work has shown that a reordering model can be learned from high quality manual word alignments to improve machine translation performance. |
Abstract | In this paper, we focus on further improving the performance of the reordering model (and thereby machine translation) by using a larger corpus of sentence aligned data for which manual word alignments are not available but automatic machine generated alignments are available. |
Abstract | To mitigate the effect of noisy machine alignments, we propose a novel approach that improves reorderings produced given noisy alignments and also improves word alignments using information from the reordering model. |
Introduction | These methods use a small corpus of manual word alignments (where the words in the source sentence are manually aligned to the words in the target sentence) to learn a model to preorder the source sentence to match target order. |
Introduction | In this paper, we build upon the approach in (Visweswariah et al., 2011) which uses manual word alignments for learning a reordering model. |
Introduction | Specifically, we show that we can significantly improve reordering performance by using a large number of sentence pairs for which manual word alignments are not available. |
Parallel Segment Retrieval | Finally, a represents the word alignment between the words in the left and the right segments. |
Parallel Segment Retrieval | Then, we would use a word alignment model (Brown et al., 1993; Vogel et al., 1996), with source s = sup, .
Parallel Segment Retrieval | Finally, from the probability of the word alignments, we can determine whether the segments are parallel.
Basics of ITG | From the viewpoint of word alignment , the terminal unary rules provide the links of word pairs, whereas the binary rules represent the reordering factor. |
Basics of ITG | First of all, it imposes a 1-to-1 constraint on word alignment.
Basics of ITG | Secondly, the simple ITG leads to redundancy if word alignment is the sole purpose of applying ITG. |
Basics of ITG Parsing | Based on the rules in normal form, ITG word alignment is done in a similar way to chart parsing (Wu, 1997). |
Conclusion and Future Work | This paper reviews word alignment through ITG parsing, and clarifies the problem of ITG pruning. |
Evaluation | Table 3 lists the word alignment time cost and SMT performance of different pruning methods. |
Introduction | For this reason ITG has gained more and more attention recently in the word alignment community (Zhang and Gildea, 2005; Cherry and Lin, 2006; Haghighi et al., 2009). |
The DPDI Framework | Discriminative approaches to word alignment use manually annotated alignment for sentence pairs. |
The DPDI Framework | However, in reality there are often cases where a foreign word aligns to more than one English word.
Alignment | A phrase extraction is performed for each training sentence pair separately using the same word alignment as for the initialization. |
Experimental Evaluation | For the heuristic phrase model, we first use GIZA++ (Och and Ney, 2003) to compute the word alignment on TRAIN. |
Experimental Evaluation | Next we obtain a phrase table by extracting phrases from the word alignment.
Introduction | Figure: two training pipelines compared — Viterbi word alignment with word translation models trained by the EM algorithm, followed by heuristic phrase extraction; versus phrase alignment with phrase translation models trained by the EM algorithm, yielding phrase translation counts and probabilities. Both pipelines produce a phrase translation table.
Phrase Model Training | The simplest of our generative phrase models estimates phrase translation probabilities by their relative frequencies in the Viterbi alignment of the data, similar to the heuristic model but with counts from the phrase-aligned data produced in training rather than computed on the basis of a word alignment . |
Related Work | Their results show that it cannot reach performance competitive with extracting a phrase table from the word alignment by heuristics (Och et al., 1999).
Related Work | In addition, we do not restrict the training to phrases consistent with the word alignment , as was done in (DeNero et al., 2006). |
Related Work | This allows us to recover from flawed word alignments.
Experiment | GIZA++ and grow-diag-final-and heuristics were used to obtain word alignments.
Experiment | In order to reduce word alignment errors, we removed articles {a, an, the} in English and particles {ga, wo, wa} in Japanese before performing word alignment, because these function words do not correspond to any words in the other language.
Experiment | After word alignment, we restored the removed words and shifted the word alignment positions to the original word positions. |
Proposed Method | The lines represent word alignments.
Proposed Method | The English side arrows point to the nearest word aligned on the right. |
Proposed Method | The training data is built from a parallel corpus and word alignments between corresponding source words and target words. |
Introduction | The algorithm uses learned word alignments to aggressively generalize the seeds, producing a large set of possible lexical equivalences. |
Learning | We call this procedure InduceLex(x, x′, y, A), which takes a paraphrase pair (x, x′), a derivation y of x, and a word alignment A, and returns a new set of lexical entries.
Learning | A word alignment A between x and x′ is a subset of [n] × [n′]. A phrase alignment is a pair of index sets (I, I′), where I ⊆ [n] and I′ ⊆ [n′]. A phrase alignment (I, I′) is consistent with a word alignment A if for all (i, i′) ∈ A, i ∈ I if and only if i′ ∈ I′.
Learning | In other words, a phrase alignment is consistent with a word alignment if the words in the phrases are aligned only with each other, and not with any outside words. |
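This consistency condition (phrase words aligned only with each other, never with outside words) translates directly into code. A minimal sketch, with illustrative names and toy data:

```python
def consistent(I, I_prime, A):
    """A phrase alignment (I, I') is consistent with word alignment A
    iff for every link (i, i') in A, i is in I exactly when i' is in I'."""
    return all((i in I) == (ip in I_prime) for (i, ip) in A)

# Word alignment links as (source index, target index) pairs.
A = {(0, 0), (1, 2)}
print(consistent({0, 1}, {0, 2}, A))  # True: both links stay inside the phrase pair
print(consistent({0}, {0, 2}, A))     # False: link (1, 2) crosses the phrase boundary
```

The same predicate underlies standard phrase extraction in PB-SMT: only phrase pairs passing this check are added to the phrase table.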
Backgrounds | 1These numbers are language/corpus-dependent and are not necessarily to be taken as a general reflection of the overall quality of the word alignments for arbitrary language pairs. |
Composed Rule Extraction | Input: HPSG forest Fs, target sentence T, word alignment A = {(i, j)}, target function word set {fw} appearing in T, and target chunk set {C}
Introduction | However, forest-based translation systems, and, in general, most linguistically syntax-based SMT systems (Galley et al., 2004; Galley et al., 2006; Liu et al., 2006; Zhang et al., 2007; Mi et al., 2008; Liu et al., 2009; Chiang, 2010), are built upon word-aligned parallel sentences and thus share a critical dependence on word alignments.
Introduction | For example, even a single spurious word alignment can invalidate a large number of otherwise extractable rules, and unaligned words can result in an exponentially large set of extractable rules for the interpretation of these unaligned words (Galley et al., 2006). |
Introduction | What makes word alignment so fragile? |
Related Research | By dealing with the ambiguous word alignment instead of unaligned target words, syntax-based realignment models were proposed by (May |
Related Research | Specifically, we observed that most incorrect or ambiguous word alignments are caused by function words rather than content words.
Experiments | The alignment was obtained using GIZA++ (Och and Ney, 2003) and then we symmetrized the word alignment using the grow-diag-final heuristic.
Extraction of Paraphrase Rules | 3.3 Word Alignments Filtering |
Extraction of Paraphrase Rules | We can construct word alignment between S0 and S1 through T0.
Extraction of Paraphrase Rules | On the initial corpus of (S0, T0), we conduct word alignment with Giza++ (Och and Ney, 2000) in both directions and then apply the grow-diag-final heuristic (Koehn et al., 2005) for symmetrization.
Abstract | The ranking model is automatically derived from word-aligned parallel data with a syntactic parser for the source language, based on both lexical and syntactical features.
Experiments | We use Giza++ (Och and Ney, 2003) to generate the word alignment for the parallel corpus. |
Experiments | By manual analysis, we find that the gap is due to both errors of the ranking reorder model and errors from word alignment and parser. |
Experiments | The reason is that our annotators tend to align function words which might be left unaligned by the automatic word aligner.
Introduction | The ranking model is automatically derived from the word aligned parallel data, viewing the source tree nodes to be reordered as list items to be ranked. |
Ranking Model Training | As pointed out by (Li et al., 2007), in practice, nodes often have overlapping target spans due to erroneous word alignment or different syntactic structures between source and target sentences. |
Word Reordering as Syntax Tree Node Ranking | Constituent tree is shown above the source sentence; arrows below the source sentences show head-dependent arcs for dependency tree; word alignment links are lines without arrow between the source and target sentences. |
Introduction | DNN is also introduced to Statistical Machine Translation (SMT) to learn several components or features of the conventional framework, including word alignment, language modelling, translation modelling and distortion modelling.
Introduction | (2013) adapt and extend the CD-DNN-HMM (Dahl et al., 2012) method to HMM-based word alignment model. |
Phrase Pair Embedding | where f_{a_i} is the corresponding target word aligned to e_i, and similarly for e_{a_j}.
Phrase Pair Embedding | The recurrent neural network is trained with a word-aligned bilingual corpus, similar to (Auli et al., 2013).
Related Work | (2013) adapt and extend CD-DNN-HMM (Dahl et al., 2012) to word alignment . |
Related Work | Word embeddings capturing lexical translation information and surrounding words modeling context information are leveraged to improve the word alignment performance. |
Related Work | Unfortunately, the better word alignment result generated by this model cannot bring significant performance improvement on an end-to-end SMT evaluation task.
Abstract | We rederive all the steps of KN smoothing to operate on count distributions instead of integral counts, and apply it to two tasks where KN smoothing was not applicable before: one in language model adaptation, and the other in word alignment . |
Introduction | One is language model domain adaptation, and the other is word alignment using the IBM models (Brown et al., 1993). |
Word Alignment | In this section, we show how to apply expected KN to the IBM word alignment models (Brown et al., 1993). |
Word Alignment | Of course, expected KN can be applied to other instances of EM besides word alignment . |
Word Alignment | The IBM models and related models define probability distributions p(a, f | e, θ), which model how likely a French sentence f is to be generated from an English sentence e with word alignment a.
Comparative Study | monotone for current phrase, if a word alignment to the bottom left (point A) exists and there is no word alignment point at the bottom right position (point B).
Comparative Study | swap for current phrase, if a word alignment to the bottom right (point B) exists and there is no word alignment point at the bottom left position (point A).
Comparative Study | When one source word is aligned to multiple target words, we duplicate the source word for each target word, e.g.
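The monotone/swap rules above can be sketched as a small classifier over alignment points. The exact matrix coordinates of point A (bottom left) and point B (bottom right) relative to the phrase are an assumption here, following the common lexicalized-reordering convention:

```python
def orientation(points, s_start, s_end, t_start):
    """Classify phrase orientation from word alignment points, following
    the monotone/swap rules above. Coordinates of point A (bottom left)
    and point B (bottom right) are assumptions for illustration."""
    point_a = (s_start - 1, t_start - 1) in points  # bottom left of the phrase box
    point_b = (s_end + 1, t_start - 1) in points    # bottom right of the phrase box
    if point_a and not point_b:
        return "monotone"
    if point_b and not point_a:
        return "swap"
    return "other"

# Diagonal alignment: the phrase starting at source 1 / target 1 is monotone.
print(orientation({(0, 0), (1, 1), (2, 2)}, 1, 1, 1))  # "monotone"
```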
Experiments | The main reason is that the “annotated” corpus is converted from word alignments, which contain many errors.
Tagging-style Reordering Model | The first step is word alignment training. |
Tagging-style Reordering Model | We also have the word alignment within the new phrase pair, which is stored during the phrase extraction process. |
Abstract | The notion of fertility in word alignment (the number of words emitted by a single state) is useful but difficult to model. |
Evaluation | We explore the impact of this improved MAP inference procedure on a task in German-English word alignment . |
Evaluation | For training data we use the news commentary data from the WMT 2012 translation task.1 120 of the training sentences were manually annotated with word alignments . |
HMM alignment | …, f_J and word alignment vectors a = a_1, …
HMM alignment | For the standard HMM, there is a dynamic programming algorithm to compute the posterior probability over word alignments Pr(a | e, f). These are the sufficient statistics gathered in the E step of EM.
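The dynamic programming algorithm referred to here is forward-backward. A toy sketch of computing link posteriors Pr(a_j = i | e, f) for an HMM aligner; the uniform start distribution and the absence of a NULL state and jump binning are simplifying assumptions, not features of any particular paper's model:

```python
import numpy as np

def link_posteriors(trans, emit):
    """Forward-backward posteriors Pr(a_j = i | e, f) for a toy HMM aligner.
    trans[k, i] = p(a_j = i | a_{j-1} = k); emit[i, j] = p(f_j | e_i).
    A real aligner would add a NULL state and relative jump parameters."""
    I, J = emit.shape
    fwd = np.zeros((J, I))
    bwd = np.zeros((J, I))
    fwd[0] = emit[:, 0] / I                       # uniform start distribution (assumption)
    for j in range(1, J):
        fwd[j] = (fwd[j - 1] @ trans) * emit[:, j]
    bwd[J - 1] = 1.0
    for j in range(J - 2, -1, -1):
        bwd[j] = trans @ (emit[:, j + 1] * bwd[j + 1])
    post = fwd * bwd
    return post / post.sum(axis=1, keepdims=True)  # one distribution per target word f_j

trans = np.array([[0.7, 0.3], [0.3, 0.7]])
emit = np.array([[0.9, 0.1], [0.1, 0.9]])  # e_0 explains f_0, e_1 explains f_1
print(link_posteriors(trans, emit).round(2))  # each f_j prefers its matching e_i
```

Row j of the returned matrix is exactly the E-step statistic mentioned above (expected link counts are accumulated from these posteriors).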
Introduction | These word alignments are a crucial training component in most machine translation systems. |
Introduction | Models 2 and 3 incorporate a positional model based on the absolute position of the word; Models 4 and 5 use a relative position model instead (an English word tends to align to a French word that is near the French word aligned to the previous English word).
Conclusion and Future Work | In the future, we will work on leveraging parallel sentences and word alignments for other tasks in sentiment analysis, such as building multilingual sentiment lexicons. |
Cross-Lingual Mixture Model for Sentiment Classification | We estimate word projection probability using word alignment probability generated by the Berkeley aligner (Liang et al., 2006). |
Cross-Lingual Mixture Model for Sentiment Classification | The word alignment probabilities serve two purposes.
Cross-Lingual Mixture Model for Sentiment Classification | Figure 2 gives an example of word alignment probability. |
Data and task | Word alignment features |
Data and task | We exploit a feature set based on HMM word alignments in both directions (Och and Ney, 2000).
Data and task | The first oracle ORACLE1 has access to the gold-standard English entities and gold-standard word alignments among English and foreign words.
Experiments | We obtained word alignments of the training data by first running GIZA++ (Och and Ney, 2003) and then applying the refinement rule “grow-diag-final-and” (Koehn et al., 2003). |
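The "grow-diag" core of the grow-diag-final-and refinement rule mentioned here can be sketched as follows. This is a simplified illustration (the "final-and" pass that Koehn et al. (2003) apply afterwards is omitted); alignments are sets of (source, target) index pairs:

```python
def grow_diag(e2f, f2e):
    """Start from the intersection of the two directional alignments and
    grow into the union along (diagonal) neighbours, adding a candidate
    link only if its source or target word is still unaligned."""
    union = e2f | f2e
    aligned = set(e2f & f2e)
    neighbours = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                  (0, 1), (1, -1), (1, 0), (1, 1)]
    changed = True
    while changed:
        changed = False
        for (i, j) in sorted(aligned):
            for di, dj in neighbours:
                cand = (i + di, j + dj)
                src_free = all(s != cand[0] for s, _ in aligned)
                tgt_free = all(t != cand[1] for _, t in aligned)
                if cand in union and cand not in aligned and (src_free or tgt_free):
                    aligned.add(cand)
                    changed = True
    return aligned

# Two directional alignments disagree on the second source word.
print(sorted(grow_diag({(0, 0), (1, 1)}, {(0, 0), (1, 2)})))
```

Starting from the intersection {(0, 0)}, both disputed links are grown in because each covers a still-unaligned target word; the real heuristic additionally post-processes remaining unaligned words in the "final" step.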
Introduction | The solid lines denote hyperedges and the dashed lines denote word alignments . |
Model | The solid lines denote hyperedges and the dashed lines denote word alignments between the two forests. |
Rule Extraction | By constructing a theory that gives formal semantics to word alignments, Galley et al.
Rule Extraction | Their GHKM procedure draws connections among word alignments, derivations, and rules.
Rule Extraction | They first identify the tree nodes that subsume tree-string pairs consistent with word alignments and then extract rules from these nodes. |
Introduction | For dependency projection, the relationship between words in the parsed sentences can be simply projected across the word alignment to words in the unparsed sentences, according to the DCA assumption (Hwa et al., 2005). |
Introduction | Such a projection procedure suffers much from the word alignment errors and syntactic isomerism between languages, which usually lead to relationship projection conflict and incomplete projected dependency structures. |
Introduction | Because of the free translation, the syntactic isomerism between languages and word alignment errors, it would be strained to completely project the dependency structure from one language to another. |
Projected Classification Instance | In order to alleviate the effect of word alignment errors, we base the projection on the alignment matrix, a compact representation of multiple GIZA++ (Och and Ney, 2000) results, rather than a single word alignment in previous dependency projection works. |
Projected Classification Instance | Figure 2: The word alignment matrix between a Chinese sentence and its English translation. |
Related Works | Because of the free translation, the word alignment errors, and the heterogeneity between the two languages, it is difficult and less effective to project the dependency tree completely onto the target language sentence.
Abstract | These translation tasks are characterized by the relative ability to commit to parallel parse trees and availability of word alignments , yet the unavailability of large-scale data, calling for a Bayesian tree-to-tree formalism. |
Conclusion | The future for this work would involve natural extensions such as mixing over the space of word alignments; this would allow application to MT-like tasks where flexible word reordering is allowed, such as abstractive sentence compression and paraphrasing.
Introduction | One approach is to use word alignments (where these can be reliably estimated, as in our testbed application) to align subtrees and extract rules (Och and Ney, 2004; Galley et al., 2004) but this leaves open the question of finding the right level of generality of the rules — how deep the rules should be and how much lexicalization they should involve — necessitating resorting to heuristics such as minimality of rules, and leading to |
Introduction | possibility of searching over the infinite space of grammars (and, in machine translation, possible word alignments), thus sidestepping the narrowness problem outlined above as well.
Introduction | This task is characterized by the availability of word alignments , providing a clean testbed for investigating the effects of grammar extraction. |
The STSG Model | In particular, we visit every tree pair and each of its source nodes i, and update its alignment by selecting between and within two choices: (a) unaligned, (b) aligned with some target node j or e. The number of possibilities j in (b) is significantly limited, firstly by the word alignment (for instance, a source node dominating a deleted subspan cannot be aligned with a target node), and secondly by the current alignment of other nearby aligned source nodes. |
Conclusions and Future Work | In this paper the model was only used to infer word alignments ; in future work we intend to develop a decoding algorithm for directly translating with the model. |
Experiments | However in this paper we limit our focus to inducing word alignments , i.e., by using the model to infer alignments which are then used in a standard phrase-based translation pipeline. |
Experiments | We present results on translation quality and word alignment . |
Gibbs Sampling | Given the structure of our model, a word alignment uniquely specifies the translation decisions and the sequence follows the order of the target sentence left to right. |
Introduction | In this paper we propose a new model to drop the independence assumption, by instead modelling correlations between translation decisions, which we use to induce translation derivations from aligned sentences (akin to word alignment).
Model | Given a source sentence, our model infers a latent derivation which produces a target translation and meanwhile gives a word alignment between the source and the target. |
Conclusions | We presented a novel information theoretic model for bilingual word clustering which seeks a clustering with high average mutual information between clusters of adjacent words, and also high mutual information across observed word alignment links. |
Experiments | The corpus was word aligned in two directions using an unsupervised word aligner (Dyer et al., 2013), then the intersected alignment points were taken. |
Experiments | For Turkish the F1 score improves by 1.0 point over the setting with no distributional clusters, which clearly shows that the word alignment information improves the clustering quality.
Experiments | Thus we propose to further refine the quality of word alignment links as follows: let x be a word in one language and y be a word in the other language, and let there exist an alignment link between x and y.
Introduction | The second term ensures that the cluster alignments induced by a word alignment have high mutual information across languages (§2.2). |
Word Clustering | For concreteness, A(x, y) will be the number of times that x is aligned to y in a word-aligned parallel corpus.
Experiments | They employed a word alignment model to capture opinion relations among words, and then used a random walk algorithm to extract opinion targets.
Experiments | Second, our method captures semantic relations using topic modeling and captures opinion relations through word alignments, which are more precise than Hai, which merely uses co-occurrence information to indicate such relations among words.
Introduction | They have investigated a series of techniques to enhance opinion relations identification performance, such as nearest neighbor rules (Liu et al., 2005), syntactic patterns (Zhang et al., 2010; Popescu and Etzioni, 2005), word alignment models (Liu et al., 2012; Liu et al., 2013b; Liu et al., 2013a), etc. |
Related Work | (Liu et al., 2012; Liu et al., 2013a; Liu et al., 2013b) employed a word alignment model to capture opinion relations rather than syntactic parsing.
The Proposed Method | This approach casts the capture of opinion relations as a monolingual word alignment process.
The Proposed Method | After performing word alignment, we obtain a set of word pairs composed of a noun (noun phrase) and its corresponding modified word.
A semantic span can include one or more eus. | Instead, we preserve the cohesive information in the training process by converting the original source sentence into tagged-flattened CSS, and then perform word alignment and extract the translation rules from the bilingual flattened source CSS and the target string.
A semantic span can include one or more eus. | We then perform word alignment on the modified bilingual sentences, and extract the new translation rules based on the new alignment, as shown in Figure 3(b) to Figure 3(c). |
A semantic span can include one or more eus. | bound by the word alignment, the alignment complies with EUC only if there is no overlap between pSA and pSB.
Experiments | We obtain the word alignment with the grow-diag-final-and strategy with GIZA++. |
Experiments | The merits of “Flattened Rule” are twofold: 1) In training process, the new word alignment upon modified sentence pairs can align transitional expressions to flattened CSS tags; 2) In decoding process, the CSS-based rules are more discriminating than the original rules, which is more flexible than “TFS”. |
Experiments | Following (Levenberg et al., 2012; Neubig et al., 2011), we evaluate our model by using its output word alignments to construct a phrase table. |
Experiments | As a baseline, we train a phrase-based model using the Moses toolkit based on the word alignments obtained using GIZA++ in both directions and symmetrized using the grow-diag-final-and heuristic (Koehn et al., 2003).
Experiments | These are taken from the final Model 4 word alignments, using the intersection of the source-target and target-source models.
Related Work | In the context of machine translation, ITG has been explored for statistical word alignment in both unsupervised (Zhang and Gildea, 2005; Cherry and Lin, 2007; Zhang et al., 2008; Pauls et al., 2010) and supervised (Haghighi et al., 2009; Cherry and Lin, 2006) settings, and for decoding (Petrov et al., 2008). |
Related Work | Our paper fits into the recent line of work for jointly inducing the phrase table and word alignment (DeNero and Klein, 2010; Neubig et al., 2011). |
Introduction | Our syntactic constituent reordering model considers context free grammar (CFG) rules in the source language and predicts the reordering of their elements on the target side, using word alignment information. |
Introduction | We introduce novel soft reordering constraints, using syntactic constituents or semantic roles, composed over word alignment information in translation rules used during decoding time; |
Unified Linguistic Reordering Models | parse tree and its word alignment links to the target language. |
Unified Linguistic Reordering Models | Unlike the conventional phrase and lexical translation features, whose values are phrase pair-determined and thus can be calculated offline, the value of the reordering features can only be obtained during decoding time, and requires word alignment information as well. |
Unified Linguistic Reordering Models | Before we present the algorithm integrating the reordering models, we define the following functions by assuming XP_i and XP_{i+1} are the constituent pair of interest in CFG rule cfg, H is the translation hypothesis, and a is its word alignment:
Approach Overview | To establish a soft correspondence between the two languages, we use a second similarity function, which leverages standard unsupervised word alignment statistics (§3.3).
Graph Construction | The word alignment methods do not use POS information.
Graph Construction | To define a similarity function between the English and the foreign vertices, we rely on high-confidence word alignments . |
Graph Construction | Since our graph is built from a parallel corpus, we can use standard word alignment techniques to align the English sentences “De |
Clustering for Cross Lingual Sentiment Analysis | Given a parallel bilingual corpus, word clusters in S can be aligned to clusters in T. Word alignments are created using parallel corpora. |
Clustering for Cross Lingual Sentiment Analysis | Here, L_{T|S} and L_{S|T}(...) are factors based on word alignments, which can be represented as:
Conclusion and Future Work | A naive cluster linkage algorithm based on word alignments was used to perform CLSA. |
Introduction | To perform CLSA, this study leverages unlabelled parallel corpus to generate the word alignments . |
Introduction | These word alignments are then used to link cluster based features to obliterate the language gap for performing SA. |
Experiments | It is not surprising, since Bannard and Callison-Burch (2005) have pointed out that word alignment error is the major factor that influences the performance of the methods learning paraphrases from bilingual corpora. |
Experiments | The LW based features validate the quality of word alignment and assign low scores to those aligned EC pattern pairs with incorrect alignment. |
Introduction | parsing and English-foreign language word alignment, (2) aligned patterns induction, which produces English patterns along with the aligned pivot patterns in the foreign language, (3) paraphrase patterns extraction, in which paraphrase patterns are extracted based on a log-linear model.
Proposed Method | We conduct word alignment with Giza++ (Och and Ney, 2000) in both directions and then apply the grow-diag heuristic (Koehn et al., 2005) for symmetrization. |
Proposed Method | where a denotes the word alignment between e and e′, and n is the number of words in e.
Experiment | We run GIZA++ and then employ the grow-diag-final-and (gdfa) strategy to produce symmetric word alignments.
Inside Context Integration | We demand that every element and its corresponding target span must be consistent with word alignment.
Inside Context Integration | Note that we only apply the source-side PAS and word alignment for IC-PASTR extraction.
Inside Context Integration | Thus to get a high recall for PASs, we only utilize word alignment instead of capturing the relation between bilingual elements. |
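The requirement above, that a source element and its target span be consistent with the word alignment, is the standard block-consistency check from phrase extraction. A hedged sketch (not the authors' code; the function name and representation are illustrative):

```python
def consistent(A, i, j, p, q):
    """Check whether source span [i, j] and target span [p, q] are
    consistent with word alignment A (a set of (s, t) links): no link may
    cross the block boundary, and the block must contain at least one link."""
    inside = False
    for (s, t) in A:
        src_in = i <= s <= j
        tgt_in = p <= t <= q
        if src_in != tgt_in:      # a link leaves the block: inconsistent
            return False
        inside = inside or (src_in and tgt_in)
    return inside
```

Any link with exactly one endpoint inside the block rules the pair out, which is what keeps extracted units translationally self-contained.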
Maximum Entropy PAS Disambiguation (MEPD) Model | t_range(PAS) refers to the target range covering all the words that are reachable from the PAS via word alignment.
Approach | First, parser and word alignment errors cause much of the transferred information to be wrong. |
Experiments | For both corpora, we performed word alignments with the open source PostCAT (Graca et al., 2009) toolkit. |
Experiments | Preliminary experiments showed that our word alignments were not always appropriate for syntactic transfer, even when they were correct for translation. |
Introduction | Nevertheless, several challenges to accurate training and evaluation from aligned bitext remain: (1) partial word alignment due to non-literal or distant translation; (2) errors in word alignments and source language parses; (3) grammatical annotation choices that differ across languages and linguistic theories (e.g., how to analyze auxiliary verbs, conjunctions).
Related Work | (2005) found that transferring dependencies directly was not sufficient to get a parser with reasonable performance, even when both the source language parses and the word alignments are performed by hand. |
Conclusions and Future Work | In addition, word alignment is a hard constraint in our rule extraction. |
Conclusions and Future Work | We will study direct structure alignments to reduce the impact of word alignment errors. |
Experiments | We used GIZA++ (Och and Ney, 2004) and the heuristics “grow-diag-final” to generate m-to-n word alignments.
Experiments | (2006) reports that discontinuities are very useful for translational equivalence analysis using binary-branching structures under word alignment and parse tree constraints, while they are of almost no use under word alignment constraints only.
Cohesive Decoding | showed that a soft cohesion constraint is superior to a hard constraint for word alignment.
Cohesive Phrasal Output | Previous approaches to measuring the cohesion of a sentence pair have worked with a word alignment (Fox, 2002; Lin and Cherry, 2003). |
Experiments | Word alignments are provided by GIZA++ (Och and Ney, 2003) with grow-diag-final combination, with infrastructure for alignment combination and phrase extraction provided by the shared task. |
Introduction | Fox (2002) showed that cohesion is held in the vast majority of cases for English-French, while Cherry and Lin (2006) have shown it to be a strong feature for word alignment.
Experiments | We also extract the bidirectional word alignments between Chinese and English using GIZA++ (Och and Ney, 2003). |
Experiments | While ptLDA-align performs better than baseline SMT and LDA, it is worse than ptLDA-dict, possibly because of errors in the word alignments, making the tree priors less effective.
Introduction | Topic models bridge the chasm between languages using document connections (Mimno et al., 2009), dictionaries (Boyd-Graber and Resnik, 2010), and word alignments (Zhao and Xing, 2006). |
Polylingual Tree-based Topic Models | In addition, we extract the word alignments from aligned sentences in a parallel corpus. |
Decoding with the NNJM | For aligned target words, the normal affiliation heuristic can be used, since the word alignment is available within the rule.
Model Variations | We treat NULL as a normal target word, and if a source word aligns to multiple target words, it is treated as a single concatenated token. |
Model Variations | For word alignment, we align all of the training data with both GIZA++ (Och and Ney, 2003) and NILE (Riesa et al., 2011), and concatenate the corpora together for rule extraction.
Neural Network Joint Model (NNJM) | This notion of affiliation is derived from the word alignment, but unlike word alignment, each target word must be affiliated with exactly one non-NULL source word.
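The affiliation notion described above (exactly one non-NULL source word per target word) can be sketched as follows. The tie-breaking choices here, taking the middle link of an aligned word and letting an unaligned word copy its nearest originally-aligned neighbour with a rightward preference, are assumptions for illustration and may differ from the original system's exact rules.

```python
def affiliations(tgt_len, links):
    """Return one source index per target position (its 'affiliation').
    links: set of (src_idx, tgt_idx) word-alignment links."""
    aff = [None] * tgt_len
    by_tgt = {}
    for s, t in links:
        by_tgt.setdefault(t, []).append(s)
    for t, srcs in by_tgt.items():
        aff[t] = sorted(srcs)[len(srcs) // 2]   # middle source word of the links
    orig = aff[:]                # unaligned words copy from originally aligned ones
    for t in range(tgt_len):
        if aff[t] is None:
            for d in range(1, tgt_len):         # nearest neighbour, right first
                if t + d < tgt_len and orig[t + d] is not None:
                    aff[t] = orig[t + d]
                    break
                if t - d >= 0 and orig[t - d] is not None:
                    aff[t] = orig[t - d]
                    break
    return aff
```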
Experiments and evaluation | We use the hierarchical translation system that comes with the Moses SMT package and GIZA++ to compute the word alignment, using the “grow-diag-final-and” heuristics.
Translation pipeline | We consider gender as part of the stem, whereas the value for number is derived from the source-side: if marked for number, singular/plural nouns are distinguished during word alignment and then translated accordingly. |
Using subcategorization information | Figure 1: Deriving features from dependency-parsed English data via the word alignment.
Using subcategorization information | to the SMT output via word alignment . |
Empirical Evaluation | We use GIZA++ (Och and Ney, 2003) to produce word alignments in Europarl: we ran it in both directions and kept the intersection of the induced word alignments . |
Empirical Evaluation | We mark arguments in two languages as aligned if there is any word alignment between the corresponding sets and if they are arguments of aligned predicates. |
Multilingual Extension | In doing so, as in much of previous work on unsupervised induction of linguistic structures, we rely on automatically produced word alignments . |
Multilingual Extension | In Section 6, we describe how we use word alignment to decide if two arguments are aligned; for now, we assume that (noisy) argument alignments are given. |
Experiments | For Moses HPB, we use “grow-diag-final-and” to obtain symmetric word alignments, 10 for the maximum phrase length, and the recommended default values for all other parameters.
Experiments | We obtain the word alignments by running |
Head-Driven HPB Translation Model | Given the word alignment in Figure 1, Table 1 demonstrates the difference between hierarchical rules in Chiang (2007) and HD-HRs defined here. |
Introduction | Figure 1: An example word alignment for a Chinese-English sentence pair with the dependency parse tree for the Chinese sentence. |
Translation Model Architecture | The lexical weights lex(e|f) and lex(f|e) are calculated as follows, using a set of word alignments a between e and f:
Translation Model Architecture | w(s, t) and w(t, s) are not identical since the lexical probabilities are based on the unsymmetrized word alignment frequencies (in the Moses implementation which we re-implement).
Translation Model Architecture | In the unweighted variant, the resulting features are equivalent to training on the concatenation of all training data, excepting differences in word alignment, pruning and rounding.
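The lexical weight referred to in this passage is commonly computed as in Koehn et al. (2003): for each target word, average the lexical translation probabilities over its aligned source words, pairing unaligned words with NULL. A minimal sketch with made-up probabilities (the dictionary `w` and all names are illustrative):

```python
def lex_weight(e_words, f_words, links, w):
    """lex(e|f, a): product over target words of the average w(e_i|f_j)
    over the source words aligned to e_i (NULL if e_i is unaligned).
    `w` maps (e_word, f_word) pairs to illustrative probabilities."""
    prob = 1.0
    for i, e in enumerate(e_words):
        aligned = [f_words[j] for (ti, j) in links if ti == i]
        if not aligned:
            aligned = ["NULL"]            # unaligned words pair with NULL
        prob *= sum(w.get((e, f), 0.0) for f in aligned) / len(aligned)
    return prob
```

Because the links come from an unsymmetrized alignment, computing this in both directions generally gives two different scores, which is exactly the asymmetry noted above.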
Experiments | We ran GIZA++ on these corpora in both directions and then applied the “grow-diag-final” refinement rule to obtain word alignments.
Integrating the Two Models into SMT | We maintain word alignments for each phrase pair in the phrase table. |
Integrating the Two Models into SMT | Whenever a hypothesis covers a new verbal predicate v, we find the target translation e for v through word alignments and then calculate its translation probability pt(e|C) according to Eq.
Related Work | Therefore they either postpone the integration of target side PASs until the whole decoding procedure is completed (Wu and Fung, 2009b), or directly project semantic roles from the source side to the target side through word alignments during decoding (Liu and Gildea, 2010). |
Cross-lingual Annotation Projection for Relation Extraction | However, these automatic annotations can be unreliable because of source text misclassification and word alignment errors; thus, it can cause a critical falling-off in the annotation projection quality. |
Graph Construction | Das and Petrov (Das and Petrov, 2011) proposed a graph-based bilingual projection of part-of-speech tagging by considering the tagged words in the source language as labeled examples and connecting them to the unlabeled words in the target language, while referring to the word alignments . |
Graph Construction | If the context vertices U S for the source language sentences are defined, then the units of context in the target language can also be created based on the word alignments . |
Implementation | We used the GIZA++ software (Och and Ney, 2003) to obtain the word alignments for each bi-sentence in the parallel corpus.
Experiments | To obtain word-level alignments, we ran GIZA++ (Och and Ney, 2000) on the remaining corpus in both directions, and applied the “grow-diag-final” refinement rule (Koehn et al., 2005) to produce the final many-to-many word alignments.
Introduction | According to the word alignments, we define bracketable and unbracketable instances.
The Acquisition of Bracketing Instances | Let c and e be the source sentence and the target sentence, W be the word alignment between them, and T be the parse tree of c. We define a binary bracketing instance as a tuple ⟨b, τ(c_i..j), τ(c_j+1..k), τ(c_i..k)⟩ where b ∈ {bracketable, unbracketable}, c_i..j and c_j+1..k are two neighboring source phrases, and τ(T, s) (τ(s) for short) is a subtree function which returns the minimal subtree covering the source sequence s from the source parse tree T. Note that τ(c_i..k) includes both τ(c_i..j) and τ(c_j+1..k).
The Acquisition of Bracketing Instances | 1: Input: sentence pair (c, e), the parse tree T of c and the word alignment W between c and e 2: B := ∅ 3: for each (i, j, k) ∈ c do 4: if there exist a target phrase e_m..n aligned to c_i..j and e_p..q aligned to c_j+1..k then
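The acquisition step above can be approximated in runnable form. Note this sketch replaces the paper's subtree function τ with a pure alignment-consistency test, so it is a simplification under that assumption, not the authors' algorithm; all function names are made up for the example.

```python
def target_span(W, i, j):
    """Smallest target span covering all links of source span [i, j];
    None if the source span is unaligned."""
    ts = [t for (s, t) in W if i <= s <= j]
    return (min(ts), max(ts)) if ts else None

def consistent_span(W, i, j):
    """Target span of [i, j] if no alignment link crosses it, else None."""
    span = target_span(W, i, j)
    if span is None:
        return None
    p, q = span
    # a link entering the target span from outside the source span breaks it
    if any(p <= t <= q and not i <= s <= j for (s, t) in W):
        return None
    return span

def bracketing_instance(W, i, j, k):
    """Label neighbouring source phrases c[i..j] and c[j+1..k]: bracketable
    iff both halves and their union all have consistent target spans."""
    if consistent_span(W, i, j) and consistent_span(W, j + 1, k):
        return "bracketable" if consistent_span(W, i, k) else "unbracketable"
    return None
```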
Introduction | We first identify anchors as regions in the source sentences around which ambiguous reordering patterns frequently occur and chunks as regions that are consistent with word alignment which may span multiple translation units at decoding time. |
Maximal Orientation Span | We also attach the word indices as the superscript of the source words and project the indices to the aligned target words, such that “have5” suggests that the word “have” is aligned to the 5-th source word, i.e.
Maximal Orientation Span | Note that to facilitate the projection, the rules must come with internal word alignment in practice. |
Maximal Orientation Span | the source sentence, the complete translation and the word alignment.
Corpora and baselines | All conditions use word alignments produced by sequential iterations of IBM model 1, HMM, and IBM model 4 in GIZA++, followed by “diag-and” symmetrization (Koehn et al., 2003). |
Features | We add inflection features for all words aligned to at least one English verb, adjective, noun, pronoun, or determiner, excepting definite and indefinite articles. |
Features | These features would be more properly defined based on the identity of the target word aligned to these quantifiers, but little ambiguity seems to arise from this substitution in practice. |
Features | These dependencies are inferred from source-side annotation via word alignments, as depicted in figure 1, without any use of target-side dependency parses.
Experiments | using GIZA++ in both directions, and the diag-grow-final heuristic is used to refine symmetric word alignment.
Related Work | They proposed a bilingual topical admixture approach for word alignment and assumed that each word-pair follows a topic- |
Related Work | They reported extensive empirical analysis and improved word alignment accuracy as well as translation quality. |
Automatic Evaluation Metrics | Given a pair of strings to compare (a system translation and a reference translation), METEOR (Banerjee and Lavie, 2005) first creates a word alignment between the two strings. |
Automatic Evaluation Metrics | These word alignments are created incrementally through a series of stages, where each stage only adds alignments between unigrams which have not been matched in previous stages. |
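The staged unigram matching described above can be sketched as a greedy loop over matchers; real METEOR additionally chooses among alternative matches to minimise alignment crossings, which this sketch omits, and the stage functions here are illustrative stand-ins for METEOR's exact/stem/synonym modules.

```python
def meteor_align(sys_toks, ref_toks, stages):
    """Staged unigram alignment in the spirit of METEOR: each stage is a
    predicate (sys_tok, ref_tok) -> bool, and later stages may only align
    unigrams left unmatched by earlier stages."""
    links, used_sys, used_ref = [], set(), set()
    for match in stages:
        for i, s in enumerate(sys_toks):
            if i in used_sys:
                continue                      # already matched in a prior stage
            for j, r in enumerate(ref_toks):
                if j not in used_ref and match(s, r):
                    links.append((i, j))
                    used_sys.add(i)
                    used_ref.add(j)
                    break
    return links
```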
Introduction | Although a maximum weight bipartite graph was also used in the recent work of (Taskar et al., 2005), their focus was on learning supervised models for single word alignment between sentences from a source and target language. |
Decoding with Sense-Based Translation Model | During decoding, we keep word alignments for each translation rule. |
Decoding with Sense-Based Translation Model | Whenever a new source word c is translated, we find its translation e via the kept word alignments.
Experiments | We ran Giza++ on the training data in two directions and applied the “grow-diag-final” refinement rule (Koehn et al., 2003) to obtain word alignments . |
Model Training and Application 3.1 Training | For the bilingual corpus, we also perform word alignment to get correspondences between source and target words. |
Model Training and Application 3.1 Training | According to word alignment results, we classify |
Model Training and Application 3.1 Training | We ran GIZA++ (Och and Ney, 2000) on the training corpus in both directions with IBM model 4, and then applied the refinement rule described in (Koehn et al., 2003) to obtain a many-to-many word alignment for each sentence pair. |
Introduction | XMEANT is obtained by (1) using simple lexical translation probabilities, instead of the monolingual context vector model used in MEANT for computing the semantic role filler similarities, and (2) incorporating bracketing ITG constraints for word alignment within the semantic role fillers.
Introduction | than that of the reference translation, and on the other hand, the BITG constrains the word alignment more accurately than the heuristic bag-of-words aggregation used in MEANT.
Results | It is also consistent with results observed while estimating word alignment probabilities, where BITG constraints outperformed alignments from GIZA++ (Saers and Wu, 2009). |
Word Sense Disambiguation | Then, word alignment was performed on the parallel corpora with the GIZA++ software (Och and Ney, 2003).
Word Sense Disambiguation | For each English morphological root e, the English sentences containing its occurrences were extracted from the word-aligned output of GIZA++, as well as the corresponding translations of these occurrences.
Word Sense Disambiguation | To minimize noisy word alignment results, translations with no Chinese character were deleted, and we further removed a translation when it only appears once, or its frequency is less than 10 and also less than 1% of the frequency of e.
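The frequency filter just described can be written directly. A small sketch (function and variable names are illustrative; the Chinese-character check is left out):

```python
def filter_translations(counts, e_freq):
    """Keep a translation only if it occurs more than once and is not
    both below 10 occurrences and below 1% of the source word's frequency,
    mirroring the filtering rule described above."""
    return {t: c for t, c in counts.items()
            if c > 1 and not (c < 10 and c < 0.01 * e_freq)}
```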
Experiment | GIZA++ (Och and Ney, 2003) and the heuristics “grow-diag-final-and” are used to generate m-to-n word alignments . |
Experiment | This is mainly because tree sequence rules are only sensitive to word alignment while tree rules, even extracted from a forest (like in FT2S), are also limited by syntax according to grammar parsing rules. |
Forest-based tree sequence to string model | Given a source forest F and target translation T S as well as word alignment A, our translation model is formulated as: |
The effect of the Italian connectives on the LIS translation | Word Alignment.
The effect of the Italian connectives on the LIS translation | For this purpose we used the Berkeley Word Aligner (BWA) 1 (Denero, 2007), a general tool for aligning sentences in bilingual corpora. |
The effect of the Italian connectives on the LIS translation | Bold dashed lines show word alignment . |
For each t ∈ V[S_k]: | If a target word in t is a gap word, we suppose there is a word alignment between the target gap word and the source-side null.
Phrase Pair Refinement and Parameterization | According to our analysis, we find that the biggest problem is that in the target-side of the phrase pair, there are two or more identical words aligned to the same source- |
Probabilistic Bilingual Lexicon Acquisition | We employ the same algorithm used in (Munteanu and Marcu, 2006), which first uses GIZA++ (with the grow-diag-final-and heuristic) to obtain the word alignment between source and target words, and then calculates the association strength between the aligned words.
Experiments & Results 4.1 Experimental Setup | Word alignment is done using GIZA++ (Och and Ney, 2003). |
Experiments & Results 4.1 Experimental Setup | The resulting word alignments are used to extract the translations for each oov. |
Experiments & Results 4.1 Experimental Setup | The correctness of this gold standard is limited to the size of the parallel data used as well as the quality of the word alignment software toolkit, and is not 100% precise. |
Conclusions | This may suggest that adding shallow semantic information is more effective than introducing complex structured constraints, at least for the specific word alignment model we experimented with in this work. |
Introduction | This may suggest that compared to introducing complex structured constraints, incorporating shallow semantic information is both more effective and computationally inexpensive in improving the performance, at least for the specific word alignment model tested in this work. |
Learning QA Matching Models | As a result, an “ideal” word alignment structure should not link words in this clause to those in the question. |
Experimental Setup | Additionally, we compute a word alignment score to investigate the extent to which the input text is used to construct correct analyses. |
Results | The word alignment results from Table 2 indicate that the learners are mapping the correct words to actions for documents that are successfully completed. |
Results | For example, the models that perform best in the Windows domain achieve nearly perfect word alignment scores. |
Pivot Methods for Phrase-based SMT | (2003), there are two important elements in the lexical weight: word alignment information a in a phrase pair (s, t) and lexical translation probability w(s|t).
Pivot Methods for Phrase-based SMT | Let a1 and a2 represent the word alignment information inside the phrase pairs (s, p) and (p, t)
Pivot Methods for Phrase-based SMT | Based on the induced word alignment information, we estimate the co-occurring frequencies of word pairs directly from the induced phrase
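The induction step in this passage, composing source-pivot links with pivot-target links and then counting co-occurrences of the induced word pairs, might look as follows; this is a hedged sketch of the general pivot composition, with all names and the data layout chosen for the example.

```python
from collections import Counter

def induce_alignment(a1, a2):
    """Compose source-pivot links a1 with pivot-target links a2:
    (s, t) is induced iff some pivot position links to s in a1 and t in a2."""
    return {(s, t) for (s, p1) in a1 for (p2, t) in a2 if p1 == p2}

def cooccurrence_counts(phrase_pairs):
    """Count induced word pairs over phrase pairs given as
    (src_words, tgt_words, induced_links) triples."""
    counts = Counter()
    for src, tgt, links in phrase_pairs:
        for s, t in links:
            counts[(src[s], tgt[t])] += 1
    return counts
```

The counts can then feed a relative-frequency estimate of the word translation probabilities used in the induced lexical weight.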
Experiments | The system configurations are as follows: GIZA++ (Och and Ney, 2003) is used to obtain the bidirectional word alignments.
Problem Formulation | is the final translation; [tm_s, tm_t, tm_f, s_a, tm_a] are the associated information of the best TM sentence pair; tm_s and tm_t denote the corresponding TM sentence pair; tm_f denotes its associated fuzzy match score (from 0.0 to 1.0); s_a is the editing operations between tm_s and s; and tm_a denotes the word alignment between tm_s and tm_t.
Problem Formulation | ), we can find its corresponding TM source phrase tm_s_a(k) and all possible TM target phrases (each of them is denoted by tm_t_a(k)) with the help of the corresponding editing operations s_a and word alignment tm_a.
Experiments | Many of the remaining errors are due to the garbage collection phenomenon familiar from word alignment models (Moore, 2004; Liang et al., 2006). |
Generative Model | The alignment aspect of our model is similar to the HMM model for word alignment (Ney and Vogel, 1996). |
Generative Model | (2008) perform joint segmentation and word alignment for machine translation, but the nature of that task is different from ours. |
Experiment | Word alignments of each paraphrase pair are trained by GIZA++. |
Paraphrasing for Web Search | Word alignments within each paraphrase pair are generated using GIZA++ (Och and Ney, 2000). |
Paraphrasing for Web Search | In order to enable our paraphrasing model to learn the preferences on different paraphrasing strategies according to the characteristics of web queries, we design search-oriented features based on word alignments within Q and Q’, which can be described as follows: