Abstract | Bidirectional models of word alignment are an appealing alternative to post-hoc combinations of directional word aligners. |
Background | The focus of this work is on the word alignment decoding problem. |
Background | Before turning to the model of interest, we first introduce directional word alignment. |
Background | 2.1 Word Alignment |
Introduction | Word alignment is a critical first step for building statistical machine translation systems. |
Introduction | In order to ensure accurate word alignments, most systems employ a post-hoc symmetrization step to combine directional word aligners, such as IBM Model 4 (Brown et al., 1993) or hidden Markov model (HMM) based aligners (Vogel et al., 1996). |
Introduction | We begin in Section 2 by formally describing the directional word alignment problem. |
Related Work | Cromieres and Kurohashi (2009) use belief propagation on a factor graph to train and decode a one-to-one word alignment problem. |
Related Work | (2008) use posterior regularization to constrain the posterior probability of the word alignment problem to be symmetric and bijective. |
Data and Tools | 3.2 Word Alignments |
Data and Tools | In our approach, word alignments for the parallel text are required. |
Data and Tools | We perform word alignments with the open-source GIZA++ toolkit. |
Experiments | By using IGT Data, not only can we obtain more accurate word alignments, but also extract useful cross-lingual information for the resource-poor language. |
Our Approach | In our scenario, we have a set of aligned parallel data P = {(x_i, y_i, a_i)}, where a_i is the word alignment for the pair of source-target sentences (x_i, y_i), and a set of unlabeled sentences of the target language U = {x_j}. We also have a trained English parsing model p_{λE}. Then the K in equation (7) can be divided into two cases, according to whether x_i belongs to the parallel data set P or the unlabeled data set U. |
Our Approach | We define the transferring distribution by defining the transferring weight, utilizing the English parsing model p_{λE}(y | x) via parallel data with word alignments: |
Our Approach | By reducing unaligned edges to their delexicalized forms, we can still use those delexicalized features, such as part-of-speech tags, for those unaligned edges, and can address the problem that automatically generated word alignments include errors. |
Abstract | This study proposes a word alignment model based on a recurrent neural network (RNN), in which an unlimited alignment history is represented by recurrently connected hidden layers. |
Abstract | The RNN-based model outperforms the feed-forward neural network-based model (Yang et al., 2013) as well as the IBM Model 4 under Japanese-English and French-English word alignment tasks, and achieves comparable translation performance to those baselines for Japanese-English and Chinese-English translation tasks. |
Introduction | Automatic word alignment is an important task for statistical machine translation. |
Introduction | We assume that this property would fit with a word alignment task, and we propose an RNN-based word alignment model. |
Introduction | (2013) trained their model from word alignments produced by traditional unsupervised probabilistic models. |
Related Work | Various word alignment models have been proposed. |
Related Work | As an instance of discriminative models, we describe an FFNN-based word alignment model (Yang et al., 2013), which is our baseline. |
Training | GEN is a subset of all possible word alignments Φ, which is generated by beam search. |
Training | We evaluated the alignment performance of the proposed models with two tasks: Japanese-English word alignment with the Basic Travel Expression Corpus (BTEC) (Takezawa et al., 2002) and French-English word alignment with the Hansard dataset (Hansards) from the 2003 NAACL shared task (Mihalcea and Pedersen, 2003). |
Introduction | DNN is also introduced to Statistical Machine Translation (SMT) to learn several components or features of the conventional framework, including word alignment, language modelling, translation modelling and distortion modelling. |
Introduction | (2013) adapt and extend the CD-DNN-HMM (Dahl et al., 2012) method to the HMM-based word alignment model. |
Phrase Pair Embedding | where f_{a_i} is the corresponding target word aligned to e_i, and it is similar for e_{a_j}. |
Phrase Pair Embedding | The recurrent neural network is trained with a word-aligned bilingual corpus, similar to (Auli et al., 2013). |
Related Work | (2013) adapt and extend CD-DNN-HMM (Dahl et al., 2012) to word alignment. |
Related Work | Word embeddings capturing lexical translation information and surrounding words modeling context information are leveraged to improve the word alignment performance. |
Related Work | Unfortunately, the better word alignment result generated by this model cannot bring a significant performance improvement on an end-to-end SMT evaluation task. |
Abstract | We rederive all the steps of KN smoothing to operate on count distributions instead of integral counts, and apply it to two tasks where KN smoothing was not applicable before: one in language model adaptation, and the other in word alignment. |
Introduction | One is language model domain adaptation, and the other is word alignment using the IBM models (Brown et al., 1993). |
Word Alignment | In this section, we show how to apply expected KN to the IBM word alignment models (Brown et al., 1993). |
Word Alignment | Of course, expected KN can be applied to other instances of EM besides word alignment. |
Word Alignment | The IBM models and related models define probability distributions p(a, f | e, θ), which model how likely a French sentence f is to be generated from an English sentence e with word alignment a. |
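As a concrete illustration of the distributions p(a, f | e, θ) above, here is a minimal EM sketch for IBM Model 1, the simplest member of the family; the function names, toy NULL handling, and greedy decoder are our own assumptions, not taken from Brown et al. (1993).

```python
from collections import defaultdict

def train_ibm1(bitext, iterations=10):
    """EM training of the IBM Model 1 translation table t(f | e).

    bitext: list of (french_words, english_words) sentence pairs.
    A NULL token on the English side absorbs otherwise unaligned words.
    """
    f_vocab = {f for fs, _ in bitext for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))   # uniform initialization
    for _ in range(iterations):
        count = defaultdict(float)                 # expected counts c(f, e)
        total = defaultdict(float)                 # expected counts c(e)
        for fs, es in bitext:
            es = ["NULL"] + list(es)
            for f in fs:
                z = sum(t[(f, e)] for e in es)     # normalizer over links for f
                for e in es:
                    p = t[(f, e)] / z              # posterior that f aligns to e
                    count[(f, e)] += p
                    total[e] += p
        for (f, e), c in count.items():            # M-step: renormalize
            t[(f, e)] = c / total[e]
    return t

def viterbi_align(fs, es, t):
    """Greedy alignment: each French word picks its most likely English word."""
    es = ["NULL"] + list(es)
    return [max(range(len(es)), key=lambda j: t[(f, es[j])]) for f in fs]
```

On a toy bitext, the learned table concentrates probability on consistently co-occurring pairs, so the decoder recovers the intuitive links.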
Experiments | They employed a word alignment model to capture opinion relations among words, and then used a random walking algorithm to extract opinion targets. |
Experiments | Second, our method captures semantic relations using topic modeling and captures opinion relations through word alignments, which are more precise than Hai, which merely uses co-occurrence information to indicate such relations among words. |
Introduction | They have investigated a series of techniques to enhance opinion relations identification performance, such as nearest neighbor rules (Liu et al., 2005), syntactic patterns (Zhang et al., 2010; Popescu and Etzioni, 2005), word alignment models (Liu et al., 2012; Liu et al., 2013b; Liu et al., 2013a), etc. |
Related Work | (Liu et al., 2012; Liu et al., 2013a; Liu et al., 2013b) employed a word alignment model to capture opinion relations rather than syntactic parsing. |
The Proposed Method | This approach models capturing opinion relations as a monolingual word alignment process. |
The Proposed Method | After performing word alignment, we obtain a set of word pairs composed of a noun (noun phrase) and its corresponding modified word. |
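A minimal sketch of this pair-extraction step, assuming alignment links and POS tags are already available; the tag set, function name, and orientation rule are illustrative, not taken from the cited papers.

```python
def extract_opinion_pairs(words, pos_tags, alignment):
    """Collect (target, opinion_word) pairs from monolingual alignment links.

    alignment: list of (i, j) index pairs linking position i to position j.
    A pair is kept when exactly one side is a noun (a potential opinion
    target); the other side is treated as its aligned modifier.
    """
    noun_tags = {"NN", "NNS", "NNP", "NNPS"}      # assumed noun tag set
    pairs = set()
    for i, j in alignment:
        if pos_tags[i] in noun_tags and pos_tags[j] not in noun_tags:
            pairs.add((words[i], words[j]))
        elif pos_tags[j] in noun_tags and pos_tags[i] not in noun_tags:
            pairs.add((words[j], words[i]))
    return pairs
```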
A semantic span can include one or more eus. | Instead, we preserve the cohesive information in the training process by converting the original source sentence into the tagged-flattened CSS, then performing word alignment and extracting the translation rules from the bilingual flattened source CSS and the target string. |
A semantic span can include one or more eus. | We then perform word alignment on the modified bilingual sentences, and extract the new translation rules based on the new alignment, as shown in Figure 3(b) to Figure 3(c). |
A semantic span can include one or more eus. | Bound by the word alignment, the alignment complies with EUC only if there is no overlap between pSA and pSB. |
Experiments | We obtain the word alignment with the grow-diag-final-and strategy with GIZA++. |
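The grow-diag-final-and symmetrization strategy mentioned above can be sketched roughly as follows; this is a simplified reading of the Moses-style procedure (neighbor set, growing loop, and final-and pass), not a drop-in replacement for the GIZA++/Moses implementation.

```python
def grow_diag_final_and(e2f, f2e):
    """Symmetrize two directional alignments, given as sets of (e, f) pairs."""
    alignment = set(e2f & f2e)            # start from the intersection
    union = e2f | f2e
    neighbors = [(-1, 0), (0, -1), (1, 0), (0, 1),
                 (-1, -1), (-1, 1), (1, -1), (1, 1)]
    changed = True
    while changed:                        # grow-diag: absorb adjacent union points
        changed = False
        for e, f in sorted(alignment):
            for de, df in neighbors:
                ne, nf = e + de, f + df
                e_free = all(p[0] != ne for p in alignment)
                f_free = all(p[1] != nf for p in alignment)
                if (ne, nf) in union and (ne, nf) not in alignment and (e_free or f_free):
                    alignment.add((ne, nf))
                    changed = True
    for e, f in sorted(union):            # final-and: both words must be unaligned
        if all(p[0] != e for p in alignment) and all(p[1] != f for p in alignment):
            alignment.add((e, f))
    return alignment
```

Starting from the intersection keeps precision high; the growing and final-and passes then recover recall from the union without linking two already-covered words.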
Experiments | The merits of “Flattened Rule” are twofold: 1) in the training process, the new word alignment upon modified sentence pairs can align transitional expressions to flattened CSS tags; 2) in the decoding process, the CSS-based rules are more discriminating than the original rules, which is more flexible than “TFS”. |
Introduction | Our syntactic constituent reordering model considers context free grammar (CFG) rules in the source language and predicts the reordering of their elements on the target side, using word alignment information. |
Introduction | We introduce novel soft reordering constraints, using syntactic constituents or semantic roles, composed over word alignment information in translation rules used during decoding time; |
Unified Linguistic Reordering Models | parse tree and its word alignment links to the target language. |
Unified Linguistic Reordering Models | Unlike the conventional phrase and lexical translation features, whose values are phrase pair-determined and thus can be calculated offline, the value of the reordering features can only be obtained during decoding time, and requires word alignment information as well. |
Unified Linguistic Reordering Models | Before we present the algorithm integrating the reordering models, we define the following functions by assuming XP_i and XP_{i+1} are the constituent pair of interest in CFG rule cfg, H is the translation hypothesis and a is its word alignment: |
Decoding with the NNJM | For aligned target words, the normal affiliation heuristic can be used, since the word alignment is available within the rule. |
Model Variations | We treat NULL as a normal target word, and if a source word aligns to multiple target words, it is treated as a single concatenated token. |
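A small sketch of this target-side treatment, building one token per source position; the underscore separator, the "NULL" string, and the function name are assumptions of ours.

```python
def target_tokens(src_len, tgt_words, alignment):
    """Build one target token per source position under the scheme above.

    alignment: set of (src, tgt) index links.  A source word linked to
    several target words yields a single concatenated token; an unaligned
    source word yields the normal target word "NULL".
    """
    out = []
    for i in range(src_len):
        linked = [tgt_words[j] for s, j in sorted(alignment) if s == i]
        out.append("_".join(linked) if linked else "NULL")
    return out
```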
Model Variations | For word alignment, we align all of the training data with both GIZA++ (Och and Ney, 2003) and NILE (Riesa et al., 2011), and concatenate the corpora together for rule extraction. |
Neural Network Joint Model (NNJM) | This notion of affiliation is derived from the word alignment, but unlike word alignment, each target word must be affiliated with exactly one non-NULL source word. |
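The affiliation heuristic can be sketched as below; the tie-breaking choices (middle link for multiply-aligned target words, right-then-left search for unaligned ones) follow one plausible reading and are not guaranteed to match the original NNJM rules.

```python
def affiliations(alignment, tgt_len):
    """Map every target position to exactly one source position.

    alignment: set of (src, tgt) links.  A multiply-aligned target word
    takes the middle linked source word (right of center for even counts);
    an unaligned one inherits from the nearest originally aligned target
    word, preferring the right neighbor on ties.
    """
    links = {j: sorted(i for i, k in alignment if k == j) for j in range(tgt_len)}
    aff = [None] * tgt_len
    for j in range(tgt_len):
        if links[j]:
            aff[j] = links[j][len(links[j]) // 2]   # middle linked source word
    for j in range(tgt_len):
        if aff[j] is None:
            for d in range(1, tgt_len + 1):         # search outward
                for k in (j + d, j - d):            # right neighbor first
                    if 0 <= k < tgt_len and links[k]:
                        aff[j] = aff[k]
                        break
                if aff[j] is not None:
                    break
    return aff
```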
Experiments | We also extract the bidirectional word alignments between Chinese and English using GIZA++ (Och and Ney, 2003). |
Experiments | While ptLDA-align performs better than baseline SMT and LDA, it is worse than ptLDA-dict, possibly because of errors in the word alignments, making the tree priors less effective. |
Introduction | Topic models bridge the chasm between languages using document connections (Mimno et al., 2009), dictionaries (Boyd-Graber and Resnik, 2010), and word alignments (Zhao and Xing, 2006). |
Polylingual Tree-based Topic Models | In addition, we extract the word alignments from aligned sentences in a parallel corpus. |
Experiments | using GIZA++ in both directions, and the diag-grow-final heuristic is used to refine symmetric word alignment. |
Related Work | They proposed a bilingual topical admixture approach for word alignment and assumed that each word-pair follows a topic- |
Related Work | They reported extensive empirical analysis and improved word alignment accuracy as well as translation quality. |
Introduction | XMEANT is obtained by (1) using simple lexical translation probabilities, instead of the monolingual context vector model used in MEANT for computing the similarities of the semantic role fillers, and (2) incorporating bracketing ITG constraints for word alignment within the semantic role fillers. |
Introduction | than that of the reference translation, and on the other hand, the BITG constrains the word alignment more accurately than the heuristic bag-of-word aggregation used in MEANT. |
Results | It is also consistent with results observed while estimating word alignment probabilities, where BITG constraints outperformed alignments from GIZA++ (Saers and Wu, 2009). |
Decoding with Sense-Based Translation Model | During decoding, we keep word alignments for each translation rule. |
Decoding with Sense-Based Translation Model | Whenever a new source word c is translated, we find its translation e via the kept word alignments. |
Experiments | We ran Giza++ on the training data in two directions and applied the “grow-diag-final” refinement rule (Koehn et al., 2003) to obtain word alignments. |