Abstract | In this work we analyze a recently proposed agreement-constrained EM algorithm for unsupervised alignment models.
Abstract | We propose and extensively evaluate a simple method for using alignment models to produce alignments better suited for phrase-based MT systems, and show significant gains (as measured by BLEU score) in end-to-end translation systems for six language pairs used in recent MT competitions.
Adding agreement constraints | They suggest how this framework can be used to encourage two word alignment models to agree during training. |
Adding agreement constraints | Most MT systems train an alignment model in each direction and then heuristically combine their predictions. |
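As an illustration of such heuristic combination, here is a minimal sketch of intersection-based symmetrization with a simplified grow step; the function name, link representation, and neighbor heuristic are illustrative assumptions, not any cited paper's exact procedure:

```python
# Hedged sketch: combine two directional alignments by intersection,
# then grow toward the union (a simplified grow-diag-style heuristic).
def symmetrize(src2tgt, tgt2src):
    """Both arguments are sets of (src_idx, tgt_idx) links,
    one from each directional alignment model."""
    alignment = set(src2tgt & tgt2src)        # high-precision intersection
    union = src2tgt | tgt2src                 # high-recall union
    neighbors = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                 (0, 1), (1, -1), (1, 0), (1, 1)]
    grew = True
    while grew:                               # grow toward the union
        grew = False
        for i, j in sorted(union - alignment):
            # adopt a union-only link if it is adjacent to a kept link
            if any((i + di, j + dj) in alignment for di, dj in neighbors):
                alignment.add((i, j))
                grew = True
    return alignment

# Example: links from two directional aligners
print(symmetrize({(0, 0), (1, 2), (2, 1)}, {(0, 0), (1, 2), (2, 2)}))
```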
Introduction | In this work, we show that by changing the way the word alignment models are trained and |
Introduction | We present extensive experimental results evaluating a new training scheme for unsupervised word alignment models: an extension of the Expectation Maximization algorithm that allows effective injection of additional information about the desired alignments into the unsupervised training process.
Phrase-based machine translation | We then train the competing alignment models and compute competing alignments using different decoding schemes. |
Statistical word alignment | 2.1 Baseline word alignment models |
Statistical word alignment | Figure 1 illustrates the mapping between the usual HMM notation and the HMM alignment model.
Statistical word alignment | All word alignment models we consider are normally trained using the Expectation Maximization (EM) algorithm.
Abstract | This study proposes a word alignment model based on a recurrent neural network (RNN), in which an unlimited alignment history is represented by recurrently connected hidden layers. |
Abstract | Our alignment model is directional, similar to the generative IBM models (Brown et al., 1993). |
Introduction | the HMM alignment model and achieved state-of-the-art performance. |
Introduction | We assume that this property is suitable for the word alignment task, and we propose an RNN-based word alignment model.
Introduction | The NN-based alignment models are supervised models. |
Related Work | Various word alignment models have been proposed. |
Related Work | 2.1 Generative Alignment Model |
Related Work | 2.2 FFNN-based Alignment Model |
Experimental Results | We trained the model on a portion of FBIS data that has been used previously for alignment model evaluation (Ayan and Dorr, 2006; Haghighi et al., 2009; DeNero and Klein, 2010). |
Introduction | This result is achieved by embedding two directional HMM-based alignment models into a larger bidirectional graphical model. |
Introduction | Moreover, the bidirectional model enforces a one-to-one phrase alignment structure, similar to the output of phrase alignment models (Marcu and Wong, 2002; DeNero et al., 2008), unsupervised inversion transduction grammar (ITG) models (Blunsom et al., 2009), and supervised ITG models (Haghighi et al., 2009; DeNero and Klein, 2010). |
Model Definition | Our model contains two directional hidden Markov alignment models, which we review in Section 2.1, along with additional structure that we introduce in Section 2.2.
Model Definition | 2.1 HMM-Based Alignment Model |
Model Definition | This section describes the classic hidden Markov model (HMM) based alignment model (Vogel et al., 1996). |
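To make the decoding side of this model concrete, below is a minimal Viterbi sketch for an HMM aligner in which states are source positions, emissions are lexical translation probabilities, and transitions depend only on the jump distance; the dict-based interface and the probability floor are illustrative assumptions, and the NULL state is omitted for brevity:

```python
import math

def viterbi_align(src, tgt, t_prob, jump_prob, floor=1e-12):
    """Viterbi decoding for a first-order HMM alignment model (sketch).
    t_prob[(f, e)]: lexical translation probability p(f | e);
    jump_prob[d]: transition probability for a jump of d source positions."""
    I, J = len(src), len(tgt)
    delta = [[float("-inf")] * I for _ in range(J)]
    back = [[0] * I for _ in range(J)]
    for i in range(I):  # uniform start distribution over source positions
        delta[0][i] = -math.log(I) + math.log(t_prob.get((tgt[0], src[i]), floor))
    for j in range(1, J):
        for i in range(I):
            emit = math.log(t_prob.get((tgt[j], src[i]), floor))
            for ip in range(I):
                s = delta[j - 1][ip] + math.log(jump_prob.get(i - ip, floor)) + emit
                if s > delta[j][i]:
                    delta[j][i], back[j][i] = s, ip
    # backtrace the best state sequence
    a = [max(range(I), key=lambda i: delta[J - 1][i])]
    for j in range(J - 1, 0, -1):
        a.append(back[j][a[-1]])
    return list(reversed(a))  # a[j] = source index aligned to tgt[j]
```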
Related Work | In addition, supervised word alignment models often use the output of directional unsupervised aligners as features or pruning signals. |
Related Work | This approach to jointly learning two directional alignment models yields state-of-the-art unsupervised performance. |
Generating reference reordering from parallel sentences | Complementing this model, we build an alignment model ($P(a \mid w_s, w_t, \pi_s, \pi_t)$) that scores alignments $a$ given the source and target sentences and their predicted reorderings according to the source and target reordering models.
Generating reference reordering from parallel sentences | The model ($P(\pi_t \mid w_s, w_t, a)$) helps to produce better reference reorderings for training our final reordering model given fixed machine alignments, and the alignment model ($P(a \mid w_s, w_t, \pi_s, \pi_t)$) helps improve the machine alignments by taking into account information from the reordering models.
Generating reference reordering from parallel sentences | Step 2: Feed the predictions of the reordering models to the alignment model
A Generic Phrase Training Procedure | We first train word alignment models and use them to evaluate the goodness of a phrase and a phrase pair.
A Generic Phrase Training Procedure | Beginning with a flat lexicon, we train an IBM Model-1 word alignment model with 10 iterations for each translation direction.
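For reference, a minimal sketch of that Model-1 EM training loop for one direction, assuming a flat initialization and omitting the NULL word; the function name and data layout are assumptions for illustration:

```python
from collections import defaultdict

def train_model1(bitext, iterations=10):
    """Sketch of IBM Model-1 EM for one translation direction.
    bitext: list of (source_words, target_words) pairs.
    Returns t[(f, e)] = p(f | e)."""
    t = defaultdict(lambda: 1.0)  # flat lexicon initialization
    for _ in range(iterations):
        count = defaultdict(float)   # expected counts c(f, e)
        total = defaultdict(float)   # normalizer c(e)
        for src, tgt in bitext:
            for f in tgt:
                z = sum(t[(f, e)] for e in src)  # E-step: link posteriors
                for e in src:
                    p = t[(f, e)] / z
                    count[(f, e)] += p
                    total[e] += p
        for (f, e), c in count.items():          # M-step: renormalize
            t[(f, e)] = c / total[e]
    return t

# Trained once per direction, e.g. train_model1(bitext) and
# train_model1([(tgt, src) for src, tgt in bitext]).
```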
A Generic Phrase Training Procedure | We then train HMM word alignment models (Vogel et al., 1996) in two directions simultaneously by merging statistics collected in the |
Features | All these features are data-driven and defined based on models such as a statistical word alignment model or a language model.
Features | In a statistical generative word alignment model (Brown et al., 1993), it is assumed that (i) a random variable $a$ specifies how each target word $f_j$ is generated by (therefore aligned to) a source word $e_{a_j}$; and (ii) the likelihood function $P(\mathbf{f}, \mathbf{a} \mid \mathbf{e})$ specifies a generative procedure from the source sentence to the target sentence.
Features | This distribution is applicable to all word alignment models that follow assumptions (i) and (ii). |
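For concreteness, one standard instantiation of assumptions (i) and (ii) is IBM Model 1; its likelihood and the resulting link posterior, the distribution such features typically draw on, can be written as:

```latex
% IBM Model 1 instantiation of assumptions (i) and (ii):
% l = source length, m = target length, t = lexical translation table.
P(\mathbf{f} \mid \mathbf{e})
  = \sum_{\mathbf{a}} P(\mathbf{f}, \mathbf{a} \mid \mathbf{e})
  = \frac{\epsilon}{(l+1)^{m}} \prod_{j=1}^{m} \sum_{i=0}^{l} t(f_j \mid e_i)
\qquad
P(a_j = i \mid \mathbf{f}, \mathbf{e})
  = \frac{t(f_j \mid e_i)}{\sum_{i'=0}^{l} t(f_j \mid e_{i'})}
```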
Introduction | We employ features based on word alignment models and the alignment matrix.
Abstract | In contrast, alignment-based methods use a word alignment model to fulfill this task, which avoids parsing errors because no parsing is required.
Abstract | We further combine syntactic patterns with the alignment model in a partially supervised framework and investigate whether this combination is useful.
Introduction | Nevertheless, we note that the alignment model is a statistical model that needs sufficient data to estimate its parameters.
Introduction | To answer these questions, in this paper we adopt a unified framework to extract opinion targets from reviews, in whose key component we vary the method between syntactic patterns and the alignment model.
Introduction | Furthermore, this paper naturally addresses another question: is it useful for opinion target extraction to combine syntactic patterns and the word alignment model into a unified model?
Opinion Target Extraction Methodology | In the first component, we respectively use syntactic patterns and an unsupervised word alignment model (WAM) to capture opinion relations.
Opinion Target Extraction Methodology | In addition, we employ a partially supervised word alignment model (PSWAM) to incorporate syntactic information into WAM. |
Opinion Target Extraction Methodology | 3.1.2 Unsupervised Word Alignment Model |
Related Work | Liu et al. (2013) extend Liu's method; their approach is similar to ours and also uses a partially supervised alignment model to extract opinion targets from reviews.
Abstract | We observe that NER label information can be used to correct alignment mistakes, and present a graphical model that performs bilingual NER tagging jointly with word alignment, by combining two monolingual tagging models with two unidirectional alignment models.
Experimental Setup | directional HMM models as our baseline and monolingual alignment models.
Introduction | To capture this source of information, we present a novel extension that combines the BI-NER model with two unidirectional HMM-based alignment models, and perform joint decoding of NER and word alignments.
Introduction | The new model (denoted as BI-NER-WA) factors over five components: one NER model and one word alignment model for each language, plus a joint NER-alignment model which not only enforces NER label agreements but also facilitates message passing among the other four components. |
Joint Alignment and NER Decoding | Most commonly used alignment models, such as the IBM models and the HMM-based aligner, are unsupervised learners, and can only capture simple distortion features and lexical translation features due to the high complexity of the structured prediction space.
Joint Alignment and NER Decoding | We name the Chinese-to-English aligner model $m(B^e)$ and the reverse directional model $n(B^f)$; $B^e$ is a matrix that holds the output of the Chinese-to-English aligner.
Joint Alignment and NER Decoding | In our experiments, we used two HMM-based alignment models.
Abstract | We describe in detail how we adapt and extend the CD-DNN-HMM (Dahl et al., 2012) method introduced in speech recognition to the HMM-based word alignment model, in which bilingual word embeddings are discriminatively learnt to capture lexical translation information, and surrounding words are leveraged to model context information in bilingual sentences.
Conclusion | Secondly, we want to explore the possibility of unsupervised training of our neural word alignment model, without relying on the alignment results of other models.
DNN for word alignment | Our DNN word alignment model extends the classic HMM word alignment model (Vogel et al., 1996).
DNN for word alignment | In the classic HMM word alignment model, context is not considered in the lexical translation probability.
DNN for word alignment | The vocabulary $V$ of our alignment model consists of a source vocabulary $V_e$ and a target vocabulary $V_f$.
Experiments and Results | In future work, we would like to explore whether our method can improve other word alignment models.
Experiments and Results | embeddings trained by our word alignment model . |
Training | As we do not have a large manually word-aligned corpus, we use traditional word alignment models such as the HMM and IBM Model 4 to generate word alignments on a large parallel corpus.
Training | Tunable parameters in the neural network alignment model include: the word embeddings in the lookup table $LT$, the parameters $W^l, b^l$ of the linear transformations in the hidden layers of the neural network, and the distortion parameters $s_d$ of jump distance $d$.
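A minimal sketch of how a scorer might combine these parameters, assuming a hard-tanh nonlinearity, a linear output unit, and a clipped jump distance; all names and shapes are illustrative assumptions, not the paper's exact architecture:

```python
import numpy as np

def score_link(f_window, e_window, jump, LT, hidden, out_w, out_b, s_d):
    """Sketch of a feed-forward alignment scorer in the spirit of a
    context-dependent neural HMM aligner (all names are assumptions).
    f_window / e_window: word ids for target/source context windows;
    LT: embedding table (vocab x dim); hidden: list of (W, b) layers;
    out_w, out_b: linear output layer; s_d: distortion scores by jump."""
    h = np.concatenate([LT[w] for w in f_window + e_window])  # embed, concat
    for W, b in hidden:
        h = np.clip(W @ h + b, -1.0, 1.0)     # hard-tanh hidden layers
    lexical = float(out_w @ h + out_b)        # context-aware lexical score
    max_jump = (len(s_d) - 1) // 2
    d = max(-max_jump, min(max_jump, jump))   # clip the jump distance
    return lexical + s_d[d + max_jump]        # add the distortion score
```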
Abstract | Specifically, given a web page, the method contains four steps: 1) preprocessing: parse the web page into a DOM tree and segment the inner text of each node into snippets; 2) seed mining: identify potential translation pairs (seeds) using a word-based alignment model which takes both translation and transliteration into consideration; 3) pattern learning: learn generalized patterns with the identified seeds; 4) pattern-based mining: extract all bilingual data in the page using the learned patterns.
Adaptive Pattern-based Bilingual Data Mining | In this step, every adjacent snippet pair in different languages will be checked by an alignment model to see if it is a potential translation pair. |
Adaptive Pattern-based Bilingual Data Mining | The alignment model combines a translation and a transliteration model to compute the likelihood of a bilingual snippet pair being a translation pair. |
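A minimal sketch of one way such a combination could score a candidate snippet pair, letting each source word be explained by either translation or transliteration, whichever is more likely; the function names, the per-word maximization, and the length normalization are all assumptions, not the paper's exact model:

```python
def pair_score(src_words, tgt_words, trans_prob, translit_prob, floor=1e-6):
    """Score a candidate bilingual snippet pair (hedged sketch).
    trans_prob / translit_prob: functions (s, t) -> probability."""
    score = 1.0
    for s in src_words:
        best = max(
            max((trans_prob(s, t) for t in tgt_words), default=0.0),
            max((translit_prob(s, t) for t in tgt_words), default=0.0),
        )
        score *= max(best, floor)   # floor avoids zeroing the product
    return score ** (1.0 / max(len(src_words), 1))  # length-normalized
```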
Experimental Results | In Table 3, “Without pattern” means that we simply treat those seed pairs found by the alignment model as final bilingual data. |
Introduction | 2) Seed mining: identify potential translation pairs (seeds) using an alignment model which takes both translation and transliteration into consideration; |
Overview of the Proposed Approach | The seed mining module receives the inner text of each selected tree node and uses a word-based alignment model to identify potential translation pairs. |
Overview of the Proposed Approach | The alignment model can handle both translation and transliteration in a unified framework. |
Abstract | On top of the pruning framework, we also propose a discriminative ITG alignment model using hierarchical phrase pairs, which improves both F-score and Bleu score over the baseline alignment system of GIZA++. |
Evaluation | IBM Model 1 and the HMM alignment model are re-implemented, as they are required by the three ITG pruning methods.
Introduction | (2009) do pruning based on the probabilities of links from a simpler alignment model (viz. |
The DPDI Framework | The simpler alignment model we used is HMM. |
The DPDI Framework | The four links are produced by some simpler alignment model like HMM. |
The DPDI Framework | $\frac{\#links_{incon}}{f_{len} + e_{len}}$, where $\#links_{incon}$ is the number of links which are inconsistent with the phrase pair according to some simpler alignment model (e.g.
Conclusion | However, our best system does not apply VB to a single probability model, as we found an appreciable benefit from bootstrapping each model from simpler models, much as the IBM word alignment models are usually trained in succession. |
Introduction | As these word-level alignment models restrict the word alignment complexity by requiring each target word to align to zero or one source words, results are improved by aligning both source-to-target as well as target-to-source, |
Introduction | Ideally, such a procedure would remedy the deficiencies of word-level alignment models, including the strong restrictions on the form of the alignment, and the strong independence assumption between words.
Phrasal Inversion Transduction Grammar | Our second approach was to constrain the search space using simpler alignment models , which has the further benefit of significantly speeding up training. |
Phrasal Inversion Transduction Grammar | First we train a lower-level word alignment model; then we place hard constraints on the phrasal alignment space using confident word links from this simpler model.
Variational Bayes for ITG | alignment models is the EM algorithm (Brown et al., 1993), which iteratively updates parameters to maximize the likelihood of the data.
Experiments | 4.3 Integration into Word Alignment Model |
Experiments | 4.3.1 Modified EM Training of the Word Alignment Models |
Experiments | The normal translation probability $p_{ta}(f \mid e)$ of the word alignment models is computed with relative frequency estimates.
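A minimal sketch of that relative-frequency estimate, computed from aligned word pairs; the function name and input format are assumptions for illustration:

```python
from collections import defaultdict

def relative_frequency(aligned_pairs):
    """Relative-frequency estimate p_ta(f | e) = count(e, f) / count(e),
    from (e, f) word pairs extracted from the aligned corpus."""
    pair_count = defaultdict(float)
    e_count = defaultdict(float)
    for e, f in aligned_pairs:
        pair_count[(e, f)] += 1.0
        e_count[e] += 1.0
    return {(e, f): c / e_count[e] for (e, f), c in pair_count.items()}
```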
Background | A first-order HMM alignment model (Vogel et al., 1996) is an HMM of length $I + 1$ where the hidden state at position $i \in [I]_0$ is the aligned index $j \in [J]_0$, and the transition score takes into account the previously aligned index $j' \in [J]_0$. Formally, define the set of possible HMM alignments as $\mathcal{X} \subset \{0,1\}^{([I]_0 \times [J]_0) \cup ([I] \times [J]_0 \times [J]_0)}$.
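For reference, the corresponding first-order factorization, written here in the more common notation where $a_j$ is the source position aligned to target position $j$:

```latex
% Standard first-order HMM alignment factorization (Vogel et al., 1996):
% transitions depend only on the jump distance, via counts c(.).
P(f_1^J, a_1^J \mid e_1^I)
  = \prod_{j=1}^{J} p(a_j \mid a_{j-1}, I)\, p_t(f_j \mid e_{a_j}),
\qquad
p(i \mid i', I) = \frac{c(i - i')}{\sum_{i''=1}^{I} c(i'' - i')}
```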
Bidirectional Alignment | The directional bias of the $e \rightarrow f$ and $f \rightarrow e$ alignment models may cause them to produce differing alignments.
Bidirectional Alignment | In this work, we instead consider a bidirectional alignment model that jointly considers both directional models. |
Conclusion | We have introduced a novel Lagrangian relaxation algorithm for a bidirectional alignment model that uses incremental constraint addition and coarse-to-fine pruning to find exact solutions. |
Experiments | Our experimental results compare the accuracy and optimality of our decoding algorithm to directional alignment models and previous work on this bidirectional model. |
Alignment Methods | These may be words in word-based alignment models or single characters in character-based alignment models. We define our alignment as $a_1^K$, where each element is a span $a_k = \langle s, t, u, v \rangle$ indicating that the target string $e_s, \ldots$
Alignment Methods | The most well-known and widely-used models for bitext alignment are for one-to-many alignment, including the IBM models (Brown et al., 1993) and HMM alignment model (Vogel et al., 1996). |
Introduction | One barrier to applying many-to-many alignment models to character strings is training cost. |
Introduction | Secondly, we describe a method to seed the search process using counts of all substring pairs in the corpus to bias the phrase alignment model.
Related Work on Data Sparsity in SMT | Sparsity causes trouble for alignment models , both in the form of incorrectly aligned uncommon words, and in the form of garbage collection, where uncommon words in one language are incorrectly aligned to large segments of the sentence in the other language (Och and Ney, 2003). |
Parallel Data Extraction | The number of parallel messages is estimated by running our alignment model and checking whether $\tau > \phi$, where $\phi$ was set empirically at first and optimized after obtaining annotated data, as detailed in Section 5.1.
Parallel Data Extraction | Finally, we run our alignment model described in section 3, and obtain the parallel segments and their scores, which measure how likely those segments are parallel. |
Parallel Segment Retrieval | Then, we would use a word alignment model (Brown et al., 1993; Vogel et al., 1996), with source $s = s_u, \ldots$
Parallel Segment Retrieval | Firstly, word alignment models generally attribute higher probabilities to smaller segments, since these are the result of a smaller product chain of probabilities. |
Conclusions | This may suggest that adding shallow semantic information is more effective than introducing complex structured constraints, at least for the specific word alignment model we experimented with in this work. |
Introduction | Compared to the previous work, our latent alignment model improves the result on a benchmark dataset by a wide margin — the mean average precision (MAP) and mean reciprocal rank (MRR) scores are increased by 25.6% and 18.8%, respectively. |
Introduction | Second, while the latent alignment model performs better than unstructured models, the difference diminishes after adding the enhanced lexical semantics information. |
Introduction | This may suggest that compared to introducing complex structured constraints, incorporating shallow semantic information is both more effective and computationally inexpensive in improving the performance, at least for the specific word alignment model tested in this work. |
Experimental Evaluation | This is the first reported result in which an unsupervised phrase alignment model has built a phrase table directly from model probabilities and achieved results that compare to heuristic phrase extraction. |
Hierarchical ITG Model | Previous research has used a variety of sampling methods to learn Bayesian phrase based alignment models (DeNero et al., 2008; Blunsom et al., 2009; Blunsom and Cohn, 2010). |
Introduction | The model is similar to previously proposed phrase alignment models based on inversion transduction grammars (ITGs) (Cherry and Lin, 2007; Zhang et al., 2008; Blunsom et al., 2009), with one important change: ITG symbols and phrase pairs are generated in the opposite order. |
Substructure Spaces for BTKs | 5 Alignment Model |
Substructure Spaces for BTKs | Given feature spaces defined in the last two sections, we propose a 2-phase subtree alignment model as follows: |
Substructure Spaces for BTKs | In order to evaluate the effectiveness of the alignment model and its capability in the applications requiring syntactic translational equivalences, we employ two corpora to carry out the subtree alignment evaluation. |
Experiments | We align the same core subset with our trained hypergraph alignment model , and extract a second set of translation rules. |
Introduction | Generative alignment models like IBM Model-4 (Brown et al., 1993) have been in wide use for over 15 years, and while not perfect (see Figure 1), they are completely unsupervised, requiring no annotated training data to learn alignments that have powered many current state-of-the-art translation systems.
Introduction | We present in this paper a discriminative alignment model trained on relatively little data, with a simple, yet powerful hierarchical search procedure. |
Background | The following constraints on links are assumed by some or all alignment models: |
Background | We refer to an alignment model that assumes all three constraints as a pure one-to-one (1-1) model.
Extrinsic evaluation | The TiMBL L2P generation method (Table 2) is applicable only to the 1-1 alignment models.
Experiments | They employed a word alignment model to capture opinion relations among words, and then used a random walking algorithm to extract opinion targets. |
Introduction | They have investigated a series of techniques to enhance opinion relations identification performance, such as nearest neighbor rules (Liu et al., 2005), syntactic patterns (Zhang et al., 2010; Popescu and Etzioni, 2005), word alignment models (Liu et al., 2012; Liu et al., 2013b; Liu et al., 2013a), etc. |
Related Work | (Liu et al., 2012; Liu et al., 2013a; Liu et al., 2013b) employed word alignment models to capture opinion relations rather than syntactic parsing.
Experiments | This task highlights the evolution and alignment models.
Introduction | Finally, an alignment model maps the flat word lists to cognate groups. |
Introduction | Inference requires a combination of message-passing in the evolutionary model and iterative bipartite graph matching in the alignment model . |
Graph Features | Thus, assuming there is an alignment model that can tell how likely a relation maps to the original question, we add extra alignment-based features for the incoming and outgoing relations of each node.
Graph Features | We describe such an alignment model in § 5. |
Relation Mapping | Since the relations on one side of these pairs are not natural sentences, we ran the simplest IBM alignment model, Model 1 (Brown et al., 1993), to estimate the translation probability with GIZA++ (Och and Ney, 2003).