Abstract | In this paper, we explore a novel bilingual word alignment approach based on DNN (Deep Neural Network), which has been proven to be very effective in various machine learning tasks (Collobert et al., 2011). |
Abstract | We describe in detail how we adapt and extend the CD-DNN-HMM (Dahl et al., 2012) method introduced in speech recognition to the HMM-based word alignment model, in which bilingual word embedding is discriminatively learnt to capture lexical translation information, and surrounding words are leveraged to model context information in bilingual sentences. |
Abstract | Experiments on a large scale English-Chinese word alignment task show that the proposed method outperforms the HMM and IBM model 4 baselines by 2 points in F-score. |
DNN for word alignment | Our DNN word alignment model extends the classic HMM word alignment model (Vogel et al., 1996).
DNN for word alignment | Given a sentence pair (e, f), HMM word alignment takes the following form: |
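The equation referenced above is not reproduced in this excerpt; the standard HMM decomposition from Vogel et al. (1996), which this sentence introduces, is (reconstruction from the cited model, not recovered from this excerpt):

```latex
p(f_1^J \mid e_1^I) = \sum_{a_1^J} \prod_{j=1}^{J} p(a_j \mid a_{j-1}, I)\, p(f_j \mid e_{a_j})
```

where $a_j$ is the position of the English word that generates the foreign word $f_j$.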
Introduction | Inspired by successful previous works, we propose a new DNN-based word alignment method, which exploits contextual and semantic similarities between words. |
Introduction | Figure 1: Two examples of word alignment |
Introduction | In the rest of this paper, related work about DNN and word alignment are first reviewed in Section 2, followed by a brief introduction of DNN in Section 3. |
Related Work | Among related work on word alignment, the most popular methods are based on generative models such as the IBM Models (Brown et al., 1993) and the HMM (Vogel et al., 1996).
Related Work | Discriminative approaches have also been proposed that use hand-crafted features to improve word alignment.
Abstract | We show that the recovered empty categories not only improve the word alignment quality, but also lead to significant improvements in a large-scale state-of-the-art syntactic MT system. |
Experimental Results | Then we run GIZA++ (Och and Ney, 2000) to generate the word alignment for each direction and apply grow-diagonal-final (Koehn et al., 2003), same as in the baseline. |
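The grow-diagonal-final symmetrization mentioned here merges the two directional GIZA++ alignments. Below is a minimal sketch of the core grow step, assuming alignments are represented as sets of (source, target) index pairs; it deliberately omits the "final" step and the unaligned-word condition of the real Moses heuristic:

```python
def symmetrize(src2tgt, tgt2src):
    """Simplified grow-diag symmetrization of two directional alignments.

    src2tgt, tgt2src: sets of (src_index, tgt_index) links from the two
    directional alignment runs. Start from their intersection, then grow
    by adding union links adjacent (including diagonally) to an existing
    link. Illustrative sketch only, not the full grow-diag-final-and rule.
    """
    union = src2tgt | tgt2src
    alignment = src2tgt & tgt2src
    neighbors = [(-1, 0), (0, -1), (1, 0), (0, 1),
                 (-1, -1), (-1, 1), (1, -1), (1, 1)]
    added = True
    while added:
        added = False
        for (i, j) in sorted(union - alignment):
            # adopt a union link if it touches the current alignment
            if any((i + di, j + dj) in alignment for di, dj in neighbors):
                alignment.add((i, j))
                added = True
    return alignment
```

Starting from the intersection keeps precision high; the growing loop then recovers recall from the union, which is the design intuition behind the heuristic.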
Integrating Empty Categories in Machine Translation | With the preprocessed MT training corpus, an unsupervised word aligner, such as GIZA++, can be used to generate automatic word alignment, as the first step of a system training pipeline.
Integrating Empty Categories in Machine Translation | The effect of inserting ECs is twofold: first, it can impact the automatic word alignment, since it now allows the target-side words, especially the function words, to align to the inserted ECs and fix some errors in the original word alignment; second, new phrases and rules can be extracted from the preprocessed training data.
Integrating Empty Categories in Machine Translation | A few examples of the extracted Hiero rules and tree-to-string rules are also listed, which we would not have been able to extract from the original incorrect word alignment when the *pro* was missing. |
Introduction | In addition, the pro-drop problem can also degrade the word alignment quality in the training data. |
Introduction | A sentence pair observed in the real data is shown in Figure 1 along with the word alignment obtained from an automatic word aligner, where the English subject pronoun
Introduction | Figure 1: Example of incorrect word alignment due to missing pronouns on the Chinese side. |
Abstract | However, most previous approaches to bilingual tagging assume word alignments are given as fixed input, which can cause cascading errors. |
Abstract | We observe that NER label information can be used to correct alignment mistakes, and present a graphical model that performs bilingual NER tagging jointly with word alignment, by combining two monolingual tagging models with two unidirectional alignment models.
Abstract | Experiments on the OntoNotes dataset demonstrate that our method yields significant improvements in both NER and word alignment over state-of-the-art monolingual baselines. |
Bilingual NER by Agreement | We also assume that a set of word alignments (A = {(i, j) : e_i ↔ f_j}) is given by a word aligner and remains fixed in our model.
Bilingual NER by Agreement | The assumption in the hard agreement model can also be violated if there are word alignment errors. |
Introduction | In this work, we first develop a bilingual NER model (denoted as BI-NER) by embedding two monolingual CRF-based NER models into a larger undirected graphical model, and introduce additional edge factors based on word alignment (WA). |
Introduction | Our method does not require any manual annotation of word alignments or named entities over the bilingual training data. |
Introduction | The aforementioned BI-NER model assumes fixed alignment input given by an underlying word aligner.
Joint Alignment and NER Decoding | To capture this intuition, we extend the BI-NER model to jointly perform word alignment and NER decoding, and call the resulting model BI-NER-WA. |
Abstract | Previous work has shown that a reordering model can be learned from high quality manual word alignments to improve machine translation performance. |
Abstract | In this paper, we focus on further improving the performance of the reordering model (and thereby machine translation) by using a larger corpus of sentence-aligned data for which manual word alignments are not available but automatic machine-generated alignments are available.
Abstract | To mitigate the effect of noisy machine alignments, we propose a novel approach that improves reorderings produced given noisy alignments and also improves word alignments using information from the reordering model. |
Introduction | These methods use a small corpus of manual word alignments (where the words in the source sentence are manually aligned to the words in the target sentence) to learn a model to preorder the source sentence to match target order. |
Introduction | In this paper, we build upon the approach in (Visweswariah et al., 2011) which uses manual word alignments for learning a reordering model. |
Introduction | Specifically, we show that we can significantly improve reordering performance by using a large number of sentence pairs for which manual word alignments are not available. |
Conclusion | We have shown that the word alignment models IBM-3 and IBM-4 can be turned into nondeficient |
Introduction | While most people think of the translation and word alignment models IBM-3 and IBM-4 as inherently deficient models (i.e. |
Introduction | The source code of this project is available in our word alignment software RegAligner, version 1.2 and later.
Introduction | Related Work Today’s most widely used models for word alignment are still the models IBM 1-5 of Brown et al. |
The models IBM-3, IBM-4 and IBM-5 | The probability p(f_1^J | e_1^I) of getting the foreign sentence as a translation of the English one is modeled by introducing the word alignment a as a hidden variable:
The models IBM-3, IBM-4 and IBM-5 | For i = 1, ..., I, decide on the number Φ_i of foreign words aligned to e_i.
The models IBM-3, IBM-4 and IBM-5 | For each i = 1, 2, ..., I, and k = 1, ..., Φ_i, decide on (a) the identity f_{i,k} of the next foreign word aligned to e_i.
Training the New Variants | For the task of word alignment, we infer the parameters of the models using maximum likelihood.
Training the New Variants | This task is also needed for the actual task of word alignment (annotating a given sentence pair with an alignment). |
Abstract | Experiments on Chinese-English translation demonstrated the effectiveness of our approach in enhancing the quality of overall translation, name translation and word alignment over a high-quality MT baseline.
Experiments | Therefore, it is important to use name-replaced corpora for rule extraction to fully take advantage of improved word alignment . |
Experiments | 5.4 Word Alignment |
Experiments | It is also important to investigate the impact of our NAMT approach on improving word alignment . |
Introduction | names in parallel corpora, updating word segmentation, word alignment and grammar extraction (Section 3.1). |
Name-aware MT | We pair two entities from two languages, if they have the same entity type and are mapped together by word alignment . |
Name-aware MT | First, we replace tagged name pairs with their entity types, and then use GIZA++ and symmetrization heuristics to regenerate word alignment.
Name-aware MT | Since the name tags appear very frequently, the existence of such tags yields improvement in word alignment quality. |
Abstract | In contrast, alignment-based methods use a word alignment model to fulfill this task, which avoids parsing errors because no parsing is required.
Introduction | A word can find its corresponding modifiers by using a word alignment |
Introduction | Furthermore, this paper naturally addresses another question: is it useful for opinion targets extraction when we combine syntactic patterns and word alignment model into a unified model? |
Introduction | Then, these partial alignment links can be regarded as constraints for a standard unsupervised word alignment model.
Opinion Target Extraction Methodology | In the first component, we respectively use syntactic patterns and unsupervised word alignment model (WAM) to capture opinion relations. |
Opinion Target Extraction Methodology | In addition, we employ a partially supervised word alignment model (PSWAM) to incorporate syntactic information into WAM. |
Opinion Target Extraction Methodology | 3.1.2 Unsupervised Word Alignment Model |
Parallel Segment Retrieval | Finally, a represents the word alignment between the words in the left and the right segments. |
Parallel Segment Retrieval | Then, we would use a word alignment model (Brown et al., 1993; Vogel et al., 1996), with source s = sup, .
Parallel Segment Retrieval | Finally, from the probability of the word alignments, we can determine whether the segments are parallel.
Introduction | The algorithm uses learned word alignments to aggressively generalize the seeds, producing a large set of possible lexical equivalences. |
Learning | We call this procedure InduceLex(x, x′, y, A), which takes a paraphrase pair (x, x′), a derivation y of x, and a word alignment A, and returns a new set of lexical entries.
Learning | A word alignment A between x and x′ is a subset of [n] × [n′]. A phrase alignment is a pair of index sets (I, I′) where I ⊆ [n] and I′ ⊆ [n′]. A phrase alignment (I, I′) is consistent with a word alignment A if for all (i, i′) ∈ A, i ∈ I if and only if i′ ∈ I′.
Learning | In other words, a phrase alignment is consistent with a word alignment if the words in the phrases are aligned only with each other, and not with any outside words. |
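The consistency condition restated here translates directly into code; a minimal sketch, with illustrative function and argument names:

```python
def is_consistent(phrase_src, phrase_tgt, alignment):
    """Check whether a phrase alignment (I, I') is consistent with a
    word alignment A: for every link (i, i') in A, i is in I if and
    only if i' is in I'. Index sets are Python sets; A is a set of
    (i, i') pairs."""
    return all((i in phrase_src) == (i2 in phrase_tgt)
               for (i, i2) in alignment)
```

For example, with A = {(0, 0), (1, 2)}, the phrase pair ({0, 1}, {0}) is inconsistent because word 1 is aligned to word 2, which lies outside the target index set.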
Experiment | GIZA++ and the grow-diag-final-and heuristics were used to obtain word alignments.
Experiment | In order to reduce word alignment errors, we removed articles {a, an, the} in English and particles {ga, wo, wa} in Japanese before performing word alignment, because these function words do not correspond to any words in the other language.
Experiment | After word alignment, we restored the removed words and shifted the word alignment positions to the original word positions. |
Proposed Method | The lines represent word alignments.
Proposed Method | The English side arrows point to the nearest word aligned on the right. |
Proposed Method | The training data is built from a parallel corpus and word alignments between corresponding source words and target words. |
Abstract | The notion of fertility in word alignment (the number of words emitted by a single state) is useful but difficult to model. |
Evaluation | We explore the impact of this improved MAP inference procedure on a task in German-English word alignment . |
Evaluation | For training data we use the news commentary data from the WMT 2012 translation task. 120 of the training sentences were manually annotated with word alignments.
HMM alignment | , f_J and word alignment vectors a = a_1, . . . , a_J.
HMM alignment | For the standard HMM, there is a dynamic programming algorithm to compute the posterior probability over word alignments Pr(a|e, f). These are the sufficient statistics gathered in the E-step of EM.
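As a sketch of that dynamic program, the forward-backward recursions below compute the posteriors Pr(a_j = i | e, f) for a toy HMM aligner. The uniform initial distribution, the absence of a NULL word, and the lack of numerical scaling are simplifying assumptions of this illustration:

```python
import numpy as np

def alignment_posteriors(trans, emit):
    """Posterior P(a_j = i | e, f) for an HMM aligner via forward-backward.

    trans[i_prev, i]: transition probability p(a_j = i | a_{j-1} = i_prev)
    emit[i, j]:       emission probability p(f_j | e_i)
    Returns an (I, J) matrix of posteriors; each column sums to 1.
    """
    I, J = emit.shape
    alpha = np.zeros((I, J))
    beta = np.zeros((I, J))
    alpha[:, 0] = emit[:, 0] / I                      # uniform start
    for j in range(1, J):                             # forward pass
        alpha[:, j] = (alpha[:, j - 1] @ trans) * emit[:, j]
    beta[:, J - 1] = 1.0
    for j in range(J - 2, -1, -1):                    # backward pass
        beta[:, j] = trans @ (emit[:, j + 1] * beta[:, j + 1])
    post = alpha * beta
    return post / post.sum(axis=0, keepdims=True)     # normalize per f_j
```

These per-position posteriors (and the analogous pairwise transition posteriors) are exactly the expected counts accumulated in the E-step.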
Introduction | These word alignments are a crucial training component in most machine translation systems. |
Introduction | Models 2 and 3 incorporate a positional model based on the absolute position of the word; Models 4 and 5 use a relative position model instead (an English word tends to align to a French word that is near the French word aligned to the previous English word).
Comparative Study | monotone for the current phrase, if a word alignment point at the bottom-left position (point A) exists and there is no word alignment point at the bottom-right position (point B).
Comparative Study | swap for the current phrase, if a word alignment point at the bottom-right position (point B) exists and there is no word alignment point at the bottom-left position (point A).
Comparative Study | When one source word is aligned to multiple target words, we duplicate the source word for each target word, e.g.
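The monotone/swap rule above can be sketched as a small classifier. The index conventions (treating the "bottom" row as the previous target word, and point B as the column just past the phrase's source span) are assumptions of this illustration, not taken from the paper:

```python
def orientation(alignment, src_start, src_end, tgt_start):
    """Classify the orientation of a phrase whose source span is
    [src_start, src_end] and whose target span starts at tgt_start.

    'monotone' if the bottom-left point A = (src_start-1, tgt_start-1)
    is an alignment point and the bottom-right point
    B = (src_end+1, tgt_start-1) is not; 'swap' in the mirrored case;
    'other' otherwise. alignment is a set of (src, tgt) index pairs.
    """
    point_a = (src_start - 1, tgt_start - 1) in alignment  # bottom-left
    point_b = (src_end + 1, tgt_start - 1) in alignment    # bottom-right
    if point_a and not point_b:
        return "monotone"
    if point_b and not point_a:
        return "swap"
    return "other"
```

With a diagonal alignment {(0, 0), (1, 1)}, the phrase at source position 1 and target position 1 is classified monotone; with the crossed alignment {(2, 0), (1, 1)} it is classified swap.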
Experiments | The main reason is that the “annotated” corpus is converted from word alignments, which contain many errors.
Tagging-style Reordering Model | The first step is word alignment training. |
Tagging-style Reordering Model | We also have the word alignment within the new phrase pair, which is stored during the phrase extraction process. |
Conclusions and Future Work | In this paper the model was only used to infer word alignments; in future work we intend to develop a decoding algorithm for directly translating with the model.
Experiments | However, in this paper we limit our focus to inducing word alignments, i.e., using the model to infer alignments which are then used in a standard phrase-based translation pipeline.
Experiments | We present results on translation quality and word alignment . |
Gibbs Sampling | Given the structure of our model, a word alignment uniquely specifies the translation decisions and the sequence follows the order of the target sentence left to right. |
Introduction | In this paper we propose a new model to drop the independence assumption, by instead modelling correlations between translation decisions, which we use to induce translation derivations from aligned sentences (akin to word alignment).
Model | Given a source sentence, our model infers a latent derivation which produces a target translation and meanwhile gives a word alignment between the source and the target. |
Conclusions | We presented a novel information theoretic model for bilingual word clustering which seeks a clustering with high average mutual information between clusters of adjacent words, and also high mutual information across observed word alignment links. |
Experiments | The corpus was word aligned in two directions using an unsupervised word aligner (Dyer et al., 2013), then the intersected alignment points were taken. |
Experiments | For Turkish, the F1 score improves by 1.0 point over the setting with no distributional clusters, which clearly shows that the word alignment information improves the clustering quality.
Experiments | Thus we propose to further refine the quality of word alignment links as follows: let x be a word in one language and y a word in the other language, and let there exist an alignment link between x and y.
Introduction | The second term ensures that the cluster alignments induced by a word alignment have high mutual information across languages (§2.2). |
Word Clustering | For concreteness, A(x, y) will be the number of times that x is aligned to y in a word-aligned parallel corpus.
Clustering for Cross Lingual Sentiment Analysis | Given a parallel bilingual corpus, word clusters in S can be aligned to clusters in T. Word alignments are created using parallel corpora. |
Clustering for Cross Lingual Sentiment Analysis | Here, L_{T|S}(...) and L_{S|T}(...) are factors based on word alignments, which can be represented as:
Conclusion and Future Work | A naive cluster linkage algorithm based on word alignments was used to perform CLSA. |
Introduction | To perform CLSA, this study leverages an unlabelled parallel corpus to generate the word alignments.
Introduction | These word alignments are then used to link cluster based features to obliterate the language gap for performing SA. |
Experiment | We run GIZA++ and then employ the grow-diag-final-and (gdfa) strategy to produce symmetric word alignments.
Inside Context Integration | We demand that every element and its corresponding target span must be consistent with word alignment . |
Inside Context Integration | Note that we only apply the source-side PAS and word alignment for IC-PASTR extraction.
Inside Context Integration | Thus to get a high recall for PASs, we only utilize word alignment instead of capturing the relation between bilingual elements. |
Maximum Entropy PAS Disambiguation (MEPD) Model | t_range(PAS) refers to the target range covering all the words that are reachable from the PAS via word alignment . |
Experiments | Following (Levenberg et al., 2012; Neubig et al., 2011), we evaluate our model by using its output word alignments to construct a phrase table. |
Experiments | As a baseline, we train a phrase-based model using the Moses toolkit based on the word alignments obtained using GIZA++ in both directions and symmetrized using the grow-diag-final-and heuristic (Koehn et al., 2003).
Experiments | These are taken from the final Model 4 word alignments, using the intersection of the source-target and target-source models.
Related Work | In the context of machine translation, ITG has been explored for statistical word alignment in both unsupervised (Zhang and Gildea, 2005; Cherry and Lin, 2007; Zhang et al., 2008; Pauls et al., 2010) and supervised (Haghighi et al., 2009; Cherry and Lin, 2006) settings, and for decoding (Petrov et al., 2008). |
Related Work | Our paper fits into the recent line of work for jointly inducing the phrase table and word alignment (DeNero and Klein, 2010; Neubig et al., 2011). |
Translation Model Architecture | The lexical weights lex(ē|f̄) and lex(f̄|ē) are calculated as follows, using a set of word alignments a between ē and f̄:
Translation Model Architecture | The counts in the two alignment directions are not identical, since the lexical probabilities are based on the unsymmetrized word alignment frequencies (in the Moses implementation, which we re-implement).
Translation Model Architecture | In the unweighted variant, the resulting features are equivalent to training on the concatenation of all training data, excepting differences in word alignment, pruning and rounding.
Introduction | We first identify anchors as regions in the source sentences around which ambiguous reordering patterns frequently occur and chunks as regions that are consistent with word alignment which may span multiple translation units at decoding time. |
Maximal Orientation Span | We also attach the word indices as superscripts to the source words and project the indices to the aligned target words; for example, “have5” indicates that the word “have” is aligned to the 5th source word, i.e.
Maximal Orientation Span | Note that to facilitate the projection, the rules must come with internal word alignment in practice. |
Maximal Orientation Span | the source sentence, the complete translation and the word alignment . |
Experiments and evaluation | We use the hierarchical translation system that comes with the Moses SMT-package and GIZA++ to compute the word alignment , using the “grow-diag-final-and” heuristics. |
Translation pipeline | We consider gender as part of the stem, whereas the value for number is derived from the source-side: if marked for number, singular/plural nouns are distinguished during word alignment and then translated accordingly. |
Using subcategorization information | Figure 1: Deriving features from dependency-parsed English data via the word alignment . |
Using subcategorization information | to the SMT output via word alignment . |
For each t ∈ V[S_k] | If a target word in t is a gap word, we suppose there is a word alignment between the target gap word and the source-side null.
Phrase Pair Refinement and Parameterization | According to our analysis, the biggest problem is that in the target side of the phrase pair, two or more identical words are aligned to the same source-side word.
Probabilistic Bilingual Lexicon Acquisition | We employ the same algorithm used in (Munteanu and Marcu, 2006), which first uses GIZA++ (with the grow-diag-final-and heuristic) to obtain the word alignment between source and target words, and then calculates the association strength between the aligned words.
The effect of the Italian connectives on the LIS translation | Word Alignment.
The effect of the Italian connectives on the LIS translation | For this purpose we used the Berkeley Word Aligner (BWA) (DeNero, 2007), a general tool for aligning sentences in bilingual corpora.
The effect of the Italian connectives on the LIS translation | Bold dashed lines show word alignment . |
Conclusions | This may suggest that adding shallow semantic information is more effective than introducing complex structured constraints, at least for the specific word alignment model we experimented with in this work. |
Introduction | This may suggest that compared to introducing complex structured constraints, incorporating shallow semantic information is both more effective and computationally inexpensive in improving the performance, at least for the specific word alignment model tested in this work. |
Learning QA Matching Models | As a result, an “ideal” word alignment structure should not link words in this clause to those in the question. |
Experiments | The system configurations are as follows: GIZA++ (Och and Ney, 2003) is used to obtain the bidirectional word alignments.
Problem Formulation | is the final translation; [tm_s, tm_t, tm_f, s_a, tm_a] are the associated information of the best TM sentence pair; tm_s and tm_t denote the corresponding TM sentence pair; tm_f denotes its associated fuzzy match score (from 0.0 to 1.0); s_a denotes the editing operations between tm_s and s; and tm_a denotes the word alignment between tm_s and tm_t.
Problem Formulation | ), we can find its corresponding TM source phrase tm_s_a(k) and all possible TM target phrases (each denoted by tm_t_a(k)) with the help of the corresponding editing operations s_a and word alignment tm_a.
Experiment | Word alignments of each paraphrase pair are trained by GIZA++. |
Paraphrasing for Web Search | Word alignments within each paraphrase pair are generated using GIZA++ (Och and Ney, 2000). |
Paraphrasing for Web Search | In order to enable our paraphrasing model to learn the preferences on different paraphrasing strategies according to the characteristics of web queries, we design search-oriented features2 based on word alignments within Q and Q’, which can be described as follows: |
Experiments & Results 4.1 Experimental Setup | Word alignment is done using GIZA++ (Och and Ney, 2003). |
Experiments & Results 4.1 Experimental Setup | The resulting word alignments are used to extract the translations for each OOV.
Experiments & Results 4.1 Experimental Setup | The correctness of this gold standard is limited by the size of the parallel data used as well as by the quality of the word alignment software toolkit, and is not 100% precise.