Discussions | The phrase table extracting procedure is trainable and can be optimized jointly with the translation engine. |
Discussions | As Figure 1 shows, when we increase the threshold, allowing more candidate phrase pairs to be hypothesized as valid translations, the phrase table size increases monotonically. |
Discussions | Figure 1: Thresholding effects on translation performance and phrase table size |
Experimental Results | Our baseline phrase table training method is the ViterbiExtract algorithm. |
Experimental Results | We notice that the Model-4 based phrase table performs roughly 1% better in terms of both BLEU and METEOR scores than the HMM-based one. |
Experimental Results | Since the translation engine implements a log-linear model, the discriminative training of feature weights in the decoder should be embedded in the whole end-to-end system jointly with the discriminative phrase table training process. |
Abstract | We evaluate our proposed method on two end-to-end SMT tasks (phrase table pruning and decoding with phrasal semantic similarities) which need to measure semantic similarity between a source phrase and its translation candidates. |
Experiments | Two tasks are involved in the experiments: phrase table pruning that discards entries whose semantic similarity is very low and decoding with the phrasal semantic similarities as additional new features. |
Experiments | 4.3 Phrase Table Pruning |
Experiments | Pruning most of the phrase table without much impact on translation quality is very important for translation especially in environments where memory and time constraints are imposed. |
Introduction | Accordingly, we evaluate the BRAE model on two end-to-end SMT tasks (phrase table pruning and decoding with phrasal semantic similarities) which need to check whether a translation candidate and the source phrase have the same meaning. |
Introduction | In phrase table pruning, we discard the phrasal translation rules with low semantic similarity. |
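The pruning step described here can be sketched as a simple threshold filter. This is a minimal illustration, not the authors' implementation: the `similarity` callable stands in for whatever semantic similarity model is used, and the function name, toy table, and threshold value are all hypothetical.

```python
def prune_by_similarity(phrase_table, similarity, threshold):
    """Keep only phrase pairs whose similarity score reaches the threshold.

    phrase_table maps (source, target) pairs to feature lists.
    similarity is an assumed callable scoring a (source, target) pair.
    """
    return {
        pair: feats
        for pair, feats in phrase_table.items()
        if similarity(pair[0], pair[1]) >= threshold
    }


# Toy data: the scores below are made up for illustration only.
table = {("chat", "cat"): [0.8], ("chat", "dog"): [0.1]}
scores = {("chat", "cat"): 0.9, ("chat", "dog"): 0.2}
pruned = prune_by_similarity(table, lambda s, t: scores[(s, t)], threshold=0.5)
```

Raising the threshold discards more of the table, so in practice it would be chosen by checking translation quality on held-out data.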
Introduction | The experiments show that up to 72% of the phrase table can be discarded without a significant decrease in translation quality, and that decoding with phrasal semantic similarities yields an improvement of up to 1.7 BLEU points over the state-of-the-art baseline. |
Experiments and Results | In the contrast experiments, our DAE and HCDAE features are appended as extra features to the phrase table . |
Input Features for DNN Feature Learning | Following (Maskey and Zhou, 2012), we use the following 4 phrase features of each phrase pair (Koehn et al., 2003) in the phrase table as the first type of input features, bidirectional phrase translation probability (P(e|f) and P(f|e)), bidirectional lexical weighting (Lex(e|f) and Lex(f|e)), |
Introduction | Using the 4 original phrase features in the phrase table as the input features, they pre-trained the DBN by contrastive divergence (Hinton, 2002), and generated new unsupervised DBN features using forward computation. |
Introduction | These new features are appended as extra features to the phrase table for the translation decoder. |
Semi-Supervised Deep Auto-encoder Features Learning for SMT | To speed up the pre-training, we subdivide the entire set of phrase pairs (with features X) in the phrase table into small mini-batches, each containing 100 cases, and update the weights after each mini-batch. |
Semi-Supervised Deep Auto-encoder Features Learning for SMT | After the pre-training, for each phrase pair in the phrase table , we generate the DBN features (Maskey and Zhou, 2012) by passing the original phrase features X through the DBN using forward computation. |
Semi-Supervised Deep Auto-encoder Features Learning for SMT | To determine an adequate number of epochs and to avoid over-fitting, we fine-tune on a fraction of the phrase table and test performance on the remaining validation phrase table, and then repeat fine-tuning on the entire phrase table for 100 epochs. |
Abstract | In face of the problem, we propose an efficient phrase table combination method. |
Abstract | The learned phrase tables are hierarchically combined as if they are drawn from a hierarchical Pitman-Yor process. |
Abstract | Furthermore, each phrase table is trained separately in each domain, and computational overhead is significantly reduced by training them in parallel. |
Introduction | This paper proposes a new phrase table combination method. |
Introduction | (2011) to perform phrase table extraction. |
Introduction | Second, extracted phrase tables are combined as if they are drawn from a hierarchical Pitman-Yor process, in which the phrase tables represented as tables in the Chinese restaurant process (CRP) are hierarchically chained by treating each of the previously learned phrase tables as prior to the current one. |
Phrase Pair Extraction with Unsupervised Phrasal ITGs | It can achieve comparable translation accuracy with a much smaller phrase table than the traditional GIZA++ and heuristic phrase extraction methods. |
Phrase Pair Extraction with Unsupervised Phrasal ITGs | Compared to GIZA++ with heuristic phrase extraction, the Bayesian phrasal ITG can achieve competitive accuracy under a smaller phrase table size. |
Related Work | In the case of the previous work on translation modeling, mixed methods have been investigated for domain adaptation in SMT by adding domain information as additional labels to the original phrase table (Foster and Kuhn, 2007). |
Related Work | However, their methods usually require a number of hyperparameters, such as mini-batch size and step size, or human judgment to determine the quality of phrases, and still rely on a heuristic phrase extraction method in each phrase table update. |
Abstract | As a side effect, the phrase table size is reduced by more than 80%. |
Alignment | To be able to perform the re-computation in an efficient way, we store the source and target phrase marginal counts for each phrase in the phrase table . |
Alignment | It is then straightforward to compute the phrase counts after leaving-one-out using the phrase probabilities and marginal counts stored in the phrase table . |
Alignment | From the initial phrase table , each of these blocks only loads the phrases that are required for alignment. |
Introduction | The most common method for obtaining the phrase table is heuristic extraction from automatically word-aligned bilingual training data (Och et al., 1999). |
Phrase Model Training | 4.3 Phrase Table Interpolation |
Phrase Model Training | As (DeNero et al., 2006) have reported improvements in translation quality by interpolation of phrase tables produced by the generative and the heuristic model, we adopt this method and also report results using log-linear interpolation of the estimated model with the original model. |
Phrase Model Training | When interpolating phrase tables containing different sets of phrase pairs, we retain the intersection of the two. |
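Log-linear interpolation restricted to the intersection of two tables can be sketched as follows. This is an illustration under stated assumptions: each table is reduced to a single probability per phrase pair, the interpolation weight of 0.5 is arbitrary, the result is left unnormalized, and the function and variable names are hypothetical.

```python
import math


def loglinear_interpolate(table_a, table_b, weight=0.5):
    """Interpolate two phrase tables in log space, keeping only shared pairs.

    Each table maps (source, target) to a strictly positive probability.
    The returned scores are not renormalized.
    """
    shared = table_a.keys() & table_b.keys()  # intersection of phrase pairs
    return {
        pair: math.exp(
            weight * math.log(table_a[pair])
            + (1.0 - weight) * math.log(table_b[pair])
        )
        for pair in shared
    }


# Toy tables with made-up probabilities.
trained = {("a", "x"): 0.25, ("a", "y"): 0.5}
heuristic = {("a", "x"): 1.0, ("b", "z"): 0.3}
mixed = loglinear_interpolate(trained, heuristic)
```

Only `("a", "x")` survives because it is the only pair present in both tables; with equal weights its score is the geometric mean of the two probabilities.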
Related Work | Their results show that it cannot reach a performance competitive with extracting a phrase table from word alignment by heuristics (Och et al., 1999). |
Related Work | In addition, (DeNero et al., 2006) found that the trained phrase table shows a highly peaked distribution, in contrast to the flatter distribution resulting from heuristic extraction, leaving the decoder only a few translation options at decoding time. |
Related Work | They observe that due to several constraints and pruning steps, the trained phrase table is much smaller than the heuristically extracted one, while preserving translation quality. |
Adaptive Online MT | Finally, large data structures such as the language model (LM) and phrase table exist in shared memory, obviating the need for remote queries. |
Analysis | In Table 6, A is the set of phrase table features that received a nonzero weight when tuned on dataset DA (same for B). |
Analysis | Phrase table features in A ∩ B are overwhelmingly short, simple, and correct phrases, suggesting L1 regularization is effective for feature selection. |
Analysis | To understand the domain adaptation issue we compared the nonzero weights in the discriminative phrase table (PT) for Ar-En models tuned on bitext5k and MT05/6/8. |
Experiments | tive phrase table (PT): indicators for each rule in the phrase table . |
Experiments | Moses5 also contains the discriminative phrase table implementation of (Hasler et al., 2012b), which is identical to our implementation using Phrasal. |
Experiments | Moses and Phrasal accept the same phrase table and LM formats, so we kept those data structures in common. |
Related Work | A discriminative phrase table helped them improve slightly over a dense, online MIRA baseline, but their best results required initialization with MERT-tuned weights and retuning a single, shared weight for the discriminative phrase table with MERT. |
A Probabilistic Model for Phrase Table Extraction | If θ takes the form of a scored phrase table, we can use traditional methods for phrase-based SMT to find P(e|f, θ) and concentrate on creating a model for P(θ|⟨E, F⟩). We decompose this posterior probability using Bayes law into the corpus likelihood and parameter prior probabilities |
Abstract | This allows for a completely probabilistic model that is able to create a phrase table that achieves competitive accuracy on phrase-based machine translation tasks directly from unaligned sentence pairs. |
Abstract | Experiments on several language pairs demonstrate that the proposed model matches the accuracy of traditional two-step word alignment/phrase extraction approach while reducing the phrase table to a fraction of the original size. |
Flat ITG Model | The traditional flat ITG generative probability for a particular phrase (or sentence) pair P_flat(⟨e, f⟩; θ_x, θ_t) is parameterized by a phrase table θ_t and a symbol distribution θ_x. |
Flat ITG Model | (a) If x = TERM, generate a phrase pair from the phrase table P_t(⟨e, f⟩; θ_t). |
Flat ITG Model | We assign θ_x a Dirichlet prior, and assign the phrase table parameters θ_t a prior using the Pitman-Yor process (Pitman and Yor, 1997; Teh, 2006), which is a generalization of the Dirichlet process prior used in previous research. |
Introduction | This phrase table is traditionally generated by going through a pipeline of two steps, first generating word (or minimal phrase) alignments, then extracting a phrase table that is consistent with these alignments. |
Introduction | phrase tables that are used in translation. |
Introduction | This makes it possible to directly use probabilities of the phrase model as a replacement for the phrase table generated by heuristic extraction techniques. |
Cross-lingual Features | We experimented with three different cross-lingual features that used Arabic and English Wikipedia cross-language links and a true-cased phrase table that was generated using Moses (Koehn et al., 2007). |
Cross-lingual Features | The phrase table was trained on a set of 3.69 million parallel sentences containing 123.4 million English tokens. |
Cross-lingual Features | To capture cross-lingual capitalization, we used the aforementioned true-cased phrase table at word and |
Introduction | Cross-lingual links are obtained using Wikipedia cross-language links and a large Machine Translation (MT) phrase table that is true cased, where word casing is preserved during training. |
Related Work | Transliteration Mining (TM) has been used to enrich MT phrase tables or to improve cross language search (Udupa et al., 2009). |
Abstract | Another line of research that is closely related to our work is phrase table refinement and pruning. |
Abstract | The parallel sentences were forced to be aligned at the phrase level using the phrase table and other features as in a decoding process. |
Abstract | To prevent overfitting, the statistics of phrase pairs from a particular sentence were excluded from the phrase table when aligning that sentence. |
Beyond lexical CLTE | In order to enrich the feature space beyond pure lexical match through phrase table entries, our model |
Beyond lexical CLTE | builds on two additional feature sets, derived from i) semantic phrase tables , and ii) dependency relations. |
Beyond lexical CLTE | Semantic Phrase Table (SPT) matching represents a novel way to leverage the integration of semantics and MT-derived techniques. |
CLTE-based content synchronization | CLTE has been previously modeled as a phrase matching problem that exploits dictionaries and phrase tables extracted from bilingual parallel corpora to determine the number of word sequences in H that can be mapped to word sequences in T. In this way a semantic judgement about entailment is made exclusively on the basis of lexical evidence. |
Experiments and results | To build the English-German phrase tables we combined the Europarl, News Commentary and “de-news”3 parallel corpora. |
Introduction | The CLTE methods proposed so far adopt either a “pivoting approach” based on the translation of the two input texts into the same language (Mehdad et al., 2010), or an “integrated solution” that exploits bilingual phrase tables to capture lexical relations and contextual information (Mehdad et al., 2011). |
Evaluation | The baseline is a state-of-the-art phrase-based system; we perform word alignment using a lexicalized hidden Markov model, and then the phrase table is extracted using the grow-diag-final heuristic (Koehn et al., 2003). |
Evaluation | The 13 baseline features (2 lexical, 2 phrasal, 5 HRM, and 1 language model, word penalty, phrase length feature and distortion penalty feature) were tuned using MERT (Och, 2003), which is also used to tune the 4 feature weights introduced by the secondary phrase table (2 lexical and 2 phrasal, other features being shared between the two tables). |
Generation & Propagation | Our goal is to obtain translation distributions for source phrases that are not present in the phrase table extracted from the parallel corpus. |
Generation & Propagation | If a source phrase is found in the baseline phrase table it is called a labeled phrase: its conditional empirical probability distribution over target phrases (estimated from the parallel data) is used as the label, and is sub- |
Generation & Propagation | Prior to generation, one phrase node for each target phrase occurring in the baseline phrase table is added to the target graph (black nodes in Fig. |
Introduction | The additional phrases are incorporated in the SMT system through a secondary phrase table (§2.5). |
Related Work | (2012) propose a method that utilizes a preexisting phrase table and a small bilingual lexicon, and performs BLI using monolingual corpora. |
Related Work | Decipherment-based approaches (Ravi and Knight, 2011; Dou and Knight, 2012) have generally taken a monolingual view to the problem and combine phrase tables through the log-linear model during feature weight training. |
Experiments | We validated our simple implementation using a phrase table of 38,488,777 lines created with the Moses toolkit3(Koehn et al., 2007) phrase-based SMT system, corresponding to 15,764,069 entries |
Experiments | Figure 4 displays the time required to complete retrieval for subsets of increasing size of the 2,000 sentence test set, and for phrase tables uniformly sampled at 25%, 50%, 75% and 100%. |
Experiments | 217,019 distinct digests are generated for all possible phrases of length up to 6 from the full test set, resulting in the retrieval of 47,072 entries (596,560 lines) from the full phrase table. |
Implementation | This mirrors the standard practice of filtering the phrase table for a given source file to translate before starting the actual decoding. |
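The filtering practice mentioned here can be sketched as follows: enumerate every source n-gram up to a maximum length and keep only the phrase table entries whose source side matches one of them. This is a plain in-memory illustration, not the retrieval procedure of the paper; the function names, the whitespace tokenization, and the default maximum length of 6 are assumptions.

```python
def source_ngrams(sentences, max_len):
    """Collect every contiguous n-gram of up to max_len tokens."""
    grams = set()
    for sentence in sentences:
        tokens = sentence.split()  # assumed: whitespace tokenization
        for start in range(len(tokens)):
            last = min(start + max_len, len(tokens))
            for end in range(start + 1, last + 1):
                grams.add(" ".join(tokens[start:end]))
    return grams


def filter_table(phrase_table, sentences, max_len=6):
    """Keep only entries whose source phrase occurs in the input sentences."""
    needed = source_ngrams(sentences, max_len)
    return {src: tgts for src, tgts in phrase_table.items() if src in needed}


# Toy table keyed by source phrase.
table = {"the cat": ["le chat"], "a dog": ["un chien"], "cat": ["chat"]}
filtered = filter_table(table, ["the cat sleeps"])
```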
Introduction | In this method, the owner of the TM generates a Phrase Table (PT) from it, and makes it accessible to the user following a special procedure. |
Introduction | - The user acquires all and only the phrase table entries required to perform the decoding of a specific file, thus avoiding complete transfer of the TM to the user; |
Introduction | While the exposition will focus on phrase tables , there is nothing in the method precluding its use with other resources, provided that they can be represented as lookup tables, a very mild constraint. |
Related work | private access to a phrase table or other resources for the purpose of performing statistical machine translation. |
Conclusion | Using phrase table merging that combined AR and EG’ training data in a way that preferred adapted dialectal data yielded an extra 0.86 BLEU points. |
Proposed Methods 3.1 Egyptian to EG’ Conversion | - Only added the phrase with its translations and their probabilities from the AR phrase table . |
Proposed Methods 3.1 Egyptian to EG’ Conversion | - Only added the phrase with its translations and their probabilities from the EG’ phrase table . |
Proposed Methods 3.1 Egyptian to EG’ Conversion | - Added translations of the phrase from both phrase tables and left the choice to the decoder. |
Experimental Results | As is clear from Table 7, it is important to rerun MERT (on MT02 only) with the augmented phrase table in order to get performance gains. |
Experimental Results | the MERT weights with different phrase tables . |
Unsupervised Translation Induction for Chinese Abbreviations | the baseline phrase table . |
Unsupervised Translation Induction for Chinese Abbreviations | Since the obtained translation entries for abbreviations have the same format as the regular translation entries in the baseline phrase table, it is relatively easy to add them into the baseline phrase table . |
Unsupervised Translation Induction for Chinese Abbreviations | Specifically, if a translation entry (signatured by its Chinese and English strings) to be added is not in the baseline phrase table , we simply add the entry into the baseline table. |
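A minimal sketch of this augmentation step, keyed on the (Chinese, English) string pair that serves as the entry's signature. The excerpt only covers the case where the entry is missing from the baseline table; leaving an existing baseline entry unchanged is an assumption made here for illustration, as are the function name and the toy feature values.

```python
def add_new_entries(baseline, induced):
    """Add induced entries whose signature is absent from the baseline table.

    Both tables map (chinese, english) string pairs to feature lists.
    Assumption: entries already in the baseline are kept as they are.
    """
    merged = dict(baseline)  # copy so the baseline table is not mutated
    for signature, features in induced.items():
        merged.setdefault(signature, features)
    return merged


# Toy tables with made-up feature values.
baseline = {("zh_full", "full form"): [0.7]}
induced = {("zh_abbr", "abbreviation"): [0.4], ("zh_full", "full form"): [0.1]}
augmented = add_new_entries(baseline, induced)
```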
Baselines | where m ranges over IN and OUT, p_m(ē|f̄) is an estimate from a component phrase table, and each λ_m is a weight in the top-level log-linear model, set so as to maximize dev-set BLEU using minimum error rate training (Och, 2003). |
Baselines | Whenever a phrase pair does not appear in a component phrase table, we set the corresponding p_m(ē|f̄) to a small epsilon value. |
Baselines | Whenever a phrase pair does not appear in a component phrase table, we set the corresponding p_m(ē|f̄) to 0; pairs in p(ē, f̄) that do not appear in at least one component table are discarded. |
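One plausible reading of the zero-fill variant is a weighted linear mixture over component tables, sketched below. The excerpt does not show the mixture formula itself, so the weighted sum, the function name, and the weight values are assumptions; only the zero-fill rule and the discarding of pairs found in no component come from the text.

```python
def linear_mixture(components, weights):
    """Mix component phrase tables, treating missing pairs as probability 0.

    components maps a component name (e.g. "IN", "OUT") to a table of
    (source, target) -> probability. Pairs absent from every component
    never enter the union below, so they are discarded implicitly.
    """
    pairs = set().union(*(table.keys() for table in components.values()))
    return {
        pair: sum(
            weights[name] * table.get(pair, 0.0)
            for name, table in components.items()
        )
        for pair in pairs
    }


# Toy components with made-up probabilities and weights.
components = {
    "IN": {("f", "e"): 0.6},
    "OUT": {("f", "e"): 0.2, ("f", "g"): 0.8},
}
mixed = linear_mixture(components, {"IN": 0.5, "OUT": 0.5})
```

Replacing the `0.0` default with a small epsilon gives the other variant described above, which matters when the component scores are combined in log space.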
Ensemble Decoding | The cells of the CKY chart are populated with appropriate rules from all the phrase tables of different components. |
Experiments & Results 4.1 Experimental Setup | The corpus was word-aligned using both HMM and IBM2 models, and the phrase table was the union of phrases extracted from these separate alignments, with a length limit of 7. |
Related Work 5.1 Domain Adaptation | In intersection, for each span only the hypotheses would be used that are present in all phrase tables . |
Related Work 5.1 Domain Adaptation | Union, on the other hand, uses hypotheses from all the phrase tables . |
Abstract | We make use of the collocation probabilities, which are estimated from monolingual corpora, in two aspects, namely improving word alignment for various kinds of SMT systems and improving the phrase table for phrase-based SMT. |
Conclusion | Then the collocation information was employed to improve BWA for various kinds of SMT systems and to improve the phrase table for phrase-based SMT. |
Conclusion | To improve the phrase table, we calculate phrase collocation probabilities based on word collocation probabilities. |
Experiments on Phrase-Based SMT | We also investigate the performance of the system employing both the word alignment improvement and phrase table improvement methods. |
Introduction | Then the collocation information is employed to improve Bilingual Word Alignment (BWA) for various kinds of SMT systems and to improve the phrase table for phrase-based SMT. |
Introduction | To improve the phrase table, we calculate phrase collocation probabilities based on word collocation probabilities. |
Introduction | In Sections 3 and 4, we show how to improve the BWA method and the phrase table, respectively, using collocation models. |
Abstract | The model does not require bitext or phrase table annotations and can be easily implemented as a feature in many phrase-based decoders. |
Conclusion and Outlook | Our class-based agreement model improves translation quality by promoting local agreement, but with a minimal increase in decoding time and no additional storage requirements for the phrase table . |
Discussion of Translation Results | Phrase Table Coverage In a standard phrase-based system, effective translation into a highly inflected target language requires that the phrase table contain the inflected word forms necessary to construct an output with correct agreement. |
Discussion of Translation Results | During development, we observed that the phrase table of our large-scale English-Arabic system did often contain the inflected forms that we desired the system to select. |
Introduction | Unlike previous models for scoring syntactic relations, our model does not require bitext annotations, phrase table features, or decoder modifications. |
Abstract | We propose a Name-aware Machine Translation (MT) approach which can tightly integrate name processing into the MT model, by jointly annotating parallel corpora, extracting name-aware translation grammar and rules, adding a name phrase table and name translation driven decoding. |
Experiments | For better comparison with NAMT, besides the original baseline, we develop another baseline system by adding the name translation table into the phrase table (NPhrase). |
Name-aware MT | Finally, the extracted 9,963 unique name translation pairs were also used to create an additional name phrase table for NAMT. |
Name-aware MT | Finally, based on LMs, our decoder exploits the dynamically created phrase table from name translation, competing with originally extracted rules, to find the best translation for the input sentence. |
Related Work | Preprocessing: identify names in the source texts and propose name translations to the MT system; the name translation results can be simply but aggressively transferred from the source to the target side using word alignment, or added into phrase table in order to |
Experiments | Following (Levenberg et al., 2012; Neubig et al., 2011), we evaluate our model by using its output word alignments to construct a phrase table . |
Experiments | For our models, we report the average BLEU score of the 5 independent runs as well as that of the aggregate phrase table generated by these 5 independent runs. |
Experiments | Firstly, combining the phrase tables from independent runs results in increased BLEU scores, possibly due to the representation of uncertainty in the outputs, and the representation of different modes captured by the individual models. |
Related Work | Our paper fits into the recent line of work for jointly inducing the phrase table and word alignment (DeNero and Klein, 2010; Neubig et al., 2011). |
Experiments | For the out-of-domain data, we build the phrase table and reordering table using the 2.08 million Chinese-to-English sentence pairs, and we use the SRILM toolkit (Stolcke, 2002) to train the 5-gram English language model with the target part of the parallel sentences and the Xinhua portion of the English Gigaword. |
Experiments | For the in-domain electronic data, we first consider the lexicon as a phrase table in which we assign a constant 1.0 for each of the four probabilities, and then we combine this initial phrase table and the induced phrase pairs to form the new phrase table . |
Experiments | (2008) regards the in-domain lexicon with corpus translation probability as another phrase table and further uses the in-domain language model besides the out-of-domain language model. |
Paraphrasing | We define associations in x and c primarily by looking up phrase pairs in a phrase table constructed using the PARALEX corpus (Fader et al., 2013). |
Paraphrasing | We use the word alignments to construct a phrase table by applying the consistent phrase pair heuristic (Och and Ney, 2004) to all 5-grams. |
Paraphrasing | This results in a phrase table with approximately 1.3 million phrase pairs. |
Experiments | For each system, we used three different alignment heuristics (grow, grow-diag, grow-diag-final4) to obtain the final alignment results, and then constructed three different phrase tables . |
Experiments | In order to further analyze the translation results, we evaluated the above systems by examining the coverage of the phrase tables over the test phrases. |
Introduction | The first is based on phrase table multiplication (Cohn and Lapata 2007; Wu and Wang, 2007). |
Introduction | It multiplies corresponding translation probabilities and lexical weights in source-pivot and pivot-target translation models to induce a new source-target phrase table. |
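Phrase table multiplication through a pivot language can be sketched as follows. The excerpt only says that corresponding probabilities are multiplied; summing the products over all shared pivot phrases is the usual triangulation formulation and is stated here as an assumption, as are the function name, the table layouts, and the toy probabilities.

```python
from collections import defaultdict


def triangulate(src_pivot, pivot_tgt):
    """Induce source-target probabilities through shared pivot phrases.

    src_pivot maps (source, pivot) -> p(pivot | source).
    pivot_tgt maps pivot -> {target: p(target | pivot)}.
    Assumption: products are summed over all pivots linking a pair.
    """
    induced = defaultdict(float)
    for (src, pivot), p_src_pivot in src_pivot.items():
        for tgt, p_pivot_tgt in pivot_tgt.get(pivot, {}).items():
            induced[(src, tgt)] += p_src_pivot * p_pivot_tgt
    return dict(induced)


# Toy French-English and English-German tables with made-up numbers.
src_pivot = {("maison", "house"): 0.7, ("maison", "home"): 0.3}
pivot_tgt = {"house": {"Haus": 1.0}, "home": {"Haus": 0.5, "Heim": 0.5}}
induced = triangulate(src_pivot, pivot_tgt)
```

The same loop would apply to the lexical weights, with one induced table per feature.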
Experimental Evaluation | We use identical phrase tables and scaling factors for Moses and our decoder. |
Experimental Evaluation | The phrase table is pruned to a maximum of 400 target candidates per source phrase before decoding. |
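Capping the number of target candidates per source phrase can be sketched with a per-source top-k selection. The excerpt does not say which score ranks the candidates, so ranking by a single score per entry is an assumption; the function name and toy entries are hypothetical, and the limit is a parameter (the excerpt uses 400).

```python
import heapq
from collections import defaultdict


def prune_top_k(entries, k):
    """Keep the k highest-scoring target candidates for each source phrase.

    entries is an iterable of (source, target, score) triples.
    """
    by_source = defaultdict(list)
    for src, tgt, score in entries:
        by_source[src].append((tgt, score))
    return {
        src: heapq.nlargest(k, candidates, key=lambda cand: cand[1])
        for src, candidates in by_source.items()
    }


# Toy entries with made-up scores.
entries = [("a", "x", 0.5), ("a", "y", 0.9), ("a", "z", 0.1), ("b", "w", 0.3)]
pruned = prune_top_k(entries, k=2)
```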
Experimental Evaluation | The phrase table and LM are loaded into memory before translating and loading time is eliminated for speed measurements. |
Data and Gold Standard | We first used the phrase table from |
Data and Gold Standard | We then looked at the different translations that each had in the phrase table and a French speaker selected a subset that have multiple senses.3 |
New Sense Indicators | The ground truth labels (target translation for a given source word) for this classifier are generated from the phrase table of the old domain data. |
Models 2.1 Baseline Models | Table 1: Morpheme occurrences in the phrase table and in translation. |
Related Work | They use a segmented phrase table and language model along with the word-based versions in the decoder and in tuning a Finnish target. |
Related Work | Habash (2007) provides various methods to incorporate morphological variants of words in the phrase table in order to help recognize out of vocabulary words in the source language. |
Introduction | Furthermore, the introduction of non-terminals makes the grammar size significantly bigger than phrase tables and leads to higher memory requirement (Chiang, 2007). |
Introduction | On the other hand, although target words can be generated left-to-right by altering the way of tree traversal in syntax-based models, it is still difficult to reach full rule coverage as compared with phrase tables. |
Introduction | phrase table limit is set to 20 for all the three systems. |
Experimental Setup and Results | Phrase table entries for the surface factors, produced by Moses after it does an alignment on the roots, contain the English (e) and Turkish (t) parts of a pair of aligned phrases, and the probabilities p(e|t), the conditional probability that the English phrase is e given that the Turkish phrase is t, and p(t|e), the conditional probability that the Turkish phrase is t given that the English phrase is e. |
Experimental Setup and Results | Among these phrase table entries, those with p(e|t) ≈ p(t|e) and p(t|e) + p(e|t) larger than some threshold can be considered reliable mutual translations, in that they mostly translate to each other and not much to others. |
Experimental Setup and Results | from the phrase table those phrases with 0.9 ≤ p(e|t)/p(t|e) ≤ 1.1 and p(t|e) + p(e|t) ≥ 1.5 and added them to the training data to further bias the alignment process. |
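The selection criterion can be sketched as a small predicate, assuming the thresholds are a probability ratio between 0.9 and 1.1 and a probability sum of at least 1.5. The function name, parameter names, and the handling of a zero denominator are illustrative choices, not taken from the source.

```python
def is_reliable(p_e_given_t, p_t_given_e, ratio_band=(0.9, 1.1), min_sum=1.5):
    """Check whether a phrase pair counts as a reliable mutual translation.

    The two directional probabilities must be nearly equal (ratio inside
    ratio_band) and jointly high (sum at least min_sum).
    """
    if p_t_given_e == 0:
        return False  # assumed: a zero denominator disqualifies the pair
    ratio = p_e_given_t / p_t_given_e
    low, high = ratio_band
    return low <= ratio <= high and p_e_given_t + p_t_given_e >= min_sum


# Made-up probability pairs for illustration.
balanced_and_high = is_reliable(0.8, 0.8)
unbalanced = is_reliable(0.9, 0.5)
balanced_but_low = is_reliable(0.5, 0.5)
```

Only the first pair passes: the second fails the ratio test and the third fails the sum test.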
Error Analysis | After having the MERT parameters, we add the 600 dev sentences back into the training corpus, retrain GIZA, and then estimate a new phrase table on all 5600 sentences. |
Previous Work | do not compete with internal phrase tables . |
Previous Work | (2008) use a tagger to identify good candidates for transliteration (which are mostly NEs) in input text and add transliterations to the SMT phrase table dynamically such that they can directly compete with translations during decoding. |
Methods | This is sometimes referred to as a word graph (Ueffing et al., 2002), although in our case the segmented phrase table also produces tokens that correspond to morphemes. |
Methods | (2010) address this problem by forcing the decoder’s phrase table to respect word boundaries, guaranteeing that each de-segmentable token sequence is local to an edge. |
Related Work | Alternatively, one can reparameterize existing phrase tables as exponential models, so that translation probabilities account for source context and morphological features (Jeong et al., 2010; Subotin, 2011). |
AL-SMT: Multilingual Setting | When (re-)training the models, two phrase tables are learned for each SMT model: one from the labeled data L and the other one from pseudo-labeled data U+ (which we call the main and auxiliary phrase tables respectively). |
Sentence Selection: Single Language Pair | Some of these fragments are the source language part of a phrase pair available in the phrase table, which we call regular phrases and denote their set by X_s^reg for a sentence s. |
Sentence Selection: Single Language Pair | However, there are some fragments in the sentence which are not covered by the phrase table, possibly because of the OOVs (out-of-vocabulary words) or the constraints imposed by the phrase extraction algorithm; these are called X_s^oov for a sentence s. |