Abstract | We combine the strengths of Bayesian modeling and synchronous grammar in unsupervised learning of basic translation phrase pairs.
Abstract | The structured space of a synchronous grammar is a natural fit for phrase pair probability estimation, though the search space can be prohibitively large. |
Introduction | Computational complexity arises from the exponentially large number of decompositions of a sentence pair into phrase pairs; overfitting is a problem because as EM attempts to maximize the likelihood of its training data, it prefers to directly explain a sentence pair with a single phrase pair.
Phrasal Inversion Transduction Grammar | Our ITG has two nonterminals: X and C, where X represents compositional phrase pairs that can have recursive structures and C is the preterminal over terminal phrase pairs.
Phrasal Inversion Transduction Grammar | They split the left-hand side constituent which represents a phrase pair into two smaller phrase pairs on the right-hand side and order them according to one of the two possible permutations. |
Phrasal Inversion Transduction Grammar | where Σ_{e/f} P(e/f) = 1, i.e., P(e/f) is a multinomial distribution over phrase pairs.
Variational Bayes for ITG | A sparse prior over a multinomial distribution such as the distribution of phrase pairs may bias the estimator toward skewed distributions that generalize better. |
Variational Bayes for ITG | The other is the distribution of the phrase pairs.
Variational Bayes for ITG | By adjusting α_C to a very small number, we hope to place more posterior mass on parsimonious solutions with fewer but more confident and general phrase pairs.
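The sparse-prior idea in the excerpts above can be illustrated with a minimal sketch (ours, not the papers' code; `vb_weights` and the hand-rolled `digamma` are our names). Under mean-field variational Bayes, a multinomial with a symmetric Dirichlet(α) prior replaces each normalized expected count c_i/Σc with exp(ψ(c_i + α) − ψ(Σc + Kα)), so a small α (the α_C above) pushes mass away from weakly supported phrase pairs:

```python
import math

def digamma(x):
    # Hand-rolled digamma: a recurrence pushes x above 6, then an
    # asymptotic expansion gives sufficient accuracy for this sketch.
    r = 0.0
    while x < 6.0:
        r -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    return r + math.log(x) - 0.5 / x - f * (1.0 / 12 - f * (1.0 / 120 - f / 252))

def vb_weights(expected_counts, alpha):
    """Mean-field VB weight for each outcome of a multinomial with a
    symmetric Dirichlet(alpha) prior: exp(psi(c_i + alpha) - psi(total))."""
    total = digamma(sum(expected_counts) + alpha * len(expected_counts))
    return [math.exp(digamma(c + alpha) - total) for c in expected_counts]

# A small alpha penalizes rarely used phrase pairs much more heavily
# than plain relative-frequency normalization would.
for a in (1.0, 0.01):
    print(a, [round(w, 4) for w in vb_weights([10.0, 1.0, 0.1], a)])
```

The weights deliberately sum to less than one; the missing mass is what discourages the estimator from keeping many low-count phrase pairs.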
Bilingually-constrained Recursive Auto-encoders | From this fact we can infer that if a model can learn the same embedding for any phrase pair sharing the same meaning, the learned embedding must encode the semantics of the phrases, and such a model is exactly what we desire.
Bilingually-constrained Recursive Auto-encoders | For a phrase pair (s, t), two kinds of errors are involved:
Bilingually-constrained Recursive Auto-encoders | For the phrase pair (s, t), the joint error is:
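The joint error referred to above can be written out; a plausible reconstruction (ours; the weight α and the exact term names are assumptions, following the usual bilingually-constrained recursive auto-encoder setup) combines each side's reconstruction error with a cross-lingual semantic error:

```latex
% Joint error for a phrase pair (s, t); \alpha trades off
% reconstruction fidelity against cross-lingual semantic agreement.
E(s, t; \theta) = \alpha \, E_{\mathrm{rec}}(s, t; \theta)
                + (1 - \alpha) \, E_{\mathrm{sem}}(s, t; \theta)
```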
Experiments | To obtain high-quality bilingual phrase pairs to train our BRAE model, we perform forced decoding for the bilingual training sentences and collect the phrase pairs used. |
Experiments | After removing duplicates, 1.12M bilingual phrase pairs (lengths ranging from 1 to 7) remain.
Experiments | These algorithms are based on corpus statistics including co-occurrence statistics, phrase pair usage and composition information. |
Introduction | In decoding with phrasal semantic similarities, we apply the semantic similarities of the phrase pairs as new features during decoding to guide translation can- |
Input Features for DNN Feature Learning | Following (Maskey and Zhou, 2012), we use the following 4 phrase features of each phrase pair (Koehn et al., 2003) in the phrase table as the first type of input features: bidirectional phrase translation probability (P(e|f) and P(f|e)), bidirectional lexical weighting (Lex(e|f) and Lex(f|e)),
Input Features for DNN Feature Learning | 3.2 Phrase pair similarity |
Input Features for DNN Feature Learning | (2004) proposed a way of using term-weight-based models in a vector space as additional evidence for phrase pair translation quality.
Introduction | First, the original input features for DBN feature learning are too simple: only the limited 4 phrase features of each phrase pair, such as bidirectional phrase translation probability and bidirectional lexical weighting (Koehn et al., 2003), which are a bottleneck for learning effective feature representations.
Introduction | To address the first shortcoming, we adapt and extend some simple but effective phrase features as the input features for new DNN feature learning; these features, such as phrase pair similarity (Zhao et al., 2004), phrase frequency, phrase length (Hopkins and May, 2011), and phrase generative probability (Foster et al., 2010), have been shown to yield significant improvements for SMT, and they also show further improvement for new phrase feature learning in our experiments.
Semi-Supervised Deep Auto-encoder Features Learning for SMT | To speed up the pre-training, we subdivide the full set of phrase pairs (with features X) in the phrase table into small mini-batches, each containing 100 cases, and update the weights after each mini-batch.
Semi-Supervised Deep Auto-encoder Features Learning for SMT | Each layer is greedily pre-trained for 50 epochs over the entire set of phrase pairs.
Semi-Supervised Deep Auto-encoder Features Learning for SMT | After the pre-training, for each phrase pair in the phrase table, we generate the DBN features (Maskey and Zhou, 2012) by passing the original phrase features X through the DBN using forward computation. |
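The pre-training schedule described in the three excerpts above (mini-batches of 100 cases, 50 greedy epochs per layer, then a forward pass to produce features) can be sketched as follows; `minibatches`, `greedy_pretrain`, and the `(train_step, forward)` layer interface are our hypothetical names, not from the papers, and the actual RBM/auto-encoder weight update is left abstract:

```python
import random

def minibatches(items, size=100, shuffle=True, seed=0):
    """Split the phrase-table feature rows into mini-batches of `size`
    (the excerpt above uses 100 cases per batch)."""
    idx = list(range(len(items)))
    if shuffle:
        random.Random(seed).shuffle(idx)  # same order each epoch in this sketch
    for start in range(0, len(idx), size):
        yield [items[i] for i in idx[start:start + size]]

def greedy_pretrain(data, layer_trainers, epochs=50):
    """Greedy layer-wise pre-training: each layer trains for `epochs`
    passes over all mini-batches (one weight update per mini-batch),
    then its forward output becomes the next layer's input."""
    reps = data
    for train_step, forward in layer_trainers:
        for _ in range(epochs):
            for batch in minibatches(reps):
                train_step(batch)          # e.g. one CD-1 update on the batch
        reps = [forward(x) for x in reps]  # forward computation to next layer
    return reps
```

After the last layer, `reps` plays the role of the DBN features passed through by forward computation.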
Abstract | A semi-supervised training approach is proposed to train the parameters, and the phrase pair embedding is explored to model translation confidence directly. |
Introduction | (2013) use recursive auto-encoders to make full use of entire merged phrase pairs, going beyond the boundary words used by a maximum entropy classifier (Xiong et al., 2006).
Introduction | To model the translation confidence for a translation phrase pair, we initialize the phrase pair embedding by leveraging sparse features and a recurrent neural network.
Introduction | The sparse features are phrase pairs in the translation table, and the recurrent neural network is used to learn a smoothed translation score from the source- and target-side information.
Our Model | We then check whether translation candidates can be found in the translation table for each span, together with the phrase pair embedding and recurrent input vector (global features). |
Our Model | We extract phrase pairs using the conventional method (Och and Ney, 2004). |
Our Model | • Representations of phrase pairs are automatically learnt to optimize the translation performance, while features used in conventional models are handcrafted.
Related Work | Given the representations of the smaller phrase pairs, recursive auto-encoder can generate the representation of the parent phrase pair with a reordering confidence score. |
Event Causality Extraction Method | An event causality candidate is given a causality score, which is the SVM score (distance from the hyperplane) normalized to [0, 1] by the sigmoid function. Each event causality candidate may be given multiple original sentences, since a phrase pair can appear in multiple sentences, in which case it is given more than one SVM score.
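The sigmoid normalization mentioned above is straightforward to sketch; the max-aggregation in `candidate_score` over a candidate's multiple SVM scores is our assumption for illustration, not necessarily the paper's rule:

```python
import math

def causality_score(svm_margin):
    """Normalize an SVM decision value (signed distance from the
    hyperplane) to [0, 1] with the logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-svm_margin))

def candidate_score(svm_margins):
    """A candidate seen in several sentences gets several SVM scores;
    taking the maximum is one plausible way to combine them."""
    return max(causality_score(m) for m in svm_margins)
```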
Experiments | These three datasets have no overlap in terms of phrase pairs.
Experiments | We observed that CEAsup and CEAWS performed poorly and tended to favor event causality candidates whose phrase pairs were highly relevant to each other but described contrasts of events rather than event causality (e.g., build a slow muscle and build a fast muscle), probably because their phrase pairs described two events that often happen in parallel but are not event causality (e.g., reduce the intake of energy and increase the energy consumption) in the highly ranked event causality candidates of Csuns and Cssup.
Future Scenario Generation Method | A naive approach chains two phrase pairs by exact matching. |
Future Scenario Generation Method | Scenarios (scs) generated by chaining causally-compatible phrase pairs are scored by Score(sc), which embodies our assumption that an acceptable scenario consists of plausible event causality pairs: |
Introduction | Annotators regarded as event causality only phrase pairs that were interpretable as event causality without contexts (i.e., self-contained). |
Introduction | Phrase pair extraction, the key step to discover translation knowledge, heavily relies on the scale of training data. |
Introduction | Typically, the more parallel corpora used, the more phrase pairs and the more accurate parameters will be learned, which is obviously beneficial to improving translation performance.
Introduction | First, phrase pairs are extracted from each domain without interfering with other domains. |
Phrase Pair Extraction with Unsupervised Phrasal ITGs | More formally, P(⟨e, f⟩; θx, θt) is the probability of a phrase pair ⟨e, f⟩, which is parameterized by a phrase pair distribution θt and a symbol distribution θx.
Phrase Pair Extraction with Unsupervised Phrasal ITGs | where d is the discount parameter, s is the strength parameter, and Pdac is a prior probability which acts as a fallback probability when a phrase pair is not in the model.
Phrase Pair Extraction with Unsupervised Phrasal ITGs | Under this model, the probability for a phrase pair found in a bilingual corpus (E, F) can be represented by the following equation using the Chinese restaurant process (Teh, 2006): |
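A hedged sketch of the Chinese-restaurant-process predictive probability referenced above, using the standard Pitman-Yor form with discount d, strength s, and a fallback prior; the function name and the dict-based customer/table bookkeeping are ours:

```python
def pyp_prob(x, counts, tables, d, s, p_base):
    """Predictive probability of phrase pair x under a Pitman-Yor
    Chinese restaurant process.  counts[x] = customers eating dish x,
    tables[x] = tables serving x, p_base(x) = fallback prior used when
    x is new (and to smooth seen pairs)."""
    n = sum(counts.values())          # total customers
    t = sum(tables.values())          # total tables
    c_x = counts.get(x, 0)
    t_x = tables.get(x, 0)
    return (max(c_x - d * t_x, 0.0) + (s + d * t) * p_base(x)) / (n + s)
```

With d = 0 this reduces to the Dirichlet-process CRP; the discount d > 0 gives the power-law behavior often preferred for phrase pair distributions.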
Related Work | Then all the phrase pairs and features are tuned together with different weights during decoding.
Related Work | Classification-based methods must at least add an explicit label to indicate which domain the current phrase pair comes from. |
Related Work | (2012) employed a feature-based approach, in which phrase pairs are enriched |
Introduction | The main idea of our method is to divide the phrase-level translation rule induction into two steps: bilingual lexicon induction and phrase pair induction. |
Introduction | Since many researchers have studied bilingual lexicon induction, in this paper we mainly concentrate on phrase pair induction given a probabilistic bilingual lexicon and two large in-domain monolingual datasets (source and target language).
Introduction | In addition, we will further introduce how to refine the induced phrase pairs and estimate their parameters, such as the four standard translation features and the phrase reordering feature used in conventional phrase-based models (Koehn et al., 2007).
What Can We Learn with Phrase Pair Induction? | Readers may wonder: if phrase pair induction is performed using only a bilingual lexicon and monolingual data, what new translation knowledge can be learned?
What Can We Learn with Phrase Pair Induction? | In contrast, phrase pair induction can make up for this deficiency to some extent. |
What Can We Learn with Phrase Pair Induction? | From the phrase pairs induced with our method, we conducted a deep analysis and found that we can learn three kinds of new translation knowledge: 1) word reordering within a phrase pair; 2) idioms; and 3) unknown word translations.
Introduction | However, these phrase pairs with gaps do not capture structure reordering as do Hiero rules with nonterminal mappings. |
Introduction | From this Hiero derivation, we have a segmentation of the sentence pairs into phrase pairs according to the word alignments, as shown on the left side of Figure 1. |
Introduction | Ordering these phrase pairs according to the word sequence on the target side, shown on the right side of Figure 1, we have a phrase-based translation path consisting of four phrase pairs: ⟨je, …⟩, ⟨ne, …⟩, …
Phrasal-Hiero Model | Training: Represent each rule as a sequence of phrase pairs and nonterminals. |
Phrasal-Hiero Model | Decoding: Use the rules’ sequences of phrase pairs and nonterminals to find the corresponding phrase-based path of a Hiero derivation and calculate its feature scores. |
Phrasal-Hiero Model | 2.1 Map Rule to a Sequence of Phrase Pairs and Nonterminals
Corpus Data and Baseline SMT | We used this corpus to extract translation phrase pairs from bidirectional IBM Model 4 word alignment (Och and Ney, 2003) based on the heuristic approach of (Koehn et al., 2003). |
Corpus Data and Baseline SMT | Our phrase-based decoder is similar to Moses (Koehn et al., 2007) and uses the phrase pairs and target LM to perform beam search stack decoding based on a standard log-linear model, the parameters of which were tuned with MERT (Och, 2003) on a held-out development set (3,534 sentence pairs, 45K words) using BLEU as the tuning metric. |
Corpus Data and Baseline SMT | All other sentence pairs are assigned to a “background conversation”, which signals the absence of the topic similarity feature for phrase pairs derived from these instances. |
Incremental Topic-Based Adaptation | Our approach is based on the premise that biasing the translation model to favor phrase pairs originating in training conversations that are contextually similar to the current conversation will lead to better translation quality. |
Incremental Topic-Based Adaptation | Additionally, the SMT phrase table tracks, for each phrase pair, the set of parent training conversations (including the “background conversation”) from which that phrase pair originated. |
Incremental Topic-Based Adaptation | Using this information, the decoder evaluates, for each candidate phrase pair |
Introduction | As the conversation progresses, however, the gradual accumulation of contextual information can be used to infer the topic(s) of discussion, and to deploy contextually appropriate translation phrase pairs.
Introduction | Translation phrase pairs that originate in training conversations whose topic distribution is similar to that of the current conversation are given preference through a single similarity feature, which augments the standard phrase-based SMT log-linear model. |
Relation to Prior Work | Rather, we compute a similarity between the current conversation history and each of the training conversations, and use this measure to dynamically score the relevance of candidate translation phrase pairs during decoding. |
Abstract | This profile might, for instance, be a vector with a dimensionality equal to the number of training subcorpora; each entry in the vector reflects the contribution of a particular subcorpus to all the phrase pairs that can be extracted from the dev set. |
Abstract | Then, for each phrase pair extracted from the training data, we create a vector with features defined in the same way, and calculate its similarity score with the vector representing the dev set. |
Abstract | Thus, we obtain a decoding feature whose value represents the phrase pair’s closeness to the dev. |
Introduction | typically use a rich feature set to decide on weights for the training data, at the sentence or phrase pair level. |
Introduction | As in (Foster et al., 2010), this approach works at the level of phrase pairs.
Introduction | Instead of using word-based features and a computationally expensive training procedure, we capture the distributional properties of each phrase pair directly, representing it as a vector in a space which also contains a representation of the dev set. |
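The similarity between a phrase pair's subcorpus-distribution vector and the dev-set vector, as described in the excerpts above, can be illustrated with plain cosine similarity; the three-subcorpus profiles below are invented toy data, not from the paper:

```python
import math

def cosine(u, v):
    """Cosine similarity between two subcorpus-distribution vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical 3-subcorpus profiles: the dev-set vector and two phrase
# pairs, one drawn mostly from the same subcorpus as the dev set.
dev = [0.7, 0.2, 0.1]
in_domain = [0.6, 0.3, 0.1]
out_of_domain = [0.05, 0.15, 0.8]
assert cosine(dev, in_domain) > cosine(dev, out_of_domain)
```

The resulting score would then be used as a single decoding feature expressing the phrase pair's closeness to the dev set.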
A Generic Phrase Training Procedure | We first train word alignment models and will use them to evaluate the goodness of a phrase and a phrase pair.
Abstract | Multiple data-driven feature functions are proposed to capture the quality and confidence of phrases and phrase pairs . |
Introduction | The first problem is referred to as phrase pair extraction, which identifies phrase pairs that are supposed to be translations of each other. |
Introduction | The most widely used approach derives phrase pairs from the word alignment matrix (Och and Ney, 2003; Koehn et al., 2003).
Introduction | Other methods do not depend only on word alignments, such as directly modeling phrase alignment in a joint generative way (Marcu and Wong, 2002), pursuing an information extraction perspective (Venugopal et al., 2003), or augmenting with model-based phrase pair posteriors (Deng and Byrne, 2005).
Abstract | On top of the pruning framework, we also propose a discriminative ITG alignment model using hierarchical phrase pairs, which improves both F-score and Bleu score over the baseline alignment system of GIZA++.
Introduction | On top of the discriminative pruning method, we also propose a discriminative ITG alignment system using hierarchical phrase pairs.
The DITG Models | Another is DITG with hierarchical phrase pairs (henceforth HP-DITG), which relaxes the 1-to-1 constraint by adopting hierarchical phrase pairs in Chiang (2007).
The DITG Models | 6.2 DITG with Hierarchical Phrase Pairs |
The DITG Models | Wu (1997) proposes a bilingual segmentation grammar extending the terminal rules by including phrase pairs.
The DPDI Framework | #links_incon / (flen + elen), where #links_incon is the number of links which are inconsistent with the phrase pair according to some simpler alignment model (e.g.
The DPDI Framework | ³An inconsistent link connects a word within the phrase pair to some word outside the phrase pair.
Clustering phrase pairs directly using the K-means algorithm | Using multiple word clusterings simultaneously, each based on a different number of classes, could turn this global, hard tradeoff into a local, soft one, informed by the number of phrase pair instances available for a given granularity. |
Clustering phrase pairs directly using the K-means algorithm | We thus propose to represent each phrase pair instance (including its bilingual one-word contexts) as a feature vector, i.e., a point of a vector space.
Clustering phrase pairs directly using the K-means algorithm | We then use these data points to partition the space into clusters, and subsequently assign each phrase pair instance the cluster of its corresponding feature vector as its label.
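A minimal, dependency-free sketch of the clustering step described above (Lloyd-style k-means over phrase-pair feature vectors; in practice one would use an optimized library implementation rather than this toy version):

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain k-means: returns one cluster label per point, which would
    serve as the phrase pair instance's label as described above."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        for i, p in enumerate(points):
            labels[i] = min(range(k),
                            key=lambda c: sum((a - b) ** 2
                                              for a, b in zip(p, centers[c])))
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [points[i] for i in range(len(points)) if labels[i] == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```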
Hard rule labeling from word classes | (2003) to provide us with a set of phrase pairs for each sentence pair in the training corpus, annotated with their respective start and end positions in the source and target sentences. |
Hard rule labeling from word classes | We convert each extracted phrase pair, represented by its source span (i, j) and target span (k, l), into an initial rule
Hard rule labeling from word classes | Then (depending on the extracted phrase pairs), the resulting initial rules could be:
Introduction | Zollmann and Venugopal (2006) directly extend the rule extraction procedure from Chiang (2005) to heuristically label any phrase pair based on target language parse trees. |
PSCFG-based translation | Chiang (2005) learns a single-nonterminal PSCFG from a bilingual corpus by first identifying initial phrase pairs using the technique from Koehn et al. |
PSCFG-based translation | (2003), and then performing a generalization operation to generate phrase pairs with gaps, which can be viewed as PSCFG rules with generic ‘X’ nonterminal left-hand-sides and substitution sites. |
Flat ITG Model | (a) If x = TERM, generate a phrase pair from the phrase table Pt(⟨e, f⟩; θt).
Flat ITG Model | (b) If x = REG, a regular ITG rule, generate phrase pairs ⟨e1, f1⟩ and ⟨e2, f2⟩ from Pflat, and concatenate them into a single phrase pair ⟨e1e2, f1f2⟩.
Flat ITG Model | While the previous formulation can be used as-is in maximum likelihood training, this leads to a degenerate solution where every sentence is memorized as a single phrase pair . |
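The generative story in steps (a)/(b) above can be sketched as a tiny sampler; `gen_flat_itg`, the fixed 0.5 straight/inverted split, and the list-based phrase table are our simplifications of the model's actual parameterized distributions:

```python
import random

def gen_flat_itg(p_term, phrase_table, rng):
    """Sample a phrase pair from the flat ITG generative story: with
    probability p_term emit a terminal pair from the phrase table;
    otherwise recurse twice and concatenate, keeping the target side
    straight (regular rule) or reversing it (inverted rule)."""
    if rng.random() < p_term:
        return rng.choice(phrase_table)
    e1, f1 = gen_flat_itg(p_term, phrase_table, rng)
    e2, f2 = gen_flat_itg(p_term, phrase_table, rng)
    if rng.random() < 0.5:
        return (e1 + " " + e2, f1 + " " + f2)  # straight order
    return (e1 + " " + e2, f2 + " " + f1)      # inverted order
```

Under maximum likelihood such a model can still collapse to memorizing each sentence as one giant terminal pair, which is exactly the degenerate solution noted above.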
Introduction | The training of translation models for phrase-based statistical machine translation (SMT) systems (Koehn et al., 2003) takes unaligned bilingual training data as input, and outputs a scored table of phrase pairs . |
Introduction | The model is similar to previously proposed phrase alignment models based on inversion transduction grammars (ITGs) (Cherry and Lin, 2007; Zhang et al., 2008; Blunsom et al., 2009), with one important change: ITG symbols and phrase pairs are generated in the opposite order. |
Introduction | In traditional ITG models, the branches of a biparse tree are generated from a nonterminal distribution, and each leaf is generated by a word or phrase pair distribution. |
Alignment | We refer to phrase pairs that occur in only one sentence as singleton phrase pairs.
Alignment | For these sentences, the decoder needs the singleton phrase pairs to produce an alignment. |
Alignment | Standard leaving-one-out assigns a fixed probability α to singleton phrase pairs.
Introduction | The phrase translation table, which contains the bilingual phrase pairs and the corresponding translation probabilities, is one of the main components of an SMT system. |
Phrase Model Training | The translation probability of a phrase pair (f, e) is estimated as
Phrase Model Training | where C_FA(f, e) is the count of the phrase pair (f, e) in the phrase-aligned training data.
Phrase Model Training | We will refer to this model as the count model, as we simply count the number of occurrences of a phrase pair.
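The count model above is plain relative-frequency estimation, p(f|e) = C_FA(f, e) / Σ_f′ C_FA(f′, e); a sketch (the function name and data layout are ours):

```python
from collections import Counter

def count_model(phrase_aligned_pairs):
    """Relative-frequency ('count model') phrase translation probability
    p(f | e) = C_FA(f, e) / sum over f' of C_FA(f', e), estimated from a
    list of (f, e) phrase pair occurrences in phrase-aligned data."""
    pair_counts = Counter(phrase_aligned_pairs)            # C_FA(f, e)
    e_counts = Counter(e for _, e in phrase_aligned_pairs)  # marginal C(e)
    return {(f, e): c / e_counts[e] for (f, e), c in pair_counts.items()}

data = [("maison", "house"), ("domicile", "house"), ("maison", "house")]
print(count_model(data))
```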
Experiments | About 2.6M hierarchical phrase pairs extracted from the training corpus were used on the test set. |
Experiments | This improvement resulted from the mixture of hierarchical phrase pairs and tree-to-string rules. |
Experiments | To produce the result, the joint decoder made use of 8,114 hierarchical phrase pairs learned from training data, 6,800 glue rules connecting partial translations monotonically, and 16,554 tree-to-string rules. |
Joint Decoding | Decoders that use rules with flat structures (e.g., phrase pairs) usually generate target sentences from left to right, while those using rules with hierarchical structures (e.g., SCFG rules) often run in a bottom-up style.
Joint Decoding | (2006) develop a bottom-up decoder for BTG (Wu, 1997) that uses only phrase pairs . |
Joint Decoding | Hierarchical phrase pairs are used for translating smaller units and tree-to-string rules for bigger ones. |
Generation & Propagation | In order to utilize these newly acquired phrase pairs , we need to compute their relevant features. |
Generation & Propagation | The phrase pairs have four log-probability features: two likelihood features and two lexical weighting features.
Generation & Propagation | In addition, we use a sophisticated lexicalized hierarchical reordering model (HRM) (Galley and Manning, 2008) with five features for each phrase pair . |
Related Work | The operational scope of their approach is limited in that they assume a scenario where unknown phrase pairs are provided (thereby sidestepping the issue of translation candidate generation for completely unknown phrases), and what remains is the estimation of phrasal probabilities. |
Related Work | In our case, we obtain the phrase pairs from the graph structure (and therefore indirectly from the monolingual data) and a separate generation step, which plays an important role in good performance of the method. |
Comparative Study | We count how often each extracted phrase pair is found with each of the three reordering types. |
Comparative Study | The bilingual sequence of phrase pairs will be extracted using the same strategy in Figure 4. |
Comparative Study | Suppose the search state is now extended with a new phrase pair (f̃, ẽ).
Tagging-style Reordering Model | During the search, a sentence pair (f_1^J, e_1^I) will be formally split into a segmentation s_1^K which consists of K phrase pairs.
Tagging-style Reordering Model | Suppose the search state is now extended with a new phrase pair (f̃_k, ẽ_k): f̃_k := f_{b_k} …
Tagging-style Reordering Model | We also have the word alignment within the new phrase pair, which is stored during the phrase extraction process.
Introduction | Moreover, syntax-based approaches often suffer from the rule coverage problem since syntactic constraints rule out a large portion of non-syntactic phrase pairs, which might help decoders generalize well to unseen data (Marcu et al., 2006).
Introduction | The basic unit of translation in our model is the string-to-dependency phrase pair, which consists of a phrase on the source side and a dependency structure on the target side.
Introduction | The algorithm generates well-formed dependency structures for partial translations left-to-right using string-to-dependency phrase pairs . |
Conclusion | The resulting predictions improve the precision and recall of both alignment links and extracted phrase pairs in Chinese-English experiments.
Experimental Results | In this way, we can show that the bidirectional model improves alignment quality and enables the extraction of more correct phrase pairs . |
Experimental Results | Table 3: Phrase pair extraction accuracy for phrase pairs up to length 5. |
Experimental Results | Possible links are both included and excluded from phrase pairs during extraction, as in DeNero and Klein (2010). |
Translation Model Architecture | For phrase pairs which are not found, C(ẽ, f̃) and C(f̃) are initially set to 0.
Translation Model Architecture | Note that C(ẽ) is potentially incorrect at this point, since a phrase pair not being found does not entail that C(ẽ) is 0.
Translation Model Architecture | ⁴We prune the tables to the most frequent 50 phrase pairs per source phrase before combining them, since calculating the features for all phrase pairs of very common source phrases causes a significant slowdown.
Introduction | We use two complementary paraphrase models: an association model based on aligned phrase pairs extracted from a monolingual parallel corpus, and a vector space model, which represents each utterance as a vector and learns a similarity score between them. |
Paraphrasing | We define associations in x and c primarily by looking up phrase pairs in a phrase table constructed using the PARALEX corpus (Fader et al., 2013).
Paraphrasing | We use the word alignments to construct a phrase table by applying the consistent phrase pair heuristic (Och and Ney, 2004) to all 5-grams. |
Paraphrasing | This results in a phrase table with approximately 1.3 million phrase pairs . |
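The consistent phrase pair heuristic mentioned above can be sketched as follows; this simplified version checks only boundary-crossing links and skips the usual expansion over unaligned boundary words, so it is an illustration of the idea rather than a full Och-and-Ney-style extractor:

```python
def extract_phrase_pairs(n_src, n_tgt, alignment, max_len=5):
    """Consistent-phrase-pair heuristic: a source span and target span
    form a pair iff the pair contains at least one alignment link and
    no link crosses the span boundary.  `alignment` is a list of
    (src_index, tgt_index) links."""
    pairs = []
    for i1 in range(n_src):
        for i2 in range(i1, min(i1 + max_len, n_src)):
            # Target positions linked to words in the source span.
            ts = [t for s, t in alignment if i1 <= s <= i2]
            if not ts:
                continue
            j1, j2 = min(ts), max(ts)
            # Consistency: every link touching [j1, j2] stays in [i1, i2].
            if all(i1 <= s <= i2 for s, t in alignment if j1 <= t <= j2):
                pairs.append(((i1, i2), (j1, j2)))
    return pairs
```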
Experimental Results | One is a re-implementation of Hiero (Chiang, 2005), and the other is a hybrid syntax-based tree-to-string system (Zhao and Al-onaizan, 2008), where normal phrase pairs and Hiero rules are used as a backoff for tree-to-string rules.
Experimental Results | Next we extract phrase pairs, Hiero rules and tree-to-string rules from the original word alignment and the improved word alignment, and tune all the feature weights on the tuning set.
Integrating Empty Categories in Machine Translation | For example, for a hierarchical MT system, some phrase pairs and Hiero (Chiang, 2005) rules can be extracted with recovered *pro* and *PRO* at the Chinese side. |
Integrating Empty Categories in Machine Translation | For each phrase pair, Hiero rule or tree-to-string rule in the MT system, a binary feature f_k fires if there exists a *pro* on the source side and it aligns to one of its most frequently aligned target words found in the training corpus.
Integrating Empty Categories in Machine Translation | The motivation for such sparse features is to reward those phrase pairs and rules that have highly confident lexical pairs specifically related to ECs, and penalize those that do not.
Experiments | We divide the rules into three types: phrase rules, which only contain terminals and are the same as the phrase pairs in phrase-based system; monotone rules, which contain non-terminals and produce monotone translations; reordering rules, which also contain non-terminals but change the order of translations. |
Related Work | (2010) introduce a topic model for filtering topic-mismatched phrase pairs.
Related Work | Similarly, each phrase pair is also assigned one specific topic.
Related Work | A phrase pair will be discarded if its topic mismatches the document topic. |
Topic Models for Machine Translation | Lexical Weighting In phrase-based SMT, lexical weighting features estimate the phrase pair quality by combining lexical translation probabilities of words in a phrase (Koehn et al., 2003). |
Topic Models for Machine Translation | The phrase pair probabilities p_w(e|f) are the normalized product of lexical probabilities of the aligned word pairs within that phrase pair (Koehn et al., 2003).
Topic Models for Machine Translation | from which we can compute the phrase pair probabilities p_w(e|f; k) by multiplying the lexical probabilities and normalizing as in Koehn et al.
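The lexical weighting computation referenced in the excerpts above can be sketched as follows, in the Koehn et al. (2003) style: for each target word, average w(e|f) over its aligned source words and multiply across the phrase. The alignment is given as (e-index, f-index) pairs, and handling unaligned words via a `None` (NULL) key is our simplification:

```python
def lex_weight(e_words, f_words, alignment, w):
    """Lexical weighting p_w(e | f, a): product over target words of the
    average lexical translation probability of their aligned source
    words; unaligned target words fall back to a NULL entry."""
    p = 1.0
    for i, e in enumerate(e_words):
        links = [j for i2, j in alignment if i2 == i]
        if links:
            p *= sum(w.get((e, f_words[j]), 0.0) for j in links) / len(links)
        else:
            p *= w.get((e, None), 0.0)  # NULL alignment
    return p
```

A topic-conditioned variant would simply swap in topic-specific lexical probabilities w(e|f; k) and renormalize, as the excerpt above describes.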
Experiments | In this way, the number of intersected English phrases in the CE corpus and the ES corpus increases, which enables the Chinese-Spanish translation model induced using the triangulation method to cover more phrase pairs.
Pivot Methods for Phrase-based SMT | (2003), there are two important elements in the lexical weight: word alignment information a in a phrase pair (s̄, t̄) and lexical translation probability w(s
Pivot Methods for Phrase-based SMT | Let a1 and a2 represent the word alignment information inside the phrase pairs (s̄, p̄) and (p̄, t̄)
Using RBMT Systems for Pivot Translation | It can also change the distribution of some phrase pairs and reinforce some phrase pairs relative to the test set. |
Bag-of-Words Vector Space Model | In the hierarchical phrase-based translation method, the translation rules are extracted by abstracting some words from an initial phrase pair (Chiang, 2005). |
Bag-of-Words Vector Space Model | Consider a rule with non-terminals on the source and target side; for a given instance of the rule (a particular phrase pair in the training corpus), the context will be the words instantiating the non-terminals. |
Bag-of-Words Vector Space Model | In turn, the context for the sub-phrases that instantiate the non-terminals will be the words in the remainder of the phrase pair.
Head-Driven HPB Translation Model | For rule extraction, we first identify initial phrase pairs on word-aligned sentence pairs by using the same criterion as most phrase-based translation models (Och and Ney, 2004) and Chiang’s HPB model (Chiang, 2005; Chiang, 2007). |
Head-Driven HPB Translation Model | We extract HD-HRs and NRRs based on initial phrase pairs, respectively.
Head-Driven HPB Translation Model | We look for initial phrase pairs that contain other phrases and then replace sub-phrases with POS tags corresponding to their heads. |
Methods | • f3(r, c) = |H_within(r, c)| / |H_within(c)|: This is a variation of f2 that considers only noun phrase pairs shared at least once by relations in c.
Methods | • f4(r, c) = |H_within(r, c)| / |H_shared(c)|: This is a variation of f2 that considers only noun phrase pairs shared at least once by any pair of relations.
Methods | where H_within(r, c) = ∪_{r*∈c} H(r) ∩ H(r, r*), the intersection, considering translation, of H(r) and the noun phrase pairs shared at least once by relations in c, H_within(c) = ∪_{r*∈c} H(r*, c − {r*}), and H_shared(c) = ∪_{r*} H(r*, c), the noun phrase pairs shared at least once by any relations.
Related Work | UOM views reordering as a process of generating (b, o) in a left-to-right fashion, where b is the current phrase pair and o is the orientation of b with respect to the previously generated phrase pair b′.
Related Work | Tillmann and Zhang (2007) proposed a Bigram Orientation Model (BOM) to include both phrase pairs b and b′ in the model.
Two-Neighbor Orientation Model | ²We represent a chunk as a source and target phrase pair (f̄, ē), where the subscript and the superscript indicate the
Abstract | To prevent overfitting, the statistics of phrase pairs from a particular sentence were excluded from the phrase table when aligning that sentence.
Abstract | A set of phrase pairs is extracted from the word-aligned parallel corpus according to phrase extraction rules (Koehn et al., 2003).
Abstract | We use the word translation table from IBM Model 1 (Brown et al., 1993) and compute the sum over all possible word alignments within a phrase pair without normalizing for length (Quirk et al., 2005). |
Experiments | Table 2: Alignment accuracy and phrase pair extraction accuracy for directional and bidirectional models. |
Experiments | AER is the alignment error rate and F1 is the phrase pair extraction F1 score.
Abstract | Most modern machine translation systems use phrase pairs as translation units, allowing for accurate modelling of phrase-internal translation and reordering. |
Abstract | This mechanism implicitly supports not only traditional phrase pairs, but also gapping phrases which are nonconsecutive in the source.
Experiments | In contrast, our model better aligns the function words, such that many more useful phrase pairs can be extracted, i.e., ⟨…, 'm⟩, ⟨…, looking for⟩, ⟨…, grill-type⟩ and their combinations with neighbouring phrase pairs.
Alignment Methods | Phrasal ITGs are ITGs that allow for non-terminals that can emit phrase pairs with multiple elements on both the source and target sides. |
LookAhead Biparsing | This probability is the combination of the generative probability of each phrase pair Pt(e, f) as well as the sum of the probabilities over all shorter spans in straight and inverted order²
Substring Prior Probabilities | While the Bayesian phrasal ITG framework uses the previously mentioned phrase distribution Pt during search, it also allows for definition of a phrase pair prior probability P_prior(e, f), which can efficiently seed the search process with a bias towards phrase pairs that satisfy certain properties.
Experiments | We extracted phrase pairs from the training data to investigate how many phrase pairs can be captured by lexicalized tree-to-tree rules that contain only terminals. |
Experiments | We set the maximal length of phrase pairs to 10. |
Related Work | While this method works for l-best tree pairs, it cannot be applied to packed forest pairs because it is impractical to enumerate all tree pairs over a phrase pair . |
Discussion of Translation Results | • Baseline system translation output: 44.6% • Phrase pairs matching source n-grams: 67.8%
Inference during Translation Decoding | • Extend a hypothesis with a new phrase pair • Recombine hypotheses with identical states
Related Work | Och (1999) showed a method for inducing bilingual word classes that placed each phrase pair into a two-dimensional equivalence class. |
Baselines | Whenever a phrase pair does not appear in a component phrase table, we set the corresponding p_m(ē|f̄) to a small epsilon value.
Baselines | Whenever a phrase pair does not appear in a component phrase table, we set the corresponding p_m(ē|f̄) to 0; pairs in p(ē, f̄) that do not appear in at least one component table are discarded.
Ensemble Decoding | probability for each phrase pair (ē, f̄) is given by:
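The combination referenced above, with the epsilon fallback from the baseline description, can be sketched as a weighted linear interpolation of component phrase tables; `mixture_score` is our name, and ensemble decoding frameworks also support log-linear and switching mixtures, which this sketch does not implement:

```python
def mixture_score(phrase_pair, tables, weights, epsilon=1e-6):
    """Weighted linear mixture over component phrase tables:
    p(e|f) = sum over m of w_m * p_m(e|f).  A phrase pair missing from
    a component table falls back to a small epsilon probability."""
    return sum(w * t.get(phrase_pair, epsilon)
               for t, w in zip(tables, weights))

# Two hypothetical component tables; the pair exists in only one.
tables = [{("house", "maison"): 0.5}, {}]
print(mixture_score(("house", "maison"), tables, [0.5, 0.5]))
```

Setting `epsilon=0` and discarding pairs absent from every table reproduces the stricter baseline variant described above.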
Training | Second, we extract phoneme phrase pairs consistent with these alignments. |
Training | From the example above, we pull out phrase pairs like: |
Training | We add these phrase pairs to FST B, and call this the phoneme-phrase-based model. |
Model Description | Lexical Weighting Lexical weighting features estimate the quality of a phrase pair by combining the lexical translation probabilities of the words in the phrase² (Koehn et al., 2003).
Model Description | Phrase pair probabilities p(ē|f̄) are computed from these as described in Koehn et al.
Model Description | For example, if topic k is dominant in T, p_k(ē|f̄) may be quite large, but if p(k|V) is very small, then we should steer away from this phrase pair and select a competing phrase pair which may have a lower probability in T, but which is more relevant to the test sentence at hand.