Index of papers in Proc. ACL that mention
  • phrase pair
Zhang, Hao and Quirk, Chris and Moore, Robert C. and Gildea, Daniel
Abstract
We combine the strengths of Bayesian modeling and synchronous grammar in unsupervised learning of basic translation phrase pairs.
Abstract
The structured space of a synchronous grammar is a natural fit for phrase pair probability estimation, though the search space can be prohibitively large.
Introduction
Computational complexity arises from the exponentially large number of decompositions of a sentence pair into phrase pairs; overfitting is a problem because as EM attempts to maximize the likelihood of its training data, it prefers to directly explain a sentence pair with a single phrase pair.
Phrasal Inversion Transduction Grammar
Our ITG has two nonterminals: X and C, where X represents compositional phrase pairs that can have recursive structures and C is the preterminal over terminal phrase pairs.
Phrasal Inversion Transduction Grammar
They split the left-hand side constituent which represents a phrase pair into two smaller phrase pairs on the right-hand side and order them according to one of the two possible permutations.
Phrasal Inversion Transduction Grammar
where P(e/f), with Σ_{e/f} P(e/f) = 1, is a multinomial distribution over phrase pairs.
Variational Bayes for ITG
A sparse prior over a multinomial distribution such as the distribution of phrase pairs may bias the estimator toward skewed distributions that generalize better.
Variational Bayes for ITG
The other is the distribution of the phrase pairs.
Variational Bayes for ITG
By adjusting α_C to a very small number, we hope to place more posterior mass on parsimonious solutions with fewer but more confident and general phrase pairs.
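As a concrete illustration of this kind of estimator (a minimal sketch with our own names, not the paper's code), the mean-field variational-Bayes update for a multinomial with a symmetric Dirichlet prior replaces normalized expected counts with digamma-discounted weights, which shrink small counts toward zero:

from math import exp
from scipy.special import digamma

def vb_multinomial(expected_counts, alpha):
    # expected_counts: phrase pair -> expected count from the E-step;
    # a small alpha biases the estimate toward sparse, confident solutions.
    total = sum(expected_counts.values()) + alpha * len(expected_counts)
    return {pair: exp(digamma(count + alpha) - digamma(total))
            for pair, count in expected_counts.items()}

weights = vb_multinomial({("le chat", "the cat"): 3.2, ("chat", "cat"): 0.4}, alpha=0.01)

Note that the resulting weights sum to less than one; the leftover mass implicitly penalizes rare phrase pairs.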
phrase pair is mentioned in 20 sentences in this paper.
Topics mentioned in this paper:
Zhang, Jiajun and Liu, Shujie and Li, Mu and Zhou, Ming and Zong, Chengqing
Bilingually-constrained Recursive Auto-encoders
We can infer from this that if a model can learn the same embedding for any phrase pair sharing the same meaning, the learned embedding must encode the semantics of the phrases, and the corresponding model is the one we desire.
Bilingually-constrained Recursive Auto-encoders
For a phrase pair (s, t), two kinds of errors are involved:
Bilingually-constrained Recursive Auto-encoders
For the phrase pair (s, t), the joint error is:
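The equation itself is lost in this extract; as a hedged reconstruction in the form such joint objectives usually take, the two errors above are interpolated with a weight α (our notation):

E(s, t; θ) = α · E_rec(s, t; θ) + (1 − α) · E_sem(s, t; θ)

where E_rec is the reconstruction error and E_sem the semantic (bilingual-agreement) error.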
Experiments
To obtain high-quality bilingual phrase pairs to train our BRAE model, we perform forced decoding for the bilingual training sentences and collect the phrase pairs used.
Experiments
After removing duplicates, we obtain 1.12M bilingual phrase pairs (length ranging from 1 to 7).
Experiments
These algorithms are based on corpus statistics including co-occurrence statistics, phrase pair usage and composition information.
Introduction
In decoding with phrasal semantic similarities, we apply the semantic similarities of the phrase pairs as new features during decoding to guide translation candidate selection.
phrase pair is mentioned in 13 sentences in this paper.
Topics mentioned in this paper:
Lu, Shixiang and Chen, Zhenbiao and Xu, Bo
Input Features for DNN Feature Learning
Following (Maskey and Zhou, 2012), we use the following 4 phrase features of each phrase pair (Koehn et al., 2003) in the phrase table as the first type of input features: bidirectional phrase translation probability (P(e|f) and P(f|e)) and bidirectional lexical weighting (Lex(e|f) and Lex(f|e)).
Input Features for DNN Feature Learning
3.2 Phrase pair similarity
Input Features for DNN Feature Learning
Zhao et al. (2004) proposed a way of using term-weight-based models in a vector space as additional evidence for phrase pair translation quality.
Introduction
First, the original input features for DBN feature learning are too simple: the limited 4 phrase features of each phrase pair, such as bidirectional phrase translation probability and bidirectional lexical weighting (Koehn et al., 2003), are a bottleneck for learning effective feature representations.
Introduction
To address the first shortcoming, we adapt and extend some simple but effective phrase features as the input features for new DNN feature learning; these features have shown significant improvements for SMT, such as phrase pair similarity (Zhao et al., 2004), phrase frequency, phrase length (Hopkins and May, 2011), and phrase generative probability (Foster et al., 2010), and they also show further improvement for new phrase feature learning in our experiments.
Semi-Supervised Deep Auto-encoder Features Learning for SMT
To speed up the pre-training, we subdivide all the phrase pairs (with features X) in the phrase table into small mini-batches, each containing 100 cases, and update the weights after each mini-batch.
Semi-Supervised Deep Auto-encoder Features Learning for SMT
Each layer is greedily pre-trained for 50 epochs over all phrase pairs.
Semi-Supervised Deep Auto-encoder Features Learning for SMT
After the pre-training, for each phrase pair in the phrase table, we generate the DBN features (Maskey and Zhou, 2012) by passing the original phrase features X through the DBN using forward computation.
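Purely as an assumption-laden sketch of this regime (mini-batches of 100, 50 epochs per layer, then a forward pass to emit features), with a tied-weight autoencoder standing in for each RBM layer of the DBN:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pretrain_layer(X, n_hidden, epochs=50, batch=100, lr=0.1, seed=0):
    # Greedily pre-train one layer; X holds one row of features per phrase pair.
    rng = np.random.default_rng(seed)
    W = 0.01 * rng.standard_normal((X.shape[1], n_hidden))
    b, c = np.zeros(n_hidden), np.zeros(X.shape[1])
    for _ in range(epochs):                      # 50 epochs over all phrase pairs
        for i in range(0, len(X), batch):        # mini-batches of 100 cases
            x = X[i:i + batch]
            h = sigmoid(x @ W + b)               # encode
            r = sigmoid(h @ W.T + c)             # reconstruct
            dr = (r - x) * r * (1 - r) / len(x)  # output delta
            dh = (dr @ W) * h * (1 - h)          # hidden delta
            W -= lr * (x.T @ dh + dr.T @ h)      # tied-weight gradient step
            b -= lr * dh.sum(0)
            c -= lr * dr.sum(0)
    return W, b

def forward_features(X, layers):
    # Forward computation: pass the original phrase features through all layers.
    for W, b in layers:
        X = sigmoid(X @ W + b)
    return X

X = np.random.rand(1000, 16)            # toy input: 1000 phrase pairs, 16 features
layers = []
for n_hidden in (8, 4):                 # greedy, layer by layer
    layers.append(pretrain_layer(forward_features(X, layers), n_hidden))
features = forward_features(X, layers)  # learned features, one row per phrase pair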
phrase pair is mentioned in 13 sentences in this paper.
Topics mentioned in this paper:
Liu, Shujie and Yang, Nan and Li, Mu and Zhou, Ming
Abstract
A semi-supervised training approach is proposed to train the parameters, and the phrase pair embedding is explored to model translation confidence directly.
Introduction
(2013) use recursive auto-encoders to make full use of the entire merging phrase pairs, going beyond the boundary words with a maximum entropy classifier (Xiong et al., 2006).
Introduction
To model the translation confidence for a translation phrase pair, we initialize the phrase pair embedding by leveraging the sparse features and a recurrent neural network.
Introduction
The sparse features are phrase pairs in translation table, and recurrent neural network is utilized to learn a smoothed translation score with the source and target side information.
Our Model
We then check whether translation candidates can be found in the translation table for each span, together with the phrase pair embedding and recurrent input vector (global features).
Our Model
We extract phrase pairs using the conventional method (Och and Ney, 2004).
Our Model
• Representations of phrase pairs are automatically learnt to optimize the translation performance, while features used in the conventional model are handcrafted.
Related Work
Given the representations of the smaller phrase pairs, recursive auto-encoder can generate the representation of the parent phrase pair with a reordering confidence score.
phrase pair is mentioned in 36 sentences in this paper.
Topics mentioned in this paper:
Hashimoto, Chikara and Torisawa, Kentaro and Kloetzer, Julien and Sano, Motoki and Varga, István and Oh, Jong-Hoon and Kidawara, Yutaka
Event Causality Extraction Method
An event causality candidate is given a causality score Cscore, which is the SVM score (distance from the hyperplane) normalized to [0,1] by the sigmoid function. Each event causality candidate may be given multiple original sentences, since a phrase pair can appear in multiple sentences, in which case it is given more than one SVM score.
Experiments
These three datasets have no overlap in terms of phrase pairs.
Experiments
We observed that CEAsup and CEAWS performed poorly and tended to favor event causality candidates whose phrase pairs were highly relevant to each other but described the contrasts of events rather than event causality (e.g., build a slow muscle and build a fast muscle), probably because their
Experiments
phrase pairs described two events that often happen in parallel but are not event causality (e.g., reduce the intake of energy and increase the energy consumption) in the highly ranked event causality candidates of Csuns and Cssup.
Future Scenario Generation Method
A naive approach chains two phrase pairs by exact matching.
Future Scenario Generation Method
Scenarios (scs) generated by chaining causally-compatible phrase pairs are scored by Score(sc), which embodies our assumption that an acceptable scenario consists of plausible event causality pairs:
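The scoring function itself is not reproduced in this index; one simple instantiation consistent with the stated assumption (ours, not necessarily the paper's Score(sc)) multiplies the causality scores of the chained phrase pairs, so that longer or weaker chains score lower:

from math import prod

def score_scenario(causality_scores):
    # causality_scores: the [0,1]-normalized SVM scores of each chained pair.
    return prod(causality_scores)

print(score_scenario([0.9, 0.7]))  # 0.63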
Introduction
Annotators regarded as event causality only phrase pairs that were interpretable as event causality without contexts (i.e., self-contained).
phrase pair is mentioned in 11 sentences in this paper.
Topics mentioned in this paper:
Zhu, Conghui and Watanabe, Taro and Sumita, Eiichiro and Zhao, Tiejun
Introduction
Phrase pair extraction, the key step to discover translation knowledge, heavily relies on the scale of training data.
Introduction
Typically, the more parallel corpora are used, the more phrase pairs and the more accurate parameters will be learned, which is obviously beneficial to translation performance.
Introduction
First, phrase pairs are extracted from each domain without interfering with other domains.
Phrase Pair Extraction with Unsupervised Phrasal ITGs
More formally, P(⟨e, f⟩; θ_x, θ_t) is the probability of a phrase pair ⟨e, f⟩, which is parameterized by a phrase pair distribution θ_t and a symbol distribution θ_x.
Phrase Pair Extraction with Unsupervised Phrasal ITGs
where d is the discount parameter, s is the strength parameter, and P_base is a prior probability which acts as a fallback probability when a phrase pair is not in the model.
Phrase Pair Extraction with Unsupervised Phrasal ITGs
Under this model, the probability for a phrase pair found in a bilingual corpus (E, F) can be represented by the following equation using the Chinese restaurant process (Teh, 2006):
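The equation is garbled away here, but the standard Pitman-Yor predictive probability it refers to, with the discount d, strength s, and fallback prior from above, can be sketched as follows (variable names are ours):

def pyp_prob(count, tables, total_count, total_tables, d, s, p_base):
    # count/tables: customers and tables serving this phrase pair;
    # p_base: the fallback prior used when the pair is not in the model.
    cache = max(count - d * tables, 0.0) / (s + total_count)
    fallback = (s + d * total_tables) / (s + total_count) * p_base
    return cache + fallback

# a pair seen 3 times at 1 table, among 100 customers and 40 tables overall:
print(pyp_prob(3, 1, 100, 40, d=0.5, s=1.0, p_base=1e-6))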
Related Work
Then all the phrase pairs and features are tuned together with different weights during decoding.
Related Work
Classification-based methods must at least add an explicit label to indicate which domain the current phrase pair comes from.
Related Work
(2012) employed a feature-based approach, in which phrase pairs are enriched
phrase pair is mentioned in 34 sentences in this paper.
Topics mentioned in this paper:
Zhang, Jiajun and Zong, Chengqing
Introduction
The main idea of our method is to divide the phrase-level translation rule induction into two steps: bilingual lexicon induction and phrase pair induction.
Introduction
Since many researchers have studied bilingual lexicon induction, in this paper we mainly concentrate on phrase pair induction given a probabilistic bilingual lexicon and two in-domain large monolingual data sets (source and target language).
Introduction
In addition, we will further introduce how to refine the induced phrase pairs and estimate their parameters, such as the four standard translation features and the phrase reordering feature used in conventional phrase-based models (Koehn et al., 2007).
What Can We Learn with Phrase Pair Induction?
Readers may wonder: if phrase pair induction is performed using only a bilingual lexicon and monolingual data, what new translation knowledge can be learned?
What Can We Learn with Phrase Pair Induction?
In contrast, phrase pair induction can make up for this deficiency to some extent.
What Can We Learn with Phrase Pair Induction?
From the phrase pairs induced with our method, we conducted a deep analysis and found that we can learn three kinds of new translation knowledge: 1) word reordering within a phrase pair; 2) idioms; and 3) unknown word translations.
phrase pair is mentioned in 66 sentences in this paper.
Topics mentioned in this paper:
Nguyen, ThuyLinh and Vogel, Stephan
Introduction
However, these phrase pairs with gaps do not capture structure reordering as do Hiero rules with nonterminal mappings.
Introduction
From this Hiero derivation, we have a segmentation of the sentence pairs into phrase pairs according to the word alignments, as shown on the left side of Figure 1.
Introduction
Ordering these phrase pairs according to the word sequence on the target side, shown on the right side of Figure 1, we have a phrase-based translation path consisting of four phrase pairs: (je, …), (ne, …).
Phrasal-Hiero Model
Training: Represent each rule as a sequence of phrase pairs and nonterminals.
Phrasal-Hiero Model
Decoding: Use the rules’ sequences of phrase pairs and nonterminals to find the corresponding phrase-based path of a Hiero derivation and calculate its feature scores.
Phrasal-Hiero Model
2.1 Map Rule to A Sequence of Phrase Pairs and Nonterminals
phrase pair is mentioned in 51 sentences in this paper.
Topics mentioned in this paper:
Hewavitharana, Sanjika and Mehay, Dennis and Ananthakrishnan, Sankaranarayanan and Natarajan, Prem
Corpus Data and Baseline SMT
We used this corpus to extract translation phrase pairs from bidirectional IBM Model 4 word alignment (Och and Ney, 2003) based on the heuristic approach of (Koehn et al., 2003).
Corpus Data and Baseline SMT
Our phrase-based decoder is similar to Moses (Koehn et al., 2007) and uses the phrase pairs and target LM to perform beam search stack decoding based on a standard log-linear model, the parameters of which were tuned with MERT (Och, 2003) on a held-out development set (3,534 sentence pairs, 45K words) using BLEU as the tuning metric.
Corpus Data and Baseline SMT
All other sentence pairs are assigned to a “background conversation”, which signals the absence of the topic similarity feature for phrase pairs derived from these instances.
Incremental Topic-Based Adaptation
Our approach is based on the premise that biasing the translation model to favor phrase pairs originating in training conversations that are contextually similar to the current conversation will lead to better translation quality.
Incremental Topic-Based Adaptation
Additionally, the SMT phrase table tracks, for each phrase pair, the set of parent training conversations (including the “background conversation”) from which that phrase pair originated.
Incremental Topic-Based Adaptation
Using this information, the decoder evaluates, for each candidate phrase pair
Introduction
As the conversation progresses, however, the gradual accumulation of contextual information can be used to infer the topic(s) of discussion, and to deploy contextually appropriate translation phrase pairs.
Introduction
Translation phrase pairs that originate in training conversations whose topic distribution is similar to that of the current conversation are given preference through a single similarity feature, which augments the standard phrase-based SMT log-linear model.
Relation to Prior Work
Rather, we compute a similarity between the current conversation history and each of the training conversations, and use this measure to dynamically score the relevance of candidate translation phrase pairs during decoding.
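A minimal sketch of such a similarity feature, assuming topic vectors for conversations and taking the maximum cosine similarity over a phrase pair's parent conversations (the max is our choice, not necessarily the paper's):

import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def topic_similarity_feature(current_topics, parent_topics):
    # parent_topics: topic vectors of the training conversations from which
    # this phrase pair originated, as tracked in the phrase table.
    return max(cosine(current_topics, t) for t in parent_topics)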
phrase pair is mentioned in 15 sentences in this paper.
Topics mentioned in this paper:
Chen, Boxing and Kuhn, Roland and Foster, George
Abstract
This profile might, for instance, be a vector with a dimensionality equal to the number of training subcorpora; each entry in the vector reflects the contribution of a particular subcorpus to all the phrase pairs that can be extracted from the dev set.
Abstract
Then, for each phrase pair extracted from the training data, we create a vector with features defined in the same way, and calculate its similarity score with the vector representing the dev set.
Abstract
Thus, we obtain a decoding feature whose value represents the phrase pair's closeness to the dev set.
Introduction
typically use a rich feature set to decide on weights for the training data, at the sentence or phrase pair level.
Introduction
As in (Foster et al., 2010), this approach works at the level of phrase pairs.
Introduction
Instead of using word-based features and a computationally expensive training procedure, we capture the distributional properties of each phrase pair directly, representing it as a vector in a space which also contains a representation of the dev set.
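A hedged sketch of this vector-space idea: profile the dev set and each phrase pair by the contribution of each training subcorpus, then score the pair by its similarity to the dev profile (the count-based featurization below is our assumption):

import numpy as np

def profile(counts_per_subcorpus):
    v = np.asarray(counts_per_subcorpus, dtype=float)
    return v / (v.sum() or 1.0)     # share contributed by each subcorpus

def closeness_to_dev(pair_counts, dev_counts):
    p, d = profile(pair_counts), profile(dev_counts)
    return float(p @ d / (np.linalg.norm(p) * np.linalg.norm(d) + 1e-12))

# a pair seen 5x in subcorpus 0 and 1x in subcorpus 2, dev skewed to subcorpus 0:
print(closeness_to_dev([5, 0, 1], [80, 10, 10]))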
phrase pair is mentioned in 20 sentences in this paper.
Topics mentioned in this paper:
Deng, Yonggang and Xu, Jia and Gao, Yuqing
A Generic Phrase Training Procedure
We first train word alignment models and will use them to evaluate the goodness of a phrase and a phrase pair.
Abstract
Multiple data-driven feature functions are proposed to capture the quality and confidence of phrases and phrase pairs .
Introduction
The first problem is referred to as phrase pair extraction, which identifies phrase pairs that are supposed to be translations of each other.
Introduction
The most widely used approach derives phrase pairs from word alignment matrix (Och and Ney, 2003; Koehn et al., 2003).
Introduction
Other methods do not depend on word alignments only, such as directly modeling phrase alignment in a joint generative way (Marcu and Wong, 2002), pursuing information extraction perspective (Venugopal et al., 2003), or augmenting with model-based phrase pair posterior (Deng and Byrne, 2005).
phrase pair is mentioned in 51 sentences in this paper.
Topics mentioned in this paper:
Liu, Shujie and Li, Chi-Ho and Zhou, Ming
Abstract
On top of the pruning framework, we also propose a discriminative ITG alignment model using hierarchical phrase pairs , which improves both F-score and Bleu score over the baseline alignment system of GIZA++.
Introduction
On top of the discriminative pruning method, we also propose a discriminative ITG alignment system using hierarchical phrase pairs .
The DITG Models
Another is DITG with hierarchical phrase pairs (henceforth HP-DITG), which relaxes the 1-to-1 constraint by adopting hierarchical phrase pairs in Chiang (2007).
The DITG Models
6.2 DITG with Hierarchical Phrase Pairs
The DITG Models
Wu (1997) proposes a bilingual segmentation grammar extending the terminal rules by including phrase pairs .
The DPDI Framework
#links_incon / (flen + elen), where #links_incon is the number of links which are inconsistent with the phrase pair according to some simpler alignment model (e.g.
The DPDI Framework
An inconsistent link connects a word within the phrase pair to some word outside the phrase pair.
phrase pair is mentioned in 19 sentences in this paper.
Topics mentioned in this paper:
Zollmann, Andreas and Vogel, Stephan
Clustering phrase pairs directly using the K-means algorithm
Using multiple word clusterings simultaneously, each based on a different number of classes, could turn this global, hard tradeoff into a local, soft one, informed by the number of phrase pair instances available for a given granularity.
Clustering phrase pairs directly using the K-means algorithm
We thus propose to represent each phrase pair instance (including its bilingual one-word contexts) as feature vectors, i.e., points of a vector space.
Clustering phrase pairs directly using the K-means algorithm
We then use these data points to partition the space into clusters, and subsequently assign each phrase pair instance the cluster of its corresponding feature vector as its label.
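A minimal K-means sketch over such phrase pair instance vectors (plain Lloyd iterations; the featurization of a pair plus its bilingual one-word contexts is left abstract):

import numpy as np

def kmeans(points, k, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # assign each phrase pair instance to its nearest cluster center
        labels = np.argmin(((points[:, None] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):          # recompute each center as the cluster mean
            if (labels == j).any():
                centers[j] = points[labels == j].mean(0)
    return labels                    # one cluster label per phrase pair instance

labels = kmeans(np.random.rand(500, 10), k=8)   # toy: 500 instances, 10-dim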
Hard rule labeling from word classes
(2003) to provide us with a set of phrase pairs for each sentence pair in the training corpus, annotated with their respective start and end positions in the source and target sentences.
Hard rule labeling from word classes
We convert each extracted phrase pair, represented by its source span (i, j) and target span (k, l), into an initial rule
Hard rule labeling from word classes
Then (depending on the extracted phrase pairs), the resulting initial rules could be:
Introduction
Zollmann and Venugopal (2006) directly extend the rule extraction procedure from Chiang (2005) to heuristically label any phrase pair based on target language parse trees.
PSCFG-based translation
Chiang (2005) learns a single-nonterminal PSCFG from a bilingual corpus by first identifying initial phrase pairs using the technique from Koehn et al.
PSCFG-based translation
(2003), and then performing a generalization operation to generate phrase pairs with gaps, which can be viewed as PSCFG rules with generic ‘X’ nonterminal left-hand-sides and substitution sites.
phrase pair is mentioned in 18 sentences in this paper.
Topics mentioned in this paper:
Neubig, Graham and Watanabe, Taro and Sumita, Eiichiro and Mori, Shinsuke and Kawahara, Tatsuya
Flat ITG Model
(a) If x = TERM, generate a phrase pair from the phrase table P_t(⟨e, f⟩; θ_t).
Flat ITG Model
(b) If x = REG, a regular ITG rule, generate phrase pairs ⟨e1, f1⟩ and ⟨e2, f2⟩ from P_flat, and concatenate them into a single phrase pair ⟨e1e2, f1f2⟩.
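As an illustrative toy only (the inverted rule INV is omitted, and the real distributions P_x and P_flat are replaced by placeholders), the generative story of (a) and (b) can be sketched recursively:

import random

PHRASE_TABLE = [(("le", "chat"), ("the", "cat")), (("chien",), ("dog",))]

def generate():
    symbol = random.choices(["TERM", "REG"], weights=[0.6, 0.4])[0]
    if symbol == "TERM":             # emit a terminal phrase pair
        return random.choice(PHRASE_TABLE)
    e1, f1 = generate()              # REG: generate two smaller pairs ...
    e2, f2 = generate()
    return e1 + e2, f1 + f2          # ... and concatenate them in order

print(generate())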
Flat ITG Model
While the previous formulation can be used as-is in maximum likelihood training, this leads to a degenerate solution where every sentence is memorized as a single phrase pair.
Introduction
The training of translation models for phrase-based statistical machine translation (SMT) systems (Koehn et al., 2003) takes unaligned bilingual training data as input, and outputs a scored table of phrase pairs .
Introduction
The model is similar to previously proposed phrase alignment models based on inversion transduction grammars (ITGs) (Cherry and Lin, 2007; Zhang et al., 2008; Blunsom et al., 2009), with one important change: ITG symbols and phrase pairs are generated in the opposite order.
Introduction
In traditional ITG models, the branches of a biparse tree are generated from a nonterminal distribution, and each leaf is generated by a word or phrase pair distribution.
phrase pair is mentioned in 33 sentences in this paper.
Topics mentioned in this paper:
Wuebker, Joern and Mauser, Arne and Ney, Hermann
Alignment
We refer to singleton phrases as phrase pairs that occur only in one sentence.
Alignment
For these sentences, the decoder needs the singleton phrase pairs to produce an alignment.
Alignment
Standard leaving-one-out assigns a fixed probability α to singleton phrase pairs.
Introduction
The phrase translation table, which contains the bilingual phrase pairs and the corresponding translation probabilities, is one of the main components of an SMT system.
Phrase Model Training
The translation probability of a phrase pair (f, e) is estimated as
Phrase Model Training
where C_FA(f, e) is the count of the phrase pair (f, e) in the phrase-aligned training data.
Phrase Model Training
We will refer to this model as the count model as we simply count the number of occurrences of a phrase pair .
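A minimal sketch of this count model, using the notation C_FA(f, e) from above over a toy list of phrase-aligned occurrences:

from collections import Counter

def count_model(aligned_pairs):
    c_fa = Counter(aligned_pairs)                 # C_FA(f, e)
    c_e = Counter(e for _, e in aligned_pairs)    # marginal count of e
    return {(f, e): n / c_e[e] for (f, e), n in c_fa.items()}   # p(f | e)

pairs = [("la maison", "the house"), ("maison", "house"),
         ("la maison", "the house")]
print(count_model(pairs)[("la maison", "the house")])   # 1.0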
phrase pair is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Liu, Yang and Mi, Haitao and Feng, Yang and Liu, Qun
Experiments
About 2.6M hierarchical phrase pairs extracted from the training corpus were used on the test set.
Experiments
This improvement resulted from the mixture of hierarchical phrase pairs and tree-to-string rules.
Experiments
To produce the result, the joint decoder made use of 8,114 hierarchical phrase pairs learned from training data, 6,800 glue rules connecting partial translations monotonically, and 16,554 tree-to-string rules.
Joint Decoding
Decoders that use rules with flat structures (e.g., phrase pairs) usually generate target sentences from left to right while those using rules with hierarchical structures (e.g., SCFG rules) often run in a bottom-up style.
Joint Decoding
(2006) develop a bottom-up decoder for BTG (Wu, 1997) that uses only phrase pairs.
Joint Decoding
Hierarchical phrase pairs are used for translating smaller units and tree-to-string rules for bigger ones.
phrase pair is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Saluja, Avneesh and Hassan, Hany and Toutanova, Kristina and Quirk, Chris
Generation & Propagation
In order to utilize these newly acquired phrase pairs, we need to compute their relevant features.
Generation & Propagation
The phrase pairs have four log-probability features: two likelihood features and two lexical weighting features.
Generation & Propagation
In addition, we use a sophisticated lexicalized hierarchical reordering model (HRM) (Galley and Manning, 2008) with five features for each phrase pair .
Related Work
The operational scope of their approach is limited in that they assume a scenario where unknown phrase pairs are provided (thereby sidestepping the issue of translation candidate generation for completely unknown phrases), and what remains is the estimation of phrasal probabilities.
Related Work
In our case, we obtain the phrase pairs from the graph structure (and therefore indirectly from the monolingual data) and a separate generation step, which plays an important role in good performance of the method.
phrase pair is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Feng, Minwei and Peter, Jan-Thorsten and Ney, Hermann
Comparative Study
We count how often each extracted phrase pair is found with each of the three reordering types.
Comparative Study
The bilingual sequence of phrase pairs will be extracted using the same strategy in Figure 4.
Comparative Study
Suppose the search state is now extended with a new phrase pair (f̃, ẽ).
Tagging-style Reordering Model
During the search, a sentence pair (f_1^J, e_1^I) will be formally split into a segmentation s_1^K which consists of K phrase pairs.
Tagging-style Reordering Model
Suppose the search state is now extended with a new phrase pair (f̃_k, ẽ_k): f̃_k := f_{b_k} …
Tagging-style Reordering Model
We also have the word alignment within the new phrase pair, which is stored during the phrase extraction process.
phrase pair is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Liu, Yang
Introduction
Moreover, syntax-based approaches often suffer from the rule coverage problem since syntactic constraints rule out a large portion of non-syntactic phrase pairs, which might help decoders generalize well to unseen data (Marcu et al., 2006).
Introduction
The basic unit of translation in our model is the string-to-dependency phrase pair, which consists of a phrase on the source side and a dependency structure on the target side.
Introduction
The algorithm generates well-formed dependency structures for partial translations left-to-right using string-to-dependency phrase pairs .
phrase pair is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
DeNero, John and Macherey, Klaus
Conclusion
The resulting predictions improve the precision and recall of both alignment links and extracted phrase pairs in Chinese-English experiments.
Experimental Results
In this way, we can show that the bidirectional model improves alignment quality and enables the extraction of more correct phrase pairs .
Experimental Results
Table 3: Phrase pair extraction accuracy for phrase pairs up to length 5.
Experimental Results
Possible links are both included and excluded from phrase pairs during extraction, as in DeNero and Klein (2010).
phrase pair is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Sennrich, Rico and Schwenk, Holger and Aransa, Walid
Translation Model Architecture
For phrase pairs which are not found, C(E, f) and C(f) are initially set to 0.
Translation Model Architecture
Note that C(E) is potentially incorrect at this point, since a phrase pair not being found does not entail that C(E) is 0.
Translation Model Architecture
We prune the tables to the most frequent 50 phrase pairs per source phrase before combining them, since calculating the features for all phrase pairs of very common source phrases causes a significant slowdown.
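A hedged sketch of combining component tables at the count level, in the spirit of this architecture (the weights and the count interface are our assumptions):

def combined_prob(pair, components, weights):
    # components: one (C_ef, C_f) pair of count dicts per phrase table.
    e, f = pair
    num = sum(w * c_ef.get((e, f), 0) for (c_ef, c_f), w in zip(components, weights))
    den = sum(w * c_f.get(f, 0) for (c_ef, c_f), w in zip(components, weights))
    return num / den if den else 0.0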
phrase pair is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Berant, Jonathan and Liang, Percy
Introduction
We use two complementary paraphrase models: an association model based on aligned phrase pairs extracted from a monolingual parallel corpus, and a vector space model, which represents each utterance as a vector and learns a similarity score between them.
Paraphrasing
We define associations in x and c primarily by looking up phrase pairs in a phrase table constructed using the PARALEX corpus (Fader et al., 2013).
Paraphrasing
We use the word alignments to construct a phrase table by applying the consistent phrase pair heuristic (Och and Ney, 2004) to all 5-grams.
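A minimal sketch of that consistency heuristic: keep a span pair if at least one alignment link falls inside it and no link crosses its boundary (the usual extension to unaligned boundary words is omitted here):

def extract_phrase_pairs(src, tgt, alignment, max_len=5):
    pairs = []
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            tgts = [j for i, j in alignment if i1 <= i <= i2]
            if not tgts:
                continue
            j1, j2 = min(tgts), max(tgts)
            # consistent: no link enters the box from outside it
            if j2 - j1 < max_len and \
                    all(i1 <= i <= i2 for i, j in alignment if j1 <= j <= j2):
                pairs.append((tuple(src[i1:i2 + 1]), tuple(tgt[j1:j2 + 1])))
    return pairs

print(extract_phrase_pairs(["how", "are", "you"], ["comment", "allez", "vous"],
                           [(0, 0), (1, 1), (2, 2)]))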
Paraphrasing
This results in a phrase table with approximately 1.3 million phrase pairs.
phrase pair is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Xiang, Bing and Luo, Xiaoqiang and Zhou, Bowen
Experimental Results
One is a re-implementation of Hiero (Chiang, 2005), and the other is a hybrid syntax-based tree-to-string system (Zhao and Al-Onaizan, 2008), where normal phrase pairs and Hiero rules are used as a backoff for tree-to-string rules.
Experimental Results
Next we extract phrase pairs, Hiero rules and tree-to-string rules from the original word alignment and the improved word alignment, and tune all the feature weights on the tuning set.
Integrating Empty Categories in Machine Translation
For example, for a hierarchical MT system, some phrase pairs and Hiero (Chiang, 2005) rules can be extracted with recovered *pro* and *PRO* at the Chinese side.
Integrating Empty Categories in Machine Translation
For each phrase pair , Hiero rule or tree-to-string rule in the MT system, a binary feature fk, fires if there exists a *pro* on the source side and it aligns to one of its most frequently aligned target words found in the training corpus.
Integrating Empty Categories in Machine Translation
The motivation for such sparse features is to reward those phrase pairs and rules that have highly confident lexical pairs specifically related to ECs, and penalize those that don't have such lexical pairs.
phrase pair is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Xiao, Xinyan and Xiong, Deyi and Zhang, Min and Liu, Qun and Lin, Shouxun
Experiments
We divide the rules into three types: phrase rules, which only contain terminals and are the same as the phrase pairs in a phrase-based system; monotone rules, which contain non-terminals and produce monotone translations; reordering rules, which also contain non-terminals but change the order of translations.
Related Work
(2010) introduce a topic model for filtering topic-mismatched phrase pairs.
Related Work
Similarly, each phrase pair is also assigned one specific topic.
Related Work
A phrase pair will be discarded if its topic mismatches the document topic.
phrase pair is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Hu, Yuening and Zhai, Ke and Eidelman, Vladimir and Boyd-Graber, Jordan
Topic Models for Machine Translation
Lexical Weighting In phrase-based SMT, lexical weighting features estimate the phrase pair quality by combining lexical translation probabilities of words in a phrase (Koehn et al., 2003).
Topic Models for Machine Translation
The phrase pair probabilities p_w(e|f) are the normalized product of lexical probabilities of the aligned word pairs within that phrase pair (Koehn et al., 2003).
Topic Models for Machine Translation
from which we can compute the phrase pair probabilities p_w(e|f; k) by multiplying the lexical probabilities and normalizing as in Koehn et al.
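A minimal sketch of that lexical weighting computation (Koehn et al., 2003), with unaligned target words backing off to NULL (a common simplification):

def lexical_weight(e_words, f_words, alignment, w):
    # w: dict (e_word, f_word) -> lexical translation probability
    p = 1.0
    for i, e in enumerate(e_words):
        links = [j for i2, j in alignment if i2 == i]
        if links:   # average the probabilities of this word's aligned pairs
            p *= sum(w.get((e, f_words[j]), 0.0) for j in links) / len(links)
        else:
            p *= w.get((e, "NULL"), 0.0)
    return p

w = {("the", "le"): 0.8, ("cat", "chat"): 0.9}
print(lexical_weight(["the", "cat"], ["le", "chat"], [(0, 0), (1, 1)], w))  # 0.72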
phrase pair is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Wu, Hua and Wang, Haifeng
Experiments
In this way, there are more intersected English phrases in the CE corpus and the ES corpus, which enables the Chinese-Spanish translation model induced using the triangulation method to cover more phrase pairs.
Pivot Methods for Phrase-based SMT
(2003), there are two important elements in the lexical weight: the word alignment information a in a phrase pair (s, t) and the lexical translation probability w(s
Pivot Methods for Phrase-based SMT
Let a1 and a2 represent the word alignment information inside the phrase pairs (s, p) and (p, t)
Using RBMT Systems for Pivot Translation
It can also change the distribution of some phrase pairs and reinforce some phrase pairs relative to the test set.
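A hedged sketch of the underlying triangulation step: source-target phrase probabilities are obtained by marginalizing over pivot phrases shared by the two tables (assuming independence given the pivot):

def triangulate(p_src_piv, p_piv_tgt):
    # p_src_piv: dict (s, p) -> p(s|p); p_piv_tgt: dict (p, t) -> p(p|t)
    p_src_tgt = {}
    for (s, p), prob1 in p_src_piv.items():
        for (p2, t), prob2 in p_piv_tgt.items():
            if p == p2:                      # shared pivot phrase
                p_src_tgt[(s, t)] = p_src_tgt.get((s, t), 0.0) + prob1 * prob2
    return p_src_tgt

print(triangulate({("猫", "cat"): 0.9}, {("cat", "gato"): 0.8}))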
phrase pair is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Chen, Boxing and Foster, George and Kuhn, Roland
Bag-of-Words Vector Space Model
In the hierarchical phrase-based translation method, the translation rules are extracted by abstracting some words from an initial phrase pair (Chiang, 2005).
Bag-of-Words Vector Space Model
Consider a rule with non-terminals on the source and target side; for a given instance of the rule (a particular phrase pair in the training corpus), the context will be the words instantiating the non-terminals.
Bag-of-Words Vector Space Model
In turn, the context for the sub-phrases that instantiate the non-terminals will be the words in the remainder of the phrase pair.
phrase pair is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Li, Junhui and Tu, Zhaopeng and Zhou, Guodong and van Genabith, Josef
Head-Driven HPB Translation Model
For rule extraction, we first identify initial phrase pairs on word-aligned sentence pairs by using the same criterion as most phrase-based translation models (Och and Ney, 2004) and Chiang’s HPB model (Chiang, 2005; Chiang, 2007).
Head-Driven HPB Translation Model
We extract HD-HRs and NRRs based on initial phrase pairs, respectively.
Head-Driven HPB Translation Model
We look for initial phrase pairs that contain other phrases and then replace sub-phrases with POS tags corresponding to their heads.
phrase pair is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Lee, Taesung and Hwang, Seung-won
Methods
• f3(r, c) = |H_within(r, c)| / |H_within(c)|: This is a variation of f2 that considers only noun phrase pairs shared at least once by relations in c.
Methods
• f4(r, c) = |H_within(r, c)| / |H_shared(c)|: This is a variation of f2 that considers only noun phrase pairs shared at least once by any pair of relations.
Methods
where H_within(r, c) = ∪_{r*∈c} H(r) ∩ H(r, r*), the intersection, considering translation, of H(r) and the noun phrase pairs shared at least once by relations in c; H_within(c) = ∪_{r*∈c} H(r*, c − {r*}); and H_shared(c) = ∪_{r*∈R_E∪R_C} H(r*, c), the noun phrase pairs shared at least once by any relations.
phrase pair is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Setiawan, Hendra and Zhou, Bowen and Xiang, Bing and Shen, Libin
Related Work
The UOM views reordering as a process of generating (b, o) in a left-to-right fashion, where b is the current phrase pair and o is the orientation of b with respect to the previously generated phrase pair b′.
Related Work
Tillmann and Zhang (2007) proposed a Bigram Orientation Model (BOM) to include both phrase pairs b and b′ in the model.
Two-Neighbor Orientation Model
We represent a chunk as a source and target phrase pair (f_{j′}^{j}, e_{i′}^{i}), where the subscript and the superscript indicate the
phrase pair is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
He, Xiaodong and Deng, Li
Abstract
To prevent overfitting, the statistics of phrase pairs from a particular sentence were excluded from the phrase table when aligning that sentence.
Abstract
A set of phrase pairs is extracted from the word-aligned parallel corpus according to phrase extraction rules (Koehn et al., 2003).
Abstract
We use the word translation table from IBM Model 1 (Brown et al., 1993) and compute the sum over all possible word alignments within a phrase pair without normalizing for length (Quirk et al., 2005).
phrase pair is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Chang, Yin-Wen and Rush, Alexander M. and DeNero, John and Collins, Michael
Experiments
Table 2: Alignment accuracy and phrase pair extraction accuracy for directional and bidirectional models.
Experiments
AER is alignment error rate and F1 is the phrase pair extraction F1 score.
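For reference, the extraction F1 combines precision P and recall R of the extracted phrase pairs against the gold set in the usual way:

F1 = 2 · P · R / (P + R)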
phrase pair is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Feng, Yang and Cohn, Trevor
Abstract
Most modern machine translation systems use phrase pairs as translation units, allowing for accurate modelling of phrase-internal translation and reordering.
Abstract
This mechanism implicitly supports not only traditional phrase pairs, but also gapping phrases which are nonconsecutive in the source.
Experiments
In contrast, our model better aligns the function words, such that many more useful phrase pairs can be extracted, i.e., ⟨…, 'm⟩, ⟨…, looking for⟩, ⟨…, grill-type⟩, and their combinations with neighbouring phrase pairs.
phrase pair is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Neubig, Graham and Watanabe, Taro and Mori, Shinsuke and Kawahara, Tatsuya
Alignment Methods
Phrasal ITGs are ITGs that allow for non-terminals that can emit phrase pairs with multiple elements on both the source and target sides.
LookAhead Biparsing
This probability is the combination of the generative probability of each phrase pair P_t(e_i^j, f_k^l) as well as the sum of the probabilities over all shorter spans in straight and inverted order.
Substring Prior Probabilities
While the Bayesian phrasal ITG framework uses the previously mentioned phrase distribution P_t during search, it also allows for the definition of a phrase pair prior probability P_prior(e_i^j, f_k^l), which can efficiently seed the search process with a bias towards phrase pairs that satisfy certain properties.
phrase pair is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Liu, Yang and Lü, Yajuan and Liu, Qun
Experiments
We extracted phrase pairs from the training data to investigate how many phrase pairs can be captured by lexicalized tree-to-tree rules that contain only terminals.
Experiments
We set the maximal length of phrase pairs to 10.
Related Work
While this method works for 1-best tree pairs, it cannot be applied to packed forest pairs because it is impractical to enumerate all tree pairs over a phrase pair.
phrase pair is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Green, Spence and DeNero, John
Discussion of Translation Results
• Baseline system translation output: 44.6% • Phrase pairs matching source n-grams: 67.8%
Inference during Translation Decoding
• Extend a hypothesis with a new phrase pair • Recombine hypotheses with identical states
Related Work
Och (1999) showed a method for inducing bilingual word classes that placed each phrase pair into a two-dimensional equivalence class.
phrase pair is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Razmara, Majid and Foster, George and Sankaran, Baskaran and Sarkar, Anoop
Baselines
Whenever a phrase pair does not appear in a component phrase table, we set the corresponding p_m(e|f) to a small epsilon value.
Baselines
Whenever a phrase pair does not appear in a component phrase table, we set the corresponding p_m(e|f) to 0; pairs in p(e, f) that do not appear in at least one component table are discarded.
Ensemble Decoding
probability for each phrase pair (e, f) is given by:
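A minimal sketch of the linear-mixture baseline described above, with a small epsilon standing in for pairs missing from a component table:

EPS = 1e-7

def mixture_prob(pair, tables, lambdas):
    # tables: list of dicts (e, f) -> p_m(e|f); lambdas: mixture weights
    return sum(lam * t.get(pair, EPS) for t, lam in zip(tables, lambdas))

t1 = {("house", "maison"): 0.7}
t2 = {}                              # pair unseen in the second component
print(mixture_prob(("house", "maison"), [t1, t2], [0.5, 0.5]))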
phrase pair is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Shi, Xing and Knight, Kevin and Ji, Heng
Training
Second, we extract phoneme phrase pairs consistent with these alignments.
Training
From the example above, we pull out phrase pairs like:
Training
We add these phrase pairs to FST B, and call this the phoneme-phrase-based model.
phrase pair is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Eidelman, Vladimir and Boyd-Graber, Jordan and Resnik, Philip
Model Description
Lexical Weighting Lexical weighting features estimate the quality of a phrase pair by combining the lexical translation probabilities of the words in the phrase (Koehn et al., 2003).
Model Description
Phrase pair probabilities p(e|f) are computed from these as described in Koehn et al.
Model Description
For example, if topic k is dominant in T, p_k(e|f) may be quite large, but if p(k|V) is very small, then we should steer away from this phrase pair and select a competing phrase pair which may have a lower probability in T, but which is more relevant to the test sentence at hand.
phrase pair is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: