Abstract | We combine the strengths of Bayesian modeling and synchronous grammar in unsupervised learning of basic translation phrase pairs.
Abstract | The structured space of a synchronous grammar is a natural fit for phrase pair probability estimation, though the search space can be prohibitively large. |
Introduction | Computational complexity arises from the exponentially large number of decompositions of a sentence pair into phrase pairs; overfitting is a problem because as EM attempts to maximize the likelihood of its training data, it prefers to directly explain a sentence pair with a single phrase pair.
Phrasal Inversion Transduction Grammar | Our ITG has two nonterminals: X and C, where X represents compositional phrase pairs that can have recursive structures and C is the preterminal over terminal phrase pairs.
Phrasal Inversion Transduction Grammar | They split the left-hand side constituent which represents a phrase pair into two smaller phrase pairs on the right-hand side and order them according to one of the two possible permutations. |
Phrasal Inversion Transduction Grammar | where Σ_{e/f} P(e/f) = 1, i.e., P(e/f) is a multinomial distribution over phrase pairs.
Variational Bayes for ITG | A sparse prior over a multinomial distribution such as the distribution of phrase pairs may bias the estimator toward skewed distributions that generalize better. |
Variational Bayes for ITG | The other is the distribution of the phrase pairs.
Variational Bayes for ITG | By adjusting α_C to a very small number, we hope to place more posterior mass on parsimonious solutions with fewer but more confident and general phrase pairs.
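The sparse-prior idea in the excerpts above can be illustrated with a minimal sketch (ours, not the papers' code; `vb_weights` and the hand-rolled `digamma` are our names). Under mean-field variational Bayes, a multinomial with a symmetric Dirichlet(α) prior replaces each normalized expected count c_i/Σc with exp(ψ(c_i + α) − ψ(Σc + Kα)), so a small α (the α_C above) pushes mass away from weakly supported phrase pairs:

```python
import math

def digamma(x):
    # Hand-rolled digamma: a recurrence pushes x above 6, then an
    # asymptotic expansion gives sufficient accuracy for this sketch.
    r = 0.0
    while x < 6.0:
        r -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    return r + math.log(x) - 0.5 / x - f * (1.0 / 12 - f * (1.0 / 120 - f / 252))

def vb_weights(expected_counts, alpha):
    """Mean-field VB weight for each outcome of a multinomial with a
    symmetric Dirichlet(alpha) prior: exp(psi(c_i + alpha) - psi(total))."""
    total = digamma(sum(expected_counts) + alpha * len(expected_counts))
    return [math.exp(digamma(c + alpha) - total) for c in expected_counts]

# A small alpha penalizes rarely used phrase pairs much more heavily
# than plain relative-frequency normalization would.
for a in (1.0, 0.01):
    print(a, [round(w, 4) for w in vb_weights([10.0, 1.0, 0.1], a)])
```

The weights deliberately sum to less than one; the missing mass is what discourages the estimator from keeping many low-count phrase pairs.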
Bilingually-constrained Recursive Auto-encoders | From this fact we can infer that if a model can learn the same embedding for any phrase pair sharing the same meaning, the learned embedding must encode the semantics of the phrases, and such a model is exactly what we desire.
Bilingually-constrained Recursive Auto-encoders | For a phrase pair (s, t), two kinds of errors are involved:
Bilingually-constrained Recursive Auto-encoders | For the phrase pair (s, t), the joint error is:
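The joint error referred to above can be written out; a plausible reconstruction (ours; the weight α and the exact term names are assumptions, following the usual bilingually-constrained recursive auto-encoder setup) combines each side's reconstruction error with a cross-lingual semantic error:

```latex
% Joint error for a phrase pair (s, t); \alpha trades off
% reconstruction fidelity against cross-lingual semantic agreement.
E(s, t; \theta) = \alpha \, E_{\mathrm{rec}}(s, t; \theta)
                + (1 - \alpha) \, E_{\mathrm{sem}}(s, t; \theta)
```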
Experiments | To obtain high-quality bilingual phrase pairs to train our BRAE model, we perform forced decoding for the bilingual training sentences and collect the phrase pairs used. |
Experiments | After removing duplicates, 1.12M bilingual phrase pairs (lengths ranging from 1 to 7) remain.
Experiments | These algorithms are based on corpus statistics including co-occurrence statistics, phrase pair usage and composition information. |
Introduction | In decoding with phrasal semantic similarities, we apply the semantic similarities of the phrase pairs as new features during decoding to guide translation can- |
Input Features for DNN Feature Learning | Following (Maskey and Zhou, 2012), we use the following 4 phrase features of each phrase pair (Koehn et al., 2003) in the phrase table as the first type of input features: bidirectional phrase translation probability (P(e|f) and P(f|e)), bidirectional lexical weighting (Lex(e|f) and Lex(f|e)),
Input Features for DNN Feature Learning | 3.2 Phrase pair similarity |
Input Features for DNN Feature Learning | (2004) proposed a way of using term-weight-based models in a vector space as additional evidence for phrase pair translation quality.
Introduction | First, the original input features for DBN feature learning are too simple: only the limited 4 phrase features of each phrase pair, such as bidirectional phrase translation probability and bidirectional lexical weighting (Koehn et al., 2003), which are a bottleneck for learning effective feature representations.
Introduction | To address the first shortcoming, we adapt and extend some simple but effective phrase features as the input features for new DNN feature learning; these features, such as phrase pair similarity (Zhao et al., 2004), phrase frequency, phrase length (Hopkins and May, 2011), and phrase generative probability (Foster et al., 2010), have been shown to yield significant improvements for SMT, and they also show further improvement for new phrase feature learning in our experiments.
Semi-Supervised Deep Auto-encoder Features Learning for SMT | To speed up the pre-training, we subdivide the full set of phrase pairs (with features X) in the phrase table into small mini-batches, each containing 100 cases, and update the weights after each mini-batch.
Semi-Supervised Deep Auto-encoder Features Learning for SMT | Each layer is greedily pre-trained for 50 epochs over the entire set of phrase pairs.
Semi-Supervised Deep Auto-encoder Features Learning for SMT | After the pre-training, for each phrase pair in the phrase table, we generate the DBN features (Maskey and Zhou, 2012) by passing the original phrase features X through the DBN using forward computation. |
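The pre-training schedule described in the three excerpts above (mini-batches of 100 cases, 50 greedy epochs per layer, then a forward pass to produce features) can be sketched as follows; `minibatches`, `greedy_pretrain`, and the `(train_step, forward)` layer interface are our hypothetical names, not from the papers, and the actual RBM/auto-encoder weight update is left abstract:

```python
import random

def minibatches(items, size=100, shuffle=True, seed=0):
    """Split the phrase-table feature rows into mini-batches of `size`
    (the excerpt above uses 100 cases per batch)."""
    idx = list(range(len(items)))
    if shuffle:
        random.Random(seed).shuffle(idx)  # same order each epoch in this sketch
    for start in range(0, len(idx), size):
        yield [items[i] for i in idx[start:start + size]]

def greedy_pretrain(data, layer_trainers, epochs=50):
    """Greedy layer-wise pre-training: each layer trains for `epochs`
    passes over all mini-batches (one weight update per mini-batch),
    then its forward output becomes the next layer's input."""
    reps = data
    for train_step, forward in layer_trainers:
        for _ in range(epochs):
            for batch in minibatches(reps):
                train_step(batch)          # e.g. one CD-1 update on the batch
        reps = [forward(x) for x in reps]  # forward computation to next layer
    return reps
```

After the last layer, `reps` plays the role of the DBN features passed through by forward computation.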
Abstract | A semi-supervised training approach is proposed to train the parameters, and the phrase pair embedding is explored to model translation confidence directly. |
Introduction | (2013) use recursive auto-encoders to make full use of entire merged phrase pairs, going beyond the boundary words used by a maximum entropy classifier (Xiong et al., 2006).
Introduction | To model the translation confidence for a translation phrase pair, we initialize the phrase pair embedding by leveraging sparse features and a recurrent neural network.
Introduction | The sparse features are phrase pairs in the translation table, and the recurrent neural network is used to learn a smoothed translation score from the source- and target-side information.
Our Model | We then check whether translation candidates can be found in the translation table for each span, together with the phrase pair embedding and recurrent input vector (global features). |
Our Model | We extract phrase pairs using the conventional method (Och and Ney, 2004). |
Our Model | • Representations of phrase pairs are automatically learnt to optimize the translation performance, while features used in conventional models are handcrafted.
Related Work | Given the representations of the smaller phrase pairs, recursive auto-encoder can generate the representation of the parent phrase pair with a reordering confidence score. |
Event Causality Extraction Method | An event causality candidate is given a causality score, which is the SVM score (distance from the hyperplane) normalized to [0, 1] by the sigmoid function. Each event causality candidate may be given multiple original sentences, since a phrase pair can appear in multiple sentences, in which case it is given more than one SVM score.
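The sigmoid normalization mentioned above is straightforward to sketch; the max-aggregation in `candidate_score` over a candidate's multiple SVM scores is our assumption for illustration, not necessarily the paper's rule:

```python
import math

def causality_score(svm_margin):
    """Normalize an SVM decision value (signed distance from the
    hyperplane) to [0, 1] with the logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-svm_margin))

def candidate_score(svm_margins):
    """A candidate seen in several sentences gets several SVM scores;
    taking the maximum is one plausible way to combine them."""
    return max(causality_score(m) for m in svm_margins)
```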
Experiments | These three datasets have no overlap in terms of phrase pairs.
Experiments | We observed that CEAsup and CEAWS performed poorly and tended to favor event causality candidates whose phrase pairs were highly relevant to each other but described contrasts of events rather than event causality (e.g., build a slow muscle and build a fast muscle), probably because their phrase pairs described two events that often happen in parallel but are not event causality (e.g., reduce the intake of energy and increase the energy consumption) in the highly ranked event causality candidates of Csuns and Cssup.
Future Scenario Generation Method | A naive approach chains two phrase pairs by exact matching. |
Future Scenario Generation Method | Scenarios (scs) generated by chaining causally-compatible phrase pairs are scored by Score(sc), which embodies our assumption that an acceptable scenario consists of plausible event causality pairs: |
Introduction | Annotators regarded as event causality only phrase pairs that were interpretable as event causality without contexts (i.e., self-contained). |
Introduction | Phrase pair extraction, the key step to discover translation knowledge, heavily relies on the scale of training data. |
Introduction | Typically, the more parallel corpora used, the more phrase pairs and the more accurate parameters will be learned, which is obviously beneficial to improving translation performance.
Introduction | First, phrase pairs are extracted from each domain without interfering with other domains. |
Phrase Pair Extraction with Unsupervised Phrasal ITGs | More formally, P(⟨e, f⟩; θx, θt) is the probability of a phrase pair ⟨e, f⟩, which is parameterized by a phrase pair distribution θt and a symbol distribution θx.
Phrase Pair Extraction with Unsupervised Phrasal ITGs | where d is the discount parameter, s is the strength parameter, and Pdac is a prior probability which acts as a fallback probability when a phrase pair is not in the model.
Phrase Pair Extraction with Unsupervised Phrasal ITGs | Under this model, the probability for a phrase pair found in a bilingual corpus (E, F) can be represented by the following equation using the Chinese restaurant process (Teh, 2006): |
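A hedged sketch of the Chinese-restaurant-process predictive probability referenced above, using the standard Pitman-Yor form with discount d, strength s, and a fallback prior; the function name and the dict-based customer/table bookkeeping are ours:

```python
def pyp_prob(x, counts, tables, d, s, p_base):
    """Predictive probability of phrase pair x under a Pitman-Yor
    Chinese restaurant process.  counts[x] = customers eating dish x,
    tables[x] = tables serving x, p_base(x) = fallback prior used when
    x is new (and to smooth seen pairs)."""
    n = sum(counts.values())          # total customers
    t = sum(tables.values())          # total tables
    c_x = counts.get(x, 0)
    t_x = tables.get(x, 0)
    return (max(c_x - d * t_x, 0.0) + (s + d * t) * p_base(x)) / (n + s)
```

With d = 0 this reduces to the Dirichlet-process CRP; the discount d > 0 gives the power-law behavior often preferred for phrase pair distributions.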
Related Work | Then all the phrase pairs and features are tuned together with different weights during decoding.
Related Work | Classification-based methods must at least add an explicit label to indicate which domain the current phrase pair comes from. |
Related Work | (2012) employed a feature-based approach, in which phrase pairs are enriched |
Introduction | The main idea of our method is to divide the phrase-level translation rule induction into two steps: bilingual lexicon induction and phrase pair induction. |
Introduction | Since many researchers have studied bilingual lexicon induction, in this paper we mainly concentrate on phrase pair induction given a probabilistic bilingual lexicon and two large in-domain monolingual datasets (source and target language).
Introduction | In addition, we will further introduce how to refine the induced phrase pairs and estimate their parameters, such as the four standard translation features and the phrase reordering feature used in conventional phrase-based models (Koehn et al., 2007).
What Can We Learn with Phrase Pair Induction? | Readers may wonder: if phrase pair induction is performed using only a bilingual lexicon and monolingual data, what new translation knowledge can be learned?
What Can We Learn with Phrase Pair Induction? | In contrast, phrase pair induction can make up for this deficiency to some extent. |
What Can We Learn with Phrase Pair Induction? | From the phrase pairs induced with our method, we conducted a deep analysis and found that we can learn three kinds of new translation knowledge: 1) word reordering within a phrase pair; 2) idioms; and 3) unknown word translations.
Introduction | However, these phrase pairs with gaps do not capture structure reordering as do Hiero rules with nonterminal mappings. |
Introduction | From this Hiero derivation, we have a segmentation of the sentence pairs into phrase pairs according to the word alignments, as shown on the left side of Figure 1. |
Introduction | Ordering these phrase pairs according to the word sequence on the target side, shown on the right side of Figure 1, we have a phrase-based translation path consisting of four phrase pairs: ⟨je, …⟩, ⟨ne, …⟩, …
Phrasal-Hiero Model | Training: Represent each rule as a sequence of phrase pairs and nonterminals. |
Phrasal-Hiero Model | Decoding: Use the rules’ sequences of phrase pairs and nonterminals to find the corresponding phrase-based path of a Hiero derivation and calculate its feature scores. |
Phrasal-Hiero Model | 2.1 Map Rule to a Sequence of Phrase Pairs and Nonterminals
Corpus Data and Baseline SMT | We used this corpus to extract translation phrase pairs from bidirectional IBM Model 4 word alignment (Och and Ney, 2003) based on the heuristic approach of (Koehn et al., 2003). |
Corpus Data and Baseline SMT | Our phrase-based decoder is similar to Moses (Koehn et al., 2007) and uses the phrase pairs and target LM to perform beam search stack decoding based on a standard log-linear model, the parameters of which were tuned with MERT (Och, 2003) on a held-out development set (3,534 sentence pairs, 45K words) using BLEU as the tuning metric. |
Corpus Data and Baseline SMT | All other sentence pairs are assigned to a “background conversation”, which signals the absence of the topic similarity feature for phrase pairs derived from these instances. |
Incremental Topic-Based Adaptation | Our approach is based on the premise that biasing the translation model to favor phrase pairs originating in training conversations that are contextually similar to the current conversation will lead to better translation quality. |
Incremental Topic-Based Adaptation | Additionally, the SMT phrase table tracks, for each phrase pair, the set of parent training conversations (including the “background conversation”) from which that phrase pair originated. |
Incremental Topic-Based Adaptation | Using this information, the decoder evaluates, for each candidate phrase pair |
Introduction | As the conversation progresses, however, the gradual accumulation of contextual information can be used to infer the topic(s) of discussion, and to deploy contextually appropriate translation phrase pairs.
Introduction | Translation phrase pairs that originate in training conversations whose topic distribution is similar to that of the current conversation are given preference through a single similarity feature, which augments the standard phrase-based SMT log-linear model. |
Relation to Prior Work | Rather, we compute a similarity between the current conversation history and each of the training conversations, and use this measure to dynamically score the relevance of candidate translation phrase pairs during decoding. |
Abstract | This profile might, for instance, be a vector with a dimensionality equal to the number of training subcorpora; each entry in the vector reflects the contribution of a particular subcorpus to all the phrase pairs that can be extracted from the dev set. |
Abstract | Then, for each phrase pair extracted from the training data, we create a vector with features defined in the same way, and calculate its similarity score with the vector representing the dev set. |
Abstract | Thus, we obtain a decoding feature whose value represents the phrase pair’s closeness to the dev. |
Introduction | typically use a rich feature set to decide on weights for the training data, at the sentence or phrase pair level. |
Introduction | As in (Foster et al., 2010), this approach works at the level of phrase pairs.
Introduction | Instead of using word-based features and a computationally expensive training procedure, we capture the distributional properties of each phrase pair directly, representing it as a vector in a space which also contains a representation of the dev set. |
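The similarity between a phrase pair's subcorpus-distribution vector and the dev-set vector, as described in the excerpts above, can be illustrated with plain cosine similarity; the three-subcorpus profiles below are invented toy data, not from the paper:

```python
import math

def cosine(u, v):
    """Cosine similarity between two subcorpus-distribution vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical 3-subcorpus profiles: the dev-set vector and two phrase
# pairs, one drawn mostly from the same subcorpus as the dev set.
dev = [0.7, 0.2, 0.1]
in_domain = [0.6, 0.3, 0.1]
out_of_domain = [0.05, 0.15, 0.8]
assert cosine(dev, in_domain) > cosine(dev, out_of_domain)
```

The resulting score would then be used as a single decoding feature expressing the phrase pair's closeness to the dev set.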
A Generic Phrase Training Procedure | We first train word alignment models and will use them to evaluate the goodness of a phrase and a phrase pair.
Abstract | Multiple data-driven feature functions are proposed to capture the quality and confidence of phrases and phrase pairs . |
Introduction | The first problem is referred to as phrase pair extraction, which identifies phrase pairs that are supposed to be translations of each other. |
Introduction | The most widely used approach derives phrase pairs from the word alignment matrix (Och and Ney, 2003; Koehn et al., 2003).
Introduction | Other methods do not depend only on word alignments, such as directly modeling phrase alignment in a joint generative way (Marcu and Wong, 2002), pursuing an information extraction perspective (Venugopal et al., 2003), or augmenting with model-based phrase pair posteriors (Deng and Byrne, 2005).
Abstract | On top of the pruning framework, we also propose a discriminative ITG alignment model using hierarchical phrase pairs, which improves both F-score and Bleu score over the baseline alignment system of GIZA++.
Introduction | On top of the discriminative pruning method, we also propose a discriminative ITG alignment system using hierarchical phrase pairs.
The DITG Models | Another is DITG with hierarchical phrase pairs (henceforth HP-DITG), which relaxes the 1-to-1 constraint by adopting hierarchical phrase pairs in Chiang (2007).
The DITG Models | 6.2 DITG with Hierarchical Phrase Pairs |
The DITG Models | Wu (1997) proposes a bilingual segmentation grammar extending the terminal rules by including phrase pairs.
The DPDI Framework | #links_incon / (flen + elen), where #links_incon is the number of links which are inconsistent with the phrase pair according to some simpler alignment model (e.g.
The DPDI Framework | ³An inconsistent link connects a word within the phrase pair to some word outside the phrase pair.
Clustering phrase pairs directly using the K-means algorithm | Using multiple word clusterings simultaneously, each based on a different number of classes, could turn this global, hard tradeoff into a local, soft one, informed by the number of phrase pair instances available for a given granularity. |
Clustering phrase pairs directly using the K-means algorithm | We thus propose to represent each phrase pair instance (including its bilingual one-word contexts) as a feature vector, i.e., a point of a vector space.
Clustering phrase pairs directly using the K-means algorithm | We then use these data points to partition the space into clusters, and subsequently assign each phrase pair instance the cluster of its corresponding feature vector as its label.
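A minimal, dependency-free sketch of the clustering step described above (Lloyd-style k-means over phrase-pair feature vectors; in practice one would use an optimized library implementation rather than this toy version):

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain k-means: returns one cluster label per point, which would
    serve as the phrase pair instance's label as described above."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        for i, p in enumerate(points):
            labels[i] = min(range(k),
                            key=lambda c: sum((a - b) ** 2
                                              for a, b in zip(p, centers[c])))
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [points[i] for i in range(len(points)) if labels[i] == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```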
Hard rule labeling from word classes | (2003) to provide us with a set of phrase pairs for each sentence pair in the training corpus, annotated with their respective start and end positions in the source and target sentences. |
Hard rule labeling from word classes | We convert each extracted phrase pair, represented by its source span (i, j) and target span (k, l), into an initial rule
Hard rule labeling from word classes | Then (depending on the extracted phrase pairs), the resulting initial rules could be:
Introduction | Zollmann and Venugopal (2006) directly extend the rule extraction procedure from Chiang (2005) to heuristically label any phrase pair based on target language parse trees. |
PSCFG-based translation | Chiang (2005) learns a single-nonterminal PSCFG from a bilingual corpus by first identifying initial phrase pairs using the technique from Koehn et al. |
PSCFG-based translation | (2003), and then performing a generalization operation to generate phrase pairs with gaps, which can be viewed as PSCFG rules with generic ‘X’ nonterminal left-hand-sides and substitution sites. |
Flat ITG Model | (a) If x = TERM, generate a phrase pair from the phrase table Pt(⟨e, f⟩; θt).
Flat ITG Model | (b) If x = REG, a regular ITG rule, generate phrase pairs ⟨e1, f1⟩ and ⟨e2, f2⟩ from Pflat, and concatenate them into a single phrase pair ⟨e1e2, f1f2⟩.
Flat ITG Model | While the previous formulation can be used as-is in maximum likelihood training, this leads to a degenerate solution where every sentence is memorized as a single phrase pair . |
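The generative story in steps (a)/(b) above can be sketched as a tiny sampler; `gen_flat_itg`, the fixed 0.5 straight/inverted split, and the list-based phrase table are our simplifications of the model's actual parameterized distributions:

```python
import random

def gen_flat_itg(p_term, phrase_table, rng):
    """Sample a phrase pair from the flat ITG generative story: with
    probability p_term emit a terminal pair from the phrase table;
    otherwise recurse twice and concatenate, keeping the target side
    straight (regular rule) or reversing it (inverted rule)."""
    if rng.random() < p_term:
        return rng.choice(phrase_table)
    e1, f1 = gen_flat_itg(p_term, phrase_table, rng)
    e2, f2 = gen_flat_itg(p_term, phrase_table, rng)
    if rng.random() < 0.5:
        return (e1 + " " + e2, f1 + " " + f2)  # straight order
    return (e1 + " " + e2, f2 + " " + f1)      # inverted order
```

Under maximum likelihood such a model can still collapse to memorizing each sentence as one giant terminal pair, which is exactly the degenerate solution noted above.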
Introduction | The training of translation models for phrase-based statistical machine translation (SMT) systems (Koehn et al., 2003) takes unaligned bilingual training data as input, and outputs a scored table of phrase pairs . |
Introduction | The model is similar to previously proposed phrase alignment models based on inversion transduction grammars (ITGs) (Cherry and Lin, 2007; Zhang et al., 2008; Blunsom et al., 2009), with one important change: ITG symbols and phrase pairs are generated in the opposite order. |
Introduction | In traditional ITG models, the branches of a biparse tree are generated from a nonterminal distribution, and each leaf is generated by a word or phrase pair distribution. |
Alignment | We refer to phrase pairs that occur in only one sentence as singleton phrase pairs.
Alignment | For these sentences, the decoder needs the singleton phrase pairs to produce an alignment. |
Alignment | Standard leaving-one-out assigns a fixed probability α to singleton phrase pairs.
Introduction | The phrase translation table, which contains the bilingual phrase pairs and the corresponding translation probabilities, is one of the main components of an SMT system. |
Phrase Model Training | The translation probability of a phrase pair (f, e) is estimated as
Phrase Model Training | where C_FA(f, e) is the count of the phrase pair (f, e) in the phrase-aligned training data.
Phrase Model Training | We will refer to this model as the count model, as we simply count the number of occurrences of a phrase pair.
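The count model above is plain relative-frequency estimation, p(f|e) = C_FA(f, e) / Σ_f′ C_FA(f′, e); a sketch (the function name and data layout are ours):

```python
from collections import Counter

def count_model(phrase_aligned_pairs):
    """Relative-frequency ('count model') phrase translation probability
    p(f | e) = C_FA(f, e) / sum over f' of C_FA(f', e), estimated from a
    list of (f, e) phrase pair occurrences in phrase-aligned data."""
    pair_counts = Counter(phrase_aligned_pairs)            # C_FA(f, e)
    e_counts = Counter(e for _, e in phrase_aligned_pairs)  # marginal C(e)
    return {(f, e): c / e_counts[e] for (f, e), c in pair_counts.items()}

data = [("maison", "house"), ("domicile", "house"), ("maison", "house")]
print(count_model(data))
```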
Experiments | About 2.6M hierarchical phrase pairs extracted from the training corpus were used on the test set. |
Experiments | This improvement resulted from the mixture of hierarchical phrase pairs and tree-to-string rules. |
Experiments | To produce the result, the joint decoder made use of 8,114 hierarchical phrase pairs learned from training data, 6,800 glue rules connecting partial translations monotonically, and 16,554 tree-to-string rules. |
Joint Decoding | Decoders that use rules with flat structures (e.g., phrase pairs) usually generate target sentences from left to right, while those using rules with hierarchical structures (e.g., SCFG rules) often run in a bottom-up style.
Joint Decoding | (2006) develop a bottom-up decoder for BTG (Wu, 1997) that uses only phrase pairs . |
Joint Decoding | Hierarchical phrase pairs are used for translating smaller units and tree-to-string rules for bigger ones. |
Generation & Propagation | In order to utilize these newly acquired phrase pairs , we need to compute their relevant features. |
Generation & Propagation | The phrase pairs have four log-probability features: two likelihood features and two lexical weighting features.
Generation & Propagation | In addition, we use a sophisticated lexicalized hierarchical reordering model (HRM) (Galley and Manning, 2008) with five features for each phrase pair . |
Related Work | The operational scope of their approach is limited in that they assume a scenario where unknown phrase pairs are provided (thereby sidestepping the issue of translation candidate generation for completely unknown phrases), and what remains is the estimation of phrasal probabilities. |
Related Work | In our case, we obtain the phrase pairs from the graph structure (and therefore indirectly from the monolingual data) and a separate generation step, which plays an important role in good performance of the method. |
Comparative Study | We count how often each extracted phrase pair is found with each of the three reordering types. |
Comparative Study | The bilingual sequence of phrase pairs will be extracted using the same strategy in Figure 4. |
Comparative Study | Suppose the search state is now extended with a new phrase pair (f̃, ẽ).
Tagging-style Reordering Model | During the search, a sentence pair (f_1^J, e_1^I) will be formally split into a segmentation s_1^K which consists of K phrase pairs.
Tagging-style Reordering Model | Suppose the search state is now extended with a new phrase pair (f̃_k, ẽ_k): f̃_k := f_{b_k} …
Tagging-style Reordering Model | We also have the word alignment within the new phrase pair, which is stored during the phrase extraction process.
Introduction | Moreover, syntax-based approaches often suffer from the rule coverage problem since syntactic constraints rule out a large portion of non-syntactic phrase pairs, which might help decoders generalize well to unseen data (Marcu et al., 2006).
Introduction | The basic unit of translation in our model is the string-to-dependency phrase pair, which consists of a phrase on the source side and a dependency structure on the target side.
Introduction | The algorithm generates well-formed dependency structures for partial translations left-to-right using string-to-dependency phrase pairs . |
Conclusion | The resulting predictions improve the precision and recall of both alignment links and extracted phrase pairs in Chinese-English experiments.
Experimental Results | In this way, we can show that the bidirectional model improves alignment quality and enables the extraction of more correct phrase pairs . |
Experimental Results | Table 3: Phrase pair extraction accuracy for phrase pairs up to length 5. |
Experimental Results | Possible links are both included and excluded from phrase pairs during extraction, as in DeNero and Klein (2010). |
Translation Model Architecture | For phrase pairs which are not found, C(ẽ, f̃) and C(f̃) are initially set to 0.
Translation Model Architecture | Note that C(ẽ) is potentially incorrect at this point, since a phrase pair not being found does not entail that C(ẽ) is 0.
Translation Model Architecture | ⁴We prune the tables to the most frequent 50 phrase pairs per source phrase before combining them, since calculating the features for all phrase pairs of very common source phrases causes a significant slowdown.
Introduction | We use two complementary paraphrase models: an association model based on aligned phrase pairs extracted from a monolingual parallel corpus, and a vector space model, which represents each utterance as a vector and learns a similarity score between them. |
Paraphrasing | We define associations in x and c primarily by looking up phrase pairs in a phrase table constructed using the PARALEX corpus (Fader et al., 2013).
Paraphrasing | We use the word alignments to construct a phrase table by applying the consistent phrase pair heuristic (Och and Ney, 2004) to all 5-grams. |
Paraphrasing | This results in a phrase table with approximately 1.3 million phrase pairs . |
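The consistent phrase pair heuristic mentioned above can be sketched as follows; this simplified version checks only boundary-crossing links and skips the usual expansion over unaligned boundary words, so it is an illustration of the idea rather than a full Och-and-Ney-style extractor:

```python
def extract_phrase_pairs(n_src, n_tgt, alignment, max_len=5):
    """Consistent-phrase-pair heuristic: a source span and target span
    form a pair iff the pair contains at least one alignment link and
    no link crosses the span boundary.  `alignment` is a list of
    (src_index, tgt_index) links."""
    pairs = []
    for i1 in range(n_src):
        for i2 in range(i1, min(i1 + max_len, n_src)):
            # Target positions linked to words in the source span.
            ts = [t for s, t in alignment if i1 <= s <= i2]
            if not ts:
                continue
            j1, j2 = min(ts), max(ts)
            # Consistency: every link touching [j1, j2] stays in [i1, i2].
            if all(i1 <= s <= i2 for s, t in alignment if j1 <= t <= j2):
                pairs.append(((i1, i2), (j1, j2)))
    return pairs
```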
Experimental Results | One is a re-implementation of Hiero (Chiang, 2005), and the other is a hybrid syntax-based tree-to-string system (Zhao and Al-onaizan, 2008), where normal phrase pairs and Hiero rules are used as a backoff for tree-to-string rules.
Experimental Results | Next we extract phrase pairs, Hiero rules and tree-to-string rules from the original word alignment and the improved word alignment, and tune all the feature weights on the tuning set.
Integrating Empty Categories in Machine Translation | For example, for a hierarchical MT system, some phrase pairs and Hiero (Chiang, 2005) rules can be extracted with recovered *pro* and *PRO* at the Chinese side. |
Integrating Empty Categories in Machine Translation | For each phrase pair, Hiero rule or tree-to-string rule in the MT system, a binary feature f_k fires if there exists a *pro* on the source side and it aligns to one of its most frequently aligned target words found in the training corpus.
Integrating Empty Categories in Machine Translation | The motivation for such sparse features is to reward those phrase pairs and rules that have highly confident lexical pairs specifically related to ECs, and penalize those that do not.
Experiments | We divide the rules into three types: phrase rules, which only contain terminals and are the same as the phrase pairs in phrase-based system; monotone rules, which contain non-terminals and produce monotone translations; reordering rules, which also contain non-terminals but change the order of translations. |
Related Work | (2010) introduce a topic model for filtering topic-mismatched phrase pairs.
Related Work | Similarly, each phrase pair is also assigned one specific topic.
Related Work | A phrase pair will be discarded if its topic mismatches the document topic. |
Topic Models for Machine Translation | Lexical Weighting In phrase-based SMT, lexical weighting features estimate the phrase pair quality by combining lexical translation probabilities of words in a phrase (Koehn et al., 2003). |
Topic Models for Machine Translation | The phrase pair probabilities p_w(e|f) are the normalized product of lexical probabilities of the aligned word pairs within that phrase pair (Koehn et al., 2003).
Topic Models for Machine Translation | from which we can compute the phrase pair probabilities p_w(e|f; k) by multiplying the lexical probabilities and normalizing as in Koehn et al.
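The lexical weighting computation referenced in the excerpts above can be sketched as follows, in the Koehn et al. (2003) style: for each target word, average w(e|f) over its aligned source words and multiply across the phrase. The alignment is given as (e-index, f-index) pairs, and handling unaligned words via a `None` (NULL) key is our simplification:

```python
def lex_weight(e_words, f_words, alignment, w):
    """Lexical weighting p_w(e | f, a): product over target words of the
    average lexical translation probability of their aligned source
    words; unaligned target words fall back to a NULL entry."""
    p = 1.0
    for i, e in enumerate(e_words):
        links = [j for i2, j in alignment if i2 == i]
        if links:
            p *= sum(w.get((e, f_words[j]), 0.0) for j in links) / len(links)
        else:
            p *= w.get((e, None), 0.0)  # NULL alignment
    return p
```

A topic-conditioned variant would simply swap in topic-specific lexical probabilities w(e|f; k) and renormalize, as the excerpt above describes.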
Experiments | In this way, the number of intersected English phrases in the CE corpus and the ES corpus increases, which enables the Chinese-Spanish translation model induced using the triangulation method to cover more phrase pairs.
Pivot Methods for Phrase-based SMT | (2003), there are two important elements in the lexical weight: word alignment information a in a phrase pair (s̄, t̄) and lexical translation probability w(s
Pivot Methods for Phrase-based SMT | Let a1 and a2 represent the word alignment information inside the phrase pairs (s̄, p̄) and (p̄, t̄)
Using RBMT Systems for Pivot Translation | It can also change the distribution of some phrase pairs and reinforce some phrase pairs relative to the test set. |
Bag-of-Words Vector Space Model | In the hierarchical phrase-based translation method, the translation rules are extracted by abstracting some words from an initial phrase pair (Chiang, 2005). |
Bag-of-Words Vector Space Model | Consider a rule with non-terminals on the source and target side; for a given instance of the rule (a particular phrase pair in the training corpus), the context will be the words instantiating the non-terminals. |
Bag-of-Words Vector Space Model | In turn, the context for the sub-phrases that instantiate the non-terminals will be the words in the remainder of the phrase pair.
Head-Driven HPB Translation Model | For rule extraction, we first identify initial phrase pairs on word-aligned sentence pairs by using the same criterion as most phrase-based translation models (Och and Ney, 2004) and Chiang’s HPB model (Chiang, 2005; Chiang, 2007). |
Head-Driven HPB Translation Model | We extract HD-HRs and NRRs based on initial phrase pairs, respectively.
Head-Driven HPB Translation Model | We look for initial phrase pairs that contain other phrases and then replace sub-phrases with POS tags corresponding to their heads. |
Methods | • f3(r, c) = |H_within(r, c)| / |H_within(c)|: This is a variation of f2 that considers only noun phrase pairs shared at least once by relations in c.
Methods | • f4(r, c) = |H_within(r, c)| / |H_shared(c)|: This is a variation of f2 that considers only noun phrase pairs shared at least once by any pair of relations.
Methods | where H_within(r, c) = ∪_{r*∈c} H(r) ∩ H(r, r*), the intersection, considering translation, of H(r) and the noun phrase pairs shared at least once by relations in c, H_within(c) = ∪_{r*∈c} H(r*, c − {r*}), and H_shared(c) = ∪_{r*} H(r*, c), the noun phrase pairs shared at least once by any relations.
Related Work | UOM views reordering as a process of generating (b, o) in a left-to-right fashion, where b is the current phrase pair and o is the orientation of b with respect to the previously generated phrase pair b′.
Related Work | Tillmann and Zhang (2007) proposed a Bigram Orientation Model (BOM) to include both phrase pairs b and b′ in the model.
Two-Neighbor Orientation Model | ²We represent a chunk as a source and target phrase pair (f̄, ē), where the subscript and the superscript indicate the
Abstract | To prevent overfitting, the statistics of phrase pairs from a particular sentence were excluded from the phrase table when aligning that sentence.
Abstract | A set of phrase pairs is extracted from the word-aligned parallel corpus according to phrase extraction rules (Koehn et al., 2003).
Abstract | We use the word translation table from IBM Model 1 (Brown et al., 1993) and compute the sum over all possible word alignments within a phrase pair without normalizing for length (Quirk et al., 2005). |
Experiments | Table 2: Alignment accuracy and phrase pair extraction accuracy for directional and bidirectional models. |
Experiments | AER is the alignment error rate and F1 is the phrase pair extraction F1 score.
Abstract | Most modern machine translation systems use phrase pairs as translation units, allowing for accurate modelling of phrase-internal translation and reordering. |
Abstract | This mechanism implicitly supports not only traditional phrase pairs, but also gapping phrases which are nonconsecutive in the source.
Experiments | In contrast, our model better aligns the function words, such that many more useful phrase pairs can be extracted, i.e., ⟨…, 'm⟩, ⟨…, looking for⟩, ⟨…, grill-type⟩ and their combinations with neighbouring phrase pairs.
Alignment Methods | Phrasal ITGs are ITGs that allow for non-terminals that can emit phrase pairs with multiple elements on both the source and target sides. |
LookAhead Biparsing | This probability is the combination of the generative probability of each phrase pair Pt(e, f) as well as the sum of the probabilities over all shorter spans in straight and inverted order²
Substring Prior Probabilities | While the Bayesian phrasal ITG framework uses the previously mentioned phrase distribution Pt during search, it also allows for definition of a phrase pair prior probability P_prior(e, f), which can efficiently seed the search process with a bias towards phrase pairs that satisfy certain properties.
Experiments | We extracted phrase pairs from the training data to investigate how many phrase pairs can be captured by lexicalized tree-to-tree rules that contain only terminals. |
Experiments | We set the maximal length of phrase pairs to 10. |
Related Work | While this method works for l-best tree pairs, it cannot be applied to packed forest pairs because it is impractical to enumerate all tree pairs over a phrase pair . |
Discussion of Translation Results | • Baseline system translation output: 44.6% • Phrase pairs matching source n-grams: 67.8%
Inference during Translation Decoding | • Extend a hypothesis with a new phrase pair • Recombine hypotheses with identical states
Related Work | Och (1999) showed a method for inducing bilingual word classes that placed each phrase pair into a two-dimensional equivalence class. |
Baselines | Whenever a phrase pair does not appear in a component phrase table, we set the corresponding p_m(ē|f̄) to a small epsilon value.
Baselines | Whenever a phrase pair does not appear in a component phrase table, we set the corresponding p_m(ē|f̄) to 0; pairs in p(ē, f̄) that do not appear in at least one component table are discarded.
Ensemble Decoding | probability for each phrase pair (ē, f̄) is given by:
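The combination referenced above, with the epsilon fallback from the baseline description, can be sketched as a weighted linear interpolation of component phrase tables; `mixture_score` is our name, and ensemble decoding frameworks also support log-linear and switching mixtures, which this sketch does not implement:

```python
def mixture_score(phrase_pair, tables, weights, epsilon=1e-6):
    """Weighted linear mixture over component phrase tables:
    p(e|f) = sum over m of w_m * p_m(e|f).  A phrase pair missing from
    a component table falls back to a small epsilon probability."""
    return sum(w * t.get(phrase_pair, epsilon)
               for t, w in zip(tables, weights))

# Two hypothetical component tables; the pair exists in only one.
tables = [{("house", "maison"): 0.5}, {}]
print(mixture_score(("house", "maison"), tables, [0.5, 0.5]))
```

Setting `epsilon=0` and discarding pairs absent from every table reproduces the stricter baseline variant described above.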
Training | Second, we extract phoneme phrase pairs consistent with these alignments. |
Training | From the example above, we pull out phrase pairs like: |
Training | We add these phrase pairs to FST B, and call this the phoneme-phrase-based model. |
Model Description | Lexical Weighting Lexical weighting features estimate the quality of a phrase pair by combining the lexical translation probabilities of the words in the phrase² (Koehn et al., 2003).
Model Description | Phrase pair probabilities p(ē|f̄) are computed from these as described in Koehn et al.
Model Description | For example, if topic k is dominant in T, p_k(ē|f̄) may be quite large, but if p(k|V) is very small, then we should steer away from this phrase pair and select a competing phrase pair which may have a lower probability in T, but which is more relevant to the test sentence at hand.