Abstract | This profile might, for instance, be a vector with a dimensionality equal to the number of training subcorpora; each entry in the vector reflects the contribution of a particular subcorpus to all the phrase pairs that can be extracted from the dev set. |
Abstract | Then, for each phrase pair extracted from the training data, we create a vector with features defined in the same way, and calculate its similarity score with the vector representing the dev set. |
Abstract | Thus, we obtain a decoding feature whose value represents the phrase pair’s closeness to the dev set.
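A minimal sketch of the profile-and-similarity idea described above, assuming phrase pair counts are available per training subcorpus (all names here are illustrative, not from the paper):

```python
import math

def subcorpus_vector(contributions, num_subcorpora):
    """Profile vector: entry k is the (count) contribution of training
    subcorpus k to the phrase pairs in question."""
    vec = [0.0] * num_subcorpora
    for subcorpus_id, count in contributions:
        vec[subcorpus_id] += count
    return vec

def cosine(u, v):
    """Similarity score between a phrase pair's vector and the dev vector."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Dev profile: contributions of 3 subcorpora to all dev-extractable pairs.
dev_vec = subcorpus_vector([(0, 10), (1, 2), (2, 8)], 3)
# One training phrase pair, observed mostly in subcorpus 0.
pair_vec = subcorpus_vector([(0, 5), (2, 1)], 3)
feature = cosine(pair_vec, dev_vec)  # decoding feature: closeness to dev
```

The feature value can then be added as one more component of the log-linear model, exactly like any other phrase-table score.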
Introduction | typically use a rich feature set to decide on weights for the training data, at the sentence or phrase pair level. |
Introduction | As in Foster et al. (2010), this approach works at the level of phrase pairs.
Introduction | Instead of using word-based features and a computationally expensive training procedure, we capture the distributional properties of each phrase pair directly, representing it as a vector in a space which also contains a representation of the dev set. |
Corpus Data and Baseline SMT | We used this corpus to extract translation phrase pairs from bidirectional IBM Model 4 word alignments (Och and Ney, 2003) using the heuristic approach of Koehn et al. (2003).
Corpus Data and Baseline SMT | Our phrase-based decoder is similar to Moses (Koehn et al., 2007) and uses the phrase pairs and target LM to perform beam search stack decoding based on a standard log-linear model, the parameters of which were tuned with MERT (Och, 2003) on a held-out development set (3,534 sentence pairs, 45K words) using BLEU as the tuning metric. |
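As a reminder of the standard log-linear model mentioned here (the symbols below are the conventional ones, not taken from this excerpt): the decoder searches for the translation that maximizes a weighted sum of feature functions, with the weights λ_i tuned by MERT against BLEU on the dev set:

```latex
\hat{e} = \arg\max_{e} \sum_{i=1}^{M} \lambda_i \, h_i(e, f)
```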
Corpus Data and Baseline SMT | All other sentence pairs are assigned to a “background conversation”, which signals the absence of the topic similarity feature for phrase pairs derived from these instances. |
Incremental Topic-Based Adaptation | Our approach is based on the premise that biasing the translation model to favor phrase pairs originating in training conversations that are contextually similar to the current conversation will lead to better translation quality. |
Incremental Topic-Based Adaptation | Additionally, the SMT phrase table tracks, for each phrase pair, the set of parent training conversations (including the “background conversation”) from which that phrase pair originated. |
Incremental Topic-Based Adaptation | Using this information, the decoder evaluates, for each candidate phrase pair |
Introduction | As the conversation progresses, however, the gradual accumulation of contextual information can be used to infer the topic(s) of discussion, and to deploy contextually appropriate translation phrase pairs.
Introduction | Translation phrase pairs that originate in training conversations whose topic distribution is similar to that of the current conversation are given preference through a single similarity feature, which augments the standard phrase-based SMT log-linear model. |
Relation to Prior Work | Rather, we compute a similarity between the current conversation history and each of the training conversations, and use this measure to dynamically score the relevance of candidate translation phrase pairs during decoding. |
Introduction | However, these phrase pairs with gaps do not capture structural reordering as Hiero rules with nonterminal mappings do.
Introduction | From this Hiero derivation, we obtain a segmentation of the sentence pair into phrase pairs according to the word alignments, as shown on the left side of Figure 1.
Introduction | Ordering these phrase pairs according to the word sequence on the target side, shown on the right side of Figure 1, we have a phrase-based translation path consisting of four phrase pairs: (je, …), (ne, …).
Phrasal-Hiero Model | Training: Represent each rule as a sequence of phrase pairs and nonterminals. |
Phrasal-Hiero Model | Decoding: Use the rules’ sequences of phrase pairs and nonterminals to find the corresponding phrase-based path of a Hiero derivation and calculate its feature scores. |
Phrasal-Hiero Model | 2.1 Map a Rule to a Sequence of Phrase Pairs and Nonterminals
Introduction | The main idea of our method is to divide the phrase-level translation rule induction into two steps: bilingual lexicon induction and phrase pair induction. |
Introduction | Since bilingual lexicon induction has been studied by many researchers, in this paper we concentrate on phrase pair induction given a probabilistic bilingual lexicon and two large in-domain monolingual corpora (source and target language).
Introduction | In addition, we will further describe how to refine the induced phrase pairs and estimate their parameters, such as the four standard translation features and the phrase reordering feature used in conventional phrase-based models (Koehn et al., 2007).
What Can We Learn with Phrase Pair Induction? | Readers may wonder: if phrase pair induction is performed using only a bilingual lexicon and monolingual data, what new translation knowledge can be learned?
What Can We Learn with Phrase Pair Induction? | In contrast, phrase pair induction can make up for this deficiency to some extent. |
What Can We Learn with Phrase Pair Induction? | From the phrase pairs induced with our method, we conducted a detailed analysis and found that three kinds of new translation knowledge can be learned: 1) word reordering within a phrase pair; 2) idioms; and 3) unknown word translations.
Introduction | Phrase pair extraction, the key step to discover translation knowledge, heavily relies on the scale of training data. |
Introduction | Typically, the more parallel data is used, the more phrase pairs and the more accurate parameters are learned, which clearly benefits translation performance.
Introduction | First, phrase pairs are extracted from each domain without interfering with other domains. |
Phrase Pair Extraction with Unsupervised Phrasal ITGs | More formally, P(⟨e, f⟩; θ_x, θ_t) is the probability of a phrase pair ⟨e, f⟩, which is parameterized by a phrase pair distribution θ_t and a symbol distribution θ_x.
Phrase Pair Extraction with Unsupervised Phrasal ITGs | where d is the discount parameter, s is the strength parameter, and P_base is a prior probability which acts as a fallback when a phrase pair is not in the model.
Phrase Pair Extraction with Unsupervised Phrasal ITGs | Under this model, the probability for a phrase pair found in a bilingual corpus (E, F) can be represented by the following equation using the Chinese restaurant process (Teh, 2006): |
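The equation referred to above is not reproduced in this excerpt; the standard Pitman-Yor Chinese restaurant process predictive probability, writing the discount as d, the strength as s, and the fallback prior as P_base as in the surrounding text (the customer/table statistics c, t, C, T are my notation, not the paper's), takes the form:

```latex
P(\langle e, f \rangle)
  = \frac{c_{\langle e, f \rangle} - d \, t_{\langle e, f \rangle}}{C + s}
  + \frac{s + d \, T}{C + s} \, P_{\mathrm{base}}(\langle e, f \rangle)
```

Here c and t are the customer and table counts for the pair ⟨e, f⟩, and C and T are the corresponding totals over all pairs.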
Related Work | Then all the phrase pairs and features are tuned together with different weights during decoding.
Related Work | Classification-based methods must at least add an explicit label to indicate which domain the current phrase pair comes from. |
Related Work | (2012) employed a feature-based approach, in which phrase pairs are enriched |
Comparative Study | We count how often each extracted phrase pair is found with each of the three reordering types. |
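One way such orientation counts could be collected, assuming the usual three-way inventory (monotone, swap, discontinuous; the excerpt does not name the types) and representing each phrase pair by its source span (start, end) taken in target order (all names here are illustrative):

```python
from collections import Counter

def orientation(prev_span, cur_span):
    """Classify cur_span's source-side position relative to prev_span."""
    if cur_span[0] == prev_span[1] + 1:
        return "monotone"        # source side continues to the right
    if cur_span[1] == prev_span[0] - 1:
        return "swap"            # source side is immediately to the left
    return "discontinuous"       # anything else: a gap on the source side

def count_orientations(paths):
    """Count reordering types over phrase-based translation paths,
    where each path is a list of source spans in target order."""
    counts = Counter()
    for path in paths:
        for prev, cur in zip(path, path[1:]):
            counts[orientation(prev, cur)] += 1
    return counts
```

Accumulating these counts per extracted phrase pair, rather than globally, yields the per-pair statistics described above.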
Comparative Study | The bilingual sequence of phrase pairs will be extracted using the same strategy as in Figure 4.
Comparative Study | Suppose the search state is now extended with a new phrase pair (f̃_k, ẽ_k).
Tagging-style Reordering Model | During the search, a sentence pair (f_1^J, e_1^I) is formally split into a segmentation s_1^K which consists of K phrase pairs.
Tagging-style Reordering Model | Suppose the search state is now extended with a new phrase pair (f̃_k, ẽ_k): f̃_k := f_{b_k} …
Tagging-style Reordering Model | We also have the word alignment within the new phrase pair, which is stored during the phrase extraction process.
Introduction | Moreover, syntax-based approaches often suffer from the rule coverage problem since syntactic constraints rule out a large portion of non-syntactic phrase pairs, which might help decoders generalize well to unseen data (Marcu et al., 2006).
Introduction | The basic unit of translation in our model is the string-to-dependency phrase pair, which consists of a phrase on the source side and a dependency structure on the target side.
Introduction | The algorithm generates well-formed dependency structures for partial translations left-to-right using string-to-dependency phrase pairs.
Translation Model Architecture | For phrase pairs which are not found, C(e, f) and C(f) are initially set to 0.
Translation Model Architecture | Note that C(e) is potentially incorrect at this point, since a phrase pair not being found does not entail that C(e) is 0.
Translation Model Architecture | 4We prune the tables to the most frequent 50 phrase pairs per source phrase before combining them, since calculating the features for all phrase pairs of very common source phrases causes a significant slowdown. |
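A sketch of the combination step described here, under the assumption that each per-domain table maps target phrases to raw counts for a single source phrase (the top-50 cutoff follows the footnote; the table layout and names are mine, not the paper's):

```python
from collections import Counter

def combine_tables(domain_tables, top_k=50):
    """Combine per-domain counts C(e, f) for a single source phrase f.
    Each table maps a target phrase e to its count in that domain.
    Tables are pruned to the top_k most frequent entries before
    combining; a pair missing from a table simply contributes 0,
    which is why marginals recomputed from the combined table may
    be inexact (as noted above)."""
    combined = Counter()
    for table in domain_tables:
        for e, count in Counter(table).most_common(top_k):
            combined[e] += count
    return combined

# Hypothetical per-domain counts for one source phrase:
news = {"the house": 40, "the home": 3}
medical = {"the house": 1, "the ward": 9}
merged = combine_tables([news, medical])
```

Pruning before combining keeps the cost of feature computation bounded for very frequent source phrases, at the price of the C(e) inexactness noted above.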
Experimental Results | One is a re-implementation of Hiero (Chiang, 2005), and the other is a hybrid syntax-based tree-to-string system (Zhao and Al-Onaizan, 2008), where normal phrase pairs and Hiero rules are used as a backoff for tree-to-string rules.
Experimental Results | Next we extract phrase pairs, Hiero rules and tree-to-string rules from the original word alignment and the improved word alignment, and tune all the feature weights on the tuning set.
Integrating Empty Categories in Machine Translation | For example, for a hierarchical MT system, some phrase pairs and Hiero (Chiang, 2005) rules can be extracted with recovered *pro* and *PRO* at the Chinese side. |
Integrating Empty Categories in Machine Translation | For each phrase pair, Hiero rule or tree-to-string rule in the MT system, a binary feature f_k fires if there exists a *pro* on the source side and it aligns to one of its most frequently aligned target words found in the training corpus.
Integrating Empty Categories in Machine Translation | The motivation for such sparse features is to reward phrase pairs and rules that have highly confident lexical pairs specifically related to ECs, and to penalize those that do not.
Methods | • f3(r, c) = |H_within(r, c)| / |H_within(c)|: This is a variation of f2 that considers only noun phrase pairs shared at least once by relations in c.
Methods | • f4(r, c) = |H_within(r, c)| / |H_shared(c)|: This is a variation of f2 that considers only noun phrase pairs shared at least once by any pair of relations.
Methods | where H_within(r, c) = ∪_{r* ∈ c} H(r) ∩ H(r, r*), the intersection, considering translation, of H(r) and the noun phrase pairs shared at least once by relations in c; H_within(c) = ∪_{r* ∈ c} H_within(r*, c − {r*}); and H_shared(c) = ∪_{r* ∈ R_E ∪ R_C} H(r*, c), the noun phrase pairs shared at least once by any relations.
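A small sketch of how set-ratio features like f3 and f4 above might be computed, representing each H(·) as a Python set of noun phrase pairs (the set contents and helper names here are hypothetical):

```python
def h_within(r, cluster, H, H_pair):
    """H_within(r, c): union over r* in c of H(r) ∩ H(r, r*),
    i.e. r's noun phrase pairs that it shares with some relation in c."""
    result = set()
    for r_star in cluster:
        result |= H[r] & H_pair[(r, r_star)]
    return result

def ratio(numer, denom):
    """Safe |numer| / |denom| ratio feature (0 if denom is empty)."""
    return len(numer) / len(denom) if denom else 0.0

# Toy data: noun phrase pairs attested for relation "r" and per pair of relations.
H = {"r": {("obama", "us"), ("merkel", "germany")}}
H_pair = {("r", "r1"): {("obama", "us")},
          ("r", "r2"): {("merkel", "germany"), ("hollande", "france")}}
within = h_within("r", ["r1", "r2"], H, H_pair)
# Stand-in for H_within(c) as the denominator of f3:
f3_value = ratio(within, {("obama", "us"), ("merkel", "germany"), ("x", "y")})
```

f4 would use the same numerator with the larger H_shared(c) denominator, making it a stricter version of f3.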
Abstract | Most modern machine translation systems use phrase pairs as translation units, allowing for accurate modelling of phrase-internal translation and reordering. |
Abstract | This mechanism implicitly supports not only traditional phrase pairs, but also gapping phrases which are nonconsecutive in the source.
Experiments | In contrast, our model better aligns the function words, such that many more useful phrase pairs can be extracted, e.g., <7£, ’m>, <ilZ, looking for>, <l€§l% fig, grill-type> and their combinations with neighbouring phrase pairs.
Related Work | UOM views reordering as a process of generating (b, o) in a left-to-right fashion, where b is the current phrase pair and o is the orientation of b with respect to the previously generated phrase pair b'.
Related Work | Tillmann and Zhang (2007) proposed a Bigram Orientation Model (BOM) to include both phrase pairs b and b' in the model.
Two-Neighbor Orientation Model | 2We represent a chunk as a source and target phrase pair (f_{j1}^{j2}, e_{i1}^{i2}) where the subscript and the superscript indicate the