Abstract | Most current data selection methods solely use language models trained on small-scale in-domain data to select domain-relevant sentence pairs from a general-domain parallel corpus. |
Abstract | By contrast, we argue that the relevance between a sentence pair and the target domain can be better evaluated by combining a language model with a translation model. |
Abstract | When the selected sentence pairs are evaluated on an end-to-end MT task, our methods can increase the translation performance by 3 BLEU points. |
Introduction | For this, an effective approach is to automatically select and expand domain-specific sentence pairs from a large-scale general-domain parallel corpus. |
Introduction | Current data selection methods mostly use language models trained on small-scale in-domain data to measure domain relevance and select domain-relevant parallel sentence pairs to expand training corpora. |
Introduction | Meanwhile, the translation model measures the translation probability of a sentence pair, and is used to verify the parallelism of the selected domain-relevant bitext. |
Related Work | (2010) ranked the sentence pairs in the general-domain corpus according to the perplexity scores of sentences, which are computed with respect to in-domain language models. |
Related Work | Although previous works in data selection (Duh et al., 2013; Koehn and Haddow, 2012; Axelrod et al., 2011; Foster et al., 2010; Yasuda et al., 2008) have gained good performance, the methods which only adopt language models to score the sentence pairs are suboptimal. |
Related Work | The reason is that a sentence pair contains a source language sentence and a target language sentence, while the existing methods are incapable of evaluating the mutual translation probability of a sentence pair in the target domain. |
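The perplexity-based ranking these snippets describe can be sketched as follows; the add-one-smoothed unigram model and the toy sentences are illustrative stand-ins for the real in-domain n-gram language models such methods use.

```python
import math
from collections import Counter

def train_unigram_lm(sentences):
    """Add-one-smoothed unigram LM from tokenized in-domain sentences."""
    counts = Counter(tok for sent in sentences for tok in sent)
    total = sum(counts.values())
    vocab = len(counts) + 1  # +1 reserves mass for unseen tokens
    return lambda tok: (counts[tok] + 1) / (total + vocab)

def perplexity(lm, sent):
    """Per-token perplexity of a sentence under the unigram LM."""
    logp = sum(math.log(lm(tok)) for tok in sent)
    return math.exp(-logp / len(sent))

# Illustrative in-domain (news-like) corpus and general-domain candidates.
in_domain = [["the", "minister", "announced", "the", "policy"],
             ["the", "government", "announced", "a", "policy"]]
lm = train_unigram_lm(in_domain)

candidates = [["the", "minister", "announced", "a", "policy"],
              ["buy", "cheap", "tickets", "online", "now"]]
# Rank candidates by ascending perplexity: lower = more in-domain.
ranked = sorted(candidates, key=lambda s: perplexity(lm, s))
```

In a Moore-Lewis-style variant, one would instead rank by the cross-entropy difference between an in-domain LM and a general-domain LM.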
Training Data Selection Methods | We present three data selection methods for ranking and selecting domain-relevant sentence pairs from general-domain corpus, with an eye towards improving domain-specific translation model performance. |
Training Data Selection Methods | However, in this paper, we adopt the translation model to evaluate the translation probability of a sentence pair, and develop a simple but effective variant of the translation model to rank the sentence pairs in the general-domain corpus. |
Abstract | The algorithm finds provably exact solutions on 86% of sentence pairs and shows improvements over directional models. |
Conclusion | optimal score, averaged over all sentence pairs. |
Conclusion | The plot shows an average over all sentence pairs. |
Conclusion | We can construct a sentence pair in which I = J = N and e-alignments have infinite cost. |
Experiments | Following past work, the first 150 sentence pairs of the training section are used for evaluation. |
Experiments | With incremental constraints and pruning, we are able to solve over 86% of sentence pairs including many longer and more difficult pairs. |
Experiments | Table 3: The average number of constraints added for sentence pairs where Lagrangian relaxation is not able to find an exact solution. |
Introduction | Empirically, it is able to find exact solutions on 86% of sentence pairs and is significantly faster than general-purpose solvers. |
Experiments | These documents are built in the format of an inverted index using Lucene, which can be efficiently retrieved by the parallel sentence pairs. |
Experiments | In the fine-tuning phase, for each parallel sentence pair, we randomly select ten other sentence pairs which satisfy the criterion as negative instances. |
Experiments | In total, the datasets contain nearly 1.1 million sentence pairs. |
Topic Similarity Model with Neural Network | Given a parallel sentence pair (f, e), the first step is to treat f and e as queries, and use IR methods to retrieve relevant documents to enrich contextual information for them. |
Topic Similarity Model with Neural Network | Therefore, in this stage, parallel sentence pairs are used to help connect the vectors from different languages because they express the same topic. |
Topic Similarity Model with Neural Network | Given a parallel sentence pair (f, e), the DAB learns representations for f and e respectively, as z_f = g(f) and z_e = g(e) in Figure 1. |
A Joint Model with Unlabeled Parallel Text | the sentence pairs may be noisily parallel (or even comparable) instead of fully parallel (Munteanu and Marcu, 2005). |
A Joint Model with Unlabeled Parallel Text | In such noisy cases, the labels (positive or negative) could be different for the two monolingual sentences in a sentence pair. |
A Joint Model with Unlabeled Parallel Text | Although we do not know the exact probability that a sentence pair exhibits the same label, we can approximate it using their translation |
Experimental Setup 4.1 Data Sets and Preprocessing | Because sentence pairs in the ISI corpus are quite noisy, we rely on Giza++ (Och and Ney, 2003) to obtain a new translation probability for each sentence pair, and select the 100,000 pairs with the highest translation probabilities. |
Experimental Setup 4.1 Data Sets and Preprocessing | We then classify each unlabeled sentence pair by combining the two sentences in each pair into one. |
Experimental Setup 4.1 Data Sets and Preprocessing | We removed sentence pairs with an original confidence score (given in the corpus) smaller than 0.98, and also removed the pairs that are too long (more than 60 characters in one sentence) to facilitate Giza++. |
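The filtering just described (confidence threshold, length limit, then keeping the top pairs by translation probability) can be sketched as below; the record fields and the toy scoring function are assumptions, since a real pipeline would take its scores from Giza++.

```python
def filter_noisy_pairs(pairs, align_score, keep=100000,
                       min_conf=0.98, max_len=60):
    """Keep the `keep` pairs with the highest translation probability,
    after dropping low-confidence and overlong pairs."""
    kept = [p for p in pairs
            if p["conf"] >= min_conf
            and len(p["src"]) <= max_len
            and len(p["tgt"]) <= max_len]
    kept.sort(key=lambda p: align_score(p["src"], p["tgt"]), reverse=True)
    return kept[:keep]

# Hypothetical corpus records: src/tgt strings plus a confidence score.
pairs = [
    {"src": "abc", "tgt": "xyz", "conf": 0.99},
    {"src": "a" * 80, "tgt": "xyz", "conf": 0.99},  # too long, dropped
    {"src": "abc", "tgt": "x", "conf": 0.5},        # low confidence, dropped
]
# Toy stand-in for a Giza++ translation probability.
score = lambda s, t: 1.0 / (1 + abs(len(s) - len(t)))
selected = filter_noisy_pairs(pairs, score, keep=2)
```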
Results and Analysis | Preliminary experiments showed that Equation 5 does not significantly improve the performance in our case, which is reasonable since we choose only sentence pairs with the highest translation probabilities to be our unlabeled data (see Section 4.1). |
Results and Analysis | However, even with only 2,000 unlabeled sentence pairs, the proposed approach still produces large performance gains. |
Results and Analysis | Examination of those sentence pairs in setting 2 for which the two monolingual models still |
Alignment | The idea of forced alignment is to perform a phrase segmentation and alignment of each sentence pair of the training data using the full translation system as in decoding. |
Alignment | Consequently, we can modify Equation 2 to define the best segmentation of a sentence pair as: |
Alignment | The training data consists of N parallel sentence pairs f_n and e_n for n = 1, …, N. |
Introduction | In this method, all phrases of the sentence pair that match constraints given by the alignment are extracted. |
Related Work | When given a bilingual sentence pair, we can usually assume there are a number of equally correct phrase segmentations and corresponding alignments. |
Experiments | The y-axis denotes the scores for each metric, and the x-axis denotes the percentage of the highest-scoring sentence pairs that are kept. |
Experiments | However, translation models are generally robust to such kinds of errors and can learn good translations even in the presence of imperfect sentence pairs. |
Experiments | Example sentence pairs. |
Parallel Data Extraction | In this process, lexical tables for EN-ZH language pair used by Model 1 were built using the FBIS dataset (LDC2003E14) for both directions, a corpus of 300K sentence pairs from the news domain. |
Parallel Data Extraction | Likewise, for the EN-AR language pair, we use a fraction of the NIST dataset, by removing the data originated from UN, which leads to approximately 1M sentence pairs. |
Parallel Segment Retrieval | This is obviously not our goal, since we would not obtain any useful sentence pairs. |
Parallel Segment Retrieval | It is highest for segmentations that cover all the words in the document (this is desirable since there are many sentence pairs that can be extracted but we want to find the largest sentence pair in the document). |
Experiments and Results | Statistics of corpora: “Ch” denotes Chinese, “En” denotes English, the “Sent.” row is the number of sentence pairs, and the “word” row is the number of words. |
Experiments and Results | Figure 5 illustrates examples of pseudo-words of one Chinese-to-English sentence pair. |
Experiments and Results | SSP has a strong constraint that all parts of a sentence pair should be aligned, so source sentence and target sentence have same length after merging words into |
Introduction | But computational complexity is prohibitively high for the exponentially large number of decompositions of a sentence pair into phrase pairs. |
Introduction | pressions by monotonically segmenting a given Spanish-English sentence pair into bilingual units, where word aligner is also used. |
Searching for Pseudo-words | X and Y are the sentence pair and the multi-word pairs, respectively, in this bilingual scenario. |
Searching for Pseudo-words | Pseudo-word pairs of one sentence pair are the pairs that maximize the sum of the span-pairs' bilingual sequence significances: pwp_1^K = argmax_{span-pair_1^K} Σ_{k=1}^K Sig_{span-pair_k} (6) |
Searching for Pseudo-words | Searching for pseudo-word pairs pwp_1^K is equal to bilingual segmentation of a sentence pair into the optimal span-pair_1^K. |
Abstract | Based on these measures, we improve the alignment quality by selecting high-confidence sentence alignments and alignment links from multiple word alignments of the same sentence pair. |
Alignment Link Confidence Measure | Similar to the sentence alignment confidence measure, the confidence of an alignment link a_ij in the sentence pair (S, T) is defined as |
Alignment Link Confidence Measure | From multiple alignments of the same sentence pair, we select high-confidence links from different alignments based on their link confidence scores and alignment agreement ratio. |
Alignment Link Confidence Measure | Suppose the sentence pair (S, T) has alignments A_1, … |
Improved MaXEnt Aligner with Confidence-based Link Filtering | 512 sentence pairs, and the A-E alignment test set is the 200 Arabic-English sentence pairs from NIST MT03 test set. |
Introduction | The example in Figure 1 shows the word alignment of the given Chinese and English sentence pair, where the English words following each Chinese word are its literal translation. |
Introduction | In this paper we introduce a confidence measure for word alignment, which is robust to extra or missing words in the bilingual sentence pairs, as well as to word alignment errors. |
Sentence Alignment Confidence Measure | Given a bilingual sentence pair (S, T) where S = {s_1, … |
Sentence Alignment Confidence Measure | We randomly selected 512 Chinese-English (CE) sentence pairs and generated word alignment using the MaxEnt aligner (Ittycheriah and Roukos, 2005). |
Sentence Alignment Confidence Measure | For each sentence pair, we also calculate the sentence alignment confidence score −log P(A|S, T). |
Automatic Evaluation Metrics | A uniform average of the counts is then taken as the score for the sentence pair. |
Automatic Evaluation Metrics | Based on the matches, ParaEval will then elect to use either unigram precision or unigram recall as its score for the sentence pair. |
Introduction | To match each system item to at most one reference item, we model the items in the sentence pair as nodes in a bipartite graph and use the Kuhn-Munkres algorithm (Kuhn, 1955; Munkres, 1957) to find a maximum weight matching (or alignment) between the items in polynomial time. |
Introduction | Also, metrics such as METEOR determine an alignment between the items of a sentence pair by using heuristics such as the least number of matching crosses. |
Introduction | Also, this framework allows for defining arbitrary similarity functions between two matching items, and we could match arbitrary concepts (such as dependency relations) gathered from a sentence pair . |
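Maximum-weight bipartite matching of system items to reference items, as described above, can be illustrated with a brute-force stand-in; the Kuhn-Munkres algorithm computes the same optimum in polynomial time, and the similarity matrix here is made up for illustration.

```python
from itertools import permutations

def max_weight_matching(weights):
    """Best one-to-one assignment of system items (rows) to reference
    items (columns), brute force over permutations. Kuhn-Munkres gives
    the same result in polynomial time; this stand-in is exponential
    but adequate for a toy example."""
    n = len(weights)
    best, best_perm = float("-inf"), None
    for perm in permutations(range(n)):
        total = sum(weights[i][perm[i]] for i in range(n))
        if total > best:
            best, best_perm = total, perm
    return best, best_perm

# Hypothetical similarity of each system item to each reference item.
sim = [[0.9, 0.1, 0.0],
       [0.2, 0.8, 0.3],
       [0.0, 0.4, 0.7]]
score, assignment = max_weight_matching(sim)
```

Arbitrary similarity functions between items simply change the entries of the weight matrix; the matching step stays the same.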
Metric Design Considerations | Similarly, we also match the bigrams and trigrams of the sentence pair and calculate their corresponding Fmean scores. |
Metric Design Considerations | To obtain a single similarity score score_s for this sentence pair s, we simply average the three Fmean scores. |
Metric Design Considerations | Then, to obtain a single similarity score Sim-score for the entire system corpus, we repeat this process of calculating score_s for each system-reference sentence pair s, and compute the average over all |S| sentence pairs: |
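A minimal sketch of the per-sentence score described above, averaging unigram, bigram, and trigram F-scores; the exact precision/recall definitions of the original metric may differ.

```python
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def fmean(sys_toks, ref_toks, n):
    """Harmonic-mean F-score of clipped n-gram matches (a sketch)."""
    sys_ng, ref_ng = ngrams(sys_toks, n), ngrams(ref_toks, n)
    match = sum((sys_ng & ref_ng).values())
    if match == 0:
        return 0.0
    p = match / sum(sys_ng.values())
    r = match / sum(ref_ng.values())
    return 2 * p * r / (p + r)

def sentence_score(sys_toks, ref_toks):
    """Average the unigram, bigram, and trigram Fmean scores."""
    return sum(fmean(sys_toks, ref_toks, n) for n in (1, 2, 3)) / 3

sys_s = "the cat sat on the mat".split()
ref_s = "the cat sat on a mat".split()
score = sentence_score(sys_s, ref_s)
```

The corpus-level Sim-score is then just the mean of `sentence_score` over all system-reference pairs.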
DNN for word alignment | Given a sentence pair (e, f), HMM word alignment takes the following form: |
DNN for word alignment | To decode our model, the lexical translation scores are computed for each source-target word pair in the sentence pair, which requires going through the neural network |e| × |f| times; after that, the forward-backward algorithm can be used to find the Viterbi path as in the classic HMM model. |
Experiments and Results | We use the manually aligned Chinese-English alignment corpus (Haghighi et al., 2009) which contains 491 sentence pairs as test set. |
Experiments and Results | Our parallel corpus contains about 26 million unique sentence pairs in total which are mined from web. |
Training | In practice, the number of nonzero parameters in the classic HMM model would be much smaller, as many words do not co-occur in bilingual sentence pairs. |
Training | our model from raw sentence pairs, they are too computationally demanding, as the lexical translation probabilities must be computed from neural networks. |
Training | Hence, we opt for a simpler supervised approach, which learns the model from sentence pairs with word alignment. |
Experiments | They are manually translated into the other language to produce 7,000 sentence pairs , which are split into two parts: 2,000 pairs as development set (dev) and the other 5,000 pairs as test set (web test). |
Experiments | After removing duplicates, we have about 18 million sentence pairs , which contain about 270 millions of English tokens and 320 millions of Japanese tokens. |
Experiments | As we do not have access to a golden reordered sentence set, we decide to use the alignment crossing-link numbers between aligned sentence pairs as the measure for reorder performance. |
Ranking Model Training | For a sentence pair (e, f, a) with syntax tree T_e on the source side, we need to determine which reordered tree T′_e best represents the word order in the target sentence f. For a tree node t in T_e, if its children align to disjoint target spans, we can simply arrange them in the order of their corresponding target |
Ranking Model Training | Figure 2: Fragment of a sentence pair. |
Ranking Model Training | Figure 2 shows a fragment of one sentence pair in our training data. |
Word Reordering as Syntax Tree Node Ranking | Figure 1: An English-to-Japanese sentence pair . |
Bootstrapping Phrasal ITG from Word-based ITG | Figure 3 (a) shows all possible non-compositional phrases given the Viterbi word alignment of the example sentence pair . |
Experiments | The training data was a subset of 175K sentence pairs from the NIST Chinese-English training data, automatically selected to maximize character-level overlap with the source side of the test data. |
Experiments | We put a length limit of 35 on both sides, producing a training set of 141K sentence pairs . |
Experiments | Figure 4 examines the difference between EM and VB with varying sparse priors for the word-based model of ITG on the 500 sentence pairs , both after 10 iterations of training. |
Introduction | Finally, the set of phrases consistent with the word alignments are extracted from every sentence pair; these form the basis of the decoding process. |
Introduction | Computational complexity arises from the exponentially large number of decompositions of a sentence pair into phrase pairs; overfitting is a problem because as EM attempts to maximize the likelihood of its training data, it prefers to directly explain a sentence pair with a single phrase pair. |
Phrasal Inversion Transduction Grammar | However, it is easy to show that the maximum likelihood training will lead to the saturated solution where P_C = 1: each sentence pair is generated by a single phrase spanning the whole sentence. |
Summary of the Pipeline | Then we use the efficient bidirectional tic-tac-toe pruning to prune the bitext space within each of the sentence pairs; ITG parsing will be carried out on only this sparse set of bitext cells. |
Variational Bayes for ITG | If we do not put any constraint on the distribution of phrases, EM overfits the data by memorizing every sentence pair . |
Experiment Results | The Arabic-English system was trained from 264K sentence pairs with true case English. |
Experiment Results | The Chinese-English system was trained on the FBIS corpus of 384K sentence pairs; the English corpus is lowercase. |
Experiment Results | The systems were trained on 1.8 million sentence pairs using the Europarl corpora. |
Introduction | From this Hiero derivation, we have a segmentation of the sentence pairs into phrase pairs according to the word alignments, as shown on the left side of Figure 1. |
Phrasal-Hiero Model | In the rule X → je X1 le français ; i X1 french, extracted from the sentence pair in Figure 1, the phrase le français connects to the phrase french because the French word français aligns with the English word french even though le is unaligned. |
Phrasal-Hiero Model | Figure 2: Alignment of a sentence pair. |
Phrasal-Hiero Model | For example, in the rule r4 = X → je X1 le X2 ; i X1 X2, extracted from the sentence pair in Figure 2, the phrase le is not aligned. |
Data preparation | 2. for each aligned sentence pair (sentence_s ∈ S_s, sentence_t ∈ S_t) in the parallel corpus split (S_s, S_t): |
Data preparation | The output of the algorithm in Figure 1 is a modified set of sentence pairs (sentence_s, sentence_t), in which the same sentence pair may be used multiple times with different L1 substitutions for different fragments. |
Evaluation | If output o is a subset of reference r, then a fractional score is assigned for that sentence pair. |
Evaluation | The word accuracy for the entire set is then computed by taking the sum of the word accuracies per sentence pair, divided by the total number of sentence pairs. |
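The set-level word accuracy described above is a simple average; a minimal sketch, assuming position-wise word matching as the per-pair accuracy:

```python
def word_accuracy(output, reference):
    """Per-pair word accuracy: fraction of positions whose output word
    matches the reference (a simplified stand-in for the paper's metric)."""
    if not reference:
        return 0.0
    hits = sum(1 for o, r in zip(output, reference) if o == r)
    return hits / len(reference)

def set_accuracy(pairs):
    """Sum of per-pair accuracies divided by the number of sentence pairs."""
    return sum(word_accuracy(o, r) for o, r in pairs) / len(pairs)

pairs = [(["a", "b", "c"], ["a", "b", "c"]),   # accuracy 1.0
         (["a", "x", "c"], ["a", "b", "c"])]   # accuracy 2/3
overall = set_accuracy(pairs)
```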
Experiments & Results | The data for our experiments were drawn from the Europarl parallel corpus (Koehn, 2005), from which we extracted two sets of 200,000 sentence pairs each for several language pairs. |
Experiments & Results | The final test sets are a randomly sampled 5,000 sentence pairs from the 200,000-sentence test split for each language pair. |
Experiments & Results | English fallback in a Spanish context, consists of 5,608,015 sentence pairs. |
Experiments and Results | The training data contains 81k sentence pairs, 655k Chinese words and 806k English words. |
Experiments and Results | The training data contains 354k sentence pairs, 8M Chinese words and 10M English words. |
Experiments and Results | re-ranking methods are performed in the same way as for IWSLT data, but for consensus-based decoding, the data set contains too many sentence pairs to be held in one graph for our machine. |
Features and Training | For the nodes representing the training sentence pairs , this posterior is fixed. |
Graph Construction | If there are sentence pairs with the same source sentence but different translations, all the translations will be assigned as labels to that source sentence, and the corresponding probabilities are estimated by MLE. |
Graph Construction | There is no edge between training nodes, since we suppose all the sentences of the training data are correct, and it is pointless to reestimate the confidence of those sentence pairs. |
Graph Construction | Forced alignment performs phrase segmentation and alignment of each sentence pair of the training data using the full translation system as in decoding (Wuebker et al., 2010). |
Data and task | A total of 13,410 English-Bulgarian and 8,832 English-Korean sentence pairs were extracted. |
Data and task | Of these, we manually annotated 91 English-Bulgarian and 79 English-Korean sentence pairs with source and target named entities as well as word-alignment links among named entities in the two languages. |
Data and task | Figure 1 illustrates a Bulgarian-English sentence pair with alignment. |
Introduction | Our results show that the semi-CRF model improves on the performance of projection models by more than 10 points in F-measure, and that we can achieve a tagging F-measure of over 91 using a very small number of annotated sentence pairs. |
Bilingual subtree constraints | To solve the mapping problems, we use a bilingual corpus, which includes sentence pairs, to automatically generate the mapping rules. |
Bilingual subtree constraints | First, the sentence pairs are parsed by monolingual parsers on both sides. |
Bilingual subtree constraints | Figure 8 shows an example of a processed sentence pair that has tree structures on both sides and word alignment links. |
Experiments | Note that some sentence pairs were removed because they are not one-to-one aligned at the sentence level (Burkett and Klein, 2008; Huang et al., 2009). |
Experiments | Word alignments were generated from the Berkeley Aligner (Liang et al., 2006; DeNero and Klein, 2007) trained on a bilingual corpus having approximately 0.8M sentence pairs . |
Motivation | Suppose that we have an input sentence pair as shown in Figure 1, where the source sentence is in English, the target is in Chinese, the dashed undirected links are word alignment links, and the directed links between words indicate that they have a (candidate) dependency relation. |
Collocation Model | The monolingual corpus is first replicated to generate a parallel corpus, where each sentence pair consists of two identical sentences in the same language. |
Experiments on Word Alignment | To investigate the quality of the generated word alignments, we randomly selected a subset from the bilingual corpus as test set, including 500 sentence pairs. |
Experiments on Word Alignment | (11), we also manually labeled a development set including 100 sentence pairs , in the same manner as the test set. |
Improving Statistical Bilingual Word Alignment | According to the BWA method, given a bilingual sentence pair E = e_1^I and F = f_1^J, the optimal |
Improving Statistical Bilingual Word Alignment | Thus, the collocation probability of the alignment sequence of a sentence pair can be calculated according to Eq. |
Improving Statistical Bilingual Word Alignment | model to calculate the word alignment probability of a sentence pair, as shown in Eq. |
Basics of ITG Parsing | Larger and larger span pairs are recursively built until the sentence pair is built. |
Basics of ITG Parsing | Figure 1(a) shows one possible derivation for a toy example sentence pair with three words in each sentence. |
Evaluation | The 491 sentence pairs in this dataset are adapted to our own Chinese word segmentation standard. |
Evaluation | 250 sentence pairs are used as training data and the other 241 are test data. |
The DITG Models | The MERT module for DITG takes alignment F-score of a sentence pair as the performance measure. |
The DITG Models | Given an input sentence pair and the reference annotated alignment, MERT aims to maximize the F-score of DITG-produced alignment. |
The DPDI Framework | Discriminative approaches to word alignment use manually annotated alignment for sentence pairs. |
The DPDI Framework | Discriminative pruning, however, handles not only a sentence pair but every possible span pair. |
Introduction | Word alignment is the task of identifying corresponding words in sentence pairs . |
Model Definition | Our bidirectional model G = (V, D) is a globally normalized, undirected graphical model of the word alignment for a fixed sentence pair (e, f). Each vertex in the vertex set V corresponds to a model variable V_i, and each undirected edge in the edge set D corresponds to a pair of variables (V_i, V_j). Each vertex has an associated potential function ω_i that assigns a real-valued potential to each possible value v_i of V_i. Likewise, each edge has an associated potential function μ_ij(v_i, v_j) that scores pairs of values. |
Model Definition | The highest-probability word alignment vector under the model for a given sentence pair (e, f) can be computed exactly using the standard Viterbi algorithm for HMMs in O(|e|^2 · |f|) time. |
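Viterbi decoding for an HMM word alignment can be sketched as follows; the jump and emission tables are toy numbers, and the O(|e|^2 · |f|) cost shows up as the two nested loops over target positions inside the loop over source words.

```python
import math

def viterbi_alignment(jump, emit):
    """Most probable alignment of each source word to a target position
    under an HMM: jump[i][i2] plays the role of P(a_j = i2 | a_{j-1} = i)
    and emit[j][i] the role of P(f_j | e_i). Toy probabilities only."""
    J, I = len(emit), len(emit[0])
    logp = [[-math.inf] * I for _ in range(J)]
    back = [[0] * I for _ in range(J)]
    for i in range(I):                     # uniform initial distribution
        logp[0][i] = math.log(emit[0][i] / I)
    for j in range(1, J):
        for i2 in range(I):
            best_i = max(range(I),
                         key=lambda i: logp[j - 1][i] + math.log(jump[i][i2]))
            logp[j][i2] = (logp[j - 1][best_i] + math.log(jump[best_i][i2])
                           + math.log(emit[j][i2]))
            back[j][i2] = best_i
    # Trace back the best path.
    path = [max(range(I), key=lambda i: logp[J - 1][i])]
    for j in range(J - 1, 0, -1):
        path.append(back[j][path[-1]])
    return list(reversed(path))

jump = [[0.7, 0.3], [0.3, 0.7]]              # monotone jumps preferred
emit = [[0.9, 0.1], [0.2, 0.8], [0.1, 0.9]]  # 3 source words x 2 target words
alignment = viterbi_alignment(jump, emit)
```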
Model Definition | Figure 1: The structure of our graphical model for a simple sentence pair. |
Model Inference | Moreover, the value of u is specific to a sentence pair . |
Model Inference | Memory requirements are virtually identical to the baseline: only u must be stored for each sentence pair as it is being processed, but it can then be immediately discarded once alignments are inferred. |
Methodology 2.1 The Problem | Its output is then double-checked and corrected by two experts in bilingual studies, resulting in a data set of 1747 1-1 and 70 1-0 or 0-1 sentence pairs. |
Methodology 2.1 The Problem | The horizontal axis is the similarity of English sentence pairs and the vertical is the similarity of the corresponding pairs in Chinese. |
Abstract | Sentence Filtering: Since we do not perform any boilerplate removal in earlier steps, there are many sentence pairs produced by the pipeline which contain menu items or other bits of text that are not useful to an SMT system. |
Abstract | To measure this, we conducted a manual analysis of 200 randomly selected sentence pairs for each of three language pairs. |
Abstract | Table 2: Manual evaluation of precision (by sentence pair) on the extracted parallel data for Spanish, French, and German (paired with English). |
Adaptive MT Quality Estimation | Our proposed method is as follows: we select a fixed set of sentence pairs (Sq, Rq) to train the QE model. |
Discussion and Conclusion | Another option is to select the sentence pairs from the MT system subsampled training data, which is more similar to the input document thus the trained QE model could be a better match to the input document. |
Document-specific MT System | Our parallel corpora include tens of millions of sentence pairs covering a wide range of topics. |
Document-specific MT System | The document-specific system is built based on sub-sampling: from the parallel corpora we select sentence pairs that are the most similar to the sentences from the input document, then build the MT system with the sub-sampled sentence pairs. |
Document-specific MT System | From the extracted sentence pairs, we utilize the standard pipeline in SMT system building: word align- |
Introduction | First, existing approaches to MT quality estimation rely on lexical and syntactical features defined over parallel sentence pairs , which includes source sentences, MT outputs and references, and translation models (Blatz et al., 2004; Ueffing and Ney, 2007; Specia et al., 2009a; Xiong et al., 2010; Soricut and Echihabi, 2010a; Bach et al., 2011). |
Static MT Quality Estimation | The high FM phrases are selected from sentence pairs which are closest in terms of n-gram overlap to the input sentence. |
Comparative Study | The interpretation is that given the sentence pair (f_1^7, e_1^7) and its alignment, the correct translation order is e1-f2, e2-f3, e3-f1, e4-f4, e5-f4, e6-f6-f7, e7-f5. Notice the bilingual units have been ordered according to the target side, as the decoder writes the translation in a left-to-right way. |
Comparative Study | After the operation in Figure 4 was done for all bilingual sentence pairs, we get a decoding sequence corpus. |
Experiments | Firstly, we delete the sentence pairs if the source sentence length is one. |
Experiments | Secondly, we delete the sentence pairs if the source sentence contains more than three contiguous unaligned words. |
Experiments | When this happens, the sentence pair is usually of low quality and hence not suitable for learning. |
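The two filtering rules above can be sketched as a single predicate; representing the alignment as a set of aligned source positions is an assumption made for illustration.

```python
def keep_pair(src_tokens, aligned_positions):
    """Apply the two filters described above: drop pairs whose source
    sentence has length one, and pairs with more than three contiguous
    unaligned source words."""
    if len(src_tokens) == 1:
        return False
    run = 0
    for j in range(len(src_tokens)):
        run = run + 1 if j not in aligned_positions else 0
        if run > 3:          # four or more contiguous unaligned words
            return False
    return True
```

A corpus cleaning pass would simply keep the pairs for which `keep_pair` returns True.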
Tagging-style Reordering Model | The transformation in Figure 1 is conducted for all the sentence pairs in the bilingual training corpus. |
Tagging-style Reordering Model | During the search, a sentence pair (f_1^J, e_1^I) will be formally split into a segmentation S_1^K which consists of K phrase pairs. |
Consensus Decoding Algorithms | 2.1 Minimum Bayes Risk over Sentence Pairs |
Consensus Decoding Algorithms | Algorithm 1 (MBR over Sentence Pairs): 1: A ← −∞; 2: for e ∈ E do; 3: A_e ← 0; 4: for e′ ∈ E do; 5: A_e ← A_e + P(e′|f) · S(e; e′); … |
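Algorithm 1 (MBR over sentence pairs) can be sketched in Python; the hypothesis list, posterior, and unigram-overlap similarity are toy stand-ins for a real n-best list, model posterior, and BLEU-like gain function.

```python
def mbr_decode(hypotheses, posterior, similarity):
    """Minimum-Bayes-risk (maximum expected similarity) decoding over a
    hypothesis list: pick the e maximizing sum over e' of
    P(e'|f) * S(e; e')."""
    best, best_e = float("-inf"), None
    for e in hypotheses:
        a_e = sum(posterior[e2] * similarity(e, e2) for e2 in hypotheses)
        if a_e > best:
            best, best_e = a_e, e
    return best_e

# Toy unigram-overlap similarity over a 3-best list.
def sim(e, e2):
    a, b = set(e.split()), set(e2.split())
    return len(a & b) / max(len(a | b), 1)

hyps = ["the cat sat", "a cat sat", "the dog ran"]
post = {"the cat sat": 0.4, "a cat sat": 0.35, "the dog ran": 0.25}
choice = mbr_decode(hyps, post, sim)
```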
Consensus Decoding Algorithms | [Table headers: MBR over Sentence Pairs; MBR over Features] |
Experimental Results | A phrase discovery procedure over word-aligned sentence pairs provides rule frequency counts, which are normalized to estimate features on rules. |
Experimental Results | The synchronous grammar rules are extracted from word aligned sentence pairs where the target sentence is annotated with a syntactic parse (Galley et al., 2004). |
Evaluation | (2013), features 200 sentence pairs that were rated for similarity by 43 annotators. |
Evaluation | We also consider a similar data set introduced by Grefenstette (2013), comprising 200 sentence pairs rated by 50 annotators. |
Evaluation | Evaluation is carried out by computing the Spearman correlation between the annotator similarity ratings for the sentence pairs and the cosines of the vectors produced by the various systems for the same sentence pairs . |
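The evaluation described here (Spearman correlation between annotator similarity ratings and system cosines) can be sketched with a stdlib rank-correlation routine; the ratings and cosines below are invented toy numbers.

```python
def spearman(xs, ys):
    """Spearman rank correlation via Pearson correlation of ranks
    (ties get average ranks); a stdlib stand-in for scipy.stats.spearmanr."""
    def ranks(vs):
        order = sorted(range(len(vs)), key=lambda i: vs[i])
        r = [0.0] * len(vs)
        i = 0
        while i < len(order):
            j = i
            while j + 1 < len(order) and vs[order[j + 1]] == vs[order[i]]:
                j += 1
            avg = (i + j) / 2 + 1          # average rank for a tie group
            for k in range(i, j + 1):
                r[order[k]] = avg
            i = j + 1
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)

# Annotator similarity ratings vs. system cosine scores (toy numbers).
ratings = [4.5, 3.0, 1.2, 2.8]
cosines = [0.91, 0.55, 0.10, 0.52]
rho = spearman(ratings, cosines)
```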
Experiments | We randomly selected a development set and a test set, and the remaining sentence pairs form the training set. |
Experiments | The remaining 28.3% of the sentence pairs are thus not adopted for generating training samples. |
Introduction | They first determine whether the extracted TM sentence pair should be adopted or not. |
Problem Formulation | is the final translation; [tm_s, tm_t, tm_f, s_a, tm_a] is the associated information of the best TM sentence pair; tm_s and tm_t denote the corresponding TM sentence pair; tm_f denotes its associated fuzzy match score (from 0.0 to 1.0); s_a is the editing operations between tm_s and s; and tm_a denotes the word alignment between tm_s and tm_t. |
Problem Formulation | Formula (3) is just the typical phrase-based SMT model, and the second factor P(M_k|L_k, z) (to be specified in Section 3) is the information derived from the TM sentence pair. |
Problem Formulation | useful information from the best TM sentence pair to guide SMT decoding. |
Conclusion | Table 2: Evaluation of phrase-based translation from German to English with the obtained alignments (for 100,000 sentence pairs). |
Introduction | The downside of our method is its resource consumption, but still we present results on corpora with 100,000 sentence pairs. |
See e. g. the author’s course notes (in German), currently | However, since we approximate expectations from the move and swap matrices, and hence by O((1 + J) · J) alignments per sentence pair, in the end we get a polynomial number of terms. |
See e. g. the author’s course notes (in German), currently | We use MOSES with a 5-gram language model (trained on 500,000 sentence pairs) and the standard setup in the MOSES Experiment Management System: training is run in both directions, the alignments are combined using diag-grow-final-and (Och and Ney, 2003), and the parameters of MOSES are optimized on 750 development sentences. |
Training the New Variants | sentence pairs s = 1, … |
Training the New Variants | This task is also needed for the actual task of word alignment (annotating a given sentence pair with an alignment). |
Experiments | After tokenization and filtering, this bilingual corpus contained 319,694 sentence pairs (7.9M tokens on |
Extraction of Paraphrase Rules | 3.2 Selecting Paraphrase Sentence Pairs |
Extraction of Paraphrase Rules | If the sentence in T2 has a higher BLEU score than the aligned sentence in T1, the corresponding sentences in S0 and S1 are selected as candidate paraphrase sentence pairs, which are used in the following steps of paraphrase extraction. |
Extraction of Paraphrase Rules | From the word-aligned sentence pairs, we then extract a set of rules that are consistent with the word alignments. |
Forward-Translation vs. Back-Translation | The aligned sentence pairs in (S0, S1) can be considered as paraphrases. |
Experiments | Here the training data consists of the non-UN portions and non-HK Hansards portions of the NIST training corpora distributed by the LDC, totalling 303k sentence pairs with 8m and 9.4m words of Chinese and English, respectively. |
Experiments | Overall there are 276k sentence pairs and 8.21m and 8.97m words in Arabic and English, respectively. |
Gibbs Sampling | Specifically we seek to infer the latent sequence of translation decisions given a corpus of sentence pairs . |
Gibbs Sampling | It visits each sentence pair in the corpus in a random order and resamples the alignments for each target position as follows. |
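The per-position resampling loop described here can be sketched roughly as follows; the uniform random initialization, the toy translation table `t_prob`, and the smoothing constant are illustrative assumptions, not the paper's actual model:

```python
import random

def gibbs_resample_alignments(corpus, t_prob, iterations=10, seed=0):
    """Gibbs-resample one alignment link per target position.

    corpus: list of (source_words, target_words) sentence pairs.
    t_prob: dict (source_word, target_word) -> translation probability.
    Each target position j holds one link a[j] to a source position,
    resampled from its conditional given everything else.
    """
    rng = random.Random(seed)
    # Illustrative assumption: initialize links uniformly at random.
    alignments = [[rng.randrange(len(src)) for _ in tgt]
                  for src, tgt in corpus]
    for _ in range(iterations):
        order = list(range(len(corpus)))
        rng.shuffle(order)          # visit sentence pairs in random order
        for idx in order:
            src, tgt = corpus[idx]
            a = alignments[idx]
            for j, t_word in enumerate(tgt):
                # Conditional over source positions for target word j
                # (smoothed so unseen pairs keep nonzero mass).
                weights = [t_prob.get((s, t_word), 1e-9) for s in src]
                r = rng.random() * sum(weights)
                acc = 0.0
                for i, w in enumerate(weights):
                    acc += w
                    if r <= acc:
                        a[j] = i
                        break
    return alignments
```

A real implementation would score each candidate link with the full model (including fertility and distortion terms) rather than a fixed lexical table.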
Model | Therefore, we introduce fertility to denote the number of target positions a source word is linked to in a sentence pair . |
Model | where φ_j is the fertility of source word f_j in the sentence pair <f_1^J, e_1^I> and p_b is the basic model defined in Eq.
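Computing fertilities from a set of alignment links is straightforward; this small helper (the names are our own) just counts how many target positions each source word is linked to:

```python
from collections import Counter

def fertilities(alignment, num_source_words):
    """Fertility of each source word: the number of target positions
    it is linked to in the sentence pair.

    alignment: set of (source_pos, target_pos) links.
    Returns a list indexed by source position.
    """
    counts = Counter(i for i, _ in alignment)
    return [counts.get(i, 0) for i in range(num_source_words)]
```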
Experiment | Table 1: The sentence pairs used in each data set.
Experiment | The parameter ’samps’ is set to 5, which indicates 5 samples are generated for a sentence pair . |
Experiment | However, if most domains are similar (FBIS data set) or if there are enough parallel sentence pairs (NIST data set) in each domain, then the translation performance is almost the same even with the opposite integration order.
Introduction | Since SMT systems tend to employ very large scale training data for translation knowledge extraction, the effect of updating with only a few sentence pairs at a time would be drowned out by the existing corpus.
Phrase Pair Extraction with Unsupervised Phrasal ITGs | ITG is a synchronous grammar formalism which analyzes bilingual text by introducing inverted rules, and each ITG derivation corresponds to the alignment of a sentence pair (Wu, 1997). |
Phrase Pair Extraction with Unsupervised Phrasal ITGs | Figure 1(b) illustrates an example of the phrasal ITG derivation for the word alignment in Figure 1(a), in which a bilingual sentence pair is recursively divided in two through the recursively defined generative story.
Bilingual NER by Agreement | The inputs to our models are parallel sentence pairs (see Figure 1 for an example in English and |
Bilingual NER by Agreement | Since we assume no bilingually annotated NER corpus is available, in order to get an estimate of the PMI scores, we first tag a collection of unannotated bilingual sentence pairs using the monolingual CRF taggers, and collect counts of aligned entity pairs from this auto- generated tagged data. |
Error Analysis and Discussion | In this example, a snippet of a longer sentence pair is shown with NER and word alignment results. |
Experimental Setup | After discarding sentences with no aligned counterpart, a total of 402 documents and 8,249 parallel sentence pairs were used for evaluation. |
Experimental Setup | Word alignment evaluation is done over the sections of OntoNotes that have matching gold-standard word alignment annotations from the GALE Y1Q4 dataset. This subset contains 288 documents and 3,391 sentence pairs .
Experimental Setup | An extra set of 5,000 unannotated parallel sentence pairs are used for |
Conclusions and Discussions | In this paper, we applied a graph-based SSL algorithm to improve the performance of the QA task by exploiting unlabeled entailment relations between affirmed question and candidate sentence pairs .
Experiments | We also used a set of 340 QA-type sentence pairs from RTE02-03 and 195 pairs from RTE04 by converting the hypothesis sentences into question form to create an additional set of q/a pairs.
Experiments | Each of these headline-candidate sentence pairs is used as additional unlabeled q/a pair. |
Experiments | This is due to the fact that we establish two separate entailment models for copula and non-copula q/a sentence pairs , which enables extracting useful information and a better representation of the specific data.
Feature Extraction for Entailment | (2) (QComp) Question component match features: The sentence component analysis is applied on both the affirmed question and the candidate sentence pairs to characterize their semantic components including subject(S), object(O), head (H) and modifiers(M). |
Graph Based Semi-Supervised Learning for Entailment Ranking | Let each data point x_i in X = {x_1, ..., x_n} represent information about a question and candidate sentence pair and Y = {y_1, ..., y_n} be their output labels.
Corpus Data and Baseline SMT | The SMT parallel training corpus contains approximately 773K sentence pairs (7.3M English words). |
Corpus Data and Baseline SMT | Our phrase-based decoder is similar to Moses (Koehn et al., 2007) and uses the phrase pairs and target LM to perform beam search stack decoding based on a standard log-linear model, the parameters of which were tuned with MERT (Och, 2003) on a held-out development set (3,534 sentence pairs , 45K words) using BLEU as the tuning metric. |
Corpus Data and Baseline SMT | Finally, we evaluated translation performance on a separate, unseen test set (3,138 sentence pairs , 38K words). |
Experiments | The training corpus includes 1,686,458 sentence pairs . |
Experiments | For example, in the following sentence pair : “[Chinese text] (in accordance with the tripartite agreement reached by China, Laos and the UNHCR on ...)”, even though the tagger can successfully label “[Chinese name]/UNHCR” as an organization because it is a common Chinese name, English features based on previous GPE contexts still incorrectly predicted “UNHCR” as a GPE name.
Name-aware MT | Given a parallel sentence pair we first apply Giza++ (Och and Ney, 2003) to align words, and apply this join- |
Name-aware MT | For example, given the following sentence pair: |
Name-aware MT | Both sentence pairs are kept in the combined data to build the translation model. |
Experiments | The parallel training data comprises 9.6M sentence pairs (206M Chinese and 228M English words).
Hard rule labeling from word classes | (2003) to provide us with a set of phrase pairs for each sentence pair in the training corpus, annotated with their respective start and end positions in the source and target sentences. |
Hard rule labeling from word classes | Consider the target-tagged example sentence pair: |
Hard rule labeling from word classes | Intuitively, the labeling of initial rules with tags marking the boundary of their target sides results in complex rules whose nonterminal occurrences impose weak syntactic constraints on the rules eligible for substitution in a PSCFG derivation: The left and right boundary word tags of the inserted rule’s target side have to match the respective boundary word tags of the phrase pair that was replaced by a nonterminal when the complex rule was created from a training sentence pair . |
Experiments | To train our Two-Neighbor Orientation model, we select a subset of 5 million aligned sentence pairs . |
Training | For each aligned sentence pair (F, E, N) in the training data, the training starts with the identification of the regions in the source sentences as anchors (A). |
Two-Neighbor Orientation Model | Given an aligned sentence pair Θ = (F, E, N), let A(Θ) be all possible chunks that can be extracted from Θ according to:
Two-Neighbor Orientation Model | Figure 1: An aligned Chinese-English sentence pair . |
Two-Neighbor Orientation Model | To be more concrete, let us consider an aligned sentence pair in Fig. |
Evaluation for SS | The two data sets we know of for SS are: 1. the human-rated sentence pair similarity data set (Li et al., 2006) [LI06]; 2. the Microsoft Research Paraphrase Corpus (Dolan et al., 2004) [MSR04].
Evaluation for SS | On the other hand, the MSR04 data set comprises a much larger set of sentence pairs : 4,076 training and 1,725 test pairs. |
Evaluation for SS | This is not a problem per se, however the issue is that it is very strict in its assignment of a positive label, for example the following sentence pair as cited in (Islam and Inkpen, 2008) is rated not semantically similar: Ballmer has been vocal in the past warning that Linux is a threat to Microsoft. |
Experiments and Results | Note that r and ρ are much lower for the 65 pairs set, since most of the sentence pairs have a very low similarity (the average similarity value is 0.065 in the 65 pairs set and 0.367 in the 30 pairs set) and SS models need to identify the tiny differences among them, thereby rendering this set much harder to predict.
Experiments and Results | We use the same parameter setting used for the LI06 evaluation setting since both sets are human-rated sentence pairs (λ = 20, μ = 0.01, K = 100).
Adding agreement constraints | For each sentence pair instance x = (s, t), we find the posterior
Introduction | The typical pipeline for a machine translation (MT) system starts with a parallel sentence-aligned corpus and proceeds to align the words in every sentence pair . |
Statistical word alignment | Figure 2 shows two examples of word alignment of a sentence pair . |
Word alignment results | Figure 2 shows two machine-generated alignments of a sentence pair . |
Word alignment results | The figure illustrates a problem known as garbage collection (Brown et al., 1993), where rare source words tend to align to many target words, since the probability mass of the rare word translations can be hijacked to fit the sentence pair . |
Inflection prediction models | Figure 1 shows an example of an aligned English-Russian sentence pair : on the source (English) side, POS tags and word dependency structure are indicated by solid arcs. |
Inflection prediction models | Figure 1: Aligned English-Russian sentence pair with syntactic and morphological annotation.
MT performance results | The judges were given the reference translations but not the source sentences, and were asked to classify each sentence pair into three categories: (1) the baseline system is better (score=-1), (2) the output of our model is better (score=1), or (3) they are of the same quality (score=0).
Machine translation systems and data | These aligned sentence pairs form the training data of the inflection models as well. |
Machine translation systems and data | Dataset sent pairs word tokens (avg/sent) English-Russian |
Substructure Spaces for BTKs | Chinese/English: # of sentence pairs 5000; Avg.
Substructure Spaces for BTKs | We randomly select 300 bilingual sentence pairs from the Chinese-English FBIS corpus with length ≤ 30 on both the source and target sides.
Substructure Spaces for BTKs | The selected plain sentence pairs are further parsed by Stanford parser (Klein and Manning, 2003) on both the English and Chinese sides. |
Experimental Results | The MT training data includes 2 million sentence pairs from the parallel corpora released by |
Experimental Results | We append a 300-sentence set, which we have human hand alignment available as reference, to the 2M training sentence pairs before running GIZA++. |
Integrating Empty Categories in Machine Translation | Table 3 lists some of the most frequent English words aligned to *pro* or *PRO* in a Chinese-English parallel corpus with 2M sentence pairs .
Introduction | A sentence pair observed in the real data is shown in Figure 1 along with the word alignment obtained from an automatic word aligner, where the English subject pronoun |
Model | This way we don’t insist on a single tiling of phrases for a sentence pair , but explicitly model the set of hierarchically nested phrases as defined by an ITG derivation. |
Model | S → sig(t)   sig(t) → yield(t)   For every word pair e/f in the sentence pair , …
Model | This process is then repeated for each sentence pair in the corpus in a random order. |
Related Work | additional constraints on how phrase-pairs can be tiled to produce a sentence pair , and moreover, we seek to model the embedding of phrase-pairs in one another, something not considered by this prior work. |
A Generic Phrase Training Procedure | 1: Train Model-1 and HMM word alignment models 2: for all sentence pairs (e, f) do
A Generic Phrase Training Procedure | Phrase pair filtering is simply thresholding on the final score by comparing to the maximum within the sentence pair . |
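The relative thresholding described here might look like the following sketch, where the threshold `ratio` is a hypothetical parameter rather than the paper's setting:

```python
def filter_phrase_pairs(scored_pairs, ratio=0.5):
    """Keep phrase pairs whose score is within `ratio` of the best
    score found in the same sentence pair.

    scored_pairs: list of (src_phrase, tgt_phrase, score) tuples
    extracted from ONE sentence pair.
    """
    if not scored_pairs:
        return []
    best = max(score for _, _, score in scored_pairs)
    # Compare each score to the maximum within the sentence pair.
    return [(s, t, sc) for s, t, sc in scored_pairs if sc >= ratio * best]
```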
Features | Given a phrase pair in a sentence pair , there will be many generative paths that align the source phrase to the target phrase. |
Features | Given a sentence pair , the basic assumption is that if the HMM word alignment model can align an English phrase well to a foreign phrase, the posterior distribution of the English phrase generating all foreign phrases on the other side is significantly biased. |
Experiments | We extracted both development and test data set from years of NIST Chinese-to-English evaluation data by filtering out sentence pairs not containing measure words. |
Experiments | The development set is extracted from NIST evaluation data from 2002 to 2004, and the test set consists of sentence pairs from NIST evaluation data from 2005 to 2006. |
Experiments | There are 759 testing cases for measure word generation in our test data consisting of 2746 sentence pairs . |
Model Training and Application 3.1 Training | We ran GIZA++ (Och and Ney, 2000) on the training corpus in both directions with IBM model 4, and then applied the refinement rule described in (Koehn et al., 2003) to obtain a many-to-many word alignment for each sentence pair . |
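The refinement step that combines the two directional GIZA++ alignments can be sketched as follows; this is a simplified rendering of the grow-diag-final-and heuristic, not the exact Moses implementation:

```python
def grow_diag_final_and(e2f, f2e):
    """Symmetrize two directional word alignments into one many-to-many
    alignment, in the spirit of the refinement heuristic of Koehn et
    al. (2003).  e2f and f2e are sets of (e_pos, f_pos) links.
    """
    neighbors = [(-1, 0), (0, -1), (1, 0), (0, 1),
                 (-1, -1), (-1, 1), (1, -1), (1, 1)]
    alignment = set(e2f & f2e)      # start from the intersection
    union = e2f | f2e
    # grow-diag: repeatedly add union links adjacent to the current
    # alignment whenever one of the two words is still unaligned.
    added = True
    while added:
        added = False
        for e, f in sorted(alignment):
            for de, df in neighbors:
                cand = (e + de, f + df)
                if cand in union and cand not in alignment:
                    e_free = all(pe != cand[0] for pe, _ in alignment)
                    f_free = all(pf != cand[1] for _, pf in alignment)
                    if e_free or f_free:
                        alignment.add(cand)
                        added = True
    # final-and: add remaining union links whose words are BOTH unaligned.
    for e, f in sorted(union - alignment):
        e_free = all(pe != e for pe, _ in alignment)
        f_free = all(pf != f for _, pf in alignment)
        if e_free and f_free:
            alignment.add((e, f))
    return alignment
```

Starting from the intersection keeps precision high; the grow and final steps then recover recall from the union, which is why this refinement yields the many-to-many alignments used for phrase extraction.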
Abstract | Using the presented method, we extract over 1,000,000 pairs of paraphrase patterns from 2M bilingual sentence pairs , the precision of which exceeds 67%. |
Conclusion | Experimental results show that the pivot approach is effective, which extracts over 1,000,000 pairs of paraphrase patterns from 2M bilingual sentence pairs . |
Introduction | Using the proposed approach, we extract over 1,000,000 pairs of paraphrase patterns from 2M bilingual sentence pairs , the precision of which is above 67%. |
Related Work | Thus the method acquired paraphrase patterns from sentence pairs that share comparable NEs. |
Approach | Figure 1(b) shows an aligned sentence pair example where dependencies are perfectly conserved across the alignment. |
Approach | For example, in some sentence pair we might find 10 edges that have both end points aligned and can be transferred. |
Experiments | Our basic model uses constraints of the form: the expected proportion of conserved edges in a sentence pair is at least η = 90%.
Posterior Regularization | In grammar transfer, our basic constraint is of the form: the expected proportion of conserved edges in a sentence pair is at least η (the exact proportion we used was 0.9, which was determined using unlabeled data as described in Section 5).
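For a hard, one-to-one alignment, the quantity this constraint bounds in expectation can be computed as in the sketch below; the function name and the one-to-one restriction are our simplifications of the posterior-regularization setting:

```python
def conserved_edge_proportion(src_edges, tgt_edges, alignment):
    """Proportion of transferable source dependency edges that are
    conserved: both endpoints are aligned, and the aligned target
    words are also in a head-modifier relation.

    src_edges, tgt_edges: sets of (head_pos, modifier_pos) pairs.
    alignment: dict source_pos -> target_pos (one-to-one sketch).
    """
    transferable = [(h, m) for h, m in src_edges
                    if h in alignment and m in alignment]
    if not transferable:
        return 0.0
    conserved = sum((alignment[h], alignment[m]) in tgt_edges
                    for h, m in transferable)
    return conserved / len(transferable)
```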
A Unified Semantic Representation | Commonly, semantic comparisons are between word pairs or sentence pairs that do not have their lexical content sense-annotated, despite the potential utility of sense annotation in making semantic comparisons. |
Experiment 1: Textual Similarity | As our benchmark, we selected the recent SemEval-2012 task on Semantic Textual Similarity (STS), which was concerned with measuring the semantic similarity of sentence pairs . |
Experiment 1: Textual Similarity | Each sentence pair in the datasets was given a score from 0 to 5 (low to high similarity) by human judges, with a high inter-annotator agreement of around 0.90 when measured using the Pearson correlation coefficient. |
Experiment 1: Textual Similarity | Table 1 lists the number of sentence pairs in training and test portions of each dataset. |
Conclusion | We used about 70k sentence pairs for CE model training, while Wang et al.
Conclusion | (2008) used about 100k sentence pairs , a CE translation dictionary and more monolingual corpora for model training. |
Experiments | For English-Spanish translation, we selected 400k sentence pairs from the Europarl corpus that are close to the English parts of both the BTEC CE corpus and the BTEC ES corpus. |
Using RBMT Systems for Pivot Translation | Another way to use the synthetic multilingual corpus is to add the source-pivot or pivot-target sentence pairs in this corpus to the training data to rebuild the source-pivot or pivot-target SMT model. |
Experiments | They are neither parallel nor comparable because we cannot even extract a small number of parallel sentence pairs from this monolingual data using the method of (Munteanu and Marcu, 2006). |
Experiments | For the out-of-domain data, we build the phrase table and reordering table using the 2.08 million Chinese-to-English sentence pairs , and we use the SRILM toolkit (Stolcke, 2002) to train the 5-gram English language model with the target part of the parallel sentences and the Xinhua portion of the English Gigaword. |
Phrase Pair Refinement and Parameterization | For each entry in LLR-lex, such as (的, of), we can learn two kinds of information from the out-of-domain word-aligned sentence pairs : one is whether the target translation is before or after the translation of the preceding source-side word (Order); the other is whether the target translation is adjacent to the translation of the preceding source-side word (Adjacency).
Related Work | For the target-side monolingual data, they just use it to train the language model; for the source-side monolingual data, they employ a baseline (word-based SMT or phrase-based SMT trained with a small-scale bitext) to first translate the source sentences, combine each source sentence and its target translation as a bilingual sentence pair, and then train a new phrase-based SMT system with these pseudo sentence pairs .
Evaluation | This corpus consists of 1370 sentence pairs that were manually created from transcribed Broadcast News stories. |
Evaluation | a pattern of compression used many times in the BNC in sentence pairs such as “NPR’s Anne Garrels reports” / “Anne Garrels reports”.
Sentence compression | An example sentence pair , which we use as a running example, is the following: |
The STSG Model | See Figure 1 for an example of how an STSG with these rules would operate in synchronously generating our example sentence pair .
Experiments | For all language pairs we employ 200K and 400K sentence pairs for training, 2K for development and 2K for testing (single reference per source sentence). |
Experiments | Table 1 presents the results for the baseline and our method for the 4 language pairs, for training sets of both 200K and 400K sentence pairs . |
Experiments | In addition, increasing the size of the training data from 200K to 400K sentence pairs widens the performance margin between the baseline and our system, in some cases considerably. |
Experiments and Results | The training data contains 81k sentence pairs , 655K Chinese words and 806K English words. |
Model Training | The training samples for RAE are phrase pairs {s1, s2} in the translation table, where s1 and s2 can form a continuous partial sentence pair in the training data.
Model Training | Forced decoding performs sentence pair segmentation using the same translation system as decoding. |
Model Training | For each sentence pair in the training data, SMT decoder is applied to the source side, and any candidate which is not the partial substring of the target sentence is removed from the n-best list during decoding. |
Experiments | PWKP contains 108,016/114,924 complex/simple sentence pairs .
Simplification Framework | (2010); and build training graphs (Figure 2) from the complex-simple sentence pairs in the training data.
Simplification Framework | Each training graph represents a complex-simple sentence pair and consists of two types of nodes: major nodes (M-nodes) and operation nodes (O-nodes). |
Conclusion | Moreover, our dependency-based pre-ordering rule set substantially decreased the time for applying pre-ordering rules by about 60% compared with WR07, on the training set of 1M sentence pairs .
Experiments | Our development set was the official NIST MT evaluation data from 2002 to 2005, consisting of 4476 Chinese-English sentence pairs .
Experiments | Our test set was the NIST 2006 MT evaluation data, consisting of 1664 sentence pairs . |
Complexity Analysis | The computational complexity of our method is linear in the number of iterations, the size of the corpus, and the complexity of calculating the expectations on each sentence or sentence pair . |
Complexity Analysis | This was verified by experiments on a corpus of 1-million sentence pairs on which traditional MCMC approaches would struggle (Xu et al., 2008). |
Methods | (F, E) a bilingual sentence pair
Introduction | Specifically, we show that we can significantly improve reordering performance by using a large number of sentence pairs for which manual word alignments are not available. |
Reordering model | In this paper we focus on the case where in addition to using a relatively small number of manual word aligned sentences to derive the reference permutations 77* used to train our model, we would like to use more abundant but noisier machine aligned sentence pairs . |
Results and Discussions | We use H to refer to the manually word aligned data and U to refer to the additional sentence pairs for which manual word alignments are not available. |
Introduction | For the Chinese-to-English task, the training data is the FBIS corpus (news domain) with about 240k sentence pairs ; the development set is the NIST02 evaluation data; the development test set is NIST05; and the test datasets are NIST06 and NIST08.
Introduction | For the Japanese-to-English task, the training data with 300k sentence pairs is from the NTCIR patent task (Fujii et al., 2010); the development set, development test set, and two test sets are evenly extracted from a given development set with 4000 sentences, and these four datasets are called test1, test2, test3 and test4, respectively.
Introduction | We run GIZA++ (Och and Ney, 2000) on the training corpus in both directions (Koehn et al., 2003) to obtain the word alignment for each sentence pair . |
Introduction | Table 1 shows the n-gram overlap proportions in a sentence aligned data set of 137K sentence pairs from aligning Simple English Wikipedia and English Wikipedia articles (Coster and Kauchak, 2011a). The data highlights two conflicting views: does the benefit of additional data outweigh the problem of the source of the data?
Introduction | On the other hand, there is still only modest overlap between the sentences for longer n-grams, particularly given that the corpus is sentence-aligned and that 27% of the sentence pairs in this aligned data set are identical. |
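N-gram overlap proportions of the kind reported in Table 1 can be computed as follows; this sketch measures n-gram type overlap of the simple side against the normal side, which is one of several reasonable definitions:

```python
def ngram_overlap(simple_sents, normal_sents, n):
    """Proportion of n-gram types on the simple side that also occur
    on the normal side of a sentence-aligned corpus."""
    def ngrams(sents):
        grams = set()
        for sent in sents:
            toks = sent.split()
            grams.update(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
        return grams
    simple, normal = ngrams(simple_sents), ngrams(normal_sents)
    if not simple:
        return 0.0
    return len(simple & normal) / len(simple)
```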
Why Does Unsimplified Data Help? | The resulting data set contains 150K aligned simple-normal sentence pairs . |
Experiment | So approximately 2.05 million sentence pairs consisting of approximately 54 million |
Experiment | And approximately 0.49 million sentence pairs consisting of 14.9 million Chinese tokens whose lexicon size was 169k and 16.3 million English tokens whose lexicon size was 240k were used for CE. |
Experiment | Our distortion model was trained as follows: We used 0.2 million sentence pairs and their word alignments from the data used to build the translation model as the training data for our distortion models. |
Experiments | Most training subcorpora consist of parallel sentence pairs . |
Introduction | The resulting bilingual sentence pairs are then used as additional training data (Ueffing et al., 2007; Chen et al., 2008; Schwenk, 2008; Bertoldi and Federico, 2009). |
Introduction | Data selection approaches (Zhao et al., 2004; Hildebrand et al., 2005; Lu et al., 2007; Moore and Lewis, 2010; Axelrod et al., 2011) search for bilingual sentence pairs that are similar to the in-domain “dev” data, then add them to the training data. |
Theoretical Model | In this manner we obtain sentence pairs like the one shown in Figure 3. |
Theoretical Model | To these sentence pairs we apply the rule extraction method of Maletti (2011). |
Theoretical Model | The rules extracted from the sentence pair of Figure 3 are shown in Figure 4. |
Experiments | We train our model on a dataset with ~1.5M sentence pairs from the LDC dataset. We use the 2002 NIST MT evaluation test data (878 sentence pairs) as the development data, and the 2003, 2004, 2005, 2006-news NIST MT evaluation test data (919, 1788, 1082, and 616 sentence pairs , respectively) as the test data.
Head-Driven HPB Translation Model | For rule extraction, we first identify initial phrase pairs on word-aligned sentence pairs by using the same criterion as most phrase-based translation models (Och and Ney, 2004) and Chiang’s HPB model (Chiang, 2005; Chiang, 2007). |
Introduction | Figure 1: An example word alignment for a Chinese-English sentence pair with the dependency parse tree for the Chinese sentence. |
Introduction | Starting with the classic IBM work (Brown et al., 1993), training has been viewed as a maximization problem involving hidden word alignments (a) that are assumed to underlie observed sentence pairs |
Machine Translation as a Decipherment Task | (1993) provide an efficient algorithm for training IBM Model 3 translation model when parallel sentence pairs are available. |
Machine Translation as a Decipherment Task | We see that deciphering with 10k monolingual Spanish sentences yields the same performance as training with around 200-500 parallel English/Spanish sentence pairs . |
Experiments | It contains 239K sentence pairs with about 6.9M/8.9M words in Chinese/English.
Experiments | The alignment matrices for sentence pairs are generated according to (Liu et al., 2009).
Projected Classification Instance | Suppose a bilingual sentence pair composed of a source sentence e and its target translation f. y_e is the parse tree of the source sentence.
Experimental Setup | We trained the system on the NIST MT06 Eval corpus excluding the UN data (approximately 900K sentence pairs ). |
Experimental Setup | We trained the system on a subset of 950K sentence pairs from the NIST MT08 training data, selected by |
Experimental Setup | The subsampling algorithm selects sentence pairs from the training data in a way that seeks reasonable representation for all n-grams appearing in the test set. |
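A greedy version of such coverage-driven subsampling might look like this; the bigram unit and the `max_count` cap are assumptions, not the cited system's exact settings:

```python
from collections import Counter

def subsample(bitext, test_ngrams, max_count=10, n=2):
    """Greedy subsampling: keep a sentence pair if its source side
    contains a test-set n-gram that has been seen fewer than
    `max_count` times in the pairs kept so far.
    """
    seen = Counter()
    kept = []
    for src, tgt in bitext:
        toks = src.split()
        grams = [tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)]
        # n-grams from the test set that still need more coverage
        needed = [g for g in grams if g in test_ngrams and seen[g] < max_count]
        if needed:
            kept.append((src, tgt))
            for g in needed:
                seen[g] += 1
    return kept
```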
Expt. 1: Predicting Absolute Scores | Each language consists of 1500–2800 sentence pairs produced by 7–15 MT systems.
Expt. 1: Predicting Absolute Scores | RTER has a rather flat learning curve that climbs to within 2 points of the final correlation value for 20% of the training set (about 400 sentence pairs ).
Textual Entailment vs. MT Evaluation | The average total runtime per sentence pair is 5 seconds on an AMD 2.6GHz Opteron core — efficient enough to perform regular evaluations on development and test sets. |
Experiments | Table 4: Comparison of rule extraction time (seconds/1000 sentence pairs ) and decoding time (second/sentence)
Experiments | Table 4 gives the rule extraction time (seconds/1000 sentence pairs ) and decoding time (second/sentence) with varying pruning thresholds.
Experiments | the new training corpus contained about 260K sentence pairs with 7.39M Chinese words and 9.41M English words. |
Cohesive Phrasal Output | Previous approaches to measuring the cohesion of a sentence pair have worked with a word alignment (Fox, 2002; Lin and Cherry, 2003). |
Experiments | We test our cohesion-enhanced Moses decoder trained using 688K sentence pairs of Europarl French-English data, provided by the SMT 2006 Shared Task (Koehn and Monz, 2006). |
Experiments | Comparing the baseline with and without the soft cohesion constraint, we see that cohesion has only a modest effect on BLEU, when measured on all sentence pairs , with improvements ranging between 0.2 and 0.5 absolute points. |