Index of papers in Proc. ACL 2014 that mention
  • sentence pairs
Chang, Yin-Wen and Rush, Alexander M. and DeNero, John and Collins, Michael
Abstract
The algorithm finds provably exact solutions on 86% of sentence pairs and shows improvements over directional models.
Conclusion
optimal score, averaged over all sentence pairs.
Conclusion
The plot shows an average over all sentence pairs.
Conclusion
We can construct a sentence pair in which I = J = N and e-alignments have infinite cost.
Experiments
Following past work, the first 150 sentence pairs of the training section are used for evaluation.
Experiments
With incremental constraints and pruning, we are able to solve over 86% of sentence pairs, including many longer and more difficult pairs.
Experiments
Table 3: The average number of constraints added for sentence pairs where Lagrangian relaxation is not able to find an exact solution.
Introduction
• Empirically, it is able to find exact solutions on 86% of sentence pairs and is significantly faster than general-purpose solvers.
"sentence pairs" is mentioned in 11 sentences in this paper.
Topics mentioned in this paper:
Cui, Lei and Zhang, Dongdong and Liu, Shujie and Chen, Qiming and Li, Mu and Zhou, Ming and Yang, Muyun
Experiments
These documents are built into an inverted index using Lucene, so that they can be efficiently retrieved for the parallel sentence pairs.
Experiments
In the fine-tuning phase, for each parallel sentence pair, we randomly select ten other sentence pairs that satisfy the criterion as negative instances.
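The negative-sampling step this snippet describes can be sketched as follows. The paper's selection criterion for negatives is not given in the snippet, so this hypothetical `sample_negatives` simply draws uniformly from the remaining pairs; all names are illustrative.

```python
import random

def sample_negatives(pairs, index, k=10, seed=0):
    """For the parallel pair at `index`, pick k other pairs as negatives.

    A minimal sketch: the paper's filtering criterion is replaced by
    uniform sampling over the remaining pairs (an assumption).
    """
    rng = random.Random(seed)
    candidates = [p for i, p in enumerate(pairs) if i != index]
    return rng.sample(candidates, min(k, len(candidates)))
```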
Experiments
In total, the datasets contain nearly 1.1 million sentence pairs.
Topic Similarity Model with Neural Network
Given a parallel sentence pair (f, e), the first step is to treat f and e as queries, and use IR methods to retrieve relevant documents to enrich contextual information for them.
Topic Similarity Model with Neural Network
Therefore, in this stage, parallel sentence pairs are used to help connect the vectors from different languages, because they express the same topic.
Topic Similarity Model with Neural Network
Given a parallel sentence pair (f, e), the DAE learns representations for f and e respectively, as zf = g(f) and ze = g(e) in Figure 1.
"sentence pairs" is mentioned in 13 sentences in this paper.
Topics mentioned in this paper:
Liu, Le and Hong, Yu and Liu, Hao and Wang, Xing and Yao, Jianmin
Abstract
Most current data selection methods solely use language models trained on a small scale in-domain data to select domain-relevant sentence pairs from general-domain parallel corpus.
Abstract
By contrast, we argue that the relevance between a sentence pair and target domain can be better evaluated by the combination of language model and translation model.
Abstract
When the selected sentence pairs are evaluated on an end-to-end MT task, our methods can increase the translation performance by 3 BLEU points.
Introduction
For this, an effective approach is to automatically select and expand domain-specific sentence pairs from a large-scale general-domain parallel corpus.
Introduction
Current data selection methods mostly use language models trained on small scale in-domain data to measure domain relevance and select domain-relevant parallel sentence pairs to expand training corpora.
Introduction
Meanwhile, the translation model measures the translation probability of a sentence pair, which is used to verify the parallelism of the selected domain-relevant bitext.
Related Work
(2010) ranked the sentence pairs in the general-domain corpus according to the perplexity scores of sentences, which are computed with respect to in-domain language models.
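The perplexity-based ranking this snippet describes can be sketched with a toy model. `train_unigram`, `perplexity`, and `rank_by_perplexity` are illustrative names, and the add-one-smoothed unigram LM stands in for the full in-domain language models used in the cited work.

```python
import math
from collections import Counter

def train_unigram(sentences):
    # Maximum-likelihood unigram LM with add-one smoothing.
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # +1 slot for unseen words
    return lambda w: (counts[w] + 1) / (total + vocab)

def perplexity(lm, sentence):
    # Per-word perplexity of a sentence under the LM.
    words = sentence.split()
    logp = sum(math.log(lm(w)) for w in words)
    return math.exp(-logp / len(words))

def rank_by_perplexity(in_domain, general):
    # Rank general-domain sentences by perplexity under the in-domain LM
    # (lower perplexity = more domain-relevant).
    lm = train_unigram(in_domain)
    return sorted(general, key=lambda s: perplexity(lm, s))
```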
Related Work
Although previous works in data selection (Duh et al., 2013; Koehn and Haddow, 2012; Axelrod et al., 2011; Foster et al., 2010; Yasuda et al., 2008) have gained good performance, the methods which only adopt language models to score the sentence pairs are suboptimal.
Related Work
The reason is that a sentence pair contains a source language sentence and a target language sentence, while the existing methods are incapable of evaluating the mutual translation probability of a sentence pair in the target domain.
Training Data Selection Methods
We present three data selection methods for ranking and selecting domain-relevant sentence pairs from general-domain corpus, with an eye towards improving domain-specific translation model performance.
Training Data Selection Methods
However, in this paper, we adopt the translation model to evaluate the translation probability of a sentence pair and develop a simple but effective variant of the translation model to rank the sentence pairs in the general-domain corpus.
"sentence pairs" is mentioned in 35 sentences in this paper.
Topics mentioned in this paper:
van Gompel, Maarten and van den Bosch, Antal
Data preparation
2. for each aligned sentence pair (sentence_s ∈ S_s, sentence_t ∈ S_t) in the parallel corpus split (88,875):
Data preparation
The output of the algorithm in Figure 1 is a modified set of sentence pairs (sentence_s, sentence_t), in which the same sentence pair may be used multiple times with different L1 substitutions for different fragments.
Evaluation
If output o is a subset of reference r, then a fractional score is assigned for that sentence pair.
Evaluation
The word accuracy for the entire set is then computed by taking the sum of the word accuracies per sentence pair, divided by the total number of sentence pairs.
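The set-level word accuracy described here is a plain average of per-pair scores; a minimal sketch, where the per-pair positional word matching is a simplifying assumption and not the paper's exact per-pair scoring:

```python
def word_accuracy(outputs, references):
    # Average of per-sentence-pair word accuracies. Each pair's accuracy
    # is taken as the fraction of reference tokens matched at the same
    # position (an illustrative simplification of the per-pair score).
    scores = []
    for out, ref in zip(outputs, references):
        matches = sum(o == r for o, r in zip(out, ref))
        scores.append(matches / len(ref) if ref else 0.0)
    return sum(scores) / len(scores) if scores else 0.0
```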
Experiments & Results
The data for our experiments were drawn from the Europarl parallel corpus (Koehn, 2005) from which we extracted two sets of 200,000 sentence pairs each for several language pairs.
Experiments & Results
The final test sets are a randomly sampled 5,000 sentence pairs from the 200,000-sentence test split for each language pair.
Experiments & Results
English fallback in a Spanish context, consists of 5,608,015 sentence pairs.
"sentence pairs" is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Huang, Fei and Xu, Jian-Ming and Ittycheriah, Abraham and Roukos, Salim
Adaptive MT Quality Estimation
Our proposed method is as follows: we select a fixed set of sentence pairs (Sq, Rq) to train the QE model.
Discussion and Conclusion
Another option is to select the sentence pairs from the MT system's subsampled training data, which is more similar to the input document, so that the trained QE model could be a better match for the input document.
Document-specific MT System
Our parallel corpora include tens of millions of sentence pairs covering a wide range of topics.
Document-specific MT System
The document-specific system is built based on sub-sampling: from the parallel corpora we select sentence pairs that are the most similar to the sentences from the input document, then build the MT system with the sub-sampled sentence pairs.
Document-specific MT System
From the extracted sentence pairs, we utilize the standard pipeline in SMT system building: word alignment,
Introduction
First, existing approaches to MT quality estimation rely on lexical and syntactical features defined over parallel sentence pairs, which include source sentences, MT outputs and references, and translation models (Blatz et al., 2004; Ueffing and Ney, 2007; Specia et al., 2009a; Xiong et al., 2010; Soricut and Echihabi, 2010a; Bach et al., 2011).
Static MT Quality Estimation
The high FM phrases are selected from sentence pairs which are closest in terms of n-gram overlap to the input sentence.
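The n-gram-overlap selection this snippet describes can be sketched as follows; `ngram_overlap` and `closest_pairs` are illustrative names, and bigram overlap against the source side stands in for whatever exact similarity the system uses.

```python
def ngram_overlap(a, b, n=2):
    # Fraction of the input sentence's n-grams that also occur in b.
    grams = lambda toks: {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}
    ga, gb = grams(a), grams(b)
    return len(ga & gb) / len(ga) if ga else 0.0

def closest_pairs(pairs, input_sentence, k=2, n=2):
    # Rank (source, target) pairs by n-gram overlap between the input
    # sentence and the pair's source side; keep the top k.
    return sorted(pairs,
                  key=lambda p: ngram_overlap(input_sentence, p[0], n),
                  reverse=True)[:k]
```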
"sentence pairs" is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Paperno, Denis and Pham, Nghia The and Baroni, Marco
Evaluation
(2013), features 200 sentence pairs that were rated for similarity by 43 annotators.
Evaluation
We also consider a similar data set introduced by Grefenstette (2013), comprising 200 sentence pairs rated by 50 annotators.
Evaluation
Evaluation is carried out by computing the Spearman correlation between the annotator similarity ratings for the sentence pairs and the cosines of the vectors produced by the various systems for the same sentence pairs.
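The evaluation described above correlates human ratings with system cosines via Spearman's rho, which is just the Pearson correlation of the ranks; a self-contained sketch (illustrative helper names):

```python
def _ranks(values):
    # 1-based ranks, with ties assigned the average of their positions.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(xs, ys):
    # Spearman correlation = Pearson correlation of the rank vectors.
    rx, ry = _ranks(xs), _ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy) if vx and vy else 0.0
```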
"sentence pairs" is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Liu, Shujie and Yang, Nan and Li, Mu and Zhou, Ming
Experiments and Results
The training data contains 81K sentence pairs, 655K Chinese words and 806K English words.
Model Training
The training samples for RAE are phrase pairs {s1, s2} in the translation table, where s1 and s2 can form a continuous partial sentence pair in the training data.
Model Training
Forced decoding performs sentence pair segmentation using the same translation system as decoding.
Model Training
For each sentence pair in the training data, the SMT decoder is applied to the source side, and any candidate which is not a partial substring of the target sentence is removed from the n-best list during decoding.
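The n-best pruning this snippet describes, keeping only candidates that occur as a contiguous substring of the target sentence, can be sketched as follows (`filter_nbest` is an illustrative name; candidates and the target are token lists):

```python
def filter_nbest(nbest, target):
    """Keep only candidates whose token sequence appears contiguously
    in the target sentence, as in the forced-decoding pruning above."""
    def is_sublist(sub, seq):
        n = len(sub)
        return any(seq[i:i + n] == sub for i in range(len(seq) - n + 1))
    return [cand for cand in nbest if is_sublist(cand, target)]
```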
"sentence pairs" is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Cai, Jingsheng and Utiyama, Masao and Sumita, Eiichiro and Zhang, Yujie
Conclusion
Moreover, our dependency-based pre-ordering rule set substantially decreased the time for applying pre-ordering rules by about 60% compared with WR07, on the training set of 1M sentence pairs.
Experiments
Our development set was the official NIST MT evaluation data from 2002 to 2005, consisting of 4476 Chinese-English sentence pairs.
Experiments
Our test set was the NIST 2006 MT evaluation data, consisting of 1664 sentence pairs .
"sentence pairs" is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Narayan, Shashi and Gardent, Claire
Experiments
PWKP contains 108,016/114,924 complex/simple sentence pairs.
Simplification Framework
(2010); and build training graphs (Figure 2) from the complex-simple sentence pairs in the training data.
Simplification Framework
Each training graph represents a complex-simple sentence pair and consists of two types of nodes: major nodes (M-nodes) and operation nodes (O-nodes).
"sentence pairs" is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Wang, Xiaolin and Utiyama, Masao and Finch, Andrew and Sumita, Eiichiro
Complexity Analysis
The computational complexity of our method is linear in the number of iterations, the size of the corpus, and the complexity of calculating the expectations on each sentence or sentence pair.
Complexity Analysis
This was verified by experiments on a corpus of 1-million sentence pairs on which traditional MCMC approaches would struggle (Xu et al., 2008).
Methods
(F, E) a bilingual sentence pair
"sentence pairs" is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: