Abstract | The algorithm finds provably exact solutions on 86% of sentence pairs and shows improvements over directional models. |
Conclusion | optimal score, averaged over all sentence pairs.
Conclusion | The plot shows an average over all sentence pairs.
Conclusion | We can construct a sentence pair in which I = J = N and e-alignments have infinite cost. |
Experiments | Following past work, the first 150 sentence pairs of the training section are used for evaluation. |
Experiments | With incremental constraints and pruning, we are able to solve over 86% of sentence pairs, including many longer and more difficult pairs.
Experiments | Table 3: The average number of constraints added for sentence pairs where Lagrangian relaxation is not able to find an exact solution. |
Introduction | Empirically, it is able to find exact solutions on 86% of sentence pairs and is significantly faster than general-purpose solvers.
Experiments | These documents are stored in an inverted index built with Lucene2, so that they can be retrieved efficiently for the parallel sentence pairs.
Experiments | In the fine-tuning phase, for each parallel sentence pair, we randomly select ten other sentence pairs that satisfy the criterion as negative instances.
Experiments | In total, the datasets contain nearly 1.1 million sentence pairs.
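The negative-instance sampling described above can be sketched as follows; all function names and the placeholder `criterion` are our own illustration, not the authors' code:

```python
import random

def sample_negatives(pairs, index, k=10, criterion=lambda f, e: True, seed=0):
    """For the parallel pair at `index`, randomly pick k other pairs whose
    target sides satisfy `criterion`, to serve as negative instances."""
    rng = random.Random(seed)
    f, _ = pairs[index]
    candidates = [e for i, (_, e) in enumerate(pairs)
                  if i != index and criterion(f, e)]
    return rng.sample(candidates, min(k, len(candidates)))
```

In practice the criterion would filter by, e.g., length or vocabulary overlap; here it accepts everything so the sketch stays self-contained.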
Topic Similarity Model with Neural Network | Given a parallel sentence pair (f, e), the first step is to treat f and e as queries, and use IR methods to retrieve relevant documents to enrich contextual information for them.
Topic Similarity Model with Neural Network | Therefore, in this stage, parallel sentence pairs are used to help connect the vectors from different languages, because they express the same topic.
Topic Similarity Model with Neural Network | Given a parallel sentence pair (f, e), the DAB learns representations for f and e respectively, as zf = g(f) and ze = g(e) in Figure 1.
Abstract | Most current data selection methods solely use language models trained on small-scale in-domain data to select domain-relevant sentence pairs from a general-domain parallel corpus.
Abstract | By contrast, we argue that the relevance between a sentence pair and the target domain can be better evaluated by combining a language model and a translation model.
Abstract | When the selected sentence pairs are evaluated on an end-to-end MT task, our methods can increase the translation performance by 3 BLEU points. |
Introduction | For this, an effective approach is to automatically select and expand domain-specific sentence pairs from a large-scale general-domain parallel corpus.
Introduction | Current data selection methods mostly use language models trained on small scale in-domain data to measure domain relevance and select domain-relevant parallel sentence pairs to expand training corpora. |
Introduction | Meanwhile, the translation model measures the translation probability of a sentence pair, which is used to verify the parallelism of the selected domain-relevant bitext.
Related Work | (2010) ranked the sentence pairs in the general-domain corpus according to the perplexity scores of sentences, which are computed with respect to in-domain language models. |
Related Work | Although previous work on data selection (Duh et al., 2013; Koehn and Haddow, 2012; Axelrod et al., 2011; Foster et al., 2010; Yasuda et al., 2008) has achieved good performance, methods that only adopt language models to score the sentence pairs are suboptimal.
Related Work | The reason is that a sentence pair contains a source-language sentence and a target-language sentence, while the existing methods are incapable of evaluating the mutual translation probability of a sentence pair in the target domain.
Training Data Selection Methods | We present three data selection methods for ranking and selecting domain-relevant sentence pairs from general-domain corpus, with an eye towards improving domain-specific translation model performance. |
Training Data Selection Methods | However, in this paper, we adopt the translation model to evaluate the translation probability of a sentence pair and develop a simple but effective variant of the translation model to rank the sentence pairs in the general-domain corpus.
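As a rough illustration of combining language-model and translation-model scores for data selection, here is a toy sketch. The unigram LM and lexical TM below stand in for the real models, and all names and the interpolation weight `alpha` are our own assumptions:

```python
import math

def lm_logprob(sentence, unigram_probs, floor=1e-6):
    """Toy unigram language-model log-probability (stand-in for a real LM)."""
    return sum(math.log(unigram_probs.get(w, floor)) for w in sentence.split())

def tm_logprob(src, tgt, lex_probs, floor=1e-6):
    """Toy lexical translation score: for each source word, take the best
    word-translation probability against any target word."""
    total = 0.0
    for s in src.split():
        best = max((lex_probs.get((s, t), floor) for t in tgt.split()),
                   default=floor)
        total += math.log(best)
    return total

def rank_pairs(pairs, src_lm, tgt_lm, lex_probs, alpha=0.5):
    """Rank general-domain sentence pairs by an interpolated LM + TM
    domain-relevance score, highest first."""
    def score(pair):
        src, tgt = pair
        lm = lm_logprob(src, src_lm) + lm_logprob(tgt, tgt_lm)
        tm = tm_logprob(src, tgt, lex_probs)
        return alpha * lm + (1 - alpha) * tm
    return sorted(pairs, key=score, reverse=True)
```

The point of the sketch is only the combination step: in-domain pairs score well on both the in-domain LMs and the translation model, so they rise to the top of the ranking.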
Data preparation | 2. for each aligned sentence pair (sentence_s ∈ S_s, sentence_t ∈ S_t) in the parallel corpus split (88,875):
Data preparation | The output of the algorithm in Figure 1 is a modified set of sentence pairs (sentence_s, sentence_t), in which the same sentence pair may be used multiple times with different L1 substitutions for different fragments.
Evaluation | If output o is a subset of reference r, then a score of |o|/|r| is assigned for that sentence pair.
Evaluation | The word accuracy for the entire set is then computed by taking the sum of the word accuracies per sentence pair, divided by the total number of sentence pairs.
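A minimal sketch of this word-accuracy computation, assuming the subset case is credited with the fraction |output| / |reference| (our reading of the garbled score; the paper's exact partial score may differ, and we score non-subset outputs 0 purely for illustration):

```python
def pair_accuracy(output, reference):
    """Word accuracy for one sentence pair: |o|/|r| when the output words
    are a subset of the reference words (so an exact match scores 1.0),
    and 0 otherwise (illustrative simplification)."""
    o, r = output.split(), reference.split()
    if not r:
        return 0.0
    if set(o) <= set(r):
        return len(o) / len(r)
    return 0.0

def set_accuracy(outputs, references):
    """Average the per-pair word accuracies over the whole set."""
    scores = [pair_accuracy(o, r) for o, r in zip(outputs, references)]
    return sum(scores) / len(scores)
```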
Experiments & Results | The data for our experiments were drawn from the Europarl parallel corpus (Koehn, 2005), from which we extracted two sets of 200,000 sentence pairs each for several language pairs.
Experiments & Results | The final test sets are 5,000 sentence pairs randomly sampled from the 200,000-sentence test split for each language pair.
Experiments & Results | English fallback in a Spanish context, consists of 5,608,015 sentence pairs.
Adaptive MT Quality Estimation | Our proposed method is as follows: we select a fixed set of sentence pairs (Sq, Rq) to train the QE model. |
Discussion and Conclusion | Another option is to select the sentence pairs from the MT system's subsampled training data, which is more similar to the input document, so the trained QE model could be a better match for the input document.
Document-specific MT System | Our parallel corpora include tens of millions of sentence pairs covering a wide range of topics.
Document-specific MT System | The document-specific system is built based on sub-sampling: from the parallel corpora we select the sentence pairs that are most similar to the sentences of the input document, then build the MT system with the sub-sampled sentence pairs.
Document-specific MT System | From the extracted sentence pairs, we utilize the standard pipeline in SMT system building: word alignment
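The similarity-based sub-sampling could be sketched with n-gram overlap as the similarity measure; the measure and all names here are illustrative assumptions, not the system's actual implementation:

```python
def ngrams(tokens, n):
    """Set of n-grams (as tuples) in a token sequence."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def overlap_score(sentence, document_ngrams, n=2):
    """Fraction of the sentence's n-grams also found in the input document."""
    grams = ngrams(sentence.split(), n)
    if not grams:
        return 0.0
    return len(grams & document_ngrams) / len(grams)

def subsample(corpus, document_sentences, k, n=2):
    """Keep the k sentence pairs whose source sides are most similar
    (by n-gram overlap) to the input document."""
    doc_grams = set()
    for s in document_sentences:
        doc_grams |= ngrams(s.split(), n)
    return sorted(corpus,
                  key=lambda p: overlap_score(p[0], doc_grams, n),
                  reverse=True)[:k]
```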
Introduction | First, existing approaches to MT quality estimation rely on lexical and syntactic features defined over parallel sentence pairs, which include source sentences, MT outputs and references, and translation models (Blatz et al., 2004; Ueffing and Ney, 2007; Specia et al., 2009a; Xiong et al., 2010; Soricut and Echihabi, 2010a; Bach et al., 2011).
Static MT Quality Estimation | The high FM phrases are selected from sentence pairs which are closest in terms of n-gram overlap to the input sentence. |
Evaluation | (2013), features 200 sentence pairs that were rated for similarity by 43 annotators. |
Evaluation | We also consider a similar data set introduced by Grefenstette (2013), comprising 200 sentence pairs rated by 50 annotators. |
Evaluation | Evaluation is carried out by computing the Spearman correlation between the annotator similarity ratings for the sentence pairs and the cosines of the vectors produced by the various systems for the same sentence pairs.
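The evaluation protocol amounts to computing Spearman's rank correlation between the human ratings and the system cosines; a dependency-free sketch (the helper names are our own):

```python
def rank(values):
    """Average ranks (1-based), with ties sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the rank vectors."""
    rx, ry = rank(x), rank(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)
```

In practice one would call a library routine such as `scipy.stats.spearmanr`; the explicit version just makes the rank-then-correlate recipe visible.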
Experiments and Results | The training data contains 81K sentence pairs, 655K Chinese words and 806K English words.
Model Training | The training samples for the RAE are phrase pairs {s1, s2} in the translation table, where s1 and s2 can form a continuous partial sentence pair in the training data.
Model Training | Forced decoding performs sentence pair segmentation using the same translation system as decoding. |
Model Training | For each sentence pair in the training data, the SMT decoder is applied to the source side, and any candidate which is not a partial substring of the target sentence is removed from the n-best list during decoding.
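The n-best filtering step can be sketched as follows, reading "partial substring" as a contiguous token subsequence of the target sentence (our interpretation; names are illustrative):

```python
def is_partial_substring(candidate, target):
    """True if the candidate's tokens occur contiguously in the target."""
    c, t = candidate.split(), target.split()
    return any(t[i:i + len(c)] == c for i in range(len(t) - len(c) + 1))

def filter_nbest(nbest, target):
    """Drop every n-best candidate that is not a contiguous substring
    of the target side of the sentence pair."""
    return [c for c in nbest if is_partial_substring(c, target)]
```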
Conclusion | Moreover, our dependency-based pre-ordering rule set substantially decreased the time for applying pre-ordering rules by about 60% compared with WR07, on the training set of 1M sentence pairs.
Experiments | Our development set was the official NIST MT evaluation data from 2002 to 2005, consisting of 4476 Chinese-English sentence pairs.
Experiments | Our test set was the NIST 2006 MT evaluation data, consisting of 1664 sentence pairs . |
Experiments | PWKP contains 108,016 / 114,924 complex/simple sentence pairs.
Simplification Framework | (2010); and build training graphs (Figure 2) from the complex-simple sentence pairs in the training data.
Simplification Framework | Each training graph represents a complex-simple sentence pair and consists of two types of nodes: major nodes (M-nodes) and operation nodes (O-nodes). |
Complexity Analysis | The computational complexity of our method is linear in the number of iterations, the size of the corpus, and the complexity of calculating the expectations on each sentence or sentence pair.
Complexity Analysis | This was verified by experiments on a corpus of 1 million sentence pairs, on which traditional MCMC approaches would struggle (Xu et al., 2008).
Methods | (F, E): a bilingual sentence pair