Index of papers in Proc. ACL 2012 that mention
  • sentence pairs
Yang, Nan and Li, Mu and Zhang, Dongdong and Yu, Nenghai
Experiments
They are manually translated into the other language to produce 7,000 sentence pairs, which are split into two parts: 2,000 pairs as the development set (dev) and the other 5,000 pairs as the test set (web test).
Experiments
After removing duplicates, we have about 18 million sentence pairs, which contain about 270 million English tokens and 320 million Japanese tokens.
Experiments
As we do not have access to a golden reordered sentence set, we decide to use the alignment crossing-link numbers between aligned sentence pairs as the measure for reorder performance.
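A minimal sketch of how such a crossing-link count could be computed, assuming a word alignment is given as (source index, target index) pairs. The function name and the input representation are illustrative assumptions and are not taken from the paper.

```python
from itertools import combinations


def count_crossing_links(alignment):
    """Count pairs of alignment links that cross each other.

    `alignment` is an iterable of (source_index, target_index) pairs
    (an assumed representation). Two links cross when their source
    order and their target order disagree.
    """
    links = sorted(set(alignment))
    crossings = 0
    for (s1, t1), (s2, t2) in combinations(links, 2):
        # After sorting, s1 <= s2. Links that share a source word
        # are not counted as crossing in this sketch.
        if s1 < s2 and t1 > t2:
            crossings += 1
    return crossings


monotone = [(0, 0), (1, 1), (2, 2)]
fully_reversed = [(0, 2), (1, 1), (2, 0)]
print(count_crossing_links(monotone))
print(count_crossing_links(fully_reversed))
```

Under this definition a lower count means the source word order is closer to the target word order, which is why it can stand in for a reference reordering.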
Ranking Model Training
For a sentence pair (e, f, a) with syntax tree T_e on the source side, we need to determine which reordered tree T_e' best represents the word order in target sentence f. For a tree node t in T_e, if its children align to disjoint target spans, we can simply arrange them in the order of their corresponding target
Ranking Model Training
Figure 2: Fragment of a sentence pair.
Ranking Model Training
Figure 2 shows a fragment of one sentence pair in our training data.
Word Reordering as Syntax Tree Node Ranking
Figure 1: An English-to-Japanese sentence pair.
sentence pairs is mentioned in 10 sentences in this paper.
Kim, Sungchul and Toutanova, Kristina and Yu, Hwanjo
Data and task
A total of 13,410 English-Bulgarian and 8,832 English-Korean sentence pairs were extracted.
Data and task
Of these, we manually annotated 91 English-Bulgarian and 79 English-Korean sentence pairs with source and target named entities as well as word-alignment links among named entities in the two languages.
Data and task
Figure 1 illustrates a Bulgarian-English sentence pair with alignment.
Introduction
Our results show that the semi-CRF model improves on the performance of projection models by more than 10 points in F-measure, and that we can achieve tagging F-measure of over 91 using a very small number of annotated sentence pairs.
sentence pairs is mentioned in 9 sentences in this paper.
Liu, Shujie and Li, Chi-Ho and Li, Mu and Zhou, Ming
Experiments and Results
The training data contains 81k sentence pairs, 655k Chinese words and 806k English words.
Experiments and Results
The training data contains 354k sentence pairs, 8M Chinese words and 10M English words.
Experiments and Results
re-ranking methods are performed in the same way as for IWSLT data, but for consensus-based decoding, the data set contains too many sentence pairs to be held in one graph for our machine.
Features and Training
For the nodes representing the training sentence pairs, this posterior is fixed.
Graph Construction
If there are sentence pairs with the same source sentence but different translations, all the translations will be assigned as labels to that source sentence, and the corresponding probabilities are estimated by MLE.
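A minimal sketch of this labeling step, assuming that MLE here means the relative frequency of each translation among all training pairs that share the same source sentence. The function name and data layout are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def estimate_label_probabilities(sentence_pairs):
    """Relative-frequency (MLE) estimate of P(translation | source).

    `sentence_pairs` is an iterable of (source, translation) strings.
    Every translation seen for a source sentence becomes one of its
    labels, weighted by how often it occurs with that source.
    """
    counts = defaultdict(Counter)
    for source, translation in sentence_pairs:
        counts[source][translation] += 1

    probabilities = {}
    for source, translation_counts in counts.items():
        total = sum(translation_counts.values())
        probabilities[source] = {
            translation: count / total
            for translation, count in translation_counts.items()
        }
    return probabilities


pairs = [("a", "x"), ("a", "x"), ("a", "y"), ("b", "z")]
labels = estimate_label_probabilities(pairs)
print(labels)
```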
Graph Construction
There is no edge between training nodes, since we suppose all the sentences of the training data are correct, and it is pointless to reestimate the confidence of those sentence pairs.
Graph Construction
Forced alignment performs phrase segmentation and alignment of each sentence pair of the training data using the full translation system as in decoding (Wuebker et al., 2010).
sentence pairs is mentioned in 9 sentences in this paper.
He, Wei and Wu, Hua and Wang, Haifeng and Liu, Ting
Experiments
After tokenization and filtering, this bilingual corpus contained 319,694 sentence pairs (7.9M tokens on
Extraction of Paraphrase Rules
3.2 Selecting Paraphrase Sentence Pairs
Extraction of Paraphrase Rules
If the sentence in T2 has a higher BLEU score than the aligned sentence in T1, the corresponding sentences in S0 and S1 are selected as candidate paraphrase sentence pairs, which are used in the following steps of paraphrase extraction.
Extraction of Paraphrase Rules
From the word-aligned sentence pairs, we then extract a set of rules that are consistent with the word alignments.
Forward-Translation vs. Back-Translation
The aligned sentence pairs in (S0, S1) can be considered as paraphrases.
sentence pairs is mentioned in 6 sentences in this paper.
Guo, Weiwei and Diab, Mona
Evaluation for SS
The two data sets we know of for SS are: 1. human-rated sentence pair similarity data set (Li et al., 2006) [LI06]; 2. the Microsoft Research Paraphrase Corpus (Dolan et al., 2004) [MSR04].
Evaluation for SS
On the other hand, the MSR04 data set comprises a much larger set of sentence pairs: 4,076 training and 1,725 test pairs.
Evaluation for SS
This is not a problem per se; however, the issue is that it is very strict in its assignment of a positive label. For example, the following sentence pair, as cited in (Islam and Inkpen, 2008), is rated not semantically similar: Ballmer has been vocal in the past warning that Linux is a threat to Microsoft.
Experiments and Results
Note that r and ρ are much lower for the 35 pairs set, since most of the sentence pairs have a very low similarity (the average similarity value is 0.065 in the 35 pairs set and 0.367 in the 30 pairs set) and SS models need to identify the tiny difference among them, thereby rendering this set much harder to predict.
Experiments and Results
We use the same parameter setting used for the LI06 evaluation setting since both sets are human-rated sentence pairs (λ = 20, w_m = 0.01, K = 100).
sentence pairs is mentioned in 5 sentences in this paper.
Li, Junhui and Tu, Zhaopeng and Zhou, Guodong and van Genabith, Josef
Experiments
We train our model on a dataset with ~1.5M sentence pairs from the LDC dataset. We use the 2002 NIST MT evaluation test data (878 sentence pairs) as the development data, and the 2003, 2004, 2005, 2006-news NIST MT evaluation test data (919, 1788, 1082, and 616 sentence pairs, respectively) as the test data.
Head-Driven HPB Translation Model
For rule extraction, we first identify initial phrase pairs on word-aligned sentence pairs by using the same criterion as most phrase-based translation models (Och and Ney, 2004) and Chiang’s HPB model (Chiang, 2005; Chiang, 2007).
Introduction
Figure 1: An example word alignment for a Chinese-English sentence pair with the dependency parse tree for the Chinese sentence.
sentence pairs is mentioned in 3 sentences in this paper.