Experiments | They are manually translated into the other language to produce 7,000 sentence pairs, which are split into two parts: 2,000 pairs as a development set (dev) and the other 5,000 pairs as a test set (web test). |
Experiments | After removing duplicates, we have about 18 million sentence pairs, which contain about 270 million English tokens and 320 million Japanese tokens. |
Experiments | As we do not have access to a gold-standard reordered sentence set, we decide to use the number of crossing alignment links between aligned sentence pairs as the measure of reordering performance. |
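The crossing-link measure described above can be sketched as a simple pairwise count over alignment links; the list-of-(source, target)-index representation is an assumption, since the snippet gives no implementation details:

```python
def crossing_links(alignment):
    """Count crossing link pairs in a word alignment.

    `alignment` is a list of (source_index, target_index) links.
    Two links (i, j) and (i', j') cross when i < i' but j > j'.
    Fewer crossings indicates a more monotone, better pre-reordered pair.
    """
    links = sorted(alignment)
    crossings = 0
    for a in range(len(links)):
        for b in range(a + 1, len(links)):
            (i1, j1), (i2, j2) = links[a], links[b]
            if i1 < i2 and j1 > j2:
                crossings += 1
    return crossings
```

A fully monotone alignment such as `[(0, 0), (1, 1)]` yields zero crossings, while a fully inverted one like `[(0, 2), (1, 1), (2, 0)]` yields the maximum count for its size.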
Ranking Model Training | For a sentence pair (e, f, a) with syntax tree Te on the source side, we need to determine which reordered tree T′e best represents the word order in the target sentence f. For a tree node t in Te, if its children align to disjoint target spans, we can simply arrange them in the order of their corresponding target |
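Under the disjoint-span condition described above, the child order that matches the target sentence can be recovered by sorting children on their aligned target positions. A minimal sketch, where `children_spans` is a hypothetical mapping from child index to its set of aligned target positions:

```python
def order_children_by_target(children_spans):
    """Order child indices by their aligned target spans.

    `children_spans` maps child index -> set of aligned target positions.
    Because the spans are assumed disjoint, sorting children by the
    minimum target position of each span reproduces the target order.
    """
    return sorted(children_spans, key=lambda c: min(children_spans[c]))
```

For example, a node whose three children align to target positions {3, 4}, {0, 1}, and {2} is reordered to place the second child first, then the third, then the first.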
Ranking Model Training | Figure 2: Fragment of a sentence pair. |
Ranking Model Training | Figure 2 shows a fragment of one sentence pair in our training data. |
Word Reordering as Syntax Tree Node Ranking | Figure 1: An English-to-Japanese sentence pair. |
Data and task | A total of 13,410 English-Bulgarian and 8,832 English-Korean sentence pairs were extracted. |
Data and task | Of these, we manually annotated 91 English-Bulgarian and 79 English-Korean sentence pairs with source and target named entities as well as word-alignment links among named entities in the two languages. |
Data and task | Figure 1 illustrates a Bulgarian-English sentence pair with alignment. |
Introduction | Our results show that the semi-CRF model improves on the performance of projection models by more than 10 points in F-measure, and that we can achieve a tagging F-measure of over 91 using a very small number of annotated sentence pairs. |
Experiments and Results | The training data contains 81k sentence pairs, 655k Chinese words, and 806k English words. |
Experiments and Results | The training data contains 354k sentence pairs, 8M Chinese words, and 10M English words. |
Experiments and Results | The re-ranking methods are performed in the same way as for the IWSLT data, but for consensus-based decoding, the data set contains too many sentence pairs for our machine to hold in a single graph. |
Features and Training | For the nodes representing the training sentence pairs, this posterior is fixed. |
Graph Construction | If there are sentence pairs with the same source sentence but different translations, all the translations will be assigned as labels to that source sentence, and the corresponding probabilities are estimated by MLE. |
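The MLE step described above amounts to relative-frequency counting over repeated source sentences. A minimal illustration, not the authors' code:

```python
from collections import Counter, defaultdict

def label_distributions(sentence_pairs):
    """Estimate P(target | source) by MLE over a parallel corpus.

    `sentence_pairs` is a list of (source, target) strings. A source
    sentence that occurs with several distinct translations receives
    all of them as labels, with probabilities proportional to counts.
    """
    counts = defaultdict(Counter)
    for src, tgt in sentence_pairs:
        counts[src][tgt] += 1
    return {
        src: {tgt: n / sum(ctr.values()) for tgt, n in ctr.items()}
        for src, ctr in counts.items()
    }
```

A source sentence seen twice with translation "x" and once with translation "y" would thus carry labels {"x": 2/3, "y": 1/3}.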
Graph Construction | There is no edge between training nodes, since we suppose all the sentences of the training data are correct, and it is pointless to re-estimate the confidence of those sentence pairs. |
Graph Construction | Forced alignment performs phrase segmentation and alignment of each sentence pair of the training data using the full translation system as in decoding (Wuebker et al., 2010). |
Experiments | After tokenization and filtering, this bilingual corpus contained 319,694 sentence pairs (7.9M tokens on |
Extraction of Paraphrase Rules | 3.2 Selecting Paraphrase Sentence Pairs |
Extraction of Paraphrase Rules | If the sentence in T2 has a higher BLEU score than the aligned sentence in T1, the corresponding sentences in S0 and S1 are selected as candidate paraphrase sentence pairs, which are used in the following steps of paraphrase extraction. |
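The selection step above can be sketched as a filter over precomputed sentence-level BLEU scores; how those scores are computed, and against which references, is left abstract here because the snippet does not specify it:

```python
def select_paraphrase_pairs(pairs, t1_scores, t2_scores):
    """Keep candidate paraphrase pairs whose T2 side outscores T1.

    `pairs` is a list of aligned (s0_sentence, s1_sentence) tuples;
    `t1_scores` and `t2_scores` hold the sentence-level BLEU of the
    corresponding translations in T1 and T2. A pair is selected when
    its T2 translation has the strictly higher score.
    """
    return [p for p, b1, b2 in zip(pairs, t1_scores, t2_scores) if b2 > b1]
```

Only the pairs passing this filter feed into the subsequent word-alignment and rule-extraction stages.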
Extraction of Paraphrase Rules | From the word-aligned sentence pairs, we then extract a set of rules that are consistent with the word alignments. |
Forward-Translation vs. Back-Translation | The aligned sentence pairs in (S0, S1) can be considered as paraphrases. |
Evaluation for SS | The two data sets we know of for SS are: 1. the human-rated sentence-pair similarity data set (Li et al., 2006) [LI06]; 2. the Microsoft Research Paraphrase Corpus (Dolan et al., 2004) [MSR04]. |
Evaluation for SS | On the other hand, the MSR04 data set comprises a much larger set of sentence pairs: 4,076 training and 1,725 test pairs. |
Evaluation for SS | This is not a problem per se; however, the issue is that it is very strict in its assignment of a positive label. For example, the following sentence pair, as cited in (Islam and Inkpen, 2008), is rated not semantically similar: Ballmer has been vocal in the past warning that Linux is a threat to Microsoft. |
Experiments and Results | Note that r and ρ are much lower for the 35-pair set, since most of its sentence pairs have very low similarity (the average similarity value is 0.065 in the 35-pair set and 0.367 in the 30-pair set) and SS models need to identify the tiny differences among them, thereby rendering this set much harder to predict. |
Experiments and Results | We use the same parameter setting used for the LI06 evaluation setting, since both sets are human-rated sentence pairs (λ = 20, wm = 0.01, K = 100). |
Experiments | We train our model on a dataset with ~1.5M sentence pairs from the LDC dataset.2 We use the 2002 NIST MT evaluation test data (878 sentence pairs) as the development data, and the 2003, 2004, 2005, and 2006-news NIST MT evaluation test data (919, 1788, 1082, and 616 sentence pairs, respectively) as the test data. |
Head-Driven HPB Translation Model | For rule extraction, we first identify initial phrase pairs on word-aligned sentence pairs by using the same criterion as most phrase-based translation models (Och and Ney, 2004) and Chiang’s HPB model (Chiang, 2005; Chiang, 2007). |
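The consistency criterion of Och and Ney (2004) referenced above can be sketched as follows; this minimal version keeps only the tightest target span per source span and omits the usual extension of phrase boundaries over unaligned target words:

```python
def extract_phrase_pairs(alignment, src_len, max_len=4):
    """Extract initial phrase pairs consistent with a word alignment.

    A source span [i1, i2] and target span [j1, j2] form a phrase pair
    when every alignment link touching either span falls entirely
    inside both spans, and the pair contains at least one link.
    """
    pairs = []
    for i1 in range(src_len):
        for i2 in range(i1, min(i1 + max_len, src_len)):
            # Target positions linked from inside the source span.
            ts = [j for (i, j) in alignment if i1 <= i <= i2]
            if not ts:
                continue
            j1, j2 = min(ts), max(ts)
            if j2 - j1 + 1 > max_len:
                continue
            # Consistency: no link landing in [j1, j2] may originate
            # outside the source span [i1, i2].
            if all(i1 <= i <= i2 for (i, j) in alignment if j1 <= j <= j2):
                pairs.append(((i1, i2), (j1, j2)))
    return pairs
```

For the monotone alignment `[(0, 0), (1, 1)]` over two words per side, this yields the two single-word pairs plus the full two-word pair.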
Introduction | Figure 1: An example word alignment for a Chinese-English sentence pair with the dependency parse tree for the Chinese sentence. |