Automatic Evaluation Metrics | A uniform average of the counts is then taken as the score for the sentence pair.
Automatic Evaluation Metrics | Based on the matches, ParaEval will then elect to use either unigram precision or unigram recall as its score for the sentence pair.
Introduction | To match each system item to at most one reference item, we model the items in the sentence pair as nodes in a bipartite graph and use the Kuhn-Munkres algorithm (Kuhn, 1955; Munkres, 1957) to find a maximum weight matching (or alignment) between the items in polynomial time. |
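The maximum-weight bipartite matching described above can be sketched as follows. For intuition this sketch checks every injective assignment exhaustively on a toy similarity matrix; the Kuhn-Munkres algorithm computes the same matching in polynomial time (SciPy's `linear_sum_assignment` is a standard implementation). The similarity values here are made up for illustration.

```python
from itertools import permutations

def max_weight_matching(sim):
    """Exhaustive maximum-weight bipartite matching on a small similarity
    matrix (rows: system items, columns: reference items).  Each row is
    matched to at most one column.  The Kuhn-Munkres algorithm finds the
    same matching in polynomial time; brute force is used here only to
    keep the sketch self-contained."""
    n_rows, n_cols = len(sim), len(sim[0])
    best, best_pairs = 0.0, []
    # Try every injective assignment of rows to distinct columns.
    for cols in permutations(range(n_cols), min(n_rows, n_cols)):
        pairs = list(zip(range(n_rows), cols))
        weight = sum(sim[r][c] for r, c in pairs)
        if weight > best:
            best, best_pairs = weight, pairs
    return best, best_pairs

sim = [[0.9, 0.1, 0.0],
       [0.2, 0.8, 0.3]]
score, alignment = max_weight_matching(sim)
# score == 1.7, alignment == [(0, 0), (1, 1)]
```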
Introduction | Also, metrics such as METEOR determine an alignment between the items of a sentence pair by using heuristics such as the least number of matching crosses. |
Introduction | Also, this framework allows for defining arbitrary similarity functions between two matching items, and we could match arbitrary concepts (such as dependency relations) gathered from a sentence pair.
Metric Design Considerations | Similarly, we also match the bigrams and trigrams of the sentence pair and calculate their corresponding Fmean scores. |
Metric Design Considerations | To obtain a single similarity score for this sentence pair, we simply average the three Fmean scores.
Metric Design Considerations | Then, to obtain a single similarity score Sim-score for the entire system corpus, we repeat this process of calculating a score for each system-reference sentence pair, and compute the average over all |S| sentence pairs:
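The per-sentence and corpus-level scoring just described can be sketched as below. The exact Fmean definition and its recall weight `alpha` are assumptions (METEOR-style), not taken from the source; the match counting is plain clipped n-gram overlap.

```python
def fmean(precision, recall, alpha=0.5):
    """Weighted harmonic-style mean of precision and recall.
    The formula and alpha=0.5 default are assumptions for this sketch."""
    if precision == 0 or recall == 0:
        return 0.0
    return precision * recall / (alpha * precision + (1 - alpha) * recall)

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def sentence_score(sys_toks, ref_toks):
    """Average the Fmean of matched unigrams, bigrams and trigrams
    for one system-reference sentence pair."""
    scores = []
    for n in (1, 2, 3):
        sys_ng, ref_ng = ngrams(sys_toks, n), ngrams(ref_toks, n)
        matches = sum(min(sys_ng.count(g), ref_ng.count(g)) for g in set(sys_ng))
        p = matches / len(sys_ng) if sys_ng else 0.0
        r = matches / len(ref_ng) if ref_ng else 0.0
        scores.append(fmean(p, r))
    return sum(scores) / 3

def sim_score(corpus):
    """Corpus-level Sim-score: average of per-sentence scores over |S| pairs."""
    return sum(sentence_score(s, r) for s, r in corpus) / len(corpus)
```

An identical system and reference sentence scores 1.0 at every n-gram order, so its averaged score is 1.0; a sentence sharing no items scores 0.0.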
Bootstrapping Phrasal ITG from Word-based ITG | Figure 3 (a) shows all possible non-compositional phrases given the Viterbi word alignment of the example sentence pair.
Experiments | The training data was a subset of 175K sentence pairs from the NIST Chinese-English training data, automatically selected to maximize character-level overlap with the source side of the test data. |
Experiments | We put a length limit of 35 on both sides, producing a training set of 141K sentence pairs.
Experiments | Figure 4 examines the difference between EM and VB with varying sparse priors for the word-based model of ITG on the 500 sentence pairs, both after 10 iterations of training.
Introduction | Finally, the set of phrases consistent with the word alignments is extracted from every sentence pair; these phrases form the basis of the decoding process.
Introduction | Computational complexity arises from the exponentially large number of decompositions of a sentence pair into phrase pairs; overfitting is a problem because as EM attempts to maximize the likelihood of its training data, it prefers to directly explain a sentence pair with a single phrase pair. |
Phrasal Inversion Transduction Grammar | However, it is easy to show that the maximum likelihood training will lead to the saturated solution where PC = 1: each sentence pair is generated by a single phrase spanning the whole sentence.
Summary of the Pipeline | Then we use the efficient bidirectional tic-tac-toe pruning to prune the bitext space within each of the sentence pairs; ITG parsing will be carried out on only this sparse set of bitext cells.
Variational Bayes for ITG | If we do not put any constraint on the distribution of phrases, EM overfits the data by memorizing every sentence pair.
Adding agreement constraints | For each sentence pair instance x = (5,13), we find the posterior |
Introduction | The typical pipeline for a machine translation (MT) system starts with a parallel sentence-aligned corpus and proceeds to align the words in every sentence pair.
Statistical word alignment | Figure 2 shows two examples of word alignment of a sentence pair.
Word alignment results | Figure 2 shows two machine-generated alignments of a sentence pair.
Word alignment results | The figure illustrates a problem known as garbage collection (Brown et al., 1993), where rare source words tend to align to many target words, since the probability mass of the rare word translations can be hijacked to fit the sentence pair.
Inflection prediction models | Figure 1 shows an example of an aligned English-Russian sentence pair: on the source (English) side, POS tags and word dependency structure are indicated by solid arcs.
Inflection prediction models | Figure 1: Aligned English—Russian sentence pair with syntactic and morphological annotation. |
MT performance results | The judges were given the reference translations but not the source sentences, and were asked to classify each sentence pair into three categories: (1) the baseline system is better (score=-1), (2) the output of our model is better (score=1), or (3) they are of the same quality (score=0).
Machine translation systems and data | These aligned sentence pairs form the training data of the inflection models as well. |
Machine translation systems and data | Dataset sent pairs word tokens (avg/sent) English-Russian |
A Generic Phrase Training Procedure | 1: Train Model-1 and HMM word alignment models; 2: for all sentence pairs (e, f) do
A Generic Phrase Training Procedure | Phrase pair filtering is simply thresholding on the final score by comparing to the maximum within the sentence pair.
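The relative-threshold filtering step mentioned above can be sketched as follows. The threshold form (keep pairs scoring within a fixed fraction of the per-sentence maximum) and the 0.5 default are assumptions for illustration, not the paper's exact values, and the toy phrase pairs are invented.

```python
def filter_phrase_pairs(scored_pairs, ratio=0.5):
    """Keep only phrase pairs whose final score is within `ratio` of the
    best score seen in the same sentence pair.  The relative-threshold
    form and the 0.5 default are assumptions for this sketch."""
    best = max(score for _, score in scored_pairs)
    return [(pp, score) for pp, score in scored_pairs if score >= ratio * best]

# Toy scored phrase pairs extracted from one sentence pair.
pairs = [(("le chat", "the cat"), 0.9),
         (("chat", "cat"), 0.6),
         (("le", "the cat"), 0.1)]
kept = filter_phrase_pairs(pairs)
# keeps the two pairs scoring at least 0.45 (= 0.5 * 0.9)
```

Because the threshold is relative to each sentence pair's own maximum, a sentence pair with uniformly low scores still contributes its best phrase pairs.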
Features | Given a phrase pair in a sentence pair, there will be many generative paths that align the source phrase to the target phrase.
Features | Given a sentence pair , the basic assumption is that if the HMM word alignment model can align an English phrase well to a foreign phrase, the posterior distribution of the English phrase generating all foreign phrases on the other side is significantly biased. |
Experiments | We extracted both the development and test data sets from years of NIST Chinese-to-English evaluation data by filtering out sentence pairs not containing measure words.
Experiments | The development set is extracted from NIST evaluation data from 2002 to 2004, and the test set consists of sentence pairs from NIST evaluation data from 2005 to 2006. |
Experiments | There are 759 testing cases for measure word generation in our test data consisting of 2746 sentence pairs . |
Model Training and Application 3.1 Training | We ran GIZA++ (Och and Ney, 2000) on the training corpus in both directions with IBM model 4, and then applied the refinement rule described in (Koehn et al., 2003) to obtain a many-to-many word alignment for each sentence pair.
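The refinement step that merges the two directional GIZA++ alignments can be sketched as below. This is a simplified stand-in for the grow-diag-final heuristic of Koehn et al. (2003), not the exact procedure: it starts from the intersection of the two directional link sets and repeatedly adds union links adjacent to an existing link.

```python
def symmetrize(ef, fe):
    """Combine two directional word alignments, each a set of
    (e_index, f_index) links, into one many-to-many alignment.
    Simplified sketch of the grow-diag-final idea: start from the
    intersection, then grow with union links that neighbour an
    already-accepted link."""
    union = ef | fe
    aligned = set(ef & fe)
    added = True
    while added:
        added = False
        for (e, f) in sorted(union - aligned):
            # Accept a union link if a neighbouring link is already in.
            if any((e + de, f + df) in aligned
                   for de in (-1, 0, 1) for df in (-1, 0, 1)):
                aligned.add((e, f))
                added = True
    return aligned

ef = {(0, 0), (1, 1)}          # English -> foreign Viterbi links
fe = {(0, 0), (1, 2)}          # foreign -> English Viterbi links
merged = symmetrize(ef, fe)
```

Growing from the intersection keeps the high-precision links of both directions while recovering the many-to-many links that either single direction would miss.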
Abstract | Using the presented method, we extract over 1,000,000 pairs of paraphrase patterns from 2M bilingual sentence pairs, the precision of which exceeds 67%.
Conclusion | Experimental results show that the pivot approach is effective, extracting over 1,000,000 pairs of paraphrase patterns from 2M bilingual sentence pairs.
Introduction | Using the proposed approach, we extract over 1,000,000 pairs of paraphrase patterns from 2M bilingual sentence pairs, the precision of which is above 67%.
Related Work | Thus the method acquired paraphrase patterns from sentence pairs that share comparable NEs. |
Cohesive Phrasal Output | Previous approaches to measuring the cohesion of a sentence pair have worked with a word alignment (Fox, 2002; Lin and Cherry, 2003). |
Experiments | We test our cohesion-enhanced Moses decoder trained using 688K sentence pairs of Europarl French-English data, provided by the SMT 2006 Shared Task (Koehn and Monz, 2006). |
Experiments | Comparing the baseline with and without the soft cohesion constraint, we see that cohesion has only a modest effect on BLEU, when measured on all sentence pairs, with improvements ranging between 0.2 and 0.5 absolute points.