Index of papers in Proc. ACL 2008 that mention
  • sentence pair
Chan, Yee Seng and Ng, Hwee Tou
Automatic Evaluation Metrics
A uniform average of the counts is then taken as the score for the sentence pair.
Automatic Evaluation Metrics
Based on the matches, ParaEval will then elect to use either unigram precision or unigram recall as its score for the sentence pair.
Introduction
To match each system item to at most one reference item, we model the items in the sentence pair as nodes in a bipartite graph and use the Kuhn-Munkres algorithm (Kuhn, 1955; Munkres, 1957) to find a maximum weight matching (or alignment) between the items in polynomial time.
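A minimal sketch of this maximum weight matching step, using SciPy's Hungarian-algorithm routine as a stand-in for Kuhn-Munkres (the similarity matrix below is invented for illustration):

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    # Hypothetical similarity weights: rows are system items,
    # columns are reference items; higher means more similar.
    sim = np.array([[0.9, 0.1, 0.0],
                    [0.2, 0.8, 0.3]])

    # linear_sum_assignment minimizes total cost, so negate the
    # weights to obtain a maximum weight matching in polynomial time.
    rows, cols = linear_sum_assignment(-sim)
    matching = list(zip(rows, cols))      # each system item -> at most one reference item
    total_weight = sim[rows, cols].sum()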
Introduction
Also, metrics such as METEOR determine an alignment between the items of a sentence pair by using heuristics such as the least number of matching crosses.
Introduction
Also, this framework allows for defining arbitrary similarity functions between two matching items, and we could match arbitrary concepts (such as dependency relations) gathered from a sentence pair.
Metric Design Considerations
Similarly, we also match the bigrams and trigrams of the sentence pair and calculate their corresponding Fmean scores.
Metric Design Considerations
To obtain a single similarity score score_s for this sentence pair, we simply average the three Fmean scores.
Metric Design Considerations
Then, to obtain a single similarity score Sim-score for the entire system corpus, we repeat this process of calculating a score_s for each system-reference sentence pair, and compute the average over all |S| sentence pairs: Sim-score = (1/|S|) * Σ_s score_s.
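The scoring described in these excerpts can be sketched as follows, assuming clipped n-gram match counts and the balanced harmonic mean F = 2PR/(P+R); the metric's actual Fmean weighting and matching are richer than this:

    from collections import Counter

    def ngrams(tokens, n):
        return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

    def fmean(sys_toks, ref_toks, n):
        # Clipped n-gram matches -> precision and recall -> harmonic mean.
        sys_c, ref_c = Counter(ngrams(sys_toks, n)), Counter(ngrams(ref_toks, n))
        matches = sum(min(c, ref_c[g]) for g, c in sys_c.items())
        if matches == 0:
            return 0.0
        p = matches / sum(sys_c.values())
        r = matches / sum(ref_c.values())
        return 2 * p * r / (p + r)

    def score_s(sys_toks, ref_toks):
        # Average the Fmean scores for unigrams, bigrams and trigrams.
        return sum(fmean(sys_toks, ref_toks, n) for n in (1, 2, 3)) / 3

    def sim_score(pairs):
        # Corpus-level Sim-score: average over all |S| sentence pairs.
        return sum(score_s(s, r) for s, r in pairs) / len(pairs)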
sentence pair is mentioned in 14 sentences in this paper.
Zhang, Hao and Quirk, Chris and Moore, Robert C. and Gildea, Daniel
Bootstrapping Phrasal ITG from Word-based ITG
Figure 3 (a) shows all possible non-compositional phrases given the Viterbi word alignment of the example sentence pair.
Experiments
The training data was a subset of 175K sentence pairs from the NIST Chinese-English training data, automatically selected to maximize character-level overlap with the source side of the test data.
Experiments
We put a length limit of 35 on both sides, producing a training set of 141K sentence pairs.
Experiments
Figure 4 examines the difference between EM and VB with varying sparse priors for the word-based model of ITG on the 500 sentence pairs, both after 10 iterations of training.
Introduction
Finally, the set of phrases consistent with the word alignments is extracted from every sentence pair; these form the basis of the decoding process.
Introduction
Computational complexity arises from the exponentially large number of decompositions of a sentence pair into phrase pairs; overfitting is a problem because as EM attempts to maximize the likelihood of its training data, it prefers to directly explain a sentence pair with a single phrase pair.
Phrasal Inversion Transduction Grammar
However, it is easy to show that maximum likelihood training will lead to the saturated solution where P_C = 1: each sentence pair is generated by a single phrase spanning the whole sentence.
Summary of the Pipeline
Then we use the efficient bidirectional tic-tac-toe pruning to prune the bitext space within each of the sentence pairs; ITG parsing will be carried out on only this sparse set of bitext cells.
Variational Bayes for ITG
If we do not put any constraint on the distribution of phrases, EM overfits the data by memorizing every sentence pair.
sentence pair is mentioned in 10 sentences in this paper.
Ganchev, Kuzman and Graça, João V. and Taskar, Ben
Adding agreement constraints
For each sentence pair instance x = (s, t), we find the posterior
Introduction
The typical pipeline for a machine translation (MT) system starts with a parallel sentence-aligned corpus and proceeds to align the words in every sentence pair.
Statistical word alignment
Figure 2 shows two examples of word alignment of a sentence pair.
Word alignment results
Figure 2 shows two machine-generated alignments of a sentence pair.
Word alignment results
The figure illustrates a problem known as garbage collection (Brown et al., 1993), where rare source words tend to align to many target words, since the probability mass of the rare word translations can be hijacked to fit the sentence pair.
sentence pair is mentioned in 5 sentences in this paper.
Toutanova, Kristina and Suzuki, Hisami and Ruopp, Achim
Inflection prediction models
Figure 1 shows an example of an aligned English-Russian sentence pair: on the source (English) side, POS tags and word dependency structure are indicated by solid arcs.
Inflection prediction models
Figure 1: Aligned English-Russian sentence pair with syntactic and morphological annotation.
MT performance results
The judges were given the reference translations but not the source sentences, and were asked to classify each sentence pair into three categories: (1) the baseline system is better (score=-1), (2) the output of our model is better (score=1), or (3) they are of the same quality (score=0).
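As a small worked example of this scoring scheme (the judgment labels below are invented), the per-sentence classifications average into a single preference score:

    # -1: baseline better, +1: our model better, 0: same quality.
    judgments = [1, 0, -1, 1, 0, 1]
    score = sum(judgments) / len(judgments)   # positive favors the model
    print(score)                              # 0.333...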
Machine translation systems and data
These aligned sentence pairs form the training data of the inflection models as well.
Machine translation systems and data
[Table: dataset sizes in sentence pairs and word tokens (avg/sent), English-Russian]
sentence pair is mentioned in 5 sentences in this paper.
Deng, Yonggang and Xu, Jia and Gao, Yuqing
A Generic Phrase Training Procedure
1: Train Model-1 and HMM word alignment models
2: for all sentence pairs (e, f) do
A Generic Phrase Training Procedure
Phrase pair filtering is simply thresholding on the final score by comparing to the maximum within the sentence pair.
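A minimal sketch of this relative-threshold filtering, assuming scored phrase pairs for a single sentence pair and an invented ratio parameter:

    def filter_phrase_pairs(scored_pairs, ratio=0.5):
        # Keep phrase pairs whose score is within `ratio` of the best
        # score observed in the same sentence pair.
        best = max(score for _, score in scored_pairs)
        return [(pp, score) for pp, score in scored_pairs
                if score >= ratio * best]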
Features
Given a phrase pair in a sentence pair, there will be many generative paths that align the source phrase to the target phrase.
Features
Given a sentence pair, the basic assumption is that if the HMM word alignment model can align an English phrase well to a foreign phrase, the posterior distribution of the English phrase generating all foreign phrases on the other side is significantly biased.
sentence pair is mentioned in 4 sentences in this paper.
Zhang, Dongdong and Li, Mu and Duan, Nan and Li, Chi-Ho and Zhou, Ming
Experiments
We extracted both development and test data set from years of NIST Chinese-to-English evaluation data by filtering out sentence pairs not containing measure words.
Experiments
The development set is extracted from NIST evaluation data from 2002 to 2004, and the test set consists of sentence pairs from NIST evaluation data from 2005 to 2006.
Experiments
There are 759 testing cases for measure word generation in our test data consisting of 2746 sentence pairs.
Model Training and Application 3.1 Training
We ran GIZA++ (Och and Ney, 2000) on the training corpus in both directions with IBM model 4, and then applied the refinement rule described in (Koehn et al., 2003) to obtain a many-to-many word alignment for each sentence pair .
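A rough sketch of symmetrizing the two GIZA++ directions by growing the intersection toward the union through neighboring links; the actual refinement rule of Koehn et al. (2003), grow-diag-final, adds word-coverage checks and a final pass that this simplification omits:

    def symmetrize(e2f, f2e):
        # e2f and f2e are sets of (i, j) links from the two directional
        # word alignments, i indexing source words and j target words.
        alignment = set(e2f & f2e)      # start from the intersection
        union = e2f | f2e
        grew = True
        while grew:                     # grow toward the union
            grew = False
            for i, j in sorted(union - alignment):
                # Add a union link if it neighbors an existing link.
                if any((i + di, j + dj) in alignment
                       for di in (-1, 0, 1) for dj in (-1, 0, 1)):
                    alignment.add((i, j))
                    grew = True
        return alignment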
sentence pair is mentioned in 4 sentences in this paper.
Zhao, Shiqi and Wang, Haifeng and Liu, Ting and Li, Sheng
Abstract
Using the presented method, we extract over 1,000,000 pairs of paraphrase patterns from 2M bilingual sentence pairs, the precision of which exceeds 67%.
Conclusion
Experimental results show that the pivot approach is effective, which extracts over 1,000,000 pairs of paraphrase patterns from 2M bilingual sentence pairs.
Introduction
Using the proposed approach, we extract over 1,000,000 pairs of paraphrase patterns from 2M bilingual sentence pairs, the precision of which is above 67%.
Related Work
Thus the method acquired paraphrase patterns from sentence pairs that share comparable NEs.
sentence pair is mentioned in 4 sentences in this paper.
Cherry, Colin
Cohesive Phrasal Output
Previous approaches to measuring the cohesion of a sentence pair have worked with a word alignment (Fox, 2002; Lin and Cherry, 2003).
Experiments
We test our cohesion-enhanced Moses decoder trained using 688K sentence pairs of Europarl French-English data, provided by the SMT 2006 Shared Task (Koehn and Monz, 2006).
Experiments
Comparing the baseline with and without the soft cohesion constraint, we see that cohesion has only a modest effect on BLEU, when measured on all sentence pairs, with improvements ranging between 0.2 and 0.5 absolute points.
sentence pair is mentioned in 3 sentences in this paper.