Automatic Evaluation Metrics | A uniform average of the counts is then taken as the score for the sentence pair.
Automatic Evaluation Metrics | Based on the matches, ParaEval will then elect to use either unigram precision or unigram recall as its score for the sentence pair.
Introduction | To match each system item to at most one reference item, we model the items in the sentence pair as nodes in a bipartite graph and use the Kuhn-Munkres algorithm (Kuhn, 1955; Munkres, 1957) to find a maximum weight matching (or alignment) between the items in polynomial time. |
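The maximum-weight bipartite matching described above can be sketched as follows. For intuition this sketch checks every injective assignment exhaustively on a toy similarity matrix; the Kuhn-Munkres algorithm computes the same matching in polynomial time (SciPy's `linear_sum_assignment` is a standard implementation). The similarity values here are made up for illustration.

```python
from itertools import permutations

def max_weight_matching(sim):
    """Exhaustive maximum-weight bipartite matching on a small similarity
    matrix (rows: system items, columns: reference items).  Each row is
    matched to at most one column.  The Kuhn-Munkres algorithm finds the
    same matching in polynomial time; brute force is used here only to
    keep the sketch self-contained."""
    n_rows, n_cols = len(sim), len(sim[0])
    best, best_pairs = 0.0, []
    # Try every injective assignment of rows to distinct columns.
    for cols in permutations(range(n_cols), min(n_rows, n_cols)):
        pairs = list(zip(range(n_rows), cols))
        weight = sum(sim[r][c] for r, c in pairs)
        if weight > best:
            best, best_pairs = weight, pairs
    return best, best_pairs

sim = [[0.9, 0.1, 0.0],
       [0.2, 0.8, 0.3]]
score, alignment = max_weight_matching(sim)
# score == 1.7, alignment == [(0, 0), (1, 1)]
```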
Introduction | Also, metrics such as METEOR determine an alignment between the items of a sentence pair by using heuristics such as the least number of matching crosses. |
Introduction | Also, this framework allows for defining arbitrary similarity functions between two matching items, and we could match arbitrary concepts (such as dependency relations) gathered from a sentence pair.
Metric Design Considerations | Similarly, we also match the bigrams and trigrams of the sentence pair and calculate their corresponding Fmean scores. |
Metric Design Considerations | To obtain a single similarity score for this sentence pair, we simply average the three Fmean scores.
Metric Design Considerations | Then, to obtain a single similarity score Sim-score for the entire system corpus, we repeat this process of calculating a score for each system-reference sentence pair, and compute the average over all |S| sentence pairs:
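The per-sentence and corpus-level scoring just described can be sketched as below. The exact Fmean definition and its recall weight `alpha` are assumptions (METEOR-style), not taken from the source; the match counting is plain clipped n-gram overlap.

```python
def fmean(precision, recall, alpha=0.5):
    """Weighted harmonic-style mean of precision and recall.
    The formula and alpha=0.5 default are assumptions for this sketch."""
    if precision == 0 or recall == 0:
        return 0.0
    return precision * recall / (alpha * precision + (1 - alpha) * recall)

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def sentence_score(sys_toks, ref_toks):
    """Average the Fmean of matched unigrams, bigrams and trigrams
    for one system-reference sentence pair."""
    scores = []
    for n in (1, 2, 3):
        sys_ng, ref_ng = ngrams(sys_toks, n), ngrams(ref_toks, n)
        matches = sum(min(sys_ng.count(g), ref_ng.count(g)) for g in set(sys_ng))
        p = matches / len(sys_ng) if sys_ng else 0.0
        r = matches / len(ref_ng) if ref_ng else 0.0
        scores.append(fmean(p, r))
    return sum(scores) / 3

def sim_score(corpus):
    """Corpus-level Sim-score: average of per-sentence scores over |S| pairs."""
    return sum(sentence_score(s, r) for s, r in corpus) / len(corpus)
```

An identical system and reference sentence scores 1.0 at every n-gram order, so its averaged score is 1.0; a sentence sharing no items scores 0.0.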
Bootstrapping Phrasal ITG from Word-based ITG | Figure 3 (a) shows all possible non-compositional phrases given the Viterbi word alignment of the example sentence pair.
Experiments | The training data was a subset of 175K sentence pairs from the NIST Chinese-English training data, automatically selected to maximize character-level overlap with the source side of the test data. |
Experiments | We put a length limit of 35 on both sides, producing a training set of 141K sentence pairs.
Experiments | Figure 4 examines the difference between EM and VB with varying sparse priors for the word-based model of ITG on the 500 sentence pairs, both after 10 iterations of training.
Introduction | Finally, the set of phrases consistent with the word alignments is extracted from every sentence pair; these phrases form the basis of the decoding process.
Introduction | Computational complexity arises from the exponentially large number of decompositions of a sentence pair into phrase pairs; overfitting is a problem because as EM attempts to maximize the likelihood of its training data, it prefers to directly explain a sentence pair with a single phrase pair. |
Phrasal Inversion Transduction Grammar | However, it is easy to show that the maximum likelihood training will lead to the saturated solution where PC = 1: each sentence pair is generated by a single phrase spanning the whole sentence.
Summary of the Pipeline | Then we use the efficient bidirectional tic-tac-toe pruning to prune the bitext space within each of the sentence pairs; ITG parsing will be carried out on only this sparse set of bitext cells.
Variational Bayes for ITG | If we do not put any constraint on the distribution of phrases, EM overfits the data by memorizing every sentence pair.
Adding agreement constraints | For each sentence pair instance x = (5,13), we find the posterior |
Introduction | The typical pipeline for a machine translation (MT) system starts with a parallel sentence-aligned corpus and proceeds to align the words in every sentence pair.
Statistical word alignment | Figure 2 shows two examples of word alignment of a sentence pair.
Word alignment results | Figure 2 shows two machine-generated alignments of a sentence pair.
Word alignment results | The figure illustrates a problem known as garbage collection (Brown et al., 1993), where rare source words tend to align to many target words, since the probability mass of the rare word translations can be hijacked to fit the sentence pair.
Inflection prediction models | Figure 1 shows an example of an aligned English-Russian sentence pair: on the source (English) side, POS tags and word dependency structure are indicated by solid arcs.
Inflection prediction models | Figure 1: Aligned English—Russian sentence pair with syntactic and morphological annotation. |
MT performance results | The judges were given the reference translations but not the source sentences, and were asked to classify each sentence pair into three categories: (1) the baseline system is better (score=-1), (2) the output of our model is better (score=1), or (3) they are of the same quality (score=0).
Machine translation systems and data | These aligned sentence pairs form the training data of the inflection models as well. |
Machine translation systems and data | Dataset sent pairs word tokens (avg/sent) English-Russian |
A Generic Phrase Training Procedure | 1: Train Model-1 and HMM word alignment models; 2: for all sentence pairs (e, f) do
A Generic Phrase Training Procedure | Phrase pair filtering is simply thresholding on the final score by comparing to the maximum within the sentence pair.
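The relative-threshold filtering step mentioned above can be sketched as follows. The threshold form (keep pairs scoring within a fixed fraction of the per-sentence maximum) and the 0.5 default are assumptions for illustration, not the paper's exact values, and the toy phrase pairs are invented.

```python
def filter_phrase_pairs(scored_pairs, ratio=0.5):
    """Keep only phrase pairs whose final score is within `ratio` of the
    best score seen in the same sentence pair.  The relative-threshold
    form and the 0.5 default are assumptions for this sketch."""
    best = max(score for _, score in scored_pairs)
    return [(pp, score) for pp, score in scored_pairs if score >= ratio * best]

# Toy scored phrase pairs extracted from one sentence pair.
pairs = [(("le chat", "the cat"), 0.9),
         (("chat", "cat"), 0.6),
         (("le", "the cat"), 0.1)]
kept = filter_phrase_pairs(pairs)
# keeps the two pairs scoring at least 0.45 (= 0.5 * 0.9)
```

Because the threshold is relative to each sentence pair's own maximum, a sentence pair with uniformly low scores still contributes its best phrase pairs.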
Features | Given a phrase pair in a sentence pair, there will be many generative paths that align the source phrase to the target phrase.
Features | Given a sentence pair , the basic assumption is that if the HMM word alignment model can align an English phrase well to a foreign phrase, the posterior distribution of the English phrase generating all foreign phrases on the other side is significantly biased. |
Experiments | We extracted both the development and test data sets from years of NIST Chinese-to-English evaluation data by filtering out sentence pairs not containing measure words.
Experiments | The development set is extracted from NIST evaluation data from 2002 to 2004, and the test set consists of sentence pairs from NIST evaluation data from 2005 to 2006. |
Experiments | There are 759 testing cases for measure word generation in our test data consisting of 2746 sentence pairs . |
Model Training and Application 3.1 Training | We ran GIZA++ (Och and Ney, 2000) on the training corpus in both directions with IBM model 4, and then applied the refinement rule described in (Koehn et al., 2003) to obtain a many-to-many word alignment for each sentence pair.
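The refinement step that merges the two directional GIZA++ alignments can be sketched as below. This is a simplified stand-in for the grow-diag-final heuristic of Koehn et al. (2003), not the exact procedure: it starts from the intersection of the two directional link sets and repeatedly adds union links adjacent to an existing link.

```python
def symmetrize(ef, fe):
    """Combine two directional word alignments, each a set of
    (e_index, f_index) links, into one many-to-many alignment.
    Simplified sketch of the grow-diag-final idea: start from the
    intersection, then grow with union links that neighbour an
    already-accepted link."""
    union = ef | fe
    aligned = set(ef & fe)
    added = True
    while added:
        added = False
        for (e, f) in sorted(union - aligned):
            # Accept a union link if a neighbouring link is already in.
            if any((e + de, f + df) in aligned
                   for de in (-1, 0, 1) for df in (-1, 0, 1)):
                aligned.add((e, f))
                added = True
    return aligned

ef = {(0, 0), (1, 1)}          # English -> foreign Viterbi links
fe = {(0, 0), (1, 2)}          # foreign -> English Viterbi links
merged = symmetrize(ef, fe)
```

Growing from the intersection keeps the high-precision links of both directions while recovering the many-to-many links that either single direction would miss.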
Abstract | Using the presented method, we extract over 1,000,000 pairs of paraphrase patterns from 2M bilingual sentence pairs, the precision of which exceeds 67%.
Conclusion | Experimental results show that the pivot approach is effective, extracting over 1,000,000 pairs of paraphrase patterns from 2M bilingual sentence pairs.
Introduction | Using the proposed approach, we extract over 1,000,000 pairs of paraphrase patterns from 2M bilingual sentence pairs, the precision of which is above 67%.
Related Work | Thus the method acquired paraphrase patterns from sentence pairs that share comparable NEs. |
Cohesive Phrasal Output | Previous approaches to measuring the cohesion of a sentence pair have worked with a word alignment (Fox, 2002; Lin and Cherry, 2003). |
Experiments | We test our cohesion-enhanced Moses decoder trained using 688K sentence pairs of Europarl French-English data, provided by the SMT 2006 Shared Task (Koehn and Monz, 2006). |
Experiments | Comparing the baseline with and without the soft cohesion constraint, we see that cohesion has only a modest effect on BLEU, when measured on all sentence pairs, with improvements ranging between 0.2 and 0.5 absolute points.