A Joint Model with Unlabeled Parallel Text | the sentence pairs may be noisily parallel (or even comparable) instead of fully parallel (Munteanu and Marcu, 2005).
A Joint Model with Unlabeled Parallel Text | In such noisy cases, the labels (positive or negative) could be different for the two monolingual sentences in a sentence pair.
A Joint Model with Unlabeled Parallel Text | Although we do not know the exact probability that a sentence pair exhibits the same label, we can approximate it using their translation probability.
Experimental Setup 4.1 Data Sets and Preprocessing | Because sentence pairs in the ISI corpus are quite noisy, we rely on Giza++ (Och and Ney, 2003) to obtain a new translation probability for each sentence pair, and select the 100,000 pairs with the highest translation probabilities.5
Experimental Setup 4.1 Data Sets and Preprocessing | We then classify each unlabeled sentence pair by concatenating its two sentences into one.
Experimental Setup 4.1 Data Sets and Preprocessing | 5We removed sentence pairs with an original confidence score (given in the corpus) below 0.98, and also removed pairs that are too long (more than 60 characters in either sentence) to facilitate Giza++.
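Experimental Setup 4.1 Data Sets and Preprocessing | The filtering pipeline described above can be sketched as follows. This is a hypothetical illustration: the tuple layout and function name are assumptions; only the thresholds (confidence ≥ 0.98, length ≤ 60 characters, top 100,000 by Giza++ translation probability) come from the text.

```python
# Hypothetical sketch of the corpus-filtering steps described above.
# The (src, tgt, corpus_conf, giza_prob) tuple layout is assumed.

def filter_corpus(pairs, top_k=100_000, min_conf=0.98, max_len=60):
    """pairs: iterable of (src, tgt, corpus_conf, giza_prob) tuples.

    Drop pairs whose original corpus confidence is below min_conf or
    whose sentences exceed max_len characters, then keep the top_k
    pairs with the highest Giza++ translation probability.
    """
    kept = [
        (src, tgt, giza_prob)
        for src, tgt, corpus_conf, giza_prob in pairs
        if corpus_conf >= min_conf
        and len(src) <= max_len and len(tgt) <= max_len
    ]
    kept.sort(key=lambda t: t[2], reverse=True)
    return kept[:top_k]
```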
Results and Analysis | Preliminary experiments showed that Equation 5 does not significantly improve the performance in our case, which is reasonable since we choose only sentence pairs with the highest translation probabilities to be our unlabeled data (see Section 4.1). |
Results and Analysis | However, even with only 2,000 unlabeled sentence pairs, the proposed approach still produces large performance gains.
Results and Analysis | Examination of those sentence pairs in setting 2 for which the two monolingual models still |
Introduction | Word alignment is the task of identifying corresponding words in sentence pairs.
Model Definition | Our bidirectional model G = (V, D) is a globally normalized, undirected graphical model of the word alignment for a fixed sentence pair (e, f). Each vertex in the vertex set V corresponds to a model variable V_i, and each undirected edge in the edge set D corresponds to a pair of variables (V_i, V_j). Each vertex has an associated potential function ω_i(v_i) that assigns a real-valued potential to each possible value v_i of V_i. Likewise, each edge has an associated potential function μ_ij(v_i, v_j) that scores pairs of values.
Model Definition | The highest-probability word alignment vector under the model for a given sentence pair (e, f) can be computed exactly using the standard Viterbi algorithm for HMMs in O(|e|² · |f|) time.
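Model Definition | The Viterbi decoding mentioned here can be sketched as below. This is a minimal illustration of HMM alignment decoding, not the paper's exact parameterization: the log-probability tables `emit` and `trans` and the uniform initial distribution are assumptions.

```python
import math

def viterbi_alignment(emit, trans, n_e, n_f):
    """Viterbi decoding for an HMM word aligner (sketch).

    emit[i][j]: log p(f_j | e_i); trans[ip][i]: log p(i | ip).
    Returns the best alignment a_1..a_{n_f}, one e-index per f word.
    Runtime is O(|e|^2 * |f|): for each of the n_f target positions
    we score all n_e * n_e state transitions.
    """
    delta = [[-math.inf] * n_e for _ in range(n_f)]
    back = [[0] * n_e for _ in range(n_f)]
    for i in range(n_e):                      # assumed uniform start
        delta[0][i] = emit[i][0] - math.log(n_e)
    for j in range(1, n_f):
        for i in range(n_e):
            best, arg = -math.inf, 0
            for ip in range(n_e):
                score = delta[j - 1][ip] + trans[ip][i]
                if score > best:
                    best, arg = score, ip
            delta[j][i] = best + emit[i][j]
            back[j][i] = arg
    # Backtrace from the best final state.
    a = [max(range(n_e), key=lambda i: delta[-1][i])]
    for j in range(n_f - 1, 0, -1):
        a.append(back[j][a[-1]])
    return list(reversed(a))
```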
Model Definition | Figure 1: The structure of our graphical model for a simple sentence pair.
Model Inference | Moreover, the value of u is specific to a sentence pair.
Model Inference | Memory requirements are virtually identical to the baseline: only u must be stored for each sentence pair as it is being processed, but it can then be immediately discarded once alignments are inferred.
Experiments | The parallel training data comprises 9.6M sentence pairs (206M Chinese and 228M English words).
Hard rule labeling from word classes | (2003) to provide us with a set of phrase pairs for each sentence pair in the training corpus, annotated with their respective start and end positions in the source and target sentences. |
Hard rule labeling from word classes | Consider the target-tagged example sentence pair: |
Hard rule labeling from word classes | Intuitively, labeling initial rules with tags marking the boundaries of their target sides yields complex rules whose nonterminal occurrences impose weak syntactic constraints on the rules eligible for substitution in a PSCFG derivation: the left and right boundary word tags of the inserted rule’s target side must match the respective boundary word tags of the phrase pair that was replaced by a nonterminal when the complex rule was created from a training sentence pair.
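Hard rule labeling from word classes | The boundary-tag labeling idea can be sketched as follows. This is a hypothetical illustration: the "LEFT-RIGHT" label format and function names are assumptions; only the principle (a nonterminal is labeled by the tags of the first and last target-side words, and substitution requires an exact label match) comes from the text.

```python
# Hypothetical sketch of hard rule labeling from boundary word tags.

def boundary_label(target_tags):
    """Label a phrase pair's nonterminal by the tags of the first and
    last words of its target side, e.g. ["DT", "JJ", "NN"] -> "DT-NN".
    """
    return f"{target_tags[0]}-{target_tags[-1]}"

def compatible(label_a, label_b):
    """A rule may substitute into a nonterminal only when the boundary
    tags match exactly (the weak syntactic constraint described above).
    """
    return label_a == label_b
```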
Experiments | For all language pairs we employ 200K and 400K sentence pairs for training, 2K for development and 2K for testing (single reference per source sentence). |
Experiments | Table 1 presents the results for the baseline and our method for the 4 language pairs, for training sets of both 200K and 400K sentence pairs.
Experiments | In addition, increasing the size of the training data from 200K to 400K sentence pairs widens the performance margin between the baseline and our system, in some cases considerably. |
Introduction | Starting with the classic IBM work (Brown et al., 1993), training has been viewed as a maximization problem involving hidden word alignments (a) that are assumed to underlie observed sentence pairs |
Machine Translation as a Decipherment Task | Brown et al. (1993) provide an efficient algorithm for training the IBM Model 3 translation model when parallel sentence pairs are available.
Machine Translation as a Decipherment Task | We see that deciphering with 10k monolingual Spanish sentences yields the same performance as training with around 200-500 parallel English/Spanish sentence pairs.