Experiments | Figure 3: Evaluation of paraphrase systems trained on different numbers of parallel sentences.
Experiments | Overall, all the trained models produce reasonable paraphrase systems, even the model trained on just 28K single parallel sentences.
Experiments | Examples of the outputs produced by the models trained on single parallel sentences and on all parallel sentences are shown in Table 2. |
Related Work | Another method for collecting monolingual paraphrase data involves aligning semantically parallel sentences from different news articles describing the same event (Shinyama et al., 2002; Barzilay and Lee, 2003; Dolan et al., 2004). |
Related Work | While utilizing multiple translations of literary work or multiple news stories of the same event can yield significant numbers of parallel sentences, this data tends to be noisy, and reliably identifying good paraphrases among all possible sentence pairs remains an open problem.
Experiments | We randomly sample 200,000 parallel sentences from the UN corpus of the year 2000.
Experiments | Then, we extract every f that cooccurs with e in a parallel sentence and add it to nbestTI(e) which gives us the list of candidate transliteration pairs candidateTI(e). |
Experiments | Algorithm 3 Estimation of transliteration probabilities, e-to-f direction. 1: unfiltered data ← list of word pairs 2: filtered data ← transliteration pairs extracted using Algorithm 1 3: Train a transliteration system on the filtered data 4: for all e do 5: nbestTI(e) ← 10 best transliterations for e according to the transliteration system 6: cooc(e) ← set of all f that cooccur with e in a parallel sentence 7: candidateTI(e) ← cooc(e) ∪ nbestTI(e) 8: end for 9: for all f do 10: pmoses(f, e) ← joint transliteration probability of e and f according to the transliterator
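The candidate-generation steps of Algorithm 3 (lines 4-8) can be sketched in Python. This is a minimal illustration, not the authors' implementation: `toy_nbest` is a hypothetical stand-in for the 10-best output of the trained transliteration system, and the function and variable names (`build_candidates`, `cooc`) are assumptions.

```python
from collections import defaultdict

def build_candidates(word_pairs, nbest_fn):
    """For each source word e: candidateTI(e) = cooc(e) ∪ nbestTI(e)."""
    cooc = defaultdict(set)
    for e, f in word_pairs:  # f cooccurs with e in some parallel sentence
        cooc[e].add(f)
    # union the cooccurrence set with the transliterator's n-best list
    return {e: cooc[e] | set(nbest_fn(e)) for e in cooc}

def toy_nbest(e):
    """Hypothetical stand-in for the 10-best list of a trained transliterator."""
    return [e.lower()]

pairs = [("Obama", "obama"), ("Obama", "the"), ("Paris", "pari")]
cands = build_candidates(pairs, toy_nbest)
```

Note that the cooccurrence set deliberately over-generates (it includes non-transliterations such as "the"); the transliteration probabilities estimated in the later steps are what separate true pairs from noise.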
Introduction | The NEWS10 data sets are extracted from Wikipedia InterLanguage Links (WIL), which consist of parallel phrases, whereas a parallel corpus consists of parallel sentences.
Models | For training Moses as a transliteration system, we treat each word pair as if it were a parallel sentence, by putting spaces between the characters of each word.
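The character-spacing trick above is simple to reproduce. A minimal sketch (the helper name `to_char_sentence` is an assumption, not part of Moses):

```python
def to_char_sentence(word):
    """Space-separate a word's characters so a phrase-based MT system
    like Moses can align characters as if they were words."""
    return " ".join(word)

# a word pair becomes a pseudo parallel "sentence" pair for training
src = to_char_sentence("Tokyo")      # "T o k y o"
tgt = to_char_sentence("トウキョウ")  # "ト ウ キ ョ ウ"
```

After this transformation, the standard Moses pipeline (word alignment, phrase extraction) learns character-to-character correspondences instead of word-to-word ones.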
A Joint Model with Unlabeled Parallel Text | each x_i is a sentence, and x_i^1 and x_i^2 are parallel sentences.
A Joint Model with Unlabeled Parallel Text | Given the problem definition above, we now present a novel model to exploit the correspondence of parallel sentences in unlabeled bilingual text. |
A Joint Model with Unlabeled Parallel Text | If we assume that parallel sentences are perfect translations, the two sentences in each pair should have the same polarity label, which gives us: |
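The perfect-translation assumption can be operationalized as an agreement term over unlabeled parallel pairs. A toy sketch, not the paper's actual objective: `agreement_nll` is a hypothetical function scoring the probability that two monolingual classifiers assign the same binary polarity, given their positive-class probabilities `p1` and `p2`.

```python
import numpy as np

def agreement_nll(p1, p2):
    """Negative log probability that two monolingual sentiment classifiers
    agree on the polarity y ∈ {0, 1} of a parallel sentence pair.
    p1, p2: each classifier's probability of the positive class."""
    p_same = p1 * p2 + (1 - p1) * (1 - p2)  # both positive or both negative
    return -np.log(p_same)
```

Minimizing this term on unlabeled parallel text pushes the two classifiers toward consistent labels: `agreement_nll(0.9, 0.9)` is small, while `agreement_nll(0.9, 0.1)` is heavily penalized.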
Abstract | We rely on the intuition that the sentiment labels for parallel sentences should be similar and present a model that jointly learns improved monolingual sentiment classifiers for each language. |
Experiments | The JST Japanese-English paper abstract corpus (Utiyama and Isahara, 2007), which consists of one million parallel sentences, was used for training, tuning, and testing.
Introduction | However, forest-based translation systems, and, in general, most linguistically syntax-based SMT systems (Galley et al., 2004; Galley et al., 2006; Liu et al., 2006; Zhang et al., 2007; Mi et al., 2008; Liu et al., 2009; Chiang, 2010), are built upon word aligned parallel sentences and thus share a critical dependence on word alignments. |
Introduction | In order to investigate this problem, we manually analyzed the alignments of the first 100 parallel sentences in our English-Japanese training data (to be shown in Table 2). |