Experimental Results | We only present the average results over all four language pairs. |
Experimental Results | Group II: includes the metrics that participated in the WMT12 metrics task, excluding metrics which did not have results for all language pairs. |
Experimental Results | Note that, even though DR-LEX has better individual performance than DR, it does not yield improvements when combined with most of the metrics in group IV. However, over all metrics and all language pairs, DR-LEX is able to obtain an average improvement in correlation of +. |
Experimental Setup | In our experiments, we used the data available for the WMT12 and the WMT11 metrics shared tasks for translations into English. This included the output from the systems that participated in the WMT12 and the WMT11 MT evaluation campaigns, both consisting of 3,003 sentences, for four different language pairs: Czech-English (CS-EN), French-English (FR-EN), German-English (DE-EN), and Spanish-English (ES-EN); as well as a dataset with the English references. |
Experimental Setup | Table 1: Number of systems (systs), judgments (ranks), unique sentences (sents), and different judges (judges) for the different language pairs, for the human evaluation of the WMT12 and WMT11 shared tasks. |
Experimental Setup | In order to make the scores of the different metrics comparable, we performed a min-max normalization, for each metric, and for each language pair combination. |
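The per-metric, per-language-pair min-max normalization described above can be sketched as follows (the function name and list-based interface are illustrative, not from the original papers):

```python
def min_max_normalize(scores):
    """Rescale one metric's scores for one language pair to the [0, 1] range.

    Applied separately to each (metric, language pair) combination so that
    metrics with different native scales become comparable.
    """
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores identical: no spread to normalize over.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


# e.g. min_max_normalize([2.0, 4.0, 6.0]) -> [0.0, 0.5, 1.0]
```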
Related Work | Compared to the previous work, (i) we use a different discourse representation (RST), (ii) we compare discourse parses using all-subtree kernels (Collins and Duffy, 2001), (iii) we evaluate on much larger datasets, for several language pairs and for multiple metrics, and (iv) we do demonstrate better correlation with human judgments. |
Evaluation | Two language pairs were used: Arabic-English and Urdu-English. |
Evaluation | The Urdu to English evaluation in §3.4 focuses on how noisy parallel data and completely monolingual (i.e., not even comparable) text can be used for a realistic low-resource language pair, and is evaluated with the larger language model only. |
Evaluation | Bilingual corpus statistics for both language pairs are presented in Table 2. |
Introduction | With large amounts of data, phrase-based translation systems (Koehn et al., 2003; Chiang, 2007) achieve state-of-the-art results in many typologically diverse language pairs (Bojar et al., 2013). |
Introduction | This problem is exacerbated in the many language pairs for which parallel resources are either limited or nonexistent. |
Related Work | As with previous BLI work, these approaches only take into account source-side similarity of words; only moderate gains (and in the latter work, on a subset of language pairs evaluated) are obtained. |
Abstract | In statistical machine translation (SMT), syntax-based pre-ordering of the source language is an effective method for dealing with language pairs where there are great differences in their respective word orders. |
Introduction | SMT systems have difficulties translating between distant language pairs such as Chinese and English. |
Introduction | Reordering therefore becomes a key issue in SMT systems between distant language pairs . |
Introduction | Syntax-based pre-ordering employing constituent parsing has demonstrated effectiveness in many language pairs, such as English-French (Xia and McCord, 2004), German-English (Collins et al., 2005), Chinese-English (Wang et al., 2007; Zhang et al., 2008), and English-Japanese (Lee et al., 2010). |
Conclusions | training over domain-specific dictionaries from other language pairs ), and low-density languages where there are few dictionaries and Wikipedia articles to train the method on. |
Motivation | org aspire to aggregate these dictionaries into a single lexical database, but are hampered by the need to identify individual multilingual dictionaries, especially for language pairs where there is a sparsity of data from existing dictionaries (Baldwin et al., 2010; Kamholz and Pool, to appear). |
Motivation | This paper is an attempt to automate the detection of multilingual dictionaries on the web, through query construction for an arbitrary language pair . |
Results | Most queries returned no results; indeed, for the en-ar language pair, only 49/1000 queries returned documents. |
Results | Among the 7 language pairs, en-es, en-de, en-fr and en-it achieved the highest MAP scores. |
Results | In terms of unique lexical resources found with 50 queries, the most successful language pairs were en-fr, en-de and en-it. |
Corpora | We considered the English-German and English-French language pairs from this corpus. |
Experiments | This task involves learning language-independent embeddings which are then used for document classification across the English-German language pair. |
Experiments | In the single mode, vectors are learnt from a single language pair (en-X), while in the joint mode vector-learning is performed on all parallel sub-corpora simultaneously. |
Experiments | In the English case we train twelve individual classifiers, each using the training data of a single language pair only. |
Experiments & Results | The data for our experiments were drawn from the Europarl parallel corpus (Koehn, 2005) from which we extracted two sets of 200,000 sentence pairs each for several language pairs. |
Experiments & Results | The final test sets are a randomly sampled 5,000 sentence pairs from the 200,000-sentence test split for each language pair. |
Experiments & Results | Let us first zoom in to convey a sense of scale on a specific language pair . |
Abstract | Despite cultural differences and the intended neutrality of Wikipedia articles, our lexicons show an average sentiment correlation of 0.28 across all language pairs. |
Extrinsic Evaluation: Consistency of Wikipedia Sentiment | We use the Spearman correlation coefficient to measure the consistency of sentiment distribution across all entities with pages in a particular language pair. |
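The Spearman coefficient used here is simply Pearson correlation computed over ranks, with ties assigned their average rank. A self-contained sketch (the function names are illustrative):

```python
def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den
```

For the extrinsic evaluation above, `x` and `y` would be the sentiment scores of the same entities as computed from the two languages of a pair; a coefficient near 1 indicates the two languages rank the entities' sentiment consistently.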
Introduction | Each language pair exhibits a Spearman sentiment correlation of at least 0.14, with an average correlation of 0.28 over all pairs. |
Knowledge Graph Construction | Closely related language pairs (i.e. |
Experiments | Translation models are estimated on 102M words of parallel data for French-English, and 99M words for German-English; about 6.5M words for each language pair are newswire, the remainder are parliamentary proceedings. |
Experiments | The vocabulary consists of words that occur in at least two different sentences, which is 31K words for both language pairs . |
Experiments | The results (Table 1 and Table 2) show that direct integration improves accuracy across all six test sets on both language pairs . |
Word Alignment | We carried out experiments on two language pairs: Arabic to English and Czech to English. |
Word Alignment | Variational Bayes is not consistent across different language pairs . |
Word Alignment | While fractional KN does beat the baseline for both language pairs, the value of D, which we optimized to maximize F1, is not consistent across language pairs: as shown in Figure 2, on Arabic-English, a smaller D is better, while for Czech-English, a larger D is better. |