Index of papers in Proc. ACL that mention
  • language pairs
Smith, Jason R. and Saint-Amand, Herve and Plamada, Magdalena and Koehn, Philipp and Callison-Burch, Chris and Lopez, Adam
Abstract
Our large-scale experiment uncovers large amounts of parallel text in dozens of language pairs across a variety of domains and genres, some previously unavailable in curated datasets.
Abstract
Even with minimal cleaning and filtering, the resulting data boosts translation performance across the board for five different language pairs in the news domain, and on open domain test sets we see improvements of up to 5 BLEU.
Abstract
of language pairs, large amounts of parallel data
language pairs is mentioned in 17 sentences in this paper.
Topics mentioned in this paper:
Ling, Wang and Xiang, Guang and Dyer, Chris and Black, Alan and Trancoso, Isabel
Experiments
First, intrinsically, by observing how well our method identifies tweets containing parallel data, the language pair and what their spans are.
Parallel Data Extraction
The target domains in this work are Twitter and Sina Weibo, and the main language pair is Chinese-English.
Parallel Data Extraction
Furthermore, we also run the system for the Arabic-English language pair using the Twitter data.
Parallel Data Extraction
In both cases, we first filter the collection of tweets for messages containing at least one trigram in each language of the target language pair, determined by their Unicode ranges.
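The Unicode-range pre-filter described above can be sketched as follows. The exact trigram unit and code-point ranges are illustrative assumptions, not taken from the paper; here we use character trigrams, ASCII letters for English, and the CJK Unified Ideographs block for Chinese.

```python
# Sketch of a Unicode-range language pre-filter (assumed details:
# character trigrams, and the specific ranges below are illustrative).

def has_trigram(text, lo, hi):
    """True if text contains 3 consecutive characters whose code points
    fall within [lo, hi]."""
    run = 0
    for ch in text:
        run = run + 1 if lo <= ord(ch) <= hi else 0
        if run >= 3:
            return True
    return False

def keep_tweet(text):
    """Keep messages with at least one trigram in each language's script:
    ASCII letters for English, CJK Unified Ideographs for Chinese."""
    english = has_trigram(text.lower(), ord('a'), ord('z'))
    chinese = has_trigram(text, 0x4E00, 0x9FFF)
    return english and chinese
```

A message such as "hello world 你好世界" passes both checks, while a monolingual message fails one of them and is filtered out.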
Parallel Segment Retrieval
The lexical tables PM for the various language pairs are trained a priori using available parallel corpora.
Parallel Segment Retrieval
While IBM Model 1 produces worse alignments than other models, in our problem, we need to efficiently consider all possible spans, language pairs and word alignments, which makes the problem intractable.
Parallel Segment Retrieval
Our goal is to find the spans, language pair and alignments such that:
language pairs is mentioned in 13 sentences in this paper.
Topics mentioned in this paper:
Faruqui, Manaal and Dyer, Chris
Conclusions
We have shown that improvement in clustering can be obtained across a range of language pairs, evaluated in terms of their value as features in an extrinsic NER task.
Experiments
Each language pair contained around 1.5 million German words.
Experiments
Monolingual Clustering: For every language pair, we train German word clusters on the monolingual German data from the parallel data.
Experiments
Table 1 shows the performance of NER when the word clusters are obtained using only the bilingual information for different language pairs.
language pairs is mentioned in 11 sentences in this paper.
Topics mentioned in this paper:
Aker, Ahmet and Paramita, Monica and Gaizauskas, Rob
Abstract
We test our approach on a held-out test set from EUROVOC and perform precision, recall and f-measure evaluations for 20 European language pairs.
Abstract
The performance of our classifier reaches the 100% precision level for many language pairs.
Experiments 5.1 Data Sources
We also built comparable corpora in the information technology (IT) and automotive domains by gathering documents from Wikipedia for the English-German language pair.
Experiments 5.1 Data Sources
4Note that we do not use the Maltese-English language pair, as for this pair we found that 5861 out of 6797 term pairs were identical, i.e.
Experiments 5.1 Data Sources
Furthermore, we performed data selection for each language pair separately.
Feature extraction
For instance, the cognate methods are not directly applicable to the English-Bulgarian and English-Greek language pairs, as both the Bulgarian (Cyrillic-based) and Greek alphabets differ from the English Latin-based alphabet.
Feature extraction
We created mapping rules for 20 EU language pairs using primarily Wikipedia as a resource for describing phonetic mappings to English.
Method
We have run our approach on the 21 official EU languages covered by EUROVOC, constructing 20 language pairs with English as the source
Method
We run this evaluation on all 20 language pairs.
language pairs is mentioned in 15 sentences in this paper.
Topics mentioned in this paper:
Haffari, Gholamreza and Sarkar, Anoop
AL-SMT: Multilingual Setting
The nonnegative weights α_d reflect the importance of the different translation tasks, and Σ_d α_d = 1. The AL-SMT formulation for a single language pair is a special case of this formulation where only one of the α_d's in the objective function (1) is one and the rest are zero.
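The weighted multilingual objective described above can be written compactly as follows. This is a reconstruction from the surrounding text; the per-pair objectives O_d stand in for whatever the paper's equation (1) sums over the D translation tasks.

```latex
O \;=\; \sum_{d=1}^{D} \alpha_d \, O_d,
\qquad \alpha_d \ge 0, \quad \sum_{d=1}^{D} \alpha_d = 1 .
```

Setting one α_d to 1 and the rest to 0 recovers the single-language-pair formulation as a special case.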
AL-SMT: Multilingual Setting
2.1 for AL in the multilingual setting includes the single language pair setting as a special case (Haffari et al., 2009).
AL-SMT: Multilingual Setting
For a single language pair we use U and L.
Abstract
We also provide new highly effective sentence selection methods that improve AL for phrase-based SMT in the multilingual and single language pair setting.
Introduction
The multilingual setting provides new opportunities for AL over and above a single language pair.
Introduction
In our case, the multiple tasks are individual machine translation tasks for several language pairs .
Introduction
languages to the new language depending on the characteristics of each source-target language pair, hence these tasks are competing for annotating the same resource.
language pairs is mentioned in 26 sentences in this paper.
Topics mentioned in this paper:
Chen, Boxing and Kuhn, Roland and Larkin, Samuel
Abstract
We compare PORT-tuned MT systems to BLEU-tuned baselines in five experimental conditions involving four language pairs .
BLEU and PORT
For our experiments, we tuned α on Chinese-English data, setting it to 0.25 and keeping this value for the other language pairs.
Conclusions
Most important, our results show that PORT-tuned MT systems yield better translations than BLEU-tuned systems on several language pairs, according both to automatic metrics and human evaluations.
Conclusions
In future work, we plan to tune the free parameter α for each language pair.
Experiments
Most WMT submissions involve language pairs with similar word order, so the ordering factor v in PORT won’t play a big role.
Experiments
In internal tests we have found no systematic difference in dev-set BLEUs, so we speculate that PORT’s emphasis on reordering yields models that generalize better for these two language pairs.
Experiments
Of the Table 5 language pairs, the one where PORT tuning helps most has the lowest BLEU in Table 4 (German-English); the one where it helps least in Table 5 has the highest BLEU in Table 4 (French-English).
Introduction
However, since PORT is designed for tuning, the most important results are those showing that PORT tuning yields systems with better translations than those produced by BLEU tuning — both as determined by automatic metrics (including BLEU), and according to human judgment, as applied to five data conditions involving four language pairs.
language pairs is mentioned in 11 sentences in this paper.
Topics mentioned in this paper:
Sajjad, Hassan and Fraser, Alexander and Schmid, Helmut
Experiments
Their systems behave differently on English/Russian than on other language pairs .
Experiments
We create gold standards for both language pairs by randomly selecting a few thousand word pairs from the lists of word pairs extracted from the two corpora.
Experiments
4This solution is appropriate for all of the language pairs used in our experiments, but should be revisited if there is inflection realized as prefixes, etc.
Extraction of Transliteration Pairs
In this section, we present an iterative method for the extraction of transliteration pairs from parallel corpora which is fully unsupervised and language pair independent.
Extraction of Transliteration Pairs
We ignore non-1-to-1 alignments because they are less likely to be transliterations for most language pairs.
Introduction
Such resources are also not applicable to other language pairs.
Introduction
We compare our unsupervised transliteration mining method with the semi-supervised systems presented at the NEWS 2010 shared task on transliteration mining (Kumaran et al., 2010) using four language pairs.
Introduction
We also do experiments on parallel corpora for two language pairs.
language pairs is mentioned in 16 sentences in this paper.
Topics mentioned in this paper:
Mylonakis, Markos and Sima'an, Khalil
Abstract
We obtain statistically significant improvements across 4 different language pairs with English as source, mounting up to +1.92 BLEU for Chinese as target.
Experiments
These extra features assess translation quality past the synchronous grammar derivation and learning general reordering or word emission preferences for the language pair.
Experiments
We evaluate our method on four different language pairs with English as the source language and French, German, Dutch and Chinese as target.
Experiments
The data for the first three language pairs are derived from parliament proceedings sourced from the Europarl corpus (Koehn, 2005), with WMT—07 development and test data for French and German.
Introduction
utilised an ITG-flavour which focused on hierarchical phrase-pairs to capture context-driven translation and reordering patterns with ‘gaps’, offering competitive performance particularly for language pairs with extensive reordering.
Introduction
By advancing from structures which mimic linguistic syntax, to learning linguistically aware latent recursive structures targeting translation, we achieve significant improvements in translation quality for 4 different language pairs in comparison with a strong hierarchical translation baseline.
language pairs is mentioned in 12 sentences in this paper.
Topics mentioned in this paper:
Haghighi, Aria and Liang, Percy and Berg-Kirkpatrick, Taylor and Klein, Dan
Abstract
We show that high-precision lexicons can be learned in a variety of language pairs and from a range of corpus types.
Experimental Setup
all language pairs except English-Arabic, we extract evaluation lexicons from the Wiktionary online dictionary.
Experiments
We also explored how system performance varies for language pairs other than English-Spanish.
Experiments
One concern is how our system performs on language pairs where orthographic features are less applicable.
Features
While orthographic features are clearly effective for historically related language pairs, they are more limited for other language pairs, where we need to appeal to other clues.
Features
(section 6.2), (c) a variety of language pairs (see section 6.3).
Introduction
Although parallel text is plentiful for some language pairs such as English-Chinese or English-Arabic, it is scarce or even nonexistent for most others, such as English-Hindi or French-Japanese.
Introduction
Moreover, parallel text could be scarce for a language pair even if monolingual data is readily available for both languages.
Introduction
This task, though clearly more difficult than the standard parallel text approach, can operate on language pairs and in domains where standard approaches cannot.
language pairs is mentioned in 10 sentences in this paper.
Topics mentioned in this paper:
Neubig, Graham and Watanabe, Taro and Mori, Shinsuke and Kawahara, Tatsuya
Abstract
In an evaluation, we demonstrate that character-based translation can achieve results that compare to word-based systems while effectively translating unknown and uncommon words over several language pairs.
Experiments
In order to test the effectiveness of character-based translation, we performed experiments over a variety of language pairs and experimental settings.
Experiments
As previous research has shown that it is more difficult to translate into morphologically rich languages than into English (Koehn, 2005), we perform experiments translating in both directions for all language pairs.
Experiments
This confirms that character-based translation is performing well on languages that have long words or ambiguous boundaries, and less well on language pairs with relatively strong one-to-one correspondence between words.
Introduction
This method is attractive, as it is theoretically able to handle all sparsity phenomena in a single unified framework, but has only been shown feasible between similar language pairs such as Spanish-Catalan (Vilar et al., 2007), Swedish-Norwegian (Tiedemann, 2009), and Thai-Lao (Sornlertlamvanich et al., 2008), which have a strong co-occurrence between single characters.
Introduction
(2007) state and we confirm, accurate translations cannot be achieved when applying traditional translation techniques to character-based translation for less similar language pairs.
Introduction
An evaluation on four language pairs with differing morphological properties shows that for distant language pairs, character-based SMT can achieve translation accuracy comparable to word-based systems.
Related Work on Data Sparsity in SMT
However, while the approach is attractive conceptually, previous research has only been shown effective for closely related language pairs (Vilar et al., 2007; Tiedemann, 2009; Sornlertlamvanich et al., 2008).
Related Work on Data Sparsity in SMT
In this work, we propose effective alignment techniques that allow character-based translation to achieve accurate translation results for both close and distant language pairs.
language pairs is mentioned in 10 sentences in this paper.
Topics mentioned in this paper:
Guzmán, Francisco and Joty, Shafiq and Màrquez, Llu'is and Nakov, Preslav
Experimental Results
We only present the average results over all four language pairs .
Experimental Results
Group II: includes the metrics that participated in the WMT12 metrics task, excluding metrics which did not have results for all language pairs .
Experimental Results
Note that, even though DR-LEX has better individual performance than DR, it does not yield improvements when combined with most of the metrics in group IV.8 However, over all metrics and all language pairs, DR-LEX is able to obtain an average improvement in correlation of +.
Experimental Setup
In our experiments, we used the data available for the WMT12 and the WMT11 metrics shared tasks for translations into English.3 This included the output from the systems that participated in the WMT12 and the WMT11 MT evaluation campaigns, both consisting of 3,003 sentences, for four different language pairs: Czech-English (CS-EN), French-English (FR-EN), German-English (DE-EN), and Spanish-English (ES-EN); as well as a dataset with the English references.
Experimental Setup
Table 1: Number of systems (systs), judgments (ranks), unique sentences (sents), and different judges (judges) for the different language pairs, for the human evaluation of the WMT12 and WMT11 shared tasks.
Experimental Setup
In order to make the scores of the different metrics comparable, we performed a min-max normalization, for each metric, and for each language pair combination.
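The per-metric, per-language-pair min-max normalization described above can be sketched as follows; the function and grouping names are illustrative assumptions, not the authors' code.

```python
# Per-group min-max normalization: scores for each (metric, language pair)
# group are mapped to [0, 1] so different metrics become comparable.

def min_max_normalize(scores):
    """Map a list of raw scores to [0, 1]; constant lists map to all 0.0."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def normalize_by_group(raw):
    """raw: {(metric, lang_pair): [scores]} -> same keys, normalized lists."""
    return {key: min_max_normalize(vals) for key, vals in raw.items()}
```

For example, the scores [2, 4, 6] normalize to [0.0, 0.5, 1.0] within their group, independently of any other metric or language pair.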
Related Work
Compared to the previous work, (i) we use a different discourse representation (RST), (ii) we compare discourse parses using all-subtree kernels (Collins and Duffy, 2001), (iii) we evaluate on much larger datasets, for several language pairs and for multiple metrics, and (iv) we do demonstrate better correlation with human judgments.
language pairs is mentioned in 10 sentences in this paper.
Topics mentioned in this paper:
Kumar, Shankar and Macherey, Wolfgang and Dyer, Chris and Och, Franz
Abstract
Our experiments show speedups from MERT and MBR as well as performance improvements from MBR decoding on several language pairs .
Discussion
This may not be optimal in practice for unseen test sets and language pairs, and the resulting linear loss may be quite different from the corpus level BLEU.
Discussion
On an experiment with 40 language pairs , we obtain improvements on 26 pairs, no difference on 8 pairs and drops on 5 pairs.
Discussion
This was achieved without any need for manual tuning for each language pair.
Experiments
We report results on nist03 set and present three systems for each language pair: phrase-based (pb), hierarchical (hier), and SAMT; Lattice MBR is done for the phrase-based system while HGMBR is used for the other two.
Experiments
For the multi-language case, we train phrase-based systems and perform lattice MBR for all language pairs .
Experiments
When we optimize MBR features with MERT, the number of language pairs with gains/no changes/-drops is 22/5/12.
language pairs is mentioned in 10 sentences in this paper.
Topics mentioned in this paper:
Shezaf, Daphna and Rappoport, Ari
Abstract
Modern automated lexicon generation methods usually require parallel corpora, which are not available for most language pairs .
Introduction
However, for most language pairs parallel bilingual corpora either do not exist or are at best small and unrepresentative of the general language.
Introduction
Pivot language approaches deal with the scarcity of bilingual data for most language pairs by relying on the availability of bilingual data for each of the languages in question with a third, pivot, language.
Lexicon Generation Experiments
We chose a language pair for which basically no parallel corpora exist, and that does not share ancestry or writing system in a way that can provide cues for alignment.
Lexicon Generation Experiments
These considerations lead us to believe that our choice of language pair is more challenging than, for example, a pair of European languages.
NAS Score Properties
For other language pairs lemmatization may be needed.
Previous Work
The limited availability of parallel corpora of sufficient size for most language pairs restricts the usefulness of these methods.
Previous Work
(2009) used many input bilingual lexicons to create bilingual lexicons for new language pairs.
language pairs is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Saluja, Avneesh and Hassan, Hany and Toutanova, Kristina and Quirk, Chris
Evaluation
Two language pairs were used: Arabic-English and Urdu-English.
Evaluation
The Urdu to English evaluation in §3.4 focuses on how noisy parallel data and completely monolingual (i.e., not even comparable) text can be used for a realistic low-resource language pair, and is evaluated with the larger language model only.
Evaluation
Bilingual corpus statistics for both language pairs are presented in Table 2.
Introduction
With large amounts of data, phrase-based translation systems (Koehn et al., 2003; Chiang, 2007) achieve state-of-the-art results in many typologically diverse language pairs (Bojar et al., 2013).
Introduction
This problem is exacerbated in the many language pairs for which parallel resources are either limited or nonexistent.
Related Work
As with previous BLI work, these approaches only take into account source-side similarity of words; only moderate gains (and in the latter work, on a subset of language pairs evaluated) are obtained.
language pairs is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Mausam and Soderland, Stephen and Etzioni, Oren and Weld, Daniel and Skinner, Michael and Bilmes, Jeff
Abstract
PANDICTIONARY contains more than four times as many translations as the largest Wiktionary at precision 0.90 and over 200,000,000 pairwise translations in over 200,000 language pairs at precision 0.8.
Empirical Evaluation
Such people are hard to find and may not even exist for many language pairs (e.g., Basque and Maori).
Empirical Evaluation
For this study we tagged 7 language pairs: Hindi-Hebrew,
Introduction and Motivation
PANDICTIONARY, that could serve as a resource for translation systems operating over a very broad set of language pairs.
Introduction and Motivation
PANDICTIONARY currently contains over 200 million pairwise translations in over 200,000 language pairs at precision 0.8.
Introduction and Motivation
We describe the design and construction of PAN DICTIONARY—a novel lexical resource that spans over 200 million pairwise translations in over 200,000 language pairs at 0.8 precision, a fourfold increase when compared to the union of its input translation dictionaries.
Related Work
lingual corpora, which may scale to several language pairs in future (Haghighi et al., 2008).
language pairs is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Ganchev, Kuzman and Graça, João V. and Taskar, Ben
Abstract
We attempt to tease apart the effects that this simple but effective modification has on alignment precision and recall tradeoffs, and how rare and common words are affected across several language pairs .
Abstract
We propose and extensively evaluate a simple method for using alignment models to produce alignments better-suited for phrase-based MT systems, and show significant gains (as measured by BLEU score) in end-to-end translation systems for six language pairs used in recent MT competitions.
Conclusions
Table 3: BLEU scores for all language pairs using all available data.
Conclusions
We tested this hypothesis on six different language pairs from three different domains, and found that the new alignment scheme not only performs better than the baseline, but also improves over a more complicated, intractable model.
Introduction
language pairs used in recent MT competitions.
Phrase-based machine translation
Our next set of experiments look at our performance in both directions across our 6 corpora, when we have small to moderate amounts of training data: for the language pairs with more than 100,000 sentences, we use only the first 100,000 sentences.
Phrase-based machine translation
Table 2: BLEU scores for all language pairs using up to 100k sentences.
language pairs is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Kozhevnikov, Mikhail and Titov, Ivan
Abstract
This approach is then evaluated on three language pairs, demonstrating competitive performance as compared to a state-of-the-art unsupervised SRL system and a cross-lingual annotation projection baseline.
Background and Motivation
We evaluate on five (directed) language pairs —EN-ZH, ZH-EN, EN-CZ, CZ-EN and EN-FR, where EN, FR, CZ and ZH denote English, French, Czech and Chinese, respectively.
Evaluation
We have identified three language pairs for which such resources are available: English-Chinese, English-Czech and English-French.
Evaluation
The data for the second language pair is drawn from the Prague Czech-English Dependency Treebank 2.0 (Hajic et al., 2012), which we converted to a format similar to that of CoNLL-ST.
Results
It is easy to see that the scores vary strongly depending on the language pair, due to both the difference in the annotation scheme used and the degree of relatedness between the languages.
Results
The source language scores for English vary between language pairs because of the difference in syntactic annotation and role subset used.
Results
For more distant language pairs, the contributions of individual feature groups are less interpretable, so we only highlight a few observations.
language pairs is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Grigonyte, Gintare and Baldwin, Timothy
Conclusions
training over domain-specific dictionaries from other language pairs), and low-density languages where there are few dictionaries and Wikipedia articles to train the method on.
Motivation
org aspire to aggregate these dictionaries into a single lexical database, but are hampered by the need to identify individual multilingual dictionaries, especially for language pairs where there is a sparsity of data from existing dictionaries (Baldwin et al., 2010; Kamholz and Pool, to appear).
Motivation
This paper is an attempt to automate the detection of multilingual dictionaries on the web, through query construction for an arbitrary language pair.
Results
Most queries returned no results; indeed, for the en-ar language pair, only 49/1000 queries returned documents.
Results
Among the 7 language pairs , en-es, en-de, en-fr and en-it achieved the highest MAP scores.
Results
In terms of unique lexical resources found with 50 queries, the most successful language pairs were en-fr, en-de and en-it.
language pairs is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Cai, Jingsheng and Utiyama, Masao and Sumita, Eiichiro and Zhang, Yujie
Abstract
In statistical machine translation (SMT), syntax-based pre-ordering of the source language is an effective method for dealing with language pairs where there are great differences in their respective word orders.
Introduction
SMT systems have difficulties translating between distant language pairs such as Chinese and English.
Introduction
Reordering therefore becomes a key issue in SMT systems between distant language pairs.
Introduction
Syntax-based pre-ordering by employing constituent parsing has demonstrated effectiveness in many language pairs, such as English-French (Xia and McCord, 2004), German-English (Collins et al., 2005), Chinese-English (Wang et al., 2007; Zhang et al., 2008), and English-Japanese (Lee et al., 2010).
language pairs is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Hermann, Karl Moritz and Blunsom, Phil
Corpora
We considered the English-German and English-French language pairs from this corpus.
Experiments
This task involves learning language independent embeddings which are then used for document classification across the English-German language pair.
Experiments
In the single mode, vectors are learnt from a single language pair (en-X), while in the joint mode vector-learning is performed on all parallel sub-corpora simultaneously.
Experiments
In the English case we train twelve individual classifiers, each using the training data of a single language pair only.
language pairs is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
van Gompel, Maarten and van den Bosch, Antal
Experiments & Results
The data for our experiments were drawn from the Europarl parallel corpus (Koehn, 2005) from which we extracted two sets of 200,000 sentence pairs each for several language pairs.
Experiments & Results
The final test sets are a randomly sampled 5,000 sentence pairs from the 200,000-sentence test split for each language pair.
Experiments & Results
Let us first zoom in to convey a sense of scale on a specific language pair.
language pairs is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Zollmann, Andreas and Vogel, Stephan
Abstract
These results persist when using automatically learned word tags, suggesting broad applicability of our technique across diverse language pairs for which syntactic resources are not available.
Conclusion and discussion
Using automatically obtained word clusters instead of POS tags yields essentially the same results, thus making our methods applicable to all language pairs with parallel corpora, whether syntactic resources are available for them or not.
Experiments
Even though a key advantage of our method is its applicability to resource-poor languages, we used a language pair for which lin-
Experiments
Accordingly, we use Chiang’s hierarchical phrase based translation model (Chiang, 2007) as a base line, and the syntax-augmented MT model (Zollmann and Venugopal, 2006) as a ‘target line’, a model that would not be applicable for language pairs without linguistic resources.
Introduction
The Probabilistic Synchronous Context Free Grammar (PSCFG) formalism suggests an intuitive approach to model the long-distance and lexically sensitive reordering phenomena that often occur across language pairs considered for statistical machine translation.
language pairs is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Lim, Lian Tze and Soon, Lay-Ki and Lim, Tek Yong and Tang, Enya Kong and Ranaivo-Malançon, Bali
Abstract
We extract translation context knowledge from a bilingual comparable corpus of a richer-resourced language pair, and inject it into a multilingual lexicon.
Introduction
We are interested in leveraging richer-resourced language pairs to enable context-dependent lexical lookup for under-resourced languages.
Introduction
We propose a rapid approach for acquiring them from an untagged, comparable bilingual corpus of a (richer-resourced) language pair in section 3.
Typical Resource Requirements for Translation Selection
However, aligned corpora can be difficult to obtain for under-resourced language pairs, and are expensive to construct.
Typical Resource Requirements for Translation Selection
(2011) also tackled the problem of cross-lingual disambiguation for under-resourced language pairs (English-Persian) using Wikipedia articles, by applying the one sense per collocation and one sense per discourse heuristics on a comparable corpus.
language pairs is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Vaswani, Ashish and Huang, Liang and Chiang, David
Conclusion
The method is implemented as a modification to the open-source toolkit GIZA++, and we have shown that it significantly improves translation quality across four different language pairs .
Experiments
For each language pair, we extracted grammar rules from the same data that were used for word alignment.
Introduction
These models are unsupervised, making them applicable to any language pair for which parallel text is available.
Introduction
Although manually-aligned data is very valuable, it is only available for a small number of language pairs .
language pairs is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Snyder, Benjamin and Barzilay, Regina
Experimental SetUp
To obtain our corpus of short parallel phrases, we preprocessed each language pair using the Giza++ alignment toolkit.6 Given word alignments for each language pair, we extract a list of phrase pairs that form independent sets in the bipartite alignment graph.
Introduction
When modeled in tandem, gains are observed for all language pairs , reducing relative error by as much as 24%.
Introduction
Furthermore, our experiments show that both related and unrelated language pairs benefit from multilingual learning.
Results
However, once character-to-character phonetic correspondences are added as an abstract morpheme prior (final two rows), we find the performance of related language pairs outstrips English, reducing relative error over MONOLINGUAL by 10% and 24% for the Hebrew/Arabic pair.
language pairs is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Toutanova, Kristina and Suzuki, Hisami and Ruopp, Achim
Conclusion and future work
In the future, we would like to include more sophistication in the design of a lexicon for a particular language pair based on error analysis, and extend our preprocessing to include other operations such as word segmentation.
Integration of inflection models with MT systems
However, for some language pairs, stemming one language can make word alignment worse, if it leads to more violations in the assumptions of current word alignment models, rather than making the source look more like the target.
Machine translation systems and data
For each language pair , we used a set of parallel sentences (train) for training the MT system sub-models (e.g., phrase tables, language model), a set of parallel sentences (lambda) for training the combination weights with max-BLEU training, a set of parallel sentences (dev) for training a small number of combination parameters for our integration methods (see Section 5), and a set of parallel sentences (test) for final evaluation.
Machine translation systems and data
All MT systems for a given language pair used the same datasets.
language pairs is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Kim, Jungi and Li, Jin-Ji and Lee, Jong-Hyeok
Experiment
Three human annotators who are fluent in the two languages manually annotated N-to-N sentence alignments for each language pair (KR-EN, KR-CH, KR-JP).
Experiment
By keeping only the sentence chunks whose Korean chunk appears in all language pairs, we were left with 859 sentence chunk pairs.
Experiment
The subjectivity analysis systems are evaluated on all language pairs using kappa and Pearson’s correlation coefficients.
language pairs is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Chen, Yanqing and Skiena, Steven
Abstract
Despite cultural difference and the intended neutrality of Wikipedia articles, our lexicons show an average sentiment correlation of 0.28 across all language pairs.
Extrinsic Evaluation: Consistency of Wikipedia Sentiment
We use the Spearman correlation coefficient to measure the consistency of sentiment distribution across all entities with pages in a particular language pair.
Introduction
Each language pair exhibits a Spearman sentiment correlation of at least 0.14, with an average correlation of 0.28 over all pairs.
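As a minimal illustration of the statistic this excerpt reports (not the authors' code), the Spearman correlation between per-entity sentiment scores from two language editions can be computed as the Pearson correlation of their ranks, with ties given average ranks:

```python
# Hedged sketch: Spearman correlation via ranks (all names are illustrative).
def _ranks(xs):
    # Assign 1-based ranks, averaging ranks for tied values.
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    # Pearson correlation computed on the rank vectors.
    rx, ry = _ranks(x), _ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)
```

Applied to sentiment scores for entities shared by a pair of Wikipedia editions, values near the paper's reported 0.28 average would indicate moderately consistent sentiment across languages.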
Knowledge Graph Construction
Closely related language pairs (i.e.
language pairs is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Ravi, Sujith and Knight, Kevin
Introduction
Of course, for many language pairs and domains, parallel data is not available.
Introduction
As successful work develops along this line, we expect more domains and language pairs to be conquered by SMT.
Machine Translation as a Decipherment Task
Data: We work with the Spanish/English language pair and use the following corpora in our MT experiments:
Machine Translation as a Decipherment Task
0 OPUS movie subtitle corpus: This is a large open source collection of parallel corpora available for multiple language pairs (Tiedemann, 2009).
language pairs is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Zhang, Hui and Chiang, David
Word Alignment
We carried out experiments on two language pairs: Arabic to English and Czech to English.
Word Alignment
Variational Bayes is not consistent across different language pairs .
Word Alignment
While fractional KN does beat the baseline for both language pairs, the value of D, which we optimized to maximize F1, is not consistent across language pairs: as shown in Figure 2, on Arabic-English a smaller D is better, while for Czech-English a larger D is better.
language pairs is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Auli, Michael and Gao, Jianfeng
Experiments
Translation models are estimated on 102M words of parallel data for French-English, and 99M words for German-English; about 6.5M words for each language pair are newswire, the remainder are parliamentary proceedings.
Experiments
The vocabulary consists of words that occur in at least two different sentences, which is 31K words for both language pairs.
Experiments
The results (Table 1 and Table 2) show that direct integration improves accuracy across all six test sets on both language pairs.
language pairs is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Visweswariah, Karthik and Khapra, Mitesh M. and Ramanathan, Ananthakrishnan
Conclusion
As future work we would like to evaluate our models on other language pairs.
Related work
The task of directly learning a reordering model for language pairs that are very different is closely related to the task of parsing and hence work on semi-supervised parsing (Koo et al., 2008; McClosky et al., 2006; Suzuki et al., 2009) is broadly related to our work.
Reordering issues in Urdu-English translation
In this section we describe the main sources of word order differences between Urdu and English since this is the language pair we experiment with in this paper.
language pairs is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Nguyen, ThuyLinh and Vogel, Stephan
Experiment Results
We will report the impact of integrating phrase-based features into Hiero systems for three language pairs: Arabic-English, Chinese-English and German-English.
Introduction
Yet, tree-based translation often underperforms phrase-based translation in language pairs with short range reordering such as Arabic-English translation (Zollmann et al., 2008; Birch et al., 2009).
Introduction
This is important for language pairs with strict reordering.
language pairs is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Kolachina, Prasanth and Cancedda, Nicola and Dymetman, Marc and Venkatapathy, Sriram
Inferring a learning curve from mostly monolingual data
For each configuration (combination of language pair and domain) and test set in Table 2, a gold curve is fitted using the selected tri-parameter power-law family on a fine grid of corpus sizes.
Introduction
Our experiments involve 30 distinct language pair and domain combinations and 96 different learning curves.
Selecting a parametric family of curves
for all six families on a test dataset for the English-German language pair.
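As a hedged sketch of the kind of fit the excerpts describe (not the authors' code; the exact parametrization and fitting procedure are assumptions), a tri-parameter power law y = c - a * x^(-b) can be fitted to (corpus size, BLEU) points by grid-searching the exponent b and solving the remaining two parameters in closed form by least squares:

```python
# Hedged sketch: fit y = c - a * x**(-b) to learning-curve points.
# For a fixed b the model is linear in (c, a), so only b needs a grid search.
def fit_power_law(xs, ys, b_grid=None):
    if b_grid is None:
        b_grid = [i / 100 for i in range(1, 201)]  # b in (0, 2]
    best = None
    n = len(xs)
    for b in b_grid:
        u = [x ** (-b) for x in xs]  # regressor for this exponent
        mu, my = sum(u) / n, sum(ys) / n
        suu = sum((v - mu) ** 2 for v in u)
        if suu == 0:
            continue
        # Least-squares slope of y on u is -a; intercept gives c.
        a = -sum((v - mu) * (y - my) for v, y in zip(u, ys)) / suu
        c = my + a * mu
        sse = sum((c - a * v - y) ** 2 for v, y in zip(u, ys))
        if best is None or sse < best[0]:
            best = (sse, c, a, b)
    return best[1], best[2], best[3]  # (c, a, b)
```

Here c plays the role of the asymptotic score and a, b control how quickly the curve approaches it as training data grows.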
language pairs is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Khapra, Mitesh M. and Joshi, Salil and Chatterjee, Arindam and Bhattacharyya, Pushpak
Abstract
Our experiments show that such a bilingual bootstrapping algorithm when evaluated on two different domains with small seed sizes using Hindi (L1) and Marathi (L2) as the language pair performs better than monolingual bootstrapping and significantly reduces annotation cost.
Introduction
Such a bilingual bootstrapping strategy when tested on two domains, viz, Tourism and Health using Hindi (L1) and Marathi (L2) as the language pair, consistently does better than a baseline strategy which uses only seed data for training without performing any bootstrapping.
Synset Aligned Multilingual Dictionary
The average number of such links per synset per language pair is approximately 3.
language pairs is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Yeniterzi, Reyyan and Oflazer, Kemal
Abstract
We present a novel scheme to apply factored phrase-based SMT to a language pair with very disparate morphological structures.
Experimental Setup and Results
The experience with MERT for this language pair has not been very positive.
Experimental Setup and Results
In order to alleviate the lack of large scale parallel corpora for the English-Turkish language pair, we experimented with augmenting the training data with reliable phrase pairs obtained from a previous alignment.
language pairs is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Durrani, Nadir and Sajjad, Hassan and Fraser, Alexander and Schmid, Helmut
Abstract
This indicates that transliteration is useful for more than only translating OOV words for language pairs like Hindi-Urdu.
Conclusion
In closely related language pairs such as Hindi-Urdu with a significant amount of vocabulary overlap,
Evaluation
The difference of 2.35 BLEU points between M1 and Pbl indicates that transliteration is useful for more than only translating OOV words for language pairs like Hindi-Urdu.
language pairs is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Zhao, Hai and Song, Yan and Kit, Chunyu and Zhou, Guodong
Discussion
(Zeman and Resnik, 2008) assumed that the morphology and syntax in the language pair should be very similar, which holds for the language pair they considered: Danish and Swedish, two very close North European languages.
Introduction
What our method relies on is not the close relation of the chosen language pair but the similarity of the two treebanks; this is the key difference from previous work.
The Related Work
Since fewer language-specific properties are involved, our approach is more readily extended to other language pairs than theirs.
language pairs is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Wu, Hua and Wang, Haifeng
Introduction
Unfortunately, large quantities of parallel data are not readily available for some language pairs, thereby limiting the potential use of current SMT systems.
Introduction
It is especially difficult to obtain such a domain-specific corpus for some language pairs such as Chinese to Spanish translation.
Using RBMT Systems for Pivot Translation
For many source-target language pairs, commercial pivot-source and/or pivot-target RBMT systems are available on the market.
language pairs is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
DeNero, John and Chiang, David and Knight, Kevin
Computing Feature Expectations
(2008) showed that most of the improvement from lattice-based consensus decoding comes from lattice-based expectations, not search: searching over lattices instead of k-best lists did not change results for two language pairs, and improved a third language pair by 0.3 BLEU.
Experimental Results
Despite this optimization, our new Algorithm 3 was an average of 80 times faster across systems and language pairs.
Introduction
We also show that using forests outperforms using k-best lists consistently across language pairs .
language pairs is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: