Abstract | Our large-scale experiment uncovers large amounts of parallel text in dozens of language pairs across a variety of domains and genres, some previously unavailable in curated datasets.
Abstract | Even with minimal cleaning and filtering, the resulting data boosts translation performance across the board for five different language pairs in the news domain, and on open domain test sets we see improvements of up to 5 BLEU.
Experiments | First, intrinsically, by observing how well our method identifies tweets containing parallel data, their language pair, and the spans involved.
Parallel Data Extraction | The target domains in this work are Twitter and Sina Weibo, and the main language pair is Chinese-English. |
Parallel Data Extraction | Furthermore, we also run the system for the Arabic-English language pair using the Twitter data. |
Parallel Data Extraction | In both cases, we first filter the collection of tweets for messages containing at least one trigram in each language of the target language pair, determined by their Unicode ranges.
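The Unicode-range filter just described can be sketched as follows; the specific ranges (CJK Unified Ideographs vs. basic Latin) and the function name are illustrative assumptions, not the paper's implementation:

```python
import re

# Hypothetical ranges for a Chinese-English filter: three consecutive
# Han characters, and three consecutive Latin letters.
CHINESE = re.compile(r'[\u4e00-\u9fff]{3}')  # trigram of Han characters
ENGLISH = re.compile(r'[A-Za-z]{3}')         # trigram of Latin letters

def may_contain_parallel_text(tweet: str) -> bool:
    """Keep a message only if it has at least one trigram in each language."""
    return bool(CHINESE.search(tweet)) and bool(ENGLISH.search(tweet))

tweets = ["你好世界 hello world", "just english here", "只有中文"]
kept = [t for t in tweets if may_contain_parallel_text(t)]
# only the mixed-language tweet survives the filter
```

This cheap prefilter discards the vast majority of monolingual messages before the expensive span-retrieval step runs.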
Parallel Segment Retrieval | The lexical tables PM for the various language pairs are trained a priori using available parallel corpora. |
Parallel Segment Retrieval | While IBM Model 1 produces worse alignments than other models, our problem requires efficiently considering all possible spans, language pairs and word alignments, which makes more sophisticated models intractable.
Parallel Segment Retrieval | Our goal is to find the spans, language pair and alignments such that: |
Conclusions | We have shown that improvement in clustering can be obtained across a range of language pairs, evaluated in terms of their value as features in an extrinsic NER task.
Experiments | Each language pair contained around 1.5 million German words. |
Experiments | Monolingual Clustering: For every language pair , we train German word clusters on the monolingual German data from the parallel data. |
Experiments | Table 1 shows the performance of NER when the word clusters are obtained using only the bilingual information for different language pairs.
Abstract | We test our approach on a held-out test set from EUROVOC and perform precision, recall and f-measure evaluations for 20 European language pairs.
Abstract | The performance of our classifier reaches the 100% precision level for many language pairs.
Experiments 5.1 Data Sources | We also built comparable corpora in the information technology (IT) and automotive domains by gathering documents from Wikipedia for the English-German language pair.
Experiments 5.1 Data Sources | Note that we do not use the Maltese-English language pair, as for this pair we found that 5861 out of 6797 term pairs were identical, i.e.
Experiments 5.1 Data Sources | Furthermore, we performed data selection for each language pair separately. |
Feature extraction | For instance, the cognate methods are not directly applicable to the English-Bulgarian and English-Greek language pairs, as the Bulgarian (Cyrillic-based) and Greek alphabets both differ from the English Latin-based alphabet.
Feature extraction | We created mapping rules for 20 EU language pairs using primarily Wikipedia as a resource for describing phonetic mappings to English. |
Method | We have run our approach on the 21 official EU languages covered by EUROVOC, constructing 20 language pairs with English as the source |
Method | We run this evaluation on all 20 language pairs.
AL-SMT: Multilingual Setting | The nonnegative weights α_d reflect the importance of the different translation tasks, and Σ_d α_d = 1. The AL-SMT formulation for a single language pair is a special case of this formulation where only one of the α_d's in the objective function (1) is one and the rest are zero.
AL-SMT: Multilingual Setting | 2.1 for AL in the multilingual setting includes the single language pair setting as a special case (Haffari et al., 2009). |
AL-SMT: Multilingual Setting | For a single language pair we use U and L. |
Abstract | We also provide new highly effective sentence selection methods that improve AL for phrase-based SMT in the multilingual and single language pair setting. |
Introduction | The multilingual setting provides new opportunities for AL over and above a single language pair.
Introduction | In our case, the multiple tasks are individual machine translation tasks for several language pairs . |
Introduction | languages to the new language depending on the characteristics of each source-target language pair , hence these tasks are competing for annotating the same resource. |
Abstract | We compare PORT-tuned MT systems to BLEU-tuned baselines in five experimental conditions involving four language pairs . |
BLEU and PORT | For our experiments, we tuned α on Chinese-English data, setting it to 0.25 and keeping this value for the other language pairs.
Conclusions | Most important, our results show that PORT-tuned MT systems yield better translations than BLEU-tuned systems on several language pairs, according to both automatic metrics and human evaluations.
Conclusions | In future work, we plan to tune the free parameter α for each language pair.
Experiments | Most WMT submissions involve language pairs with similar word order, so the ordering factor v in PORT won’t play a big role. |
Experiments | In internal tests we have found no systematic difference in dev-set BLEUs, so we speculate that PORT’s emphasis on reordering yields models that generalize better for these two language pairs . |
Experiments | Of the Table 5 language pairs , the one where PORT tuning helps most has the lowest BLEU in Table 4 (German-English); the one where it helps least in Table 5 has the highest BLEU in Table 4 (French-English). |
Introduction | However, since PORT is designed for tuning, the most important results are those showing that PORT tuning yields systems with better translations than those produced by BLEU tuning — both as determined by automatic metrics (including BLEU), and according to human judgment, as applied to five data conditions involving four language pairs . |
Experiments | Their systems behave differently on English/Russian than on other language pairs . |
Experiments | We create gold standards for both language pairs by randomly selecting a few thousand word pairs from the lists of word pairs extracted from the two corpora. |
Experiments | 4This solution is appropriate for all of the language pairs used in our experiments, but should be revisited if there is inflection realized as prefixes, etc. |
Extraction of Transliteration Pairs | In this section, we present an iterative method for the extraction of transliteration pairs from parallel corpora which is fully unsupervised and language-pair independent.
Extraction of Transliteration Pairs | We ignore non-1-to-1 alignments because they are less likely to be transliterations for most language pairs.
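A minimal sketch of such a 1-to-1 filter, assuming alignments arrive as (source index, target index) link lists (the function name and data layout are invented for illustration):

```python
from collections import Counter

def one_to_one_links(links):
    """Keep only alignment links whose source and target words each
    participate in exactly one link, i.e. strict 1-to-1 alignments."""
    src_count = Counter(s for s, _ in links)
    tgt_count = Counter(t for _, t in links)
    return [(s, t) for s, t in links
            if src_count[s] == 1 and tgt_count[t] == 1]

# Source word 0 aligns to two target words (a 1-to-2 alignment),
# so both of its links are discarded as unlikely transliterations.
links = [(0, 0), (0, 1), (1, 2), (2, 3)]
filtered = one_to_one_links(links)  # [(1, 2), (2, 3)]
```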
Introduction | Such resources are also not applicable to other language pairs.
Introduction | We compare our unsupervised transliteration mining method with the semi-supervised systems presented at the NEWS 2010 shared task on transliteration mining (Kumaran et al., 2010) using four language pairs.
Introduction | We also do experiments on parallel corpora for two language pairs.
Abstract | We obtain statistically significant improvements across 4 different language pairs with English as source, of up to +1.92 BLEU for Chinese as target.
Experiments | These extra features assess translation quality beyond the synchronous grammar derivation, learning general reordering or word-emission preferences for the language pair.
Experiments | We evaluate our method on four different language pairs with English as the source language and French, German, Dutch and Chinese as target. |
Experiments | The data for the first three language pairs are derived from parliament proceedings sourced from the Europarl corpus (Koehn, 2005), with WMT-07 development and test data for French and German.
Introduction | utilised an ITG-flavour which focused on hierarchical phrase-pairs to capture context-driven translation and reordering patterns with ‘gaps’, offering competitive performance particularly for language pairs with extensive reordering. |
Introduction | By advancing from structures which mimic linguistic syntax, to learning linguistically aware latent recursive structures targeting translation, we achieve significant improvements in translation quality for 4 different language pairs in comparison with a strong hierarchical translation baseline. |
Abstract | We show that high-precision lexicons can be learned in a variety of language pairs and from a range of corpus types.
Experimental Setup | For all language pairs except English-Arabic, we extract evaluation lexicons from the Wiktionary online dictionary.
Experiments | We also explored how system performance varies for language pairs other than English-Spanish. |
Experiments | One concern is how our system performs on language pairs where orthographic features are less applicable. |
Features | While orthographic features are clearly effective for historically related language pairs, they are more limited for other language pairs, where we need to appeal to other clues.
Features | (section 6.2), (c) a variety of language pairs (see section 6.3). |
Introduction | Although parallel text is plentiful for some language pairs such as English-Chinese or English-Arabic, it is scarce or even nonexistent for most others, such as English-Hindi or French-Japanese. |
Introduction | Moreover, parallel text could be scarce for a language pair even if monolingual data is readily available for both languages. |
Introduction | This task, though clearly more difficult than the standard parallel text approach, can operate on language pairs and in domains where standard approaches cannot. |
Abstract | In an evaluation, we demonstrate that character-based translation can achieve results comparable to word-based systems while effectively translating unknown and uncommon words over several language pairs.
Experiments | In order to test the effectiveness of character-based translation, we performed experiments over a variety of language pairs and experimental settings. |
Experiments | As previous research has shown that it is more difficult to translate into morphologically rich languages than into English (Koehn, 2005), we perform experiments translating in both directions for all language pairs . |
Experiments | This confirms that character-based translation is performing well on languages that have long words or ambiguous boundaries, and less well on language pairs with relatively strong one-to-one correspondence between words. |
Introduction | This method is attractive, as it is theoretically able to handle all sparsity phenomena in a single unified framework, but has only been shown feasible between similar language pairs such as Spanish-Catalan (Vilar et al., 2007), Swedish-Norwegian (Tiedemann, 2009), and Thai-Lao (Sornlertlamvanich et al., 2008), which have a strong co-occurrence between single characters.
Introduction | (2007) state and we confirm, accurate translations cannot be achieved when applying traditional translation techniques to character-based translation for less similar language pairs . |
Introduction | An evaluation on four language pairs with differing morphological properties shows that for distant language pairs, character-based SMT can achieve translation accuracy comparable to word-based systems.
Related Work on Data Sparsity in SMT | However, while the approach is attractive conceptually, previous research has only been shown effective for closely related language pairs (Vilar et al., 2007; Tiedemann, 2009; Sornlertlamvanich et al., 2008). |
Related Work on Data Sparsity in SMT | In this work, we propose effective alignment techniques that allow character-based translation to achieve accurate translation results for both close and distant language pairs.
Experimental Results | We only present the average results over all four language pairs.
Experimental Results | Group II: includes the metrics that participated in the WMT12 metrics task, excluding metrics which did not have results for all language pairs.
Experimental Results | Note that, even though DR-LEX has better individual performance than DR, it does not yield improvements when combined with most of the metrics in group IV. However, over all metrics and all language pairs, DR-LEX is able to obtain an average improvement in correlation of +.
Experimental Setup | In our experiments, we used the data available for the WMT12 and the WMT11 metrics shared tasks for translations into English. This included the output from the systems that participated in the WMT12 and the WMT11 MT evaluation campaigns, both consisting of 3,003 sentences, for four different language pairs: Czech-English (CS-EN), French-English (FR-EN), German-English (DE-EN), and Spanish-English (ES-EN); as well as a dataset with the English references.
Experimental Setup | Table 1: Number of systems (systs), judgments (ranks), unique sentences (sents), and different judges (judges) for the different language pairs, for the human evaluation of the WMT12 and WMT11 shared tasks.
Experimental Setup | In order to make the scores of the different metrics comparable, we performed a min-max normalization, for each metric, and for each language pair combination.
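Min-max normalization rescales each metric's scores to [0, 1] within each language pair; a minimal sketch (the function name and sample values are illustrative assumptions):

```python
def min_max_normalize(scores):
    """Rescale a list of metric scores to [0, 1] via (x - min) / (max - min)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:                      # degenerate case: all scores identical
        return [0.0] * len(scores)
    return [(x - lo) / (hi - lo) for x in scores]

# Normalize one metric's scores within one language pair; after this,
# scores from different metrics live on a comparable scale.
raw = [2.0, 5.0, 8.0]
normalized = min_max_normalize(raw)  # [0.0, 0.5, 1.0]
```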
Related Work | Compared to the previous work, (i) we use a different discourse representation (RST), (ii) we compare discourse parses using all-subtree kernels (Collins and Duffy, 2001), (iii) we evaluate on much larger datasets, for several language pairs and for multiple metrics, and (iv) we do demonstrate better correlation with human judgments. |
Abstract | Our experiments show speedups from MERT and MBR as well as performance improvements from MBR decoding on several language pairs . |
Discussion | This may not be optimal in practice for unseen test sets and language pairs , and the resulting linear loss may be quite different from the corpus level BLEU. |
Discussion | On an experiment with 40 language pairs, we obtain improvements on 26 pairs, no difference on 8 pairs and drops on 5 pairs.
Discussion | This was achieved without any need for manual tuning for each language pair.
Experiments | We report results on nist03 set and present three systems for each language pair: phrase-based (pb), hierarchical (hier), and SAMT; Lattice MBR is done for the phrase-based system while HGMBR is used for the other two.
Experiments | For the multi-language case, we train phrase-based systems and perform lattice MBR for all language pairs.
Experiments | When we optimize MBR features with MERT, the number of language pairs with gains/no changes/drops is 22/5/12.
Abstract | Modern automated lexicon generation methods usually require parallel corpora, which are not available for most language pairs . |
Introduction | However, for most language pairs parallel bilingual corpora either do not exist or are at best small and unrepresentative of the general language. |
Introduction | Pivot language approaches deal with the scarcity of bilingual data for most language pairs by relying on the availability of bilingual data for each of the languages in question with a third, pivot, language. |
Lexicon Generation Experiments | We chose a language pair for which basically no parallel corpora exist, and that do not share ancestry or writing system in a way that can provide cues for alignment.
Lexicon Generation Experiments | These considerations lead us to believe that our choice of language pair is more challenging than, for example, a pair of European languages. |
NAS Score Properties | For other language pairs lemmatization may be needed. |
Previous Work | The limited availability of parallel corpora of sufficient size for most language pairs restricts the usefulness of these methods. |
Previous Work | (2009) used many input bilingual lexicons to create bilingual lexicons for new language pairs.
Evaluation | Two language pairs were used: Arabic-English and Urdu-English. |
Evaluation | The Urdu to English evaluation in §3.4 focuses on how noisy parallel data and completely monolingual (i.e., not even comparable) text can be used for a realistic low-resource language pair, and is evaluated with the larger language model only.
Evaluation | Bilingual corpus statistics for both language pairs are presented in Table 2. |
Introduction | With large amounts of data, phrase-based translation systems (Koehn et al., 2003; Chiang, 2007) achieve state-of-the-art results in many typologically diverse language pairs (Bojar et al., 2013).
Introduction | This problem is exacerbated in the many language pairs for which parallel resources are either limited or nonexistent. |
Related Work | As with previous BLI work, these approaches only take into account source-side similarity of words; only moderate gains (and in the latter work, on a subset of language pairs evaluated) are obtained. |
Abstract | PANDICTIONARY contains more than four times as many translations as the largest Wiktionary at precision 0.90 and over 200,000,000 pairwise translations in over 200,000 language pairs at precision 0.8.
Empirical Evaluation | Such people are hard to find and may not even exist for many language pairs (e.g., Basque and Maori).
Empirical Evaluation | For this study we tagged 7 language pairs: Hindi-Hebrew,
Introduction and Motivation | PANDICTIONARY, that could serve as a resource for translation systems operating over a very broad set of language pairs.
Introduction and Motivation | PANDICTIONARY currently contains over 200 million pairwise translations in over 200,000 language pairs at precision 0.8.
Introduction and Motivation | We describe the design and construction of PANDICTIONARY, a novel lexical resource that spans over 200 million pairwise translations in over 200,000 language pairs at 0.8 precision, a fourfold increase when compared to the union of its input translation dictionaries.
Related Work | lingual corpora, which may scale to several language pairs in future (Haghighi et al., 2008). |
Abstract | We attempt to tease apart the effects that this simple but effective modification has on alignment precision and recall tradeoffs, and how rare and common words are affected across several language pairs . |
Abstract | We propose and extensively evaluate a simple method for using alignment models to produce alignments better-suited for phrase-based MT systems, and show significant gains (as measured by BLEU score) in end-to-end translation systems for six language pairs used in recent MT competitions.
Conclusions | Table 3: BLEU scores for all language pairs using all available data. |
Conclusions | We tested this hypothesis on six different language pairs from three different domains, and found that the new alignment scheme not only performs better than the baseline, but also improves over a more complicated, intractable model. |
Introduction | language pairs used in recent MT competitions. |
Phrase-based machine translation | Our next set of experiments look at our performance in both directions across our 6 corpora, when we have small to moderate amounts of training data: for the language pairs with more than 100,000 sentences, we use only the first 100,000 sentences. |
Phrase-based machine translation | Table 2: BLEU scores for all language pairs using up to 100k sentences. |
Abstract | This approach is then evaluated on three language pairs, demonstrating competitive performance as compared to a state-of-the-art unsupervised SRL system and a cross-lingual annotation projection baseline.
Background and Motivation | We evaluate on five (directed) language pairs —EN-ZH, ZH-EN, EN-CZ, CZ-EN and EN-FR, where EN, FR, CZ and ZH denote English, French, Czech and Chinese, respectively. |
Evaluation | We have identified three language pairs for which such resources are available: English-Chinese, English-Czech and English-French. |
Evaluation | The data for the second language pair is drawn from the Prague Czech-English Dependency Treebank 2.0 (Hajic et al., 2012), which we converted to a format similar to that of CoNLL-8T1. |
Results | It is easy to see that the scores vary strongly depending on the language pair, due to both the difference in the annotation scheme used and the degree of relatedness between the languages.
Results | The source language scores for English vary between language pairs because of the difference in syntactic annotation and role subset used.
Results | For more distant language pairs, the contributions of individual feature groups are less interpretable, so we only highlight a few observations.
Conclusions | training over domain-specific dictionaries from other language pairs), and low-density languages where there are few dictionaries and Wikipedia articles to train the method on.
Motivation | org aspire to aggregate these dictionaries into a single lexical database, but are hampered by the need to identify individual multilingual dictionaries, especially for language pairs where there is a sparsity of data from existing dictionaries (Baldwin et al., 2010; Kamholz and Pool, to appear). |
Motivation | This paper is an attempt to automate the detection of multilingual dictionaries on the web, through query construction for an arbitrary language pair . |
Results | Most queries returned no results; indeed, for the en-ar language pair, only 49/1000 queries returned documents.
Results | Among the 7 language pairs, en-es, en-de, en-fr and en-it achieved the highest MAP scores.
Results | In terms of unique lexical resources found with 50 queries, the most successful language pairs were en-fr, en-de and en-it. |
Abstract | In statistical machine translation (SMT), syntax-based pre-ordering of the source language is an effective method for dealing with language pairs where there are great differences in their respective word orders. |
Introduction | SMT systems have difficulties translating between distant language pairs such as Chinese and English. |
Introduction | Reordering therefore becomes a key issue in SMT systems between distant language pairs.
Introduction | Syntax-based pre-ordering employing constituent parsing has demonstrated effectiveness in many language pairs, such as English-French (Xia and McCord, 2004), German-English (Collins et al., 2005), Chinese-English (Wang et al., 2007; Zhang et al., 2008), and English-Japanese (Lee et al., 2010).
Corpora | We considered the English-German and English-French language pairs from this corpus. |
Experiments | This task involves learning language independent embeddings which are then used for document classification across the English-German language pair . |
Experiments | In the single mode, vectors are learnt from a single language pair (en-X), while in the joint mode vector-learning is performed on all parallel sub-corpora simultaneously. |
Experiments | In the English case we train twelve individual classifiers, each using the training data of a single language pair only. |
Experiments & Results | The data for our experiments were drawn from the Europarl parallel corpus (Koehn, 2005) from which we extracted two sets of 200,000 sentence pairs each for several language pairs.
Experiments & Results | The final test sets are a randomly sampled 5,000 sentence pairs from the 200,000-sentence test split for each language pair.
Experiments & Results | Let us first zoom in to convey a sense of scale on a specific language pair . |
Abstract | These results persist when using automatically learned word tags, suggesting broad applicability of our technique across diverse language pairs for which syntactic resources are not available. |
Conclusion and discussion | Using automatically obtained word clusters instead of POS tags yields essentially the same results, thus making our methods applicable to all language pairs with parallel corpora, whether syntactic resources are available for them or not.
Experiments | Even though a key advantage of our method is its applicability to resource-poor languages, we used a language pair for which lin- |
Experiments | Accordingly, we use Chiang’s hierarchical phrase based translation model (Chiang, 2007) as a base line, and the syntax-augmented MT model (Zollmann and Venugopal, 2006) as a ‘target line’, a model that would not be applicable for language pairs without linguistic resources. |
Introduction | The Probabilistic Synchronous Context Free Grammar (PSCFG) formalism suggests an intuitive approach to model the long-distance and lexically sensitive reordering phenomena that often occur across language pairs considered for statistical machine translation. |
Abstract | We extract translation context knowledge from a bilingual comparable corpus of a richer-resourced language pair, and inject it into a multilingual lexicon.
Introduction | We are interested in leveraging richer-resourced language pairs to enable context-dependent lexical lookup for under-resourced languages. |
Introduction | We propose a rapid approach for acquiring them from an untagged, comparable bilingual corpus of a (richer-resourced) language pair in section 3. |
Typical Resource Requirements for Translation Selection | However, aligned corpora can be difficult to obtain for under-resourced language pairs, and are expensive to construct.
Typical Resource Requirements for Translation Selection | (2011) also tackled the problem of cross-lingual disambiguation for under-resourced language pairs (English-Persian) using Wikipedia articles, by applying the one sense per collocation and one sense per discourse heuristics on a comparable corpus.
Conclusion | The method is implemented as a modification to the open-source toolkit GIZA++, and we have shown that it significantly improves translation quality across four different language pairs . |
Experiments | For each language pair, we extracted grammar rules from the same data that were used for word alignment.
Introduction | These models are unsupervised, making them applicable to any language pair for which parallel text is available. |
Introduction | Although manually-aligned data is very valuable, it is only available for a small number of language pairs . |
Experimental SetUp | To obtain our corpus of short parallel phrases, we preprocessed each language pair using the Giza++ alignment toolkit. Given word alignments for each language pair, we extract a list of phrase pairs that form independent sets in the bipartite alignment graph.
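The "independent sets" criterion is open to interpretation; the sketch below takes one plausible reading, namely connected components of the bipartite alignment graph whose source and target positions each form a contiguous span. The function name, data layout, and the contiguity check are assumptions for illustration, not the paper's algorithm:

```python
def phrase_pairs(src, tgt, links):
    """Group alignment links into connected components of the bipartite
    word-alignment graph, and emit a phrase pair for each component whose
    source and target positions are contiguous spans."""
    parent = {}  # union-find over ('s', i) / ('t', j) nodes

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, t in links:
        parent[find(('s', s))] = find(('t', t))  # union the two sides

    comps = {}
    for s, t in links:
        comps.setdefault(find(('s', s)), []).append((s, t))

    pairs = []
    for comp in comps.values():
        ss = sorted({s for s, _ in comp})
        ts = sorted({t for _, t in comp})
        # keep only components covering contiguous source and target spans
        if ss == list(range(ss[0], ss[-1] + 1)) and ts == list(range(ts[0], ts[-1] + 1)):
            pairs.append((" ".join(src[ss[0]:ss[-1] + 1]),
                          " ".join(tgt[ts[0]:ts[-1] + 1])))
    return pairs

# Two isolated 1-1 links yield two short phrase pairs; a 1-2 link
# group would instead yield a single multi-word target phrase.
example = phrase_pairs(["the", "house"], ["das", "Haus"], [(0, 0), (1, 1)])
```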
Introduction | When modeled in tandem, gains are observed for all language pairs , reducing relative error by as much as 24%. |
Introduction | Furthermore, our experiments show that both related and unrelated language pairs benefit from multilingual learning. |
Results | However, once character-to-character phonetic correspondences are added as an abstract morpheme prior (final two rows), we find the performance of related language pairs outstrips English, reducing relative error over MONOLINGUAL by 10% and 24% for the Hebrew/Arabic pair. |
Conclusion and future work | In the future, we would like to include more sophistication in the design of a lexicon for a particular language pair based on error analysis, and extend our preprocessing to include other operations such as word segmentation. |
Integration of inflection models with MT systems | However, for some language pairs, stemming one language can make word alignment worse, if it leads to more violations in the assumptions of current word alignment models, rather than making the source look more like the target.
Machine translation systems and data | For each language pair , we used a set of parallel sentences (train) for training the MT system sub-models (e.g., phrase tables, language model), a set of parallel sentences (lambda) for training the combination weights with max-BLEU training, a set of parallel sentences (dev) for training a small number of combination parameters for our integration methods (see Section 5), and a set of parallel sentences (test) for final evaluation. |
Machine translation systems and data | All MT systems for a given language pair used the same datasets. |
Experiment | Three human annotators who are fluent in the two languages manually annotated N-to-N sentence alignments for each language pair (KR-EN, KR-CH, KR-JP).
Experiment | By keeping only the sentence chunks whose Korean chunk appears in all language pairs , we were left with 859 sentence chunk pairs. |
Experiment | The subjectivity analysis systems are evaluated with all language pairs with kappa and Pearson’s correlation coefficients. |
Abstract | Despite cultural differences and the intended neutrality of Wikipedia articles, our lexicons show an average sentiment correlation of 0.28 across all language pairs.
Extrinsic Evaluation: Consistency of Wikipedia Sentiment | We use the Spearman correlation coefficient to measure the consistency of sentiment distribution across all entities with pages in a particular language pair.
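The Spearman check can be sketched with a small self-contained implementation: rho is the Pearson correlation of the two rank vectors. The entity scores below are invented for illustration:

```python
def rank(xs):
    """1-based average ranks, with ties resolved by midrank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1                      # extend the tie group
        avg = (i + j) / 2 + 1           # midrank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rho = Pearson correlation of the rank vectors."""
    rx, ry = rank(x), rank(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)

# Hypothetical sentiment scores for the same four entities in two
# language editions; rho measures how consistently they are ranked.
en = [0.1, 0.4, 0.3, 0.9]
de = [0.2, 0.5, 0.1, 0.8]
rho = spearman(en, de)
```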
Introduction | Each language pair exhibits a Spearman sentiment correlation of at least 0.14, with an average correlation of 0.28 over all pairs. |
Knowledge Graph Construction | Closely related language pairs (i.e. |
Introduction | Of course, for many language pairs and domains, parallel data is not available. |
Introduction | As successful work develops along this line, we expect more domains and language pairs to be conquered by SMT. |
Machine Translation as a Decipherment Task | Data: We work with the Spanish/English language pair and use the following corpora in our MT experiments: |
Machine Translation as a Decipherment Task | 0 OPUS movie subtitle corpus: This is a large open source collection of parallel corpora available for multiple language pairs (Tiedemann, 2009). |
Word Alignment | We carried out experiments on two language pairs: Arabic to English and Czech to English.
Word Alignment | Variational Bayes is not consistent across different language pairs.
Word Alignment | While fractional KN does beat the baseline for both language pairs, the value of D, which we optimized to maximize F1, is not consistent across language pairs: as shown in Figure 2, on Arabic-English, a smaller D is better, while for Czech-English, a larger D is better.
Experiments | Translation models are estimated on 102M words of parallel data for French-English, and 99M words for German-English; about 6.5M words for each language pair are newswire, the remainder are parliamentary proceedings. |
Experiments | The vocabulary consists of words that occur in at least two different sentences, which is 31K words for both language pairs . |
Experiments | The results (Table 1 and Table 2) show that direct integration improves accuracy across all six test sets on both language pairs . |
Conclusion | As future work we would like to evaluate our models on other language pairs . |
Related work | The task of directly learning a reordering model for language pairs that are very different is closely related to the task of parsing and hence work on semi-supervised parsing (Koo et al., 2008; McClosky et al., 2006; Suzuki et al., 2009) is broadly related to our work. |
Reordering issues in Urdu-English translation | In this section we describe the main sources of word order differences between Urdu and English since this is the language pair we experiment with in this paper. |
Experiment Results | We will report the impact of integrating phrase-based features into Hiero systems for three language pairs: Arabic-English, Chinese-English and German-English.
Introduction | Yet, tree-based translation often underperforms phrase-based translation in language pairs with short range reordering such as Arabic-English translation (Zollmann et al., 2008; Birch et al., 2009). |
Introduction | This is important for language pairs with strict reordering. |
Inferring a learning curve from mostly monolingual data | For each configuration (combination of language pair and domain) and test set in Table 2, a gold curve is fitted using the selected tri-parameter power-law family on a fine grid of corpus sizes.
Introduction | Our experiments involve 30 distinct language pair and domain combinations and 96 different learning curves. |
Selecting a parametric family of curves | for all six families on a test dataset for the English-German language pair.
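A tri-parameter power-law learning curve is commonly written as y = c - a * x^(-b), where x is the corpus size and c is the asymptotic score; the exact parameterization used in the paper is an assumption here. Fitting such a curve to observed (corpus size, score) points can be sketched as a grid search over b with a linear least-squares solve for (c, a):

```python
import numpy as np

def power_law(x, c, a, b):
    # learning curve that approaches the asymptote c as corpus size x grows
    return c - a * np.power(x, -b)

def fit_learning_curve(sizes, scores, b_grid=None):
    """Fit y = c - a * x^(-b): for each candidate b the model is linear in
    (c, a), so solve that subproblem exactly and keep the best b."""
    if b_grid is None:
        b_grid = np.arange(0.05, 2.05, 0.05)
    x = np.asarray(sizes, float)
    y = np.asarray(scores, float)
    best = None
    for b in b_grid:
        A = np.column_stack([np.ones_like(x), -np.power(x, -b)])
        sol, *_ = np.linalg.lstsq(A, y, rcond=None)
        err = float(np.sum((A @ sol - y) ** 2))
        if best is None or err < best[0]:
            best = (err, sol[0], sol[1], b)
    return best[1], best[2], best[3]  # (c, a, b)
```

Splitting the fit this way avoids a full nonlinear optimizer: only b enters nonlinearly, and the grid resolution bounds how precisely it is recovered.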
Abstract | Our experiments show that such a bilingual bootstrapping algorithm when evaluated on two different domains with small seed sizes using Hindi (L1) and Marathi (L2) as the language pair performs better than monolingual bootstrapping and significantly reduces annotation cost. |
Introduction | Such a bilingual bootstrapping strategy when tested on two domains, viz, Tourism and Health using Hindi (L1) and Marathi (L2) as the language pair, consistently does better than a baseline strategy which uses only seed data for training without performing any bootstrapping.
Synset Aligned Multilingual Dictionary | The average number of such links per synset per language pair is approximately 3. |
Abstract | We present a novel scheme to apply factored phrase-based SMT to a language pair with very disparate morphological structures. |
Experimental Setup and Results | The experience with MERT for this language pair has not been very positive.
Experimental Setup and Results | In order to alleviate the lack of large scale parallel corpora for the English-Turkish language pair, we experimented with augmenting the training data with reliable phrase pairs obtained from a previous alignment.
Abstract | This indicates that transliteration is useful for more than only translating OOV words for language pairs like Hindi-Urdu. |
Conclusion | In closely related language pairs such as Hindi-Urdu with a significant amount of vocabulary overlap, |
Evaluation | The difference of 2.35 BLEU points between M1 and Pbl indicates that transliteration is useful for more than only translating OOV words for language pairs like Hindi-Urdu. |
Discussion | Zeman and Resnik (2008) assumed that the morphology and syntax in the language pair should be very similar, and that is the case for the language pair they considered, Danish and Swedish, two closely related North European languages.
Introduction | What our method relies on is not the close relation of the chosen language pair but the similarity of the two treebanks; this is the main difference from previous work.
The Related Work | Because it depends on fewer language-specific properties, our approach is more readily extended to other language pairs than theirs.
Introduction | Unfortunately, large quantities of parallel data are not readily available for some language pairs, which limits the potential use of current SMT systems.
Introduction | It is especially difficult to obtain such a domain-specific corpus for some language pairs such as Chinese to Spanish translation. |
Using RBMT Systems for Pivot Translation | For many source-target language pairs, commercial pivot-source and/or pivot-target RBMT systems are available on the market.
Computing Feature Expectations | (2008) showed that most of the improvement from lattice-based consensus decoding comes from lattice-based expectations, not search: searching over lattices instead of k-best lists did not change results for two language pairs, and improved a third language pair by 0.3 BLEU. |
Experimental Results | Despite this optimization, our new Algorithm 3 was an average of 80 times faster across systems and language pairs.
Introduction | We also show that using forests outperforms using k-best lists consistently across language pairs . |