Index of papers in Proc. ACL 2013 that mention
  • parallel corpora
Popat, Kashyap and A.R, Balamurali and Bhattacharyya, Pushpak and Haffari, Gholamreza
Clustering for Cross Lingual Sentiment Analysis
Given a parallel bilingual corpus, word clusters in S can be aligned to clusters in T. Word alignments are created using parallel corpora.
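The cluster-linking idea in this excerpt can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes word alignments are available as source-target word pairs, and all names and the toy English-Hindi data are hypothetical.

```python
from collections import Counter, defaultdict

def link_clusters(src_cluster, tgt_cluster, alignments):
    """Link each source-side word cluster to the target-side cluster
    its words are most often aligned to."""
    counts = defaultdict(Counter)
    for s, t in alignments:
        if s in src_cluster and t in tgt_cluster:
            counts[src_cluster[s]][tgt_cluster[t]] += 1
    # For each source cluster, pick the most frequently co-aligned target cluster.
    return {cs: ct.most_common(1)[0][0] for cs, ct in counts.items()}

# Toy example: cluster ids on each side, plus word-alignment pairs.
src_cluster = {"good": 0, "nice": 0, "bad": 1}
tgt_cluster = {"accha": 0, "badhiya": 0, "kharab": 1}
alignments = [("good", "accha"), ("nice", "badhiya"),
              ("bad", "kharab"), ("good", "badhiya")]
print(link_clusters(src_cluster, tgt_cluster, alignments))  # {0: 0, 1: 1}
```

Linking at the cluster level rather than the word level is what lets a comparatively small alignment dataset suffice.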
Clustering for Cross Lingual Sentiment Analysis
The direct cluster linking approach suffers from the limited size of the alignment dataset in the form of parallel corpora.
Conclusion and Future Work
For CLSA, clusters linked together using unlabelled parallel corpora eliminate the need to translate labelled corpora from one language to another via an intermediary MT system or bilingual dictionary.
Conclusion and Future Work
The approach presented here for CLSA will still require a parallel corpus.
Conclusion and Future Work
However, the size of the parallel corpora required
Experimental Setup
To create alignments, English-Hindi and English-Marathi parallel corpora from ILCI were used.
parallel corpora is mentioned in 7 sentences in this paper.
Li, Haibo and Zheng, Jing and Ji, Heng and Li, Qi and Wang, Wen
Abstract
We propose a Name-aware Machine Translation (MT) approach which can tightly integrate name processing into the MT model, by jointly annotating parallel corpora, extracting name-aware translation grammar and rules, adding a name phrase table, and name-translation-driven decoding.
Introduction
names in parallel corpora, updating word segmentation, word alignment and grammar extraction (Section 3.1).
Name-aware MT
We built a NAMT system from such name-tagged parallel corpora .
Name-aware MT
The realigned parallel corpora are used to train our NAMT system based on SCFG.
Name-aware MT
However, the original parallel corpora contain many high-frequency names, which can already be handled well by the baseline MT.
parallel corpora is mentioned in 5 sentences in this paper.
Ravi, Sujith
Discussion and Future Work
These, when combined with standard MT systems such as Moses (Koehn et al., 2007) trained on parallel corpora, have been shown to yield some BLEU score improvements.
Experiments and Results
OPUS movie subtitle corpus (Tiedemann, 2009): This is a large open source collection of parallel corpora available for multiple language pairs.
Experiments and Results
This is achieved without using any seed lexicon or parallel corpora.
Introduction
Statistical machine translation (SMT) systems these days are built using large amounts of bilingual parallel corpora .
Introduction
The parallel corpora are used to estimate translation model parameters involving word-to-word translation tables, fertilities, distortion, phrase translations, syntactic transformations, etc.
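The word-to-word translation tables mentioned in this excerpt can be estimated from sentence-aligned data in the spirit of IBM Model 1. The sketch below is illustrative only (EM without a NULL word, on hypothetical toy data), not the system described in the paper.

```python
from collections import defaultdict

def ibm_model1(pairs, iterations=10):
    """Estimate word translation probabilities t(f|e) from
    sentence-aligned (english_words, foreign_words) pairs via EM."""
    f_vocab = {f for _, fs in pairs for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))  # uniform initialization
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        for es, fs in pairs:  # E-step: collect expected alignment counts
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    c = t[(f, e)] / norm
                    count[(f, e)] += c
                    total[e] += c
        for (f, e), c in count.items():  # M-step: renormalize per English word
            t[(f, e)] = c / total[e]
    return t

# Toy bitext: co-occurrence across sentences disambiguates the translations.
pairs = [(["the", "house"], ["das", "haus"]),
         (["the", "book"], ["das", "buch"]),
         (["a", "book"], ["ein", "buch"])]
t = ibm_model1(pairs)
# After EM, t(das|the) dominates t(haus|the).
```

Fertilities, distortions, and phrase tables build on such lexical tables in later models, but the EM loop above is the core of the lexical estimation step.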
parallel corpora is mentioned in 5 sentences in this paper.
Aker, Ahmet and Paramita, Monica and Gaizauskas, Rob
Experiments 5.1 Data Sources
Another application of the extracted term pairs is to use them to enhance existing parallel corpora to train SMT systems.
Introduction
choose to focus on comparable corpora because for many less widely spoken languages and for technical domains where new terminology is constantly being introduced, parallel corpora are simply not available.
Related Work
For instance, Kupiec (1993) uses statistical techniques and extracts bilingual noun phrases from parallel corpora tagged with terms.
Related Work
(2010) also apply statistical methods to extract terms/phrases from parallel corpora.
parallel corpora is mentioned in 4 sentences in this paper.
Carpuat, Marine and Daume III, Hal and Henry, Katharine and Irvine, Ann and Jagarlamudi, Jagadeesh and Rudinger, Rachel
Abstract
Instead of difficult and expensive annotation, we build a gold-standard by leveraging cheaply available parallel corpora, targeting our approach to the problem of domain adaptation for machine translation.
Data and Gold Standard
In all parallel corpora , we normalize the English for American spelling.
Experiments
“representative tokens”) extracted from fairly large new domain parallel corpora (see Table 3), consisting of between 22 and 36 thousand parallel sentences, which yield between 8 and 35 thousand representative tokens.
Related Work
In contrast, our SENSESPOTTING task leverages automatically word-aligned parallel corpora as a source of annotation for supervision during training and evaluation.
parallel corpora is mentioned in 4 sentences in this paper.
Faruqui, Manaal and Dyer, Chris
Abstract
We present an information-theoretic objective for bilingual word clustering that incorporates both monolingual distributional evidence and cross-lingual evidence from parallel corpora to learn high-quality word clusters jointly in any number of languages.
Experiments
Corpora for Clustering: We used parallel corpora for {Arabic, English, French, Korean & Turkish}-German pairs from the WIT-3 corpus (Cettolo et al., 2012), which is a collection of translated transcriptions of TED talks.
Experiments
Note that the parallel corpora are of different sizes and hence the monolingual German data from every parallel corpus is different.
parallel corpora is mentioned in 3 sentences in this paper.
Nastase, Vivi and Strapparava, Carlo
Cross Language Text Categorization
categorization problem to the monolingual setting (Fortuna and Shawe-Taylor, 2005); to cast the cross-language text categorization problem into two monolingual settings for active learning (Liu et al., 2012); to translate and adapt a model built on language Ls to language Lt (Rigutini et al., 2005), (Shi et al., 2010); to produce parallel corpora for multi-view learning (Guo and Xiao, 2012).
Cross Language Text Categorization
posed to parallel corpora for CLTC, use LSA to build multilingual domain models.
Introduction
The task becomes more difficult when the data consists of comparable corpora in the two languages — documents on the same topics (e.g. sports, economy) — instead of parallel corpora — there exists a one-to-one correspondence
parallel corpora is mentioned in 3 sentences in this paper.
Zhang, Jiajun and Zong, Chengqing
Abstract
Currently, almost all statistical machine translation (SMT) models are trained with parallel corpora in some specific domains.
Introduction
However, all of these state-of-the-art translation models rely on the parallel corpora to induce translation rules and estimate the corresponding parameters.
Introduction
It is unfortunate that the parallel corpora are very expensive to collect and are usually not available for resource-poor languages and for many specific domains even in a resource-rich language pair.
parallel corpora is mentioned in 3 sentences in this paper.