Clustering for Cross Lingual Sentiment Analysis | Given a parallel bilingual corpus, word clusters in S can be aligned to clusters in T. Word alignments are created using parallel corpora. |
Clustering for Cross Lingual Sentiment Analysis | The direct cluster-linking approach is limited by the size of the alignment dataset available in the form of parallel corpora. |
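The cluster-linking step described above can be sketched as a majority vote over word alignments: each source cluster is linked to the target cluster into which its aligned words most often fall. The alignment pairs and cluster assignments below are hypothetical toy data, not taken from the paper:

```python
from collections import Counter, defaultdict

def link_clusters(alignments, src_cluster, tgt_cluster):
    """Map each source-side cluster to the target-side cluster that its
    aligned words most frequently belong to (majority vote)."""
    votes = defaultdict(Counter)
    for s_word, t_word in alignments:
        if s_word in src_cluster and t_word in tgt_cluster:
            votes[src_cluster[s_word]][tgt_cluster[t_word]] += 1
    # Pick the most-voted target cluster for every source cluster.
    return {c: counts.most_common(1)[0][0] for c, counts in votes.items()}

# Hypothetical word alignments and cluster IDs (illustration only).
alignments = [("good", "accha"), ("nice", "accha"),
              ("bad", "bura"), ("good", "badhiya")]
src_cluster = {"good": 0, "nice": 0, "bad": 1}
tgt_cluster = {"accha": 7, "badhiya": 7, "bura": 8}
print(link_clusters(alignments, src_cluster, tgt_cluster))  # → {0: 7, 1: 8}
```

Because the vote only needs enough alignments to break ties per cluster (rather than per word), the parallel corpus can be much smaller than what translating a full labelled corpus would require.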
Conclusion and Future Work | For CLSA, clusters linked together using unlabelled parallel corpora remove the need to translate labelled corpora from one language to another via an intermediary MT system or bilingual dictionary. |
Conclusion and Future Work | The approach presented here for CLSA will still require parallel corpora. |
Conclusion and Future Work | However, the size of the parallel corpora required |
Experimental Setup | To create alignments, English-Hindi and English-Marathi parallel corpora from ILCI were used. |
Abstract | We propose a Name-aware Machine Translation (MT) approach that tightly integrates name processing into the MT model by jointly annotating parallel corpora, extracting a name-aware translation grammar and rules, adding a name phrase table, and using name-translation-driven decoding. |
Introduction | names in parallel corpora, updating word segmentation, word alignment and grammar extraction (Section 3.1). |
Name-aware MT | We built a NAMT system from such name-tagged parallel corpora. |
Name-aware MT | The realigned parallel corpora are used to train our NAMT system based on SCFG. |
Name-aware MT | However, the original parallel corpora contain many high-frequency names, which can already be handled well by the baseline MT. |
Discussion and Future Work | These, when combined with standard MT systems such as Moses (Koehn et al., 2007) trained on parallel corpora, have been shown to yield some BLEU score improvements. |
Experiments and Results | OPUS movie subtitle corpus (Tiedemann, 2009): This is a large open source collection of parallel corpora available for multiple language pairs. |
Experiments and Results | This is achieved without using any seed lexicon or parallel corpora. |
Introduction | Statistical machine translation (SMT) systems are nowadays built using large amounts of bilingual parallel corpora. |
Introduction | The parallel corpora are used to estimate translation-model parameters: word-to-word translation tables, fertilities, distortion models, phrase translations, syntactic transformations, etc. |
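To illustrate how a word-to-word translation table is estimated from parallel data, here is a minimal EM sketch in the style of IBM Model 1, run on a toy corpus; this is an assumed textbook-style illustration, not the training pipeline of any of the papers quoted here:

```python
from collections import defaultdict

def ibm_model1(bitext, iterations=10):
    """Estimate word translation probabilities t(f|e) with EM (IBM Model 1 style)."""
    # Initialize t(f|e) uniformly over co-occurring word pairs.
    t = defaultdict(float)
    f_vocab = {f for fs, _ in bitext for f in fs}
    for fs, es in bitext:
        for e in es:
            for f in fs:
                t[(f, e)] = 1.0 / len(f_vocab)
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        for fs, es in bitext:
            for f in fs:
                z = sum(t[(f, e)] for e in es)  # normalize over possible alignments
                for e in es:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        for f, e in count:          # M-step: renormalize expected counts
            t[(f, e)] = count[(f, e)] / total[e]
    return t

# Toy parallel corpus: (foreign sentence, English sentence) pairs.
bitext = [("das haus".split(), "the house".split()),
          ("das buch".split(), "the book".split()),
          ("ein buch".split(), "a book".split())]
t = ibm_model1(bitext)
```

After a few EM iterations, probability mass concentrates on the correct pairs (e.g. t("das"|"the") dominates t("haus"|"the")), which is exactly the word-to-word table the quoted sentence refers to; fertility and distortion parameters come from the later IBM models.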
Experiments 5.1 Data Sources | Another application of the extracted term pairs is to use them to enhance existing parallel corpora to train SMT systems. |
Introduction | choose to focus on comparable corpora because for many less widely spoken languages and for technical domains where new terminology is constantly being introduced, parallel corpora are simply not available. |
Related Work | For instance, Kupiec (1993) uses statistical techniques and extracts bilingual noun phrases from parallel corpora tagged with terms. |
Related Work | (2010) also apply statistical methods to extract terms/phrases from parallel corpora. |
Abstract | Instead of difficult and expensive annotation, we build a gold-standard by leveraging cheaply available parallel corpora , targeting our approach to the problem of domain adaptation for machine translation. |
Data and Gold Standard | In all parallel corpora, we normalize the English text to American spelling. |
Experiments | “representative tokens”) extracted from fairly large new-domain parallel corpora (see Table 3), consisting of between 22 and 36 thousand parallel sentences, which yield between 8 and 35 thousand representative tokens. |
Related Work | In contrast, our SENSESPOTTING task leverages automatically word-aligned parallel corpora as a source of annotation for supervision during training and evaluation. |
Abstract | We present an information-theoretic objective for bilingual word clustering that incorporates both monolingual distributional evidence and cross-lingual evidence from parallel corpora to learn high-quality word clusters jointly in any number of languages. |
Experiments | Corpora for Clustering: We used parallel corpora for {Arabic, English, French, Korean & Turkish}-German pairs from the WIT-3 corpus (Cettolo et al., 2012), which is a collection of translated transcriptions of TED talks. |
Experiments | Note that the parallel corpora are of different sizes and hence the monolingual German data from every parallel corpus is different. |
Cross Language Text Categorization | categorization problem to the monolingual setting (Fortuna and Shawe-Taylor, 2005); to cast the cross-language text categorization problem into two monolingual settings for active learning (Liu et al., 2012); to translate and adapt a model built on one language to another (Rigutini et al., 2005; Shi et al., 2010); to produce parallel corpora for multi-view learning (Guo and Xiao, 2012). |
Cross Language Text Categorization | posed to parallel corpora for CLTC, use LSA to build multilingual domain models. |
Introduction | The task becomes more difficult when the data consists of comparable corpora in the two languages — documents on the same topics (e.g., sports, economy) — instead of parallel corpora — there exists a one-to-one correspondence |
Abstract | Currently, almost all statistical machine translation (SMT) models are trained on parallel corpora from specific domains. |
Introduction | However, all of these state-of-the-art translation models rely on parallel corpora to induce translation rules and estimate the corresponding parameters. |
Introduction | Unfortunately, parallel corpora are very expensive to collect and are usually unavailable for resource-poor languages, and for many specific domains even in a resource-rich language pair. |