Evaluation | The Urdu to English evaluation in §3.4 focuses on how noisy parallel data and completely monolingual (i.e., not even comparable) text can be used for a realistic low-resource language pair, and is evaluated with the larger language model only.
Evaluation | We also examine how our approach can learn from noisy parallel data compared to the traditional SMT system. |
Evaluation | We used this set in two ways: either to augment the parallel data presented in Table 2, or to augment the non-comparable monolingual data in Table 3 for graph construction. |
Generation & Propagation | If a source phrase is found in the baseline phrase table, it is called a labeled phrase: its conditional empirical probability distribution over target phrases (estimated from the parallel data) is used as the label, and is sub-
Introduction | However, the limiting factor in the success of these techniques is parallel data availability. |
Introduction | While parallel data is generally scarce, monolingual resources exist in abundance and are being created at accelerating rates. |
Introduction | Can we use monolingual data to augment the phrasal translations acquired from parallel data?
Data and Tools | The parallel data come from the Europarl corpus version 7 (Koehn, 2005) and Kaist Corpus4. |
Data and Tools | The parallel data for these three languages are also from the Europarl corpus version 7. |
Data and Tools | POS tags are not available for parallel data in the Europarl and Kaist corpora, so we need to pro-
Experiments | 6Japanese and Indonesian are excluded as no practicable parallel data are available.
Introduction | In this paper, we consider a practically motivated scenario, in which we want to build statistical parsers for resource-poor target languages, using existing resources from a resource-rich source language (like English).1 We assume that there are absolutely no labeled training data for the target language, but we have access to parallel data with a resource-rich language and a sufficient amount of labeled training data to build an accurate parser for the resource-rich language. |
Introduction | (2011) proposed an approach for unsupervised dependency parsing with nonparallel multilingual guidance from one or more helper languages, in which parallel data is not used. |
Introduction | We train probabilistic parsing models for resource-poor languages by maximizing a combination of likelihood on parallel data and confidence on unlabeled data. |
Our Approach | Another advantage of the learning framework is that it combines both the likelihood on parallel data and confidence on unlabeled data, so that both parallel text and unlabeled data can be utilized in our approach. |
Our Approach | In our scenario, we have a set of aligned parallel data P = {(x_i, y_i, a_i)}, where a_i is the word alignment for the pair of source-target sentences (x_i, y_i), and a set of unlabeled sentences of the target language U = {x_j}. We also have a trained English parsing model p_{λE}. Then the K in equation (7) can be divided into two cases, according to whether x_i belongs to the parallel data set P or the unlabeled data set U.
Our Approach | We define the transferring distribution by defining the transferring weight, utilizing the English parsing model p_{λE}(y|x) via parallel data with word alignments:
A Joint Model with Unlabeled Parallel Text | sentiment) bilingual (in L1 and L2) parallel data U that are defined as follows. |
A Joint Model with Unlabeled Parallel Text | where v ∈ {1, 2} denotes L1 or L2; the first term on the right-hand side is the likelihood of labeled data for both D1 and D2; and the second term is the likelihood of the unlabeled parallel data U.
A Joint Model with Unlabeled Parallel Text | However, there could be considerable noise in real-world parallel data, i.e.
Abstract | We present a novel approach for joint bilingual sentiment classification at the sentence level that augments available labeled data in each language with unlabeled parallel data.
Experimental Setup 4.1 Data Sets and Preprocessing | We also try to remove neutral sentences from the parallel data since they can introduce noise into our model, which deals only with positive and negative examples. |
Experimental Setup 4.1 Data Sets and Preprocessing | Co-Training with SVMs (Co-SVM): This method applies SVM-based co-training given both the labeled training data and the unlabeled parallel data following Wan (2009). |
Introduction | We furthermore find that improvements, albeit smaller, are obtained when the parallel data is replaced with a pseudo-parallel (i.e. |
Results and Analysis | By making use of the unlabeled parallel data, our proposed approach improves the accuracy, compared to MaxEnt, by 8.12% (or 33.27% error reduction) on English and 3.44% (or 16.92% error reduction) on Chinese in the first setting, and by 5.07% (or 19.67% error reduction) on English and 3.87% (or 19.4% error reduction) on Chinese in the second setting.
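The accuracy gains and relative error reductions quoted above are related by simple arithmetic, sketched below; the implied baseline error rate is derived from the reported numbers, not stated in the excerpt.

```python
def error_reduction(base_acc, new_acc):
    # Fraction of the baseline error removed by the improved system.
    return (new_acc - base_acc) / (1.0 - base_acc)

# The reported pair (+8.12 accuracy points, 33.27% error reduction) on
# English implies a baseline error rate of roughly 8.12 / 33.27 = 24.4%.
implied_base_error = 0.0812 / 0.3327
```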
Introduction | Of course, for many language pairs and domains, parallel data is not available. |
Machine Translation as a Decipherment Task | We now turn to the problem of MT without parallel data.
Machine Translation as a Decipherment Task | Next, we present two novel decipherment approaches for MT training without parallel data.
Machine Translation as a Decipherment Task | Bayesian Decipherment: We introduce a novel method for estimating IBM Model 3 parameters without parallel data, using Bayesian learning.
Word Substitution Decipherment | Before we tackle machine translation without parallel data, we first solve a simpler problem—word substitution decipherment.
Abstract | In this paper, we propose a novel approach to learning topic representation for parallel data using a neural network architecture, where abundant topical contexts are embedded via topic relevant monolingual data. |
Background: Deep Learning | Inspired by previous successful research, we first learn sentence representations using topic-related monolingual texts in the pre-training phase, and then optimize the bilingual similarity by leveraging sentence-level parallel data in the fine-tuning phase. |
Experiments | In the pre-training phase, all parallel data is fed into two neural networks respectively for DAE training, where network parameters W and b are randomly initialized. |
Experiments | The parallel data we use is released by LDC3. |
Experiments | Translation models are trained over the parallel data that is automatically word-aligned |
Introduction | One property these approaches have in common is that they only utilize parallel data where document boundaries are explicitly given.
Introduction | However, this condition does not always hold, since there is a considerable amount of parallel data that does not have document boundaries.
Introduction | This underlying topic space is learned from sentence-level parallel data in order to share topic information across the source and target languages as much as possible. |
Topic Similarity Model with Neural Network | learn topic representations using sentence-level parallel data.
Topic Similarity Model with Neural Network | 3.2 Fine-tuning with parallel data |
Topic Similarity Model with Neural Network | Consequently, the whole neural network can be fine-tuned towards the supervised criteria with the help of parallel data . |
Abstract | In this paper, we propose a generative cross-lingual mixture model (CLMM) to leverage unlabeled bilingual parallel data . |
Abstract | By fitting parameters to maximize the likelihood of the bilingual parallel data, the proposed model learns previously unseen sentiment words from the large bilingual parallel data and improves vocabulary coverage significantly. |
Experiment | CLMM includes two hyper-parameters (λs and λt) controlling the contribution of unlabeled parallel data.
Experiment | 4.5 The Influence of Unlabeled Parallel Data |
Experiment | We investigate how the size of the unlabeled parallel data affects the sentiment classification in this subsection. |
Introduction | Instead of relying on the unreliable machine translated labeled data, CLMM leverages bilingual parallel data to bridge the language gap between the source language and the target language. |
Introduction | CLMM is a generative model that treats the source language and target language words in parallel data as generated simultaneously by a set of mixture components. |
Introduction | This paper makes two contributions: (1) we propose a model to effectively leverage large bilingual parallel data for improving vocabulary coverage; and (2) the proposed model is applicable in both settings of cross-lingual sentiment classification, irrespective of the availability of labeled data in the target language. |
Abstract | of language pairs, large amounts of parallel data |
Abstract | However, for most language pairs and domains there is little to no curated parallel data available. |
Abstract | Hence discovery of parallel data is an important first step for translation between most of the world’s languages. |
Abstract | As a supplement to existing parallel training data, our automatically extracted parallel data yields substantial translation quality improvements in translating microblog text and modest improvements in translating edited news commentary. |
Introduction | Section 2 describes the related work in parallel data extraction. |
Introduction | Section 3 presents our model to extract parallel data within the same document. |
Parallel Data Extraction | We will now describe our method to extract parallel data from Microblogs. |
Parallel Data Extraction | these are also considered for the extraction of parallel data.
Parallel Segment Retrieval | Prior work on finding parallel data attempts to reason about the probability that pairs of documents (x, y) are parallel. |
Parallel Segment Retrieval | , xn, consisting of n tokens, and need to determine whether there is parallel data in X and, if so, where the parallel segments are and what their languages are.
Parallel Segment Retrieval | The main problem we address is to find the parallel data when the boundaries of the parallel segments are not defined explicitly. |
Related Work | Automatic collection of parallel data is a well-studied problem. |
Related Work | We aim to propose a method that acquires large amounts of parallel data for free. |
Experiments | of the full parallel text; we do not use the English side of the parallel data for actually building systems. |
Experiments | One disadvantage to the previous method for evaluating the SENSESPOTTING task is that it requires parallel data in a new domain. |
Experiments | Suppose we have no parallel data in the new domain at all, yet still want to attack the SENSESPOTTING task. |
Introduction | We operate under the framework of phrase sense disambiguation (Carpuat and Wu, 2007), in which we automatically align parallel data in an old domain to generate an initial old-domain sense inventory.
New Sense Indicators | Table 2: Basic characteristics of the parallel data.
Related Work | In contrast, the SENSESPOTTING task consists of detecting when senses are unknown in parallel data . |
Task Definition | From an applied perspective, the assumption of a small amount of parallel data in the new domain is reasonable: if we want an MT system for a new domain, we will likely have some data for system tuning and evaluation. |
Abstract | Further, adapting large MSA/English parallel data increases the lexical coverage, reduces OOVs to 0.7%, and leads to an absolute BLEU improvement of 2.73 points.
Conclusion | adapted parallel data showed an improvement of 1.87 BLEU points over our best baseline. |
Introduction | Later, we applied an adaptation method to incorporate MSA/English parallel data.
Introduction | — We built a phrasal Machine Translation (MT) system on adapted Egyptian/English parallel data, which outperformed a non-adapted baseline by 1.87 BLEU points.
Introduction | — We used phrase-table merging (Nakov and Ng, 2009) to utilize MSA/English parallel data with the available in-domain parallel data.
Previous Work | This can be done by either translating between the related languages using word-level translation, character level transformations, and language specific rules (Durrani et al., 2010; Hajic et al., 2000; Nakov and Tiedemann, 2012), or by concatenating the parallel data for both languages (Nakov and Ng, 2009). |
Previous Work | These translation methods generally require parallel data, for which hardly any exists between dialects and MSA.
Previous Work | Their best Egyptian/English system was trained on dialect/English parallel data . |
Conclusion | We presented a novel approach for inducing OOV translations from a source-side monolingual corpus and parallel data, using graph propagation.
Experiments & Results 4.1 Experimental Setup | From the dev and test sets, we extract all source words that do not appear in the phrase-table constructed from the parallel data.
Experiments & Results 4.1 Experimental Setup | Similarly, the values of the original four probability features in the phrase-table for the new entries are set to 1. The entire training pipeline is as follows: (i) a phrase-table is constructed using parallel data as usual, (ii) OOVs for the dev and test sets are extracted, (iii) OOVs are translated using graph propagation, (iv) OOVs and translations are added to the phrase-table, introducing a new feature type, (v) the new phrase-table is tuned (with an LM) using MERT (Och, 2003) on the dev set.
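Step (iv) of the pipeline above can be sketched as follows; the dictionary layout, the propagation-score feature, and the neutral value given to existing entries are illustrative assumptions, not the actual phrase-table format.

```python
def add_oov_entries(phrase_table, induced):
    """Add induced OOV translations to a toy phrase table.

    phrase_table: {(src, tgt): [four probability features]}
    induced:      [(src, tgt, propagation_score)]
    """
    for key in phrase_table:
        # Existing entries get a neutral value for the new feature type.
        phrase_table[key] = phrase_table[key] + [1.0]
    for src, tgt, score in induced:
        # New entries: the four original probability features are set to 1.
        phrase_table[(src, tgt)] = [1.0, 1.0, 1.0, 1.0, score]
    return phrase_table

table = {("chat", "cat"): [0.6, 0.5, 0.7, 0.4]}
table = add_oov_entries(table, [("oovword", "guess", 0.8)])
```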
Experiments & Results 4.1 Experimental Setup | The correctness of this gold standard is limited to the size of the parallel data used as well as the quality of the word alignment software toolkit, and is not 100% precise. |
Graph-based Lexicon Induction | Given a (possibly small) amount of parallel data between the source and target languages, and a large amount of monolingual data in the source language, we construct a graph over all phrase types in the monolingual text and the source side of the parallel corpus and connect phrases that have similar meanings (i.e.
Graph-based Lexicon Induction | When a relatively small amount of parallel data is used, unlabeled nodes outnumber labeled ones and many of them lie on the paths between an OOV node and labeled ones.
Introduction | Increasing the size of the parallel data can reduce the number of OOVs.
Introduction | Pivot language techniques tackle this problem by taking advantage of available parallel data between the source language and a third language. |
Introduction | (2009) in which a graph is constructed from source language monolingual text1 and the source-side of the available parallel data . |
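The graph propagation referred to in these excerpts can be sketched as plain iterative label propagation: unlabeled phrase nodes repeatedly take the average label distribution of their neighbours, while labeled (seed) nodes stay fixed. The toy graph below is illustrative; real systems weight edges by phrase similarity.

```python
def propagate(edges, seeds, iterations=20):
    # Build an undirected adjacency list over all nodes.
    neigh = {}
    for a, b in edges:
        neigh.setdefault(a, []).append(b)
        neigh.setdefault(b, []).append(a)
    targets = sorted({t for lab in seeds.values() for t in lab})
    dist = {n: dict(seeds.get(n, {})) for n in neigh}
    for _ in range(iterations):
        new = {}
        for n in neigh:
            if n in seeds:        # seed nodes keep their label distribution
                new[n] = dist[n]
            else:                 # unlabeled nodes average their neighbours
                new[n] = {t: sum(dist[m].get(t, 0.0) for m in neigh[n]) / len(neigh[n])
                          for t in targets}
        dist = new
    return dist

# OOV phrase "c" is connected to two labeled phrases, "a" and "b".
edges = [("a", "c"), ("b", "c")]
seeds = {"a": {"house": 1.0}, "b": {"house": 0.5, "home": 0.5}}
result = propagate(edges, seeds)
```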
Abstract | Our models leverage parallel data and learn to strongly align the embeddings of semantically equivalent sentences, while maintaining sufficient distance between those of dissimilar sentences. |
Abstract | Through qualitative analysis and the study of pivoting effects we demonstrate that our representations are semantically plausible and can capture semantic relationships across languages without parallel data . |
Approach | The idea is that, given enough parallel data , a shared representation of two parallel sentences would be forced to capture the common elements between these two sentences. |
Conclusion | To summarize, we have presented a novel method for learning multilingual word embeddings using parallel data in conjunction with a multilingual objective function for compositional vector models. |
Overview | A key difference between our approach and those listed above is that we only require sentence-aligned parallel data in our otherwise unsupervised learning function. |
Overview | Parallel data in multiple languages provides an |
Related Work | However, there exists a body of prior work on learning multilingual embeddings or on using parallel data to transfer linguistic information across languages.
Related Work | (2012), our baseline in §5.2, use a form of multi-agent learning on word-aligned parallel data to transfer embeddings from one language to another. |
Abstract | We argue that multilingual parallel data provides a valuable source of indirect supervision for induction of shallow semantic representations. |
Abstract | When applied to German-English parallel data, our method obtains a substantial improvement over a model trained without using the agreement signal, when both are tested on nonparallel sentences.
Conclusions | We show that an agreement signal extracted from parallel data provides indirect supervision capable of substantially improving a state-of-the-art model for semantic role induction. |
Introduction | The goal of this work is to show that parallel data is useful in unsupervised induction of shallow semantic representations. |
Multilingual Extension | As we argued in Section 1, our goal is to penalize disagreement in the semantic structures predicted for each language on parallel data.
Multilingual Extension | Intuitively, when two arguments are aligned in parallel data , we expect them to be labeled with the same semantic role in both languages. |
Multilingual Extension | Specifically, we augment the joint probability with a penalty term computed on parallel data: |
Background and Motivation | The approaches in this third group often use parallel data to bridge the gap between languages. |
Background and Motivation | However, they are very sensitive to the quality of parallel data, as well as the accuracy of a source-language model on it.
Background and Motivation | This approach yields an SRL model for a new language at a very low cost, effectively requiring only a source language model and parallel data . |
Conclusion | It allows one to quickly construct an SRL model for a new language without manual annotation or language-specific heuristics, provided an accurate model is available for one of the related languages along with a certain amount of parallel data for the two languages. |
Conclusion | While annotation projection approaches require sentence- and word-aligned parallel data and crucially depend on the accuracy of the syntactic parsing and SRL on the source side of the parallel corpus, cross-lingual model transfer can be performed using only a bilingual dictionary.
Evaluation | We use parallel data to construct a bilingual dictionary used in word mapping, as well as in the projection baseline. |
Related Work | The basic idea behind model transfer is similar to that of cross-lingual annotation projection, as we can see from the way parallel data is used in, for example, McDonald et al. |
Abstract | We formulate a generative Bayesian model which seeks to explain the observed parallel data through a combination of bilingual and monolingual parameters. |
Experimental setup | Though the model is trained using parallel data , during testing it has access only to monolingual data. |
Experimental setup | This setup ensures that we are testing our model’s ability to learn better parameters at training time, rather than its ability to exploit parallel data at test time. |
Introduction | We formulate a generative Bayesian model which seeks to explain the observed parallel data through a combination of bilingual and monolingual parameters. |
Related Work | More recently, there has been a body of work attempting to improve parsing performance by exploiting syntactically annotated parallel data.
Related Work | In one strand of this work, annotations are assumed only in a resource-rich language and are projected onto a resource-poor language using the parallel data (Hwa et al., 2005; Xi and Hwa, 2005). |
Related Work | In another strand of work, syntactic annotations are assumed on both sides of the parallel data, and a model is trained to exploit the parallel data at test time as well (Smith and Smith, 2004; Burkett and Klein, 2008). |
Experimental Evaluation | We also compare the results on these corpora to a system trained on parallel data . |
Experimental Evaluation | Och (2002) reports results of 48.2 BLEU for a single-word based translation system and 56.1 BLEU using the alignment template approach, both trained on parallel data . |
Related Work | Unsupervised training of statistical translation systems without parallel data and related problems have been addressed before.
Related Work | Close to the methods described in this work, Ravi and Knight (2011) treat training and translation without parallel data as a deciphering problem. |
Related Work | They perform experiments on a Spanish/English task with vocabulary sizes of about 500 words and achieve a performance of around 20 BLEU, compared to 70 BLEU obtained by a system that was trained on parallel data.
Abstract | Parallel data in the domain of interest is the key resource when training a statistical machine translation (SMT) system for a specific purpose. |
Inferring a learning curve from mostly monolingual data | However, when a configuration of four initial points is used for the same amount of "seed" parallel data, it outperforms both the configurations with three initial points.
Inferring a learning curve from mostly monolingual data | The ability to predict the amount of parallel data required to achieve a given level of quality is very valuable in planning business deployments of statistical machine translation; yet, we are not aware of any rigorous proposal for addressing this need. |
Introduction | Parallel data in the domain of interest is the key resource when training a statistical machine translation (SMT) system for a specific business purpose. |
Introduction | This prediction, or more generally the prediction of the learning curve of an SMT system as a function of available in-domain parallel data, is the objective of this paper.
Introduction | They show that without any parallel data we can predict the expected translation accuracy at 75K segments within an error of 6 BLEU points (Table 4), while using a seed training corpus of 10K segments narrows this error to within 1.5 points (Table 6). |
Conclusion | We thank Amarnag Subramanya for helping us with the implementation of label propagation and Shankar Kumar for access to the parallel data.
Experiments and Results | The parallel data came from the Europarl corpus (Koehn, 2005) and the ODS United Nations dataset (UN, 2006). |
Experiments and Results | Taking the intersection of languages in these resources, and selecting languages with large amounts of parallel data, yields the following set of eight Indo-European languages: Danish, Dutch, German, Greek, Italian, Portuguese, Spanish and Swedish.
Experiments and Results | Projection: Our third baseline incorporates bilingual information by projecting POS tags directly across alignments in the parallel data.
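This projection baseline can be sketched as copying each source token's POS tag to its aligned target token; the conflict handling (first link wins, unaligned words left untagged) is an assumption for illustration.

```python
def project_pos(src_tags, alignment, tgt_len):
    # alignment: list of (source_index, target_index) word-alignment links.
    tgt_tags = [None] * tgt_len
    for i, j in alignment:
        if tgt_tags[j] is None:   # keep the first projected tag on conflicts
            tgt_tags[j] = src_tags[i]
    return tgt_tags

# "the house" -> "das Haus", aligned one-to-one.
tags = project_pos(["DET", "NOUN"], [(0, 0), (1, 1)], 2)
```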
Introduction | To bridge this gap, we consider a practically motivated scenario, in which we want to leverage existing resources from a resource-rich language (like English) when building tools for resource-poor foreign languages.1 We assume that absolutely no labeled training data is available for the foreign language of interest, but that we have access to parallel data with a resource-rich language. |
Experiments | Additionally, we adopt GIZA++ to get the word alignment of in-domain parallel data and form the word translation probability table. |
Experiments | We adopt five methods for extracting domain-relevant parallel data from the general-domain corpus.
Experiments | When the top 600k sentence pairs are picked out of the general-domain corpus to train machine translation systems, the systems perform better than the General-domain baseline trained on 16 million parallel sentence pairs.
Training Data Selection Methods | These methods are based on a language model and a translation model, which are trained on the small in-domain parallel data.
Training Data Selection Methods | t(ej|fi) is the translation probability of word ej conditioned on word fi, and is estimated from the small in-domain parallel data.
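A maximum-likelihood estimate of t(ej|fi) from word-aligned data can be sketched as relative frequencies over alignment links; this is a simplification of the lexical translation tables a toolkit such as GIZA++ would produce.

```python
from collections import Counter

def lexical_translation_probs(links):
    # links: (f, e) word pairs read off the word alignment.
    pair_counts = Counter(links)
    f_counts = Counter(f for f, _ in links)
    # t(e|f) = count(f aligned to e) / count(f in any link)
    return {(f, e): c / f_counts[f] for (f, e), c in pair_counts.items()}

links = [("maison", "house"), ("maison", "house"),
         ("maison", "home"), ("chat", "cat")]
t = lexical_translation_probs(links)
```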
Conclusions | Our model can be extended for clustering any number of given languages together in a joint framework, and incorporate both monolingual and parallel data . |
Experiments | Monolingual Clustering: For every language pair, we train German word clusters on the monolingual German data from the parallel data . |
Experiments | Recall that A(x, y) is the count of the alignment links between x and y observed in the parallel data, and A(x) and A(y) are the respective marginal counts.
Introduction | Since the objective consists of terms representing the entropy of monolingual data (for each language) and of parallel bilingual data, it is particularly attractive for the usual situation in which there is much more monolingual data available than parallel data.
Conclusion | We introduced a data collection framework that produces highly parallel data by asking different annotators to describe the same video segments. |
Discussions and Future Work | While our data collection framework yields useful parallel data , it also has some limitations. |
Discussions and Future Work | By pairing up descriptions of the same video in different languages, we obtain parallel data without requiring any bilingual skills. |
Experiments | We quantified the utility of our highly parallel data by computing the correlation between BLEU and human ratings when different numbers of references were available. |
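The correlation between BLEU and human ratings mentioned here can be computed with plain Pearson correlation, sketched below on made-up toy scores.

```python
import math

def pearson(xs, ys):
    # Pearson correlation coefficient of two equal-length score lists.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

bleu = [0.10, 0.20, 0.30, 0.40]      # toy system-level BLEU scores
human = [2.0, 3.0, 3.5, 4.5]         # toy human adequacy ratings
r = pearson(bleu, human)
```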
Introduction | To be able to translate a Chinese abbreviation that is unseen in available parallel corpora, one may annotate more parallel data.
Unsupervised Translation Induction for Chinese Abbreviations | This is particularly interesting since we normally have enormous amounts of monolingual data, but only a small amount of parallel data.
Unsupervised Translation Induction for Chinese Abbreviations | For example, in the translation task between Chinese and English, both the Chinese and English Gigaword have billions of words, but the parallel data has only about 30 million words. |
Experiments and Results | We also report the first BLEU results on such a large-scale MT task under truly nonparallel settings (without using any parallel data or seed lexicon). |
Experiments and Results | The results are encouraging and demonstrate the ability of the method to scale to large-scale settings while performing efficient inference with complex models, which we believe will be especially useful for future MT applications in scenarios where parallel data is hard to obtain.
Introduction | But obtaining parallel data is an expensive process and not available for all language |
Cross-lingual Features | The sentences were drawn from the UN parallel data along with a variety of parallel news data from LDC and the GALE project. |
Related Work | If cross-lingual resources are available, such as parallel data, increased training data, better resources, or superior features can be used to improve the processing (ex.
Related Work | They did so by training a bilingual model and then generating more training data from unlabeled parallel data . |
Abstract | This paper presents a novel method for inducing phrase-based translation units directly from parallel data , which we frame as learning an inverse transduction grammar (ITG) using a recursive Bayesian prior. |
Analysis | We have presented a novel method for learning a phrase-based model of translation directly from parallel data, which we have framed as learning an inverse transduction grammar (ITG) using a recursive Bayesian prior.
Introduction | Word-based translation models (Brown et al., 1993) remain central to phrase-based model training, where they are used to infer word-level alignments from sentence-aligned parallel data, from
Experiments | We use a phrase-based system similar to Moses (Koehn et al., 2007) based on a set of common features including maximum likelihood estimates pML(e|f) and pML(f|e), lexically weighted estimates pLW(e|f) and pLW(f|e), word and phrase penalties, a hierarchical reordering model (Galley and Manning, 2008), a linear distortion feature, and a modified Kneser-Ney language model trained on the target side of the parallel data.
Experiments | Translation models are estimated on 102M words of parallel data for French-English, and 99M words for German-English; about 6.5M words for each language pair are newswire, the remainder are parliamentary proceedings. |
Experiments | All neural network models are trained on the news portion of the parallel data , corresponding to 136K sentences, which we found to be most useful in initial experiments. |
Experimental Setup | We align the parallel data with GIZA++ (Och et al., 2003) and decode using Moses (Koehn et al., 2007). |
Experimental Setup | A KN-smoothed 5-gram language model is trained on the target side of the parallel data with SRILM (Stolcke, 2002). |
Related Work | With word-boundary-aware phrase extraction, a phrase pair containing all of “with his blue car” must have been seen in the parallel data to translate the phrase correctly at test time. |
Introduction | For statistical machine translation (MT), which relies on the existence of parallel data , translating from nonstandard dialects is a challenge. |
Machine Translation Experiments | DAT (in the fourth column) is the DA part of the 5M word DA-En parallel data processed with the DA-MSA MT system. |
Related Work | Two approaches have emerged to alleviate the problem of DA-English parallel data scarcity: using MSA as a bridge language (Sawaf, 2010; Salloum and Habash, 2011; Salloum and Habash, 2013; Sajjad et al., 2013), and using crowd sourcing to acquire parallel data (Zbib et al., 2012). |