SciSurf: Index of "parallel corpus" in Proc. ACL 2013

Index of papers in Proc. ACL 2013 that mention

parallel corpus

Seen in text as:

parallel corpus (37)

Seen in 36 sentences in 7 papers.

1. Graph Propagation for Paraphrasing Out-of-Vocabulary Words in Statistical Machine Translation

Razmara, Majid and Siahbani, Maryam and Haffari, Reza and Sarkar, Anoop

In Proc. ACL 2013, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Conclusion	Future work includes studying the effect of size of parallel corpus on the induced oov translations.
Conclusion	Increasing the size of parallel corpus on one hand reduces the number of oovs.
Experiments & Results 4.1 Experimental Setup	We word-aligned the dev/test sets by concatenating them to a large parallel corpus and running GIZA++ on the whole set.
Experiments & Results 4.1 Experimental Setup	appearing more than once in the parallel corpus and being assigned to multiple different phrases), we take the average of reciprocal ranks for each of them.
Experiments & Results 4.1 Experimental Setup	The generated candidate translations for the oovs can be added to the phrase-table created using the parallel corpus to increase the coverage of the phrase-table.
Graph-based Lexicon Induction	Given a (possibly small amount of) parallel data between the source and target languages, and a large monolingual data in the source language, we construct a graph over all phrase types in the monolingual text and the source side of the parallel corpus and connect phrases that have similar meanings (i.e.
Graph-based Lexicon Induction	There are three types of vertices in the graph: i) labeled nodes which appear in the parallel corpus and for which we have the target-side
Graph-based Lexicon Induction	The labels are translations and their probabilities (more specifically p(e\| f )) from the phrase-table extracted from the parallel corpus .

parallel corpus is mentioned in 9 sentences in this paper.

Topics mentioned in this paper:

2. Generalizing Image Captions for Image-Text Parallel Corpus

Kuznetsova, Polina and Ordonez, Vicente and Berg, Alexander and Berg, Tamara and Choi, Yejin

In Proc. ACL 2013, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	Evaluation results show the intrinsic quality of the generalized captions and the extrinsic utility of the new image-text parallel corpus with respect to a concrete application of image caption transfer.
Code was provided by Deng et a1. (2012).	We evaluate the usefulness of our new image-text parallel corpus for automatic generation of image descriptions.
Code was provided by Deng et a1. (2012).	Therefore, we also report scores based on semantic matching, which gives partial credits to word pairs based on their lexical similarity.5 The best performing approach with semantic matching is VISUAL (with LM = Image corpus), improving BLEU, Precision, F—score substantially over those of ORIG, demonstrating the extrinsic utility of our newly generated image-text parallel corpus in comparison to the original database.
Conclusion	We have introduced the task of image caption generalization as a means to reduce noise in the parallel corpus of images and text.
Introduction	Evaluation results show both the intrinsic quality of the generalized captions and the extrinsic utility of the new image-text parallel corpus .
Introduction	The new parallel corpus will be made publicly available.2

parallel corpus is mentioned in 6 sentences in this paper.

Topics mentioned in this paper:

3. Cut the noise: Mutually reinforcing reordering and alignments for improved machine translation

Visweswariah, Karthik and Khapra, Mitesh M. and Ramanathan, Ananthakrishnan

In Proc. ACL 2013, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Experimental setup	We use a parallel corpus of 3.9M words consisting of 1.7M words from the NIST MT—08 training data set and 2.2M words extracted from parallel news stories on the
Experimental setup	The parallel corpus is used for building our phrased based machine translation system and to add training data for our reordering model.
Experimental setup	For our English language model, we use the Gigaword English corpus in addition to the English side of our parallel corpus .
Generating reference reordering from parallel sentences	This model allows us to combine features from the original reordering model along with information coming from the alignments to find source reorderings given a parallel corpus and alignments.
Related work	(DeNero and Uszkoreit, 2011; Visweswariah et al., 2011; Neubig et al., 2012) focus on the use of manual word alignments to learn preordering models and in both cases no benefit was obtained by using the parallel corpus in addition to manual word alignments.
Results and Discussions	Table 3: mBLEU with different methods to generate reordering model training data from a machine aligned parallel corpus in addition to manual word alignments.

parallel corpus is mentioned in 6 sentences in this paper.

Topics mentioned in this paper:

4. Word Alignment Modeling with Context Dependent Deep Neural Network

Yang, Nan and Liu, Shujie and Li, Mu and Zhou, Ming and Yu, Nenghai

In Proc. ACL 2013, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Experiments and Results	Our parallel corpus contains about 26 million unique sentence pairs in total which are mined from web.
Experiments and Results	The result is not surprising considering our parallel corpus is quite large, and similar observations have been made in previous work as (DeNero and Macherey, 2011) that better alignment quality does not necessarily lead to better end-to-end result.
Training	As we do not have a large manually word aligned corpus, we use traditional word alignment models such as HMM and IBM model 4 to generate word alignment on a large parallel corpus .
Training	Our vocabularies V8 and Vt contain the most frequent 100,000 words from each side of the parallel corpus , and all other words are treated as unknown words.
Training	As there is no clear stopping criteria, we simply run the stochastic optimizer through parallel corpus for N iterations.

parallel corpus is mentioned in 6 sentences in this paper.

Topics mentioned in this paper:

5. An Information Theoretic Approach to Bilingual Word Clustering

Faruqui, Manaal and Dyer, Chris

In Proc. ACL 2013, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Experiments	Note that the parallel corpora are of different sizes and hence the monolingual German data from every parallel corpus is different.
Word Clustering	For concreteness, A(:c, y) will be the number of times that cc is aligned to y in a word aligned parallel corpus .
Word Clustering	We compare two different clusterings of a two-sentence Arabic-English parallel corpus (the English half of the corpus contains the same sentence, twice, while the Arabic half has two variants with the same meaning).

parallel corpus is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

6. Cross-lingual Transfer of Semantic Role Labeling Models

Kozhevnikov, Mikhail and Titov, Ivan

In Proc. ACL 2013, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Conclusion	notation projection approaches require sentence-and word-aligned parallel data and crucially depend on the accuracy of the syntactic parsing and SRL on the source side of the parallel corpus , cross-lingual model transfer can be performed using only a bilingual dictionary.
Evaluation	Projection Baseline: The projection baseline we use for English-Czech and English-Chinese is a straightforward one: we label the source side of a parallel corpus using the source-language model, then identify those verbs on the target side that are aligned to a predicate, mark them as predicates and propagate the argument roles in the same fashion.
Model Transfer	The mapping (bilingual dictionary) we use is derived from a word-aligned parallel corpus , by identifying, for each word in the target language,

parallel corpus is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

7. The Haves and the Have-Nots: Leveraging Unlabelled Corpora for Sentiment Analysis

Popat, Kashyap and A.R, Balamurali and Bhattacharyya, Pushpak and Haffari, Gholamreza

In Proc. ACL 2013, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Clustering for Cross Lingual Sentiment Analysis	As a viable alternative, cluster linkages could be learned from a bilingual parallel corpus and these linkages can be used to bridge the language gap for CLSA.
Experimental Setup	English-Hindi parallel corpus contains 45992 sentences and English-Marathi parallel corpus contains 47881 sentences.
Introduction	To perform CLSA, this study leverages unlabelled parallel corpus to generate the word alignments.

parallel corpus is mentioned in 3 sentences in this paper.

Topics mentioned in this paper: