Index of papers in Proc. ACL 2013 that mention
  • parallel corpus
Razmara, Majid and Siahbani, Maryam and Haffari, Reza and Sarkar, Anoop
Conclusion
Future work includes studying the effect of the size of the parallel corpus on the induced OOV translations.
Conclusion
Increasing the size of the parallel corpus, on the one hand, reduces the number of OOVs.
Experiments & Results 4.1 Experimental Setup
We word-aligned the dev/test sets by concatenating them to a large parallel corpus and running GIZA++ on the whole set.
Experiments & Results 4.1 Experimental Setup
… appearing more than once in the parallel corpus and being assigned to multiple different phrases), we take the average of reciprocal ranks for each of them.
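As a sketch of this evaluation, averaging reciprocal ranks over an OOV's multiple reference phrases might look as follows (function names and data layout are assumptions, not the authors' code):

    def reciprocal_rank(candidates, gold):
        """Return 1/rank of the first candidate equal to gold, or 0.0 if absent."""
        for rank, cand in enumerate(candidates, start=1):
            if cand == gold:
                return 1.0 / rank
        return 0.0

    def avg_rr_for_oov(candidates, gold_phrases):
        """For an OOV aligned to several different gold phrases, average the
        reciprocal ranks obtained against each of them."""
        return sum(reciprocal_rank(candidates, g) for g in gold_phrases) / len(gold_phrases)

    # Example: an OOV seen twice in the parallel corpus, with two gold phrases.
    print(avg_rr_for_oov(["casa", "hogar", "edificio"], ["hogar", "casa"]))  # (1/2 + 1) / 2 = 0.75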
Experiments & Results 4.1 Experimental Setup
The generated candidate translations for the OOVs can be added to the phrase-table created from the parallel corpus to increase the phrase-table's coverage.
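A minimal sketch of appending such candidates to a Moses-style phrase-table; the "|||" line layout and the four-score convention are assumptions about the toolkit setup, not details given in the excerpt:

    def append_oov_entries(phrase_table_path, oov_candidates):
        """oov_candidates: oov -> list of (translation, p_e_given_f) pairs.
        Appends one Moses-style line per candidate; scores other than p(e|f)
        are set to a neutral 1.0 here purely for illustration."""
        with open(phrase_table_path, "a", encoding="utf-8") as f:
            for oov, cands in oov_candidates.items():
                for e, p in cands:
                    f.write(f"{oov} ||| {e} ||| 1.0 1.0 {p} 1.0\n")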
Graph-based Lexicon Induction
Given a (possibly small) amount of parallel data between the source and target languages, and large monolingual data in the source language, we construct a graph over all phrase types in the monolingual text and the source side of the parallel corpus and connect phrases that have similar meanings (i.e. …
Graph-based Lexicon Induction
There are three types of vertices in the graph: i) labeled nodes, which appear in the parallel corpus and for which we have the target-side …
Graph-based Lexicon Induction
The labels are translations and their probabilities (more specifically, p(e|f)) from the phrase-table extracted from the parallel corpus.
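Reading the three excerpts above together, a minimal sketch of the propagation idea, assuming a precomputed similarity graph and phrase-table seed distributions p(e|f); the paper's actual graph construction and propagation objective are not reproduced here:

    from collections import defaultdict

    def propagate(graph, seeds, iterations=10):
        """graph: phrase -> {neighbor: similarity}; seeds: phrase -> {translation: p(e|f)}
        taken from the phrase-table. Unlabeled phrases (e.g. OOVs) receive the
        similarity-weighted average of their neighbors' translation distributions."""
        labels = {p: dict(d) for p, d in seeds.items()}
        for _ in range(iterations):
            updates = {}
            for phrase, neighbors in graph.items():
                if phrase in seeds:          # seed nodes keep their phrase-table labels
                    continue
                dist, total = defaultdict(float), 0.0
                for nb, sim in neighbors.items():
                    for e, p in labels.get(nb, {}).items():
                        dist[e] += sim * p
                    total += sim
                if total > 0:
                    updates[phrase] = {e: w / total for e, w in dist.items()}
            labels.update(updates)
        return labels

Candidate translations for an OOV node can then be read off its induced distribution and added to the phrase-table, as the earlier excerpt describes.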
parallel corpus is mentioned in 9 sentences in this paper.
Kuznetsova, Polina and Ordonez, Vicente and Berg, Alexander and Berg, Tamara and Choi, Yejin
Abstract
Evaluation results show the intrinsic quality of the generalized captions and the extrinsic utility of the new image-text parallel corpus with respect to a concrete application of image caption transfer.
We evaluate the usefulness of our new image-text parallel corpus for automatic generation of image descriptions.
Therefore, we also report scores based on semantic matching, which gives partial credit to word pairs based on their lexical similarity. The best performing approach with semantic matching is VISUAL (with LM = Image corpus), improving BLEU, Precision, and F-score substantially over those of ORIG, demonstrating the extrinsic utility of our newly generated image-text parallel corpus in comparison to the original database.
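As an illustration of partial-credit matching, a minimal sketch using WordNet path similarity; WordNet is an assumption here, not necessarily the similarity resource the paper uses:

    # requires NLTK and its WordNet data: nltk.download('wordnet')
    from nltk.corpus import wordnet as wn

    def lexical_similarity(w1, w2):
        """Partial credit in [0, 1]: exact match scores 1, otherwise the best
        WordNet path similarity between any synsets of the two words."""
        if w1 == w2:
            return 1.0
        scores = [s1.path_similarity(s2) or 0.0
                  for s1 in wn.synsets(w1) for s2 in wn.synsets(w2)]
        return max(scores, default=0.0)

    def soft_precision(candidate, reference):
        """Precision where each candidate word earns its best partial credit
        against the reference words, instead of requiring exact matches."""
        if not candidate:
            return 0.0
        return sum(max((lexical_similarity(c, r) for r in reference), default=0.0)
                   for c in candidate) / len(candidate)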
Conclusion
We have introduced the task of image caption generalization as a means to reduce noise in the parallel corpus of images and text.
Introduction
Evaluation results show both the intrinsic quality of the generalized captions and the extrinsic utility of the new image-text parallel corpus.
Introduction
The new parallel corpus will be made publicly available.
parallel corpus is mentioned in 6 sentences in this paper.
Visweswariah, Karthik and Khapra, Mitesh M. and Ramanathan, Ananthakrishnan
Experimental setup
We use a parallel corpus of 3.9M words consisting of 1.7M words from the NIST MT-08 training data set and 2.2M words extracted from parallel news stories on the …
Experimental setup
The parallel corpus is used for building our phrase-based machine translation system and to add training data for our reordering model.
Experimental setup
For our English language model, we use the Gigaword English corpus in addition to the English side of our parallel corpus.
Generating reference reordering from parallel sentences
This model allows us to combine features from the original reordering model with information coming from the alignments to find source reorderings given a parallel corpus and alignments.
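As a point of reference for this step, a minimal sketch of the simplest alignment-only baseline, ordering source words by the target positions they align to; the paper's model additionally combines reordering-model features, which this sketch omits:

    def reference_reordering(src_len, alignment):
        """alignment: list of (src_idx, tgt_idx) links. Returns source indices
        sorted by the average target position they align to; unaligned words
        keep their original position as the sorting key."""
        tgt_pos = {i: [] for i in range(src_len)}
        for s, t in alignment:
            tgt_pos[s].append(t)
        def key(i):
            return (sum(tgt_pos[i]) / len(tgt_pos[i]) if tgt_pos[i] else i, i)
        return sorted(range(src_len), key=key)

    # Four source words with crossing links to the target side.
    print(reference_reordering(4, [(0, 2), (1, 0), (2, 1), (3, 3)]))  # [1, 2, 0, 3]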
Related work
(DeNero and Uszkoreit, 2011; Visweswariah et al., 2011; Neubig et al., 2012) focus on the use of manual word alignments to learn preordering models and in both cases no benefit was obtained by using the parallel corpus in addition to manual word alignments.
Results and Discussions
Table 3: mBLEU with different methods to generate reordering-model training data from a machine-aligned parallel corpus in addition to manual word alignments.
parallel corpus is mentioned in 6 sentences in this paper.
Yang, Nan and Liu, Shujie and Li, Mu and Zhou, Ming and Yu, Nenghai
Experiments and Results
Our parallel corpus contains about 26 million unique sentence pairs in total, mined from the web.
Experiments and Results
The result is not surprising considering our parallel corpus is quite large, and similar observations have been made in previous work, e.g. (DeNero and Macherey, 2011), that better alignment quality does not necessarily lead to better end-to-end results.
Training
As we do not have a large manually word-aligned corpus, we use traditional word alignment models such as HMM and IBM Model 4 to generate word alignments on a large parallel corpus.
Training
Our vocabularies Vs and Vt contain the most frequent 100,000 words from each side of the parallel corpus, and all other words are treated as unknown words.
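A minimal sketch of this vocabulary truncation (names and data layout are illustrative):

    from collections import Counter

    def build_vocab(sentences, size=100_000, unk="<unk>"):
        """Keep the `size` most frequent words; everything else maps to `unk`."""
        counts = Counter(w for sent in sentences for w in sent)
        vocab = {w for w, _ in counts.most_common(size)}
        return lambda w: w if w in vocab else unk

    to_id = build_vocab([["the", "cat"], ["the", "dog"]], size=1)
    print([to_id(w) for w in ["the", "cat"]])  # ['the', '<unk>']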
Training
As there is no clear stopping criterion, we simply run the stochastic optimizer through the parallel corpus for N iterations.
parallel corpus is mentioned in 6 sentences in this paper.
Faruqui, Manaal and Dyer, Chris
Experiments
Note that the parallel corpora are of different sizes and hence the monolingual German data from every parallel corpus is different.
Word Clustering
For concreteness, A(x, y) will be the number of times that x is aligned to y in a word-aligned parallel corpus.
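A minimal sketch of computing A(x, y) from a word-aligned parallel corpus (the bitext data layout is an assumption):

    from collections import Counter

    def alignment_counts(bitext):
        """bitext: iterable of (src_words, tgt_words, links) where links are
        (src_idx, tgt_idx) pairs. Returns A with A[(x, y)] = number of times
        source word x is aligned to target word y."""
        A = Counter()
        for src, tgt, links in bitext:
            for i, j in links:
                A[(src[i], tgt[j])] += 1
        return A

    A = alignment_counts([(["das", "haus"], ["the", "house"], [(0, 0), (1, 1)])])
    print(A[("haus", "house")])  # 1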
Word Clustering
We compare two different clusterings of a two-sentence Arabic-English parallel corpus (the English half of the corpus contains the same sentence, twice, while the Arabic half has two variants with the same meaning).
parallel corpus is mentioned in 3 sentences in this paper.
Kozhevnikov, Mikhail and Titov, Ivan
Conclusion
While annotation projection approaches require sentence- and word-aligned parallel data and crucially depend on the accuracy of the syntactic parsing and SRL on the source side of the parallel corpus, cross-lingual model transfer can be performed using only a bilingual dictionary.
Evaluation
Projection Baseline: The projection baseline we use for English-Czech and English-Chinese is a straightforward one: we label the source side of a parallel corpus using the source-language model, then identify those verbs on the target side that are aligned to a predicate, mark them as predicates and propagate the argument roles in the same fashion.
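A minimal sketch of this projection step, assuming source-side SRL labels and word alignments are already available (the data structures and function name are illustrative, not the authors' code):

    def project_srl(src_labels, links):
        """src_labels: src_idx -> role ('PRED' for predicates, else an argument
        role). links: (src_idx, tgt_idx) alignment pairs. Copies each source
        label onto the aligned target token, as in the projection baseline."""
        tgt_labels = {}
        for s, t in links:
            if s in src_labels:
                tgt_labels[t] = src_labels[s]   # mark predicates / propagate roles
        return tgt_labels

    print(project_srl({0: "PRED", 2: "A0"}, [(0, 1), (2, 0)]))  # {1: 'PRED', 0: 'A0'}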
Model Transfer
The mapping (bilingual dictionary) we use is derived from a word-aligned parallel corpus, by identifying, for each word in the target language, …
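A minimal sketch of deriving such a dictionary, mapping each target word to the source word it is aligned to most often; the excerpt above is truncated, so "most frequently aligned" is an assumption about the selection criterion:

    from collections import Counter, defaultdict

    def dictionary_from_alignments(bitext):
        """bitext: iterable of (src_words, tgt_words, links). For each target
        word, keep the source word it is aligned to most often."""
        counts = defaultdict(Counter)
        for src, tgt, links in bitext:
            for i, j in links:
                counts[tgt[j]][src[i]] += 1
        return {t: c.most_common(1)[0][0] for t, c in counts.items()}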
parallel corpus is mentioned in 3 sentences in this paper.
Popat, Kashyap and A.R, Balamurali and Bhattacharyya, Pushpak and Haffari, Gholamreza
Clustering for Cross Lingual Sentiment Analysis
As a viable alternative, cluster linkages can be learned from a bilingual parallel corpus, and these linkages can be used to bridge the language gap for CLSA.
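A minimal sketch of learning such cluster linkages, linking each source-language cluster to the target-language cluster its words are most often aligned to; all names and the majority-vote linkage are illustrative assumptions:

    from collections import Counter, defaultdict

    def link_clusters(bitext, src_cluster, tgt_cluster):
        """bitext: (src_words, tgt_words, links) triples; src_cluster and
        tgt_cluster map words to cluster ids. Each source cluster is linked
        to the target cluster its words are aligned to most often."""
        votes = defaultdict(Counter)
        for src, tgt, links in bitext:
            for i, j in links:
                if src[i] in src_cluster and tgt[j] in tgt_cluster:
                    votes[src_cluster[src[i]]][tgt_cluster[tgt[j]]] += 1
        return {c: v.most_common(1)[0][0] for c, v in votes.items()}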
Experimental Setup
The English-Hindi parallel corpus contains 45,992 sentences and the English-Marathi parallel corpus contains 47,881 sentences.
Introduction
To perform CLSA, this study leverages an unlabelled parallel corpus to generate the word alignments.
parallel corpus is mentioned in 3 sentences in this paper.