Index of papers in Proc. ACL that mention
  • parallel sentences
Snyder, Benjamin and Naseem, Tahira and Barzilay, Regina
Experimental setup
We test our model on three corpora of bilingual parallel sentences: English-Korean, English-Urdu, and English-Chinese.
Experimental setup
To obtain lexical alignments between the parallel sentences, we employ GIZA++ (Och and Ney, 2003).
Experimental setup
Then, for each pair of parallel sentences, we randomly sample an initial alignment tree for the two sampled trees.
Introduction
Finally, parallel sentences are assembled from these generated part-of-speech sequences and word-level alignments.
Model
We treat the part-of-speech tag sequences of parallel sentences, as well as their
Model
Now we describe the stochastic process whereby the observed parallel sentences and their word-level alignments are generated, according to our model.
Model
Then, each pair of word-aligned parallel sentences is generated through the following process:
Related Work
Assuming that trees induced over parallel sentences have to exhibit certain structural regularities, Kuhn manually specifies a set of rules for determining when parsing decisions in the two languages are inconsistent with GIZA++ word-level alignments.
parallel sentences is mentioned in 12 sentences in this paper.
Cui, Lei and Zhang, Dongdong and Liu, Shujie and Chen, Qiming and Li, Mu and Zhou, Ming and Yang, Muyun
Conclusion and Future Work
We enrich contexts of parallel sentence pairs with topic related monolingual data
Experiments
These documents are built into an inverted index using Lucene, so that they can be efficiently retrieved for the parallel sentence pairs.
Experiments
In the fine-tuning phase, for each parallel sentence pair, we randomly select ten other sentence pairs that satisfy the criterion as negative instances.
Experiments
This is not simply coincidence since we can interpret their approach as a special case in our neural network method: when a parallel sentence pair has
Related Work
In addition, our method directly maximizes the similarity between parallel sentence pairs, which is ideal for SMT decoding.
Topic Similarity Model with Neural Network
Parallel sentence
Topic Similarity Model with Neural Network
Given a parallel sentence pair (f, e), the first step is to treat f and e as queries, and use IR methods to retrieve relevant documents to enrich contextual information for them.
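This retrieval step is easy to approximate; below is a minimal sketch in which TF-IDF cosine similarity stands in for the paper's Lucene index, and the document collection, query, and `top_k` value are illustrative assumptions:

```python
# A minimal sketch of the retrieval step: each side of the pair is used as a
# query against a monolingual collection, with TF-IDF cosine similarity
# standing in for the Lucene inverted index used in the paper.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def retrieve(query, documents, top_k=3):
    """Return the top_k documents most similar to the query sentence."""
    vectorizer = TfidfVectorizer()
    doc_matrix = vectorizer.fit_transform(documents)
    query_vec = vectorizer.transform([query])
    scores = cosine_similarity(query_vec, doc_matrix).ravel()
    return [documents[i] for i in scores.argsort()[::-1][:top_k]]

# Illustrative monolingual collection for one language side.
english_docs = [
    "the central bank raised interest rates again",
    "the river bank flooded after heavy rain",
    "translation quality depends on training data",
]
e_context = retrieve("the bank raised rates", english_docs, top_k=2)
```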
Topic Similarity Model with Neural Network
Therefore, in this stage, parallel sentence pairs are used to help connect the vectors from different languages because they express the same topic.
parallel sentences is mentioned in 11 sentences in this paper.
Chen, David and Dolan, William
Experiments
Figure 3: Evaluation of paraphrase systems trained on different numbers of parallel sentences.
Experiments
Overall, all the trained models produce reasonable paraphrase systems, even the model trained on just 28K single parallel sentences.
Experiments
Examples of the outputs produced by the models trained on single parallel sentences and on all parallel sentences are shown in Table 2.
Related Work
Another method for collecting monolingual paraphrase data involves aligning semantically parallel sentences from different news articles describing the same event (Shinyama et al., 2002; Barzilay and Lee, 2003; Dolan et al., 2004).
Related Work
While utilizing multiple translations of literary work or multiple news stories of the same event can yield significant numbers of parallel sentences, this data tends to be noisy, and reliably identifying good paraphrases among all possible sentence pairs remains an open problem.
parallel sentences is mentioned in 10 sentences in this paper.
Meng, Xinfan and Wei, Furu and Liu, Xiaohua and Zhou, Ming and Xu, Ge and Wang, Houfeng
Conclusion and Future Work
In the future, we will work on leveraging parallel sentences and word alignments for other tasks in sentiment analysis, such as building multilingual sentiment lexicons.
Cross-Lingual Mixture Model for Sentiment Classification
determining polarity classes of the parallel sentences.
Cross-Lingual Mixture Model for Sentiment Classification
Particularly, for each pair of parallel sentences u_i ∈ U, we generate the words as follows.
Cross-Lingual Mixture Model for Sentiment Classification
class label for unlabeled parallel sentences) is computed according to the following equations.
Experiment
The unlabeled parallel sentences
Experiment
This model uses English labeled data and Chinese labeled data to obtain initial parameters for two maximum entropy classifiers (for English documents and Chinese documents), and then conducts EM iterations to update the parameters to gradually improve the agreement of the two monolingual classifiers on the unlabeled parallel sentences.
Experiment
When we have 10,000 parallel sentences, the accuracy of CLMM on the two data sets quickly increases to 68.77% and 68.91%, respectively.
Related Work
They assume parallel sentences in the corpus should have the same sentiment polarity.
parallel sentences is mentioned in 8 sentences in this paper.
Ling, Wang and Xiang, Guang and Dyer, Chris and Black, Alan and Trancoso, Isabel
Conclusion
We show that a considerable number of parallel sentence pairs can be crawled from microblogs, and these can be used to improve Machine Translation by updating our translation tables with translations of newer terms.
Experiments
The quality of the parallel sentence detection did not vary significantly with different setups, so we will only show the results for the best setup, which is the baseline model with span constraints.
Experiments
From the precision and recall curves, we observe that most of the parallel data can be found at the top 30% of the filtered tweets, where 5 in 6 tweets are detected correctly as parallel, and only 1 in every 6 parallel sentences is lost.
Experiments
If we generalize this ratio to the complete set of 1124k tweets, we can expect approximately 337k parallel sentences.
parallel sentences is mentioned in 8 sentences in this paper.
Sajjad, Hassan and Fraser, Alexander and Schmid, Helmut
Experiments
We randomly take 200,000 parallel sentences from the UN corpus of the year 2000.
Experiments
Then, we extract every f that cooccurs with e in a parallel sentence and add it to nbestTI(e), which gives us the list of candidate transliteration pairs candidateTI(e).
Experiments
Algorithm 3 Estimation of transliteration probabilities, e-to-f direction
1: unfiltered data ← list of word pairs
2: filtered data ← transliteration pairs extracted using Algorithm 1
3: Train a transliteration system on the filtered data
4: for all e do
5:   nbestTI(e) ← 10 best transliterations for e according to the transliteration system
6:   cooc(e) ← set of all f that cooccur with e in a parallel sentence
7:   candidateTI(e) ← cooc(e) ∪ nbestTI(e)
8: end for
9: for all f do
10:   pmoses(f, e) ← joint transliteration probability of e and f according to the transliterator
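The candidate construction in lines 4–8 reduces to a small amount of bookkeeping; a sketch, where `nbest_ti` is a stand-in for the trained transliteration system and the toy corpus is illustrative:

```python
# Sketch of lines 4-8 of Algorithm 3: candidateTI(e) is the union of all f
# cooccurring with e in some parallel sentence and the transliterator's
# n-best hypotheses. nbest_ti is a placeholder for the trained system.
from collections import defaultdict

def build_candidates(parallel_corpus, nbest_ti, n=10):
    cooc = defaultdict(set)
    for e_sent, f_sent in parallel_corpus:
        for e in e_sent.split():
            cooc[e].update(f_sent.split())  # every f cooccurring with e
    return {e: cooc[e] | set(nbest_ti(e, n)) for e in cooc}

# Usage with a trivial stub in place of the real transliterator:
candidates = build_candidates([("new york", "nyu yoku")], lambda e, n: [e])
```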
Introduction
The NEWS10 data sets are extracted from Wikipedia InterLanguage Links (WIL), which consist of parallel phrases, whereas a parallel corpus consists of parallel sentences.
Models
For training Moses as a transliteration system, we treat each word pair as if it were a parallel sentence, by putting spaces between the characters of each word.
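This preprocessing step is simple enough to show directly; a minimal sketch (the word pair is illustrative):

```python
# Sketch: a transliteration word pair becomes a character-level
# "parallel sentence" so Moses can align characters as if they were words.
def word_pair_to_char_sentences(e_word, f_word):
    return " ".join(e_word), " ".join(f_word)

src, tgt = word_pair_to_char_sentences("london", "landan")
# src == "l o n d o n", tgt == "l a n d a n"
```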
parallel sentences is mentioned in 5 sentences in this paper.
Zhu, Conghui and Watanabe, Taro and Sumita, Eiichiro and Zhao, Tiejun
Experiment
However, if most domains are similar (FBIS data set) or if there are enough parallel sentence pairs (NIST data set) in each domain, then the translation performances are almost similar even with the opposite integrating orders.
Introduction
Today, more parallel sentences are drawn from divergent domains, and the size keeps growing.
Related Work
A number of approaches have been proposed to make use of the full potential of the available parallel sentences from various domains, such as domain adaptation and incremental learning for SMT.
Related Work
The similarity calculated by an information retrieval system between the training subset and the test set is used as a feature for each parallel sentence (Lu et al., 2007).
Related Work
Incremental learning, in which new parallel sentences are incrementally added to the training data, is employed for SMT.
parallel sentences is mentioned in 5 sentences in this paper.
Zhang, Jiajun and Zong, Chengqing
Experiments
They are neither parallel nor comparable because we cannot even extract a small number of parallel sentence pairs from this monolingual data using the method of (Munteanu and Marcu, 2006).
Experiments
For the out-of-domain data, we build the phrase table and reordering table using the 2.08 million Chinese-to-English sentence pairs, and we use the SRILM toolkit (Stolcke, 2002) to train the 5-gram English language model with the target part of the parallel sentences and the Xinhua portion of the English Gigaword.
Phrase Pair Refinement and Parameterization
However, for the phrase-level probability, we cannot use maximum likelihood estimation since the phrase pairs are not extracted from parallel sentences .
Related Work
Munteanu and Marcu (2006) first extract the candidate parallel sentences from the comparable corpora and further extract the accurate sub-sentential bilingual fragments from the candidate parallel sentences using the in-domain probabilistic bilingual lexicon.
Related Work
Thus, finding the candidate parallel sentences is not possible in our situation.
parallel sentences is mentioned in 5 sentences in this paper.
Ma, Xuezhe and Xia, Fei
Experiments
We train our parsing model with different numbers of parallel sentences to analyze the influence of the amount of parallel data on the parsing performance of our approach.
Experiments
The parallel data sets contain 500, 1000, 2000, 5000, 10000, and 20000 parallel sentences, respectively.
Experiments
We randomly extract parallel sentences from each corpus, and smaller data sets are subsets of larger ones.
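One way to guarantee this subset property is to shuffle once and take prefixes; a minimal sketch (the sizes mirror those listed above):

```python
import random

# Sketch: shuffle once, then take prefixes so every smaller data set is a
# subset of every larger one.
def nested_subsets(corpus, sizes=(500, 1000, 2000, 5000, 10000, 20000)):
    shuffled = list(corpus)
    random.shuffle(shuffled)
    return {n: shuffled[:n] for n in sorted(sizes)}
```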
parallel sentences is mentioned in 4 sentences in this paper.
Lu, Bin and Tan, Chenhao and Cardie, Claire and K. Tsou, Benjamin
A Joint Model with Unlabeled Parallel Text
each x_i is a sentence, and x_i^1 and x_i^2 are parallel sentences.
A Joint Model with Unlabeled Parallel Text
Given the problem definition above, we now present a novel model to exploit the correspondence of parallel sentences in unlabeled bilingual text.
A Joint Model with Unlabeled Parallel Text
If we assume that parallel sentences are perfect translations, the two sentences in each pair should have the same polarity label, which gives us:
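The equation itself is not reproduced in this index, but the intuition can be sketched as forcing a single shared label per pair; the product-of-posteriors scoring below is an illustrative simplification, not the paper's exact objective:

```python
import numpy as np

# Sketch: score a shared polarity label for one parallel pair by combining
# the two monolingual classifiers' posteriors (illustrative simplification).
def joint_polarity(p_lang1, p_lang2):
    joint = p_lang1 * p_lang2   # both sentences must take the same label
    joint /= joint.sum()
    return int(np.argmax(joint)), float(joint.max())

# Posteriors over [positive, negative] from each monolingual classifier:
label, score = joint_polarity(np.array([0.7, 0.3]), np.array([0.6, 0.4]))
```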
Abstract
We rely on the intuition that the sentiment labels for parallel sentences should be similar and present a model that jointly learns improved monolingual sentiment classifiers for each language.
parallel sentences is mentioned in 4 sentences in this paper.
Hermann, Karl Moritz and Blunsom, Phil
Approach
The idea is that, given enough parallel data, a shared representation of two parallel sentences would be forced to capture the common elements between these two sentences.
Approach
What parallel sentences share, of course, are their semantics.
Approach
For every pair of parallel sentences (a, b) we sample a number of additional sentence pairs (·, n) ∈ C, where n is, with high probability, not semantically equivalent to a.
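A minimal numpy sketch of this noise-contrastive setup, assuming additive bag-of-words sentence vectors and a squared-distance hinge loss; the margin, dimensionality, and toy vocabulary are illustrative, not the paper's exact training code:

```python
import numpy as np

# Sketch: a parallel pair (a, b) should be closer in the shared space than
# (a, n) for each sampled noise sentence n, by at least a margin m.
def sentence_vec(words, emb):
    return np.sum([emb[w] for w in words], axis=0)  # additive composition

def noise_contrastive_loss(a_vec, b_vec, noise_vecs, m=1.0):
    d_pos = np.sum((a_vec - b_vec) ** 2)
    return sum(max(0.0, m + d_pos - np.sum((a_vec - n) ** 2))
               for n in noise_vecs)

emb = {w: np.random.randn(8) for w in ["the", "dog", "der", "hund", "a", "cat"]}
loss = noise_contrastive_loss(sentence_vec(["the", "dog"], emb),
                              sentence_vec(["der", "hund"], emb),
                              [sentence_vec(["a", "cat"], emb)])
```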
Experiments
The ADD+ model uses an additional 500k parallel sentences from the English-French corpus, resulting in one million English sentences, each paired up with either a German or a French sentence, with BI and BI+ trained accordingly.
parallel sentences is mentioned in 4 sentences in this paper.
Kim, Sungchul and Toutanova, Kristina and Yu, Hwanjo
Data and task
The approach uses a small amount of manually annotated article-pairs to train a document-level CRF model for parallel sentence extraction.
Data and task
This is due to two phenomena: one is that the parallel sentences sometimes contain different amounts of information and one language might use more detail than the other.
Data and task
We presented a direct semi-CRF tagging model for labeling foreign sentences in parallel sentence pairs, which outperformed projection by more than 10 F-measure points for Bulgarian and Korean.
Introduction
Here we combine elements of both Wikipedia metadata-based approaches and projection-based approaches, making use of parallel sentences extracted from Wikipedia.
parallel sentences is mentioned in 4 sentences in this paper.
Schamoni, Shigehiko and Hieber, Felix and Sokolov, Artem and Riezler, Stefan
Data
(2013), consisting of 1.8M parallel sentences from the NTCIR-7 JP-EN PatentMT subtask (Fujii et al., 2008) and 2k parallel sentences for parameter development from the NTCIR-8 test collection.
Data
For Wikipedia, we trained a DE-EN system on 4.1M parallel sentences from Europarl, Common Crawl, and News-Commentary.
Data
Parameter tuning was done on 3k parallel sentences from the WMT'11 test set.
parallel sentences is mentioned in 3 sentences in this paper.
Deng, Yonggang and Xu, Jia and Gao, Yuqing
Discussions
Ideally, a perfect combination of feature functions divides the correct and incorrect candidate phrase pairs within a parallel sentence into two ordered separate sets.
Experimental Results
The training corpus consists of 40K Chinese-English parallel sentences in the travel domain with to-
Features
We will define a confidence metric to estimate how reliably the model can align an n-gram on one side to a phrase on the other side given a parallel sentence.
parallel sentences is mentioned in 3 sentences in this paper.
Wang, Mengqiu and Che, Wanxiang and Manning, Christopher D.
Bilingual NER by Agreement
The inputs to our models are parallel sentence pairs (see Figure 1 for an example in English and
Experimental Setup
After discarding sentences with no aligned counterpart, a total of 402 documents and 8,249 parallel sentence pairs were used for evaluation.
Experimental Setup
An extra set of 5,000 unannotated parallel sentence pairs are used for
parallel sentences is mentioned in 3 sentences in this paper.
Visweswariah, Karthik and Khapra, Mitesh M. and Ramanathan, Ananthakrishnan
Generating reference reordering from parallel sentences
The main aim of our work is to improve the reordering model by using parallel sentences for which manual word alignments are not available.
Generating reference reordering from parallel sentences
In other words, we want to generate relatively clean reference reorderings from parallel sentences and use them for training a reordering model.
Generating reference reordering from parallel sentences
word alignments (H) and a much larger corpus of parallel sentences (U) that are not word aligned.
parallel sentences is mentioned in 3 sentences in this paper.
Wu, Xianchao and Matsuzaki, Takuya and Tsujii, Jun'ichi
Experiments
The JST Japanese-English paper abstract corpus (Utiyama and Isahara, 2007), which consists of one million parallel sentences, was used for training, tuning, and testing.
Introduction
However, forest-based translation systems, and, in general, most linguistically syntax-based SMT systems (Galley et al., 2004; Galley et al., 2006; Liu et al., 2006; Zhang et al., 2007; Mi et al., 2008; Liu et al., 2009; Chiang, 2010), are built upon word aligned parallel sentences and thus share a critical dependence on word alignments.
Introduction
In order to investigate this problem, we manually analyzed the alignments of the first 100 parallel sentences in our English-Japanese training data (to be shown in Table 2).
parallel sentences is mentioned in 3 sentences in this paper.
Jiang, Long and Yang, Shiquan and Zhou, Ming and Liu, Xiaohua and Zhu, Qingsheng
Experimental Results
As we mentioned in Section 2, (Shi et al., 2006) reported that in total they mined 1,069,423 pairs of English-Chinese parallel sentences from bilingual web sites.
Related Work
As far as we know, there is no publication available on mining parallel sentences directly from bilingual web pages.
Related Work
(Shi et al., 2006) mined a total of 1,069,423 pairs of English-Chinese parallel sentences.
parallel sentences is mentioned in 3 sentences in this paper.
Haghighi, Aria and Liang, Percy and Berg-Kirkpatrick, Taylor and Klein, Dan
Experimental Setup
Note that although the corpora here are derived from a parallel corpus, there are no parallel sentences.
Experimental Setup
These corpora contain no parallel sentences.
Experimental Setup
For English-Arabic, we extract a lexicon from 100k parallel sentences of UN parallel corpora by running the HMM intersected alignment model (Liang et al., 2008), adding (s, t) to the lexicon if s was aligned to t at least three times and more than any other word.
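This extraction rule is mechanical enough to sketch directly; here `aligned_pairs` is assumed to be one (s, t) tuple per alignment link produced by the aligner:

```python
from collections import Counter, defaultdict

# Sketch: add (s, t) to the lexicon if s was aligned to t at least min_count
# times and strictly more often than to any other word.
def extract_lexicon(aligned_pairs, min_count=3):
    counts = defaultdict(Counter)
    for s, t in aligned_pairs:          # one (s, t) per alignment link
        counts[s][t] += 1
    lexicon = {}
    for s, target_counts in counts.items():
        (t, c), *rest = target_counts.most_common(2)
        if c >= min_count and (not rest or c > rest[0][1]):
            lexicon[s] = t
    return lexicon
```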
parallel sentences is mentioned in 3 sentences in this paper.