Experimental setup | We test our model on three corpora of bilingual parallel sentences: English-Korean, English-Urdu, and English-Chinese.
Experimental setup | To obtain lexical alignments between the parallel sentences, we employ GIZA++ (Och and Ney, 2003).
Experimental setup | Then, for each pair of parallel sentences, we randomly sample an initial alignment tree for the two sampled trees.
Introduction | Finally, parallel sentences are assembled from these generated part-of-speech sequences and word-level alignments. |
Model | We treat the part-of-speech tag sequences of parallel sentences, as well as their
Model | Now we describe the stochastic process whereby the observed parallel sentences and their word-level alignments are generated, according to our model. |
Model | Then, each pair of word-aligned parallel sentences is generated through the following process: |
Related Work | Assuming that trees induced over parallel sentences have to exhibit certain structural regularities, Kuhn manually specifies a set of rules for determining when parsing decisions in the two languages are inconsistent with GIZA++ word-level alignments. |
Conclusion and Future Work | We enrich contexts of parallel sentence pairs with topic-related monolingual data
Experiments | These documents are built into an inverted index using Lucene2, so that they can be efficiently retrieved for the parallel sentence pairs.
Experiments | In the fine-tuning phase, for each parallel sentence pair, we randomly select ten other sentence pairs that satisfy the criterion as negative instances.
Experiments | This is not simply a coincidence, since we can interpret their approach as a special case of our neural network method: when a parallel sentence pair has
Related Work | In addition, our method directly maximizes the similarity between parallel sentence pairs, which is ideal for SMT decoding. |
Topic Similarity Model with Neural Network | Parallel sentence |
Topic Similarity Model with Neural Network | Given a parallel sentence pair (f, e), the first step is to treat f and e as queries, and use IR methods to retrieve relevant documents to enrich contextual information for them.
Topic Similarity Model with Neural Network | Therefore, in this stage, parallel sentence pairs are used to help connect the vectors from different languages because they express the same topic.
Experiments | Figure 3: Evaluation of paraphrase systems trained on different numbers of parallel sentences.
Experiments | Overall, all the trained models produce reasonable paraphrase systems, even the model trained on just 28K single parallel sentences.
Experiments | Examples of the outputs produced by the models trained on single parallel sentences and on all parallel sentences are shown in Table 2. |
Related Work | Another method for collecting monolingual paraphrase data involves aligning semantically parallel sentences from different news articles describing the same event (Shinyama et al., 2002; Barzilay and Lee, 2003; Dolan et al., 2004). |
Related Work | While utilizing multiple translations of literary work or multiple news stories of the same event can yield significant numbers of parallel sentences, these data tend to be noisy, and reliably identifying good paraphrases among all possible sentence pairs remains an open problem.
Conclusion and Future Work | In the future, we will work on leveraging parallel sentences and word alignments for other tasks in sentiment analysis, such as building multilingual sentiment lexicons. |
Cross-Lingual Mixture Model for Sentiment Classification | termining polarity classes of the parallel sentences.
Cross-Lingual Mixture Model for Sentiment Classification | Particularly, for each pair of parallel sentences u_i ∈ U, we generate the words as follows.
Cross-Lingual Mixture Model for Sentiment Classification | class label for unlabeled parallel sentences) is computed according to the following equations.
Experiment | The unlabeled parallel sentences |
Experiment | This model uses English labeled data and Chinese labeled data to obtain initial parameters for two maximum entropy classifiers (for English documents and Chinese documents), and then conducts EM iterations to update the parameters to gradually improve the agreement of the two monolingual classifiers on the unlabeled parallel sentences.
Experiment | When we have 10,000 parallel sentences, the accuracy of CLMM on the two data sets quickly increases to 68.77% and 68.91%, respectively.
Related Work | They assume parallel sentences in the corpus should have the same sentiment polarity. |
Conclusion | We show that a considerable amount of parallel sentence pairs can be crawled from microblogs and these can be used to improve Machine Translation by updating our translation tables with translations of newer terms. |
Experiments | The quality of the parallel sentence detection did not vary significantly with different setups, so we will only show the results for the best setup, which is the baseline model with span constraints. |
Experiments | From the precision and recall curves, we observe that most of the parallel data can be found at the top 30% of the filtered tweets, where 5 in 6 tweets are detected correctly as parallel, and only 1 in every 6 parallel sentences is lost. |
Experiments | If we generalize this ratio for the complete set with 1124k tweets, we can expect approximately 337k parallel sentences.
Experiments | We randomly take 200,000 parallel sentences from the UN corpus of the year 2000. |
Experiments | Then, we extract every f that cooccurs with e in a parallel sentence and add it to nbestTI(e) which gives us the list of candidate transliteration pairs candidateTI(e). |
Experiments | Algorithm 3 Estimation of transliteration probabilities, e-to-f direction
1: unfiltered data ← list of word pairs
2: filtered data ← transliteration pairs extracted using Algorithm 1
3: Train a transliteration system on the filtered data
4: for all e do
5:   nbestTI(e) ← 10 best transliterations for e according to the transliteration system
6:   cooc(e) ← set of all f that cooccur with e in a parallel sentence
7:   candidateTI(e) ← cooc(e) ∪ nbestTI(e)
8: end for
9: for all f do
10:   pmoses(f, e) ← joint transliteration probability of e and f according to the transliterator
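The candidate-generation loop of Algorithm 3 (lines 4-8) can be sketched as follows; the data layout and the function names `cooc` and `candidate_ti` are our assumptions for illustration, not the authors' implementation, and in the paper the n-best list would come from the Moses-based transliterator trained in line 3.

```python
# Sketch of Algorithm 3's candidate generation: for each source word e, the
# transliteration candidates are the union of the target words cooccurring
# with e in the parallel corpus and the system's 10-best transliterations.

def cooc(e, parallel_corpus):
    """All target words f that cooccur with e in some parallel sentence.

    parallel_corpus: iterable of (source_words, target_words) pairs.
    """
    return {f for src, tgt in parallel_corpus if e in src for f in tgt}

def candidate_ti(e, parallel_corpus, nbest_ti):
    """candidateTI(e) = cooc(e) ∪ nbestTI(e), as in lines 5-7."""
    return cooc(e, parallel_corpus) | set(nbest_ti.get(e, ()))
```

Taking the union rather than the intersection keeps recall high: a correct transliteration pair is retained whether it was observed in the corpus or only proposed by the trained system.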
Introduction | The NEWS10 data sets are extracted from Wikipedia InterLanguage Links (WIL), which consist of parallel phrases, whereas a parallel corpus consists of parallel sentences.
Models | For training Moses as a transliteration system, we treat each word pair as if it were a parallel sentence, by putting spaces between the characters of each word.
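The word-pair-to-"parallel sentence" trick described above is a one-line transformation; a minimal sketch (the function name is ours):

```python
def as_char_sentence(word):
    """Turn a word into a 'sentence' of characters by space-separating
    them, so a word-level aligner like Moses can align characters as if
    they were words."""
    return " ".join(word)

# A word pair then becomes a character-level parallel sentence pair:
src, tgt = as_char_sentence("London"), as_char_sentence("Landan")
```

Feeding such pairs to a standard phrase-based pipeline yields character-level "phrase tables", i.e. transliteration correspondences.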
Experiment | However, if most domains are similar (FBIS data set) or if there are enough parallel sentence pairs (NIST data set) in each domain, then the translation performance is almost the same even with the opposite integration orders.
Introduction | Today, more parallel sentences are drawn from divergent domains, and the size keeps growing. |
Related Work | A number of approaches have been proposed to make use of the full potential of the available parallel sentences from various domains, such as domain adaptation and incremental learning for SMT. |
Related Work | The similarity calculated by an information retrieval system between the training subset and the test set is used as a feature for each parallel sentence (Lu et al., 2007).
Related Work | Incremental learning, in which new parallel sentences are incrementally added to the training data, has been employed for SMT.
Experiments | They are neither parallel nor comparable because we cannot even extract a small number of parallel sentence pairs from this monolingual data using the method of (Munteanu and Marcu, 2006). |
Experiments | For the out-of-domain data, we build the phrase table and reordering table using the 2.08 million Chinese-to-English sentence pairs, and we use the SRILM toolkit (Stolcke, 2002) to train the 5-gram English language model with the target part of the parallel sentences and the Xinhua portion of the English Gigaword. |
Phrase Pair Refinement and Parameterization | However, for the phrase-level probability, we cannot use maximum likelihood estimation since the phrase pairs are not extracted from parallel sentences . |
Related Work | Munteanu and Marcu (2006) first extract the candidate parallel sentences from the comparable corpora and further extract the accurate sub-sentential bilingual fragments from the candidate parallel sentences using the in-domain probabilistic bilingual lexicon. |
Related Work | Thus, finding the candidate parallel sentences is not possible in our situation. |
Experiments | We train our parsing model with different numbers of parallel sentences to analyze the influence of the amount of parallel data on the parsing performance of our approach. |
Experiments | The parallel data sets contain 500, 1000, 2000, 5000, 10000 and 20000 parallel sentences, respectively.
Experiments | We randomly extract parallel sentences from each corpus, and smaller data sets are subsets of larger ones.
A Joint Model with Unlabeled Parallel Text | each x_i is a sentence, and x_i^1 and x_i^2 are parallel sentences.
A Joint Model with Unlabeled Parallel Text | Given the problem definition above, we now present a novel model to exploit the correspondence of parallel sentences in unlabeled bilingual text. |
A Joint Model with Unlabeled Parallel Text | If we assume that parallel sentences are perfect translations, the two sentences in each pair should have the same polarity label, which gives us: |
Abstract | We rely on the intuition that the sentiment labels for parallel sentences should be similar and present a model that jointly learns improved monolingual sentiment classifiers for each language. |
Approach | The idea is that, given enough parallel data, a shared representation of two parallel sentences would be forced to capture the common elements between these two sentences. |
Approach | What parallel sentences share, of course, are their semantics. |
Approach | For every pair of parallel sentences (a, b), we sample a number of additional sentence pairs (·, n) ∈ C, where n, with high probability, is not semantically equivalent to a.
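The noise-sampling step can be sketched as below; the function name and the corpus layout are our assumptions, not the authors' code. Drawing n from the target side of other pairs in C means n is, with high probability, not a translation of a.

```python
import random

def sample_noise(corpus, i, k, rng=random):
    """For the pair (a, b) = corpus[i], draw k 'noise' sentences n from
    the target side of the other pairs in the corpus; with high
    probability such an n is not semantically equivalent to a."""
    others = [n for j, (_, n) in enumerate(corpus) if j != i]
    return rng.sample(others, k)
```

With k such negatives per positive pair, a margin-based objective can then push the shared representation of (a, b) closer together than that of (a, n).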
Experiments | The ADD+ model uses an additional 500k parallel sentences from the English-French corpus, resulting in one million English sentences, each paired up with either a German or a French sentence, with BI and BI+ trained accordingly. |
Data and task | The approach uses a small amount of manually annotated article-pairs to train a document-level CRF model for parallel sentence extraction. |
Data and task | This is due to two phenomena: one is that the parallel sentences sometimes contain different amounts of information, and one language might use more detail than the other.
Data and task | We presented a direct semi-CRF tagging model for labeling foreign sentences in parallel sentence pairs, which outperformed projection by more than 10 F-measure points for Bulgarian and Korean.
Introduction | Here we combine elements of both Wikipedia metadata-based approaches and projection-based approaches, making use of parallel sentences extracted from Wikipedia. |
Data | (2013), consisting of 1.8M parallel sentences from the NTCIR-7 JP-EN PatentMT subtask (Fujii et al., 2008) and 2k parallel sentences for parameter development from the NTCIR-8 test collection.
Data | For Wikipedia, we trained a DE-EN system on 4.1M parallel sentences from Europarl, Common Crawl, and News-Commentary. |
Data | Parameter tuning was done on 3k parallel sentences from the WMT'11 test set.
Discussions | Ideally, a perfect combination of feature functions divides the correct and incorrect candidate phrase pairs within a parallel sentence into two ordered separate sets. |
Experimental Results | The training corpus consists of 40K Chinese-English parallel sentences in the travel domain with to-
Features | We will define a confidence metric to estimate how reliably the model can align an n-gram in one side to a phrase on the other side given a parallel sentence.
Bilingual NER by Agreement | The inputs to our models are parallel sentence pairs (see Figure 1 for an example in English and |
Experimental Setup | After discarding sentences with no aligned counterpart, a total of 402 documents and 8,249 parallel sentence pairs were used for evaluation. |
Experimental Setup | An extra set of 5,000 unannotated parallel sentence pairs is used for
Generating reference reordering from parallel sentences | The main aim of our work is to improve the reordering model by using parallel sentences for which manual word alignments are not available. |
Generating reference reordering from parallel sentences | In other words, we want to generate relatively clean reference reorderings from parallel sentences and use them for training a reordering model. |
Generating reference reordering from parallel sentences | word alignments (H) and a much larger corpus of parallel sentences (U) that are not word aligned. |
Experiments | The JST Japanese-English paper abstract corpus6 (Utiyama and Isahara, 2007), which consists of one million parallel sentences, was used for training, tuning, and testing.
Introduction | However, forest-based translation systems, and, in general, most linguistically syntax-based SMT systems (Galley et al., 2004; Galley et al., 2006; Liu et al., 2006; Zhang et al., 2007; Mi et al., 2008; Liu et al., 2009; Chiang, 2010), are built upon word aligned parallel sentences and thus share a critical dependence on word alignments. |
Introduction | In order to investigate this problem, we manually analyzed the alignments of the first 100 parallel sentences in our English-Japanese training data (to be shown in Table 2). |
Experimental Results | As we mentioned in Section 2, (Shi et al., 2006) reported that in total they mined 1,069,423 pairs of English-Chinese parallel sentences from bilingual web sites. |
Related Work | As far as we know, there is no publication available on mining parallel sentences directly from bilingual web pages. |
Related Work | (Shi et al., 2006) mined a total of 1,069,423 pairs of English-Chinese parallel sentences.
Experimental Setup | 7Note that although the corpora here are derived from a parallel corpus, there are no parallel sentences.
Experimental Setup | 10These corpora contain no parallel sentences . |
Experimental Setup | For English-Arabic, we extract a lexicon from 100k parallel sentences of UN parallel corpora by running the HMM intersected alignment model (Liang et al., 2008), adding (s, t) to the lexicon if s was aligned to t at least three times and more than any other word.
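The filtering rule just described (keep (s, t) only if s was aligned to t at least three times and more often than to any other word) can be sketched as follows; the input format and function name are our assumptions, not the authors' code.

```python
from collections import Counter, defaultdict

def extract_lexicon(aligned_pairs, min_count=3):
    """aligned_pairs: (s, t) word alignments collected over the corpus.
    Keep (s, t) if t is s's strictly most frequent aligned target word
    and the pair was seen at least min_count times."""
    counts = defaultdict(Counter)
    for s, t in aligned_pairs:
        counts[s][t] += 1
    lexicon = set()
    for s, tc in counts.items():
        t, n = tc.most_common(1)[0]
        # "more than any other word": strict majority over every competitor
        if n >= min_count and all(n > m for u, m in tc.items() if u != t):
            lexicon.add((s, t))
    return lexicon
```

Requiring a strict maximum (not just a frequency threshold) discards source words whose alignment counts are split evenly across several targets, which tend to be alignment noise.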