Abstractive Caption Generation | Phrase-based Model: The model outlined in equation (8) will generate captions with function words.
Abstractive Caption Generation | Search: To generate a caption it is necessary to find the sequence of words that maximizes P(w1, w2, ..., wn) for the word-based model (equation (8)) and P(p1, p2, ..., pm) for the phrase-based model (equation (15)).
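The search problem described above can be sketched with a simple beam search under a bigram factorization. This is a toy illustration of maximizing the sequence probability, not the paper's actual decoder; the function name, vocabulary handling, and `<s>` start symbol are assumptions.

```python
def best_sequence(vocab, bigram_prob, length, beam_size=5):
    # Beam search for the word sequence maximizing the product of
    # bigram probabilities; a toy stand-in for maximizing P(w1,...,wn).
    beam = sorted(((bigram_prob.get(("<s>", w), 0.0), [w]) for w in vocab),
                  reverse=True)[:beam_size]
    for _ in range(length - 1):
        expanded = [(prob * bigram_prob.get((seq[-1], w), 0.0), seq + [w])
                    for prob, seq in beam for w in vocab]
        beam = sorted(expanded, reverse=True)[:beam_size]
    return beam[0][1]  # highest-scoring sequence of the requested length
```

A wider beam trades speed for a lower risk of pruning the true maximizer, which is why beam size is typically tuned on development data.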
Experimental Setup | Documents and captions were parsed with the Stanford parser (Klein and Manning, 2003) in order to obtain dependencies for the phrase-based abstractive model. |
Experimental Setup | We tuned the caption length parameter on the development set using a range of [5, 14] tokens for the word-based model and [2, 5] phrases for the phrase-based model. |
Experimental Setup | For the phrase-based model, we also experimented with reducing the search scope, either by considering only the n most similar sentences to the keywords (range [2,10]), or simply the single most similar sentence and its neighbors (range [2, 5]). |
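Scope reduction of the kind described above can be sketched with a simple word-overlap similarity; the paper's actual similarity function is not specified here, so this ranking criterion is an assumption.

```python
def top_n_sentences(sentences, keywords, n):
    # Rank document sentences by word overlap with the keywords and
    # keep only the n most similar, pruning the search space before
    # generation. Overlap is a stand-in for the real similarity measure.
    kw = set(w.lower() for w in keywords)
    scored = sorted(sentences,
                    key=lambda s: len(kw & set(s.lower().split())),
                    reverse=True)
    return scored[:n]
```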
Results | different from phrase-based abstractive system. |
Results | Table 4: Captions written by humans (G) and generated by extractive (KL), word-based abstractive (AW), and phrase-based abstractive (AP) systems.
Results | It is significantly worse than the phrase-based abstractive system (α < 0.01), the extractive system (α < 0.01), and the gold standard (α < 0.01).
Abstract | We make use of collocation probabilities, estimated from monolingual corpora, in two ways: to improve word alignment for various kinds of SMT systems, and to improve the phrase table for phrase-based SMT.
Abstract | Compared to the baseline systems, we achieve absolute improvements of 2.40 BLEU points on a phrase-based SMT system and 1.76 BLEU points on a parsing-based SMT system.
Experiments on Phrase-Based SMT | Moses (Koehn et al., 2007) is used as the baseline phrase-based SMT system. |
Experiments on Phrase-Based SMT | 6.2 Effect of improved word alignment on phrase-based SMT |
Improving Phrase Table | A phrase-based SMT system automatically extracts bilingual phrase pairs from the word-aligned bilingual corpus.
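The extraction step mentioned above can be sketched as the standard consistency check over alignment links. This is a minimal version of Och/Koehn-style phrase extraction, without the usual handling of unaligned boundary words.

```python
def extract_phrase_pairs(src_len, alignment, max_len=3):
    # Enumerate source spans and keep the span pairs consistent with
    # the word alignment: no link may connect a word inside the pair
    # to a word outside it. alignment is a list of (src_idx, tgt_idx).
    pairs = set()
    for i1 in range(src_len):
        for i2 in range(i1, min(src_len, i1 + max_len)):
            tgt = [j for (i, j) in alignment if i1 <= i <= i2]
            if not tgt:
                continue
            j1, j2 = min(tgt), max(tgt)
            # consistency: every link touching [j1, j2] must originate
            # inside the source span [i1, i2]
            if all(i1 <= i <= i2 for (i, j) in alignment if j1 <= j <= j2):
                pairs.add(((i1, i2), (j1, j2)))
    return pairs  # set of ((src_start, src_end), (tgt_start, tgt_end))
```

For example, with the alignment {0-0, 1-2, 2-1}, the span (0, 1) is rejected because target word 1 links back to source word 2, outside the span.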
Improving Phrase Table | These collocation probabilities are incorporated into the phrase-based SMT system as features. |
Introduction | In phrase-based SMT (Koehn et al., 2003), the phrase boundary is usually determined based on the bidirectional word alignments. |
Introduction | Then the collocation information is employed to improve Bilingual Word Alignment (BWA) for various kinds of SMT systems and to improve the phrase table for phrase-based SMT.
Introduction | Then the phrase collocation probabilities are used as additional features in phrase-based SMT systems. |
A Phrase-Based Error Model | The goal of the phrase-based error model is to transform a correctly spelled query C into a misspelled query Q. |
A Phrase-Based Error Model | If we assume a uniform probability over segmentations, then the phrase-based probability can be defined as: |
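Under the uniform segmentation assumption, the transformation probability can be computed by searching over monotone phrase segmentations. The sketch below uses the common max approximation over segmentations; the phrase table, function names, and use of `lru_cache` memoization are illustrative assumptions, not the paper's implementation.

```python
from functools import lru_cache

def phrase_error_prob(c_words, q_words, phrase_prob):
    # Probability of transforming correct query C into misspelled Q,
    # maximizing over monotone phrase segmentations. phrase_prob maps
    # (correct_phrase_tuple, misspelled_phrase_tuple) -> probability.
    @lru_cache(maxsize=None)
    def best(i, j):
        # best probability of covering C[i:] and Q[j:]
        if i == len(c_words) and j == len(q_words):
            return 1.0
        score = 0.0
        for i2 in range(i + 1, len(c_words) + 1):
            for j2 in range(j + 1, len(q_words) + 1):
                p = phrase_prob.get(
                    (tuple(c_words[i:i2]), tuple(q_words[j:j2])), 0.0)
                if p > 0.0:
                    score = max(score, p * best(i2, j2))
        return score
    return best(0, 0)
```

Because multi-term phrases can score higher than the product of their single-term parts, the model rewards corrections that respect inter-term context.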
Abstract | Then, a phrase-based error model that accounts for the transformation probability between multi-term phrases is trained and integrated into a query speller system. |
Abstract | Results show that the system using the phrase-based error model significantly outperforms its baseline systems.
Introduction | Among these models, the most effective one is a phrase-based error model that captures the probability of transforming one multi-term phrase into another multi-term phrase. |
Introduction | Compared to traditional error models that account for transformation probabilities between single characters (Kernighan et al., 1990) or sub-word strings (Brill and Moore, 2000), the phrase-based model is more powerful in that it captures some contextual information by retaining inter-term dependencies.
Introduction | In particular, the speller system incorporating a phrase-based error model significantly outperforms its baseline systems. |
Related Work | To this end, inspired by the phrase-based statistical machine translation (SMT) systems (Koehn et al., 2003; Och and Ney, 2004), we propose a phrase-based error model where we assume that query spelling correction is performed at the phrase level. |
Related Work | In what follows, before presenting the phrase-based error model, we will first describe the clickthrough data and the query speller system we used in this study. |
The Baseline Speller System | Figure 2: Example demonstrating the generative procedure behind the phrase-based error model. |
Abstract | We evaluate our method on Chinese-to-English Machine Translation (MT) tasks in three baseline systems, including a phrase-based system, a hierarchical phrase-based system and a syntax-based system. |
Background | The first SMT system is a phrase-based system with two reordering models including the maximum entropy-based lexicalized reordering model proposed by Xiong et al. |
Background | The second SMT system is an in-house reimplementation of the Hiero system which is based on the hierarchical phrase-based model proposed by Chiang (2005).
Background | After 5, 7 and 8 iterations, relatively stable improvements are achieved by the phrase-based system, the Hiero system and the syntax-based system, respectively.
Introduction | Many SMT frameworks have been developed, including phrase-based SMT (Koehn et al., 2003), hierarchical phrase-based SMT (Chiang, 2005), syntax-based SMT (Eisner, 2003; Ding and Palmer, 2005; Liu et al., 2006; Galley et al., 2006; Cowan et al., 2006), etc. |
Introduction | Our experiments are conducted on Chinese-to-English translation in three state-of-the-art SMT systems, including a phrase-based system, a hierarchical phrase-based system and a syntax-based system.
Abstract | We present a novel scheme to apply factored phrase-based SMT to a language pair with very disparate morphological structures. |
Conclusions | We have presented a novel way to incorporate source syntactic structure in English-to-Turkish phrase-based machine translation by parsing the source sentences and then encoding many local and nonlocal source syntactic structures as additional complex tag factors. |
Experimental Setup and Results | We evaluated the impact of the transformations in factored phrase-based SMT with an English-Turkish data set which consists of 52712 parallel sentences. |
Experimental Setup and Results | As a baseline system, we built a standard phrase-based system, using the surface forms of the words without any transformations, and with a 3-gram LM in the decoder.
Experimental Setup and Results | Factored phrase-based SMT allows the use of multiple language models for the target side, for different factors during decoding. |
Introduction | Once these were identified as separate tokens, they were then used as “words” in a standard phrase-based framework (Koehn et al., 2003). |
Introduction | This facilitates the use of factored phrase-based translation that was not previously applicable due to the morphological complexity on the target side and mismatch between source and target morphologies. |
Introduction | We assume that the reader is familiar with the basics of phrase-based statistical machine translation (Koehn et al., 2003) and factored statistical machine translation (Koehn and Hoang, 2007). |
Related Work | Koehn (2005) applied standard phrase-based SMT to Finnish using the Europarl corpus and reported that translation to Finnish had the worst BLEU scores. |
Related Work | Yang and Kirchhoff (2006) have used phrase-based backoff models to translate unknown words by morphologically decomposing the unknown source words. |
Related Work | They used both CCG supertags and LTAG supertags in Arabic-to-English phrase-based translation and have reported about 6% relative improvement in BLEU scores.
Abstract | Several attempts have been made to learn phrase translation probabilities for phrase-based statistical machine translation that go beyond pure counting of phrases in word-aligned training data. |
Alignment | We apply our normal phrase-based decoder on the source side of the training data and constrain the translations to the corresponding target sentences from the training data. |
Conclusion | We have shown that training phrase models can improve translation performance on a state-of-the-art phrase-based translation model. |
Experimental Evaluation | The baseline system is a standard phrase-based SMT system with eight features: phrase translation and word lexicon probabilities in both translation directions, phrase penalty, word penalty, language model score and a simple distance-based reordering model. |
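Such feature sets are typically combined in a log-linear model. The sketch below shows the combination step only; the feature names, the `_prob` naming convention, and the weights are illustrative assumptions (real systems tune the weights, e.g. with MERT).

```python
import math

def loglinear_score(features, weights):
    # Weighted combination of translation features: probability-valued
    # features enter in log space, count-style features (penalties,
    # distortion) enter directly. Names and weights are illustrative.
    return sum(
        w * (math.log(features[name]) if name.endswith("_prob")
             else features[name])
        for name, w in weights.items()
    )
```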
Introduction | A phrase-based SMT system takes a source sentence and produces a translation by segmenting the sentence into phrases and translating those phrases separately (Koehn et al., 2003). |
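The segment-and-translate idea can be illustrated with a greedy monotone toy decoder; real phrase-based decoders additionally search over segmentations and reorderings and score hypotheses with a language model, none of which is modeled here.

```python
def translate_monotone(sentence, phrase_table):
    # Greedily segment the source into the longest phrases found in
    # the table and translate each phrase independently, left to right.
    words, out, i = sentence.split(), [], 0
    while i < len(words):
        for j in range(len(words), i, -1):  # longest match first
            phrase = " ".join(words[i:j])
            if phrase in phrase_table:
                out.append(phrase_table[phrase])
                i = j
                break
        else:
            out.append(words[i])  # pass unknown words through unchanged
            i += 1
    return " ".join(out)
```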
Introduction | We use a modified version of a phrase-based decoder to perform the forced alignment. |
Related Work | For the hierarchical phrase-based approach, Blunsom et al. (2008) present a discriminative rule model and show the difference between using only the Viterbi alignment in training and using the full sum over all possible derivations.
Related Work | We also include these word lexica, as they are standard components of the phrase-based system. |
Related Work | They report improvements over a phrase-based model that uses an inverse phrase model and a language model. |
Abstract | We obtain final BLEU scores of 19.35 (conditional probability model) and 19.00 (joint probability model) as compared to 14.30 for a baseline phrase-based system and 16.25 for a system which transliterates OOV words in the baseline system. |
Evaluation | Table 4: Comparing Model-1 and Model-2 with Phrase-based Systems |
Evaluation | We also used two methods to incorporate transliterations in the phrase-based system: |
Evaluation | Post-process P191: All the OOV words in the phrase-based output are replaced with their top-candidate transliteration as given by our transliteration system.
Introduction | Section 4 discusses the training data, parameter optimization and the initial set of experiments that compare our two models with a baseline Hindi-Urdu phrase-based system and with two transliteration-aided phrase-based systems in terms of BLEU scores.
Abstract | The model operates over a phrase-based representation of the source document which we obtain by merging information from PCFG parse trees and dependency graphs. |
Experimental Setup | Training We obtained phrase-based salience scores using a supervised machine learning algorithm. |
Experimental Setup | The SVM was trained with the same features used to obtain phrase-based salience scores, but with sentence-level labels (labels (1) and (2) positive, (3) negative). |
Experimental Setup | Figure 3: ROUGE-1 and ROUGE-L results for the phrase-based ILP model and two baselines, with error bars showing 95% confidence intervals.
Results | F-score is higher for the phrase-based system but not significantly. |
Results | The highlights created by the sentence ILP were considered significantly more verbose (α < 0.05) than those created by the phrase-based system and the CNN abstractors.
Results | Table 5 shows the output of the phrase-based system for the documents in Table 1. |
Abstract | Significant improvements are obtained over a state-of-the-art hierarchical phrase-based machine translation system.
Bag-of-Words Vector Space Model | In the hierarchical phrase-based translation method, the translation rules are extracted by abstracting some words from an initial phrase pair (Chiang, 2005). |
Experiments | For the baseline, we train the translation model following (Chiang, 2005; Chiang, 2007), and our decoder is Joshua, an open-source hierarchical phrase-based machine translation system written in Java.
Hierarchical phrase-based MT system | The hierarchical phrase-based translation method (Chiang, 2005; Chiang, 2007) is a formal syntax-based translation modeling method; its translation model is a weighted synchronous context free grammar (SCFG). |
Hierarchical phrase-based MT system | Empirically, this method has yielded better performance on language pairs such as Chinese-English than the phrase-based method because it permits phrases with gaps; it generalizes the normal phrase-based models in a way that allows long-distance reordering (Chiang, 2005; Chiang, 2007). |
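The reordering power of gapped rules can be shown with a toy synchronous rule application; the rule format, gap-variable naming, and function below are illustrative assumptions, not the Hiero implementation.

```python
def apply_rule(rule, fillers):
    # Apply a synchronous rule whose gap variables (X1, X2, ...) are
    # filled with already-translated sub-phrases. Swapping X1 and X2
    # between the two sides of the rule is what licenses long-distance
    # reordering in hierarchical phrase-based translation.
    src, tgt = rule
    for var, (f_src, f_tgt) in fillers.items():
        src = src.replace(var, f_src)
        tgt = tgt.replace(var, f_tgt)
    return src, tgt
```

For instance, a rule like ("X1 de X2", "X2 of X1") moves the second source phrase to the front of the target, regardless of how long each filler phrase is.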
Introduction | We chose a hierarchical phrase-based SMT system as our baseline; thus, the units involved in computation of sense similarities are hierarchical rules. |
Abstract | The pipeline of most Phrase-Based Statistical Machine Translation (PB-SMT) systems starts from an automatically word-aligned parallel corpus.
Conclusion | We have presented pseudo-word as a novel machine translational unit for phrase-based machine translation. |
Conclusion | Experimental results of Chinese-to-English translation task show that, in phrase-based machine translation model, pseudo-word performs significantly better than word in both spoken language translation domain and news domain. |
Experiments and Results | The pipeline uses GIZA++ model 4 (Brown et al., 1993; Och and Ney, 2003) for pseudo-word alignment, Moses (Koehn et al., 2007) as the phrase-based decoder, and the SRI Language Modeling Toolkit to train the language model with modified Kneser-Ney smoothing (Kneser and Ney, 1995; Chen and Goodman, 1998).
Experiments and Results | We use GIZA++ model 4 for word alignment and Moses for phrase-based decoding.
Introduction | The pipeline of most Phrase-Based Statistical Machine Translation (PB-SMT) systems starts from an automatically word-aligned parallel corpus generated by word-based models (Brown et al., 1993), proceeds with the induction of a phrase table (Koehn et al., 2003) or a synchronous grammar (Chiang, 2007), and concludes with a model-weight tuning step.
Introduction | By incorporating the syntactic annotations of parse trees from both or either side(s) of the bitext, they are believed to handle reordering better than their phrase-based counterparts.
Introduction | In contrast to conventional tree-to-tree approaches (Ding and Palmer, 2005; Quirk et al., 2005; Xiong et al., 2007; Zhang et al., 2007; Liu et al., 2009), which only make use of a single type of trees, our model is able to combine two types of trees, outperforming both phrase-based and tree-to-string systems. |
Introduction | Current tree-to-tree models (Xiong et al., 2007; Zhang et al., 2007; Liu et al., 2009) still have not outperformed the phrase-based system Moses (Koehn et al., 2007) significantly, even with the help of forests.
Related Work | This model shows a significant improvement over the state-of-the-art hierarchical phrase-based system (Chiang, 2005). |