Index of papers in Proc. ACL 2010 that mention
  • phrase-based
Feng, Yansong and Lapata, Mirella
Abstractive Caption Generation
Phrase-based Model The model outlined in equation (8) will generate captions with function words.
Abstractive Caption Generation
Search To generate a caption it is necessary to find the sequence of words that maximizes P(w1,w2, ...,wn) for the word-based model (equation (8)) and P(p1, p2, ..., pm) for the phrase-based model (equation (15)).
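The maximization described above can be sketched as a toy exhaustive search under a bigram language model. This is an illustration only, not the authors' decoder (which the excerpt does not specify); the vocabulary, probability table, and function name are hypothetical.

```python
from itertools import product

def best_sequence(vocab, length, bigram_prob, start="<s>"):
    """Score every word sequence of a fixed length and return the one
    maximizing the product of bigram probabilities (toy illustration)."""
    best_seq, best_p = None, 0.0
    for seq in product(vocab, repeat=length):
        p, prev = 1.0, start
        for w in seq:
            p *= bigram_prob.get((prev, w), 1e-9)  # floor for unseen bigrams
            prev = w
        if p > best_p:
            best_seq, best_p = list(seq), p
    return best_seq, best_p

# Hypothetical bigram probabilities, for illustration only
probs = {("<s>", "dog"): 0.5, ("dog", "runs"): 0.6,
         ("<s>", "cat"): 0.3, ("cat", "runs"): 0.2}
seq, p = best_sequence(["dog", "cat", "runs"], 2, probs)
```

A real system would use beam search rather than enumeration, since the search space grows as |V|^n.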
Experimental Setup
Documents and captions were parsed with the Stanford parser (Klein and Manning, 2003) in order to obtain dependencies for the phrase-based abstractive model.
Experimental Setup
We tuned the caption length parameter on the development set using a range of [5, 14] tokens for the word-based model and [2, 5] phrases for the phrase-based model.
Experimental Setup
For the phrase-based model, we also experimented with reducing the search scope, either by considering only the n most similar sentences to the keywords (range [2,10]), or simply the single most similar sentence and its neighbors (range [2, 5]).
Results
different from the phrase-based abstractive system.

Results
Table 4: Captions written by humans (G) and generated by extractive (KL), word-based abstractive (AW), and phrase-based abstractive (AP) systems.
Results
It is significantly worse than the phrase-based abstractive system (α < 0.01), the extractive system (α < 0.01), and the gold standard (α < 0.01).
phrase-based is mentioned in 15 sentences in this paper.
Topics mentioned in this paper:
Liu, Zhanyi and Wang, Haifeng and Wu, Hua and Li, Sheng
Abstract
We make use of the collocation probabilities, which are estimated from monolingual corpora, in two aspects, namely improving word alignment for various kinds of SMT systems and improving phrase table for phrase-based SMT.
Abstract
As compared to baseline systems, we achieve absolute improvements of 2.40 BLEU score on a phrase-based SMT system and 1.76 BLEU score on a parsing-based SMT system.
Experiments on Phrase-Based SMT
Moses (Koehn et al., 2007) is used as the baseline phrase-based SMT system.
Experiments on Phrase-Based SMT
6.2 Effect of improved word alignment on phrase-based SMT
Improving Phrase Table
Phrase-based SMT system automatically extracts bilingual phrase pairs from the word aligned bilingual corpus.
Improving Phrase Table
These collocation probabilities are incorporated into the phrase-based SMT system as features.
Introduction
In phrase-based SMT (Koehn et al., 2003), the phrase boundary is usually determined based on the bidirectional word alignments.
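One common way to derive high-precision alignment points from bidirectional word alignments is to intersect the two directional alignments; a minimal sketch follows (the function name and toy alignments are assumptions, not from the paper, and real symmetrization heuristics such as grow-diag-final then expand this seed set).

```python
def intersect_alignments(src_to_tgt, tgt_to_src):
    """Keep only alignment points present in both directions; these
    high-precision links anchor phrase-pair extraction."""
    forward = set(src_to_tgt)
    # Flip the reverse-direction points into (src, tgt) order
    backward = {(i, j) for (j, i) in tgt_to_src}
    return forward & backward

# Toy directional alignments: (src_index, tgt_index) and (tgt_index, src_index)
f = [(0, 0), (1, 2), (2, 1)]
b = [(0, 0), (2, 1), (1, 1)]
anchors = intersect_alignments(f, b)
```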
Introduction
Then the collocation information is employed to improve Bilingual Word Alignment (BWA) for various kinds of SMT systems and to improve phrase table for phrase-based SMT.
Introduction
Then the phrase collocation probabilities are used as additional features in phrase-based SMT systems.
phrase-based is mentioned in 19 sentences in this paper.
Topics mentioned in this paper:
Sun, Xu and Gao, Jianfeng and Micol, Daniel and Quirk, Chris
A Phrase-Based Error Model
The goal of the phrase-based error model is to transform a correctly spelled query C into a misspelled query Q.
A Phrase-Based Error Model
If we assume a uniform probability over segmentations, then the phrase-based probability can be defined as:
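The equation itself is missing from this excerpt, but the quantity it defines can be sketched: under a uniform segmentation prior, the model sums, over monotone segmentations of C and Q into phrases, the product of phrase transformation probabilities. The code below is a hedged reconstruction with hypothetical names and table entries, not the paper's exact formulation.

```python
from functools import lru_cache

def phrase_error_prob(c_words, q_words, phrase_prob, max_len=3):
    """Sum over monotone phrase segmentations of the product of phrase
    transformation probabilities (the constant uniform segmentation
    prior is dropped)."""
    n, m = len(c_words), len(q_words)

    @lru_cache(maxsize=None)
    def rec(i, j):
        if i == n and j == m:
            return 1.0
        total = 0.0
        for di in range(1, min(max_len, n - i) + 1):
            for dj in range(1, min(max_len, m - j) + 1):
                src = " ".join(c_words[i:i + di])
                tgt = " ".join(q_words[j:j + dj])
                total += phrase_prob.get((src, tgt), 0.0) * rec(i + di, j + dj)
        return total

    return rec(0, 0)

# Hypothetical phrase transformation table
table = {("new", "new"): 0.9, ("york", "yrok"): 0.2,
         ("new york", "new yrok"): 0.1}
prob = phrase_error_prob(["new", "york"], ["new", "yrok"], table)
```

Here the multi-term entry ("new york" → "new yrok") contributes alongside the word-by-word segmentation, which is exactly the contextual information a phrase-based error model adds.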
Abstract
Then, a phrase-based error model that accounts for the transformation probability between multi-term phrases is trained and integrated into a query speller system.
Abstract
Results show that the system using the phrase-based error model significantly outperforms its baseline systems.
Introduction
Among these models, the most effective one is a phrase-based error model that captures the probability of transforming one multi-term phrase into another multi-term phrase.
Introduction
Compared to traditional error models that account for transformation probabilities between single characters (Kernighan et al., 1990) or sub-word strings (Brill and Moore, 2000), the phrase-based model is more powerful in that it captures some contextual information by retaining inter-term dependencies.
Introduction
In particular, the speller system incorporating a phrase-based error model significantly outperforms its baseline systems.
Related Work
To this end, inspired by the phrase-based statistical machine translation (SMT) systems (Koehn et al., 2003; Och and Ney, 2004), we propose a phrase-based error model where we assume that query spelling correction is performed at the phrase level.
Related Work
In what follows, before presenting the phrase-based error model, we will first describe the clickthrough data and the query speller system we used in this study.
The Baseline Speller System
Figure 2: Example demonstrating the generative procedure behind the phrase-based error model.
phrase-based is mentioned in 15 sentences in this paper.
Topics mentioned in this paper:
Xiao, Tong and Zhu, Jingbo and Zhu, Muhua and Wang, Huizhen
Abstract
We evaluate our method on Chinese-to-English Machine Translation (MT) tasks in three baseline systems, including a phrase-based system, a hierarchical phrase-based system and a syntax-based system.
Background
The first SMT system is a phrase-based system with two reordering models including the maximum entropy-based lexicalized reordering model proposed by Xiong et al.
Background
The second SMT system is an in-house reim-plementation of the Hiero system which is based on the hierarchical phrase-based model proposed by Chiang (2005).
Background
After 5, 7 and 8 iterations, relatively stable improvements are achieved by the phrase-based system, the Hiero system and the syntax-based system, respectively.
Introduction
Many SMT frameworks have been developed, including phrase-based SMT (Koehn et al., 2003), hierarchical phrase-based SMT (Chiang, 2005), syntax-based SMT (Eisner, 2003; Ding and Palmer, 2005; Liu et al., 2006; Galley et al., 2006; Cowan et al., 2006), etc.
Introduction
Our experiments are conducted on Chinese-to-English translation in three state-of-the-art SMT systems, including a phrase-based system, a hierarchical phrase-based system and a syntax-based system.
phrase-based is mentioned in 19 sentences in this paper.
Topics mentioned in this paper:
Yeniterzi, Reyyan and Oflazer, Kemal
Abstract
We present a novel scheme to apply factored phrase-based SMT to a language pair with very disparate morphological structures.
Conclusions
We have presented a novel way to incorporate source syntactic structure in English-to-Turkish phrase-based machine translation by parsing the source sentences and then encoding many local and nonlocal source syntactic structures as additional complex tag factors.
Experimental Setup and Results
We evaluated the impact of the transformations in factored phrase-based SMT with an English-Turkish data set which consists of 52712 parallel sentences.
Experimental Setup and Results
As a baseline system, we built a standard phrase-based system, using the surface forms of the words without any transformations, and with a 3—gram LM in the decoder.
Experimental Setup and Results
Factored phrase-based SMT allows the use of multiple language models for the target side, for different factors during decoding.
Introduction
Once these were identified as separate tokens, they were then used as “words” in a standard phrase-based framework (Koehn et al., 2003).
Introduction
This facilitates the use of factored phrase-based translation that was not previously applicable due to the morphological complexity on the target side and mismatch between source and target morphologies.
Introduction
We assume that the reader is familiar with the basics of phrase-based statistical machine translation (Koehn et al., 2003) and factored statistical machine translation (Koehn and Hoang, 2007).
Related Work
Koehn (2005) applied standard phrase-based SMT to Finnish using the Europarl corpus and reported that translation to Finnish had the worst BLEU scores.
Related Work
Yang and Kirchhoff (2006) have used phrase-based backoff models to translate unknown words by morphologically decomposing the unknown source words.
Related Work
They used both CCG supertags and LTAG supertags in Arabic-to-English phrase-based translation and have reported about 6% relative improvement in BLEU scores.
phrase-based is mentioned in 11 sentences in this paper.
Topics mentioned in this paper:
Wuebker, Joern and Mauser, Arne and Ney, Hermann
Abstract
Several attempts have been made to learn phrase translation probabilities for phrase-based statistical machine translation that go beyond pure counting of phrases in word-aligned training data.
Alignment
We apply our normal phrase-based decoder on the source side of the training data and constrain the translations to the corresponding target sentences from the training data.
Conclusion
We have shown that training phrase models can improve translation performance on a state-of-the-art phrase-based translation model.
Experimental Evaluation
The baseline system is a standard phrase-based SMT system with eight features: phrase translation and word lexicon probabilities in both translation directions, phrase penalty, word penalty, language model score and a simple distance-based reordering model.
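Such feature sets are typically combined in a log-linear model, where the decoder score is a weighted sum of log feature values. A minimal sketch, with hypothetical feature names, values, and weights (the paper does not publish these numbers):

```python
def model_score(feature_values, weights):
    """Log-linear model score: weighted sum of log-domain feature values."""
    return sum(weights[name] * v for name, v in feature_values.items())

# Hypothetical log feature values and tuned weights, for illustration only
feats = {"phrase_tm": -2.3, "lex_tm": -1.1, "lm": -4.0, "word_penalty": -3.0}
w = {"phrase_tm": 1.0, "lex_tm": 0.5, "lm": 0.8, "word_penalty": -0.2}
score = model_score(feats, w)
```

In practice the weights would be tuned on a development set (e.g. with MERT), not set by hand.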
Introduction
A phrase-based SMT system takes a source sentence and produces a translation by segmenting the sentence into phrases and translating those phrases separately (Koehn et al., 2003).
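The segment-and-translate idea can be sketched as a greedy monotone decoder over a phrase table. This is a deliberately simplified illustration (no reordering, no language model, no search); the phrase table and function name are assumptions, not from the paper.

```python
def translate_monotone(words, phrase_table, max_len=3):
    """Greedily segment the source into the longest phrases found in
    the table and translate each phrase independently, left to right."""
    out, i = [], 0
    while i < len(words):
        # Prefer the longest matching source phrase starting at position i
        for length in range(min(max_len, len(words) - i), 0, -1):
            src = " ".join(words[i:i + length])
            if src in phrase_table:
                out.append(phrase_table[src])
                i += length
                break
        else:
            out.append(words[i])  # pass unknown words through unchanged
            i += 1
    return " ".join(out)

# Hypothetical phrase table, for illustration only
pt = {"guten morgen": "good morning", "welt": "world"}
result = translate_monotone(["guten", "morgen", "welt"], pt)
```

A real decoder such as Moses scores many segmentations and reorderings with beam search rather than committing greedily.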
Introduction
We use a modified version of a phrase-based decoder to perform the forced alignment.
Related Work
For the hierarchical phrase-based approach, (Blunsom et al., 2008) present a discriminative rule model and show the difference between using only the viterbi alignment in training and using the full sum over all possible derivations.
Related Work
We also include these word lexica, as they are standard components of the phrase-based system.
Related Work
They report improvements over a phrase-based model that uses an inverse phrase model and a language model.
phrase-based is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Durrani, Nadir and Sajjad, Hassan and Fraser, Alexander and Schmid, Helmut
Abstract
We obtain final BLEU scores of 19.35 (conditional probability model) and 19.00 (joint probability model) as compared to 14.30 for a baseline phrase-based system and 16.25 for a system which transliterates OOV words in the baseline system.
Evaluation
Table 4: Comparing Model-1 and Model-2 with Phrase-based Systems
Evaluation
We also used two methods to incorporate transliterations in the phrase-based system:
Evaluation
Post-process P191: All the OOV words in the phrase-based output are replaced with their top-candidate transliteration as given by our transliteration system.
Introduction
Section 4 discusses the training data, parameter optimization and the initial set of experiments that compare our two models with a baseline Hindi-Urdu phrase-based system and with two transliteration-aided phrase-based systems in terms of BLEU scores.
phrase-based is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Woodsend, Kristian and Lapata, Mirella
Abstract
The model operates over a phrase-based representation of the source document which we obtain by merging information from PCFG parse trees and dependency graphs.
Experimental Setup
Training We obtained phrase-based salience scores using a supervised machine learning algorithm.
Experimental Setup
The SVM was trained with the same features used to obtain phrase-based salience scores, but with sentence-level labels (labels (1) and (2) positive, (3) negative).
Experimental Setup
Figure 3: ROUGE-1 and ROUGE-L results for phrase-based ILP model and two baselines, with error bars showing 95% confidence levels.
Results
F-score is higher for the phrase-based system but not significantly.
Results
The highlights created by the sentence ILP were considered significantly more verbose (α < 0.05) than those created by the phrase-based system and the CNN abstractors.
Results
Table 5 shows the output of the phrase-based system for the documents in Table 1.
phrase-based is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Chen, Boxing and Foster, George and Kuhn, Roland
Abstract
Significant improvements are obtained over a state-of-the-art hierarchical phrase-based machine translation system.
Bag-of-Words Vector Space Model
In the hierarchical phrase-based translation method, the translation rules are extracted by abstracting some words from an initial phrase pair (Chiang, 2005).
Experiments
For the baseline, we train the translation model by following (Chiang, 2005; Chiang, 2007) and our decoder is Joshua, an open-source hierarchical phrase-based machine translation system written in Java.
Hierarchical phrase-based MT system
The hierarchical phrase-based translation method (Chiang, 2005; Chiang, 2007) is a formal syntax-based translation modeling method; its translation model is a weighted synchronous context free grammar (SCFG).
Hierarchical phrase-based MT system
Empirically, this method has yielded better performance on language pairs such as Chinese-English than the phrase-based method because it permits phrases with gaps; it generalizes the normal phrase-based models in a way that allows long-distance reordering (Chiang, 2005; Chiang, 2007).
Introduction
We chose a hierarchical phrase-based SMT system as our baseline; thus, the units involved in computation of sense similarities are hierarchical rules.
phrase-based is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Duan, Xiangyu and Zhang, Min and Li, Haizhou
Abstract
The pipeline of most Phrase-Based Statistical Machine Translation (PB-SMT) systems starts from automatically word aligned parallel corpus.
Conclusion
We have presented pseudo-word as a novel machine translational unit for phrase-based machine translation.
Conclusion
Experimental results of Chinese-to-English translation task show that, in phrase-based machine translation model, pseudo-word performs significantly better than word in both spoken language translation domain and news domain.
Experiments and Results
The pipeline uses GIZA++ model 4 (Brown et al., 1993; Och and Ney, 2003) for pseudo-word alignment, Moses (Koehn et al., 2007) as the phrase-based decoder, and the SRI Language Modeling Toolkit to train a language model with modified Kneser-Ney smoothing (Kneser and Ney, 1995; Chen and Goodman, 1998).
Experiments and Results
We use GIZA++ model 4 for word alignment and Moses for phrase-based decoding.
Introduction
The pipeline of most Phrase-Based Statistical Machine Translation (PB-SMT) systems starts from automatically word aligned parallel corpus generated from word-based models (Brown et al., 1993), proceeds with step of induction of phrase table (Koehn et al., 2003) or synchronous grammar (Chiang, 2007) and with model weights tuning step.
phrase-based is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Mi, Haitao and Liu, Qun
Introduction
By incorporating the syntactic annotations of parse trees from both or either side(s) of the bitext, they are believed to be better than phrase-based counterparts in reordering.
Introduction
In contrast to conventional tree-to-tree approaches (Ding and Palmer, 2005; Quirk et al., 2005; Xiong et al., 2007; Zhang et al., 2007; Liu et al., 2009), which only make use of a single type of trees, our model is able to combine two types of trees, outperforming both phrase-based and tree-to-string systems.
Introduction
Current tree-to-tree models (Xiong et al., 2007; Zhang et al., 2007; Liu et al., 2009) still have not outperformed the phrase-based system Moses (Koehn et al., 2007) significantly, even with the help of forests.
Related Work
This model shows a significant improvement over the state-of-the-art hierarchical phrase-based system (Chiang, 2005).
phrase-based is mentioned in 4 sentences in this paper.
Topics mentioned in this paper: