Abstract | We propose a log-linear model to compute the paraphrase likelihood of two patterns and exploit feature functions based on maximum likelihood estimation (MLE) and lexical weighting (LW). |
Conclusion | We use a log-linear model to compute the paraphrase likelihood and exploit feature functions based on MLE and LW. |
Conclusion | In addition, the log-linear model with the proposed feature functions significantly outperforms the conventional models. |
Experiments | 4.1 Evaluation of the Log-linear Model |
Experiments | As previously mentioned, the log-linear model in this paper uses both MLE-based and LW-based feature functions.
Experiments | In this section, we evaluate the log-linear model (LL-Model) and compare it with the MLE-based model (MLE-Model) presented by Bannard and Callison-Burch (2005).
Introduction | parsing and English-foreign language word alignment, (2) aligned patterns induction, which produces English patterns along with the aligned pivot patterns in the foreign language, (3) paraphrase patterns extraction, in which paraphrase patterns are extracted based on a log-linear model. |
Introduction | Secondly, we propose a log-linear model for computing the paraphrase likelihood. |
Introduction | In addition, the log-linear model is more effective than the conventional model presented by Bannard and Callison-Burch (2005).
Proposed Method | To exploit richer information when estimating the paraphrase likelihood, we propose a log-linear model:
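Proposed Method | The equation that should follow the colon is not preserved in this excerpt; a standard log-linear formulation consistent with the surrounding description would be

    \[
    \mathrm{score}(e_1, e_2) \;=\; \sum_{i=1}^{N} \lambda_i \, h_i(e_1, e_2),
    \]

where the $h_i$ are the feature functions (here the MLE-based and LW-based scores) and the $\lambda_i$ are their weights.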
Proposed Method | In this paper, four feature functions are used in our log-linear model, which include:
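Proposed Method | The feature list following the colon is elided in this excerpt; from the abstract, the four features are presumably MLE-based and LW-based scores, each computed in both directions. A minimal Python sketch under that assumption (all names and values are hypothetical placeholders):

    import math

    # Hedged sketch: score a candidate paraphrase pair under a log-linear model.
    # The four assumed features are MLE and lexical-weighting (LW) scores in
    # both directions, per the abstract; values below are placeholders.
    def loglinear_score(feature_values, weights):
        # score(e1, e2) = sum_i lambda_i * h_i(e1, e2)
        return sum(weights[name] * h for name, h in feature_values.items())

    features = {
        "mle_fwd": math.log(0.30),  # h = log p_MLE(e2 | e1)
        "mle_bwd": math.log(0.25),  # h = log p_MLE(e1 | e2)
        "lw_fwd":  math.log(0.40),  # h = log p_LW(e2 | e1)
        "lw_bwd":  math.log(0.35),  # h = log p_LW(e1 | e2)
    }
    weights = {"mle_fwd": 1.0, "mle_bwd": 1.0, "lw_fwd": 0.5, "lw_bwd": 0.5}
    print(loglinear_score(features, weights))  # pairs above a threshold are kept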
Discriminative Synchronous Transduction | 3.1 A global log-linear model |
Discriminative Synchronous Transduction | Our log-linear translation model defines a conditional probability distribution over the target translations of a given source sentence. |
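Discriminative Synchronous Transduction | For reference, a globally normalised conditional model of this kind has the standard form (our notation, not necessarily the paper's):

    \[
    p_{\Lambda}(\mathbf{e} \mid \mathbf{f})
    = \frac{\exp \sum_k \lambda_k f_k(\mathbf{e}, \mathbf{f})}
           {\sum_{\mathbf{e}'} \exp \sum_k \lambda_k f_k(\mathbf{e}', \mathbf{f})},
    \]

where the denominator sums over all candidate translations of the source sentence $\mathbf{f}$.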
Discriminative Synchronous Transduction | Our findings echo those observed for latent variable log-linear models successfully used in monolingual parsing (Clark and Curran, 2007; Petrov et al., 2007). |
Discussion and Further Work | Such approaches have been shown to be effective in log-linear word-alignment models where only a small supervised corpus is available (Blunsom and Cohn, 2006).
Introduction | First, we develop a log-linear model of translation which is globally trained on a significant number of parallel sentences. |
A Generic Phrase Training Procedure | Note that under the log-linear model, applying a threshold for filtering is equivalent to comparing a “likelihood” ratio.
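A Generic Phrase Training Procedure | One way to see this equivalence (a sketch, assuming a binary keep/discard decision on each phrase pair $x$):

    \[
    \frac{p(\text{keep} \mid x)}{p(\text{discard} \mid x)}
    = \exp\!\Bigl(\sum_i \lambda_i \,\Delta h_i(x)\Bigr) > \tau
    \;\Longleftrightarrow\;
    \sum_i \lambda_i \,\Delta h_i(x) > \log \tau,
    \]

where $\Delta h_i(x) = h_i(x,\text{keep}) - h_i(x,\text{discard})$; thresholding the log-linear score is thus the same test as comparing the likelihood ratio against a constant.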
Conclusions | In this paper, the problem of extracting phrase translations is formulated as an information retrieval process, implemented with a log-linear model that aims to balance precision and recall.
Discussions | The generic phrase training algorithm follows an information retrieval perspective, as in Venugopal et al. (2003), but aims to improve both precision and recall with the trainable log-linear model.
Discussions | Under the general framework, one can combine arbitrarily many features under the log-linear model to evaluate the quality of a phrase and a phrase pair.
Experimental Results | Our decoder is a phrase-based multi-stack implementation of the log-linear model similar to Pharaoh (Koehn et al., 2003). |
Experimental Results | Like other log-linear model based decoders, active features in our translation engine include translation models in two directions, lexicon weights in two directions, language model, lexicalized distortion models, sentence length penalty and other heuristics. |
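Experimental Results | To make the combination concrete, a hedged sketch of such a hypothesis score follows (all feature names and weights are illustrative, not the engine's actual values):

    # Hedged sketch: a hypothesis score in a log-linear decoder is a weighted
    # sum of the active feature values listed above. Names/weights are made up.
    FEATURE_WEIGHTS = {
        "tm_fwd": 0.9, "tm_bwd": 0.8,     # translation models, two directions
        "lex_fwd": 0.3, "lex_bwd": 0.3,   # lexicon weights, two directions
        "lm": 1.0,                        # language model
        "distortion": 0.4,                # lexicalized distortion
        "length_penalty": -0.2,           # sentence length penalty
    }

    def hypothesis_score(feature_values):
        """Log-linear decoder score: sum over active features of weight * value."""
        return sum(FEATURE_WEIGHTS[name] * v for name, v in feature_values.items())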
Experimental Results | Since the translation engine implements a log-linear model, discriminative training of the decoder's feature weights should be carried out jointly with the discriminative phrase-table training, as part of the end-to-end system.
Cohesive Decoding | This count becomes a feature in the decoder’s log-linear model, the weight of which is trained with MERT. |
Experiments | Weights for the log-linear model are set using MERT, as implemented by Venugopal and Vogel (2005). |
Experiments | Since adding features to the decoder’s log-linear model is straightforward, we also experiment with a combined system that uses both the cohesion constraint and a lexical reordering model. |