A Sense-Based Translation Model for Statistical Machine Translation
Xiong, Deyi and Zhang, Min

Article Structure

Abstract

The sense in which a word is used determines the translation of the word.

Introduction

A very common phenomenon in language is that many words have multiple meanings.

WSI-Based Broad-Coverage Sense Tagger

In order to obtain word senses for arbitrary source words, we build a broad-coverage sense tagger that relies on nonparametric Bayesian word sense induction.

Sense-Based Translation Model

In this section we present our sense-based translation model and describe the features that we use as well as the training process of this model.

Decoding with Sense-Based Translation Model

The sense-based translation model described above is integrated into the log-linear translation model of SMT as a sense-based knowledge source.
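The log-linear combination can be sketched in a few lines; the feature names, scores, and weights below are purely illustrative (in the paper the weights are tuned with MERT), with the sense-based model contributing one additional feature score:

```python
# Hypothetical log feature scores h_i for one translation hypothesis;
# "sense" is the score from the sense-based translation model.
features = {"tm": -4.2, "lm": -6.1, "reorder": -1.3, "sense": -2.7}

# Feature weights lambda_i, normally tuned with MERT on a development
# set; the values here are illustrative only.
weights = {"tm": 1.0, "lm": 0.7, "reorder": 0.4, "sense": 0.5}

def loglinear_score(features, weights):
    """Log-linear model: score(e, f) = sum_i lambda_i * h_i(e, f)."""
    return sum(weights[k] * features[k] for k in features)

score = loglinear_score(features, weights)
```

The decoder simply ranks hypotheses by this weighted sum, so adding the sense-based model requires only one more term.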

Experiments

In this section, we report a series of experiments on Chinese-to-English translation using large-scale bilingual training data.

Related Work

In this section we introduce previous studies that are related to our work.

Conclusion

We have presented a sense-based translation model that integrates word senses into machine translation.

Topics

word senses

Appears in 55 sentences as: Word Sense (3) Word sense (1) word sense (13) Word Senses (1) Word senses (1) word senses (44)
In A Sense-Based Translation Model for Statistical Machine Translation
  1. In this paper, we propose a sense-based translation model to integrate word senses into statistical machine translation.
    Page 1, “Abstract”
  2. Our method is significantly different from previous word sense disambiguation reformulated for machine translation in that the latter neglects word senses in nature.
    Page 1, “Abstract”
  3. Results show that the proposed model substantially outperforms not only the baseline but also the previous reformulated word sense disambiguation.
    Page 1, “Abstract”
  4. Therefore a natural assumption is that word sense disambiguation (WSD) may contribute to statistical machine translation (SMT) by providing appropriate word senses for target translation selection with context features (Carpuat and Wu, 2005).
    Page 1, “Introduction”
  5. Carpuat and Wu (2005) adopt a standard formulation of WSD: predicting word senses that are defined on an ontology for ambiguous words.
    Page 1, “Introduction”
  6. As they apply WSD to Chinese-to-English translation, they predict word senses from a Chinese ontology HowNet and project the predicted senses to English glosses provided by HowNet.
    Page 1, “Introduction”
  7. Although this reformulated WSD has proved helpful for SMT, one question is not answered yet: are pure word senses useful for SMT?
    Page 1, “Introduction”
  8. The early WSD for SMT (Carpuat and Wu, 2005) uses projected word senses while the reformulated WSD sidesteps word senses.
    Page 1, “Introduction”
  9. In this paper we would like to reinvestigate this question by resorting to word sense induction (WSI), which is related to but different from WSD. We use
    Page 1, “Introduction”
  10. WSI to obtain word senses for large-scale data.
    Page 2, “Introduction”
  11. With these word senses, we study in particular: 1) whether word senses can be directly integrated into SMT to improve translation quality and 2) whether the WSI-based model can outperform the reformulated WSD in the context of SMT.
    Page 2, “Introduction”

See all papers in Proc. ACL 2014 that mention word senses.

See all papers in Proc. ACL that mention word senses.


translation model

Appears in 46 sentences as: Translation Model (1) translation model (46) translation models (1)
In A Sense-Based Translation Model for Statistical Machine Translation
  1. In this paper, we propose a sense-based translation model to integrate word senses into statistical machine translation.
    Page 1, “Abstract”
  2. The proposed sense-based translation model enables the decoder to select appropriate translations for source words according to the inferred senses for these words using maximum entropy classifiers.
    Page 1, “Abstract”
  3. We test the effectiveness of the proposed sense-based translation model on a large-scale Chinese-to-English translation task.
    Page 1, “Abstract”
  4. These glosses, used as the sense predictions of their WSD system, are integrated into a word-based SMT system either to substitute for translation candidates of their translation model or to postedit the output of their SMT system.
    Page 1, “Introduction”
  5. In order to incorporate word senses into SMT, we propose a sense-based translation model that is built on maximum entropy classifiers.
    Page 2, “Introduction”
  6. We collect training instances from the sense-tagged training data to train the proposed sense-based translation model.
    Page 2, “Introduction”
  7. Instead of using word senses defined by a prespecified sense inventory as the standard WSD does, we incorporate word senses that are automatically learned from data into our sense-based translation model.
    Page 2, “Introduction”
  8. We integrate the proposed sense-based translation model into a state-of-the-art SMT system and conduct experiments on Chinese-to-English translation using large-scale training data.
    Page 2, “Introduction”
  9. Results show that automatically learned word senses are able to improve translation quality and the sense-based translation model is better than the previous reformulated WSD.
    Page 2, “Introduction”
  10. Section 3 presents our sense-based translation model.
    Page 2, “Introduction”
  11. Section 4 describes how we integrate the sense-based translation model into SMT.
    Page 2, “Introduction”


topic model

Appears in 17 sentences as: Topic model (1) topic model (8) topic modeling (7) topic models (2)
In A Sense-Based Translation Model for Statistical Machine Translation
  1. We build a broad-coverage sense tagger based on a nonparametric Bayesian topic model that automatically learns sense clusters for words in the source language.
    Page 1, “Abstract”
  2. We use a nonparametric Bayesian topic model based WSI to infer word senses for source words in our training, development and test set.
    Page 2, “Introduction”
  3. Recently, we have also witnessed that WSI is cast as a topic modeling problem where the sense clusters of a word type are considered as underlying topics (Brody and Lapata, 2009; Yao and Durme, 2011; Lau et al., 2012).
    Page 2, “WSI-Based Broad-Coverage Sense Tagger”
  4. We follow this line to tailor a topic modeling framework to induce word senses for our large-scale training data.
    Page 2, “WSI-Based Broad-Coverage Sense Tagger”
  5. We can induce topics on this corpus for each pseudo document via topic modeling approaches.
    Page 3, “WSI-Based Broad-Coverage Sense Tagger”
  6. For ease of comparison, we roughly divide them into 4 categories: 1) WSD for SMT, 2) topic-based WSI, 3) topic model for SMT and 4) lexical selection.
    Page 8, “Related Work”
  7. Brody and Lapata (2009)’s work is the first attempt to approach WSI via topic modeling.
    Page 8, “Related Work”
  8. They adapt LDA to word sense induction by building one topic model per word type.
    Page 8, “Related Work”
  9. According to them, there are 3 significant differences between topic-based WSI and generic topic modeling .
    Page 8, “Related Work”
  10. However, generic topic models aim at topic distributions of documents.
    Page 8, “Related Work”
  11. Second, generic topic modeling explores whole documents for topic inference while topic-based WSI uses much smaller units in a document (e.g., surrounding words of a target word) for word sense induction.
    Page 8, “Related Work”


MaxEnt

Appears in 14 sentences as: MaXEnt (1) MaxEnt (16) Maxent (1)
In A Sense-Based Translation Model for Statistical Machine Translation
  1. A maximum entropy (MaxEnt) based classifier is used to predict the translation probability p(e|C(c)).
    Page 4, “Sense-Based Translation Model”
  2. The MaxEnt classifier can be formulated as follows.
    Page 4, “Sense-Based Translation Model”
  3. This is not an issue for the MaxEnt classifier as it can deal with arbitrary overlapping features (Berger et al., 1996).
    Page 4, “Sense-Based Translation Model”
  4. where W is a set of words for which we build MaxEnt classifiers (see the next subsection for the discussion on how we build MaxEnt classifiers for our sense-based translation model).
    Page 5, “Sense-Based Translation Model”
  5. We can either build an all-in-one MaxEnt classifier that integrates all source word types c and their possible target translations e or build multiple MaxEnt classifiers.
    Page 5, “Sense-Based Translation Model”
  6. Therefore we take the second strategy: building multiple MaxEnt classifiers with one classifier per source word type.
    Page 5, “Sense-Based Translation Model”
  7. In practice, we do not build MaxEnt classifiers for source words that occur less than 10 times in the training data and run the MaxEnt toolkit in a parallel manner in order to expedite the training process.
    Page 5, “Sense-Based Translation Model”
  8. MaxEnt classifiers
    Page 5, “Decoding with Sense-Based Translation Model”
  9. Once we get sense clusters for word tokens in test sentences, we load pre-trained MaxEnt classifiers of the corresponding word types.
    Page 5, “Decoding with Sense-Based Translation Model”
  10. We trained our MaxEnt classifiers with the off-the-shelf MaxEnt tool. We performed 100 iterations of the L-BFGS algorithm implemented in the training toolkit on the collected training events from the sense-annotated data as described in Section 3.2.
    Page 6, “Experiments”
  11. It took an average of 57.5 seconds for training a MaxEnt classifier.
    Page 6, “Experiments”
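A minimal sketch of the MaxEnt formulation p(e|C(c)) behind the per-word-type classifiers. The Chinese word type, candidate translations, and feature weights below are hypothetical; the paper trains its classifiers with an off-the-shelf MaxEnt toolkit rather than hand-set weights:

```python
import math

def maxent_prob(context_feats, target, weights, targets):
    """p(e | C(c)) = exp(sum_i lambda_i f_i(C(c), e)) / Z, with binary
    feature functions f_i pairing a context element with a candidate
    target translation e."""
    def score(e):
        return sum(weights.get((f, e), 0.0) for f in context_feats)
    z = sum(math.exp(score(e)) for e in targets)
    return math.exp(score(target)) / z

# Hypothetical classifier for one Chinese word type: weights pair
# context words and an induced sense cluster id with candidate
# English translations.
weights = {
    ("basketball", "play"): 2.0,
    ("phone", "make"): 2.0,
    ("sense:s1", "play"): 1.5,
}
targets = ["play", "make", "hit"]
p = maxent_prob(["basketball", "sense:s1"], "play", weights, targets)
```

With the sense cluster acting as just another feature, the classifier form stays identical whether or not sense information is included.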


machine translation

Appears in 12 sentences as: Machine Translation (1) machine translation (11)
In A Sense-Based Translation Model for Statistical Machine Translation
  1. In this paper, we propose a sense-based translation model to integrate word senses into statistical machine translation.
    Page 1, “Abstract”
  2. Our method is significantly different from previous word sense disambiguation reformulated for machine translation in that the latter neglects word senses in nature.
    Page 1, “Abstract”
  3. In the context of machine translation, such different meanings normally produce different target translations.
    Page 1, “Introduction”
  4. Therefore a natural assumption is that word sense disambiguation (WSD) may contribute to statistical machine translation (SMT) by providing appropriate word senses for target translation selection with context features (Carpuat and Wu, 2005).
    Page 1, “Introduction”
  6. We want to extend this hypothesis to machine translation by building a sense-based translation model upon the HDP-based word sense induction: words with the same meanings tend to be translated in the same way.
    Page 3, “WSI-Based Broad-Coverage Sense Tagger”
  7. This suggests that automatically induced word senses alone are indeed useful for machine translation.
    Page 7, “Experiments”
  8. Xiong and Zhang (2013) employ a sentence-level topic model to capture coherence for document-level machine translation.
    Page 9, “Related Work”
  9. The difference between our work and these previous studies on topic model for SMT lies in that we adopt topic-based WSI to obtain word senses rather than generic topics and integrate induced word senses into machine translation.
    Page 9, “Related Work”
  10. We have presented a sense-based translation model that integrates word senses into machine translation .
    Page 9, “Conclusion”
  11. Word senses automatically induced by the HDP-based WSI on large-scale training data are very useful for machine translation.
    Page 9, “Conclusion”


translation quality

Appears in 9 sentences as: translation quality (9)
In A Sense-Based Translation Model for Statistical Machine Translation
  1. They report that WSD degrades the translation quality of SMT.
    Page 1, “Introduction”
  2. With these word senses, we study in particular: 1) whether word senses can be directly integrated into SMT to improve translation quality and 2) whether the WSI-based model can outperform the reformulated WSD in the context of SMT.
    Page 2, “Introduction”
  3. Results show that automatically learned word senses are able to improve translation quality and the sense-based translation model is better than the previous reformulated WSD.
    Page 2, “Introduction”
  4. Do word senses automatically induced by the HDP-based WSI improve translation quality?
    Page 6, “Experiments”
  5. We evaluated translation quality with the case-insensitive BLEU-4 (Papineni et al., 2002) and NIST (Doddington, 2002).
    Page 6, “Experiments”
  6. Our second group of experiments were carried out to investigate whether the sense-based translation model is able to improve translation quality by comparing the system enhanced with our sense-based translation model against the baseline.
    Page 7, “Experiments”
  7. Our experiments show that such word senses are able to improve translation quality.
    Page 8, “Related Work”
  8. The sense-based translation model is able to substantially improve translation quality in terms of both BLEU and NIST.
    Page 9, “Conclusion”
  9. To the best of our knowledge, this is the first attempt to empirically verify the positive impact of word senses on translation quality.
    Page 9, “Conclusion”


BLEU

Appears in 7 sentences as: BLEU (7)
In A Sense-Based Translation Model for Statistical Machine Translation
  1. System       BLEU (%)   NIST
     STM (±5w)    34.64      9.4346
     STM (±10w)   34.76      9.5114
     STM (±15w)   -          -
    Page 7, “Experiments”
  2. System                BLEU (%)   NIST
     Base                  33.53      9.0561
     STM (sense)           34.15      9.2596
     STM (sense+lexicon)   34.73      9.4184
    Page 7, “Experiments”
  3. System             BLEU (%)   NIST
     Base               33.53      9.0561
     Reformulated WSD   34.16      9.3820
     STM                34.73      9.4184
    Page 7, “Experiments”
  4. Our sense-based translation model achieves a substantial improvement of 1.2 BLEU points over the baseline.
    Page 7, “Experiments”
  5. If we only integrate sense features into the sense-based translation model, we can still outperform the baseline by 0.62 BLEU points.
    Page 7, “Experiments”
  6. From the table, we can find that the sense-based translation model outperforms the reformulated WSD by 0.57 BLEU points.
    Page 7, “Experiments”
  7. The sense-based translation model is able to substantially improve translation quality in terms of both BLEU and NIST.
    Page 9, “Conclusion”
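For readers unfamiliar with the metric, a simplified single-reference BLEU-4 can be computed as below. The paper reports case-insensitive corpus-level BLEU-4 with multiple references (Papineni et al., 2002), so this sketch (with add-one smoothing, an assumption not in the original) is illustrative only:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu4(hypothesis, reference):
    """Case-insensitive BLEU-4 for one hypothesis/reference pair:
    geometric mean of modified 1..4-gram precisions times a brevity
    penalty, with add-one smoothing so a zero count does not zero
    the score."""
    hyp = hypothesis.lower().split()
    ref = reference.lower().split()
    log_prec = 0.0
    for n in range(1, 5):
        h, r = ngrams(hyp, n), ngrams(ref, n)
        match = sum(min(c, r[g]) for g, c in h.items())  # clipped counts
        total = max(sum(h.values()), 1)
        log_prec += math.log((match + 1) / (total + 1)) / 4
    bp = min(1.0, math.exp(1 - len(ref) / max(len(hyp), 1)))
    return bp * math.exp(log_prec)
```

An identical hypothesis and reference score 1.0; shorter or divergent hypotheses are penalized by the brevity penalty and the clipped n-gram precisions.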


NIST

Appears in 6 sentences as: NIST (7)
In A Sense-Based Translation Model for Statistical Machine Translation
  1. We used the NIST MT03 evaluation test data as our development set, and the NIST MT05 as the test set.
    Page 6, “Experiments”
  2. We evaluated translation quality with the case-insensitive BLEU-4 (Papineni et al., 2002) and NIST (Doddington, 2002).
    Page 6, “Experiments”
  3. System       BLEU (%)   NIST
     STM (±5w)    34.64      9.4346
     STM (±10w)   34.76      9.5114
     STM (±15w)   -          -
    Page 7, “Experiments”
  4. System                BLEU (%)   NIST
     Base                  33.53      9.0561
     STM (sense)           34.15      9.2596
     STM (sense+lexicon)   34.73      9.4184
    Page 7, “Experiments”
  5. System             BLEU (%)   NIST
     Base               33.53      9.0561
     Reformulated WSD   34.16      9.3820
     STM                34.73      9.4184
    Page 7, “Experiments”
  6. The sense-based translation model is able to substantially improve translation quality in terms of both BLEU and NIST.
    Page 9, “Conclusion”


LDA

Appears in 6 sentences as: LDA (6)
In A Sense-Based Translation Model for Statistical Machine Translation
  1. We first describe WSI, especially WSI based on the Hierarchical Dirichlet Process (HDP) (Teh et al., 2004), a nonparametric version of Latent Dirichlet Allocation (LDA) (Blei et al., 2003).
    Page 2, “WSI-Based Broad-Coverage Sense Tagger”
  2. The conventional topic distribution θ_j for the j-th pseudo document is taken as the distribution over senses for the given word type w. The LDA generative process for sense induction is as follows: 1) for each pseudo document D_j, draw a per-document sense distribution θ_j from a Dirichlet distribution Dir(α); 2) for each item w_{j,i} in the pseudo document D_j, 2.1) draw a sense cluster s_{j,i} ~ Multinomial(θ_j); and 2.2) draw a word w_{j,i} ~ φ_{s_{j,i}}, where φ_{s_{j,i}} is the distribution of sense s_{j,i} over words drawn from a Dirichlet distribution Dir(β).
    Page 3, “WSI-Based Broad-Coverage Sense Tagger”
  3. As LDA requires the number of senses (topics) to be specified manually, a better idea is to let the training data automatically determine the number of senses for each word type.
    Page 3, “WSI-Based Broad-Coverage Sense Tagger”
  4. Therefore we resort to the HDP, a natural nonparametric generalization of LDA, for the inference of both sense clusters and the number of sense clusters following Lau et al.
    Page 3, “WSI-Based Broad-Coverage Sense Tagger”
  5. They adapt LDA to word sense induction by building one topic model per word type.
    Page 8, “Related Work”
  6. Compared with the macro topics of documents inferred by LDA from bags of words over whole documents, word senses inferred by the HDP-based WSI can be considered as micro topics.
    Page 9, “Conclusion”
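The LDA generative story for sense induction described above can be sketched as follows; the number of senses S, the context vocabulary, and the hyperparameters are illustrative (the paper's HDP variant removes the need to fix S in advance):

```python
import random
random.seed(0)

def dirichlet(alphas):
    """Sample from Dir(alphas) via normalized Gamma draws."""
    xs = [random.gammavariate(a, 1.0) for a in alphas]
    s = sum(xs)
    return [x / s for x in xs]

def draw(probs, items):
    """Draw one item from a discrete distribution."""
    r, acc = random.random(), 0.0
    for p, it in zip(probs, items):
        acc += p
        if r < acc:
            return it
    return items[-1]

# Generative story for one word type with S sense clusters over a
# hypothetical context vocabulary; alpha, beta are illustrative.
S, vocab = 3, ["bank", "river", "money", "loan", "shore"]
alpha, beta = [0.1] * S, [0.01] * len(vocab)
phi = [dirichlet(beta) for _ in range(S)]        # per-sense word dists

def generate_pseudo_doc(n_items):
    theta = dirichlet(alpha)                     # per-document sense dist
    doc = []
    for _ in range(n_items):
        s = draw(theta, range(S))                # 2.1) draw a sense cluster
        doc.append((s, draw(phi[s], vocab)))     # 2.2) draw a context word
    return doc

doc = generate_pseudo_doc(10)
```

Inference runs this story in reverse: given the pseudo documents, the sampler recovers the per-document sense distributions and per-sense word distributions.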


sense disambiguation

Appears in 5 sentences as: Sense Disambiguation (1) sense disambiguation (4)
In A Sense-Based Translation Model for Statistical Machine Translation
  1. Our method is significantly different from previous word sense disambiguation reformulated for machine translation in that the latter neglects word senses in nature.
    Page 1, “Abstract”
  2. Results show that the proposed model substantially outperforms not only the baseline but also the previous reformulated word sense disambiguation.
    Page 1, “Abstract”
  3. Therefore a natural assumption is that word sense disambiguation (WSD) may contribute to statistical machine translation (SMT) by providing appropriate word senses for target translation selection with context features (Carpuat and Wu, 2005).
    Page 1, “Introduction”
  4. The biggest difference from word sense disambiguation lies in that WSI does not rely on a predefined sense inventory.
    Page 2, “WSI-Based Broad-Coverage Sense Tagger”
  5. 5.5 Comparison to Word Sense Disambiguation
    Page 7, “Experiments”


SMT system

Appears in 5 sentences as: SMT system (6)
In A Sense-Based Translation Model for Statistical Machine Translation
  1. These glosses, used as the sense predictions of their WSD system, are integrated into a word-based SMT system either to substitute for translation candidates of their translation model or to post-edit the output of their SMT system.
    Page 1, “Introduction”
  2. We integrate the proposed sense-based translation model into a state-of-the-art SMT system and conduct experiments on Chinese-to-English translation using large-scale training data.
    Page 2, “Introduction”
  3. Figure 2: Architecture of SMT system with the sense-based translation model.
    Page 5, “Decoding with Sense-Based Translation Model”
  4. Figure 2 shows the architecture of the SMT system enhanced with the sense-based translation model.
    Page 5, “Decoding with Sense-Based Translation Model”
  5. Our baseline system is a state-of-the-art SMT system which adapts Bracketing Transduction Grammars (Wu, 1997) to phrasal translation and equips itself with a maximum entropy based reordering model (Xiong et al., 2006).
    Page 6, “Experiments”


maximum entropy

Appears in 4 sentences as: maximum entropy (4)
In A Sense-Based Translation Model for Statistical Machine Translation
  1. The proposed sense-based translation model enables the decoder to select appropriate translations for source words according to the inferred senses for these words using maximum entropy classifiers.
    Page 1, “Abstract”
  2. In order to incorporate word senses into SMT, we propose a sense-based translation model that is built on maximum entropy classifiers.
    Page 2, “Introduction”
  3. Our baseline system is a state-of-the-art SMT system which adapts Bracketing Transduction Grammars (Wu, 1997) to phrasal translation and equips itself with a maximum entropy based reordering model (Xiong et al., 2006).
    Page 6, “Experiments”
  4. We incorporate these learned word senses as translation evidence into maximum entropy classifiers, which form the sense-based translation model.
    Page 9, “Conclusion”


language model

Appears in 4 sentences as: language model (4)
In A Sense-Based Translation Model for Statistical Machine Translation
  1. minimum error rate training (MERT) (Och, 2003) together with other models such as the language model.
    Page 5, “Decoding with Sense-Based Translation Model”
  2. We trained a 5-gram language model on the Xinhua section of the English Gigaword corpus (306 million words) using the SRILM toolkit (Stolcke, 2002) with modified Kneser-Ney smoothing (Chen and Goodman, 1996).
    Page 6, “Experiments”
  3. (2007) also explore a bilingual topic model for translation and language model adaptation.
    Page 9, “Related Work”
  4. Additionally, we also want to induce sense clusters for words in the target language so that we can build a sense-based language model and integrate it into SMT.
    Page 9, “Conclusion”


context information

Appears in 4 sentences as: context information (2) contextual information (2)
In A Sense-Based Translation Model for Statistical Machine Translation
  1. A pseudo document is composed of either a bag of neighboring words of a word token, or the part-of-speech tags of neighboring words, or other contextual information elements.
    Page 2, “WSI-Based Broad-Coverage Sense Tagger”
  2. The sense-based translation model estimates the probability that a source word c is translated into a target phrase e given contextual information, including word senses that are obtained using the HDP-based WSI as described in the last section.
    Page 4, “Sense-Based Translation Model”
  3. Rather than predicting word senses for ambiguous words, the reformulated WSD directly predicts target translations for source words with context information.
    Page 8, “Related Work”
  4. Lexical selection Our work is also related to lexical selection in SMT where appropriate target lexical items for source words are selected by a statistical model with context information (Bangalore et al., 2007; Mauser et al., 2009).
    Page 9, “Related Work”
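Constructing a bag-of-neighboring-words pseudo document per occurrence of a word type might look like the sketch below; the example sentences and the helper name are hypothetical, and the paper additionally uses POS tags and other context elements:

```python
def pseudo_documents(sentences, word_type, k=10):
    """Collect one pseudo document per occurrence of word_type:
    the bag of up to k neighboring words on each side."""
    docs = []
    for sent in sentences:
        toks = sent.split()
        for i, tok in enumerate(toks):
            if tok == word_type:
                left = toks[max(0, i - k):i]     # up to k words to the left
                right = toks[i + 1:i + 1 + k]    # up to k words to the right
                docs.append(left + right)
    return docs

sents = ["he sat by the bank of the river",
         "the bank raised its interest rate"]
docs = pseudo_documents(sents, "bank", k=3)
```

Each resulting bag becomes one "document" for the topic model, so the induced topics are per-occurrence sense clusters of the target word type.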


statistical machine translation

Appears in 3 sentences as: Statistical Machine Translation (1) statistical machine translation (2)
In A Sense-Based Translation Model for Statistical Machine Translation
  1. In this paper, we propose a sense-based translation model to integrate word senses into statistical machine translation.
    Page 1, “Abstract”
  2. Therefore a natural assumption is that word sense disambiguation (WSD) may contribute to statistical machine translation (SMT) by providing appropriate word senses for target translation selection with context features (Carpuat and Wu, 2005).
    Page 1, “Introduction”


development set

Appears in 3 sentences as: development set (3)
In A Sense-Based Translation Model for Statistical Machine Translation
  1. We used the NIST MT03 evaluation test data as our development set, and the NIST MT05 as the test set.
    Page 6, “Experiments”
  2. Table 4: Experiment results of the sense-based translation model (STM) with lexicon and sense features extracted from a window of size varying from ±5 to ±15 words on the development set.
    Page 7, “Experiments”
  3. Our first group of experiments was conducted to investigate the impact of the window size k on translation performance in terms of BLEU/NIST on the development set.
    Page 7, “Experiments”


translation task

Appears in 3 sentences as: translation task (3)
In A Sense-Based Translation Model for Statistical Machine Translation
  1. We test the effectiveness of the proposed sense-based translation model on a large-scale Chinese-to-English translation task.
    Page 1, “Abstract”
  2. They show that such a reformulated WSD can improve the accuracy of a simplified word translation task.
    Page 1, “Introduction”
  3. Section 5 elaborates our experiments on the large-scale Chinese-to-English translation task.
    Page 2, “Introduction”


word alignments

Appears in 3 sentences as: word alignments (3)
In A Sense-Based Translation Model for Statistical Machine Translation
  1. During decoding, we keep word alignments for each translation rule.
    Page 5, “Decoding with Sense-Based Translation Model”
  2. Whenever a new source word c is translated, we find its translation e via the kept word alignments.
    Page 5, “Decoding with Sense-Based Translation Model”
  3. We ran Giza++ on the training data in two directions and applied the “grow-diag-final” refinement rule (Koehn et al., 2003) to obtain word alignments.
    Page 6, “Experiments”
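A simplified sketch of the "grow-diag-final" symmetrization over two directional alignment sets; this follows the spirit of Koehn et al. (2003) but is not the exact Moses implementation, and the alignment sets in the test below are hypothetical:

```python
def grow_diag_final(s2t, t2s):
    """Symmetrize two directional word alignments, each a set of
    (src_index, tgt_index) pairs: start from the intersection, grow
    with neighboring union points, then add remaining union points
    that cover still-unaligned words."""
    inter = s2t & t2s
    union = s2t | t2s
    aligned = set(inter)
    neighbors = [(-1, 0), (0, -1), (1, 0), (0, 1),
                 (-1, -1), (-1, 1), (1, -1), (1, 1)]
    # grow: add union points adjacent to the current alignment when
    # they cover a word that is unaligned on either side
    added = True
    while added:
        added = False
        for (i, j) in sorted(aligned):
            for di, dj in neighbors:
                p = (i + di, j + dj)
                if p in union and p not in aligned:
                    src_free = all(i2 != p[0] for i2, _ in aligned)
                    tgt_free = all(j2 != p[1] for _, j2 in aligned)
                    if src_free or tgt_free:
                        aligned.add(p)
                        added = True
    # final: add leftover union points covering unaligned words
    for p in sorted(union - aligned):
        if all(i != p[0] for i, _ in aligned) or \
           all(j != p[1] for _, j in aligned):
            aligned.add(p)
    return aligned
```

Starting from the high-precision intersection and growing toward the high-recall union is what makes the heuristic a good compromise for phrase extraction.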


BLEU points

Appears in 3 sentences as: BLEU points (3)
In A Sense-Based Translation Model for Statistical Machine Translation
  1. Our sense-based translation model achieves a substantial improvement of 1.2 BLEU points over the baseline.
    Page 7, “Experiments”
  2. If we only integrate sense features into the sense-based translation model, we can still outperform the baseline by 0.62 BLEU points.
    Page 7, “Experiments”
  3. From the table, we can find that the sense-based translation model outperforms the reformulated WSD by 0.57 BLEU points.
    Page 7, “Experiments”
