Bilingual Sense Similarity for Statistical Machine Translation
Chen, Boxing and Foster, George and Kuhn, Roland

Article Structure

Abstract

This paper proposes new algorithms to compute the sense similarity between two units (words, phrases, rules, etc.) from parallel corpora.

Introduction

The sense of a term can generally be inferred from its context.

Hierarchical phrase-based MT system

The hierarchical phrase-based translation method (Chiang, 2005; Chiang, 2007) is a formal syntax-based translation modeling method; its translation model is a weighted synchronous context free grammar (SCFG).

Bag-of-Words Vector Space Model

To compute the sense similarity via VSM, we follow the previous work (Lin, 1998) and represent the source and target side of a rule by feature vectors.

Similarity Functions

There are many possibilities for calculating similarities between bags-of-words in different languages.

Experiments

We evaluate the algorithm of bilingual sense similarity via machine translation.

Analysis and Discussion

6.1 Effect of Single Features

Related Work

There has been extensive work on incorporating semantics into SMT.

Conclusions and Future Work

In this paper, we have proposed an approach that uses the vector space model to compute the sense similarity for terms from parallel corpora and applied it to statistical machine translation.

Topics

machine translation

Appears in 11 sentences as: Machine Translation (1) machine translation (10)
In Bilingual Sense Similarity for Statistical Machine Translation
  1. We then apply the algorithms to statistical machine translation by computing the sense similarity between the source and target side of translation rule pairs.
    Page 1, “Abstract”
  2. Significant improvements are obtained over a state-of-the-art hierarchical phrase-based machine translation system.
    Page 1, “Abstract”
  3. Statistical Machine Translation
    Page 1, “Introduction”
  4. Is it useful for multilingual applications, such as statistical machine translation (SMT)?
    Page 1, “Introduction”
  5. The source and target sides of the rules with (*) at the end are not semantically equivalent; it seems likely that measuring the semantic similarity between the source and target sides of rules from their contexts might be helpful to machine translation.
    Page 2, “Introduction”
  6. Second, we use the sense similarities between the source and target sides of a translation rule to improve statistical machine translation performance.
    Page 2, “Introduction”
  7. Our contribution includes proposing new bilingual sense similarity algorithms and applying them to machine translation.
    Page 2, “Introduction”
  8. We evaluate the algorithm of bilingual sense similarity via machine translation.
    Page 5, “Experiments”
  9. For the baseline, we train the translation model by following (Chiang, 2005; Chiang, 2007) and our decoder is Joshua, an open-source hierarchical phrase-based machine translation system written in Java.
    Page 5, “Experiments”
  10. In this paper, we have proposed an approach that uses the vector space model to compute the sense similarity for terms from parallel corpora and applied it to statistical machine translation.
    Page 9, “Conclusions and Future Work”
  11. We have shown that the sense similarity computed between units from parallel corpora by means of our algorithm is helpful for at least one multilingual application: statistical machine translation.
    Page 9, “Conclusions and Future Work”


NIST

Appears in 11 sentences as: NIST (13)
In Bilingual Sense Similarity for Statistical Machine Translation
  1. The first one is the large data condition, based on training data for the NIST 2009 evaluation Chinese-to-English track.
    Page 5, “Experiments”
  2. We first created a development set which used mainly data from the NIST 2005 test set, and also some balanced-genre web-text from the NIST training material.
    Page 5, “Experiments”
  3. Evaluation was performed on the NIST 2006 and 2008 test sets.
    Page 5, “Experiments”
  4. Table 2: Results (BLEU%) of small data Chinese-to-English NIST task.
    Page 6, “Experiments”
  5. Table 3: Results (BLEU%) of large data Chinese-to-English NIST task and German-to-English WMT task.
    Page 6, “Experiments”
  6. The Alg2 cosine similarity function obtained a 0.7 BLEU-score (p<0.01) improvement over the baseline on the NIST 2006 test set, and a 0.5 BLEU-score (p<0.05) improvement on the NIST 2008 test set.
    Page 6, “Experiments”
  7. Table 3 reports the performance of Alg2 on Chinese-to-English NIST large data condition and German-to-English WMT task.
    Page 6, “Experiments”
  8. Table header: test set (NIST) ’06 and ’08, under the CE_LD and CE_SD conditions.
    Page 7, “Analysis and Discussion”
  9. Table 4: Results (BLEU%) of Chinese-to-English large data (CE_LD) and small data (CE_SD) NIST task by applying one feature.
    Page 7, “Analysis and Discussion”
  10. Table 6: Results (BLEU%) of using simple features based on context on small data NIST task.
    Page 7, “Analysis and Discussion”
  11. For the Chinese-to-English NIST task, about 1% of the rules have no contexts; for the German-to-English task, this number is about 0.4%.
    Page 8, “Analysis and Discussion”


similarity scores

Appears in 9 sentences as: similarity score (2) Similarity scores (1) similarity scores (6)
In Bilingual Sense Similarity for Statistical Machine Translation
  1. The sense similarity scores are computed by using the vector space model.
    Page 1, “Abstract”
  2. Similarity scores are used as additional features of the translation model to improve translation performance.
    Page 1, “Abstract”
  3. The sense similarity scores are used as feature functions in the translation model.
    Page 5, “Experiments”
  4. In Alg2, the similarity score consists of three parts, as in Equation (14): $\mathrm{sim}(C_f^{full}, C_f^{occ})$, $\mathrm{sim}(C_e^{full}, C_e^{occ})$, and $\mathrm{sim}(C_f^{occ}, C_e^{occ})$; where $\mathrm{sim}(C_f^{occ}, C_e^{occ})$ could be computed by the IBM model 1 probabilities $\mathrm{sim}_{IBM}(C_f^{occ}, C_e^{occ})$ or the cosine distance similarity function $\mathrm{sim}_{COS}(C_f^{occ}, C_e^{occ})$.
    Page 6, “Analysis and Discussion” (a code sketch of these similarity functions follows this list)
  5. The monolingual similarity scores give it the ability to avoid “dangerous” words, and choose alternatives (such as larger phrase translations) when available.
    Page 7, “Analysis and Discussion”
  6. We then combine the two similarity scores by using both of them as features to see if we could obtain further improvement.
    Page 7, “Analysis and Discussion”
  7. Table 5: Results (BLEU%) for combination of two similarity scores.
    Page 7, “Analysis and Discussion”
  8. We assign a uniform number as their bilingual sense similarity score, and this number is tuned through MERT.
    Page 8, “Analysis and Discussion”
  9. 〈X 的 ||| the X of〉 will have very diffuse “meanings” and therefore lower similarity scores.
    Page 8, “Analysis and Discussion”
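
Item 4 above contrasts two ways of comparing bag-of-words contexts (a cosine similarity and an IBM model 1 based score) and mentions two further monolingual parts. The following is a minimal sketch of these pieces, assuming contexts are plain word-count dictionaries; the simplified IBM-1-style scoring and the plain product used to combine the three parts are illustrative assumptions, not the paper's exact Equation (14).

```python
import math
from collections import Counter

def sim_cos(ctx_a, ctx_b):
    """Cosine similarity between two bag-of-words context vectors (word -> weight)."""
    dot = sum(ctx_a[w] * ctx_b[w] for w in set(ctx_a) & set(ctx_b))
    norm_a = math.sqrt(sum(v * v for v in ctx_a.values()))
    norm_b = math.sqrt(sum(v * v for v in ctx_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def sim_ibm1(ctx_f, ctx_e, t_prob):
    """Simplified IBM model 1 style score: average, over target context words, of the
    translation probability mass they receive from the source context.
    t_prob maps (source_word, target_word) -> p(target_word | source_word)."""
    if not ctx_f or not ctx_e:
        return 0.0
    f_total = sum(ctx_f.values())
    score = 0.0
    for e_word, e_cnt in ctx_e.items():
        p = sum(cnt * t_prob.get((f_word, e_word), 1e-9)
                for f_word, cnt in ctx_f.items()) / f_total
        score += e_cnt * p
    return score / sum(ctx_e.values())

def sense_similarity(cf_full, cf_occ, ce_full, ce_occ, t_prob):
    """Three parts as in the excerpt: two monolingual similarities (all contexts of a unit
    vs. its contexts inside this rule) and one bilingual similarity.  Combining them with
    a simple product is an assumption made only for this sketch."""
    return (sim_cos(cf_full, cf_occ)
            * sim_cos(ce_full, ce_occ)
            * sim_ibm1(cf_occ, ce_occ, t_prob))

# Toy usage with invented contexts and an invented translation table.
cf_occ = Counter({"f_hold": 2, "f_meeting": 1})
ce_occ = Counter({"hold": 2, "meeting": 1})
t_prob = {("f_hold", "hold"): 0.6, ("f_meeting", "meeting"): 0.5}
print(sense_similarity(cf_occ, cf_occ, ce_occ, ce_occ, t_prob))
```

As the excerpts note, the bilingual part can alternatively be computed with cosine distance once both contexts are mapped into a single vector space (see the sketch in the "vector space" section below).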


vector space

Appears in 9 sentences as: Vector Space (1) vector space (8)
In Bilingual Sense Similarity for Statistical Machine Translation
  1. The sense similarity scores are computed by using the vector space model.
    Page 1, “Abstract”
  2. Given two terms to be compared, one first extracts various features for each term from their contexts in a corpus and forms a vector space model (VSM); then, one computes their similarity by using similarity functions.
    Page 1, “Introduction”
  3. Use of the vector space model to compute sense similarity has also been adapted to the multilingual condition, based on the assumption that two terms with similar meanings often occur in comparable contexts across languages.
    Page 1, “Introduction”
  4. 4.2 Vector Space Mapping
    Page 3, “Similarity Functions”
  5. A common way to calculate semantic similarity is by vector space cosine distance; we will also
    Page 3, “Similarity Functions”
  6. Fung (1998) and Rapp (1999) map the vector one-dimension-to-one-dimension (a context word is a dimension in each vector space) from one language to another language via an initial bilingual dictionary.
    Page 4, “Similarity Functions”
  7. We follow (Zhao et al., 2004) to do vector space mapping.
    Page 4, “Similarity Functions”
  8. Therefore, we map all vectors in the target language into the vector space of the source language.
    Page 4, “Similarity Functions” (a sketch of this mapping follows this list)
  9. In this paper, we have proposed an approach that uses the vector space model to compute the sense similarity for terms from parallel corpora and applied it to statistical machine translation.
    Page 8, “Conclusions and Future Work”
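
Items 6 to 8 above describe mapping context vectors from the target language into the source-language vector space before comparing them. Below is a minimal sketch of that idea, assuming a word-level lexical translation table p(f|e); the table format and the toy entries are assumptions for illustration only, not the tables used in the paper.

```python
from collections import defaultdict

def map_to_source_space(ctx_e, p_f_given_e):
    """Map a target-language context vector into the source-language vector space:
    each target word's weight is redistributed over source words according to a
    lexical translation table p(f|e)."""
    mapped = defaultdict(float)
    for e_word, weight in ctx_e.items():
        for f_word, p in p_f_given_e.get(e_word, {}).items():
            mapped[f_word] += weight * p
    return dict(mapped)

# Toy usage: the mapped vector lives in source-language dimensions and can now be
# compared against a source context vector with cosine distance.
ctx_e = {"hold": 2.0, "meeting": 1.0}
p_f_given_e = {"hold": {"f_hold": 0.7}, "meeting": {"f_meeting": 0.6, "f_conference": 0.3}}
print(map_to_source_space(ctx_e, p_f_given_e))
# {'f_hold': 1.4, 'f_meeting': 0.6, 'f_conference': 0.3}
```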


translation model

Appears in 7 sentences as: translation model (7) translation modeling (1)
In Bilingual Sense Similarity for Statistical Machine Translation
  1. Similarity scores are used as additional features of the translation model to improve translation performance.
    Page 1, “Abstract”
  2. the translation probabilities in a translation model, for units from parallel corpora are mainly based on the co-occurrence counts of the two units.
    Page 1, “Introduction”
  3. The hierarchical phrase-based translation method (Chiang, 2005; Chiang, 2007) is a formal syntax-based translation modeling method; its translation model is a weighted synchronous context free grammar (SCFG).
    Page 2, “Hierarchical phrase-based MT system”
  4. The sense similarity scores are used as feature functions in the translation model.
    Page 5, “Experiments” (a sketch of such a feature in a log-linear model follows this list)
  5. In particular, all the allowed bilingual corpora except the UN corpus and Hong Kong Hansard corpus have been used for estimating the translation model.
    Page 5, “Experiments”
  6. The second one is the small data condition where only the FBIS corpus is used to train the translation model.
    Page 5, “Experiments”
  7. For the baseline, we train the translation model by following (Chiang, 2005; Chiang, 2007) and our decoder is Joshua, an open-source hierarchical phrase-based machine translation system written in Java.
    Page 5, “Experiments”
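
Items 1 and 4 say the sense similarity scores enter the system as additional feature functions of the translation model, with their weights tuned by MERT (as the "similarity scores" excerpts mention). The sketch below only illustrates how a log-linear model combines such an extra feature with the usual ones; all feature names and numbers are invented for illustration and are not the paper's actual feature set.

```python
import math

def loglinear_score(features, weights):
    """Log-linear model score for one rule/derivation: weighted sum of feature values."""
    return sum(weights[name] * value for name, value in features.items())

# Hypothetical feature values for one translation rule; "sense_sim" is the extra
# bilingual sense similarity feature the excerpts describe.
features = {
    "log_p_e_given_f": math.log(0.20),  # rule translation probability (toy value)
    "log_lex_weight": math.log(0.10),   # lexical weight (toy value)
    "sense_sim": 0.63,                  # bilingual sense similarity score (toy value)
}
weights = {"log_p_e_given_f": 1.0, "log_lex_weight": 0.5, "sense_sim": 0.8}  # MERT-tuned in practice
print(loglinear_score(features, weights))
```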


BLEU

Appears in 6 sentences as: BLEU (6)
In Bilingual Sense Similarity for Statistical Machine Translation
  1. Our evaluation metric is IBM BLEU (Papineni et al., 2002), which performs case-insensitive matching of n-grams up to n = 4.
    Page 5, “Experiments” (a BLEU sketch follows this list)
  2. Table 2: Results (BLEU%) of small data Chinese-to-English NIST task.
    Page 6, “Experiments”
  3. Table 3: Results (BLEU%) of large data Chinese-to-English NIST task and German-to-English WMT task.
    Page 6, “Experiments”
  4. Table 4: Results (BLEU%) of Chinese-to-English large data (CE_LD) and small data (CE_SD) NIST task by applying one feature.
    Page 7, “Analysis and Discussion”
  5. Table 5: Results (BLEU%) for combination of two similarity scores.
    Page 7, “Analysis and Discussion”
  6. Table 6: Results (BLEU%) of using simple features based on context on small data NIST task.
    Page 7, “Analysis and Discussion”
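
Item 1 above names the evaluation metric: IBM BLEU with case-insensitive matching of n-grams up to n = 4. The sketch below computes a single-reference BLEU in that spirit; it omits multiple references and the exact smoothing and clipping details of the IBM script, so treat it as an illustration of the metric rather than the evaluation tool used in the paper.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Single-reference BLEU: case-insensitive modified n-gram precision for
    n = 1..max_n, geometric mean, and a brevity penalty."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        cand_ng, ref_ng = ngrams(cand, n), ngrams(ref, n)
        clipped = sum(min(c, ref_ng[g]) for g, c in cand_ng.items())
        total = max(sum(cand_ng.values()), 1)
        log_prec += math.log(max(clipped, 1e-9) / total)  # crude floor instead of real smoothing
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1.0 - len(ref) / max(len(cand), 1))
    return bp * math.exp(log_prec / max_n)

print(round(100 * bleu("The cat sat on the mat today", "the cat sat on the mat"), 2))  # ~80.9 BLEU%
```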


parallel corpora

Appears in 6 sentences as: parallel corpora (6)
In Bilingual Sense Similarity for Statistical Machine Translation
  1. This paper proposes new algorithms to compute the sense similarity between two units (words, phrases, rules, etc.) from parallel corpora.
    Page 1, “Abstract”
  2. However, there is no previous work that uses the VSM to compute sense similarity for terms from parallel corpora.
    Page 1, “Introduction”
  3. the translation probabilities in a translation model, for units from parallel corpora are mainly based on the co-occurrence counts of the two units.
    Page 1, “Introduction”
  4. Therefore, questions emerge: how good is the sense similarity computed via VSM for two units from parallel corpora?
    Page 1, “Introduction”
  5. In this paper, we have proposed an approach that uses the vector space model to compute the sense similarity for terms from parallel corpora and applied it to statistical machine translation.
    Page 9, “Conclusions and Future Work”
  6. We have shown that the sense similarity computed between units from parallel corpora by means of our algorithm is helpful for at least one multilingual application: statistical machine translation.
    Page 9, “Conclusions and Future Work”


phrase-based

Appears in 6 sentences as: phrase-based (7)
In Bilingual Sense Similarity for Statistical Machine Translation
  1. Significant improvements are obtained over a state-of-the-art hierarchical phrase-based machine translation system.
    Page 1, “Abstract”
  2. We chose a hierarchical phrase-based SMT system as our baseline; thus, the units involved in computation of sense similarities are hierarchical rules.
    Page 2, “Introduction”
  3. The hierarchical phrase-based translation method (Chiang, 2005; Chiang, 2007) is a formal syntax-based translation modeling method; its translation model is a weighted synchronous context free grammar (SCFG).
    Page 2, “Hierarchical phrase-based MT system”
  4. Empirically, this method has yielded better performance on language pairs such as Chinese-English than the phrase-based method because it permits phrases with gaps; it generalizes the normal phrase-based models in a way that allows long-distance reordering (Chiang, 2005; Chiang, 2007).
    Page 2, “Hierarchical phrase-based MT system”
  5. In the hierarchical phrase-based translation method, the translation rules are extracted by abstracting some words from an initial phrase pair (Chiang, 2005).
    Page 2, “Bag-of-Words Vector Space Model”
  6. For the baseline, we train the translation model by following (Chiang, 2005; Chiang, 2007) and our decoder is Joshua, an open-source hierarchical phrase-based machine translation system written in Java.
    Page 5, “Experiments”
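
Items 3 and 4 above describe the translation model as a weighted SCFG whose rules contain gaps (non-terminals), which is what enables long-distance reordering. Below is a toy representation of one such hierarchical rule and its application; the tokens, the rule, and the data layout are illustrative assumptions, not Joshua's internal structures.

```python
from dataclasses import dataclass

@dataclass
class HierRule:
    """A hierarchical (SCFG) rule: integers on either side mark linked non-terminal gaps."""
    src: tuple    # e.g. (1, "de", 2)        ~ "X1 de X2"     (toy source tokens)
    tgt: tuple    # e.g. ("the", 2, "of", 1) ~ "the X2 of X1" (target side, reordered)
    weight: float

def apply_rule(rule, fillers):
    """Build the target side by substituting translations of the gap fillers,
    in the (possibly reordered) positions the rule specifies."""
    out = []
    for tok in rule.tgt:
        out.extend(fillers[tok] if isinstance(tok, int) else [tok])
    return out

rule = HierRule(src=(1, "de", 2), tgt=("the", 2, "of", 1), weight=0.4)
print(" ".join(apply_rule(rule, {1: ["china"], 2: ["economic", "development"]})))
# -> the economic development of china
```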


semantic similarity

Appears in 6 sentences as: semantic similarity (9)
In Bilingual Sense Similarity for Statistical Machine Translation
  1. The source and target sides of the rules with (*) at the end are not semantically equivalent; it seems likely that measuring the semantic similarity between the source and target sides of rules from their contexts might be helpful to machine translation.
    Page 2, “Introduction”
  2. A common way to calculate semantic similarity is by vector space cosine distance; we will also
    Page 3, “Similarity Functions”
  3. Therefore, on top of the degree of bilingual semantic similarity between a source and a target translation unit, we have also incorporated the monolingual semantic similarity between all occurrences of a source or target unit, and that unit’s occurrence as part of the given rule, into the sense similarity measure.
    Page 5, “Similarity Functions”
  4. The improved similarity function Alg2 makes it possible to incorporate monolingual semantic similarity on top of the bilingual semantic similarity, and thus may improve the accuracy of the similarity estimate.
    Page 6, “Experiments”
  5. Our aim in this paper is to characterize the semantic similarity of bilingual hierarchical rules.
    Page 8, “Analysis and Discussion”
  6. Our work is different from all the above approaches in that we attempt to discriminate among hierarchical rules based on: 1) the degree of bilingual semantic similarity between source and target translation units; and 2) the monolingual semantic similarity between occurrences of source or target units as part of the given rule, and in general.
    Page 8, “Related Work”


statistical machine translation

Appears in 5 sentences as: statistical machine translation (5)
In Bilingual Sense Similarity for Statistical Machine Translation
  1. We then apply the algorithms to statistical machine translation by computing the sense similarity between the source and target side of translation rule pairs.
    Page 1, “Abstract”
  2. Is it useful for multilingual applications, such as statistical machine translation (SMT)?
    Page 1, “Introduction”
  3. Second, we use the sense similarities between the source and target sides of a translation rule to improve statistical machine translation performance.
    Page 2, “Introduction”
  4. In this paper, we have proposed an approach that uses the vector space model to compute the sense similarity for terms from parallel corpora and applied it to statistical machine translation.
    Page 9, “Conclusions and Future Work”
  5. We have shown that the sense similarity computed between units from parallel corpora by means of our algorithm is helpful for at least one multilingual application: statistical machine translation.
    Page 9, “Conclusions and Future Work”


phrase pair

Appears in 4 sentences as: phrase pair (4)
In Bilingual Sense Similarity for Statistical Machine Translation
  1. In the hierarchical phrase-based translation method, the translation rules are extracted by abstracting some words from an initial phrase pair (Chiang, 2005).
    Page 2, “Bag-of-Words Vector Space Model”
  2. Consider a rule with non-terminals on the source and target side; for a given instance of the rule (a particular phrase pair in the training corpus), the context will be the words instantiating the non-terminals.
    Page 2, “Bag-of-Words Vector Space Model”
  3. In turn, the context for the sub-phrases that instantiate the non-terminals will be the words in the remainder of the phrase pair.
    Page 2, “Bag-of-Words Vector Space Model” (a sketch of this context extraction follows this list)
  4. have an initial phrase pair $\langle \tilde{f}, \tilde{e} \rangle$.
    Page 3, “Bag-of-Words Vector Space Model”
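
Items 2 and 3 above explain where a rule's context words come from: for the rule itself, the words that instantiate its non-terminals; for the sub-phrases filling those non-terminals, the remaining words of the initial phrase pair. Below is a minimal sketch of that split for one language side, assuming a phrase is a token list with known gap spans; all data here is invented for illustration.

```python
def rule_and_gap_contexts(phrase_tokens, gap_spans):
    """Given one instance of a hierarchical rule inside an initial phrase
    (one language side), split its tokens into:
      - the rule's context: the words instantiating the non-terminal gaps,
      - the gap fillers' context: the remaining words of the phrase."""
    gap_positions = {i for start, end in gap_spans for i in range(start, end)}
    rule_context = [phrase_tokens[i] for i in sorted(gap_positions)]
    filler_context = [t for i, t in enumerate(phrase_tokens) if i not in gap_positions]
    return rule_context, filler_context

# Toy instance: initial phrase "hold a meeting with the delegation", with the rule's
# two gaps instantiated by "a meeting" and "the delegation" (spans are toy values).
tokens = ["hold", "a", "meeting", "with", "the", "delegation"]
print(rule_and_gap_contexts(tokens, gap_spans=[(1, 3), (4, 6)]))
# (['a', 'meeting', 'the', 'delegation'], ['hold', 'with'])
```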


Significant improvements

Appears in 4 sentences as: significant improvement (1) Significant improvements (1) significant improvements (1) significantly improved (1)
In Bilingual Sense Similarity for Statistical Machine Translation
  1. Significant improvements are obtained over a state-of-the-art hierarchical phrase-based machine translation system.
    Page 1, “Abstract”
  2. Alg2 significantly improved the performance over the baseline.
    Page 6, “Experiments”
  3. We can see that IBM model 1 and cosine distance similarity function both obtained significant improvement on all test sets of the two tasks.
    Page 6, “Experiments”
  4. We saw that the bilingual sense similarity computed by our algorithm led to significant improvements.
    Page 9, “Conclusions and Future Work”


language models

Appears in 3 sentences as: language model (1) language models (2)
In Bilingual Sense Similarity for Statistical Machine Translation
  1. We trained two language models: the first one is a 4-gram LM which is estimated on the target side of the texts used in the large data condition.
    Page 5, “Experiments”
  2. Both language models are used for both tasks.
    Page 5, “Experiments”
  3. Only the target-language half of the parallel training data are used to train the language model in this task.
    Page 5, “Experiments”
