Response-based Learning for Grounded Machine Translation
Riezler, Stefan and Simianer, Patrick and Haas, Carolin

Article Structure

Abstract

We propose a novel learning approach for statistical machine translation (SMT) that allows supervision signals for structured learning to be extracted from an extrinsic response to a translation input.

Introduction

In this paper, we propose a novel approach for learning and evaluation in statistical machine translation (SMT) that borrows ideas from response-based learning for grounded semantic parsing.

Related Work

The key idea of grounded language learning is to study natural language in the context of a nonlinguistic environment, in which meaning is grounded in perception and/or action.

Grounding SMT in Semantic Parsing

In this paper, we present a proof-of-concept of our ideas of embedding SMT into simulated world environments as used in semantic parsing.

Response-based Online Learning

Recent approaches to machine learning for SMT formalize the task of discriminating good from bad translations as a structured prediction problem.

Experiments

5.1 Experimental Setup

Conclusion

We presented a proposal for a new learning and evaluation framework for SMT.

Topics

semantic parsing

Appears in 26 sentences as: semantic parse (6) semantic parser (8) semantic parses (5) semantic parsing (9)
In Response-based Learning for Grounded Machine Translation
  1. We show how to generate responses by grounding SMT in the task of executing a semantic parse of a translated query against a database.
    Page 1, “Abstract”
  2. Experiments on the GEOQUERY database show an improvement of about 6 points in F1-score for response-based learning over learning from references only on returning the correct answer from a semantic parse of a translated query.
    Page 1, “Abstract”
  3. In this paper, we propose a novel approach for learning and evaluation in statistical machine translation (SMT) that borrows ideas from response-based learning for grounded semantic parsing.
    Page 1, “Introduction”
  4. Building on prior work in grounded semantic parsing, we generate translations of queries, and receive feedback by executing semantic parses of translated queries against the database.
    Page 2, “Introduction”
  5. Successful response is defined as receiving the same answer from the semantic parses for the translation and the original query.
    Page 2, “Introduction”
  6. For example, in semantic parsing, the learning goal is to produce and successfully execute a meaning representation.
    Page 2, “Related Work”
  7. Recent attempts to learn semantic parsing from question-answer pairs without recourse to annotated logical forms have been presented by Kwiatowski et al.
    Page 2, “Related Work”
  8. The algorithms presented in these works are variants of structured prediction that take executability of semantic parses into account.
    Page 2, “Related Work”
  9. In this paper, we present a proof-of-concept of our ideas of embedding SMT into simulated world environments as used in semantic parsing.
    Page 3, “Grounding SMT in Semantic Parsing”
  10. Embedding SMT in a semantic parsing scenario means to define translation quality by the ability of a semantic parser to construct a meaning representation from the translated query, which returns the correct answer when executed against the database.
    Page 3, “Grounding SMT in Semantic Parsing”
  11. The diagram in Figure 1 gives a sketch of response-based learning from semantic parsing in the geographical domain.
    Page 3, “Grounding SMT in Semantic Parsing”

See all papers in Proc. ACL 2014 that mention semantic parsing.

See all papers in Proc. ACL that mention semantic parsing.

BLEU

Appears in 14 sentences as: BLEU (16)
In Response-based Learning for Grounded Machine Translation
  1. Computation of distance to the reference translation usually involves cost functions based on sentence-level BLEU (Nakov et al., 2012).
    Page 4, “Response-based Online Learning”
  2. In addition, we can use translation-specific cost functions based on sentence-level BLEU in order to boost similarity of translations to human reference translations.
    Page 5, “Response-based Online Learning”
  3. Our cost function c(y(i), y) = (1 − BLEU(y(i), y)) is based on a version of sentence-level BLEU (Nakov et al., 2012).
    Page 5, “Response-based Online Learning”
  4. The cost function can be implemented by different versions of sentence-wise BLEU, or it can be omitted completely so that learning relies on task-based feedback alone, similar to algorithms recently suggested for semantic parsing (Goldwasser and Roth, 2013; Kwiatowski et al., 2013; Berant et al., 2013).
    Page 5, “Response-based Online Learning”
  5. [Table fragment: recall, F1, and BLEU figures for the compared systems; layout not recoverable]
    Page 6, “Experiments”
  6. [Table fragment: recall, F1, and BLEU figures for the compared systems; layout not recoverable]
    Page 6, “Experiments”
  7. Method 4, named REBOL, implements REsponse-Based Online Learning by instantiating y+ and y− to the form described in Section 4: In addition to the model score s, it uses a cost function c based on sentence-level BLEU (Nakov et al., 2012) and tests translation hypotheses for task-based feedback using a binary execution function e.
    Page 7, “Experiments”
  8. We report BLEU (Papineni et al., 2001) of translation system output measured against the original English queries.
    Page 7, “Experiments”
  9. We present an experimental comparison of the four different systems according to BLEU and
    Page 7, “Experiments”
  10. In terms of BLEU score measured against the original English GEOQUERY queries, the best nominal result is obtained by RAMPION which uses them as reference translations.
    Page 8, “Experiments”
  11. REBOL performs worse since BLEU performance is optimized only implicitly in cases where original English queries function as positive examples.
    Page 8, “Experiments”
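Taken together, the snippets above specify a cost c(y(i), y) = 1 − BLEU(y(i), y) over sentence pairs. A minimal Python sketch, assuming a simple add-one-smoothed sentence-level BLEU rather than the exact Nakov et al. (2012) variant:

```python
from collections import Counter
import math

def sentence_bleu(hyp, ref, max_n=4):
    """Sentence-level BLEU with add-one smoothing on n-gram orders > 1
    (a simple stand-in for the Nakov et al. (2012) variant)."""
    hyp_toks, ref_toks = hyp.split(), ref.split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        hyp_ngrams = Counter(tuple(hyp_toks[i:i + n]) for i in range(len(hyp_toks) - n + 1))
        ref_ngrams = Counter(tuple(ref_toks[i:i + n]) for i in range(len(ref_toks) - n + 1))
        # clipped n-gram matches against the reference
        match = sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
        total = sum(hyp_ngrams.values())
        if n > 1:  # add-one smoothing keeps one missing higher order from zeroing the score
            match, total = match + 1, total + 1
        if match == 0 or total == 0:
            return 0.0
        log_prec += math.log(match / total) / max_n
    # brevity penalty for hypotheses shorter than the reference
    bp = min(1.0, math.exp(1.0 - len(ref_toks) / max(len(hyp_toks), 1)))
    return bp * math.exp(log_prec)

def cost(ref, hyp):
    """Cost function c(y_ref, y) = 1 - BLEU(y_ref, y) from Section 4."""
    return 1.0 - sentence_bleu(hyp, ref)
```

With this cost, a hypothesis identical to the reference has cost 0, and the cost grows toward 1 as n-gram overlap with the reference shrinks.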

SMT system

Appears in 10 sentences as: SMT system (8) SMT systems (2)
In Response-based Learning for Grounded Machine Translation
  1. This avoids the problem of un-reachability of independently generated reference translations by the SMT system.
    Page 1, “Introduction”
  2. in which SMT systems can be trained and evaluated.
    Page 2, “Introduction”
  3. (2012) propose a setup where an SMT system feeds into cross-language information retrieval, and receives feedback from the performance of translated queries with respect to cross-language retrieval performance.
    Page 2, “Related Work”
  4. However, despite offering direct and reliable prediction of translation quality, the cost and lack of reusability has confined task-based evaluations involving humans to testing scenarios, but prevented a use for interactive training of SMT systems as in our work.
    Page 3, “Related Work”
  5. Given a manual German translation of the English query as source sentence, the SMT system produces an English target translation.
    Page 3, “Grounding SMT in Semantic Parsing”
  6. Firstly, update rules that require computing a feature representation for the reference translation are suboptimal in SMT, because often human-generated reference translations cannot be generated by the SMT system.
    Page 4, “Response-based Online Learning”
  7. Such “un-reachable” gold-standard translations need to be replaced by “surrogate” gold-standard translations that are close to the human-generated translations and still lie within the reach of the SMT system .
    Page 4, “Response-based Online Learning”
  8. The bilingual SMT system used in our experiments is the state-of-the-art SCFG decoder CDEC (Dyer et al., 2010).
    Page 6, “Experiments”
  9. We trained the SMT system on the English-German parallel web data provided in the COMMON CRAWL (Smith et al., 2013) dataset.
    Page 6, “Experiments”
  10. Method 1 is the baseline system, consisting of the CDEC SMT system trained on the COMMON CRAWL data as described above.
    Page 6, “Experiments”

gold standard

Appears in 7 sentences as: gold standard (7)
In Response-based Learning for Grounded Machine Translation
  1. Since there are many possible correct parses, matching against a single gold standard falls short of grounding in a nonlinguistic environment.
    Page 2, “Related Work”
  2. Positive feedback means that the correct answer is received, i.e., exec(pg) = exec(ph) indicates that the same answer is received from the gold standard parse pg and the parse for the hypothesis translation ph; negative feedback results in case a different or no answer is received.
    Page 3, “Grounding SMT in Semantic Parsing”
  3. is the possibility to receive positive feedback even from predictions that differ from gold standard reference translations, but yet receive the correct answer when parsed and matched against the database.
    Page 4, “Grounding SMT in Semantic Parsing”
  4. We denote feedback by a binary execution function e(y) ∈ {1,0} that tests whether executing the semantic parse for the prediction against the database receives the same answer as the parse for the gold standard reference.
    Page 5, “Response-based Online Learning”
  5. Parser training includes GEOQUERY test data in order to be less dependent on parse and execution failures in the evaluation: If a translation system, response-based or reference-based, translates the German input into the gold standard English query it should be rewarded by positive task feedback.
    Page 6, “Experiments”
  6. Table 3 shows examples where the translation predicted by response-based learning (REBOL) differs from the gold standard reference translation, but yet leads to positive feedback via a parse that returns the correct answer from the database.
    Page 9, “Experiments”
  7. Table 4 shows examples where translations from REBOL and RAMPION differ from the gold standard reference, and predictions by REBOL lead to positive feedback, while predictions by RAMPION lead to negative feedback.
    Page 9, “Experiments”
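The binary execution function e(y) quoted above can be sketched as follows. The GEOQUERY semantic parser and database are not reproduced here, so `toy_parser`, `toy_execute`, and `toy_db` are hypothetical stand-ins for illustration only:

```python
def execution_feedback(execute, parser, gold_query, hyp_translation):
    """Binary execution function e(y) in {1, 0}: e(y) = 1 iff executing the
    semantic parse of the hypothesis translation returns the same answer as
    the parse of the gold-standard query, i.e. exec(p_g) = exec(p_h)."""
    try:
        gold_answer = execute(parser(gold_query))
        hyp_answer = execute(parser(hyp_translation))
    except Exception:  # a parse or execution failure counts as negative feedback
        return 0
    return 1 if hyp_answer == gold_answer else 0

# Hypothetical toy stand-ins for the semantic parser and database:
toy_db = {"capital(texas)": "austin"}

def toy_parser(query):
    q = query.lower()
    if "capital" in q and "texas" in q:
        return "capital(texas)"
    return None  # unparseable query

def toy_execute(parse):
    return toy_db[parse]  # raises KeyError (negative feedback) for unknown parses
```

Note that a lexically different translation still earns positive feedback as long as its parse returns the gold-standard answer, which is exactly the property the paper exploits.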

statistically significant

Appears in 7 sentences as: Statistical significance (1) statistically significant (7)
In Response-based Learning for Grounded Machine Translation
  1. Statistical significance is measured using Approximate Randomization (Noreen, 1989), where result differences with a p-value smaller than 0.05 are considered statistically significant.
    Page 7, “Experiments”
  2. All result differences are statistically significant.
    Page 8, “Experiments”
  3. However, the result differences between these two systems do not score as statistically significant.
    Page 8, “Experiments”
  4. This result difference is statistically significant.
    Page 8, “Experiments”
  5. However, the respective methods are separated only by 3 or fewer points in F1 score such that only the result difference of REBOL over the baseline CDEC and over EXEC is statistically significant.
    Page 8, “Experiments”
  6. In terms of BLEU measured against the original queries, the result differences between REBOL and RAMPION are not statistically significant, and neither are the result differences between EXEC and CDEC.
    Page 8, “Experiments”
  7. The result differences between systems of the former group and the systems of the latter group are statistically significant.
    Page 8, “Experiments”

sentence-level

Appears in 5 sentences as: sentence-level (5)
In Response-based Learning for Grounded Machine Translation
  1. Computation of distance to the reference translation usually involves cost functions based on sentence-level BLEU (Nakov et al., 2012).
    Page 4, “Response-based Online Learning”
  2. In addition, we can use translation-specific cost functions based on sentence-level BLEU in order to boost similarity of translations to human reference translations.
    Page 5, “Response-based Online Learning”
  3. Our cost function c(y(i), y) = (1 − BLEU(y(i), y)) is based on a version of sentence-level BLEU (Nakov et al., 2012).
    Page 5, “Response-based Online Learning”
  4. Method 4, named REBOL, implements REsponse-Based Online Learning by instantiating y+ and y− to the form described in Section 4: In addition to the model score s, it uses a cost function c based on sentence-level BLEU (Nakov et al., 2012) and tests translation hypotheses for task-based feedback using a binary execution function e.
    Page 7, “Experiments”
  5. This can be attributed to the use of sentence-level BLEU as cost function in RAMPION and REBOL.
    Page 9, “Experiments”

gold-standard

Appears in 4 sentences as: gold-standard (5)
In Response-based Learning for Grounded Machine Translation
  1. Such “un-reachable” gold-standard translations need to be replaced by “surrogate” gold-standard translations that are close to the human-generated translations and still lie within the reach of the SMT system.
    Page 4, “Response-based Online Learning”
  2. Applied to SMT, this means that we predict translations and use positive response from acting in the world to create “surrogate” gold-standard translations.
    Page 4, “Response-based Online Learning”
  3. We need to ensure that gold-standard translations lead to positive task-based feedback, that means they can
    Page 4, “Response-based Online Learning”
  4. Define y+ as a surrogate gold-standard translation that receives positive feedback, has a high model score, and a low cost.
    Page 5, “Response-based Online Learning”

machine translation

Appears in 4 sentences as: machine translation (4)
In Response-based Learning for Grounded Machine Translation
  1. We propose a novel learning approach for statistical machine translation (SMT) that allows supervision signals for structured learning to be extracted from an extrinsic response to a translation input.
    Page 1, “Abstract”
  2. In this paper, we propose a novel approach for learning and evaluation in statistical machine translation (SMT) that borrows ideas from response-based learning for grounded semantic parsing.
    Page 1, “Introduction”
  3. We suggest that in a similar way the preservation of meaning in machine translation should be defined in the context of an interaction in an extrinsic task.
    Page 1, “Introduction”
  4. Interactive scenarios have been used for evaluation purposes of translation systems for nearly 50 years, especially using human reading comprehension testing (Pfafflin, 1965; Fuji, 1999; Jones et al., 2005), and more recently, using face-to-face conversation mediated via machine translation (Sakamoto et al., 2013).
    Page 3, “Related Work”

translation system

Appears in 4 sentences as: translation system (3) translation systems (1)
In Response-based Learning for Grounded Machine Translation
  1. Interactive scenarios have been used for evaluation purposes of translation systems for nearly 50 years, especially using human reading comprehension testing (Pfafflin, 1965; Fuji, 1999; Jones et al., 2005), and more recently, using face-to-face conversation mediated via machine translation (Sakamoto et al., 2013).
    Page 3, “Related Work”
  2. Parser training includes GEOQUERY test data in order to be less dependent on parse and execution failures in the evaluation: If a translation system, response-based or reference-based, translates the German input into the gold standard English query it should be rewarded by positive task feedback.
    Page 6, “Experiments”
  3. We report BLEU (Papineni et al., 2001) of translation system output measured against the original English queries.
    Page 7, “Experiments”
  4. Furthermore, we report precision, recall, and F1-score for executing semantic parses built from translation system outputs against the GEOQUERY database.
    Page 7, “Experiments”

logical forms

Appears in 3 sentences as: logical forms (4)
In Response-based Learning for Grounded Machine Translation
  1. Recent attempts to learn semantic parsing from question-answer pairs without recourse to annotated logical forms have been presented by Kwiatowski et al.
    Page 2, “Related Work”
  2. (2012). The dataset includes 880 English questions and their logical forms.
    Page 6, “Experiments”
  3. This parser is itself based on SMT, trained on parallel data consisting of English queries and linearized logical forms, and on a language model trained on linearized logical forms.
    Page 6, “Experiments”

meaning representation

Appears in 3 sentences as: meaning representation (3)
In Response-based Learning for Grounded Machine Translation
  1. For example, in semantic parsing, the learning goal is to produce and successfully execute a meaning representation.
    Page 2, “Related Work”
  2. Embedding SMT in a semantic parsing scenario means to define translation quality by the ability of a semantic parser to construct a meaning representation from the translated query, which returns the correct answer when executed against the database.
    Page 3, “Grounding SMT in Semantic Parsing”
  3. (2010) or Goldwasser and Roth (2013) describe a response-driven learning framework for the area of semantic parsing: Here a meaning representation is “tried out” by iteratively generating system outputs, receiving feedback from world interaction, and updating the model parameters.
    Page 4, “Response-based Online Learning”

model parameters

Appears in 3 sentences as: model parameters (3)
In Response-based Learning for Grounded Machine Translation
  1. Here, learning proceeds by “trying out” translation hypotheses, receiving a response from interacting in the task, and converting this response into a supervision signal for updating model parameters.
    Page 1, “Introduction”
  2. (2010) or Goldwasser and Roth (2013) describe a response-driven learning framework for the area of semantic parsing: Here a meaning representation is “tried out” by iteratively generating system outputs, receiving feedback from world interaction, and updating the model parameters.
    Page 4, “Response-based Online Learning”
  3. This can explain the success of response-based learning: Lexical and structural variants of reference translations can be used to boost model parameters towards translations with positive feedback, while the same translations might be considered as negative examples in standard structured learning.
    Page 9, “Experiments”

model score

Appears in 3 sentences as: model score (3)
In Response-based Learning for Grounded Machine Translation
  1. (2012), inter alia) and incorporates the current model score, leading to various ramp loss objectives described in Gimpel and Smith (2012).
    Page 4, “Response-based Online Learning”
  2. The opposite of y+ is the translation y− that leads to negative feedback, has a high model score, and a high cost.
    Page 5, “Response-based Online Learning”
  3. Method 4, named REBOL, implements REsponse-Based Online Learning by instantiating y+ and y− to the form described in Section 4: In addition to the model score s, it uses a cost function c based on sentence-level BLEU (Nakov et al., 2012) and tests translation hypotheses for task-based feedback using a binary execution function e.
    Page 7, “Experiments”
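The update these snippets describe can be sketched as a perceptron-style step over a k-best list with precomputed feature vectors; the y+/y− selection follows the definitions quoted in this page, not necessarily REBOL's exact implementation:

```python
def rebol_update(w, kbest, feedback, cost, eta=0.1):
    """One response-based online update (a sketch, not REBOL's exact code).
    kbest: list of (features, translation) pairs, features as a dict.
    y+ = positive-feedback entry maximising model score minus cost;
    y- = negative-feedback entry maximising model score plus cost.
    w is moved towards phi(y+) and away from phi(y-)."""
    def score(feats):
        return sum(w.get(f, 0.0) * v for f, v in feats.items())

    pos = [(f, y) for f, y in kbest if feedback(y) == 1]
    neg = [(f, y) for f, y in kbest if feedback(y) == 0]
    if not pos or not neg:
        return w  # no informative pair on this example; leave w unchanged

    y_plus = max(pos, key=lambda fy: score(fy[0]) - cost(fy[1]))[0]
    y_minus = max(neg, key=lambda fy: score(fy[0]) + cost(fy[1]))[0]
    for f in set(y_plus) | set(y_minus):
        w[f] = w.get(f, 0.0) + eta * (y_plus.get(f, 0.0) - y_minus.get(f, 0.0))
    return w
```

Setting `cost` to the sentence-level BLEU cost yields the REBOL-style instantiation, while a constant-zero cost corresponds to learning from task feedback alone.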
