Using subcategorization knowledge to improve case prediction for translation to German
Weller, Marion and Fraser, Alexander and Schulte im Walde, Sabine

Article Structure

Abstract

This paper demonstrates the need for, and the impact of, subcategorization information in SMT.

Introduction

When translating from a morphologically poor language to a morphologically rich language we are faced with two major problems: (i) the richness of the target-language morphology causes data sparsity problems, and (ii) information about morphological features on the target side is not sufficiently contained in the source language morphology.

Previous work

Previous work has already introduced the idea of generating inflected forms as a postprocessing step for a translation system that has been stripped of (most) target-language-specific features.

Translation pipeline

This section presents an overview of our two-step translation process.

Using subcategorization information

Within the area of (automatic) lexical acquisition, the definition of lexical verb information has been a major focus, because verbs play a central role for the structure and the meaning of sentences and

Experiments and evaluation

In this section, we present experiments using different feature combinations.

Conclusion

We illustrated the necessity of using external knowledge sources such as subcategorization information for modelling case in English-to-German translation.

Topics

BLEU

Appears in 8 sentences as: BLEU (8)
  1. We present three types of evaluation: BLEU scores (Papineni et al., 2001), prediction accuracy on clean data and a manual evaluation of the best system in section 5.3.
    Page 7, “Experiments and evaluation”
  2. Table 5 gives results in case-insensitive BLEU.
    Page 7, “Experiments and evaluation”
  3. While the inflection prediction systems (1-4) are significantly better than the surface-form system (0), the different versions of the inflection systems are not distinguishable in terms of BLEU; however, our manual evaluation shows that the new features have a positive impact on translation quality.
    Page 7, “Experiments and evaluation”
  4. Using tuples extracted from the target-side parse tree (produced by the decoder) results in a BLEU score of 14.00.
    Page 7, “Experiments and evaluation”
  5. One problem with using BLEU as an evaluation metric is that it is a precision-oriented metric and tends to reward fluency rather than adequacy (see (Wu and Fung, 2009a; Liu and Gildea, 2010)).
    Page 8, “Experiments and evaluation”
  6. As we are working on improving adequacy, this will not be fully reflected by BLEU.
    Page 8, “Experiments and evaluation”
  7. While the case marking of NPs is essential for comprehensibility, one changed word per noun phrase is hardly enough to be reflected by BLEU.
    Page 8, “Experiments and evaluation”
  8. This shows that the additional features improve the model, but also that a gain in prediction accuracy on clean data is not necessarily related to a gain in BLEU.
    Page 8, “Experiments and evaluation”
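
The sentences above discuss case-insensitive BLEU and why a single changed case ending per noun phrase barely moves the score. As a rough, self-contained illustration of what the metric rewards, here is a sketch of sentence-level modified n-gram precision with a brevity penalty; it is not the implementation used in the paper (real evaluations are corpus-level and typically smoothed), and the hypothesis/reference pair is invented.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def sentence_bleu(hypothesis, reference, max_n=4):
    """Case-insensitive BLEU for one sentence pair (illustration only)."""
    hyp, ref = hypothesis.lower().split(), reference.lower().split()
    precisions = []
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = Counter(ngrams(hyp, n)), Counter(ngrams(ref, n))
        # Modified precision: clip hypothesis n-gram counts by reference counts.
        clipped = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        precisions.append(max(clipped, 1e-9) / max(sum(hyp_counts.values()), 1))
    # Brevity penalty for hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

# One wrong case ending ("den" instead of "dem") leaves most n-grams intact:
reference = "er gab dem Mitarbeiter den Bericht"
hypothesis = "er gab den Mitarbeiter den Bericht"
print(round(sentence_bleu(hypothesis, reference), 4))
```

With only one article differing, most lower-order n-grams still match, which is why case errors of this kind are largely invisible to BLEU.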

CRF

Appears in 8 sentences as: CRF (8)
  1. For case prediction, they trained a CRF with access to lemmas and POS-tags within a given window.
    Page 3, “Previous work”
  2. While the CRF used for case prediction in Fraser et al.
    Page 3, “Previous work”
  3. Morphological features are predicted on four separate CRF models, one for each feature.
    Page 4, “Translation pipeline”
  4. Instead, we limit the power of the CRF model through experimenting with the removal of features, until we had a system that was robust to this problem.
    Page 4, “Translation pipeline”
  5. subject, object or modifier) are passed on to the system at the level of nouns and integrated into the CRF through the derived probabilities.
    Page 5, “Using subcategorization information”
  6. The classification task of the CRF consists in predicting a sequence of labels: case values for NPs/DPs or no value otherwise, cf.
    Page 6, “Using subcategorization information”
  7. In addition to the probability/frequency of the respective functions, we also provide the CRF with bigrams containing the two parts of the tuple,
    Page 6, “Using subcategorization information”
  8. By providing the parts of the tuple as unigrams, bigrams or trigrams to the CRF, all relevant information is available: verb, noun and the probabilities for the potential functions of the noun in the sentence.
    Page 6, “Using subcategorization information”
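
Case is predicted as a sequence-labelling task: the CRF sees stems, tags and the subcategorization-derived scores as unigram, bigram and trigram features inside a fixed window. The sketch below shows how such per-token feature dictionaries might be assembled; the feature names, window handling and toy subcat scores are illustrative assumptions, not the paper's actual feature templates, and any CRF toolkit that accepts dictionary features (e.g. sklearn-crfsuite) could consume the output.

```python
def token_features(sent, i, window=4):
    """Feature dictionary for token i; sent is a list of
    (stem, tag, subcat_scores) triples. Illustrative feature set only."""
    stem, tag, subcat = sent[i]
    feats = {"stem": stem, "tag": tag}
    if subcat is not None:
        # e.g. {"nom": 0.05, "acc": 0.80, "dat": 0.15} for a verb-noun tuple
        for case, score in subcat.items():
            feats["subcat_" + case] = score
    # Neighbouring stems/tags within the window, plus a stem bigram.
    for offset in range(-window, window + 1):
        j = i + offset
        if offset != 0 and 0 <= j < len(sent):
            feats["stem[%+d]" % offset] = sent[j][0]
            feats["tag[%+d]" % offset] = sent[j][1]
    if i > 0:
        feats["stem_bigram"] = sent[i - 1][0] + "|" + stem
    return feats

# One stemmed clause; the object noun carries hypothetical subcat scores:
sent = [("er", "PPER", None), ("geben", "VVFIN", None),
        ("Bericht", "NN", {"nom": 0.05, "acc": 0.80, "dat": 0.15})]
X = [[token_features(sent, i) for i in range(len(sent))]]  # one training sequence
y = [["Nom", "-", "Acc"]]                                   # gold case labels
```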

manual evaluation

Appears in 8 sentences as: Manual evaluation (2), manual evaluation (6)
  1. A manual evaluation of an English-to-German translation task shows that the subcategorization information has a positive impact on translation quality through better prediction of case.
    Page 1, “Abstract”
  2. We also present a manual evaluation of our best system which shows that the new features improve translation quality.
    Page 7, “Experiments and evaluation”
  3. We present three types of evaluation: BLEU scores (Papineni et al., 2001), prediction accuracy on clean data and a manual evaluation of the best system in section 5.3.
    Page 7, “Experiments and evaluation”
  4. While the inflection prediction systems (1-4) are significantly better than the surface-form system (0), the different versions of the inflection systems are not distinguishable in terms of BLEU; however, our manual evaluation shows that the new features have a positive impact on translation quality.
    Page 7, “Experiments and evaluation”
  5. 5.3 Manual evaluation of the best system
    Page 8, “Experiments and evaluation”
  6. In order to provide a better understanding of the impact of the presented features, in particular to see whether there is an improvement in adequacy, we carried out a manual evaluation comparing sys-
    Page 8, “Experiments and evaluation”
  7. Table 6: Manual evaluation of 46 sentences: without (a) and with (b) access to EN input, and the annotators’ agreement in the second part (c).
    Page 8, “Experiments and evaluation”
  8. We showed in a manual evaluation that the proposed features have a positive impact on translation quality.
    Page 9, “Conclusion”
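
Table 6 also reports the annotators' agreement for the pairwise judgements. The quoted text does not say which agreement measure was used; as one common option, here is a small sketch of raw observed agreement and Cohen's kappa between two annotators (the example judgements are invented).

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators on the same items."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[l] * freq_b[l] for l in set(labels_a) | set(labels_b)) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical judgements ("+" = system (b) better, "-" = worse, "=" = equal):
a = ["+", "+", "=", "-", "+", "=", "+", "-"]
b = ["+", "=", "=", "-", "+", "+", "+", "-"]
print(round(cohens_kappa(a, b), 2))
```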

translation system

Appears in 8 sentences as: translation system (7), translation systems (1)
  1. We first replace inflected forms by their stems or lemmas: building a translation system on a stemmed representation of the target side leads to a simpler translation task, and the morphological information contained in the source and target language parts of the translation model is more balanced.
    Page 1, “Introduction”
  2. Previous work has already introduced the idea of generating inflected forms as a postprocessing step for a translation system that has been stripped of (most) target-language-specific features.
    Page 3, “Previous work”
  3. (2010) built translation systems that predict inflected word forms based on a large array of morphological and syntactic features, obtained from both source and target side.
    Page 3, “Previous work”
  4. as a hierarchical machine translation system using a string-to-tree setup.
    Page 3, “Previous work”
  5. We use a hierarchical translation system.
    Page 4, “Translation pipeline”
  6. We use the hierarchical translation system that comes with the Moses SMT-package and GIZA++ to compute the word alignment, using the “grow-diag-final-and” heuristics.
    Page 7, “Experiments and evaluation”
  7. We report results of two types of systems (table 5): first, a regular translation system built on surface forms (i.e., normal text) and second, four inflection prediction systems.
    Page 7, “Experiments and evaluation”
  8. We presented a translation system making use of a subcategorization database together with source-side features.
    Page 9, “Conclusion”
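
The sentences above describe a two-step setup: first translate into a stemmed German representation (with the Moses hierarchical system), then predict morphological features and generate the inflected surface forms. Below is a purely schematic sketch of that flow; the callables and the markup shown in the comments are placeholders, not the actual pipeline components.

```python
def translate_two_step(english_sentence, smt_decode, predict_features, inflect):
    """Schematic two-step translation: stemmed SMT output first, then
    morphological feature prediction and generation of inflected forms.
    All three callables are placeholders for the real components."""
    # Step 1: decode into stemmed German; nouns carry number/gender markup,
    # e.g. ["der", "Bericht<Masc><Sg>", ...]
    stemmed = smt_decode(english_sentence)
    # Step 2a: predict one value per morphological feature per token,
    # e.g. [{"case": "Nom"}, {"case": "Nom"}, ...] (one CRF model per feature).
    features = predict_features(stemmed)
    # Step 2b: generate inflected surface forms from stem + predicted features.
    return [inflect(tok, feat) for tok, feat in zip(stemmed, features)]
```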

translation quality

Appears in 7 sentences as: translation quality (7)
  1. A manual evaluation of an English-to-German translation task shows that the subcategorization information has a positive impact on translation quality through better prediction of case.
    Page 1, “Abstract”
  2. Integrating semantic role information into SMT has been demonstrated by various researchers to improve translation quality (cf.
    Page 2, “Introduction”
  3. This outcome, in particular that adding the lemma of the preposition to the PP node helps to improve translation quality, has been observed before in tree restructuring work for improving translation (Huang and Knight, 2006).
    Page 4, “Translation pipeline”
  4. Preliminary experiments showed that larger windows do not improve translation quality.
    Page 4, “Translation pipeline”
  5. We also present a manual evaluation of our best system which shows that the new features improve translation quality.
    Page 7, “Experiments and evaluation”
  6. While the inflection prediction systems (1-4) are significantly better than the surface-form system (0), the different versions of the inflection systems are not distinguishable in terms of BLEU; however, our manual evaluation shows that the new features have a positive impact on translation quality.
    Page 7, “Experiments and evaluation”
  7. We showed in a manual evaluation that the proposed features have a positive impact on translation quality.
    Page 9, “Conclusion”

noun phrases

Appears in 6 sentences as: noun phrases (8)
  1. In this paper, we focus on improving case prediction for noun phrases (NPs) in German translations.
    Page 1, “Introduction”
  2. German sentences exhibit a freer constituent order, and thus case is an important indicator of the grammatical functions of noun phrases .
    Page 1, “Introduction”
  3. In all four examples, the verb and the participating noun phrases Mitarbeiter (employee), Kollege (colleague) and Bericht (report) are identical, and the noun phrases are assigned the same case.
    Page 2, “Introduction”
  4. However, given that the stemmed output of the translation does not tell us anything about case features, in order to predict the appropriate cases of the three noun phrases, we either rely on ordering heuristics (such that the nominative NP is more likely to be in the beginning of the sentence (the German Vorfeld) than the accusative or dative NP, even though all three of these would be grammatical), or we need fine-grained subcategorization information beyond pure syntax.
    Page 2, “Introduction”
  5. Verb-noun tuples referring to specific syntactic functions within verb subcategorization (verb-noun subcat case prediction) are integrated with an associated probability for accusative (direct object), dative (indirect object) and nominative (subject). Further to the subject and object noun phrases, the subcategorization information provides quantitative triples for verb-preposition-noun pairs, thus predicting the case of NPs within prepositional phrases (we do this only when the prepositions are ambiguous, i.e., they could subcategorize either a dative or an accusative NP).
    Page 5, “Using subcategorization information”
  6. In addition to modelling subcategorization information, it is also important to differentiate between subcategorized noun phrases (such as object or subject), and noun phrases
    Page 5, “Using subcategorization information”
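
For each verb-noun tuple, the subcategorization database records how often the noun occurs as subject, direct object or indirect object, and these relative frequencies back the case decision when word order alone is ambiguous. A minimal sketch with invented numbers; the verb schicken and the probabilities are only meant to mirror the Mitarbeiter/Kollege/Bericht example, not the actual resource.

```python
# Hypothetical relative frequencies of syntactic functions per verb-noun tuple,
# mapped to the German cases nominative / accusative / dative.
SUBCAT = {
    ("schicken", "Mitarbeiter"): {"nom": 0.55, "acc": 0.10, "dat": 0.35},
    ("schicken", "Kollege"):     {"nom": 0.20, "acc": 0.15, "dat": 0.65},
    ("schicken", "Bericht"):     {"nom": 0.05, "acc": 0.90, "dat": 0.05},
}

def most_likely_case(verb, noun, default="nom"):
    """Pick the case with the highest subcat probability; fall back to a
    default (e.g. driven by word order heuristics) for unseen tuples."""
    scores = SUBCAT.get((verb, noun))
    return max(scores, key=scores.get) if scores else default

for noun in ("Mitarbeiter", "Kollege", "Bericht"):
    print(noun, most_likely_case("schicken", noun))
```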

Machine Translation

Appears in 4 sentences as: Machine Translation (2), machine translation (2)
  1. as a hierarchical machine translation system using a string-to-tree setup.
    Page 3, “Previous work”
  2. (2012) evaluated user reactions to different error types in machine translation and came to the result that morphological
    Page 3, “Previous work”
  3. English/German data released for the 2009 ACL Workshop on Machine Translation shared task.
    Page 7, “Experiments and evaluation”
  4. This work was funded by the DFG Research Project Distributional Approaches to Semantic Relatedness (Marion Weller), the DFG Heisenberg Fellowship SCHU-2580/1-1 (Sabine Schulte im Walde), as well as by the Deutsche Forschungsgemeinschaft grant Models of Morphosyntax for Statistical Machine Translation (Alexander Fraser).
    Page 9, “Conclusion”

translation model

Appears in 4 sentences as: translation model (4)
  1. We first replace inflected forms by their stems or lemmas: building a translation system on a stemmed representation of the target side leads to a simpler translation task, and the morphological information contained in the source and target language parts of the translation model is more balanced.
    Page 1, “Introduction”
  2. We use this representation to create the stemmed representation for training the translation model.
    Page 3, “Translation pipeline”
  3. In addition to the translation model, the target-side language model, as well as the reference data for parameter tuning use this representation.
    Page 3, “Translation pipeline”
  4. 3.2 Building a stemmed translation model
    Page 4, “Translation pipeline”
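
On the stemmed target side, gender is kept as part of the stem while number is carried as markup. The paper's exact markup format is not reproduced in the quoted text; the sketch below only illustrates the idea with an assumed tag notation and a tiny hand-written analysis lexicon.

```python
# Tiny illustrative analysis lexicon: surface form -> (stem incl. gender, number).
ANALYSES = {
    "Bericht":      ("Bericht<Masc>", "Sg"),
    "Berichte":     ("Bericht<Masc>", "Pl"),
    "Mitarbeitern": ("Mitarbeiter<Masc>", "Pl"),
}

def stem_noun(surface):
    """Replace an inflected noun by its stem (gender folded into the stem)
    plus a number mark, as used for training the stemmed translation model."""
    stem, number = ANALYSES.get(surface, (surface, "Sg"))
    return "%s<%s>" % (stem, number)

print(stem_noun("Berichte"))      # Bericht<Masc><Pl>
print(stem_noun("Mitarbeitern"))  # Mitarbeiter<Masc><Pl>
```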

word alignment

Appears in 4 sentences as: word alignment (4)
  1. We consider gender as part of the stem, whereas the value for number is derived from the source-side: if marked for number, singular/plural nouns are distinguished during word alignment and then translated accordingly.
    Page 3, “Translation pipeline”
  2. Figure 1: Deriving features from dependency-parsed English data via the word alignment.
    Page 6, “Using subcategorization information”
  3. to the SMT output via word alignment.
    Page 7, “Using subcategorization information”
  4. We use the hierarchical translation system that comes with the Moses SMT-package and GIZA++ to compute the word alignment, using the “grow-diag-final-and” heuristics.
    Page 7, “Experiments and evaluation”
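
Source-side syntactic functions are carried over to the German output through the word alignment (cf. Figure 1). Here is a minimal sketch of that projection step, assuming the alignment is available as (source index, target index) pairs and the English dependency labels have already been extracted; the example sentence pair and labels are invented.

```python
def project_functions(src_functions, alignment, tgt_len):
    """Map source-side dependency functions onto target tokens via the
    word alignment; unaligned target tokens receive no label."""
    projected = [None] * tgt_len
    for src_i, tgt_j in alignment:
        label = src_functions.get(src_i)
        if label is not None:
            projected[tgt_j] = label
    return projected

# "the employee sent the report" -> "der Mitarbeiter schickte den Bericht"
src_functions = {1: "subj", 4: "obj"}            # employee = subj, report = obj
alignment = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]
print(project_functions(src_functions, alignment, tgt_len=5))
```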

bigrams

Appears in 3 sentences as: bigrams (3)
  1. The model has access to the basic features stem and tag, as well as the new features based on subcategorization information (explained below), using unigrams within a window of up to four positions to the right and the left of the current position, as well as bigrams and trigrams for stems and tags (current item + left and/or right item).
    Page 6, “Using subcategorization information”
  2. In addition to the probability/frequency of the respective functions, we also provide the CRF with bigrams containing the two parts of the tuple,
    Page 6, “Using subcategorization information”
  3. By providing the parts of the tuple as unigrams, bigrams or trigrams to the CRF, all relevant information is available: verb, noun and the probabilities for the potential functions of the noun in the sentence.
    Page 6, “Using subcategorization information”

co-occurrence

Appears in 3 sentences as: co-occurrence (3)
  1. In contrast, noun modification in noun-nounGen constructions is represented by co-occurrence frequencies.
    Page 5, “Using subcategorization information”
  2. and also f = 0: this representation allows for a more fine-grained distinction in the low-to-mid frequency range, providing a good basis for the decision of whether a given noun-noun pair is a true noun-nounGen structure or just a random co-occurrence of two nouns.
    Page 5, “Using subcategorization information”
  3. The word Technologie (technology) has been marked as a candidate for a genitive in a noun-nounGen construction; the co-occurrence frequency of the tuple Einführung-Technologie (introduction - technology) lies in the bucket 11. .
    Page 6, “Using subcategorization information”
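
Noun-noun genitive candidates are not scored with probabilities but with bucketed co-occurrence frequencies, keeping the low-to-mid frequency range distinguishable (with f = 0 as its own bucket). The paper's exact bucket boundaries are only partly visible in the quoted text (one bucket starts at 11); the sketch below just illustrates the idea with assumed boundaries.

```python
import bisect

# Assumed bucket boundaries; a pair seen 14 times falls into the "11-20" bucket,
# and f = 0 is kept as its own bucket so unseen pairs stay distinguishable.
BOUNDARIES = [1, 4, 11, 21, 51, 101]
LABELS = ["0", "1-3", "4-10", "11-20", "21-50", "51-100", ">100"]

def frequency_bucket(count):
    """Map a raw noun-noun co-occurrence count to a coarse frequency bucket."""
    return LABELS[bisect.bisect_right(BOUNDARIES, count)]

print(frequency_bucket(0))    # "0"
print(frequency_bucket(14))   # "11-20"
```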

im

Appears in 3 sentences as: im (3)
  1. Briscoe and Carroll (1997) for English; Sarkar and Zeman (2000) for Czech; Schulte im Walde (2002a) for German; Messiant (2008) for French.
    Page 5, “Using subcategorization information”
  2. (2013); the newspaper data (HGC - Huge German Corpus) was parsed with Schmid (2000), and subcategorization information was extracted as described in Schulte im Walde (2002b).
    Page 7, “Experiments and evaluation”
  3. This work was funded by the DFG Research Project Distributional Approaches to Semantic Relatedness (Marion Weller), the DFG Heisenberg Fellowship SCHU-2580/1-1 (Sabine Schulte im Walde), as well as by the Deutsche Forschungsgemeinschaft grant Models of Morphosyntax for Statistical Machine Translation (Alexander Fraser).
    Page 9, “Conclusion”

semantic role

Appears in 3 sentences as: semantic role (3), semantic roles (1)
  1. The combination of our two models approaches a simplified level of semantic role definition but only relies on dependency information that is considerably easier and cheaper to define and obtain than a very high quality semantic parser and/or a corpus annotated with semantic role information.
    Page 2, “Introduction”
  2. Integrating semantic role information into SMT has been demonstrated by various researchers to improve translation quality (cf.
    Page 2, “Introduction”
  3. Wu and Fung (2009b) who demonstrated that on the one hand 84% of verb syntactic functions in a 50-sentence test corpus projected from Chinese to English, and that on the other hand about 15% of the subjects were not translated into subjects, but their semantic roles were preserved across languages.
    Page 3, “Introduction”

SMT system

Appears in 3 sentences as: SMT system (3)
  1. The contribution of this paper is to improve the prediction of case in our SMT system by implementing and combining two alternative routes to integrate subcategorization information from the syntax-semantic interface: (i) We regard the translation as a function of the source language input, and project the syntactic functions of the English nouns to their German translations in the
    Page 2, “Introduction”
  2. Table 2 illustrates the different steps of the inflection process: the markup (number and gender on nouns) in the stemmed output of the SMT system is part of the input to the respective feature prediction.
    Page 4, “Translation pipeline”
  3. In contrast, the SMT system often produces more isomorphic translations, which is helpful for annotating source-side features on the target language.
    Page 7, “Using subcategorization information”
