Underspecifying and Predicting Voice for Surface Realisation Ranking
Zarrieß, Sina and Cahill, Aoife and Kuhn, Jonas

Article Structure

Abstract

This paper presents a data-driven surface realisation model based on a large-scale reversible grammar of German.

Introduction

This paper presents work on modelling the usage of voice and word order alternations in a free word order language.

Related Work

2.1 Generation Background

Generation Architecture

Our data-driven methodology for investigating factors relevant to surface realisation uses a regeneration setup with two main components: a) a grammar-based component used to parse a corpus sentence and map it to all its meaning-equivalent surface realisations, b) a statistical ranking component used to select the correct realisation, i.e. the one matching the original corpus sentence.
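The two-component regeneration setup can be sketched as follows. This is a minimal, hypothetical illustration: `toy_analyse`, `toy_generate`, and `toy_score` are stand-ins for the paper's grammar-based parser/generator and trained statistical ranker, not the actual system.

```python
from itertools import permutations

def rank_candidates(candidates, score):
    """Statistical ranking component: order meaning-equivalent
    realisations by a learned score (higher is better)."""
    return sorted(candidates, key=score, reverse=True)

def regenerate(sentence, analyse, generate, score):
    """Regeneration setup: map a corpus sentence to an abstract
    representation, expand it to all meaning-equivalent surface
    realisations, and let the ranker pick the best one."""
    representation = analyse(sentence)
    candidates = generate(representation)
    return rank_candidates(candidates, score)[0]

# Toy instantiation: the 'grammar' permutes three constituents and the
# 'ranker' rewards positional agreement with a reference word order.
def toy_analyse(sentence):
    return tuple(sorted(sentence.split()))

def toy_generate(representation):
    return [" ".join(p) for p in permutations(representation)]

REFERENCE = "anna liest gern".split()

def toy_score(candidate):
    return sum(a == b for a, b in zip(candidate.split(), REFERENCE))

best = regenerate("gern liest anna", toy_analyse, toy_generate, toy_score)
# best is the candidate matching the reference word order
```

In the paper's setup, the candidate set comes from the reversible LFG grammar and the scorer is a trained ranking model rather than this toy agreement count.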

Experimental Setup

Data To obtain a sizable set of realistic corpus examples for our experiments on voice alternations, we created our own dataset of input sentences and representations, instead of building on treebank examples as Cahill et al. do.

Experiments

5.1 Exp.

Human Evaluation

Our experiment in Section 5.3 has shown that the accuracy of our linguistically informed ranking model dramatically increases when we consider the three top-ranked candidates.

Conclusion

We have presented a grammar-based generation architecture which implements the surface realisation of meaning representations abstracting from voice and word order.

Topics

word order

Appears in 27 sentences as: Word Order (1) word order (28) word ordering (1) word orders (2)
In Underspecifying and Predicting Voice for Surface Realisation Ranking
  1. We extend a syntactic surface realisation system, which can be trained to choose among word order variants, such that the candidate set includes active and passive variants.
    Page 1, “Abstract”
  2. This allows us to study the interaction of voice and word order alternations in realistic German corpus data.
    Page 1, “Abstract”
  3. This paper presents work on modelling the usage of voice and word order alternations in a free word order language.
    Page 1, “Introduction”
  4. Thus it has been demonstrated that for free word order languages like German, word order prediction quality can be improved with carefully designed, linguistically informed models capturing information-structural strategies (Filippova and Strube, 2007; Cahill and Riester, 2009).
    Page 1, “Introduction”
  5. Quite obviously, word order is only one of the means at a speaker’s disposal for expressing some content in a contextually appropriate form; we add systematic alternations like the voice alternation (active vs. passive) to the picture.
    Page 1, “Introduction”
  6. As an alternative way of promoting or demoting the prominence of a syntactic argument, its interaction with word ordering strategies in real corpus data is of high theoretical interest (Aissen, 1999; Aissen, 2003; Bresnan et al., 2001).
    Page 1, “Introduction”
  7. Our main goals are (i) to establish a corpus-based surface realisation framework for empirically investigating interactions of voice and word order in German, (ii) to design an input representation for generation capturing voice alternations in a variety of contexts, (iii) to better understand the relationship between the performance of a generation ranking model and the type of realisation candidates available in its input.
    Page 1, “Introduction”
  8. Interestingly, the properties that have been used to model argument alternations in strict word order languages like English have been identified as factors that influence word order in free word order languages like German, see Filippova and Strube (2007) for a number of pointers.
    Page 3, “Related Work”
  9. Cahill and Riester (2009) implement a model for German word order variation that approximates the information status of constituents through morphological features like definiteness, pronominalisation etc.
    Page 3, “Related Work”
  10. We are not aware of any corpus-based generation studies investigating how these properties relate to argument alternations in free word order languages.
    Page 3, “Related Work”
  11. An f-structure abstracts away from word order, i.e.
    Page 3, “Generation Architecture”

See all papers in Proc. ACL 2011 that mention word order.

See all papers in Proc. ACL that mention word order.


meaning representation

Appears in 13 sentences as: meaning representation (8) meaning representations (6)
In Underspecifying and Predicting Voice for Surface Realisation Ranking
  1. To obtain a more abstract underlying representation (in the pipeline on the right-hand side of Figure 1), the present work uses an additional semantic construction component (Crouch and King, 2006; Zarrieß, 2009) to map LFG f-structures to meaning representations.
    Page 3, “Generation Architecture”
  2. For the reverse direction, the meaning representations are mapped to f-structures which can then be mapped to surface strings by the XLE generator (Zarrieß and Kuhn, 2010).
    Page 3, “Generation Architecture”
  3. The subject of the passive and the object of the active f-structure are mapped to the same role (patient) in the meaning representation.
    Page 4, “Generation Architecture”
  4. So, when combined with the grammar, the meaning representation for (2) in Figure 2 contains implicit information about the voice of the original corpus sentence; the candidate set will not include any passive realisations.
    Page 4, “Generation Architecture”
  5. Given the standard, “analysis-oriented” meaning representation for Sentence (4) in Figure 2, the realiser will not generate an active realisation since the agent role cannot be instantiated by any phrase in the grammar.
    Page 4, “Generation Architecture”
  6. Ideally, one would like to account for these phenomena in a meaning representation that under-specifies the lexicalisation of discourse referents, and also captures the reference of implicit arguments.
    Page 4, “Generation Architecture”
  7. By means of the extended underspecification rules for voice, the sentences in (2) and (3) receive an identical meaning representation.
    Page 4, “Generation Architecture”
  8. Table 1 summarises the distribution of the voices for the heuristic meaning representation SEMh on the dataset we will introduce in Section 4, with the distribution for the naive representation SEMn in parentheses.
    Page 4, “Generation Architecture”
  9. The resulting f-structure parses are transferred to meaning representations and mapped back to f-structure charts.
    Page 5, “Experimental Setup”
  10. In our system, this correlation is modelled by a combination of linguistic properties that can be extracted from the f-structure or meaning representation and of the surface order that is read off the sentence string.
    Page 5, “Experimental Setup”
  11. We built 3 datasets from our alternation data: FS - candidates generated from the f-structure; SEMn - realisations from the naive meaning representations; SEMh - candidates from the heuristically underspecified meaning representation.
    Page 6, “Experiments”


BLEU

Appears in 7 sentences as: BLEU (8)
In Underspecifying and Predicting Voice for Surface Realisation Ranking
  1. Match 15.45 15.04 11.89 LM BLEU 0.68 0.68 0.65
    Page 6, “Experimental Setup”
  2. Model BLEU 0.764 0.759 0.747 NIST 13.18 13.14 13.01
    Page 6, “Experimental Setup”
  3. We use several standard measures: a) exact match: how often does the model select the original corpus sentence, b) BLEU: n-gram overlap between top-ranked and original sentence, c) NIST: modification of BLEU giving more weight to less frequent n-grams.
    Page 6, “Experimental Setup”
  4. The differences in BLEU between the candidate sets and models are statistically significant.
    Page 6, “Experiments”
  5. Its BLEU score and match accuracy decrease only slightly (though statistically significantly).
    Page 6, “Experiments”
  6. Features | Match BLEU | Voice Prec.
    Page 7, “Experiments”
  7. This strategy leads to a better balanced distribution of the alternations in the training data, such that our linguistically informed generation ranking model achieves high BLEU scores and accurately predicts active and passive.
    Page 9, “Conclusion”
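The clipped n-gram overlap behind BLEU, as described in the evaluation snippets above, can be sketched at sentence level. This is a simplified illustration, not the paper's evaluation code: it omits the brevity penalty and corpus-level aggregation of real BLEU.

```python
from collections import Counter
from math import exp, log

def ngrams(tokens, n):
    """All contiguous n-grams of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, reference, n):
    """Clipped n-gram precision: candidate n-grams are credited at most
    as often as they occur in the reference."""
    cand = Counter(ngrams(candidate, n))
    ref = Counter(ngrams(reference, n))
    overlap = sum(min(count, ref[g]) for g, count in cand.items())
    total = sum(cand.values())
    return overlap / total if total else 0.0

def bleu(candidate, reference, max_n=4):
    """Geometric mean of clipped 1..max_n-gram precisions
    (brevity penalty omitted in this sketch)."""
    precisions = [modified_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    return exp(sum(log(p) for p in precisions) / max_n)
```

NIST differs mainly in weighting n-grams by their information gain, so rarer n-grams contribute more to the score.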


LM

Appears in 7 sentences as: LM (9)
In Underspecifying and Predicting Voice for Surface Realisation Ranking
  1. Match 15.45 15.04 11.89 LM BLEU 0.68 0.68 0.65
    Page 6, “Experimental Setup”
  2. In Table 2, we compare the performance of the linguistically informed model described in Section 4 on the candidate sets against a random choice and a language model (LM) baseline.
    Page 6, “Experiments”
  3. In general, the linguistic model largely outperforms the LM and is less sensitive to the additional confusion introduced by the SEMh input.
    Page 6, “Experiments”
  4. Table 5 also reports the performance of an unlabelled model that additionally integrates LM scores.
    Page 8, “Experiments”
  5. The n-best evaluations even suggest that the LM scores negatively impact the ranker: the accuracy for the top 3 sentences increases much less as compared to the model that does not integrate LM scores.
    Page 8, “Experiments”
  6. Nakanishi et al. (2005) also note a negative effect of including LM scores in their model, pointing out that the LM was not trained on enough data.
    Page 8, “Human Evaluation”
  7. The corpus used for training our LM might also have been too small or distinct in genre.
    Page 8, “Human Evaluation”
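An LM baseline of the kind discussed above scores each candidate realisation with an n-gram language model and picks the most probable string. A minimal sketch, assuming a simple add-alpha smoothed bigram model (the paper's actual LM, training corpus, and smoothing are not specified here):

```python
from collections import Counter
from math import log

def train_bigram_lm(corpus, alpha=1.0):
    """Train an add-alpha smoothed bigram LM from tokenised sentences
    and return a log-probability scorer for candidate realisations."""
    unigrams, bigrams, vocab = Counter(), Counter(), set()
    for sent in corpus:
        tokens = ["<s>"] + sent + ["</s>"]
        vocab.update(tokens)
        unigrams.update(tokens[:-1])            # history counts
        bigrams.update(zip(tokens, tokens[1:]))
    V = len(vocab)

    def logprob(sent):
        tokens = ["<s>"] + sent + ["</s>"]
        return sum(log((bigrams[(a, b)] + alpha) / (unigrams[a] + alpha * V))
                   for a, b in zip(tokens, tokens[1:]))
    return logprob

# The LM baseline then ranks a candidate set by logprob and takes the top one,
# e.g. max(candidates, key=lm).
```

As the snippets above note, such a baseline can hurt a ranker when its training data is too small or mismatched in genre.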


model trained

Appears in 3 sentences as: model trained (2) models trained (1)
In Underspecifying and Predicting Voice for Surface Realisation Ranking
  1. We show that with an appropriately underspecified input, a linguistically informed realisation model trained to regenerate strings from the underlying semantic representation achieves 91.5% accuracy (over a baseline of 82.5%) in the prediction of the original voice.
    Page 1, “Abstract”
  2. In Table 4, we report the performance of ranking models trained on the different feature subsets introduced in Section 4.
    Page 7, “Experiments”
  3. The union of the features corresponds to the model trained on SEMh in Experiment 1.
    Page 7, “Experiments”


n-gram

Appears in 3 sentences as: n-gram (3)
In Underspecifying and Predicting Voice for Surface Realisation Ranking
  1. The first widely known data-driven approach to surface realisation, or tactical generation (Langkilde and Knight, 1998), used language-model n-gram statistics on a word lattice of candidate realisations to guide a ranker.
    Page 2, “Related Work”
  2. Standard n-gram features are also used. The feature model is built as follows: for every lemma in the f-structure, we extract a set of morphological properties (definiteness, person, pronominal status, etc.).
    Page 5, “Experimental Setup”
  3. We use several standard measures: a) exact match: how often does the model select the original corpus sentence, b) BLEU: n-gram overlap between top-ranked and original sentence, c) NIST: modification of BLEU giving more weight to less frequent n-grams.
    Page 6, “Experimental Setup”
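The per-lemma morphological feature extraction described above can be illustrated with a small sketch. Everything here is hypothetical: the flat dict-shaped f-structure and the `prop=value` indicator scheme are illustrative stand-ins for the system's actual f-structure representation and feature templates.

```python
from collections import Counter

def lemma_features(morph):
    """One indicator feature per morphological property of a lemma,
    e.g. definiteness, person, pronominal status."""
    return {f"{prop}={value}" for prop, value in morph.items()}

def sentence_features(fstructure):
    """Aggregate indicator features over all lemmas of a (here: flat,
    dict-shaped) f-structure into a sparse count vector."""
    feats = Counter()
    for lemma, morph in fstructure.items():
        feats.update(lemma_features(morph))
    return feats

# e.g. sentence_features({"buch": {"def": "+", "pers": "3"},
#                         "er":   {"pron": "+"}})
```

A ranker can then combine such sparse morphological features with the surface n-gram features mentioned above.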


NIST

Appears in 3 sentences as: NIST (3)
In Underspecifying and Predicting Voice for Surface Realisation Ranking
  1. NIST 13.01 12.95 12.69 Match 27.91 27.66 26.38 Ling.
    Page 6, “Experimental Setup”
  2. Model BLEU 0.764 0.759 0.747 NIST 13.18 13.14 13.01
    Page 6, “Experimental Setup”
  3. We use several standard measures: a) exact match: how often does the model select the original corpus sentence, b) BLEU: n-gram overlap between top-ranked and original sentence, c) NIST: modification of BLEU giving more weight to less frequent n-grams.
    Page 6, “Experimental Setup”
