Combining Referring Expression Generation and Surface Realization: A Corpus-Based Investigation of Architectures
Zarriess, Sina and Kuhn, Jonas

Article Structure

Abstract

We suggest a generation task that integrates discourse-level referring expression generation and sentence-level surface realization.

Introduction

Generating well-formed linguistic utterances from an abstract nonlinguistic input involves making a multitude of conceptual, discourse-level as well as sentence-level, lexical and syntactic decisions.

Related Work

Despite the common view of NLG as a pipeline process, it is a well-known problem that high-level, conceptual knowledge and low-level linguistic knowledge are tightly interleaved (Danlos, 1984; Mellish et al., 2000).

The Data Set

The data set for our generation experiments consists of 200 newspaper articles about robbery events.

Generation Systems

Our main goal is to investigate different architectures for combined surface realization and referring expression generation.

Experiments

In this experimental section, we provide a corpus-based evaluation of the generation components and architectures introduced in Section 4.

Conclusion

We have presented a data-driven approach for investigating generation architectures that address discourse-level reference and sentence-level syntax and word order.

Topics

BLEU

Appears in 16 sentences as: BLEU (18) BLEUT (2)
In Combining Referring Expression Generation and Surface Realization: A Corpus-Based Investigation of Architectures
  1. BLEU, sentence-level geometric mean of 1- to 4-gram precision, as in (Belz et al., 2011)
    Page 7, “Experiments”
  2. BLEUT, sentence-level BLEU computed on post-processed output where predicted referring expressions for victim and perp are replaced in the sentences (both gold and predicted) by their original role label; this score does not penalize lexical mismatches between corpus and system REs
    Page 7, “Experiments”
  3. When REG and linearization are applied on shallowSyn_re with gold shallow trees, the BLEU score is lower (60.57) compared to the system that applies syntax and linearization on deepSyn+re, deep trees with gold REs (BLEU score of 63.9).
    Page 7, “Experiments”
  4. However, the BLEUT
    Page 7, “Experiments”
  5. Moreover, the BLEUT
    Page 7, “Experiments”
  6. score for the REG→LIN system comes close to the upper bound that applies linearization on linSyn+re, gold shallow trees with gold REs (BLEUT of 72.4), whereas the difference in standard BLEU and NIST is high.
    Page 7, “Experiments”
  7. This effect indicates that the RE prediction mostly decreases BLEU due to lexical mismatches, whereas the syntax prediction is more likely to have a negative impact on final linearization.
    Page 7, “Experiments”
  8. Input System BLEU NIST BLEUT
    Page 8, “Experiments”
  9. The fact that this architecture significantly improves the BLEU, NIST and the BLEUT
    Page 8, “Experiments”
  10. The fact that also the BLEUT
    Page 8, “Experiments”
  11. For the first pipeline, the system with a separate treatment of implicit referents significantly outperforms the joint system in terms of BLEU.
    Page 8, “Experiments”
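The BLEU variant quoted in item 1, the sentence-level geometric mean of 1- to 4-gram precision, can be sketched as follows. This is a minimal illustration rather than the implementation used by Belz et al. (2011); the add-one smoothing and the brevity penalty are assumptions added here so that short sentences do not zero out the score:

```python
from collections import Counter
from math import exp, log

def sentence_bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU: geometric mean of 1- to 4-gram precisions,
    times a brevity penalty, with add-one smoothing (an assumption)."""
    precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
        ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
        # Clipped n-gram matches: each candidate n-gram counts at most as
        # often as it occurs in the reference.
        overlap = sum(min(c, ref[g]) for g, c in cand.items())
        total = max(sum(cand.values()), 1)
        precisions.append((overlap + 1) / (total + 1))  # add-one smoothing
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(candidate) >= len(reference) else exp(1 - len(reference) / max(len(candidate), 1))
    return bp * exp(sum(log(p) for p in precisions) / max_n)
```

For identical candidate and reference token lists the sketch returns 1.0; any n-gram mismatch or length shortfall lowers the score.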

See all papers in Proc. ACL 2013 that mention BLEU.

See all papers in Proc. ACL that mention BLEU.


word order

Appears in 9 sentences as: word order (9)
In Combining Referring Expression Generation and Surface Realization: A Corpus-Based Investigation of Architectures
  1. Our main goal is to investigate how different architectural setups account for interactions between generation decisions at the level of referring expressions (REs), syntax and word order.
    Page 1, “Introduction”
  2. Zarrieß et al. (2012) have recently argued that the good performance of these linguistically motivated word order models, which exploit morpho-syntactic features of noun phrases (i.e.
    Page 3, “Related Work”
  3. REG is carried out prior to surface realization such that the RE component does not have access to surface syntax or word order whereas the SYN component has access to fully specified RE slots.
    Page 5, “Generation Systems”
  4. In this case, REG has access to surface syntax without word order but the surface realization is trained and applied on trees with underspecified RE slots.
    Page 6, “Generation Systems”
  5. The error propagation effects that we find in the first and second pipeline architecture clearly show that decisions at the levels of syntax, reference and word order interact, otherwise their predic-
    Page 7, “Experiments”
  6. Table 4 shows the performance of the REG module on varying input layers, providing a more detailed analysis of the interaction between RE, syntax and word order.
    Page 8, “Experiments”
  7. These results strengthen the evidence from the previous experiment that decisions at the level of syntax, reference and word order are interleaved.
    Page 8, “Experiments”
  8. The results presented in the preceding evaluations consistently show the tight connections between decisions at the level of reference, syntax and word order.
    Page 9, “Experiments”
  9. We have presented a data-driven approach for investigating generation architectures that address discourse-level reference and sentence-level syntax and word order.
    Page 9, “Conclusion”


sentence-level

Appears in 7 sentences as: sentence-level (7)
In Combining Referring Expression Generation and Surface Realization: A Corpus-Based Investigation of Architectures
  1. We suggest a generation task that integrates discourse-level referring expression generation and sentence-level surface realization.
    Page 1, “Abstract”
  2. Generating well-formed linguistic utterances from an abstract nonlinguistic input involves making a multitude of conceptual, discourse-level as well as sentence-level, lexical and syntactic decisions.
    Page 1, “Introduction”
  3. We integrate a discourse-level approach to REG with sentence-level surface realization in a data-driven framework.
    Page 1, “Introduction”
  4. BLEU, sentence-level geometric mean of 1- to 4-gram precision, as in (Belz et al., 2011)
    Page 7, “Experiments”
  5. NIST, sentence-level n-gram overlap weighted in favour of less frequent n-grams, as in (Belz et al., 2011)
    Page 7, “Experiments”
  6. BLEUT, sentence-level BLEU computed on post-processed output where predicted referring expressions for victim and perp are replaced in the sentences (both gold and predicted) by their original role label; this score does not penalize lexical mismatches between corpus and system REs
    Page 7, “Experiments”
  7. We have presented a data-driven approach for investigating generation architectures that address discourse-level reference and sentence-level syntax and word order.
    Page 9, “Conclusion”
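The BLEUT post-processing described in item 6, replacing the referring expressions for the victim and perp roles by their role labels before scoring, could be sketched as a longest-match substitution. Both the helper name mask_roles and the RE-to-label lexicon are hypothetical stand-ins; the paper does not publish this code:

```python
def mask_roles(tokens, re_lexicon):
    """Replace known referring-expression spans by their role label.

    re_lexicon maps an RE string (e.g. "the young man") to a role label
    ("victim" or "perp"). Greedy longest-match, left to right."""
    out, i = [], 0
    while i < len(tokens):
        # Try the longest possible span starting at position i first.
        for length in range(len(tokens) - i, 0, -1):
            span = " ".join(tokens[i:i + length])
            if span in re_lexicon:
                out.append(re_lexicon[span])
                i += length
                break
        else:  # no RE span starts here; keep the token as-is
            out.append(tokens[i])
            i += 1
    return out
```

Applying this to both gold and predicted sentences before computing BLEU yields a score that ignores lexical variation inside the REs, as the BLEUT definition requires.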


NIST

Appears in 6 sentences as: NIST (6)
In Combining Referring Expression Generation and Surface Realization: A Corpus-Based Investigation of Architectures
  1. NIST, sentence-level n-gram overlap weighted in favour of less frequent n-grams, as in (Belz et al., 2011)
    Page 7, “Experiments”
  2. score for the REG→LIN system comes close to the upper bound that applies linearization on linSyn+re, gold shallow trees with gold REs (BLEUT of 72.4), whereas the difference in standard BLEU and NIST is high.
    Page 7, “Experiments”
  3. Input System BLEU NIST BLEUT
    Page 8, “Experiments”
  4. The fact that this architecture significantly improves the BLEU, NIST and the BLEUT
    Page 8, “Experiments”
  5. This has a small positive effect on the BLEUT score and a small negative effect on the plain BLEU and NIST score.
    Page 8, “Experiments”
  6. Joint System          BLEU    NIST    BLEUT
     + 1st pipeline        54.65   11.30   59.95
     - 1st pipeline        55.38   11.48   59.52
     + Revision            56.31   11.42   61.30
     - Revision            56.42   11.54   60.52
     - Parallel+Revision   56.29   11.51   60.63
    Page 9, “Experiments”
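The weighting "in favour of less frequent n-grams" in item 1 comes from NIST information scores estimated on reference data: Info(w1..wn) = log2(count(w1..w(n-1)) / count(w1..wn)), so n-grams that are rare given their history get higher weight. The sketch below computes only these weights; the full NIST score (match counting, averaging over n, brevity penalty) is omitted:

```python
from collections import Counter
from math import log2

def nist_info_weights(reference_corpus, max_n=5):
    """Information weight of each n-gram seen in a reference corpus
    (a list of token lists), per the NIST weighting scheme."""
    counts = Counter()
    for sent in reference_corpus:
        for n in range(1, max_n + 1):
            for i in range(len(sent) - n + 1):
                counts[tuple(sent[i:i + n])] += 1
    total = sum(len(s) for s in reference_corpus)  # unigram history = corpus size
    info = {}
    for gram, c in counts.items():
        # History count: the (n-1)-gram prefix, or the corpus size for unigrams.
        hist = counts[gram[:-1]] if len(gram) > 1 else total
        info[gram] = log2(hist / c)
    return info
```

A matched n-gram then contributes its information weight to the score, which is why frequent function-word n-grams count for less than rare content n-grams.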


BLEU score

Appears in 5 sentences as: BLEU score (3) BLEU scores (1) BLEUT score (2)
In Combining Referring Expression Generation and Surface Realization: A Corpus-Based Investigation of Architectures
  1. When REG and linearization are applied on shallowSyn_re with gold shallow trees, the BLEU score is lower (60.57) compared to the system that applies syntax and linearization on deepSyn+re, deep trees with gold REs (BLEU score of 63.9).
    Page 7, “Experiments”
  2. The revision-based system with disjoint modelling of implicits shows a slight, nonsignificant increase in BLEU score.
    Page 8, “Experiments”
  3. By contrast, the BLEUT score is significantly better for the joint approach.
    Page 8, “Experiments”
  4. This has a small positive effect on the BLEUT score and a small negative effect on the plain BLEU and NIST score.
    Page 8, “Experiments”
  5. It is likely that the BLEU scores do not capture the magnitude of the differences in text quality illustrated by the Examples (5-6).
    Page 9, “Experiments”


subtrees

Appears in 5 sentences as: subtrees (5)
In Combining Referring Expression Generation and Surface Realization: A Corpus-Based Investigation of Architectures
  1. In the final representation of our data set, we integrate the RE and deep syntax annotation by replacing subtrees corresponding to an RE span.
    Page 4, “The Data Set”
  2. All RE subtrees for a referent in a text are collected in a candidate list which is initialized with three default REs: (i) a pronoun, (ii) a default nominal (e.g. "the victim"), (iii) the empty RE.
    Page 4, “The Data Set”
  3. In contrast to the GREC data sets, our RE candidates are not represented as the original surface strings, but as non-linearized subtrees.
    Page 4, “The Data Set”
  4. (shallowSyn_re) 3. unordered RE subtrees 4. linearized, fully specified surface trees (linSyn+re)
    Page 4, “The Data Set”
  5. Similar to the syntax component, the REG module is implemented as a ranker that selects surface RE subtrees for a given referential slot in a deep or shallow dependency tree.
    Page 5, “Generation Systems”
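Item 5 describes the REG module as a ranker that selects an RE subtree for a referential slot. A toy linear scorer conveys the idea; the candidates are shown as strings rather than subtrees, and the features and weights are hypothetical stand-ins for the paper's trained ranking model:

```python
def rank_res(candidates, weights):
    """Pick the highest-scoring candidate RE for a referential slot.

    Candidates may be a pronoun marker, a default nominal, the empty RE,
    or a corpus RE (all strings here for simplicity). 'weights' maps a
    feature name to its learned weight."""
    def score(cand):
        feats = {
            "is_pronoun": cand == "PRONOUN",
            "is_empty": cand == "",
            "length": len(cand.split()),
        }
        # Linear model: weighted sum of (boolean or numeric) features.
        return sum(weights.get(name, 0.0) * float(val) for name, val in feats.items())
    return max(candidates, key=score)
```

In the paper's architectures the feature set would additionally draw on the surrounding deep or shallow dependency tree, which is exactly what makes the ordering of REG relative to syntax and linearization matter.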


lexicalized

Appears in 4 sentences as: lexicalization (1) lexicalized (3)
In Combining Referring Expression Generation and Surface Realization: A Corpus-Based Investigation of Architectures
  1. Applying a strictly sequential pipeline on our data, we observe incoherent system output that is related to an interaction of generation levels, very similar to the interleaving between sentence planning and lexicalization in Example (1).
    Page 2, “Introduction”
  2. Nominalizations are mapped to their verbal base forms on the basis of lexicalized rules for the nominalized lemmas observed in the corpus.
    Page 4, “The Data Set”
  3. This set subdivides into non-lexicalized and lexicalized transformations.
    Page 5, “Generation Systems”
  4. Most transformation rules (335 out of 374 on average) are lexicalized for a specific verb lemma and mostly transform nominalizations as in rule (4-b) and particles (see Section 3.2).
    Page 5, “Generation Systems”


rule-based

Appears in 3 sentences as: rule-based (3)
In Combining Referring Expression Generation and Surface Realization: A Corpus-Based Investigation of Architectures
  1. Work on rule-based natural language generation (NLG) has explored a number of ways to combine these decisions in an architecture, ranging from integrated systems where all decisions happen jointly (Appelt, 1982) to strictly sequential pipelines (Reiter and Dale, 1997).
    Page 1, “Introduction”
  2. Such a system is reminiscent of earlier work in rule-based generation that implements an interactive or revision-based feedback between discourse-level planning and linguistic realisation (Hovy, 1988; Robin, 1993).
    Page 2, “Introduction”
  3. In rule-based, strictly sequential generators these interactions can lead to a so-called generation gap, where a downstream module cannot realize a text or sentence plan generated by the preceding modules (Meteer, 1991; Wanner, 1994).
    Page 2, “Related Work”
