A Statistical NLG Framework for Aggregated Planning and Realization
Kondadadi, Ravi and Howald, Blake and Schilder, Frank

Article Structure

Abstract

We present a hybrid natural language generation (NLG) system that consolidates macro and micro planning and surface realization tasks into one statistical learning process.

Introduction

NLG is the process of generating natural-sounding text from nonlinguistic inputs.

Background

Typically, knowledge-based NLG systems are implemented by rules and, as mentioned above, have a pipelined architecture for the document and sentence planning stages and surface realization (Hovy, 1993; Moore and Paris, 1993).

Methodology

To generate text for a given domain, our system runs input data through a statistical ranking model to select a sequence of templates that best fit the input data (E).

Evaluation and Discussion

In this section, we first discuss the corpus data used to train and generate texts.

Conclusions and Future Work

We have presented a hybrid (template-based and statistical), single-staged NLG system that generates natural-sounding texts and is domain-adaptable.

Topics

named entities

Appears in 9 sentences as: named entities (7) named entity (5)
In A Statistical NLG Framework for Aggregated Planning and Realization
  1. (2013) where, in a given corpus, a combination of domain specific named entity tagging and clustering sentences (based on semantic predicates) were used to generate templates.
    Page 2, “Background”
  2. The DRS consists of semantic predicates and named entity tags.
    Page 3, “Methodology”
  3. In parallel, domain specific named entity tags are identified and, in conjunction with the semantic predicates, are used to create templates.
    Page 3, “Methodology”
  4. For example, in (2), using the templates in (1e-f), the identified named entities are assigned to a clustered CuId (2ab).
    Page 3, “Methodology”
  5. To generate the training data, we first filter the templates that have named entity tags not specified in the input data.
    Page 4, “Methodology”
  6. • Overlap of named entities: Number of common entities between current CuId and most likely CuId for the position
    Page 4, “Methodology”
  7. • Difference in number of named entities: Absolute difference between the number of named entities in the current template and the average number of named entities for the current position
    Page 4, “Methodology”
  8. • Average number of entities: Ratio of number of named entities in the generated text so far to the average number of named entities.
    Page 4, “Methodology”
  9. We first filter out those templates that contain a named entity tag not present in the input data.
    Page 4, “Methodology”
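
The filtering step quoted in sentences 5 and 9, together with the entity-count feature in sentence 7, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the bracketed tag notation (e.g. [PERSON]), the function names, and the dictionary input format are all assumptions made for the example.

```python
import re

# Assumption: named entity tags appear in templates as bracketed names,
# e.g. "[PERSON] was born in [DATE]." The paper's actual notation may differ.
TAG_PATTERN = re.compile(r"\[([A-Za-z_]+)\]")


def template_tags(template):
    """Return the set of distinct named entity tags used in a template."""
    return set(TAG_PATTERN.findall(template))


def filter_templates(templates, input_data):
    """Keep only templates whose tags are all present in the input data.

    `input_data` is a hypothetical mapping from tag name to value.
    """
    available = set(input_data)
    return [t for t in templates if template_tags(t) <= available]


def entity_count_difference(template, average_for_position):
    """Absolute difference between the number of distinct tags in a template
    and the average number of named entities for the current position."""
    return abs(len(template_tags(template)) - average_for_position)


if __name__ == "__main__":
    templates = [
        "[PERSON] was born in [DATE].",
        "[PERSON] works at [COMPANY].",
    ]
    data = {"PERSON": "Ada", "DATE": "1815"}
    # The second template needs a COMPANY value that the input lacks,
    # so only the first template survives the filter.
    print(filter_templates(templates, data))
```

Note that this sketch counts distinct tags; counting every tag occurrence instead would be an equally plausible reading of the feature description.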

See all papers in Proc. ACL 2013 that mention named entities.

statistically significant

Appears in 7 sentences as: statistical significance (2) statistically significant (7)
In A Statistical NLG Framework for Aggregated Planning and Realization
  1. There is no statistically significant difference between DocSys and DocBase generations for METEOR and BLEU-4. However, there is a statistically significant difference in the syntactic variability metric for both domains (weather - χ²=137.16, d.f.=1, p<.0001; biography - χ²=96.641, d.f.=1, p<.
    Page 6, “Evaluation and Discussion”
  2. In terms of significance, there are no statistically significant differences between the systems for weather (DocOrig vs. DocSys - χ²=.347, d.f.=1, p=.555; DocOrig vs. DocBase - χ²=.090, d.f.=1, p=.764; DocSys vs. DocBase - χ²=.790, d.f.=1, p=.373).
    Page 7, “Evaluation and Discussion”
  3. For biography, the trend fits nicely both numerically and in terms of statistical significance (DocOrig vs. DocSys - χ²=5.094, d.f.=1, p=.
    Page 7, “Evaluation and Discussion”
  4. In terms of significance, there are no statistically significant differences between the systems for weather (DocOrig vs. DocSys - χ²=6.48, d.f.=1, p=.011; DocOrig vs. DocBase - χ²=.720, d.f.=1, p=.396; DocSys vs. DocBase - χ²=.720, d.f.=1, p=.396).
    Page 8, “Evaluation and Discussion”
  5. The trend is different compared to the fluency metric above in that the DocBase system is outperforming the DocOrig generations to an almost statistically significant difference - the remaining comparisons follow the trend.
    Page 8, “Evaluation and Discussion”
  6. Here there is a statistically significant difference between the DocSys and DocOrig and no statistically significant difference between the DocSys and DocBase generations (DocOrig vs. DocSys - χ²=76.880, d.f.=1, p<.0001; DocOrig vs. DocBase - χ²=38.720, d.f.=1, p<.0001; DocSys vs. DocBase - χ²=.720, d.f.=1, p=.396).
    Page 8, “Evaluation and Discussion”
  7. Again, this distribution of preferences is numerically similar to the trend we would like to see, but the statistical significance indicates that there is some ground to make up.
    Page 8, “Evaluation and Discussion”
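
The p-values quoted above all come from chi-square tests with one degree of freedom. For d.f.=1 the p-value can be recovered from the statistic in closed form, p = erfc(sqrt(χ²/2)), which makes the reported pairs easy to check. The sketch below uses only the Python standard library; the function name is an assumption for illustration, and the paper does not say which software the authors used.

```python
import math


def chi_square_p_value_df1(statistic):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom.

    Uses the identity P(X > x) = erfc(sqrt(x / 2)), which holds only for d.f.=1.
    """
    if statistic < 0:
        raise ValueError("a chi-square statistic cannot be negative")
    return math.erfc(math.sqrt(statistic / 2.0))


if __name__ == "__main__":
    # (statistic, p-value) pairs as quoted in the sentences above.
    reported = [(0.347, 0.555), (0.720, 0.396), (6.48, 0.011)]
    for statistic, reported_p in reported:
        computed = chi_square_p_value_df1(statistic)
        print(f"chi2={statistic}: computed p={computed:.4f}, reported p={reported_p}")
```

The computed values agree with the quoted ones to within rounding.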

Edit distance

Appears in 4 sentences as: Edit distance (2) edit distance (2)
In A Statistical NLG Framework for Aggregated Planning and Realization
  1. We then rank templates according to the Levenshtein edit distance (Levenshtein, 1966) from the template corresponding to the current sentence in the training document (using the top 10 ranked templates in training for ease of processing effort).
    Page 4, “Methodology”
  2. We obtained better results with edit distance.
    Page 4, “Methodology”
  3. • Similarity between the most likely template in CuId and current template: Edit distance between the current template and the most likely template for the current CuId.
    Page 4, “Methodology”
  4. • Similarity between the most likely template in CuId given position and current template: Edit distance between the current template and the most likely template for the current CuId at the current position.
    Page 4, “Methodology”
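
The ranking step in sentence 1 can be sketched with a standard dynamic-programming Levenshtein distance. This is an illustration under stated assumptions, not the authors' code: the excerpts do not say whether distance is computed over characters or tokens, so the sketch compares whitespace-separated tokens, and the function names are hypothetical. The default cutoff of 10 follows the "top 10 ranked templates" mentioned in the quoted sentence.

```python
def levenshtein(a, b):
    """Levenshtein edit distance between two sequences (strings or token lists)."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def rank_templates(candidates, reference, top_k=10):
    """Sort candidate templates by token-level edit distance to a reference
    template (closest first) and keep the top_k closest."""
    reference_tokens = reference.split()
    ranked = sorted(
        candidates,
        key=lambda template: levenshtein(template.split(), reference_tokens),
    )
    return ranked[:top_k]


if __name__ == "__main__":
    reference = "[PERSON] was born in [DATE]"
    candidates = [
        "[PERSON] works at [COMPANY]",
        "[PERSON] was born in [LOCATION]",
        "[PERSON] was born in [DATE]",
    ]
    for template in rank_templates(candidates, reference):
        print(template)
```

The same `levenshtein` function could serve the two similarity features in sentences 3 and 4, with the most likely template for the current CuId as the reference.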

rule-based

Appears in 3 sentences as: rule-based (3)
In A Statistical NLG Framework for Aggregated Planning and Realization
  1. It follows that approaches to document planning are rule-based as well and, concomitantly, are usually domain specific.
    Page 2, “Background”
  2. Further, statistical approaches should be more adaptable to different domains than their rule-based equivalents (Angeli et al., 2012).
    Page 2, “Background”
  3. This is an encouraging result considering that no experts were involved in the development of the system, a key contrast to many other existing (especially rule-based) NLG systems.
    Page 8, “Evaluation and Discussion”
