Surface Realisation from Knowledge-Bases
Gyawali, Bikash and Gardent, Claire

Article Structure

Abstract

We present a simple, data-driven approach to generation from knowledge bases (KB).

Introduction

In this paper we present a grammar based approach for generating from knowledge bases (KB) which is linguistically principled and conceptually simple.

Related Work

Our work is related to work on concept to text generation.

The KBGen Task

The KBGen task was introduced as a new shared task at Generation Challenges 2013 (Banik et al., 2013) and aimed to compare different generation systems on KB data.

Generating from the KBGen Knowledge-Base

To generate from the KBGen data, we induce a Feature-Based Lexicalised Tree Adjoining Grammar (FB-LTAG, (Vijay-Shanker and Joshi, 1988)) augmented with a unification-based semantics (Gardent and Kallmeyer, 2003) from the training data.

Experimental Setup

We evaluate our approach on the KBGen data and compare it with the KBGen reference and two other systems that took part in the KBGen challenge.

Results and Discussion

Conclusion

In Tree Adjoining Grammar, the extended domain of locality principle ensures that TAG trees group together in a single structure a syntactic predicate and its arguments.

Topics

knowledge base

Appears in 9 sentences as: knowledge base (6) knowledge bases (6)
In Surface Realisation from Knowledge-Bases
  1. We present a simple, data-driven approach to generation from knowledge bases (KB).
    Page 1, “Abstract”
  2. In this paper we present a grammar based approach for generating from knowledge bases (KB) which is linguistically principled and conceptually simple.
    Page 1, “Introduction”
  3. To evaluate our approach, we use the benchmark provided by the KBGen challenge (Banik et al., 2012; Banik et al., 2013), a challenge designed to evaluate generation from knowledge bases; where the input is a KB subset; and where the expected output is a complex sentence conveying the meaning represented by the input.
    Page 1, “Introduction”
  4. With the development of the semantic web and the proliferation of knowledge bases, generation from knowledge bases has attracted increased interest and so-called ontology verbalisers have been proposed which support the generation of text from (parts of) knowledge bases.
    Page 1, “Related Work”
  5. One strand of work maps each axiom in the knowledge base to a clause.
    Page 2, “Related Work”
  6. The MIAKT project (Bontcheva and Wilks, 2004) and the ONTOGENERATION project (Aguado et al., 1998) use symbolic NLG techniques to produce textual descriptions from some semantic information contained in a knowledge base.
    Page 2, “Related Work”
  7. Specifically, the task is to verbalise a subset of a knowledge base.
    Page 3, “The KBGen Task”
  8. The KB subsets forming the KBGen input data were preselected from the AURA biology knowledge base (Gunning et al., 2010), a knowledge base about biology which was manually encoded by biology teachers and encodes knowledge about events, entities, properties and relations, where relations include event-to-entity, event-to-event,
    Page 3, “The KBGen Task”
  9. Using the KBGen benchmark, we then showed that the resulting induced FB-LTAG compares favorably with competing symbolic and statistical approaches when used to generate from knowledge base data.
    Page 9, “Conclusion”

BLEU

Appears in 6 sentences as: BLEU (6)
In Surface Realisation from Knowledge-Bases
  1. Figure 6: BLEU scores and Grammar Size (Number of Elementary TAG trees)
    Page 8, “Results and Discussion”
  2. The average BLEU score is given with respect to all input (All) and to those inputs for which the systems generate at least one sentence (Covered).
    Page 8, “Results and Discussion”
  3. In terms of BLEU score, the best version of our system (AUTEXP) outperforms the probabilistic approach of IMS by a large margin (+0.17) and produces results similar to the fully handcrafted UDEL system.
    Page 9, “Results and Discussion”
  4. In sum, our approach obtains BLEU scores and coverage similar to those of a handcrafted system and outperforms a probabilistic approach.
    Page 9, “Results and Discussion”
  5. system and before the statistical IMS one, thus confirming the ranking based on BLEU.
    Page 9, “Results and Discussion”
  6. We observed that this often fails to return the best output in terms of BLEU score, fluency, grammaticality and/or meaning.
    Page 9, “Conclusion”
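
A note on how the two averages in item 2 relate: the Covered average is taken only over inputs for which a system produced output, while the All average also penalises uncovered inputs. The sketch below makes that concrete; the function name, the input encoding (None for uncovered inputs), and the zero-score convention for uncovered inputs are assumptions for illustration, not details taken from the paper.

```python
from typing import Optional, Sequence, Tuple

def average_bleu(scores: Sequence[Optional[float]]) -> Tuple[float, float, float]:
    """Return (all_avg, covered_avg, coverage) for one system.

    scores[i] is the sentence-level BLEU for input i, or None when the
    system generated no sentence for that input (an uncovered input).
    For the "All" average, uncovered inputs are counted as 0.0, which is
    an assumption made for this sketch.
    """
    covered = [s for s in scores if s is not None]
    coverage = len(covered) / len(scores)
    all_avg = sum(covered) / len(scores)                 # uncovered inputs contribute 0
    covered_avg = sum(covered) / len(covered) if covered else 0.0
    return all_avg, covered_avg, coverage

# Toy example: 3 of 4 inputs covered.
print(average_bleu([0.4, None, 0.2, 0.3]))   # -> (0.225, 0.3, 0.75)
```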

BLEU score

Appears in 5 sentences as: BLEU score (3) BLEU scores (2)
In Surface Realisation from Knowledge-Bases
  1. Figure 6: BLEU scores and Grammar Size (Number of Elementary TAG trees)
    Page 8, “Results and Discussion”
  2. The average BLEU score is given with respect to all input (All) and to those inputs for which the systems generate at least one sentence (Covered).
    Page 8, “Results and Discussion”
  3. In terms of BLEU score, the best version of our system (AUTEXP) outperforms the probabilistic approach of IMS by a large margin (+0.17) and produces results similar to the fully handcrafted UDEL system.
    Page 9, “Results and Discussion”
  4. In sum, our approach obtains BLEU scores and coverage similar to those of a handcrafted system and outperforms a probabilistic approach.
    Page 9, “Results and Discussion”
  5. We observed that this often fails to return the best output in terms of BLEU score, fluency, grammaticality and/or meaning.
    Page 9, “Conclusion”

grammar induction

Appears in 5 sentences as: grammar induced (1) grammar induction (4)
In Surface Realisation from Knowledge-Bases
  1. A key feature of this approach is that grammar induction is driven by the extended domain of locality principle of TAG (Tree Adjoining Grammar); and that it takes into account both syntactic and semantic information.
    Page 1, “Abstract”
  2. A key feature of this approach is that grammar induction is driven by the extended domain of locality principle of TAG (Tree Adjoining Grammar) and takes into account both syntactic and semantic information.
    Page 1, “Introduction”
  3. They induce a probabilistic Tree Adjoining Grammar from a training set aligning frames and sentences using the grammar induction technique of (Chiang, 2000) and use a beam search that uses weighted features learned from the training data to rank alternative expansions at each step.
    Page 2, “Related Work”
  4. We use a simple, mainly symbolic approach whereas they use a generative approach for grammar induction and a discriminative approach for sentence generation.
    Page 3, “Related Work”
  5. We compare the results obtained with those obtained by two other systems participating in the KBGen challenge, namely the UDEL system, a symbolic rule-based system developed by a group of students at the University of Delaware; and the IMS system, a statistical system using a probabilistic grammar induced from the training data.
    Page 8, “Experimental Setup”

IMS

Appears in 5 sentences as: IMS (5)
In Surface Realisation from Knowledge-Bases
  1. We compare the results obtained with those obtained by two other systems participating in the KBGen challenge, namely the UDEL system, a symbolic rule-based system developed by a group of students at the University of Delaware; and the IMS system, a statistical system using a probabilistic grammar induced from the training data.
    Page 8, “Experimental Setup”
  2. System: IMS, All: 0.12, Covered: 0.12, Coverage: 100%, # Trees: (truncated in this excerpt)
    Page 8, “Results and Discussion”
  3. While both the IMS and the UDEL system have full coverage, our BASE system strongly undergenerates, failing to account for 69.5% of the test data.
    Page 8, “Results and Discussion”
  4. In terms of BLEU score, the best version of our system (AUTEXP) outperforms the probabilistic approach of IMS by a large margin (+0.17) and produces results similar to the fully handcrafted UDEL system.
    Page 9, “Results and Discussion”
  5. system and before the statistical IMS one, thus confirming the ranking based on BLEU.
    Page 9, “Results and Discussion”

parse tree

Appears in 4 sentences as: parse tree (2) parse trees (2)
In Surface Realisation from Knowledge-Bases
  1. To extract a Feature-Based Lexicalised Tree Adjoining Grammar (FB-LTAG) from the KBGen data, we parse the sentences of the training corpus; project the entity and event variables to the syntactic projection of the strings they are aligned with; and extract the elementary trees of the resulting FB-LTAG from the parse tree using semantic information.
    Page 5, “Generating from the KBGen Knowledge-Base”
  2. After alignment, the entity and event variables occurring in the input semantics are associated with substrings of the yield of the syntactic parse tree.
    Page 5, “Generating from the KBGen Knowledge-Base”
  3. Once entity and event variables have been projected up the parse trees, we extract elementary FB-LTAG trees and their semantics from the input scenario as follows.
    Page 5, “Generating from the KBGen Knowledge-Base”
  4. The resulting grammar extracted from the parse trees (cf.
    Page 7, “Generating from the KBGen Knowledge-Base”
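
The projection step in items 1–2 can be pictured as decorating parse-tree nodes with the entity and event variables whose aligned word spans they dominate. The sketch below is a simplified reading of that step (exact string match between a node's yield and the aligned span); the node representation and function names are illustrative, not the paper's implementation.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    """A constituency-parse node, optionally decorated with a KB variable."""
    label: str                                     # syntactic category, e.g. "NP", "VP"
    children: list = field(default_factory=list)   # child Node objects
    token: Optional[str] = None                    # word, if this is a leaf
    variable: Optional[str] = None                 # projected entity/event variable

def yield_of(node: Node) -> list:
    """Terminal words dominated by a node, left to right."""
    if node.token is not None:
        return [node.token]
    words = []
    for child in node.children:
        words.extend(yield_of(child))
    return words

def project_variable(node: Node, variable: str, aligned_span: list) -> bool:
    """Index with `variable` the highest node whose yield equals the word
    span aligned with that entity/event variable; descend otherwise."""
    if yield_of(node) == aligned_span:
        node.variable = variable
        return True
    return any(project_variable(child, variable, aligned_span)
               for child in node.children)

# Toy usage: project an entity variable onto the NP "the cell".
np = Node("NP", children=[Node("DT", token="the"), Node("NN", token="cell")])
vp = Node("VP", children=[Node("V", token="divides")])
s = Node("S", children=[np, vp])
project_variable(s, "x1", ["the", "cell"])
print(np.variable)   # -> x1
```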

subtrees

Appears in 4 sentences as: subtrees (4)
In Surface Realisation from Knowledge-Bases
  1. First, the subtrees whose root node is indexed with an entity variable are extracted.
    Page 5, “Generating from the KBGen Knowledge-Base”
  2. Second, the subtrees capturing relations between variables are extracted.
    Page 5, “Generating from the KBGen Knowledge-Base”
  3. The minimal tree containing all and only the dependent variables D(X) of a variable X is then extracted and associated with the set of literals ℒ such that ℒ = {R(Y,Z) | (Y = X ∧ Z ∈ D(X)) ∨ (Y, Z ∈ D(X))}. This procedure extracts the subtrees relating the argument variables of a semantic functor such as an event or a role, e.g., a tree describing a verb and its arguments as shown in the top
    Page 5, “Generating from the KBGen Knowledge-Base”
  4. To reduce overfitting, we generalise the extracted grammar by extracting, from each event tree, subtrees that capture structures with fewer arguments and optional modifiers.
    Page 7, “Generating from the KBGen Knowledge-Base”
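
Item 3 associates each extracted relation tree with a literal set defined over the dependent variables D(X). Read with the literals as flat (relation, arg1, arg2) triples, that definition can be rendered roughly as below; the helper names and the example triples (KBGen-style relation labels) are invented for illustration.

```python
def dependents(literals, x):
    """D(X): the variables related to X by some literal R(X, Z)."""
    return {z for (rel, y, z) in literals if y == x}

def literals_for(literals, x):
    """Literal set attached to the tree extracted for X:
    { R(Y,Z) | (Y = X and Z in D(X)) or (Y in D(X) and Z in D(X)) }."""
    dx = dependents(literals, x)
    return {(rel, y, z) for (rel, y, z) in literals
            if (y == x and z in dx) or (y in dx and z in dx)}

# Toy scenario: an event e1 with two participants and an unrelated literal.
lits = {("agent", "e1", "x1"), ("object", "e1", "x2"), ("raw-material", "x2", "x3")}
print(literals_for(lits, "e1"))
# -> {("agent", "e1", "x1"), ("object", "e1", "x2")}
```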

language model

Appears in 3 sentences as: language model (3)
In Surface Realisation from Knowledge-Bases
  1. They intersect the grammar with a language model to improve fluency; use a weighted hypergraph to pack the derivations; and find the best derivation tree using the Viterbi algorithm.
    Page 3, “Related Work”
  2. To rank the generator output, we train a language model on the GENIA corpus, a corpus of 2,000 MEDLINE abstracts about biology containing more than 400,000 words (Kim et al., 2003), and use this model to rank the generated sentences by decreasing probability.
    Page 7, “Generating from the KBGen Knowledge-Base”
  3. In the current version of the generator, the output is ranked using a simple language model trained on the GENIA corpus.
    Page 9, “Conclusion”
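
Items 2–3 describe ranking candidate realisations with a language model trained on GENIA and sorting them by decreasing probability. A minimal, self-contained version of that idea is sketched below with an add-alpha smoothed trigram model; the n-gram order, smoothing, and function names are assumptions, since the paper does not specify how its language model is built.

```python
from collections import Counter
import math

def train_trigram_counts(sentences):
    """Collect trigram and bigram counts from tokenised training sentences
    (e.g. GENIA abstracts), with sentence-boundary padding."""
    tri, bi, vocab = Counter(), Counter(), set()
    for toks in sentences:
        vocab.update(toks)
        padded = ["<s>", "<s>"] + list(toks) + ["</s>"]
        for i in range(2, len(padded)):
            tri[tuple(padded[i - 2:i + 1])] += 1
            bi[tuple(padded[i - 2:i])] += 1
    return tri, bi, len(vocab) + 1          # +1 for the </s> symbol

def log_prob(toks, tri, bi, vocab_size, alpha=1.0):
    """Add-alpha smoothed trigram log-probability of one candidate sentence."""
    padded = ["<s>", "<s>"] + list(toks) + ["</s>"]
    lp = 0.0
    for i in range(2, len(padded)):
        num = tri[tuple(padded[i - 2:i + 1])] + alpha
        den = bi[tuple(padded[i - 2:i])] + alpha * vocab_size
        lp += math.log(num / den)
    return lp

def rank_candidates(candidates, tri, bi, vocab_size):
    """Sort candidate realisations (token lists) by decreasing LM score."""
    return sorted(candidates,
                  key=lambda toks: log_prob(toks, tri, bi, vocab_size),
                  reverse=True)
```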

Lexicalised

Appears in 3 sentences as: Lexicalised (3)
In Surface Realisation from Knowledge-Bases
  1. To generate from the KBGen data, we induce a Feature-Based Lexicalised Tree Adjoining Grammar (FB-LTAG, (Vijay-Shanker and Joshi, 1988)) augmented with a unification-based semantics (Gardent and Kallmeyer, 2003) from the training data.
    Page 4, “Generating from the KBGen Knowledge-Base”
  2. 4.1 Feature-Based Lexicalised Tree Adjoining Grammar
    Page 4, “Generating from the KBGen Knowledge-Base”
  3. To extract a Feature-Based Lexicalised Tree Adjoining Grammar (FB-LTAG) from the KBGen data, we parse the sentences of the training corpus; project the entity and event variables to the syntactic projection of the strings they are aligned with; and extract the elementary trees of the resulting FB-LTAG from the parse tree using semantic information.
    Page 5, “Generating from the KBGen Knowledge-Base”
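
To make the grammar objects in items 1–3 concrete: an FB-LTAG elementary tree is a lexicalised tree fragment whose nodes carry feature structures, and in the unification-based semantics of Gardent and Kallmeyer (2003) the semantic literals share variables with those feature structures. The encoding below is only one possible illustration; the class layout, the feature names, and the toy biology verb are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class TreeNode:
    """A node of an elementary tree with top/bottom feature structures."""
    category: str                                  # e.g. "S", "NP", "V"
    top: dict = field(default_factory=dict)        # e.g. {"idx": "?x"}
    bottom: dict = field(default_factory=dict)
    children: list = field(default_factory=list)
    node_type: str = "internal"                    # "anchor", "substitution", "foot", "internal"
    lexeme: Optional[str] = None                   # word anchoring the tree

@dataclass
class ElementaryTree:
    """An FB-LTAG elementary tree paired with its semantics.  Variables
    such as "?e" and "?x" are shared between the literals and the node
    features, which is what links syntax and semantics during derivation."""
    tree: TreeNode
    semantics: list                                # list of literal tuples

# Toy initial tree for a transitive biology verb (illustrative only).
secrete_tree = ElementaryTree(
    tree=TreeNode("S", children=[
        TreeNode("NP", top={"idx": "?x"}, node_type="substitution"),
        TreeNode("VP", bottom={"idx": "?e"}, children=[
            TreeNode("V", node_type="anchor", lexeme="secretes"),
            TreeNode("NP", top={"idx": "?y"}, node_type="substitution"),
        ]),
    ]),
    semantics=[("secrete", "?e"), ("agent", "?e", "?x"), ("object", "?e", "?y")],
)
print(len(secrete_tree.semantics))   # -> 3
```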

logical forms

Appears in 3 sentences as: logic formalism (1) logical form (1) logical forms (2)
In Surface Realisation from Knowledge-Bases
  1. Earlier work on concept to text generation mainly focuses on generation from logical forms using rule-based methods.
    Page 1, “Related Work”
  2. (Wang, 1980) uses handwritten rules to generate sentences from an extended predicate logic formalism; (Shieber et al., 1990) introduces a head-driven algorithm for generating from logical forms; (Kay, 1996) defines a chart-based algorithm which enhances efficiency by minimising the number of semantically incomplete phrases being built; and (Shemtov, 1996) presents an extension of the chart-based generation algorithm presented in (Kay, 1996) which supports the generation of multiple paraphrases from underspecified semantic input.
    Page 1, “Related Work”
  3. (Lu and Ng, 2011) focuses on generating natural language sentences from logical form (i.e., lambda terms) using a synchronous context-free grammar.
    Page 2, “Related Work”

meaning representation

Appears in 3 sentences as: meaning representation (1) meaning representations (1) meaning represented (1)
In Surface Realisation from Knowledge-Bases
  1. To evaluate our approach, we use the benchmark provided by the KBGen challenge (Banik et al., 2012; Banik et al., 2013), a challenge designed to evaluate generation from knowledge bases; where the input is a KB subset; and where the expected output is a complex sentence conveying the meaning represented by the input.
    Page 1, “Introduction”
  2. (Wong and Mooney, 2007) uses synchronous grammars to transform a variable free tree structured meaning representation into sentences.
    Page 2, “Related Work”
  3. Random Field to generate from the same meaning representations.
    Page 3, “Related Work”

natural language

Appears in 3 sentences as: natural language (3)
In Surface Realisation from Knowledge-Bases
  1. In all these approaches, grammar and lexicon are developed manually and it is assumed that the lexicon associates semantic sub-formulae with natural language expressions.
    Page 1, “Related Work”
  2. As discussed in (Power and Third, 2010), one important limitation of these approaches is that they assume a simple deterministic mapping between knowledge representation languages and some controlled natural language (CNL).
    Page 2, “Related Work”
  3. (Lu and Ng, 2011) focuses on generating natural language sentences from logical form (i.e., lambda terms) using a synchronous context-free grammar.
    Page 2, “Related Work”
