Automatically Generating Wikipedia Articles: A Structure-Aware Approach
Sauper, Christina and Barzilay, Regina

Article Structure

Abstract

In this paper, we investigate an approach for creating a comprehensive textual overview of a subject composed of information drawn from the Internet.

Introduction

In this paper, we consider the task of automatically creating a multi-paragraph overview article that provides a comprehensive summary of a subject of interest.

Related Work

Concept-to-text generation and text-to-text generation take very different approaches to content selection.

Method

The goal of our system is to produce a comprehensive overview article given a title — e.g., Cancer.

Experimental Setup

We evaluate our method by observing the quality of automatically created articles in different domains.

Rank(eij1...€ij7~,Wj)

7 :131 .

Topics

ILP

Appears in 9 sentences as: ILP (9) ILPs (1)
In Automatically Generating Wikipedia Articles: A Structure-Aware Approach
  1. We estimate the parameters of our model using the perceptron algorithm augmented with an integer linear programming ( ILP ) formulation, run over a training set of example articles in the given domain.
    Page 1, “Introduction”
  2. Using the perceptron framework augmented with an ILP formulation for global optimization, the system is trained to select the best excerpt for each document d, and each topic tj.
    Page 3, “Method”
  3. .wk, and the same ILP formulation for global optimization as in training.
    Page 3, “Method”
  4. To select the optimal excerpts, we employ integer linear programming ( ILP ).
    Page 4, “Method”
  5. Solving the ILP Solving an integer linear program is NP-hard (Cormen et al., 1992); however, in practice there exist several strategies for solving certain ILPs efficiently.
    Page 5, “Method”
  6. On a larger scale, there are several alternatives to approximate the ILP results, such as a dynamic programming approximation to the knapsack problem (McDonald, 2007).
    Page 5, “Method”
  7. Next, decoding through ILP optimization is performed over all ranked lists of candidate excerpts, selecting one excerpt for each topic (line 7).
    Page 6, “Method”
  8. The use of ILP during each step of training sets this algorithm apart from previous work.
    Page 6, “Method”
  9. In prior research, ILP was used as a postprocessing step to remove redundancy and make other global decisions about parameters (McDonald, 2007; Marciniak and Strube, 2005; Clarke and Lapata, 2007).
    Page 6, “Method”

See all papers in Proc. ACL 2009 that mention ILP.

See all papers in Proc. ACL that mention ILP.

Back to top.

perceptron

Appears in 7 sentences as: perceptron (7)
In Automatically Generating Wikipedia Articles: A Structure-Aware Approach
  1. We augment the standard perceptron algorithm with a global integer linear programming formulation to optimize both local fit of information into each topic and global coherence across the entire overview.
    Page 1, “Abstract”
  2. We estimate the parameters of our model using the perceptron algorithm augmented with an integer linear programming (ILP) formulation, run over a training set of example articles in the given domain.
    Page 1, “Introduction”
  3. Using the perceptron framework augmented with an ILP formulation for global optimization, the system is trained to select the best excerpt for each document d, and each topic tj.
    Page 3, “Method”
  4. We implement this algorithm using the perceptron framework, as it can be easily modified for structured prediction while preserving convergence guarantees (Daume III and Marcu, 2005; Snyder and Barzilay, 2007).
    Page 4, “Method”
  5. Training Procedure Our algorithm is a modification of the perceptron ranking algorithm (Collins, 2002), which allows for joint learning across several ranking problems (Daumé III and Marcu, 2005; Snyder and Barzilay, 2007).
    Page 5, “Method”
  6. For each topic (line 8), if the selected excerpt is not similar enough to the gold excerpt (line 9), the parameters for that topic are updated using a standard perceptron update rule (line 10).
    Page 6, “Method”
  7. Our third baseline, Disjoint, uses the ranking perceptron framework as in our full system; however, rather than perform an optimization step during training and decoding, we simply select the highest-ranked excerpt for each topic.
    Page 7, “Rank(eij1...€ij7~,Wj)”

See all papers in Proc. ACL 2009 that mention perceptron.

See all papers in Proc. ACL that mention perceptron.

Back to top.

cosine similarity

Appears in 5 sentences as: cosine similarity (5)
In Automatically Generating Wikipedia Articles: A Structure-Aware Approach
  1. As a similarity function, we use cosine similarity weighted with TF*IDF.
    Page 3, “Method”
  2. We define sim(efl, ej/lx) as the cosine similarity between excerpts 63-; from topic 253- and ej/l/ from topic if.
    Page 5, “Method”
  3. If excerpts ejl and ej/l/ have cosine similarity
    Page 5, “Method”
  4. 1 Defined as excerpts with cosine similarity > 0.5
    Page 5, “Method”
  5. For each topic present in the human-authored article, the Oracle selects the excerpt from our full model’s candidate excerpts with the highest cosine similarity to the human-authored text.
    Page 7, “Rank(eij1...€ij7~,Wj)”

See all papers in Proc. ACL 2009 that mention cosine similarity.

See all papers in Proc. ACL that mention cosine similarity.

Back to top.

linear programming

Appears in 4 sentences as: linear program (1) linear programming (3)
In Automatically Generating Wikipedia Articles: A Structure-Aware Approach
  1. We augment the standard perceptron algorithm with a global integer linear programming formulation to optimize both local fit of information into each topic and global coherence across the entire overview.
    Page 1, “Abstract”
  2. We estimate the parameters of our model using the perceptron algorithm augmented with an integer linear programming (ILP) formulation, run over a training set of example articles in the given domain.
    Page 1, “Introduction”
  3. To select the optimal excerpts, we employ integer linear programming (ILP).
    Page 4, “Method”
  4. Solving the ILP Solving an integer linear program is NP-hard (Cormen et al., 1992); however, in practice there exist several strategies for solving certain ILPs efficiently.
    Page 5, “Method”

See all papers in Proc. ACL 2009 that mention linear programming.

See all papers in Proc. ACL that mention linear programming.

Back to top.

new article

Appears in 3 sentences as: new article (2) new articles (1)
In Automatically Generating Wikipedia Articles: A Structure-Aware Approach
  1. Then, it produces a new article by selecting content from the Internet for each part of this template.
    Page 1, “Introduction”
  2. Our model constructs a new article by following these two steps: Ranking First, we attempt to rank candidate excerpts based on how representative they are of each individual topic.
    Page 4, “Method”
  3. 5 We are continually submitting new articles ; however, we report results on those that have at least a 6 month history at time of writing.
    Page 7, “Rank(eij1...€ij7~,Wj)”

See all papers in Proc. ACL 2009 that mention new article.

See all papers in Proc. ACL that mention new article.

Back to top.