Automatically Evaluating Text Coherence Using Discourse Relations
Lin, Ziheng and Ng, Hwee Tou and Kan, Min-Yen

Article Structure

Abstract

We present a novel model to represent and assess the discourse coherence of text.

Introduction

The coherence of a text is usually reflected by its discourse structure and relations.

Related Work

The study of coherence in discourse has led to many linguistic theories, of which we only discuss algorithms that have been reduced to practice.

Using Discourse Relations

To utilize discourse relations of a text, we first apply automatic discourse parsing on the input text.

A Refined Approach

The central problem with the basic approach is in its sparse modeling of discourse relations.

Experiments

We evaluate our coherence model on the task of text ordering ranking, a standard coherence evaluation task used in both (Barzilay and Lapata, 2005) and (Elsner et al., 2007).

Analysis and Discussion

When we compare the accuracies of the full model in the three data sets (Row 2), the accuracy in the Accidents data is the highest (89.38%), followed by

Conclusion

We have proposed a new model for discourse coherence that leverages the observation that coherent texts preferentially follow certain discourse structures.

Topics

discourse parser

Appears in 7 sentences as: discourse parser (4), discourse parser’s (1), discourse parsing (2)
In Automatically Evaluating Text Coherence Using Discourse Relations
  1. To the best of our knowledge, this is also the first study to show that output from an automatic discourse parser helps in coherence modeling.
    Page 2, “Introduction”
  2. This task, discourse parsing, has been a recent focus of study in the natural language processing (NLP) community, largely enabled by the availability of large-scale discourse annotated corpora (Wellner and Pustejovsky, 2007; Elwell and Baldridge, 2008; Lin et al., 2009; Pitler et al., 2009; Pitler and Nenkova, 2009; Lin et al., 2010; Wang et al., 2010).
    Page 2, “Related Work”
  3. To utilize discourse relations of a text, we first apply automatic discourse parsing on the input text.
    Page 2, “Using Discourse Relations”
  4. In developing an improved model, we need to better exploit the discourse parser’s output to provide more circumstantial evidence to support the system’s coherence decision.
    Page 3, “A Refined Approach”
  5. We must also be careful in using the automatic discourse parser.
    Page 5, “Experiments”
  6. We note that the discourse parser of Lin et al.
    Page 5, “Experiments”
  7. Since the discourse parser utilizes paragraph boundaries but a permuted text does not have such boundaries, we ignore paragraph boundaries and treat the source text as if it has only one paragraph.
    Page 5, “Experiments”
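
The excerpts above describe running an automatic discourse parser on the input text and using its relation output as coherence evidence. As an illustrative sketch only (not the authors' code; the relation labels are assumed PDTB-style senses), consecutive parsed relations can be paired into the transition sequences the model consumes:

```python
def relation_transitions(relations):
    """Pair consecutive discourse relations into transition bigrams.

    `relations` is a list of relation labels, one per sentence
    adjacency (e.g. PDTB-style senses such as "Comparison").
    """
    return list(zip(relations, relations[1:]))

# Toy parser output for a four-sentence text (labels are illustrative):
parsed = ["Comparison", "Contingency", "Expansion", "Temporal"]
print(relation_transitions(parsed))
# [('Comparison', 'Contingency'), ('Contingency', 'Expansion'),
#  ('Expansion', 'Temporal')]
```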


significantly outperforms

Appears in 6 sentences as: significantly outperform (1), significantly outperforms (5)
In Automatically Evaluating Text Coherence Using Discourse Relations
  1. The experimental results demonstrate that our model is able to significantly outperform the state-of-the-art coherence model by Barzilay and Lapata (2005), reducing the error rate of the previous approach by an average of 29% over three data sets against human upper bounds.
    Page 1, “Abstract”
  2. Double (**) and single (*) asterisks indicate that the respective model significantly outperforms the baseline at p < 0.01 and p < 0.05, respectively.
    Page 7, “Experiments”
  3. Comparing these accuracies to the baseline, our model significantly outperforms the baseline with p < 0.01 in the WSJ and Earthquakes data sets with accuracy increments of 2.35% and 2.91%, respectively.
    Page 7, “Experiments”
  4. The combined model in all three data sets gives the highest performance in comparison to all single models, and it significantly outperforms the baseline model with p < 0.01.
    Page 7, “Experiments”
  5. From the curves, our model consistently performs better than the baseline with a significant gap, and the combined model also consistently and significantly outperforms the other two.
    Page 8, “Analysis and Discussion”
  6. When applied to distinguish a source text from a sentence-reordered permutation, our model significantly outperforms the previous state-of-the-art,
    Page 9, “Conclusion”


gold standard

Appears in 5 sentences as: gold standard (5)
In Automatically Evaluating Text Coherence Using Discourse Relations
  1. tion of the n-gram discourse relation transition sequences in gold standard coherent text, and a similar one for incoherent text.
    Page 3, “Using Discourse Relations”
  2. Figure 1 shows a text and its gold standard PDTB discourse relations.
    Page 3, “A Refined Approach”
  3. Figure 1: An excerpt with four contiguous sentences from wsj .0437, showing five gold standard discourse relations.
    Page 3, “A Refined Approach”
  4. Also, since our subjects’ judgments correlate highly with the gold standard, the assumption that the original text is always more coherent than the permuted text is supported.
    Page 6, “Experiments”
  5. This phenomenon has been observed in several natural language synthesis tasks such as generation and summarization, in which a single gold standard is inadequate to fully assess performance.
    Page 7, “Experiments”


error rate

Appears in 4 sentences as: error rate (4), error rates (1)
In Automatically Evaluating Text Coherence Using Discourse Relations
  1. The experimental results demonstrate that our model is able to significantly outperform the state-of-the-art coherence model by Barzilay and Lapata (2005), reducing the error rate of the previous approach by an average of 29% over three data sets against human upper bounds.
    Page 1, “Abstract”
  2. For the combined model, the error rates are significantly reduced in all three data sets.
    Page 7, “Experiments”
  3. The average error rate reductions against 100% are 9.57% for the full model and 26.37% for the combined model.
    Page 7, “Experiments”
  4. If we compute the average error rate reductions against the human upper bounds (rather than an oracular 100%), the average error rate reduction for the full model is 29% and that for the combined model is 73%.
    Page 7, “Experiments”
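
The excerpts above report error-rate reductions measured two ways: against an oracular 100% ceiling and against human upper bounds. A minimal sketch of that arithmetic, with purely illustrative numbers (not taken from the paper's tables):

```python
def error_rate_reduction(baseline_acc, model_acc, ceiling_acc=100.0):
    """Relative error-rate reduction (%) of the model over the baseline.

    Error is measured as the gap to `ceiling_acc`, which may be an
    oracular 100% or a human upper bound.
    """
    base_err = ceiling_acc - baseline_acc
    model_err = ceiling_acc - model_acc
    return (base_err - model_err) / base_err * 100.0

# Same accuracies, two ceilings: the reduction is larger when the
# error is computed against a human upper bound rather than 100%.
print(round(error_rate_reduction(80.0, 85.0, ceiling_acc=100.0), 1))  # 25.0
print(round(error_rate_reduction(80.0, 85.0, ceiling_acc=90.0), 1))   # 50.0
```

Measuring against a human ceiling credits the model only for closing the gap that is closable at all, which is why the paper's reductions against upper bounds (29% and 73%) exceed those against 100%.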


discourse structure

Appears in 3 sentences as: discourse structure (3)
In Automatically Evaluating Text Coherence Using Discourse Relations
  1. The coherence of a text is usually reflected by its discourse structure and relations.
    Page 1, “Introduction”
  2. In this paper, we detail our model to capture the coherence of a text based on the statistical distribution of the discourse structure and relations.
    Page 1, “Introduction”
  3. While the entity-based model captures repetitive mentions of entities, our discourse relation-based model gleans its evidence from the argumentative and discourse structure of the text.
    Page 9, “Conclusion”


n-gram

Appears in 3 sentences as: n-gram (3)
In Automatically Evaluating Text Coherence Using Discourse Relations
  1. tion of the n-gram discourse relation transition sequences in gold standard coherent text, and a similar one for incoherent text.
    Page 3, “Using Discourse Relations”
  2. In our pilot work where we implemented such a basic model with n-gram features for relation transitions, the performance was very poor.
    Page 3, “Using Discourse Relations”
  3. In our approach, n-gram sub-sequences of transitions per term in the discourse role matrix then constitute the more fine-grained evidence used in our model to distinguish coherence from incoherence.
    Page 9, “Conclusion”
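
Excerpt 3 above describes the refined model's features: n-gram sub-sequences of discourse-role transitions taken per term from a discourse role matrix. The following is a hedged sketch of that idea; the terms, role labels, and "nil" convention for absent terms are illustrative assumptions, not the paper's exact representation:

```python
from collections import Counter

def term_ngrams(role_sequence, n=2):
    """All contiguous n-gram sub-sequences of one term's role sequence."""
    return [tuple(role_sequence[i:i + n])
            for i in range(len(role_sequence) - n + 1)]

# Toy discourse role matrix: term -> its discourse role in each
# sentence of the text ("nil" marks sentences where it is absent).
matrix = {
    "cargo": ["Comp.Arg1", "Comp.Arg2", "nil", "Exp.Arg2"],
    "trade": ["nil", "Cont.Arg1", "Cont.Arg2", "nil"],
}

# Per-term bigram transitions, pooled into one feature count vector.
features = Counter()
for term, roles in matrix.items():
    features.update(term_ngrams(roles, n=2))
print(features.most_common(3))
```

Counting transitions per term rather than per text is what makes the evidence fine-grained: the same relation sequence contributes different features depending on which entities participate in it.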


natural language

Appears in 3 sentences as: natural language (3)
In Automatically Evaluating Text Coherence Using Discourse Relations
  1. This notion of preferential ordering of discourse relations is observed in natural language in general,
    Page 1, “Introduction”
  2. This task, discourse parsing, has been a recent focus of study in the natural language processing (NLP) community, largely enabled by the availability of large-scale discourse annotated corpora (Wellner and Pustejovsky, 2007; Elwell and Baldridge, 2008; Lin et al., 2009; Pitler et al., 2009; Pitler and Nenkova, 2009; Lin et al., 2010; Wang et al., 2010).
    Page 2, “Related Work”
  3. This phenomenon has been observed in several natural language synthesis tasks such as generation and summarization, in which a single gold standard is inadequate to fully assess performance.
    Page 7, “Experiments”
