Combining Intra- and Multi-sentential Rhetorical Parsing for Document-level Discourse Analysis
Shafiq Joty, Giuseppe Carenini, Raymond Ng and Yashar Mehdad

Article Structure

Abstract

We propose a novel approach for developing a two-stage document-level discourse parser.

Introduction

Discourse of any kind is not formed by independent and isolated textual units, but by related and structured units.

Related work

The idea of staging document-level discourse parsing on top of sentence-level discourse parsing was investigated in (Marcu, 2000a; LeThanh et al., 2004).

Our Discourse Parsing Framework

Given a document with sentences already segmented into EDUs, the discourse parsing problem is determining which discourse units (EDUs or larger units) to relate (i.e., the structure), and how to relate them (i.e., the labels or the discourse relations) in the resulting DT.

Parsing Models and Parsing Algorithm

The job of our intra-sentential and multi-sentential parsing models is to assign a probability to each of the constituents of all possible DTs at the sentence level and at the document level, respectively.

Document-level Parsing Approaches

Now that we have presented our intra-sentential and our multi-sentential parsers, we are ready to describe how they can be effectively combined to perform document-level discourse analysis.

Experiments

6.1 Corpora

Conclusion

In this paper, we have presented a novel discourse parser that applies an optimal parsing algorithm to probabilities inferred from two CRF models: one for intra-sentential parsing and the other for multi-sentential parsing.

Topics

discourse parsing

Appears in 22 sentences as: discourse parser (8), discourse parsers (2), Discourse parsing (1), discourse parsing (13)
In Combining Intra- and Multi-sentential Rhetorical Parsing for Document-level Discourse Analysis
  1. We propose a novel approach for developing a two-stage document-level discourse parser.
    Page 1, “Abstract”
  2. We present two approaches to combine these two stages of discourse parsing effectively.
    Page 1, “Abstract”
  3. A set of empirical evaluations over two different datasets demonstrates that our discourse parser significantly outperforms the state-of-the-art, often by a wide margin.
    Page 1, “Abstract”
  4. Discourse analysis in RST involves two subtasks: discourse segmentation is the task of identifying the EDUs, and discourse parsing is the task of linking the discourse units into a labeled tree.
    Page 1, “Introduction”
  5. While recent advances in automatic discourse segmentation and sentence-level discourse parsing have attained accuracies close to human performance (Fisher and Roark, 2007; Joty et al., 2012), discourse parsing at the document-level still poses significant challenges (Feng and Hirst, 2012) and the performance of the existing document-level parsers (Hernault et al., 2010; Subba and Di Eugenio, 2009) is still considerably inferior compared to human gold-standard.
    Page 1, “Introduction”
  6. This paper aims to reduce this performance gap and take discourse parsing one step further.
    Page 1, “Introduction”
  7. First, existing discourse parsers typically model the structure and the labels of a DT separately in a pipeline fashion, and also do not consider the sequential dependencies between the DT constituents, which has been recently shown to be critical (Feng and Hirst, 2012).
    Page 1, “Introduction”
  8. To address this limitation, as the first contribution, we propose a novel document-level discourse parser based on probabilistic discriminative parsing models, represented as Conditional Random Fields (CRFs) (Sutton et al., 2007), to infer the probability of all possible DT constituents.
    Page 1, “Introduction”
  9. Third, existing discourse parsers do not discriminate between intra-sentential (i.e., building the DTs for the individual sentences) and multi-sentential parsing (i.e., building the DT for the document).
    Page 2, “Introduction”
  10. In order to develop a complete and robust discourse parser, we combine our intra-sentential and multi-sentential parsers in two different ways.
    Page 2, “Introduction”
  11. Our final result compares very favorably to the result of state-of-the-art models in document-level discourse parsing.
    Page 2, “Introduction”

EDUs

Appears in 22 sentences as: EDUs (24)
In Combining Intra- and Multi-sentential Rhetorical Parsing for Document-level Discourse Analysis
  1. Rhetorical Structure Theory (RST) (Mann and Thompson, 1988), one of the most influential theories of discourse, represents texts by labeled hierarchical structures, called Discourse Trees (DTs), as exemplified by a sample DT in Figure 1. The leaves of a DT correspond to contiguous Elementary Discourse Units (EDUs) (six in the example).
    Page 1, “Introduction”
  2. Adjacent EDUs are connected by rhetorical relations (e.g., Elaboration, Contrast), forming larger discourse units (represented by internal nodes).
    Page 1, “Introduction”
  3. Discourse analysis in RST involves two subtasks: discourse segmentation is the task of identifying the EDUs, and discourse parsing is the task of linking the discourse units into a labeled tree.
    Page 1, “Introduction”
  4. It does not have a well-formed subtree because the unit containing EDUs 2 and 3 merges with the next sentence and only then is the resulting unit merged with EDU 1.
    Page 2, “Introduction”
  5. Given the EDUs in a doc-
    Page 2, “Related work”
  6. Given a document with sentences already segmented into EDUs, the discourse parsing problem is determining which discourse units (EDUs or larger units) to relate (i.e., the structure), and how to relate them (i.e., the labels or the discourse relations) in the resulting DT.
    Page 3, “Our Discourse Parsing Framework”
  7. Note that the number of valid trees grows exponentially with the number of EDUs in a document.[1] Therefore, an exhaustive search over the valid trees is often infeasible, even for relatively small documents.
    Page 3, “Our Discourse Parsing Framework”
  8. [1] For n + 1 EDUs, the number of valid discourse trees is actually the Catalan number C_n.
    Page 3, “Our Discourse Parsing Framework”
  9. [Figure: a document segmented into EDUs is fed to the intra-sentential parser and then to the multi-sentential parser, producing the discourse tree.]
    Page 3, “Our Discourse Parsing Framework”
  10. Following (Joty et al., 2012), a DT can be formally represented as a set of constituents of the form R[i, m, j], referring to a rhetorical relation R between the discourse unit containing EDUs i through m and the unit containing EDUs m+1 through j.
    Page 3, “Our Discourse Parsing Framework” (see the sketch after this list)
  11. The observed nodes Uj in a sequence represent the discourse units (EDUs or larger units).
    Page 4, “Parsing Models and Parsing Algorithm”
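
To make the terminology in items 7, 8 and 10 concrete, here is a minimal Python sketch (ours, not the authors' code; Constituent and num_valid_trees are names we introduce): a constituent R[i, m, j] as a plain data structure, and the Catalan-number count of valid discourse trees over a given number of EDUs.

from dataclasses import dataclass
from math import comb

@dataclass(frozen=True)
class Constituent:
    # R[i, m, j]: relation R between the discourse unit spanning
    # EDUs i..m and the unit spanning EDUs m+1..j.
    relation: str
    i: int
    m: int
    j: int

def num_valid_trees(num_edus: int) -> int:
    # For n + 1 EDUs there are C_n valid binary discourse trees,
    # where C_n = (2n choose n) / (n + 1) is the nth Catalan number.
    n = num_edus - 1
    return comb(2 * n, n) // (n + 1)

print(num_valid_trees(6))                   # 42 candidate structures for the 6-EDU example
print(Constituent("Elaboration", 1, 1, 2))  # a relation between EDU 1 and EDU 2

The exponential growth of C_n is what makes the exhaustive search in item 7 infeasible and motivates the dynamic programming algorithm discussed later.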

parsing model

Appears in 18 sentences as: Parsing Model (2), parsing model (9), Parsing Models (1), parsing models (6)
In Combining Intra- and Multi-sentential Rhetorical Parsing for Document-level Discourse Analysis
  1. To address this limitation, as the first contribution, we propose a novel document-level discourse parser based on probabilistic discriminative parsing models, represented as Conditional Random Fields (CRFs) (Sutton et al., 2007), to infer the probability of all possible DT constituents.
    Page 1, “Introduction”
  2. Two separate parsing models could exploit the fact that rhetorical relations are distributed differently intra-sententially vs. multi-sententially.
    Page 2, “Introduction”
  3. Both of our parsers have the same two components: a parsing model assigns a probability to every possible DT, and a parsing algorithm identifies the most probable DT among the candidate DTs in that scenario.
    Page 3, “Our Discourse Parsing Framework”
  4. Before describing our parsing models and the parsing algorithm, we introduce some terminology that we will use throughout the paper.
    Page 3, “Our Discourse Parsing Framework”
  5. The job of our intra-sentential and multi-sentential parsing models is to assign a probability to each of the constituents of all possible DTs at the sentence level and at the document level, respectively.
    Page 4, “Parsing Models and Parsing Algorithm”
  6. Formally, given the model parameters Θ, for each possible constituent R[i, m, j] in a candidate DT at the sentence or document level, the parsing model estimates P(R[i, m, j] | Θ), which specifies a joint distribution over the label R and the structure [i, m, j] of the constituent.
    Page 4, “Parsing Models and Parsing Algorithm”
  7. 4.1 Intra-Sentential Parsing Model
    Page 4, “Parsing Models and Parsing Algorithm”
  8. Recently, we proposed a novel parsing model for sentence-level discourse parsing (Joty et al., 2012), that outperforms previous approaches by effectively modeling sequential dependencies along with structure and labels jointly.
    Page 4, “Parsing Models and Parsing Algorithm”
  9. Below we briefly describe the parsing model, and show how it is applied to obtain the probabilities of all possible DT constituents at the sentence level.
    Page 4, “Parsing Models and Parsing Algorithm”
  10. Figure 4 shows the intra-sentential parsing model expressed as a Dynamic Conditional Random Field (DCRF) (Sutton et al., 2007).
    Page 4, “Parsing Models and Parsing Algorithm”
  11. To obtain the probability of the constituents of all candidate DTs for a sentence, we apply the parsing model recursively at different levels of the DT and compute the posterior marginals over the relation-structure pairs.
    Page 4, “Parsing Models and Parsing Algorithm” (see the sketch after this list)
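
Items 5, 6 and 11 describe the interface between the parsing models and the parsing algorithm: the models fill a table mapping every candidate constituent to a probability. A toy sketch of that interface (our own; toy_constituent_probs is a hypothetical stand-in, since the real probabilities are DCRF posterior marginals that we do not reproduce here):

from typing import Dict, Tuple

# (R, i, m, j) -> P(R[i, m, j] | Theta)
ConstituentProbs = Dict[Tuple[str, int, int, int], float]

RELATIONS = ["Elaboration", "Contrast"]  # toy label set

def toy_constituent_probs(n_edus: int) -> ConstituentProbs:
    # Stand-in for CRF inference: a uniform joint distribution over the
    # label R and the structure [i, m, j] of each candidate constituent.
    probs: ConstituentProbs = {}
    for i in range(1, n_edus + 1):
        for j in range(i + 1, n_edus + 1):
            for m in range(i, j):
                for rel in RELATIONS:
                    probs[(rel, i, m, j)] = 1.0 / len(RELATIONS)
    return probs

The parsing algorithm (Section 4.4) consumes such a table and searches for the tree whose constituents are jointly most probable.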

sentence-level

Appears in 13 sentences as: sentence-level (13)
In Combining Intra- and Multi-sentential Rhetorical Parsing for Document-level Discourse Analysis
  1. While recent advances in automatic discourse segmentation and sentence-level discourse parsing have attained accuracies close to human performance (Fisher and Roark, 2007; Joty et al., 2012), discourse parsing at the document-level still poses significant challenges (Feng and Hirst, 2012) and the performance of the existing document-level parsers (Hernault et al., 2010; Subba and Di Eugenio, 2009) is still considerably inferior compared to human gold-standard.
    Page 1, “Introduction”
  2. Since most sentences have a well-formed discourse subtree in the full document-level DT (for example, the second sentence in Figure 1), our first approach constructs a DT for every sentence using our intra-sentential parser, and then runs the multi-sentential parser on the resulting sentence-level DTs.
    Page 2, “Introduction”
  3. Our second approach, in an attempt to deal with these cases, builds sentence-level sub-trees by applying the intra-sentential parser on a sliding window covering two adjacent sentences and by then consolidating the results produced by overlapping windows.
    Page 2, “Introduction”
  4. After that, the multi-sentential parser takes all these sentence-level sub-trees and builds a full rhetorical parse for the document.
    Page 2, “Introduction” (see the sketch after this list)
  5. The idea of staging document-level discourse parsing on top of sentence-level discourse parsing was investigated in (Marcu, 2000a; LeThanh et al., 2004).
    Page 2, “Related work”
  6. Since we already have an accurate sentence-level discourse parser (Joty et al., 2012), a straightforward approach to document-level parsing could be to simply apply this parser to the whole document.
    Page 3, “Our Discourse Parsing Framework”
  7. For example, syntactic features like dominance sets (Soricut and Marcu, 2003) are extremely useful for sentence-level parsing, but are not even applicable in the multi-sentential case.
    Page 3, “Our Discourse Parsing Framework”
  8. Recently, we proposed a novel parsing model for sentence-level discourse parsing (Joty et al., 2012), that outperforms previous approaches by effectively modeling sequential dependencies along with structure and labels jointly.
    Page 4, “Parsing Models and Parsing Algorithm”
  9. The connections between adjacent nodes in a hidden layer encode sequential dependencies between the respective hidden nodes, and can enforce constraints such as the fact that a S_j = 1 must not follow a S_{j-1} = 1. The connections between the two hidden layers model the structure and the relation of a DT (sentence-level) constituent jointly.
    Page 4, “Parsing Models and Parsing Algorithm”
  10. Figure 5: Our parsing model applied to the sequences at different levels of a sentence-level DT.
    Page 4, “Parsing Models and Parsing Algorithm”
  11. A key finding from several previous studies on sentence-level discourse analysis is that most sentences have a well-formed discourse subtree in the full document-level DT (Joty et al., 2012; Fisher and Roark, 2007).
    Page 7, “Document-level Parsing Approaches”
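
Items 2 and 4 describe the first combination strategy, TSP 1-1. A minimal sketch of that pipeline (our own code; parse_sentence and parse_multisentential are hypothetical stand-ins for the paper's two CRF-based parsers):

def parse_sentence(sentence):
    # Hypothetical intra-sentential parser: returns the DT for one sentence.
    ...

def parse_multisentential(units):
    # Hypothetical multi-sentential parser: combines discourse units
    # (here, sentence-level DTs) into a document-level DT.
    ...

def tsp_1_1(sentences):
    # 1S-1S (1 sentence, 1 sub-tree): build a DT per sentence, then let
    # the multi-sentential parser link the sentence-level DTs together.
    sentence_dts = [parse_sentence(s) for s in sentences]
    return parse_multisentential(sentence_dts)

The sliding-window variant (TSP SW, item 3) instead parses each window of two adjacent sentences and consolidates the sub-trees produced by overlapping windows before the multi-sentential stage.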

parsing algorithm

Appears in 9 sentences as: Parsing Algorithm (1), parsing algorithm (7), parsing algorithms (1)
In Combining Intra- and Multi-sentential Rhetorical Parsing for Document-level Discourse Analysis
  1. Our parser builds a discourse tree by applying an optimal parsing algorithm to probabilities inferred from two Conditional Random Fields: one for intra-sentential parsing and the other for multi-sentential parsing.
    Page 1, “Abstract”
  2. Second, existing parsers apply greedy and suboptimal parsing algorithms to build the DT for a document.
    Page 1, “Introduction”
  3. Both of our parsers have the same two components: a parsing model assigns a probability to every possible DT, and a parsing algorithm identifies the most probable DT among the candidate DTs in that scenario.
    Page 3, “Our Discourse Parsing Framework”
  4. While the two models are rather different, the same parsing algorithm is shared by the two modules.
    Page 3, “Our Discourse Parsing Framework”
  5. Before describing our parsing models and the parsing algorithm, we introduce some terminology that we will use throughout the paper.
    Page 3, “Our Discourse Parsing Framework”
  6. Once we obtain the probability of all possible DT constituents, the discourse sub-trees for the sentences are built by applying an optimal probabilistic parsing algorithm (Section 4.4) using one of the methods described in Section 5.
    Page 5, “Parsing Models and Parsing Algorithm”
  7. 4.4 Parsing Algorithm
    Page 6, “Parsing Models and Parsing Algorithm”
  8. Given the probability of all possible DT constituents in the intra-sentential and multi-sentential scenarios, the job of the parsing algorithm is to find the most probable DT for that scenario.
    Page 6, “Parsing Models and Parsing Algorithm”
  9. In this paper, we have presented a novel discourse parser that applies an optimal parsing algorithm to probabilities inferred from two CRF models: one for intra-sentential parsing and the other for multi-sentential parsing.
    Page 9, “Conclusion”

CRF

Appears in 5 sentences as: CRF (5)
In Combining Intra- and Multi-sentential Rhetorical Parsing for Document-level Discourse Analysis
  1. The CRF models effectively represent the structure and the label of a DT constituent jointly, and whenever possible, capture the sequential dependencies between the constituents.
    Page 1, “Introduction”
  2. To cope with this limitation, our CRF models support a probabilistic bottom-up parsing
    Page 1, “Introduction”
  3. Figure 6: A CRF as a multi-sentential parsing model.
    Page 5, “Parsing Models and Parsing Algorithm”
  4. It becomes a CRF if we directly model the hidden (output) variables by conditioning its clique potential (or factor) φ on the observed (input) variables:
    Page 5, “Parsing Models and Parsing Algorithm” (see the formula after this list)
  5. In this paper, we have presented a novel discourse parser that applies an optimal parsing algorithm to probabilities inferred from two CRF models: one for intra-sentential parsing and the other for multi-sentential parsing.
    Page 9, “Conclusion”
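
The displayed equation that followed item 4 was lost in extraction. The generic CRF definition it points to, written in standard notation (our reconstruction, not necessarily the paper's exact formula), conditions the clique potentials on the observed inputs:

    P(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})} \prod_{c} \phi(\mathbf{y}_c, \mathbf{x}),
    \qquad
    Z(\mathbf{x}) = \sum_{\mathbf{y}'} \prod_{c} \phi(\mathbf{y}'_c, \mathbf{x})

where y are the hidden (output) variables, x the observed (input) variables, c ranges over the cliques of the graph, and the partition function Z(x) normalizes the product of potentials.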

dynamic programming

Appears in 4 sentences as: dynamic programming (4)
In Combining Intra- and Multi-sentential Rhetorical Parsing for Document-level Discourse Analysis
  1. Following (Joty et al., 2012), we implement a probabilistic CKY-like bottom-up algorithm for computing the most likely parse using dynamic programming.
    Page 6, “Parsing Models and Parsing Algorithm”
  2. Specifically, with n discourse units, we use the upper-triangular portion of the n×n dynamic programming table D. Given that U_x(0) and U_x(1) are the start and end EDU IDs of unit U_x:
    Page 6, “Parsing Models and Parsing Algorithm” (see the sketch after this list)
  3. We pick the subtree which has the higher probability in the two dynamic programming tables.
    Page 7, “Document-level Parsing Approaches”
  4. If the sentence has the same number of sub-trees in both DT_p and DT_n, we pick the one with higher probability in the dynamic programming tables.
    Page 7, “Document-level Parsing Approaches”
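
A minimal sketch of the probabilistic CKY-like computation from items 1 and 2 (our own code, under the simplifying assumption that a tree's score is the product of its constituent probabilities): D[i][j] stores the probability of the best DT over units i..j, and probs[(i, m, j)] is assumed to give the best relation and its probability for the constituent R[i, m, j], as supplied by the parsing models.

from typing import Dict, Tuple

def cky_parse(n: int, probs: Dict[Tuple[int, int, int], Tuple[str, float]]):
    # Upper-triangular DP table over discourse units 1..n.
    D = [[0.0] * (n + 1) for _ in range(n + 1)]
    back = [[None] * (n + 1) for _ in range(n + 1)]  # best (relation, split)
    for i in range(1, n + 1):
        D[i][i] = 1.0  # a single unit is a trivial (leaf) DT
    for span in range(2, n + 1):           # grow spans bottom-up
        for i in range(1, n - span + 2):
            j = i + span - 1
            for m in range(i, j):          # split into units i..m and m+1..j
                relation, p = probs[(i, m, j)]
                score = D[i][m] * D[m + 1][j] * p
                if score > D[i][j]:
                    D[i][j] = score
                    back[i][j] = (relation, m)
    return D, back  # follow back-pointers from back[1][n] to read off the DT

With n units the table has O(n^2) cells and each cell tries O(n) splits, so the search is polynomial, in contrast to the exponential number of candidate trees noted earlier.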

semantic similarity

Appears in 4 sentences as: semantic similarity (2), semantically similar (2)
In Combining Intra- and Multi-sentential Rhetorical Parsing for Document-level Discourse Analysis
  1. In general, the errors are produced by two different causes acting together: (i) imbalanced distribution of the relations, and (ii) semantic similarity between the relations.
    Page 9, “Experiments”
  2. The most frequent relation Elaboration tends to mislead others, especially the ones which are semantically similar (e.g., Explanation, Background) and less frequent (e.g., Summary, Evaluation).
    Page 9, “Experiments”
  3. The relations which are semantically similar mislead each other (e.g., Temporal:Background, Cause:Explanation).
    Page 9, “Experiments”
  4. We would like to employ a more robust method (e.g., ensemble methods with bagging) to deal with the imbalanced distribution of relations, along with taking advantage of a richer semantic knowledge (e.g., compositional semantics) to cope with the errors caused by semantic similarity between the rhetorical relations.
    Page 9, “Experiments”

F-score

Appears in 3 sentences as: F-score (4)
In Combining Intra- and Multi-sentential Rhetorical Parsing for Document-level Discourse Analysis
  1. To evaluate the parsing performance, we use the standard unlabeled (i.e., hierarchical spans) and labeled (i.e., nuclearity and relation) precision, recall and F-score as described in (Marcu, 2000b).
    Page 8, “Experiments” (see the formula after this list)
  2. Table 2 presents F-score parsing results for our parsers and the existing systems on the two corpora.[2] On both corpora, our parsers, namely 1S-1S (TSP 1-1) and sliding window (TSP SW), outperform existing systems by a wide margin (p < 7.1e-05).[3] On RST-DT, our parsers achieve absolute F-score improvements of 8%, 9.4% and 11.4% in span, nuclearity and relation, respectively, over HILDA.
    Page 8, “Experiments”
  3. On the Instructional genre, our parsers deliver absolute F-score improvements of 10.5%, 13.6% and 8.14% in span, nuclearity and relations, respectively, over the ILP-based approach.
    Page 8, “Experiments”
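
For reference, the metric in item 1 is the standard constituent-based evaluation: precision P is the fraction of constituents in the predicted DT that also occur in the gold DT, recall R is the fraction of gold constituents recovered, and the F-score is their harmonic mean,

    F = \frac{2PR}{P + R},

computed separately for spans, nuclearity and relations.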

news articles

Appears in 3 sentences as: news articles (3)
In Combining Intra- and Multi-sentential Rhetorical Parsing for Document-level Discourse Analysis
  1. While previous approaches have been tested on only one corpus, we evaluate our approach on texts from two very different genres: news articles and instructional how-to-do manuals.
    Page 2, “Introduction”
  2. They evaluate their approach on the RST-DT corpus (Carlson et al., 2002) of news articles.
    Page 3, “Related work”
  3. For example, this is true for 75% of the cases in our development set containing 20 news articles from RST-DT and for 79% of the cases in our development set containing 20 how-to-do manuals from the Instructional corpus.
    Page 7, “Document-level Parsing Approaches”
