Abstract | In this paper, we develop an RST-style text-level discourse parser, based on the HILDA discourse parser (Hernault et al., 2010b).
Abstract | We also analyze the difficulty of extending traditional sentence-level discourse parsing to text-level parsing by comparing discourse-parsing performance under different discourse conditions. |
Introduction | Research in discourse parsing aims to unmask such relations in text, which is helpful for many downstream applications such as summarization, information retrieval, and question answering. |
Introduction | However, most existing discourse parsers operate on individual sentences alone, whereas discourse parsing is more powerful for text-level analysis. |
Introduction | Therefore, in this work, we aim to develop a text-level discourse parser.
Related work | Discourse parsing was first brought to prominence by Marcu (1997). |
Related work | Here we briefly review two fully implemented text-level discourse parsers with state-of-the-art performance.
Related work | The HILDA discourse parser of Hernault and his colleagues (duVerle and Prendinger, 2009; Hernault et al., 2010b) is the first fully implemented feature-based discourse parser that works at the full text level.
Abstract | We propose a novel approach for developing a two-stage document-level discourse parser.
Abstract | We present two approaches to combine these two stages of discourse parsing effectively. |
Abstract | A set of empirical evaluations over two different datasets demonstrates that our discourse parser significantly outperforms the state-of-the-art, often by a wide margin. |
Introduction | Discourse analysis in RST involves two subtasks: discourse segmentation is the task of identifying the EDUs, and discourse parsing is the task of linking the discourse units into a labeled tree. |
Introduction | While recent advances in automatic discourse segmentation and sentence-level discourse parsing have attained accuracies close to human performance (Fisher and Roark, 2007; Joty et al., 2012), discourse parsing at the document level still poses significant challenges (Feng and Hirst, 2012), and the performance of existing document-level parsers (Hernault et al., 2010; Subba and Di Eugenio, 2009) is still considerably inferior to the human gold standard.
Introduction | This paper aims to reduce this performance gap and take discourse parsing one step further. |
Abstract | Text-level discourse parsing remains a challenge. |
Introduction | Discourse parsing is the task of identifying the presence and the type of the discourse relations between discourse units. |
Introduction | While research in discourse parsing can be partitioned into several directions according to different theories and frameworks, Rhetorical Structure Theory (RST) (Mann and Thompson, 1988) is probably the most ambitious one, because it aims to identify not only the discourse relations in a small local context, but also the hierarchical tree structure for the full text: from the relations relating the smallest discourse units (called elementary discourse units, EDUs), to the ones connecting paragraphs. |
Introduction | Conventionally, there are two major subtasks related to text-level discourse parsing: (1) EDU segmentation: to segment the raw text into EDUs, and (2) tree-building: to build a discourse tree from EDUs, representing the discourse relations in the text.
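For illustration, the output of the tree-building subtask can be represented with a minimal node structure; all names here are hypothetical sketches, not the interface of any parser discussed in these papers:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DiscourseNode:
    """A node in an RST-style discourse tree (illustrative sketch)."""
    relation: Optional[str] = None            # e.g. "Elaboration"; None at leaves
    nucleus: Optional["DiscourseNode"] = None
    satellite: Optional["DiscourseNode"] = None
    edu: Optional[str] = None                 # leaf text span (an EDU)

    def is_leaf(self) -> bool:
        return self.edu is not None

# Two EDUs linked by an Elaboration relation, nucleus first:
left = DiscourseNode(edu="The parser builds a tree,")
right = DiscourseNode(edu="covering the full text.")
tree = DiscourseNode(relation="Elaboration", nucleus=left, satellite=right)
```

Larger discourse units are then built by relating such subtrees recursively, up to a single root spanning the document.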
Related work | 2.1 HILDA discourse parser |
Related work | The HILDA discourse parser by Hernault et al. (2010) is the first attempt at RST-style text-level discourse parsing.
Abstract | Text-level discourse parsing is notoriously difficult, as distinctions between discourse relations require subtle semantic judgments that are not easily captured using standard features. |
Abstract | In this paper, we present a representation learning approach, in which we transform surface features into a latent space that facilitates RST discourse parsing.
Abstract | The resulting shift-reduce discourse parser obtains substantial improvements over the previous state-of-the-art in predicting relations and nuclearity on the RST Treebank. |
Introduction | Unfortunately, the performance of discourse parsing is still relatively weak: the state-of-the-art F-measure for text-level relation detection in the RST Treebank is only slightly above 55% (Joty
Introduction | In this paper, we present a representation learning approach to discourse parsing.
Introduction | Our method is implemented as a shift-reduce discourse parser (Marcu, 1999; Sagae, 2009). |
Model | The core idea of this paper is to project lexical features into a latent space that facilitates discourse parsing.
Model | Thus, we name the approach DPLP: Discourse Parsing from Linear Projection. |
Model | We apply transition-based (incremental) structured prediction to obtain a discourse parse, training a predictor to make the correct incremental moves to match the annotations of training data in the RST Treebank.
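As a rough sketch of such a transition-based tree-builder, the loop below applies SHIFT/REDUCE moves over a stack and an input queue of EDUs; the `decide` oracle stands in for the trained classifier, and all names are illustrative rather than DPLP's actual interface:

```python
def shift_reduce_parse(edus, decide):
    """Build a binary discourse tree with SHIFT/REDUCE moves (sketch).

    edus   : list of EDU strings, left to right
    decide : callable(stack, queue) -> "SHIFT" or a relation label;
             a trained classifier in a real parser, a stub here.
    """
    stack, queue = [], list(edus)
    while queue or len(stack) > 1:
        action = decide(stack, queue)
        if action == "SHIFT" and queue:
            stack.append(queue.pop(0))      # move the next EDU onto the stack
        else:
            right = stack.pop()             # REDUCE: merge the top two subtrees
            left = stack.pop()
            stack.append((action, left, right))
    return stack[0]

# Toy oracle: shift while input remains, then reduce with "Elaboration".
tree = shift_reduce_parse(["e1", "e2", "e3"],
                          lambda st, q: "SHIFT" if q else "Elaboration")
# tree == ("Elaboration", "e1", ("Elaboration", "e2", "e3"))
```

In a real parser the classifier conditions each move on features of the stack and queue, so the tree shape and relation labels vary with the input rather than following a fixed schedule.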
Abstract | Previous research on text-level discourse parsing mainly made use of constituency structure to parse the whole document into one discourse tree.
Abstract | In this paper, we present the limitations of constituency-based discourse parsing and first propose to use dependency structure to directly represent the relations between elementary discourse units (EDUs).
Abstract | Experiments show that our discourse dependency parsers achieve a competitive performance on text-level discourse parsing.
Add arc <eC,ej> to GC with | The third feature type (Position) is also very helpful to discourse parsing.
Discourse Dependency Parsing | Figure 5 shows the details of the Chu-Liu/Edmonds algorithm for discourse parsing.
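In outline, the algorithm greedily attaches each EDU to its highest-scoring head, then contracts any resulting cycle into a single node and recurses. A compact sketch over a dense score matrix follows; this is illustrative code, not the authors' implementation:

```python
def _find_cycle(head):
    """Return one cycle (as a list of nodes) in a head assignment, or None."""
    color = [0] * len(head)                 # 0 = new, 1 = on current path, 2 = done
    for start in range(1, len(head)):
        v, path = start, []
        while color[v] == 0 and v != 0:
            color[v] = 1
            path.append(v)
            v = head[v]
        if color[v] == 1:                   # walked back into the current path
            return path[path.index(v):]
        for u in path:
            color[u] = 2
    return None

def chu_liu_edmonds(scores):
    """Maximum spanning arborescence rooted at node 0 (Chu-Liu/Edmonds sketch).

    scores[h][d] is the score of the arc from head h to dependent d;
    returns head[d] for every d > 0 (head[0] is unused).
    """
    n, NEG = len(scores), float("-inf")
    head = [0] * n
    for d in range(1, n):                   # greedy: best head for each node
        head[d] = max(range(n), key=lambda h: NEG if h == d else scores[h][d])
    cycle = _find_cycle(head)
    if cycle is None:
        return head
    cyc = set(cycle)
    rest = [v for v in range(n) if v not in cyc]   # rest[0] == 0, the root
    c = len(rest)                                  # index of the contracted node
    sub = [[NEG] * (c + 1) for _ in range(c + 1)]
    enter, leave = {}, {}
    for i, v in enumerate(rest):
        for j, w in enumerate(rest):
            if v != w:
                sub[i][j] = scores[v][w]
        # arc v -> cycle: score the gain over the cycle arc it would replace
        u = max(cyc, key=lambda u: scores[v][u] - scores[head[u]][u])
        sub[i][c], enter[i] = scores[v][u] - scores[head[u]][u], u
        # arc cycle -> v: best exit point
        u = max(cyc, key=lambda u: scores[u][v])
        sub[c][i], leave[i] = scores[u][v], u
    sub_head = chu_liu_edmonds(sub)
    new_head = list(head)                   # keep cycle arcs by default
    for j in range(1, c + 1):
        h = sub_head[j]
        if j == c:                          # the arc entering the cycle
            new_head[enter[h]] = rest[h]
        elif h == c:                        # an arc leaving the cycle
            new_head[rest[j]] = leave[j]
        else:
            new_head[rest[j]] = rest[h]
    return new_head

# With three nodes (0 is the artificial root), the greedy choices 2 -> 1 and
# 1 -> 2 form a cycle; contraction resolves it to 0 -> 1 -> 2:
scores = [[0, 5, 1],
          [0, 0, 10],
          [0, 10, 0]]
# chu_liu_edmonds(scores) == [0, 0, 1]
```

The contraction step is what distinguishes the algorithm from plain greedy head selection: it lets an arc into the cycle pay only the difference against the cycle arc it displaces.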
Discourse Dependency Structure and Tree Bank | Section 3 presents the discourse parsing approach based on the Eisner and MST algorithms. |
Introduction | Research in discourse parsing aims to acquire such relations in text, which is fundamental to many natural language processing applications such as question answering and automatic summarization.
Introduction | One important issue behind discourse parsing is the representation of discourse structure. |
Introduction | 1 EDU segmentation is a relatively trivial step in discourse parsing.
Building a Discourse Parser | In our work, we focused exclusively on the second step of the discourse parsing problem, i.e., constructing the RST tree from a sequence of EDUs that have been segmented beforehand.
Building a Discourse Parser | The motivations for leaving segmentation aside were both practical, since previous discourse parsing efforts (Soricut and Marcu, 2003; LeThanh et al., 2004) already provide standalone segmentation tools, and scientific, namely the greater need for improvements in labeling.
Conclusions and Future Work | In this paper, we have shown that it is possible to build an accurate automatic text-level discourse parser based on supervised machine-learning algorithms, using a feature-driven approach and a manually annotated corpus. |
Conclusions and Future Work | A complete online discourse parser, incorporating the parsing tool presented above combined with a new segmentation method, has since been made freely available at http://nlp.
Evaluation | To the best of our knowledge, results have been published for only two fully functional text-level discourse parsers for general text: Marcu’s decision-tree-based parser (Marcu, 2000) and the multilevel rule-based system built by LeThanh et al.
Introduction | The goal of discourse parsing is to extract this high-level, rhetorical structure. |
Introduction | Discourse parsing, on the other hand, focuses on a higher-level view of text, allowing some flexibility in the choice of formal representation while providing a wide range of applications in both analytical and computational linguistics.
Introduction | Several attempts to automate discourse parsing have been made. |
Abstract | We first design two discourse-aware similarity measures, which use all-subtree kernels to compare discourse parse trees in accordance with the Rhetorical Structure Theory. |
Conclusions and Future Work | First, we defined two simple discourse-aware similarity metrics (lexicalized and un-lexicalized), which use the all-subtree kernel to compute similarity between discourse parse trees in accordance with the Rhetorical Structure Theory. |
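A minimal version of that all-subtree kernel (Collins and Duffy, 2001) can be sketched over trees encoded as nested tuples; the encoding and the decay parameter `lam` are illustrative assumptions, not the metric's actual implementation:

```python
def tree_kernel(t1, t2, lam=1.0):
    """Collins-Duffy all-subtree kernel between two trees (sketch).

    Trees are nested tuples (label, child, child, ...); leaves are EDU strings.
    `lam` < 1 downweights contributions from larger subtrees.
    """
    def nodes(t):
        if isinstance(t, str):
            return []
        return [t] + [n for child in t[1:] for n in nodes(child)]

    def production(t):                      # node label plus its children's labels
        return (t[0], tuple(c if isinstance(c, str) else c[0] for c in t[1:]))

    def common(n1, n2):                     # common subtrees rooted at this pair
        if production(n1) != production(n2):
            return 0.0
        score = lam
        for c1, c2 in zip(n1[1:], n2[1:]):
            if not isinstance(c1, str) and not isinstance(c2, str):
                score *= 1.0 + common(c1, c2)
        return score

    return sum(common(n1, n2) for n1 in nodes(t1) for n2 in nodes(t2))

# A tree with one nested relation shares 3 subtree configurations with itself:
t = ("Elaboration", ("Attribution", "a", "b"), "c")
# tree_kernel(t, t) == 3.0
```

The kernel counts shared subtree configurations without enumerating them explicitly, which is what makes comparing full discourse trees tractable.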
Introduction | One possible reason could be the unavailability of accurate discourse parsers . |
Introduction | We first design two discourse-aware similarity measures, which use DTs generated by a publicly-available discourse parser (Joty et al., 2012); then, we show that they can help improve a number of MT evaluation metrics at the segment- and at the system-level in the context of the WMT11 and the WMT12 metrics shared tasks (Callison-Burch et al., 2011; Callison-Burch et al., 2012).
Our Discourse-Based Measures | In order to develop a discourse-aware evaluation metric, we first generate discourse trees for the reference and the system-translated sentences using a discourse parser, and then we measure the similarity between the two discourse trees.
Our Discourse-Based Measures | In Rhetorical Structure Theory, discourse analysis involves two subtasks: (i) discourse segmentation, or breaking the text into a sequence of EDUs, and (ii) discourse parsing, or the task of linking the units (EDUs and larger discourse units) into labeled discourse trees.
Our Discourse-Based Measures | (2012) proposed discriminative models for both discourse segmentation and discourse parsing at the sentence level. |
Related Work | Compared to the previous work, (i) we use a different discourse representation (RST), (ii) we compare discourse parses using all-subtree kernels (Collins and Duffy, 2001), (iii) we evaluate on much larger datasets, for several language pairs and for multiple metrics, and (iv) we do demonstrate better correlation with human judgments. |
CR + LS + DMM + DPM 39.32* +24% 47.86* +20% | This is a motivating result for discourse analysis, especially considering that the discourse parser was trained on a domain different from the corpora used here. |
Experiments | Due to the speed limitations of the discourse parser, we randomly drew 10,000 QA pairs from the corpus of how questions described by Surdeanu et al.
Models and Features | 4.2 Discourse Parser Model |
Models and Features | The discourse parser model (DPM) is based on the RST discourse framework (Mann and Thompson, 1988). |
Models and Features | However, this also introduces noise because discourse analysis is a complex task and discourse parsers are not perfect. |
Related Work | In terms of discourse parsing, Verberne et al.
Related Work | Discourse Parser (deep) |
Related Work | They later concluded that while discourse parsing appears to be useful for QA, automated discourse parsing tools are required before this approach can be tested at scale (Verberne et al., 2010).
A Refined Approach | In developing an improved model, we need to better exploit the discourse parser’s output to provide more circumstantial evidence to support the system’s coherence decision. |
Experiments | We must also be careful in using the automatic discourse parser . |
Experiments | We note that the discourse parser of Lin et al.
Experiments | Since the discourse parser utilizes paragraph boundaries but a permuted text does not have such boundaries, we ignore paragraph boundaries and treat the source text as if it has only one paragraph. |
Introduction | To the best of our knowledge, this is also the first study to show that output from an automatic discourse parser helps in coherence modeling.
Related Work | This task, discourse parsing , has been a recent focus of study in the natural language processing (NLP) community, largely enabled by the availability of large-scale discourse annotated corpora (Wellner and Pustejovsky, 2007; Elwell and Baldridge, 2008; Lin et al., 2009; Pitler et al., 2009; Pitler and Nenkova, 2009; Lin et al., 2010; Wang et al., 2010). |
Using Discourse Relations | To utilize discourse relations of a text, we first apply automatic discourse parsing to the input text.
Abstract | Segmentation is the first step in a discourse parser, a system that constructs discourse trees from elementary discourse units.
Discussion | Besides its use in automatic discourse parsing, the system could
Introduction* | Since segmentation is the first stage of discourse parsing , quality discourse segments are critical to building quality discourse representations (Soricut and Marcu, 2003). |
Introduction* | Most parsers can break down a sentence into constituent clauses, approaching the type of output that we need as input to a discourse parser.
Related Work | Soricut and Marcu (2003) construct a statistical discourse segmenter as part of their sentence-level discourse parser (SPADE), the only implementation available for our comparison. |