Introduction | Rhetorical Structure Theory (RST) (Mann and Thompson, 1988), one of the most influential theories of discourse, represents texts by labeled hierarchical structures, called Discourse Trees (DTs), as exemplified by a sample DT in Figure 1. The leaves of a DT correspond to contiguous Elementary Discourse Units (EDUs) (six in the example). |
Introduction | Adjacent EDUs are connected by rhetorical relations (e.g., Elaboration, Contrast), forming larger discourse units (represented by internal |
Introduction | Discourse analysis in RST involves two subtasks: discourse segmentation is the task of identifying the EDUs, and discourse parsing is the task of linking the discourse units into a labeled tree. |
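The labeled tree described above can be sketched as a small data structure. This is a minimal illustration, not a representation taken from any of the excerpted papers: the `DTNode` type, the `leaf` helper, the restriction to binary trees, and the example EDU texts are all assumptions made for the sketch. Leaves hold EDUs, and internal nodes hold the rhetorical relation that links their two children.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DTNode:
    """One node of a binary discourse tree (hypothetical, illustrative type)."""
    relation: Optional[str] = None   # rhetorical relation; None for a leaf
    text: Optional[str] = None       # EDU text; None for an internal node
    left: Optional["DTNode"] = None
    right: Optional["DTNode"] = None

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None

    def edus(self) -> List[str]:
        """Return the EDUs covered by this node, in left-to-right order."""
        if self.is_leaf():
            return [self.text]
        return self.left.edus() + self.right.edus()


def leaf(text: str) -> DTNode:
    """Build a leaf node holding a single EDU."""
    return DTNode(text=text)


# Made-up EDUs, only for illustration.
tree = DTNode(
    relation="Contrast",
    left=DTNode(
        relation="Elaboration",
        left=leaf("The parser is fast,"),
        right=leaf("taking seconds per document,"),
    ),
    right=leaf("but it needs segmented input."),
)

print(tree.edus())
```

In these terms, segmentation produces the leaves, while parsing chooses both the bracketing and the `relation` label on every internal node.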
Our Discourse Parsing Framework | Given a document with sentences already segmented into EDUs, the discourse parsing problem is to determine which discourse units (EDUs or larger units) to relate (i.e., the structure), and how to relate them (i.e., the labels or the discourse relations) in the resulting DT. |
Our Discourse Parsing Framework | Note that the number of valid trees grows exponentially with the number of EDUs in a document.¹ Therefore, an exhaustive search over the valid trees is often infeasible, even for relatively small documents. |
Our Discourse Parsing Framework | ¹ For n + 1 EDUs, the number of valid discourse trees is actually the Catalan number C_n. |
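To make the growth rate concrete, the Catalan numbers can be computed directly from the closed form C_n = binom(2n, n) / (n + 1). The sketch below assumes binary trees and counts only unlabeled structures; relation labels would multiply the counts further. The function names are illustrative, not from the source.

```python
from math import comb


def catalan(n: int) -> int:
    """Return the n-th Catalan number, C_n = binom(2n, n) / (n + 1)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return comb(2 * n, n) // (n + 1)


def num_tree_structures(num_edus: int) -> int:
    """Count binary tree structures over num_edus contiguous EDUs.

    With n + 1 leaves there are C_n structures, so this is C_(num_edus - 1).
    Labels on internal nodes are ignored (an assumption of this sketch).
    """
    if num_edus < 1:
        raise ValueError("a document needs at least one EDU")
    return catalan(num_edus - 1)


for k in (2, 3, 6, 10, 20):
    print(f"{k:>2} EDUs: {num_tree_structures(k)} structures")
```

Six EDUs already admit 42 structures and ten admit 4862, which is why exhaustive enumeration stops being practical quickly.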
Parsing Models and Parsing Algorithm | The observed nodes U_j in a sequence represent the discourse units (EDUs or larger units). |
Related work | Given the EDUs in a doc- |
Introduction | In other words, for two adjacent EDUs not connected by any of the above three relations, the prior probability of staying at the same topic and sentiment level is higher than picking a new topic and sentiment level (i.e. |
Introduction | Drawing model parameters: First, at the corpus level, we draw a distribution φ over four discourse relations: three relations as defined in Table 1 and an additional dummy relation 4 to indicate that there is no relation between two adjacent EDUs (NoRelation). |
Introduction | These parameters encode the intuition that most pairs of EDUs do not exhibit a discourse relation relevant for the task (i.e. |