Discourse-annotated corpora | In the framework of RST, a coherent text can be represented as a discourse tree whose leaves are non-overlapping text spans called elementary discourse units (EDUs); these are the minimal text units of discourse trees. |
Discourse-annotated corpora | The example text fragment shown in Figure 1 consists of four EDUs (e1-e4), each delimited by square brackets. |
Discourse-annotated corpora | The two EDUs e1 and e2 are related by the mononuclear relation ATTRIBUTION, in which e1 is the more salient span (the nucleus); the span (e1-e2) and the EDU e3 are related by the multi-nuclear relation SAME-UNIT, in which both are equally salient. |
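As a rough illustration of the tree structure described above, the following sketch encodes the part of the Figure 1 tree that the text spells out: ATTRIBUTION over e1 and e2 with e1 as nucleus, and SAME-UNIT joining (e1-e2) with e3. The class names `EDU` and `Relation` and the helper `leaves` are hypothetical, chosen only for this example; the EDU texts are replaced by their labels, and e4 is left out because its attachment is not described here.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Union


@dataclass
class EDU:
    """A leaf of the discourse tree: one elementary discourse unit."""
    label: str  # e.g. "e1"; the EDU's actual text is omitted in this sketch


@dataclass
class Relation:
    """An internal node joining adjacent spans under a discourse relation."""
    name: str
    children: List[Node]
    nuclei: List[bool]  # nuclei[i] is True if children[i] is a nucleus

    @property
    def is_multinuclear(self) -> bool:
        # Multi-nuclear relations (e.g. SAME-UNIT) mark every child as a nucleus.
        return all(self.nuclei)


Node = Union[EDU, Relation]


def leaves(node: Node) -> List[str]:
    """Return the EDU labels left to right, i.e. in text order."""
    if isinstance(node, EDU):
        return [node.label]
    out: List[str] = []
    for child in node.children:
        out.extend(leaves(child))
    return out


# Only the part of the Figure 1 tree described in the text (no e4).
attribution = Relation("ATTRIBUTION", [EDU("e1"), EDU("e2")], [True, False])
same_unit = Relation("SAME-UNIT", [attribution, EDU("e3")], [True, True])

print(leaves(same_unit))            # ['e1', 'e2', 'e3']
print(attribution.is_multinuclear)  # False
print(same_unit.is_multinuclear)    # True
```

Storing nuclearity as one flag per child keeps mononuclear and multi-nuclear relations in a single node type; the leaves, read left to right, recover the non-overlapping EDU sequence.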
Method | Following the methodology of HILDA, an input text is first segmented into EDUs. |
Method | Then, from the EDUs, a bottom-up approach is applied to build a discourse tree for the full text. |
Method | Initially, a binary Structure classifier evaluates whether a discourse relation is likely to hold between consecutive EDUs. |
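The bottom-up procedure can be sketched as a greedy loop: score every pair of consecutive spans, merge the highest-scoring pair into one span, and repeat until a single tree remains. This is a minimal sketch assuming a HILDA-style greedy merge strategy. `structure_score` stands in for the trained binary Structure classifier, and the boundary-keyed toy scores below are invented purely for illustration. Relation labelling is not shown.

```python
from typing import Callable, Dict, List, Tuple, Union

# A span is either a single EDU label or a pair of adjacent sub-spans.
Span = Union[str, Tuple["Span", "Span"]]


def first_edu(span: Span) -> str:
    """Leftmost EDU label inside a span."""
    return span if isinstance(span, str) else first_edu(span[0])


def last_edu(span: Span) -> str:
    """Rightmost EDU label inside a span."""
    return span if isinstance(span, str) else last_edu(span[1])


def greedy_bottom_up(
    edus: List[str],
    structure_score: Callable[[Span, Span], float],
) -> Span:
    """Repeatedly merge the most likely pair of consecutive spans.

    structure_score(left, right) is a stand-in for a binary Structure
    classifier: higher means a relation is more likely to hold.
    Expects at least one EDU.
    """
    spans: List[Span] = list(edus)
    while len(spans) > 1:
        # Score every pair of consecutive spans.
        scores = [structure_score(spans[i], spans[i + 1])
                  for i in range(len(spans) - 1)]
        # max() returns the leftmost index on ties.
        best = max(range(len(scores)), key=lambda i: scores[i])
        # Replace the two merged spans with their new parent span.
        spans[best:best + 2] = [(spans[best], spans[best + 1])]
    return spans[0]


# Hypothetical scores, keyed by the EDU boundary between two spans.
BOUNDARY_SCORES: Dict[Tuple[str, str], float] = {
    ("e1", "e2"): 0.9,
    ("e2", "e3"): 0.7,
    ("e3", "e4"): 0.4,
}


def toy_structure_score(left: Span, right: Span) -> float:
    """Toy scorer that only looks at the boundary between the two spans."""
    return BOUNDARY_SCORES[(last_edu(left), first_edu(right))]


tree = greedy_bottom_up(["e1", "e2", "e3", "e4"], toy_structure_score)
print(tree)  # ((('e1', 'e2'), 'e3'), 'e4')
```

Because each iteration merges exactly one pair, n EDUs yield a binary tree after n - 1 merges. A real Structure classifier would use features of the two spans rather than a fixed lookup table.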
Text-level discourse parsing | Within each sentence, the two EDUs form a coherent unit, whereas the combination of the two sentences is not coherent across the sentence boundary. |