Coordination Structures in Dependency Treebanks
Popel, Martin and Mareček, David and Štěpánek, Jan and Zeman, Daniel and Žabokrtský, Zdeněk

Article Structure

Abstract

Paratactic syntactic structures are notoriously difficult to represent in dependency formalisms.

Introduction

In the last decade, dependency parsing has gradually been receiving visible attention.

Related work

Let us first recall the basic well-known characteristics of CSs.

Variations in representing coordination structures

Our analysis of variations in representing coordination structures is based on observations from a set of dependency treebanks for 26 languages.⁷

Coordination Structures in Treebanks

In this section, we identify the CS styles defined in the previous section as used in the primary treebank data sources. The statistical observations presented here (such as the number of annotated shared modifiers) and the experiments on CS-style convertibility presented in Section 5.2 are based on the normalized shapes of the treebanks as contained in the HamleDT 1.0 treebank collection (Zeman et al., 2012).¹⁵

Topics

treebanks

Appears in 47 sentences as: TreeBank (1) Treebank (20) treebank (15) treebanking (1) treebanks (32)
In Coordination Structures in Dependency Treebanks
  1. We introduce a novel taxonomy of such approaches and apply it to treebanks across a typologically diverse range of 26 languages.
    Page 1, “Abstract”
  2. One of the reasons is the increased availability of dependency treebanks, be they results of genuine dependency annotation projects or converted automatically from previously existing phrase-structure treebanks.
    Page 1, “Introduction”
  3. In both cases, a number of decisions have to be made during the construction or conversion of a dependency treebank.
    Page 1, “Introduction”
  4. The dominating solution in treebank design is to introduce artificial rules for the encoding of coordination structures within dependency trees using the same means that express dependencies, i.e., by using edges and by labeling of nodes or edges.
    Page 1, “Introduction”
  5. However, as there is no obvious linguistic intuition telling us which tree-shaped CS encoding is better and since the degree of freedom has several dimensions, one can find a number of distinct conventions introduced in particular dependency treebanks.
    Page 1, “Introduction”
  6. The main goal of this paper is to give a systematic survey of the solutions adopted in these treebanks.
    Page 1, “Introduction”
  7. Section 4 lists treebanks whose CS conventions we studied.
    Page 2, “Introduction”
  8. PS = Prague Dependency Treebank (PDT) style: all conjuncts are attached under the coordinating conjunction (along with shared modifiers, which are distinguished by a special attribute) (Hajič et al., 2006),
    Page 2, “Related work”
  9. Moreover, particular treebanks vary in their contents even more than in their format, i.e. each treebank has its own way of representing prepositions or different granularity of syntactic labels.
    Page 3, “Related work”
  10. Our analysis of variations in representing coordination structures is based on observations from a set of dependency treebanks for 26 languages.⁷
    Page 3, “Variations in representing coordination structures”
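The Prague style (PS) from item 8 can be made concrete with a small sketch. It assumes a hypothetical in-memory representation in which every token stores the id of its head plus two boolean attributes. `is_member` marks a conjunct and `is_shared` marks a shared modifier. These names are illustrative and are not taken from any particular treebank release. The example phrase and its analysis are also invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    id: int                  # 1-based token position; 0 is the artificial root
    form: str
    head: int                # id of the governing node (0 = root)
    is_member: bool = False  # hypothetical flag: node is a conjunct of the CS headed by its parent
    is_shared: bool = False  # hypothetical flag: node modifies all conjuncts of its parent CS


# "old men and women sleep" in PS: the conjunction "and" heads the CS, both conjuncts
# hang under it, and the shared modifier "old" is attached to the conjunction as well,
# distinguished from the conjuncts only by its attribute.
sentence = [
    Node(1, "old", 3, is_shared=True),
    Node(2, "men", 3, is_member=True),
    Node(3, "and", 5),
    Node(4, "women", 3, is_member=True),
    Node(5, "sleep", 0),
]


def split_coordination(nodes, coord_id):
    """Return (conjuncts, shared modifiers, other children) of the CS headed by coord_id."""
    children = [n for n in nodes if n.head == coord_id]
    conjuncts = [n.form for n in children if n.is_member]
    shared = [n.form for n in children if n.is_shared]
    other = [n.form for n in children if not (n.is_member or n.is_shared)]
    return conjuncts, shared, other


print(split_coordination(sentence, 3))  # (['men', 'women'], ['old'], [])
```

The conjuncts and the shared modifier are siblings under the conjunction. The attributes are therefore the only thing that tells them apart. Without them, all three would be undifferentiated children of `and`.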

See all papers in Proc. ACL 2013 that mention treebanks.

dependency relation

Appears in 6 sentences as: dependency relation (4) dependency relations (2)
In Coordination Structures in Dependency Treebanks
  1. For example, if a noun is modified by two coordinated adjectives, there is a (symmetric) coordination relation between the two conjuncts and two (asymmetric) dependency relations between the conjuncts and the noun.
    Page 1, “Introduction”
  2. Besides that, there should be a label classifying the dependency relation between the CS and its parent.
    Page 5, “Variations in representing coordination structures”
  3. The dependency relation of the whole CS to its parent is represented by the label of the conjunction, while the conjuncts are marked with a special label for conjuncts (e.g. ccof in the Hyderabad Dependency Treebank).
    Page 5, “Variations in representing coordination structures”
  4. Subsequently, each conjunct has its own label that reflects the dependency relation towards the parent of the whole CS, therefore, conjuncts of the same CS can have different labels, e.g.
    Page 5, “Variations in representing coordination structures”
  5. it is not possible to represent nested CSs in the Moscow and Stanford families without significantly changing the number of possible labels.¹⁷ The dL style (which is most easily applicable to the Prague family) can represent coordination of different dependency relations.
    Page 7, “Coordination Structures in Treebanks”
  6. (2012), the normalization procedures used in HamleDT embrace many other phenomena as well (not only those related to coordination), and involve both structural transformation and dependency relation relabeling.
    Page 8, “Coordination Structures in Treebanks”
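Item 3 describes a labeling scheme in which the conjunction carries the relation of the whole CS. The conjuncts bear only a conjunct label (`ccof` in the Hyderabad Dependency Treebank). The minimal sketch below assumes a plain dictionary of `(form, head, label)` triples and a hypothetical `nmod` label for the noun-modifier relation. It shows how the two asymmetric dependency relations from item 1 can be recovered from such an encoding, with `books` governing both `new` and `old`. It does this by climbing through conjunctions, including nested ones.

```python
# Each token id maps to (form, head id, label); head 0 is the artificial root.
# "ccof" marks a conjunct, as in the Hyderabad Dependency Treebank; the conjunction
# carries the relation of the whole CS to its parent. "nmod" is a hypothetical label.
CONJUNCT_LABEL = "ccof"

tokens = {
    1: ("new", 2, CONJUNCT_LABEL),
    2: ("and", 4, "nmod"),
    3: ("old", 2, CONJUNCT_LABEL),
    4: ("books", 0, "root"),
}


def effective_dependency(tokens, tid):
    """Return (governor form, relation) for token tid with coordination resolved.

    While the current node is a conjunct, step up to the conjunction that heads its CS;
    the loop also climbs through nested CSs, where a conjunction is itself a conjunct.
    """
    _, head, label = tokens[tid]
    while label == CONJUNCT_LABEL:
        _, head, label = tokens[head]  # take over the head and label of the conjunction
    governor = "ROOT" if head == 0 else tokens[head][0]
    return governor, label


for tid in (1, 3):
    print(tokens[tid][0], effective_dependency(tokens, tid))
# new ('books', 'nmod')
# old ('books', 'nmod')
```

Item 4 describes an alternative scheme in which each conjunct carries its own relation label. There, only the governor has to be resolved this way, and the label is read directly from the conjunct. The sketch does not treat shared modifiers specially; they would simply report the conjunction as their governor.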

dependency parsing

Appears in 5 sentences as: dependency parsers (1) dependency parsing (4)
In Coordination Structures in Dependency Treebanks
  1. In the last decade, dependency parsing has gradually been receiving visible attention.
    Page 1, “Introduction”
  2. MTT possesses a complex set of linguistic criteria for identifying the governor of a relation (see Mazziotta (2011) for an overview), which lead to MS. MS is preferred in a rule-based dependency parsing system of Lombardo and Lesmo (1998).
    Page 3, “Related work”
  3. The primitive format used for CoNLL shared tasks is widely used in dependency parsing, but its weaknesses have already been pointed out (cf.
    Page 3, “Related work”
  4. Most state-of-the-art dependency parsers can produce labeled edges.
    Page 5, “Variations in representing coordination structures”
  5. Some of the treebanks were downloaded individually from the web, but most of them came from previously published collections for dependency parsing campaigns: six languages from CoNLL-2006 (Buchholz and Marsi, 2006), seven languages from CoNLL-2007 (Nivre et al., 2007), two languages from CoNLL-2009 (Hajič and others, 2009), three languages from ICON-2010 (Husain et al., 2010).
    Page 6, “Coordination Structures in Treebanks”

dependency trees

Appears in 5 sentences as: dependency tree (2) dependency trees (3)
In Coordination Structures in Dependency Treebanks
  1. The dominating solution in treebank design is to introduce artificial rules for the encoding of coordination structures within dependency trees using the same means that express dependencies, i.e., by using edges and by labeling of nodes or edges.
    Page 1, “Introduction”
  2. Even this simplest case is difficult to represent within a dependency tree because, in the words of Lombardo and Lesmo (1998): “Dependency paradigms exhibit obvious difficulties with coordination because, differently from most linguistic structures, it is not possible to characterize the coordination construct with a general schema involving a head and some modifiers of it.”
    Page 2, “Related work”
  3. In accordance with the usual conventions, we assume that each sentence is represented by one dependency tree, in which each node corresponds to one token (word or punctuation mark).
    Page 3, “Variations in representing coordination structures”
  4. Apart from that, we deliberately limit ourselves to CS representations that have shapes of connected subgraphs of dependency trees.
    Page 3, “Variations in representing coordination structures”
  5. We limit our inventory of means of expressing CSs within dependency trees to (i) tree topology (presence or absence of a directed edge between two nodes, Section 3.1), and (ii) node labeling (additional attributes stored inside nodes, Section 3.2).⁸ Further, we expect that the set of possible variations can be structured along several dimensions, each of which corresponds to a certain simple characteristic (such as choosing the leftmost conjunct as the CS head, or attaching shared modifiers below the nearest conjunct).
    Page 3, “Variations in representing coordination structures”
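Items 3 and 4 impose two structural conditions. First, a sentence must form one dependency tree over its tokens. Second, a CS must occupy a connected subgraph of that tree. Both conditions can be checked on a plain head array, as in the sketch below. The helpers are hypothetical illustrations, not code from the paper. The connectivity test relies on a property of rooted trees: each connected component of a node set has exactly one member whose head lies outside the set.

```python
def is_tree(heads):
    """Check that a head array encodes a dependency tree.

    heads[i - 1] is the head of token i (tokens are numbered from 1); 0 is the
    artificial root. Every head must be 0 or an existing token, and following heads
    from any token must reach the root without revisiting a node (no cycles).
    """
    n = len(heads)
    if any(h < 0 or h > n for h in heads):
        return False
    for start in range(1, n + 1):
        seen = set()
        node = start
        while node != 0:
            if node in seen:
                return False  # cycle detected
            seen.add(node)
            node = heads[node - 1]
    return True


def is_connected_subgraph(heads, subset):
    """Check that the tokens in `subset` induce a connected subgraph of the tree.

    Each connected component of the induced subgraph has exactly one "top" node whose
    head lies outside the subset, so the subset is connected iff there is exactly one top.
    """
    members = set(subset)
    tops = [t for t in members if heads[t - 1] not in members]
    return len(tops) == 1


# PS analysis of "old men and women sleep": old, men, women -> and; and -> sleep; sleep -> root
heads = [3, 3, 5, 3, 0]
print(is_tree(heads))                              # True
print(is_connected_subgraph(heads, {1, 2, 3, 4}))  # True: the whole CS hangs under "and"
print(is_connected_subgraph(heads, {2, 4}))        # False: conjuncts without their conjunction
```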

CoNLL

Appears in 4 sentences as: CoNLL (4)
In Coordination Structures in Dependency Treebanks
  1. The primitive format used for CoNLL shared tasks is widely used in dependency parsing, but its weaknesses have already been pointed out (cf.
    Page 3, “Related work”
  2. ⁷ The primary data sources are the following: Ancient Greek: Ancient Greek Dependency Treebank (Bamman and Crane, 2011), Arabic: Prague Arabic Dependency Treebank 1.0 (Smrž et al., 2008), Basque: Basque Dependency Treebank (larger version than CoNLL 2007 generously pro-
    Page 3, “Variations in representing coordination structures”
  3. Obviously, there is a certain risk that the CS-related information contained in the source treebanks was slightly biased by the properties of the CoNLL format upon conversion.
    Page 6, “Coordination Structures in Treebanks”
  4. the 2nd column of Table 1), but some were originally based on constituents and thus specific converters to the CoNLL format had to be created (for instance, the Spanish phrase-structure trees were converted to dependencies using a procedure described by Civit et al.
    Page 6, “Coordination Structures in Treebanks”
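The items above refer to the CoNLL shared-task format. The sketch below reads the ten-column CoNLL-X variant. Each token line holds tab-separated ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD and PDEPREL fields, and a blank line separates sentences. The sketch also illustrates the concern in item 3: the format has no column for a conjunct flag. This example therefore folds membership into DEPREL using a hypothetical `_M` suffix. The PDT-like labels (`Sb`, `Coord`, `Pred`) are only illustrative.

```python
CONLLX_COLUMNS = ("ID", "FORM", "LEMMA", "CPOSTAG", "POSTAG", "FEATS",
                  "HEAD", "DEPREL", "PHEAD", "PDEPREL")


def read_conllx(text):
    """Parse CoNLL-X text into sentences, each a list of dicts keyed by column name.

    Sentences are separated by blank lines; every token line has ten tab-separated fields.
    ID and HEAD are converted to integers, all other fields are kept as strings.
    """
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            if current:  # a blank line closes the current sentence
                sentences.append(current)
                current = []
            continue
        fields = line.split("\t")
        if len(fields) != len(CONLLX_COLUMNS):
            raise ValueError(f"expected 10 columns, got {len(fields)}: {line!r}")
        token = dict(zip(CONLLX_COLUMNS, fields))
        token["ID"] = int(token["ID"])
        token["HEAD"] = int(token["HEAD"])
        current.append(token)
    if current:  # the last sentence may lack a trailing blank line
        sentences.append(current)
    return sentences


# "Peter and Paul sleep" in Prague style; the "_M" suffix marking conjuncts is a
# hypothetical convention, since CoNLL-X has no dedicated column for it.
rows = [
    ("1", "Peter", "Peter", "N", "NNP", "_", "2", "Sb_M", "_", "_"),
    ("2", "and", "and", "C", "CC", "_", "4", "Coord", "_", "_"),
    ("3", "Paul", "Paul", "N", "NNP", "_", "2", "Sb_M", "_", "_"),
    ("4", "sleep", "sleep", "V", "VBP", "_", "0", "Pred", "_", "_"),
]
sample = "\n".join("\t".join(row) for row in rows) + "\n"

sentence = read_conllx(sample)[0]
coord = next(t for t in sentence if t["DEPREL"] == "Coord")
conjuncts = [t["FORM"] for t in sentence
             if t["HEAD"] == coord["ID"] and t["DEPREL"].endswith("_M")]
print(conjuncts)  # ['Peter', 'Paul']
```

Any such suffix convention lives only in the label inventory. A tool that does not know the convention sees `Sb_M` as just another relation label.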
