ParGramBank: The ParGram Parallel Treebank
Sulger, Sebastian and Butt, Miriam and King, Tracy Holloway and Meurer, Paul and Laczkó, Tibor and Rákosi, György and Dione, Cheikh Bamba and Dyvik, Helge and Rosén, Victoria and De Smedt, Koenraad and Patejuk, Agnieszka and Cetinoglu, Ozlem and Arka, I Wayan and Mistica, Meladel

Article Structure

Abstract

This paper discusses the construction of a parallel treebank currently involving ten languages from six language families.

Introduction

This paper discusses the construction of a parallel treebank currently involving ten languages that represent several different language families, including non-Indo-European.

Related Work

There have been several efforts in parallel treebanking across theories and annotation schemes.

ParGram and its Feature Space

The ParGram grammars use the LFG formalism, which produces c(onstituent)-structures (trees) and f(unctional)-structures as the syntactic analysis.
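As a rough sketch of the two levels of representation, the Python below encodes a c-structure as an ordered tree and an f-structure as a nested attribute-value matrix (plain dictionaries). The sentence "the driver starts the tractor", the class name CNode, the helper embedded_functions, and the feature inventory (PRED, TENSE, NUM, SPEC, SUBJ, OBJ) are illustrative assumptions. This is not the internal format of the ParGram grammars.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical, simplified encodings of the two LFG structures.
# PRED, SUBJ, OBJ, TENSE, NUM are common LFG attributes, but this inventory
# is illustrative only, not ParGram's actual feature space.


@dataclass
class CNode:
    """A c-structure node: a category label plus ordered daughters, or a word."""
    label: str
    children: list[CNode] = field(default_factory=list)
    word: str | None = None

    def yield_(self) -> list[str]:
        """Return the terminal words in linear (surface) order."""
        if self.word is not None:
            return [self.word]
        return [w for child in self.children for w in child.yield_()]


# c-structure for a hypothetical example sentence
cstr = CNode("S", [
    CNode("NP", [CNode("D", word="the"), CNode("N", word="driver")]),
    CNode("VP", [
        CNode("V", word="starts"),
        CNode("NP", [CNode("D", word="the"), CNode("N", word="tractor")]),
    ]),
])

# f-structure as a nested attribute-value matrix
fstr = {
    "PRED": "start<SUBJ,OBJ>",
    "TENSE": "pres",
    "SUBJ": {"PRED": "driver", "NUM": "sg", "SPEC": "the"},
    "OBJ": {"PRED": "tractor", "NUM": "sg", "SPEC": "the"},
}


def embedded_functions(f: dict) -> list[str]:
    """List attributes whose values are themselves f-structures."""
    return sorted(k for k, v in f.items() if isinstance(v, dict))


print(" ".join(cstr.yield_()))   # the driver starts the tractor
print(embedded_functions(fstr))  # ['OBJ', 'SUBJ']
```

The tree keeps linear order, which is why its yield reproduces the sentence. The f-structure records only attributes and values, so the order of its entries carries no linguistic meaning.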

Treebank Design and Construction

For the initial seeding of the treebank, we focused on 50 manually constructed sentences that cover a diverse range of phenomena (transitivity, voice alternations, interrogatives, embedded clauses, copula constructions, control/raising verbs, etc.).

Challenges for Parallelism

We detail some challenges in maintaining parallelism across typologically distinct languages.

Linguistically Motivated Alignment

The treebank is automatically aligned at the sentence level, which is the top level of alignment within ParGramBank.
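A minimal sketch of sentence-level alignment, assuming (hypothetically) that each language's translations of the seed sentences are stored under shared sentence IDs such as "s01". The functions align_sentences and unaligned, the ID scheme, and the example sentences are illustrative only and are not the alignment procedure implemented in INESS.

```python
from __future__ import annotations

# Hypothetical data: each language's sentences keyed by a shared sentence ID.
# The ID-based pairing is an assumption for illustration.


def align_sentences(src: dict[str, str], tgt: dict[str, str]) -> list[tuple[str, str, str]]:
    """Pair sentences whose IDs occur in both languages, in ID order."""
    shared = sorted(src.keys() & tgt.keys())
    return [(sid, src[sid], tgt[sid]) for sid in shared]


def unaligned(src: dict[str, str], tgt: dict[str, str]) -> list[str]:
    """IDs present in the source language but missing from the target."""
    return sorted(src.keys() - tgt.keys())


english = {
    "s01": "The driver starts the tractor.",
    "s02": "The tractor is red.",
    "s03": "Is the tractor red?",
}
german = {
    "s01": "Der Fahrer startet den Traktor.",
    "s02": "Der Traktor ist rot.",
}

for sid, en, de in align_sentences(english, german):
    print(sid, en, "<->", de)
print("not yet translated:", unaligned(english, german))
```

Keying on IDs makes sentence alignment trivial whenever every language translates the same seed set. Alignment below the sentence level needs linguistic information of the kind discussed in this section.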

Discussion and Future Work

We have discussed the construction of ParGramBank, a parallel treebank for ten typologically different languages.

Topics

treebank

Appears in 52 sentences as: treebank (40) treebanking (9) treebanks (9) ‘Treebank (1) ‘treebank’ (1)
In ParGramBank: The ParGram Parallel Treebank
  1. This paper discusses the construction of a parallel treebank currently involving ten languages from six language families.
    Page 1, “Abstract”
  2. The treebank is based on deep LFG (Lexical-Functional Grammar) grammars that were developed within the framework of the ParGram (Parallel Grammar) effort.
    Page 1, “Abstract”
  3. This output forms the basis of a parallel treebank covering a diverse set of phenomena.
    Page 1, “Abstract”
  4. The treebank is publicly available via the INESS treebanking environment, which also allows for the alignment of language pairs.
    Page 1, “Abstract”
  5. We thus present a unique, multilayered parallel treebank that represents more and different types of languages than are available in other treebanks, that represents
    Page 1, “Abstract”
  6. This paper discusses the construction of a parallel treebank currently involving ten languages that represent several different language families, including non-Indo-European.
    Page 1, “Introduction”
  7. The treebank is based on the output of individual deep LFG (Lexical-Functional Grammar) grammars that were developed independently at different sites but within the overall framework of ParGram (the Parallel Grammar project) (Butt et al., 1999a; Butt et al., 2002).
    Page 1, “Introduction”
  8. This output forms the basis of the ParGramBank parallel treebank discussed here.
    Page 2, “Introduction”
  9. An obvious application for parallel treebanking is machine translation, where treebank size is a deciding factor for whether a particular treebank can support a particular kind of research project.
    Page 2, “Introduction”
  10. The treebanking effort reported on in this paper supports work of the latter focus, including efforts at multilingual dependency parsing (Naseem et al., 2012).
    Page 2, “Introduction”
  11. ¹ Throughout this paper, ‘treebank’ refers to both phrase-
    Page 2, “Introduction”
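Several of the snippets above mention downstream uses such as machine translation and multilingual dependency parsing. As a hedged illustration of how dependency information can be read off an f-structure, the sketch below flattens a nested f-structure into (head, function, dependent) triples. The dictionary encoding, the helper names pred_lemma and to_dependencies, and the example sentence "the driver wants to start the tractor" are assumptions for illustration, not ParGramBank's export format.

```python
from __future__ import annotations


def pred_lemma(f: dict) -> str:
    """Strip the argument list from a PRED value: 'start<SUBJ,OBJ>' -> 'start'."""
    return f["PRED"].split("<", 1)[0]


def to_dependencies(f: dict) -> list[tuple[str, str, str]]:
    """Recursively turn f-structure-valued attributes into (head, function, dependent).

    Assumes the f-structure is acyclic; shared substructures are visited once per path.
    """
    head = pred_lemma(f)
    triples = []
    for attr, value in f.items():
        if isinstance(value, dict) and "PRED" in value:
            triples.append((head, attr, pred_lemma(value)))
            triples.extend(to_dependencies(value))
    return triples


# Hypothetical control construction: the XCOMP's SUBJ is the same object as
# the matrix SUBJ (structure sharing), so 'driver' depends on both verbs.
driver = {"PRED": "driver", "NUM": "sg"}
fstr = {
    "PRED": "want<SUBJ,XCOMP>",
    "SUBJ": driver,
    "XCOMP": {
        "PRED": "start<SUBJ,OBJ>",
        "SUBJ": driver,
        "OBJ": {"PRED": "tractor", "NUM": "sg"},
    },
}

for triple in to_dependencies(fstr):
    print(triple)
```

Because the controlled subject is shared, the output links 'driver' to both 'want' and 'start'. A surface tree would not make that relation explicit.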

See all papers in Proc. ACL 2013 that mention treebank.

word order

Appears in 3 sentences as: word order (3)
In ParGramBank: The ParGram Parallel Treebank
  1. In contrast, c-structures encode language-particular differences in linear word order, surface morphological vs. syntactic structures, and constituency (Dalrymple, 2001).
    Page 3, “ParGram and its Feature Space”
  2. The left/upper c- and f-structures show the parse from the English ParGram grammar, the right/lower ones from the Urdu ParGram grammar.⁴,⁵ The c-structures encode linear word order and constituency and thus look very different; e.g., the English structure is rather hierarchical while the Urdu structure is flat (Urdu is a free word-order language with no evidence for a VP; Butt (1995)).
    Page 4, “ParGram and its Feature Space”
  3. The representations offer information about dependency relations as well as word order, constituency and part-of-speech.
    Page 9, “Discussion and Future Work”
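The contrast described in these snippets can be sketched in Python. Two hypothetical c-structures for the same sentence differ in height and in whether a VP node exists, while the skeleton of grammatical functions in their f-structures is identical. The Urdu leaves and PRED values are English glosses standing in for Urdu words, and all structures are simplified assumptions rather than actual ParGramBank output.

```python
from __future__ import annotations

# c-structures as nested tuples (category, child, ...); leaves are strings.
# Urdu leaves are English glosses (hypothetical example, not treebank data).
english_c = ("S",
             ("NP", ("D", "the"), ("N", "driver")),
             ("VP", ("V", "starts"),
                    ("NP", ("D", "the"), ("N", "tractor"))))
urdu_c = ("S",
          ("NP", ("N", "driver-ERG")),
          ("NP", ("N", "tractor")),
          ("V", "start-PERF"))


def depth(t) -> int:
    """Height of a tree; a bare word has height 0."""
    if isinstance(t, str):
        return 0
    return 1 + max(depth(child) for child in t[1:])


def categories(t) -> set[str]:
    """All category labels used in a tree."""
    if isinstance(t, str):
        return set()
    return {t[0]}.union(*(categories(child) for child in t[1:]))


# Simplified f-structures; language-particular features (here CASE) differ.
english_f = {"PRED": "start<SUBJ,OBJ>",
             "SUBJ": {"PRED": "driver"}, "OBJ": {"PRED": "tractor"}}
urdu_f = {"PRED": "start<SUBJ,OBJ>",
          "SUBJ": {"PRED": "driver", "CASE": "erg"},
          "OBJ": {"PRED": "tractor", "CASE": "nom"}}


def function_paths(f: dict, prefix: tuple[str, ...] = ()) -> set[tuple[str, ...]]:
    """Paths of grammatical functions, ignoring atomic features such as CASE."""
    paths = set()
    for attr, value in f.items():
        if isinstance(value, dict):
            path = prefix + (attr,)
            paths.add(path)
            paths |= function_paths(value, path)
    return paths


print(depth(english_c), depth(urdu_c))                     # 4 3
print("VP" in categories(english_c), "VP" in categories(urdu_c))  # True False
print(function_paths(english_f) == function_paths(urdu_f))  # True
```

Comparing function paths rather than whole f-structures is a deliberate simplification: it abstracts away from features such as case marking, which legitimately differ between the two grammars.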
