Index of papers in Proc. ACL 2013 that mention
  • treebank
Zhang, Yuan and Barzilay, Regina and Globerson, Amir
Abstract
We are interested in parsing constituency-based grammars such as HPSG and CCG using a small amount of data specific to the target formalism, and a large quantity of coarse CFG annotations from the Penn Treebank.
Abstract
While all of the target formalisms share a similar basic syntactic structure with Penn Treebank CFG, they also encode additional constraints and semantic features.
Introduction
The standard solution to this bottleneck has relied on manually crafted transformation rules that map readily available syntactic annotations (e.g., the Penn Treebank) to the desired formalism.
Introduction
In addition, designing these rules frequently requires external resources such as WordNet, and even involves correction of the existing treebank.
Introduction
A natural candidate for such coarse annotations is context-free grammar (CFG) from the Penn Treebank, while the target formalism can be any constituency-based grammar, such as Combinatory Categorial Grammar (CCG) (Steedman, 2001), Lexical Functional Grammar (LFG) (Bresnan, 1982) or Head-Driven Phrase Structure Grammar (HPSG) (Pollard and Sag, 1994).
Related Work
For instance, mappings may specify how to convert traces and functional tags in Penn Treebank to the f-structure in LFG (Cahill, 2004).
treebank is mentioned in 27 sentences in this paper.
Topics mentioned in this paper:
Liu, Kai and Lü, Yajuan and Jiang, Wenbin and Liu, Qun
Bilingual Projection of Dependency Grammar
Therefore, we can hardly obtain a treebank with complete trees through direct projection.
Bilingual Projection of Dependency Grammar
So we extract projected discrete dependency arc instances, rather than a treebank, as the training set for the projected grammar induction model.
Bilingually-Guided Dependency Grammar Induction
Then we incorporate the projection model into our iterative unsupervised framework, and jointly optimize the unsupervised and projection objectives with the evolving treebank and constant projection information, respectively.
Introduction
A randomly-initialized monolingual treebank evolves in a self-training iterative procedure, and the grammar parameters are tuned to simultaneously maximize both the monolingual likelihood and bilingually-projected likelihood of the evolving treebank.
Unsupervised Dependency Grammar Induction
The framework of our unsupervised model first builds a random treebank on the monolingual corpus for initialization and trains a discriminative parsing model on it.
Unsupervised Dependency Grammar Induction
Then we use the parser to build an evolved treebank with the 1-best result for the next iteration.
Unsupervised Dependency Grammar Induction
In this way, the parser and treebank evolve iteratively until convergence.
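The excerpts above describe a simple self-training loop: initialize a random treebank, train a parser on it, re-parse the corpus with the 1-best output to obtain the next treebank, and repeat until nothing changes. A minimal sketch of that loop is given below; it is not the authors' code, and train_parser and parse_1best are hypothetical callables standing in for their discriminative parsing model.

    import random

    def random_heads(n):
        # Simplified random initialization: word i+1 picks any head except itself
        # (0 denotes an artificial root).  A faithful implementation would sample
        # well-formed (acyclic, single-rooted) dependency trees instead.
        return [random.choice([h for h in range(n + 1) if h != i + 1]) for i in range(n)]

    def induce_grammar(sentences, train_parser, parse_1best, max_iters=50):
        treebank = [random_heads(len(sent)) for sent in sentences]    # random initial treebank
        model = None
        for _ in range(max_iters):
            model = train_parser(sentences, treebank)                 # train on current treebank
            new_treebank = [parse_1best(model, s) for s in sentences] # 1-best re-parse
            if new_treebank == treebank:                              # converged: treebank stable
                break
            treebank = new_treebank                                   # the treebank "evolves"
        return model, treebank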
treebank is mentioned in 20 sentences in this paper.
Topics mentioned in this paper:
Mareček, David and Straka, Milan
Abstract
By incorporating this knowledge into Dependency Model with Valence, we managed to considerably outperform the state-of-the-art results in terms of average attachment score over 20 treebanks from CoNLL 2006 and 2007 shared tasks.
Experiments
The first type consists of the CoNLL treebanks from 2006 (Buchholz and Marsi, 2006) and 2007 (Nivre et al., 2007), which we use for inference and for evaluation.
Experiments
The Wikipedia texts were automatically tokenized and segmented into sentences so that their tokenization was similar to that of the CoNLL evaluation treebanks.
Experiments
To evaluate the quality of our estimations, we compare them with the stop probabilities computed directly on the evaluation treebanks.
Introduction
This is still far below the supervised approaches, but their indisputable advantage is the fact that no annotated treebanks are needed and the induced structures are not burdened by any linguistic conventions.
Introduction
supervised parsers always only simulate the treebanks they were trained on, whereas unsupervised parsers have an ability to be fitted to different particular applications.
Model
Finally, we obtain the probability of the whole generated treebank as a product over the trees:
Model
no matter how the trees are ordered in the treebank, P_treebank is always the same.
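The formula elided at the end of the "Model" excerpt presumably has the standard product form; a minimal reconstruction (my notation, not copied from the paper) is:

    P_treebank = \prod_{i=1}^{N} P(T_i)

where T_1, ..., T_N are the dependency trees in the treebank. Because multiplication is commutative, the product is the same for any ordering of the trees, which is exactly the invariance the sentence above points out.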
STOP-probability estimation
stop words in the treebank should be 2/3.
treebank is mentioned in 24 sentences in this paper.
Topics mentioned in this paper:
McDonald, Ryan and Nivre, Joakim and Quirmbach-Brundage, Yvonne and Goldberg, Yoav and Das, Dipanjan and Ganchev, Kuzman and Hall, Keith and Petrov, Slav and Zhang, Hao and Täckström, Oscar and Bedini, Claudia and Bertomeu Castelló, Núria and Lee, Jungmee
Abstract
We present a new collection of treebanks with homogeneous syntactic dependency annotation for six languages: German, English, Swedish, Spanish, French and Korean.
Abstract
This ‘universal’ treebank is made freely available in order to facilitate research on multilingual dependency parsing.
Introduction
Research in dependency parsing — computational methods to predict such representations — has increased dramatically, due in large part to the availability of dependency treebanks in a number of languages.
Introduction
While these data sets are standardized in terms of their formal representation, they are still heterogeneous treebanks.
Introduction
That is to say, despite them all being dependency treebanks, which annotate each sentence with a dependency tree, they subscribe to different annotation schemes.
Towards A Universal Treebank
(2004) for multilingual syntactic treebank construction.
Towards A Universal Treebank
The second, used only for English and Swedish, is to automatically convert existing treebanks, as in Zeman et al.
Towards A Universal Treebank
For English, we used the Stanford parser (v1.6.8) (Klein and Manning, 2003) to convert the Wall Street Journal section of the Penn Treebank (Marcus et al., 1993) to basic dependency trees, including punctuation and with the copula verb as head in copula constructions.
treebank is mentioned in 19 sentences in this paper.
Topics mentioned in this paper:
Popel, Martin and Mareček, David and Štěpánek, Jan and Zeman, Daniel and Žabokrtský, Zdeněk
Abstract
We introduce a novel taxonomy of such approaches and apply it to treebanks across a typologically diverse range of 26 languages.
Introduction
One of the reasons is the increased availability of dependency treebanks, be they results of genuine dependency annotation projects or converted automatically from previously existing phrase-structure treebanks.
Introduction
In both cases, a number of decisions have to be made during the construction or conversion of a dependency treebank .
Introduction
The dominating solution in treebank design is to introduce artificial rules for the encoding of coordination structures within dependency trees using the same means that express dependencies, i.e., by using edges and by labeling of nodes or edges.
Related work
PS = Prague Dependency Treebank (PDT) style: all conjuncts are attached under the coordinating conjunction (along with shared modifiers, which are distinguished by a special attribute) (Hajič et al., 2006),
Related work
Moreover, particular treebanks vary in their contents even more than in their format, i.e.
Related work
each treebank has its own way of representing prepositions or different granularity of syntactic labels.
Variations in representing coordination structures
Our analysis of variations in representing coordination structures is based on observations from a set of dependency treebanks for 26 languages.
treebank is mentioned in 47 sentences in this paper.
Topics mentioned in this paper:
Sulger, Sebastian and Butt, Miriam and King, Tracy Holloway and Meurer, Paul and Laczkó, Tibor and Rákosi, György and Dione, Cheikh Bamba and Dyvik, Helge and Rosén, Victoria and De Smedt, Koenraad and Patejuk, Agnieszka and Çetinoğlu, Özlem and Arka, I Wayan and Mistica, Meladel
Abstract
This paper discusses the construction of a parallel treebank currently involving ten languages from six language families.
Abstract
The treebank is based on deep LFG (Lexical-Functional Grammar) grammars that were developed within the framework of the ParGram (Parallel Grammar) effort.
Abstract
This output forms the basis of a parallel treebank covering a diverse set of phenomena.
Introduction
This paper discusses the construction of a parallel treebank currently involving ten languages that represent several different language families, including non-Indo-European.
Introduction
The treebank is based on the output of individual deep LFG (Lexical-Functional Grammar) grammars that were developed independently at different sites but within the overall framework of ParGram (the Parallel Grammar project) (Butt et al., 1999a; Butt et al., 2002).
Introduction
This output forms the basis of the ParGramBank parallel treebank discussed here.
treebank is mentioned in 52 sentences in this paper.
Topics mentioned in this paper:
Uematsu, Sumire and Matsuzaki, Takuya and Hanaoka, Hiroki and Miyao, Yusuke and Mima, Hideki
Background
Yoshida (2005) proposed methods for extracting a wide-coverage lexicon based on HPSG from a phrase structure treebank of Japanese.
Background
Their treebanks are annotated with dependencies of words, the conversion of which into phrase structures is not a big concern.
Conclusion
Our method integrates multiple dependency-based resources to convert them into an integrated phrase structure treebank.
Conclusion
The obtained treebank is then transformed into CCG derivations.
Corpus integration and conversion
As we have adopted the method of CCGbank, which relies on a source treebank to be converted into CCG derivations, a critical issue to address is the absence of a Japanese counterpart to PTB.
Corpus integration and conversion
Our solution is to first integrate multiple dependency-based resources and convert them into a phrase structure treebank that is independent
Corpus integration and conversion
Next, we translate the treebank into CCG derivations (Step 2).
Introduction
Our work is basically an extension of a seminal work on CCGbank (Hockenmaier and Steedman, 2007), in which the phrase structure trees of the Penn Treebank (PTB) (Marcus et al., 1993) are converted into CCG derivations and a wide-coverage CCG lexicon is then extracted from these derivations.
treebank is mentioned in 10 sentences in this paper.
Topics mentioned in this paper:
Tratz, Stephen and Hovy, Eduard
Dataset Creation
Of the 21,938 total examples, 15,330 come from sections 2-21 of the Penn Treebank (Marcus et al., 1993).
Dataset Creation
For the Penn Treebank, we extracted the examples using the provided gold standard parse trees, whereas, for the latter cases, we used the output of an open source parser (Tratz and Hovy, 2011).
Experiments
The accuracy figures for the test instances from the Penn Treebank, The Jungle Book, and The History of the Decline and Fall of the Roman Empire were 88.8%, 84.7%, and 80.6%, respectively.
Related Work
The NomBank project (Meyers et al., 2004) provides coarse annotations for some of the possessive constructions in the Penn Treebank, but only those that meet their criteria.
Semantic Relation Inventory
Penn Treebank, respectively.
Semantic Relation Inventory
portion of the Penn Treebank.
Semantic Relation Inventory
The Penn Treebank and The History of the Decline and Fall of the Roman Empire were substantially more similar, although there are notable differences.
treebank is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Xiang, Bing and Luo, Xiaoqiang and Zhou, Bowen
Abstract
Empty categories (EC) are artificial elements in Penn Treebanks motivated by the government-binding (GB) theory to explain certain language phenomena such as pro-drop.
Chinese Empty Category Prediction
The empty categories in the Chinese Treebank (CTB) include trace markers for A’- and A-movement, dropped pronoun, big PRO etc.
Chinese Empty Category Prediction
Our effort of recovering ECs is a two-step process: first, at training time, ECs in the Chinese Treebank are moved and preserved in the portion of the tree structures pertaining to surface words only.
Experimental Results
We use Chinese Treebank (CTB) V7.0 to train and test the EC prediction model.
Introduction
In order to account for certain language phenomena such as pro-drop and wh-movement, a set of special tokens, called empty categories (EC), are used in Penn Treebanks (Marcus et al., 1993; Bies and Maamouri, 2003; Xue et al., 2005).
treebank is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Özbal, Gözde and Pighin, Daniele and Strapparava, Carlo
Architecture of BRAINSUP
These constraints are learned from relation-head-modifier co-occurrence counts estimated from a dependency treebank L.
Architecture of BRAINSUP
Algorithm 1 SentenceGeneration(U, Θ, P, L): U is the user specification, Θ is a set of meta-parameters; P and L are two dependency treebanks.
Architecture of BRAINSUP
We estimate the probability of a modifier word m and its head h to be in the relation r as p_r(h, m) = c_r(h, m) / Σ_{h_i, m_i} c_r(h_i, m_i), where c_r(·) is the number of times that m depends on h in the dependency treebank L, and h_i, m_i range over all the head/modifier pairs observed in L.
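The estimate quoted above is a plain relative frequency over relation-specific head/modifier counts. A minimal sketch of computing it is shown below; it assumes (my assumption, not the paper's format) that the treebank L is available as an iterable of (relation, head, modifier) triples.

    from collections import Counter

    def estimate_modifier_probs(triples):
        # triples: iterable of (relation r, head h, modifier m) observed in the treebank L
        triples = list(triples)                     # materialize so we can iterate twice
        counts = Counter(triples)                   # c_r(h, m)
        totals = Counter(r for r, _, _ in triples)  # denominator: sum over all (h_i, m_i) of c_r(h_i, m_i)
        return {(r, h, m): c / totals[r] for (r, h, m), c in counts.items()}

    # Toy example: p_amod(cat, black) = 2/3, since "black" modifies "cat" twice
    # out of three "amod" pairs in this tiny treebank.
    toy = [("amod", "cat", "black"), ("amod", "cat", "black"), ("amod", "dog", "big")]
    probs = estimate_modifier_probs(toy)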
Conclusion
BRAINSUP makes heavy use of dependency parsed data and statistics collected from dependency treebanks to ensure the grammaticality of the generated sentences, and to trim the search space while seeking the sentences that maximize the user satisfaction.
Evaluation
As discussed in Section 3 we use two different treebanks to learn the syntactic patterns (P) and the dependency operators (L).
treebank is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Swanson, Ben and Yamangil, Elif and Charniak, Eugene and Shieber, Stuart
Abstract
We perform parsing experiments on the Penn Treebank and draw comparisons to Tree-Substitution Grammars and between different variations in probabilistic model design.
Experiments
As a proof of concept, we investigate OSTAG in the context of the classic Penn Treebank statistical parsing setup; training on sections 2-21 and testing on section 23.
Experiments
Furthermore, the various parameterizations of adjunction with OSTAG indicate that, at least in the case of the Penn Treebank, the finer grained modeling of a full table of adjunction probabilities for each Goodman index (OSTAG3) overcomes the danger of sparse data estimates.
Introduction
We evaluate OSTAG on the familiar task of parsing the Penn Treebank .
TAG and Variants
We propose a simple but empirically effective heuristic for grammar induction for our experiments on Penn Treebank data.
treebank is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Zhu, Muhua and Zhang, Yue and Chen, Wenliang and Zhang, Min and Zhu, Jingbo
Experiments
Labeled English data employed in this paper were derived from the Wall Street Journal (WSJ) corpus of the Penn Treebank (Marcus et al., 1993).
Experiments
For labeled Chinese data, we used version 5.1 of the Penn Chinese Treebank (CTB) (Xue et al., 2005).
Experiments
In addition, we removed from the unlabeled English data the sentences that appear in the WSJ corpus of the Penn Treebank.
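Removing that overlap is a simple filtering step; a minimal sketch (variable names and contents are illustrative, not from the paper):

    # Keep only unlabeled sentences that do not occur in the labeled WSJ data,
    # so the unlabeled data stays disjoint from the Penn Treebank training corpus.
    wsj_sentences = {"the cat sat on the mat .", "another wsj sentence ."}    # placeholder contents
    unlabeled_sentences = ["the cat sat on the mat .", "a genuinely new sentence ."]
    filtered = [s for s in unlabeled_sentences if s not in wsj_sentences]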
Introduction
On standard evaluations using both the Penn Treebank and the Penn Chinese Treebank, our parser gave higher accuracies than the Berkeley parser (Petrov and Klein, 2007), a state-of-the-art chart parser.
treebank is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Abend, Omri and Rappoport, Ari
A UCCA-Annotated Corpus
For instance, both the PTB and the Prague Dependency Treebank (Böhmová et al., 2003) employed annotators with extensive linguistic background.
Introduction
In fact, the annotations of (a) and (c) are identical under the most widely-used schemes for English, the Penn Treebank (PTB) (Marcus et al., 1993) and CoNLL-style dependencies (Surdeanu et al., 2008) (see Figure 1).
Related Work
The most prominent annotation scheme in NLP for English syntax is the Penn Treebank .
Related Work
Examples include the Groningen Meaning Bank (Basile et al., 2012), Treebank Semantics (Butler and Yoshimoto, 2012) and the LinGO Redwoods treebank (Oepen et al., 2004).
treebank is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Søgaard, Anders
Experiments
The first group comes from the English Web Treebank (EWT), also used in the Parsing the Web shared task (Petrov and McDonald, 2012).
Experiments
We train our tagger on Sections 2-21 of the WSJ data in the Penn-III Treebank (PTB), OntoNotes 4.0 release.
Experiments
Finally we do experiments with the Danish section of the Copenhagen Dependency Treebank (CDT).
treebank is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Cirik, Volkan
Algorithm
Lastly, in step (i) of Figure 1, we run the k-means clustering method on the S-CODE sphere and split the word-substitute word pairs into 45 clusters, because the treebank we worked on uses 45 part-of-speech tags.
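A minimal sketch of the clustering step described above, using scikit-learn's KMeans with k = 45 (one cluster per Penn Treebank POS tag); the pair_embeddings array is a hypothetical placeholder for the real S-CODE coordinates, which lie on the unit sphere.

    import numpy as np
    from sklearn.cluster import KMeans

    # Placeholder for the (n_pairs x d) matrix of S-CODE embeddings of
    # word-substitute pairs; real vectors would come from the S-CODE model.
    pair_embeddings = np.random.rand(1000, 25)
    pair_embeddings /= np.linalg.norm(pair_embeddings, axis=1, keepdims=True)  # project onto the sphere

    kmeans = KMeans(n_clusters=45, n_init=10, random_state=0)
    cluster_ids = kmeans.fit_predict(pair_embeddings)   # one induced tag id per pair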
Experiments
The experiments are conducted on Penn Treebank Wall Street Journal corpus.
Experiments
Because we are trying to improve (Yatbaz et al., 2012), we select the experiment on Penn Treebank Wall Street Journal corpus in that work as our baseline and replicate it.
Introduction
For instance, the gold tag perplexity of the word “offers” in the Penn Treebank Wall Street Journal corpus we worked on is 1.966.
treebank is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Sartorio, Francesco and Satta, Giorgio and Nivre, Joakim
Experimental Assessment
Performance evaluation is carried out on the Penn Treebank (Marcus et al., 1993) converted to Stanford basic dependencies (De Marneffe et al., 2006).
Introduction
This development is probably due to many factors, such as the increased availability of dependency treebanks and the perceived usefulness of dependency structures as an interface to downstream applications, but a very important reason is also the high efficiency offered by dependency parsers, enabling web-scale parsing with high throughput.
Introduction
While the classical approach limits training data to parser states that result from oracle predictions (derived from a treebank), these novel approaches allow the classifier to explore states that result from its own (sometimes erroneous) predictions (Choi and Palmer, 2011; Goldberg and Nivre, 2012).
treebank is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Manshadi, Mehdi and Gildea, Daniel and Allen, James
Introduction
For example, Higgins and Sadock (2003) find fewer than 1000 sentences with two or more explicit quantifiers in the Wall Street Journal section of the Penn Treebank.
Introduction
Plurals form 18% of the NPs in our corpus and 20% of the nouns in Penn Treebank .
Introduction
Explicit universals, on the other hand, form less than 1% of the determiners in the Penn Treebank.
treebank is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Jiang, Wenbin and Sun, Meng and Lü, Yajuan and Yang, Yating and Liu, Qun
Experiments
We use the Penn Chinese Treebank 5.0 (CTB) (Xue et al., 2005) as the existing annotated corpus for Chinese word segmentation.
Introduction
Taking Chinese word segmentation for example, the state-of-the-art models (Xue and Shen, 2003; Ng and Low, 2004; Gao et al., 2005; Nakagawa and Uchimoto, 2007; Zhao and Kit, 2008; Jiang et al., 2009; Zhang and Clark, 2010; Sun, 2011b; Li, 2011) are usually trained on human-annotated corpora such as the Penn Chinese Treebank (CTB) (Xue et al., 2005), and perform quite well on corresponding test sets.
Related Work
In parsing, Pereira and Schabes (1992) proposed an extended inside-outside algorithm that infers the parameters of a stochastic CFG from a partially parsed treebank.
treebank is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Biran, Or and McKeown, Kathleen
Abstract
We present a reformulation of the word pair features typically used for the task of disambiguating implicit relations in the Penn Discourse Treebank .
Other Features
Previous work has relied on features based on the gold parse trees of the Penn Treebank (which overlaps with PDTB) and on contextual information from relations preceding the one being disambiguated.
Related Work
More recently, implicit relation prediction has been evaluated on annotated implicit relations from the Penn Discourse Treebank (Prasad et al., 2008).
treebank is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Bergen, Leon and Gibson, Edward and O'Donnell, Timothy J.
Results
We trained our model on sections 2-21 of the WSJ part of the Penn Treebank (Marcus et al., 1999).
Results
Unfortunately, marking for argument/modifiers in the Penn Treebank is incomplete, and is limited to certain adverbials, e.g.
Results
This corpus adds annotations indicating, for each node in the Penn Treebank , whether that node is a modifier.
treebank is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: