Index of papers in Proc. ACL 2014 that mention
  • treebank
Ma, Xuezhe and Xia, Fei
Abstract
We perform experiments on three data sets — version 1.0 and version 2.0 of the Google Universal Dependency Treebanks, and treebanks from the CoNLL shared tasks, across ten languages.
Data and Tools
Our experiments rely on two kinds of data sets: (i) monolingual treebanks with a consistent annotation schema — the English treebank is used to train the English parsing model, and the treebanks for the target languages are used to evaluate the parsing performance of our approach.
Data and Tools
The monolingual treebanks in our experiments are from the Google Universal Dependency Treebanks (McDonald et al., 2013), for the reason that the treebanks of different languages in Google Universal Dependency Treebanks have consistent syntactic representations.
Data and Tools
The treebanks from CoNLL shared-tasks on dependency parsing (Buchholz and Marsi, 2006; Nivre et al., 2007) appear to be another reasonable choice.
Introduction
Several supervised dependency parsing algorithms (Nivre and Scholz, 2004; McDonald et al., 2005a; McDonald et al., 2005b; McDonald and Pereira, 2006; Carreras, 2007; Koo and Collins, 2010; Ma and Zhao, 2012; Zhang et al., 2013) have been proposed and achieved high parsing accuracies on several treebanks, due in large part to the availability of dependency treebanks in a number of languages (McDonald et al., 2013).
Introduction
However, the manually annotated treebanks that these parsers rely on are highly expensive to create, in particular when we want to build treebanks for resource-poor languages.
Introduction
However, most bilingual text parsing approaches require bilingual treebanks — treebanks that have manually annotated tree structures on both the source and target sides (Smith and Smith, 2004; Burkett and Klein, 2008), or that have tree structures on the source side and translated sentences in the target languages (Huang et
Our Approach
Table 1: Data statistics of two versions of Google Universal Treebanks for the target languages.
treebank is mentioned in 28 sentences in this paper.
Zhu, Xiaodan and Guo, Hongyu and Mohammad, Saif and Kiritchenko, Svetlana
Abstract
We use a sentiment treebank to show that these existing heuristics are poor estimators of sentiment.
Experiment setup
Data As described earlier, the Stanford Sentiment Treebank (Socher et al., 2013) has manually annotated, real-valued sentiment values for all phrases in parse trees.
Experiment setup
We search for these negators in the Stanford Sentiment Treebank and normalize variants of the same negator to a single form; e.g., “is n’t”, “isn’t”, and “is not” are all normalized to “is_not”.
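A minimal Python sketch of this normalization step (the variant table and function name below are illustrative, not the paper's; the paper's actual variant list is derived from the treebank itself):

    # Minimal sketch of negator normalization, assuming a small hand-built
    # table of surface variants; the paper's real list is larger.
    NEGATOR_VARIANTS = {
        "is n't": "is_not",
        "isn't": "is_not",
        "is not": "is_not",
        "do n't": "do_not",
        "don't": "do_not",
        "do not": "do_not",
    }

    def normalize_negator(phrase: str) -> str:
        """Map a negator variant to its canonical form; pass through otherwise."""
        return NEGATOR_VARIANTS.get(phrase.lower(), phrase)

    assert normalize_negator("is n't") == "is_not"
    assert normalize_negator("IS NOT") == "is_not"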
Experiment setup
Each occurrence of a negator and the phrase it is directly composed with in the treebank, i.e., (7,071,217), is considered a data point in our study.
Experimental results
Table 1: Mean absolute errors (MAE) of fitting different models to the Stanford Sentiment Treebank.
Experimental results
The figure includes the five most frequently used negators found in the sentiment treebank.
Experimental results
Below, we take a closer look at the fitting errors made at different depths of the sentiment treebank.
Introduction
Figure 1: Effect of a list of common negators in modifying sentiment values in the Stanford Sentiment Treebank.
Introduction
Each dot in the figure corresponds to a text span being modified by (composed with) a negator in the treebank.
Introduction
The recently available Stanford Sentiment Treebank (Socher et al., 2013) renders manually annotated, real-valued sentiment scores for all phrases in parse trees.
treebank is mentioned in 12 sentences in this paper.
Zeng, Xiaodong and Chao, Lidia S. and Wong, Derek F. and Trancoso, Isabel and Tian, Liang
Abstract
We propose dealing with the induced word boundaries as soft constraints to bias the continuous learning of a supervised CRFs model, trained by the treebank data (labeled), on the bilingual data (unlabeled).
Experiments
The monolingual segmented data, trainTB, is extracted from the Penn Chinese Treebank (CTB-7) (Xue et al., 2005), containing 51,447 sentences.
Experiments
• Supervised Monolingual Segmenter (SMS): this model is trained by CRFs on treebank training data (trainTB).
Introduction
The practice in state-of-the-art MT systems is that Chinese sentences are tokenized by a monolingual supervised word segmentation model trained on the hand-annotated treebank data, e.g., Chinese treebank
Introduction
But one outstanding problem is that these models may leave out some crucial segmentation features for SMT, since the output words conform to the treebank segmentation standard designed for monolingually linguistic intuition, rather than specific to the SMT task.
Introduction
Crucially, the GP expression with the bilingual knowledge is then used as side information to regularize a CRFs (conditional random fields) model’s learning over treebank and bitext data, based on the posterior regularization (PR) framework (Ganchev et al., 2010).
Methodology
The input data requires two types of training resources: segmented Chinese sentences from the treebank, and parallel unsegmented sentences of Chinese and a foreign language.
Methodology
Algorithm 1 CWS model induction with bilingual constraints. Require: segmented Chinese sentences from the treebank; parallel sentences of Chinese and foreign
Methodology
As in conventional GP examples (Das and Smith, 2012), a similarity graph G = (V, E) is constructed over N types extracted from the Chinese training data, including the treebank and bitexts.
treebank is mentioned in 16 sentences in this paper.
Yıldız, Olcay Taner and Solak, Ercan and Görgün, Onur and Ehsani, Razieh
Abstract
In this paper, we report our preliminary efforts in building an English-Turkish parallel treebank corpus for statistical machine translation.
Abstract
In the corpus, we manually generated parallel trees for about 5,000 sentences from the Penn Treebank.
Introduction
In recent years, many efforts have been made to annotate parallel corpora with syntactic structure to build parallel treebanks.
Introduction
A parallel treebank is a parallel corpus where the sentences in each language are syntactically (and, if necessary, morphologically) annotated, and the sentences and words are aligned.
Introduction
In the parallel treebanks, the syntactic annotation usually follows constituent and/or dependency structure.
treebank is mentioned in 34 sentences in this paper.
Skjaerholt, Arne
Introduction
However, most evaluations of syntactic treebanks use simple accuracy measures such as bracket F1 scores for constituent trees (NEGRA, Brants, 2000; TIGER, Brants and Hansen, 2002; Cat3LB, Civit et al., 2003; The Arabic Treebank, Maamouri et al., 2008) or labelled or unlabelled attachment scores for dependency syntax (PDT, Hajič, 2004; PCEDT, Mikulová and Štěpánek, 2010; Norwegian Dependency Treebank, Skjaerholt, 2013).
Introduction
In grammar-driven treebanking (or parsebanking), the problems encountered are slightly different.
Introduction
In HPSG and LFG treebanking, annotators do not annotate structure directly.
Real-world corpora
Three of the data sets are dependency treebanks
Real-world corpora
We contacted a number of treebank projects, among them the Penn Treebank and the Prague Dependency Treebank, but not all of them had data available.
Real-world corpora
(NDT, CDT, PCEDT) and one phrase structure treebank (SSD), and of the dependency treebanks the PCEDT contains semantic dependencies, while the other two have traditional syntactic dependencies.
Synthetic experiments
An already annotated corpus, in our case 100 randomly selected sentences from the Norwegian Dependency Treebank (Solberg et al., 2014), is taken as correct and then permuted to produce “annotations” of different quality.
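A minimal sketch of how such a permutation might be implemented (the helper below is hypothetical; the paper does not specify its permutation scheme in this excerpt, and unconstrained random reattachment can break treeness):

    import random

    def degrade_heads(heads, error_rate, seed=0):
        """Randomly reattach a fraction of tokens to simulate a noisier
        'annotation' of a gold dependency tree. heads[i] is the head index
        of token i+1 (0 denotes the root). Unconstrained reattachment may
        yield cycles; this only illustrates permuting gold trees to obtain
        annotations of varying quality."""
        rng = random.Random(seed)
        noisy = list(heads)
        n = len(noisy)
        for i in range(n):
            if rng.random() < error_rate:
                noisy[i] = rng.choice([h for h in range(n + 1) if h != i + 1])
        return noisy

    gold = [2, 0, 2, 3]  # a 4-token sentence whose root is token 2
    print(degrade_heads(gold, 0.25))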
treebank is mentioned in 15 sentences in this paper.
Li, Sujian and Wang, Liang and Cao, Ziqiang and Li, Wenjie
Add arc <eC,ej> to GC with
We use the syntactic trees from the Penn Treebank to find the dominating nodes.
Add arc <eC,ej> to GC with
But we think that the MST algorithm has more potential in discourse dependency parsing, because our converted discourse dependency treebank contains only projective trees, which somewhat prevents the MST algorithm from exhibiting its advantage in parsing non-projective trees.
Add arc <eC,ej> to GC with
In fact, we observe that some non-projective dependencies produced by the MST algorithm are even more reasonable than those in the dependency treebank.
Discourse Dependency Structure and Tree Bank
Section 2 formally defines discourse dependency structure and introduces how to build a discourse dependency treebank from the existing RST corpus.
Discourse Dependency Structure and Tree Bank
2.2 Our Discourse Dependency Treebank
Discourse Dependency Structure and Tree Bank
To automatically conduct discourse dependency parsing, constructing a discourse dependency treebank is fundamental.
treebank is mentioned in 15 sentences in this paper.
Hall, David and Durrett, Greg and Klein, Dan
Annotations
Table 2: Results for the Penn Treebank development set, sentences of length ≤ 40, for different annotation schemes implemented on top of the X-bar grammar.
Annotations
Table 3: Final Parseval results for the v = l, h = 0 parser on Section 23 of the Penn Treebank.
Annotations
Finally, Table 3 shows our final evaluation on Section 23 of the Penn Treebank .
Features
Table 1 shows the results of incrementally building up our feature set on the Penn Treebank development set.
Features
Because constituents in the treebank can be quite long, we bin our length features into 8 buckets, of
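The excerpt is cut off before the bucket boundaries are given, so the edges below are purely hypothetical; the sketch only shows the usual shape of such a bucketing feature:

    # Hypothetical bucket edges (the paper's actual 8 boundaries are not
    # given in this excerpt); longer spans fall into the last bucket.
    BUCKET_EDGES = [1, 2, 3, 5, 8, 13, 21, 40]

    def length_bucket(span_length: int) -> int:
        """Map a span length to one of 8 buckets, indexed 0-7."""
        for i, edge in enumerate(BUCKET_EDGES):
            if span_length <= edge:
                return i
        return len(BUCKET_EDGES) - 1

    assert length_bucket(1) == 0
    assert length_bucket(100) == 7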
Introduction
Naïve context-free grammars, such as those embodied by standard treebank annotations, do not parse well because their symbols have too little context to constrain their syntactic behavior.
Introduction
Our parser can be easily adapted to this task by replacing the X-bar grammar over treebank symbols with a grammar over the sentiment values to encode the output variables and then adding n-gram indicators to our feature set to capture the bulk of the lexical effects.
Other Languages
Historically, many annotation schemes for parsers have required language-specific engineering: for example, lexicalized parsers require a set of head rules and manually-annotated grammars require detailed analysis of the treebank itself (Klein and Manning, 2003).
Parsing Model
Because the X-bar grammar is so minimal, this grammar does not parse very accurately, scoring just 73 F1 on the standard English Penn Treebank task.
Surface Feature Framework
Throughout this and the following section, we will draw on motivating examples from the English Penn Treebank, though similar examples could be equally argued for other languages.
Surface Feature Framework
There are a great number of spans in a typical treebank; extracting features for every possible combination of span and rule is prohibitive.
treebank is mentioned in 19 sentences in this paper.
Bengoetxea, Kepa and Agirre, Eneko and Nivre, Joakim and Zhang, Yue and Gojenola, Koldo
Abstract
We study the effect of semantic classes in three dependency parsers, using two types of constituency-to-dependency conversions of the English Penn Treebank .
Abstract
In addition, we explore parser combinations, showing that the semantically enhanced parsers yield a small significant gain only on the more semantically oriented LTH treebank conversion.
Experimental Framework
3.1 Treebank conversions
Experimental Framework
Penn2Malt performs a simple and direct conversion from the constituency-based PTB to a dependency treebank.
Experimental Framework
supervised approach that makes use of cluster features induced from unlabeled data, providing significant performance improvements for supervised dependency parsers on the Penn Treebank for English and the Prague Dependency Treebank for Czech.
Introduction
Most experiments for English were evaluated on the Penn2Malt conversion of the constituency-based Penn Treebank .
Introduction
Section 3 describes the treebank conversions, parsers and semantic features.
Related work
The results showed a significant improvement, giving the first results over both WordNet and the Penn Treebank (PTB) to show that semantics helps parsing.
Related work
They demonstrated its effectiveness in dependency parsing experiments on the PTB and the Prague Dependency Treebank .
Results
Looking at table 2, we can say that the differences in baseline parser performance are accentuated when using the LTH treebank conversion, as ZPar clearly outperforms the other two parsers by more than 4 absolute points.
Results
We can also conclude that automatically acquired clusters are especially effective with the MST parser in both treebank conversions, which suggests that the type of semantic information has a direct relation to the parsing algorithm.
treebank is mentioned in 12 sentences in this paper.
Ji, Yangfeng and Eisenstein, Jacob
Abstract
The resulting shift-reduce discourse parser obtains substantial improvements over the previous state-of-the-art in predicting relations and nuclearity on the RST Treebank .
Experiments
We evaluate DPLP on the RST Discourse Treebank (Carlson et al., 2001), comparing against state-of-the-art results.
Experiments
Dataset The RST Discourse Treebank (RST-DT) consists of 385 documents, with 347 for training
Implementation
We consider the values K ∈ {30, 60, 90, 150}, λ ∈ {1, 10, 50, 100} and τ ∈ {1.0, 0.1, 0.01, 0.001}, and search over this space using a development set of thirty documents randomly selected from within the RST Treebank training data.
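The search described is an exhaustive grid over the three hyperparameter ranges, scored on the thirty-document development set. A minimal sketch, with the scoring callback left as a placeholder for training and evaluating the actual model:

    from itertools import product

    K_VALUES = [30, 60, 90, 150]
    LAMBDA_VALUES = [1, 10, 50, 100]
    TAU_VALUES = [1.0, 0.1, 0.01, 0.001]

    def grid_search(evaluate_on_dev):
        """Score every (K, lambda, tau) combination on the development set
        and return the best configuration with its score."""
        best_config, best_score = None, float("-inf")
        for k, lam, tau in product(K_VALUES, LAMBDA_VALUES, TAU_VALUES):
            score = evaluate_on_dev(k, lam, tau)  # placeholder: train, then score
            if score > best_score:
                best_config, best_score = (k, lam, tau), score
        return best_config, best_score

    # Dummy scoring function, just to show the call shape:
    best, _ = grid_search(lambda k, lam, tau: -abs(k - 90) - lam - tau)
    print(best)  # (90, 1, 0.001)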
Introduction
Unfortunately, the performance of discourse parsing is still relatively weak: the state-of-the-art F-measure for text-level relation detection in the RST Treebank is only slightly above 55% (Joty
Introduction
In addition, we show that the latent representation coheres well with the characterization of discourse connectives in the Penn Discourse Treebank (Prasad et al., 2008).
Model
(2010) show that there is a long tail of alternative lexicalizations for discourse relations in the Penn Discourse Treebank, posing obvious challenges for approaches based on directly matching lexical features observed in the training data.
Model
We apply transition-based (incremental) structured prediction to obtain a discourse parse, training a predictor to make the correct incremental moves to match the annotations of training data in the RST Treebank .
Related Work
(2009) in the context of the Penn Discourse Treebank (Prasad et al., 2008).
treebank is mentioned in 9 sentences in this paper.
Gormley, Matthew R. and Mitchell, Margaret and Van Durme, Benjamin and Dredze, Mark
Abstract
We explore the extent to which high-resource manual annotations such as treebanks are necessary for the task of semantic role labeling (SRL).
Experiments
To compare with prior approaches that use semantic supervision for grammar induction, we utilize Section 23 of the WSJ portion of the Penn Treebank (Marcus et al., 1993).
Experiments
We contrast low-resource (D) and high-resource (E) settings, where the latter uses a treebank.
Experiments
We therefore turn to an analysis of other approaches to grammar induction in Table 8, evaluated on the Penn Treebank .
Introduction
However, richly annotated data such as that provided in parsing treebanks is expensive to produce, and may be tied to specific domains (e.g., newswire).
Related Work
(2012) observe that syntax may be treated as latent when a treebank is not available.
Related Work
(2011) require an oracle CCG tag dictionary extracted from a treebank.
Related Work
There has not yet been a comparison of techniques for SRL that do not rely on a syntactic treebank, and no exploration of probabilistic models for unsupervised grammar induction within an SRL pipeline that we have been able to find.
treebank is mentioned in 8 sentences in this paper.
Candito, Marie and Constant, Matthieu
Abstract
In this paper, we investigate various strategies to predict both syntactic dependency parsing and contiguous multiword expression (MWE) recognition, testing them on the dependency version of the French Treebank (Abeillé and Barrier, 2004), as instantiated in the SPMRL Shared Task (Seddah et al., 2013).
Conclusion
We experimented with strategies to predict both MWE analysis and dependency structure, and tested them on the dependency version of the French Treebank (Abeillé and Barrier, 2004), as instantiated in the SPMRL Shared Task (Seddah et al., 2013).
Data: MWEs in Dependency Trees
It contains projective dependency trees that were automatically derived from the latest status of the French Treebank (Abeillé and Barrier, 2004), which consists of constituency trees for sentences from the
Data: MWEs in Dependency Trees
For instance, in the French Treebank, population active (lit.
Introduction
The French dataset is the only one containing MWEs: the French treebank has the particularity of containing a high ratio of tokens belonging to an MWE (12.7% of non-numerical tokens).
Related work
Our representation also resembles that of light-verb constructions (LVC) in the Hungarian dependency treebank (Vincze et al., 2010): the construction has regular syntax, and a suffix is used on labels to express that it is an LVC (Vincze et al., 2013).
Use of external MWE resources
In order to compare the MWEs present in the lexicons and those encoded in the French treebank, we applied the following procedure (hereafter called lexicon
Use of external MWE resources
We had to convert the DELA POS tagset to that of the French Treebank .
treebank is mentioned in 8 sentences in this paper.
Kulick, Seth and Bies, Ann and Mott, Justin and Kroch, Anthony and Santorini, Beatrice and Liberman, Mark
Abstract
This paper introduces a new technique for phrase-structure parser analysis, categorizing possible treebank structures by integrating regular expressions into derivation trees.
Abstract
We analyze the performance of the Berkeley parser on OntoNotes WSJ and the English Web Treebank .
Analysis of parsing results
The high coverage (%) reinforces the point that there is a limited number of core structures in the treebank .
Framework for analyzing parsing performance
We refer only to the WSJ treebank portion of OntoNotes, which is roughly a subset of the Penn Treebank (Marcus et al., 1999) with annotation revisions including the addition of NML nodes.
Framework for analyzing parsing performance
We derived the regexes via an iterative process of inspection of tree decomposition on dataset (a), together with taking advantage of the treebanking experience from some of the coauthors.
Introduction
Second, we use a set of regular expressions (henceforth “regexes”) that categorize the possible structures in the treebank .
Introduction
After describing in more detail the basic framework, we show some aspects of the resulting analysis of the performance of the Berkeley parser (Petrov et al., 2008) on three datasets: (a) OntoNotes WSJ sections 2-21 (Weischedel et al., 2011), (b) OntoNotes WSJ section 22, and (c) the “Answers” section of the English Web Treebank (Bies et al., 2012).
treebank is mentioned in 7 sentences in this paper.
Yogatama, Dani and Smith, Noah A.
Experiments
Our sentiment analysis datasets consist of movie reviews from the Stanford sentiment treebank (Socher et al., 2013), and floor speeches by U.S.
Experiments
Congressmen alongside “yea”/“nay” votes on the bill under discussion (Thomas et al., 2006). For the Stanford sentiment treebank, we only predict binary classifications (positive or negative) and exclude neutral reviews.
Structured Regularizers for Text
Figure 1: An example of a parse tree from the Stanford sentiment treebank, which annotates sentiment at the level of every constituent (indicated here by + and ++; no marking indicates neutral sentiment).
Structured Regularizers for Text
The Stanford sentiment treebank has an annotation of sentiments at the constituent level.
Structured Regularizers for Text
Figure 1 illustrates the group structures derived from an example sentence from the Stanford sentiment treebank (Socher et al., 2013).
treebank is mentioned in 6 sentences in this paper.
Hall, David and Berg-Kirkpatrick, Taylor and Klein, Dan
Analyzing System Performance
We replicated the Treebank for the 100,000-sentence pass.
Anatomy of a Dense GPU Parser
Table 1: Performance numbers for computing Viterbi inside charts on 20,000 sentences of length ≤ 40 from the Penn Treebank.
Introduction
As with other grammars with a parse/derivation distinction, the grammars of Petrov and Klein (2007) only achieve their full accuracy using minimum-Bayes-risk parsing, with improvements of over 1.5 F1 over best-derivation Viterbi parsing on the Penn Treebank (Marcus et al., 1993).
Minimum Bayes risk parsing
Table 2: Performance numbers for computing max constituent (Goodman, 1996) trees on 20,000 sentences of length 40 or less from the Penn Treebank .
Minimum Bayes risk parsing
Therefore, in the fine pass, we normalize the inside scores at the leaves to sum to 1.0. Using this slight modification, no sentences from the Treebank under- or overflow.
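A small illustration of why this rescaling is safe (a generic sketch, not the paper's GPU implementation): dividing each leaf's inside scores by their sum multiplies every complete tree's score by the same per-leaf constants, so the best tree is unchanged while chart values stay far from floating-point under- or overflow:

    import numpy as np

    def normalize_leaf_scores(leaf_scores):
        """Rescale each leaf's inside scores to sum to 1.0. Each leaf's
        constant factors out of every complete parse identically, so the
        ranking of trees is preserved while magnitudes stay moderate."""
        return leaf_scores / leaf_scores.sum(axis=1, keepdims=True)

    scores = np.array([[1e-30, 3e-30], [2e-20, 2e-20]])
    print(normalize_leaf_scores(scores))  # rows now sum to 1.0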
Minimum Bayes risk parsing
We measured parsing accuracy on sentences of length ≤ 40 from section 22 of the Penn Treebank.
treebank is mentioned in 6 sentences in this paper.
Sun, Weiwei and Du, Yantao and Kou, Xin and Ding, Shuoyang and Wan, Xiaojun
GB-grounded GR Extraction
structure treebank, namely CTB.
GB-grounded GR Extraction
Our treebank conversion algorithm borrows key insights from Lexical Functional Grammar (LFG; Bresnan and Kaplan, 1982; Dalrymple, 2001).
GB-grounded GR Extraction
There are two sources of errors in treebank conversion: (1) inadequate conversion rules and (2) wrong or inconsistent original annotations.
Introduction
To acquire a high-quality GR corpus, we propose a linguistically-motivated algorithm to translate a Government and Binding (GB; Chomsky, 1981; Carnie, 2007) grounded phrase structure treebank, i.e.
Introduction
Chinese Treebank (CTB; Xue et al., 2005) to a deep dependency bank where GRs are explicitly represented.
Transition-based GR Parsing
The availability of large-scale treebanks has contributed to the blossoming of statistical approaches to build accurate shallow constituency and dependency parsers.
treebank is mentioned in 6 sentences in this paper.
Andreas, Jacob and Klein, Dan
Experimental setup
Experiments are conducted on the Wall Street Journal portion of the English Penn Treebank .
Experimental setup
We prepare three training sets: the complete training set of 39,832 sentences from the treebank (sections 2 through 21), a smaller training set, consisting of the first 3000 sentences, and an even smaller set of the first 300.
Results
test on the French treebank (the “French” column).
Three possible benefits of word embeddings
Example: the infrequently-occurring treebank tag UH dominates greetings (among other interjections).
Three possible benefits of word embeddings
Example: individual first names are also rare in the treebank, but tend to cluster together in distributional representations.
treebank is mentioned in 5 sentences in this paper.
Duan, Manjuan and White, Michael
Abstract
Using parse accuracy in a simple reranking strategy for self-monitoring, we find that with a state-of-the-art averaged perceptron realization ranking model, BLEU scores cannot be improved with any of the well-known Treebank parsers we tested, since these parsers too often make errors that human readers would be unlikely to make.
Analysis and Discussion
A limitation of the experiments reported in this paper is that OpenCCG’s input semantic dependency graphs are not the same as the Stanford dependencies used with the Treebank parsers, and thus we have had to rely on the gold parses in the PTB to derive gold dependencies for measuring accuracy of parser dependency recovery.
Introduction
With this simple reranking strategy and each of three different Treebank parsers, we find that it is possible to improve BLEU scores on Penn Treebank development data with White & Rajkumar’s (2011; 2012) baseline generative model, but not with their averaged perceptron model.
Simple Reranking
We ran two OpenCCG surface realization models on the CCGbank dev set (derived from Section 00 of the Penn Treebank) and obtained n-best (n = 10) realizations.
treebank is mentioned in 4 sentences in this paper.
Sun, Le and Han, Xianpei
Introduction
In a syntactic tree, each node indicates a clause/phrase/word and is only labeled with a Treebank tag (Marcus et al., 1993).
Introduction
The Treebank tag, unfortunately, is usually too coarse or too general to capture semantic information.
Introduction
where L_n is its phrase label (i.e., its Treebank tag) and F_n is a feature vector indicating the characteristics of node n, represented as:
treebank is mentioned in 4 sentences in this paper.
Parikh, Ankur P. and Cohen, Shay B. and Xing, Eric P.
Abstract
This data sparsity problem is quite severe — for example, the Penn treebank (Marcus et al., 1993) has a total number of 43,498 sentences, with 42,246 unique POS tag sequences, averaging 1.04 sentences per unique sequence.
Abstract
For English we use the Penn treebank (Marcus et al., 1993), with sections 2—21 for training and section 23 for final testing.
Abstract
For German and Chinese we use the Negra treebank and the Chinese treebank respectively, and the first 80% of the sentences are used for training and the last 20% for testing.
treebank is mentioned in 4 sentences in this paper.
Wang, Zhiguo and Xue, Nianwen
Experiment
We conducted experiments on the Penn Chinese Treebank (CTB) version 5.1 (Xue et al., 2005): Articles 001-270 and 400-1151 were used as the training set, Articles 301-325 were used as the development set, and Articles 271-300 were used as the test set.
Experiment
To check whether more labeled data can further improve our parsing system, we evaluated our Nonlocal&Cluster system on the Chinese Treebank version 6.0 (CTB6), which is a superset of CTB5 and contains more annotated data.
Transition-based Constituent Parsing
However, parse trees in Treebanks often contain an arbitrary number of branches.
treebank is mentioned in 3 sentences in this paper.
Cortes, Corinna and Kuznetsov, Vitaly and Mohri, Mehryar
Experiments
5.4 Penn Treebank data set
Experiments
The Penn Treebank 2 data set is available through LDC license at http://www.cis.upenn.edu/~treebank/ and contains 251,854 sentences with a total of 6,080,493 tokens and 45 different parts-of-speech.
treebank is mentioned in 3 sentences in this paper.
Monroe, Will and Green, Spence and Manning, Christopher D.
Error Analysis
We classify 7 as typos and 26 as annotation inconsistencies, although the distinction between the two is murky: typos are intentionally preserved in the treebank data, but segmentation of typos varies depending on how well they can be reconciled with standard Arabic orthography.
Error Analysis
The first example is segmented in the Egyptian treebank but is left unsegmented by our system; the second is left as a single token in the treebank but is split into the above three segments by our system.
Experiments
We train and evaluate on three corpora: parts 1-3 of the newswire Arabic Treebank (ATB), the Broadcast News Arabic Treebank (BN), and parts 1-8 of the BOLT Phase 1 Egyptian Arabic Treebank (ARZ). These correspond respectively to the domains in section 2.2.
treebank is mentioned in 3 sentences in this paper.
Zhang, Meishan and Zhang, Yue and Che, Wanxiang and Liu, Ting
Abstract
Experimental results on the Chinese Treebank demonstrate improved performances over word-based parsing methods.
Character-Level Dependency Tree
We use the Chinese Penn Treebank 5.0, 6.0 and 7.0 to conduct the experiments, splitting the corpora into training, development and test sets according to previous work.
Introduction
Their results on the Chinese Treebank (CTB) showed that character-level constituent parsing can bring increased performances even with the pseudo word structures.
treebank is mentioned in 3 sentences in this paper.
Jansen, Peter and Surdeanu, Mihai and Clark, Peter
Experiments
Note that, because these domains are considerably different from the RST Treebank , the parser fails to produce a tree on a large number of answer candidates: 6.2% for YA, and 41.1% for Bio.
Related Work
RST Treebank
Related Work
performance on a small sample of seven WSJ articles drawn from the RST Treebank (Carlson et al., 2003).
treebank is mentioned in 3 sentences in this paper.