Abstract | We perform experiments on three data sets: versions 1.0 and 2.0 of the Google Universal Dependency Treebanks, and the treebanks from the CoNLL shared tasks, across ten languages.
Data and Tools | Our experiments rely on two kinds of data sets: (i) monolingual treebanks with a consistent annotation scheme, where the English treebank is used to train the English parsing model and the treebanks for the target languages are used to evaluate the parsing performance of our approach.
Data and Tools | The monolingual treebanks in our experiments are from the Google Universal Dependency Treebanks (McDonald et al., 2013), because the treebanks of the different languages in this collection have consistent syntactic representations.
Data and Tools | The treebanks from CoNLL shared-tasks on dependency parsing (Buchholz and Marsi, 2006; Nivre et al., 2007) appear to be another reasonable choice. |
Introduction | Several supervised dependency parsing algorithms (Nivre and Scholz, 2004; McDonald et al., 2005a; McDonald et al., 2005b; McDonald and Pereira, 2006; Carreras, 2007; Koo and Collins, 2010; Ma and Zhao, 2012; Zhang et al., 2013) have been proposed and achieved high parsing accuracies on several treebanks, due in large part to the availability of dependency treebanks in a number of languages (McDonald et al., 2013).
Introduction | However, the manually annotated treebanks that these parsers rely on are highly expensive to create, in particular when we want to build treebanks for resource-poor languages. |
Introduction | However, most bilingual text parsing approaches require bilingual treebanks: treebanks that have manually annotated tree structures on both the source and target sides (Smith and Smith, 2004; Burkett and Klein, 2008), or that have tree structures on the source side and translated sentences on the target side (Huang et
Our Approach | Table 1: Data statistics of two versions of Google Universal Treebanks for the target languages. |
Abstract | We use a sentiment treebank to show that these existing heuristics are poor estimators of sentiment. |
Experiment setup | Data As described earlier, the Stanford Sentiment Treebank (Socher et al., 2013) has manually annotated, real-valued sentiment values for all phrases in parse trees. |
Experiment setup | We search these negators in the Stanford Sentiment Treebank and normalize the same negators to a single form; e.g., “is n’t”, “isn’t”, and “is not” are all normalized to “is_not”. |
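The normalization step described above can be sketched as follows; the variant lists and function names are illustrative assumptions, not the paper's actual negator inventory:

```python
# Hypothetical sketch of negator normalization: surface variants of the same
# negator are mapped to a single canonical form before counting occurrences.
NEGATOR_FORMS = {
    "is_not":  {"is n't", "isn't", "is not"},
    "do_not":  {"do n't", "don't", "do not"},
    "can_not": {"ca n't", "can't", "cannot", "can not"},
}

# Invert to a lookup table: variant -> canonical form.
CANONICAL = {
    variant: canon
    for canon, variants in NEGATOR_FORMS.items()
    for variant in variants
}

def normalize_negator(span: str) -> str:
    """Return the canonical form of a negator, or the span unchanged."""
    return CANONICAL.get(span.lower(), span)
```

With such a table, "is n't", "isn't", and "is not" all map to the single form "is_not" before occurrences are counted.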
Experiment setup | Each occurrence of a negator and the phrase it is directly composed with in the treebank, i.e., (7,071,217), is considered a data point in our study.
Experimental results | Table 1: Mean absolute errors (MAE) of fitting different models to the Stanford Sentiment Treebank.
Experimental results | The figure includes the five most frequently used negators found in the sentiment treebank.
Experimental results | Below, we take a closer look at the fitting errors made at different depths of the sentiment treebank.
Introduction | Figure 1: Effect of a list of common negators in modifying sentiment values in the Stanford Sentiment Treebank.
Introduction | Each dot in the figure corresponds to a text span being modified by (composed with) a negator in the treebank.
Introduction | The recently available Stanford Sentiment Treebank (Socher et al., 2013) renders manually annotated, real-valued sentiment scores for all phrases in parse trees. |
Abstract | We propose treating the induced word boundaries as soft constraints to bias the continued learning of a supervised CRF model, trained on the labeled treebank data, over the unlabeled bilingual data.
Experiments | The monolingual segmented data, trainTB, is extracted from the Penn Chinese Treebank (CTB-7) (Xue et al., 2005), containing 51,447 sentences. |
Experiments | Supervised Monolingual Segmenter (SMS): this model is trained with CRFs on the treebank training data (trainTB).
Introduction | The standard practice in state-of-the-art MT systems is that Chinese sentences are tokenized by a monolingual supervised word segmentation model trained on hand-annotated treebank data, e.g., the Chinese Treebank
Introduction | But one outstanding problem is that these models may leave out some segmentation features crucial for SMT, since the output words conform to the treebank segmentation standard, which is designed from monolingual linguistic intuition rather than for the SMT task.
Introduction | Crucially, the GP expression with the bilingual knowledge is then used as side information to regularize the learning of a conditional random field (CRF) model over treebank and bitext data, based on the posterior regularization (PR) framework (Ganchev et al., 2010).
Methodology | The input comprises two types of training resources: segmented Chinese sentences from the treebank, and parallel unsegmented sentences of Chinese and a foreign language.
Methodology | Algorithm 1: CWS model induction with bilingual constraints. Require: segmented Chinese sentences from the treebank; parallel sentences of Chinese and a foreign language.
Methodology | As in conventional GP examples (Das and Smith, 2012), a similarity graph G = (V, E) is constructed over N types extracted from the Chinese training data, including the treebank and the bitexts.
Abstract | In this paper, we report our preliminary efforts in building an English-Turkish parallel treebank corpus for statistical machine translation. |
Abstract | In the corpus, we manually generated parallel trees for about 5,000 sentences from the Penn Treebank.
Introduction | In recent years, many efforts have been made to annotate parallel corpora with syntactic structure to build parallel treebanks.
Introduction | A parallel treebank is a parallel corpus in which the sentences in each language are syntactically (and, if necessary, morphologically) annotated, and the sentences and words are aligned.
Introduction | In parallel treebanks, the syntactic annotation usually follows constituent and/or dependency structure.
Introduction | However, most evaluations of syntactic treebanks use simple accuracy measures such as bracket F1 scores for constituent trees (NEGRA, Brants, 2000; TIGER, Brants and Hansen, 2002; Cat3LB, Civit et al., 2003; the Arabic Treebank, Maamouri et al., 2008) or labelled or unlabelled attachment scores for dependency syntax (PDT, Hajič, 2004; PCEDT, Mikulová and Štěpánek, 2010; Norwegian Dependency Treebank, Skjærholt, 2013).
Introduction | In grammar-driven treebanking (or parsebanking), the problems encountered are slightly different.
Introduction | In HPSG and LFG treebanking, annotators do not annotate structure directly.
Real-world corpora | Three of the data sets are dependency treebanks (NDT, CDT, PCEDT) and one is a phrase-structure treebank (SSD); of the dependency treebanks, the PCEDT contains semantic dependencies, while the other two have traditional syntactic dependencies.
Real-world corpora | We contacted a number of treebank projects, among them the Penn Treebank and the Prague Dependency Treebank, but not all of them had data available.
Synthetic experiments | An already annotated corpus, in our case 100 randomly selected sentences from the Norwegian Dependency Treebank (Solberg et al., 2014), is taken as correct and then permuted to produce “annotations” of different quality.
Add arc <eC,ej> to GC with | We use the syntactic trees from the Penn Treebank to find the dominating nodes.
Add arc <eC,ej> to GC with | But we think that the MST algorithm has more potential in discourse dependency parsing, because our converted discourse dependency treebank contains only projective trees, which somewhat prevents the MST algorithm from exhibiting its advantage in parsing non-projective trees.
Add arc <eC,ej> to GC with | In fact, we observe that some non-projective dependencies produced by the MST algorithm are even more reasonable than their counterparts in the dependency treebank.
Discourse Dependency Structure and Tree Bank | Section 2 formally defines discourse dependency structure and introduces how to build a discourse dependency treebank from the existing RST corpus. |
Discourse Dependency Structure and Tree Bank | 2.2 Our Discourse Dependency Treebank |
Discourse Dependency Structure and Tree Bank | To automatically conduct discourse dependency parsing, constructing a discourse dependency treebank is fundamental. |
Annotations | Table 2: Results for the Penn Treebank development set, sentences of length ≤ 40, for different annotation schemes implemented on top of the X-bar grammar.
Annotations | Table 3: Final Parseval results for the v = 1, h = 0 parser on Section 23 of the Penn Treebank.
Annotations | Finally, Table 3 shows our final evaluation on Section 23 of the Penn Treebank . |
Features | Table 1 shows the results of incrementally building up our feature set on the Penn Treebank development set. |
Features | Because constituents in the treebank can be quite long, we bin our length features into 8 buckets, of |
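The binning described above can be sketched as follows; the paper states only that lengths fall into 8 buckets, so the cut points below are illustrative assumptions:

```python
def length_bucket(span_length: int, boundaries=(1, 2, 3, 5, 8, 13, 21)) -> int:
    """Map a constituent length to one of 8 buckets (0..7).

    Seven boundary values define eight buckets; the values here are
    hypothetical, since the source does not specify where the cuts lie.
    """
    for i, b in enumerate(boundaries):
        if span_length <= b:
            return i
    return len(boundaries)  # final bucket catches all longer spans
```

Bucketed lengths then replace raw lengths as feature values, keeping the feature space small for long constituents.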
Introduction | Naïve context-free grammars, such as those embodied by standard treebank annotations, do not parse well because their symbols have too little context to constrain their syntactic behavior.
Introduction | Our parser can be easily adapted to this task by replacing the X-bar grammar over treebank symbols with a grammar over the sentiment values to encode the output variables and then adding n-gram indicators to our feature set to capture the bulk of the lexical effects. |
Other Languages | Historically, many annotation schemes for parsers have required language-specific engineering: for example, lexicalized parsers require a set of head rules and manually-annotated grammars require detailed analysis of the treebank itself (Klein and Manning, 2003). |
Parsing Model | Because the X-bar grammar is so minimal, this grammar does not parse very accurately, scoring just 73 F1 on the standard English Penn Treebank task. |
Surface Feature Framework | Throughout this and the following section, we will draw on motivating examples from the English Penn Treebank, though similar examples could be equally argued for other languages.
Surface Feature Framework | There are a great number of spans in a typical treebank; extracting features for every possible combination of span and rule is prohibitive.
Abstract | We study the effect of semantic classes in three dependency parsers, using two types of constituency-to-dependency conversions of the English Penn Treebank . |
Abstract | In addition, we explore parser combinations, showing that the semantically enhanced parsers yield a small but significant gain only on the more semantically oriented LTH treebank conversion.
Experimental Framework | 3.1 Treebank conversions |
Experimental Framework | Penn2Malt performs a simple and direct conversion from the constituency-based PTB to a dependency treebank.
Experimental Framework | This is a semi-supervised approach that makes use of cluster features induced from unlabeled data, providing significant performance improvements for supervised dependency parsers on the Penn Treebank for English and the Prague Dependency Treebank for Czech.
Introduction | Most experiments for English were evaluated on the Penn2Malt conversion of the constituency-based Penn Treebank . |
Introduction | Section 3 describes the treebank conversions, parsers, and semantic features.
Related work | The results showed a significant improvement, giving the first results over both WordNet and the Penn Treebank (PTB) to show that semantics helps parsing.
Related work | They demonstrated its effectiveness in dependency parsing experiments on the PTB and the Prague Dependency Treebank . |
Results | Looking at Table 2, we can say that the differences in baseline parser performance are accentuated when using the LTH treebank conversion, as ZPar clearly outperforms the other two parsers by more than 4 absolute points.
Results | We can also conclude that automatically acquired clusters are especially effective with the MST parser in both treebank conversions, which suggests that the type of semantic information has a direct relation to the parsing algorithm.
Abstract | The resulting shift-reduce discourse parser obtains substantial improvements over the previous state-of-the-art in predicting relations and nuclearity on the RST Treebank . |
Experiments | We evaluate DPLP on the RST Discourse Treebank (Carlson et al., 2001), comparing against state-of-the-art results. |
Experiments | Dataset: The RST Discourse Treebank (RST-DT) consists of 385 documents, with 347 for training
Implementation | We consider the values K ∈ {30, 60, 90, 150}, λ ∈ {1, 10, 50, 100}, and τ ∈ {1.0, 0.1, 0.01, 0.001}, and search over this space using a development set of thirty documents randomly selected from within the RST Treebank training data.
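The exhaustive search over this small hyperparameter space can be sketched as follows; the symbol names and the lower-is-better evaluation function are illustrative assumptions:

```python
import itertools

# Illustrative grid mirroring the ranges in the text (key names assumed).
grid = {
    "K": [30, 60, 90, 150],
    "lambda": [1, 10, 50, 100],
    "tau": [1.0, 0.1, 0.01, 0.001],
}

def grid_search(evaluate, grid):
    """Exhaustively score every configuration and return the best one.

    `evaluate` maps a configuration dict to a development-set error
    (lower is better); all 4 * 4 * 4 = 64 combinations are tried.
    """
    best_cfg, best_score = None, float("inf")
    for values in itertools.product(*grid.values()):
        cfg = dict(zip(grid.keys(), values))
        score = evaluate(cfg)  # e.g., error on the thirty dev documents
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```

With 64 configurations, a full sweep on a small development set is cheap enough that no smarter search strategy is needed.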
Introduction | Unfortunately, the performance of discourse parsing is still relatively weak: the state-of-the-art F-measure for text-level relation detection in the RST Treebank is only slightly above 55% (Joty
Introduction | In addition, we show that the latent representation coheres well with the characterization of discourse connectives in the Penn Discourse Treebank (Prasad et al., 2008). |
Model | (2010) show that there is a long tail of alternative lexicalizations for discourse relations in the Penn Discourse Treebank, posing obvious challenges for approaches based on directly matching lexical features observed in the training data.
Model | We apply transition-based (incremental) structured prediction to obtain a discourse parse, training a predictor to make the correct incremental moves to match the annotations of training data in the RST Treebank . |
Related Work | (2009) in the context of the Penn Discourse Treebank (Prasad et al., 2008). |
Abstract | We explore the extent to which high-resource manual annotations such as treebanks are necessary for the task of semantic role labeling (SRL). |
Experiments | To compare with prior approaches that use semantic supervision for grammar induction, we utilize Section 23 of the WSJ portion of the Penn Treebank (Marcus et al., 1993). |
Experiments | We contrast the low-resource (D) and high-resource (E) settings, where the latter uses a treebank.
Experiments | We therefore turn to an analysis of other approaches to grammar induction in Table 8, evaluated on the Penn Treebank . |
Introduction | However, richly annotated data such as that provided in parsing treebanks is expensive to produce, and may be tied to specific domains (e.g., newswire). |
Related Work | (2012) observe that syntax may be treated as latent when a treebank is not available. |
Related Work | (2011) require an oracle CCG tag dictionary extracted from a treebank . |
Related Work | There has not yet been a comparison of techniques for SRL that do not rely on a syntactic treebank , and no exploration of probabilistic models for unsupervised grammar induction within an SRL pipeline that we have been able to find. |
Abstract | In this paper, we investigate various strategies to perform both syntactic dependency parsing and contiguous multiword expression (MWE) recognition, testing them on the dependency version of the French Treebank (Abeillé and Barrier, 2004), as instantiated in the SPMRL Shared Task (Seddah et al., 2013).
Conclusion | We experimented with strategies to predict both MWE analysis and dependency structure, and tested them on the dependency version of the French Treebank (Abeillé and Barrier, 2004), as instantiated in the SPMRL Shared Task (Seddah et al., 2013).
Data: MWEs in Dependency Trees | It contains projective dependency trees that were automatically derived from the latest version of the French Treebank (Abeillé and Barrier, 2004), which consists of constituency trees for sentences from the
Data: MWEs in Dependency Trees | For instance, in the French Treebank , population active (lit. |
Introduction | The French dataset is the only one containing MWEs: the French treebank has the particularity of containing a high ratio of tokens belonging to an MWE (12.7% of non-numerical tokens).
Related work | Our representation also resembles that of light-verb constructions (LVC) in the Hungarian dependency treebank (Vincze et al., 2010): the construction has regular syntax, and a suffix is used on labels to express that it is an LVC (Vincze et al., 2013).
Use of external MWE resources | In order to compare the MWEs present in the lexicons and those encoded in the French treebank , we applied the following procedure (hereafter called lexicon |
Use of external MWE resources | We had to convert the DELA POS tagset to that of the French Treebank . |
Abstract | This paper introduces a new technique for phrase-structure parser analysis, categorizing possible treebank structures by integrating regular expressions into derivation trees. |
Abstract | We analyze the performance of the Berkeley parser on OntoNotes WSJ and the English Web Treebank . |
Analysis of parsing results | The high coverage (%) reinforces the point that there is a limited number of core structures in the treebank . |
Framework for analyzing parsing performance | We refer only to the WSJ treebank portion of OntoNotes, which is roughly a subset of the Penn Treebank (Marcus et al., 1999), with annotation revisions including the addition of NML nodes.
Framework for analyzing parsing performance | We derived the regexes via an iterative process of inspection of tree decomposition on dataset (a), together with taking advantage of the treebanking experience from some of the coauthors. |
Introduction | Second, we use a set of regular expressions (henceforth “regexes”) that categorize the possible structures in the treebank . |
Introduction | After describing in more detail the basic framework, we show some aspects of the resulting analysis of the performance of the Berkeley parser (Petrov et al., 2008) on three datasets: (a) OntoNotes WSJ sections 2-21 (Weischedel et al., 2011), (b) OntoNotes WSJ section 22, and (c) the “Answers” section of the English Web Treebank (Bies et al., 2012).
Experiments | Our sentiment analysis datasets consist of movie reviews from the Stanford sentiment treebank (Socher et al., 2013), and floor speeches by U.S.
Experiments | Congressmen alongside “yea”/“nay” votes on the bill under discussion (Thomas et al., 2006). For the Stanford sentiment treebank, we only predict binary classifications (positive or negative) and exclude neutral reviews.
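The filtering step for the binary task can be sketched as follows; the neutral band used here is a common convention for this treebank but is an assumption in this sketch, as are all names:

```python
def binarize_labels(examples, neutral_range=(0.4, 0.6)):
    """Keep only positive/negative reviews, dropping neutral ones.

    Sentiment scores are assumed to lie in [0, 1]; anything inside the
    (0.4, 0.6] band is treated as neutral and excluded, mirroring the
    binary positive/negative setup described in the text.
    """
    kept = []
    for text, score in examples:
        if neutral_range[0] < score <= neutral_range[1]:
            continue  # neutral review: excluded from the binary task
        kept.append((text, 1 if score > neutral_range[1] else 0))
    return kept
```

The remaining examples carry a hard 0/1 label, which is what a binary classifier is trained and evaluated on.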
Structured Regularizers for Text | Figure 1: An example of a parse tree from the Stanford sentiment treebank, which annotates sentiment at the level of every constituent (indicated here by + and ++; no marking indicates neutral sentiment).
Structured Regularizers for Text | The Stanford sentiment treebank has an annotation of sentiments at the constituent level. |
Structured Regularizers for Text | Figure 1 illustrates the group structures derived from an example sentence from the Stanford sentiment treebank (Socher et al., 2013). |
Analyzing System Performance | We replicated the Treebank for the 100,000-sentence pass.
Anatomy of a Dense GPU Parser | Table 1: Performance numbers for computing Viterbi inside charts on 20,000 sentences of length ≤ 40 from the Penn Treebank.
Introduction | As with other grammars with a parse/derivation distinction, the grammars of Petrov and Klein (2007) only achieve their full accuracy using minimum-Bayes-risk parsing, with improvements of over 1.5 F1 over best-derivation Viterbi parsing on the Penn Treebank (Marcus et al., 1993). |
Minimum Bayes risk parsing | Table 2: Performance numbers for computing max constituent (Goodman, 1996) trees on 20,000 sentences of length 40 or less from the Penn Treebank . |
Minimum Bayes risk parsing | Therefore, in the fine pass, we normalize the inside scores at the leaves to sum to 1.0. Using this slight modification, no sentences from the Treebank under- or overflow.
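A toy sketch of this renormalization, with plain Python lists standing in for per-leaf inside scores (all names are illustrative):

```python
def normalize_leaf_scores(leaf_scores):
    """Rescale each leaf's inside scores so they sum to 1.0.

    This mirrors the trick described in the text: renormalizing at the
    leaves keeps downstream products of inside scores within
    floating-point range, and the constant factor introduced per leaf
    does not change which tree maximizes the decoding objective.
    """
    normalized = []
    for scores in leaf_scores:
        total = sum(scores)
        normalized.append([s / total for s in scores])
    return normalized
```

Even when the raw scores are tiny (say, around 1e-300), the normalized values are well-scaled probabilities, so later chart computations neither underflow nor overflow.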
Minimum Bayes risk parsing | We measured parsing accuracy on sentences of length ≤ 40 from section 22 of the Penn Treebank.
GB-grounded GR Extraction | structure treebank, namely CTB.
GB-grounded GR Extraction | Our treebank conversion algorithm borrows key insights from Lexical Functional Grammar (LFG; Bresnan and Kaplan, 1982; Dalrymple, 2001). |
GB-grounded GR Extraction | There are two sources of errors in treebank conversion: (1) inadequate conversion rules and (2) wrong or inconsistent original annotations. |
Introduction | To acquire a high-quality GR corpus, we propose a linguistically-motivated algorithm to translate a Government and Binding (GB; Chomsky, 1981; Carnie, 2007) grounded phrase structure treebank, i.e.
Introduction | Chinese Treebank (CTB; Xue et al., 2005) to a deep dependency bank where GRs are explicitly represented. |
Transition-based GR Parsing | The availability of large-scale treebanks has contributed to the blossoming of statistical approaches to build accurate shallow constituency and dependency parsers. |
Experimental setup | Experiments are conducted on the Wall Street Journal portion of the English Penn Treebank . |
Experimental setup | We prepare three training sets: the complete training set of 39,832 sentences from the treebank (sections 2 through 21), a smaller training set, consisting of the first 3000 sentences, and an even smaller set of the first 300. |
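The three nested training sets described above can be carved out with a one-line sketch (names are illustrative, assuming sentences are held in a list in treebank order):

```python
def training_subsets(sentences):
    """Return the three nested training sets used in the experiments:
    the full set, the first 3,000 sentences, and the first 300."""
    return sentences, sentences[:3000], sentences[:300]
```

Because the subsets are prefixes of the same list, the smaller sets are strictly contained in the larger ones, which makes learning curves across the three sizes directly comparable.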
Results | test on the French treebank (the “French” column). |
Three possible benefits of word embeddings | Example: the infrequently-occurring treebank tag UH dominates greetings (among other interjections). |
Three possible benefits of word embeddings | Example: individual first names are also rare in the treebank, but tend to cluster together in distributional representations.
Abstract | Using parse accuracy in a simple reranking strategy for self-monitoring, we find that with a state-of-the-art averaged perceptron realization ranking model, BLEU scores cannot be improved with any of the well-known Treebank parsers we tested, since these parsers too often make errors that human readers would be unlikely to make. |
Analysis and Discussion | A limitation of the experiments reported in this paper is that OpenCCG’s input semantic dependency graphs are not the same as the Stanford dependencies used with the Treebank parsers, and thus we have had to rely on the gold parses in the PTB to derive gold dependencies for measuring accuracy of parser dependency recovery. |
Introduction | With this simple reranking strategy and each of three different Treebank parsers, we find that it is possible to improve BLEU scores on Penn Treebank development data with White & Rajkumar’s (2011; 2012) baseline generative model, but not with their averaged perceptron model. |
Simple Reranking | We ran two OpenCCG surface realization models on the CCGbank dev set (derived from Section 00 of the Penn Treebank) and obtained n-best (n = 10) realizations.
Introduction | In a syntactic tree, each node indicates a clause/phrase/word and is only labeled with a Treebank tag (Marcus et al., 1993). |
Introduction | The Treebank tag, unfortunately, is usually too coarse or too general to capture semantic information. |
Introduction | where Ln is its phrase label (i.e., its Treebank tag), and Fn is a feature vector indicating the characteristics of node n, which is represented as:
Abstract | This data sparsity problem is quite severe; for example, the Penn treebank (Marcus et al., 1993) has a total of 43,498 sentences with 42,246 unique POS tag sequences, averaging 1.04 sentences per sequence.
Abstract | For English we use the Penn treebank (Marcus et al., 1993), with sections 2-21 for training and section 23 for final testing.
Abstract | For German and Chinese we use the Negra treebank and the Chinese treebank, respectively; the first 80% of the sentences are used for training and the last 20% for testing.
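The 80/20 split described above is a simple prefix split over the sentence list; a minimal sketch (names are illustrative):

```python
def split_80_20(sentences):
    """Split a treebank into the first 80% of sentences for training
    and the last 20% for testing, mirroring the German/Chinese setup."""
    cut = int(len(sentences) * 0.8)
    return sentences[:cut], sentences[cut:]
```

Note that this is a deterministic positional split rather than a random one, so results are exactly reproducible from the same treebank ordering.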
Experiment | We conducted experiments on the Penn Chinese Treebank (CTB) version 5.1 (Xue et al., 2005): Articles 001-270 and 400-1151 were used as the training set, Articles 301-325 were used as the development set, and Articles 271-300 were used as the test set.
Experiment | To check whether more labeled data can further improve our parsing system, we evaluated our Nonlocal&Cluster system on the Chinese TreeBank version 6.0 (CTB6), which is a superset of CTB5 and contains more annotated data.
Transition-based Constituent Parsing | However, parse trees in Treebanks often contain an arbitrary number of branches. |
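A standard way to handle arbitrary branching in transition-based constituent parsing is to binarize the trees first; a minimal sketch with trees as `(label, children)` tuples (the starred-label convention for intermediate nodes is a common assumption here, not necessarily this parser's):

```python
def binarize(label, children):
    """Right-binarize an n-ary constituent into a cascade of binary nodes.

    Intermediate nodes get a starred label so the original n-ary tree
    can be recovered by collapsing starred nodes after parsing.
    """
    if len(children) <= 2:
        return (label, children)
    head, rest = children[0], children[1:]
    return (label, [head, binarize(label + "*", rest)])
```

After binarization every node has at most two children, so a fixed inventory of shift/reduce actions suffices regardless of the original branching factor.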
Experiments | 5.4 Penn Treebank data set |
Experiments | The Penn Treebank 2 data set is available through an LDC license at http://www.…edu/~treebank/ and contains 251,854 sentences with a total of 6,080,493 tokens and 45 different parts of speech.
Error Analysis | We classify 7 as typos and 26 as annotation inconsistencies, although the distinction between the two is murky: typos are intentionally preserved in the treebank data, but segmentation of typos varies depending on how well they can be reconciled with standard Arabic orthography. |
Error Analysis | The first example is segmented in the Egyptian treebank but is left unsegmented by our system; the second is left as a single token in the treebank but is split into the above three segments by our system. |
Experiments | We train and evaluate on three corpora: parts 1-3 of the newswire Arabic Treebank (ATB), the Broadcast News Arabic Treebank (BN), and parts 1-8 of the BOLT Phase 1 Egyptian Arabic Treebank (ARZ). These correspond respectively to the domains in Section 2.2.
Abstract | Experimental results on the Chinese Treebank demonstrate improved performances over word-based parsing methods. |
Character-Level Dependency Tree | We use the Chinese Penn Treebank 5.0, 6.0 and 7.0 to conduct the experiments, splitting the corpora into training, development and test sets according to previous work.
Introduction | Their results on the Chinese Treebank (CTB) showed that character-level constituent parsing can bring increased performances even with the pseudo word structures. |
Experiments | Note that, because these domains are considerably different from the RST Treebank , the parser fails to produce a tree on a large number of answer candidates: 6.2% for YA, and 41.1% for Bio. |
Related Work | RST Treebank |
Related Work | performance on a small sample of seven WSJ articles drawn from the RST Treebank (Carlson et al., 2003). |