Index of papers in Proc. ACL 2012 that mention
  • treebank
Constant, Matthieu and Sigogne, Anthony and Watrin, Patrick
Conclusions and Future Work
The authors are very grateful to Spence Green for his useful help on the treebank, and to Jennifer Thewissen for her careful proofreading.
Introduction
The grammar was trained with a reference treebank where MWEs were annotated with a specific nonterminal node.
Introduction
The experiments were carried out on the French Treebank (Abeillé et al., 2003) where MWEs are annotated.
MWE-dedicated Features
In our collocation resource, each candidate collocation of the French treebank is associated with its internal syntactic structure and its association score (log-likelihood).
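The collocation feature above rests on a log-likelihood association score. As a point of reference, here is a minimal Python sketch of a Dunning-style log-likelihood ratio for a candidate bigram, computed from a 2x2 contingency table; the function name and the toy counts are illustrative, not the authors' actual resource or code:

  import math

  def log_likelihood_ratio(c12, c1, c2, n):
      """Dunning-style log-likelihood ratio (G^2) for a candidate bigram
      collocation, from a 2x2 contingency table. c12: joint count of
      (w1, w2); c1, c2: marginal counts of w1 and w2; n: total bigrams."""
      table = [
          [c12,      c1 - c12],           # w2 after w1 / other word after w1
          [c2 - c12, n - c1 - c2 + c12],  # w2 after other word / neither
      ]
      rows = [sum(r) for r in table]
      cols = [sum(c) for c in zip(*table)]
      g2 = 0.0
      for i in range(2):
          for j in range(2):
              observed = table[i][j]
              expected = rows[i] * cols[j] / n
              if observed > 0:
                  g2 += observed * math.log(observed / expected)
      return 2.0 * g2

  # Illustrative counts only: a frequent candidate pair in a 500k-bigram corpus.
  print(log_likelihood_ratio(c12=150, c1=400, c2=300, n=500_000))

Higher scores indicate that the two words co-occur far more often than chance would predict, which is what makes the score useful as an MWE-dedicated feature.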
Multiword expressions
(2011) confirmed these bad results on the French Treebank.
Multiword expressions
They show a general tagging accuracy of 94% on the French Treebank.
Multiword expressions
To do so, the MWEs in the training treebank were annotated with specific nonterminal nodes.
Resources
The French Treebank is composed of 435,860 lexical units (34,178 types).
Resources
In order to compare compounds in these lexical resources with the ones in the French Treebank, we applied the dictionaries and the lexicon extracted from the training corpus to the development corpus.
Resources
The authors provided us with a list of 17,315 candidate nominal collocations occurring in the French treebank with their log-likelihood and their internal flat structure.
Two strategies, two discriminative models
The parameter vector is estimated during the training stage from a reference treebank and the baseline parser outputs.
treebank is mentioned in 11 sentences in this paper.
Topics mentioned in this paper:
Li, Zhenghua and Liu, Ting and Che, Wanxiang
Abstract
We present a simple and effective framework for exploiting multiple monolingual treebanks with different annotation guidelines for parsing.
Abstract
Several types of transformation patterns (TP) are designed to capture the systematic annotation inconsistencies among different treebanks .
Abstract
Our approach can significantly advance the state-of-the-art parsing accuracy on two widely used target treebanks (Penn Chinese Treebank 5.1 and 6.0) using the Chinese Dependency Treebank as the source treebank.
Introduction
However, the heavy cost of treebanking typically limits one single treebank in both scale and genre.
Introduction
At present, learning from one single treebank seems inadequate for further boosting parsing accuracy.
Introduction
Treebanks   # of Words     Grammar
CTB5        0.51 million   Phrase structure
CTB6        0.78 million   Phrase structure
treebank is mentioned in 62 sentences in this paper.
Topics mentioned in this paper:
Pauls, Adam and Klein, Dan
Experiments
In Table 1, we show the first four samples of length between 15 and 20 generated from our model and a 5-gram model trained on the Penn Treebank.
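For orientation, a plain 5-gram baseline of the kind mentioned above can be estimated by counting fixed-length token windows over the training sentences. The sketch below shows unsmoothed maximum-likelihood estimation only; the names and toy corpus are illustrative, and a real model of this scale would add smoothing and backoff:

  from collections import Counter

  def count_ngrams(sentences, order=5):
      """Collect n-gram and context counts of the given order from tokenized
      sentences, padding with boundary symbols."""
      grams, contexts = Counter(), Counter()
      for tokens in sentences:
          padded = ["<s>"] * (order - 1) + tokens + ["</s>"]
          for i in range(len(padded) - order + 1):
              gram = tuple(padded[i:i + order])
              grams[gram] += 1
              contexts[gram[:-1]] += 1
      return grams, contexts

  def mle_prob(gram, grams, contexts):
      """Unsmoothed maximum-likelihood probability of the n-gram's last word
      given its (order-1)-word history."""
      history = gram[:-1]
      return grams[gram] / contexts[history] if contexts[history] else 0.0

  # Toy corpus; the paper's model is trained on treebank and Gigaword text.
  corpus = [["the", "parser", "reads", "the", "treebank", "."]]
  grams, contexts = count_ngrams(corpus, order=5)
  print(mle_prob(("<s>", "<s>", "<s>", "<s>", "the"), grams, contexts))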
Experiments
For training data, we constructed a large treebank by concatenating the WSJ and Brown portions of the Penn Treebank, the 50K BLLIP training sentences from Post (2011), and the AFP and APW portions of English Gigaword version 3 (Graff, 2003), totaling about 1.3 billion tokens.
Experiments
We used the human-annotated parses for the sentences in the Penn Treebank , but parsed the Gigaword and BLLIP sentences with the Berkeley Parser.
Tree Transformations
Figure 2: A sample parse from the Penn Treebank after the tree transformations described in Section 3.
Tree Transformations
number of transformations of Treebank constituency parses that allow us to capture such dependencies.
Tree Transformations
Although the Penn Treebank annotates temporal NPs, most off-the-shelf parsers do not retain these tags, and we do not assume their presence.
Treelet Language Modeling
There is one additional hurdle in the estimation of our model: while there exist corpora with human-annotated constituency parses like the Penn Treebank (Marcus et al., 1993), these corpora are quite small (on the order of millions of tokens) and we cannot gather nearly as many counts as we can for n-grams, for which billions or even trillions (Brants et al., 2007) of tokens are available on the Web.
treebank is mentioned in 11 sentences in this paper.
Topics mentioned in this paper:
Gardent, Claire and Narayan, Shashi
Conclusion
Using the Penn Treebank sentences associated with each SR Task dependency tree, we will create the two tree sets necessary to support error mining by dividing the set of trees output by the surface realiser into a set of trees (FAIL) associated with overgeneration (the generated sentences do not match the original sentences) and a set of trees (SUCCESS) associated with success (the generated sentence matches the original sentences).
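The FAIL/SUCCESS split described above amounts to partitioning the realiser's output trees by whether the generated sentence matches the original. A minimal sketch, assuming exact string match after simple whitespace and case normalisation (the normalisation step itself is an assumption, not something this excerpt specifies):

  def split_for_error_mining(items):
      """Partition (tree, generated_sentence, reference_sentence) triples into
      SUCCESS and FAIL tree sets by whether the generated string matches the
      reference after simple normalisation."""
      def norm(s):
          return " ".join(s.lower().split())
      success, fail = [], []
      for tree, generated, reference in items:
          (success if norm(generated) == norm(reference) else fail).append(tree)
      return success, fail

  # Placeholder trees and strings, for illustration only.
  items = [
      ("tree_1", "the cat sleeps", "The cat sleeps"),
      ("tree_2", "cat the sleeps", "The cat sleeps"),
  ]
  success, fail = split_for_error_mining(items)
  print(len(success), len(fail))  # 1 1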
Experiment and Results
The shallow input data provided by the SR Task was obtained from the Penn Treebank using the LTH Constituent-to-Dependency Conversion Tool for Penn-style Treebanks (Pennconverter; Johansson and Nugues, 2007).
Experiment and Results
The chunking was performed by retrieving from the Penn Treebank (PTB), for each phrase type, the yields of the constituents of that type and by using the alignment between words and dependency tree nodes provided by the organisers of the SR Task.
Experiment and Results
In the Penn Treebank, the POS tag is the category assigned to possessive ’s.
Related Work
(Callaway, 2003) avoids this shortcoming by converting the Penn Treebank to the format expected by his realiser.
treebank is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Shindo, Hiroyuki and Miyao, Yusuke and Fujino, Akinori and Nagata, Masaaki
Abstract
Our SR-TSG parser achieves an F1 score of 92.4% in the Wall Street Journal (WSJ) English Penn Treebank parsing task, which is a 7.7 point improvement over a conventional Bayesian TSG parser, and better than state-of-the-art discriminative reranking parsers.
Experiment
We ran experiments on the Wall Street Journal (WSJ) portion of the English Penn Treebank data set (Marcus et al., 1993), using a standard data split (sections 2-21 for training, 22 for development and 23 for testing).
Experiment
The treebank data is right-binarized (Matsuzaki et al., 2005) to construct grammars with only unary and binary productions.
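Right-binarization as cited above replaces flat n-ary productions with nested binary ones so that only unary and binary rules remain. A small sketch using NLTK's Tree.chomsky_normal_form, which supports right factoring; the Markovization setting and the toy tree are illustrative, not this paper's exact configuration:

  from nltk.tree import Tree

  # A toy ternary production; real treebank trees contain many such flat rules.
  t = Tree.fromstring("(NP (DT the) (JJ old) (NN parser))")

  # Right-binarize in place so every production is unary or binary, which is
  # what grammar extraction with only unary and binary rules requires.
  t.chomsky_normal_form(factor="right", horzMarkov=2)
  print(t)  # every internal node now has at most two children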
Experiment
This result suggests that the conventional TSG model trained from the vanilla treebank is insufficient to resolve
Introduction
Probabilistic context-free grammar (PCFG) underlies many statistical parsers; however, it is well known that the PCFG rules extracted from treebank data via maximum likelihood estimation do not perform well due to unrealistic context freedom assumptions (Klein and Manning, 2003).
Introduction
Symbol refinement is a successful approach for weakening context freedom assumptions by dividing coarse treebank symbols (e.g.
Introduction
Our SR-TSG parser achieves an F1 score of 92.4% in the WSJ English Penn Treebank parsing task, which is a 7.7 point improvement over a conventional Bayesian TSG parser, and superior to state-of-the-art discriminative reranking parsers.
treebank is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Zhao, Qiuye and Marcus, Mitch
Abstract
On the other hand, consider the annotation guideline of English Treebank (Marcus et al., 1993) instead.
Abstract
Following this POS representation, there are as many as 10 possible POS tags that may occur in between the-of, as estimated from the WSJ corpus of Penn Treebank.
Abstract
To explore determinacy in the distribution of POS tags in Penn Treebank , we need to consider that a POS tag marks the basic syntactic category of a word as well as its morphological inflection.
treebank is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Green, Spence and DeNero, John
A Class-based Model of Agreement
More than 25 treebanks (in 22 languages) can be automatically mapped to this tag set, which includes “Noun” (nominals), “Verb” (verbs), “Adj” (adjectives), and “ADP” (pre- and postpositions).
A Class-based Model of Agreement
Many of these treebanks also contain per-token morphological annotations.
A Class-based Model of Agreement
We trained a simple add-1 smoothed bigram language model over gold class sequences in the same treebank training data:
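An add-1 (Laplace) smoothed bigram model of this kind is straightforward to state: P(c_i | c_{i-1}) = (count(c_{i-1}, c_i) + 1) / (count(c_{i-1}) + V), with V the number of distinct classes. A minimal sketch over gold class sequences (the boundary handling and toy sequences are assumptions, not the paper's exact setup):

  from collections import Counter

  def train_add1_bigram(sequences):
      """Add-1 (Laplace) smoothed bigram model over class sequences."""
      unigrams, bigrams = Counter(), Counter()
      vocab = {"<s>", "</s>"}
      for seq in sequences:
          tags = ["<s>"] + list(seq) + ["</s>"]
          vocab.update(seq)
          for prev, cur in zip(tags, tags[1:]):
              unigrams[prev] += 1
              bigrams[(prev, cur)] += 1
      v = len(vocab)

      def prob(prev, cur):
          return (bigrams[(prev, cur)] + 1) / (unigrams[prev] + v)

      return prob

  # Toy gold class sequences (e.g. coarse word classes per target sentence).
  prob = train_add1_bigram([["Noun", "Verb", "Noun"], ["Adj", "Noun", "Verb"]])
  print(prob("Noun", "Verb"))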
Conclusion and Outlook
The model can be implemented with a standard CRF package, trained on existing treebanks for many languages, and integrated easily with many MT feature APIs.
Experiments
Experimental Setup All experiments use the Penn Arabic Treebank (ATB) (Maamouri et al., 2004) parts 1-3 divided into training/dev/test sections according to the canonical split (Rambow et al., 2005).
treebank is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Sun, Weiwei and Wan, Xiaojun
About Heterogeneous Annotations
This paper focuses on two representative popular corpora for Chinese lexical processing: (1) the Penn Chinese Treebank (CTB) and (2) the PKU’s People’s Daily data (PPD).
Abstract
Penn Chinese Treebank (CTB) and PKU’s People’s Daily (PPD), on manually mapped data, and show that their linguistic annotations are systematically different and highly compatible.
Data-driven Annotation Conversion
A well known work is transforming Penn Treebank into resources for various deep linguistic processing, including LTAG (Xia, 1999), CCG (Hockenmaier and Steedman, 2007), HPSG (Miyao et al., 2004) and LFG (Cahill et al., 2002).
Introduction
For example, the Penn Treebank is popular to train PCFG-based parsers, while the Redwoods Treebank is well known for HPSG research; the Propbank is favored to build general semantic role labeling systems, while the FrameNet is attractive for predicate-specific labeling.
Introduction
Penn Chinese Treebank (CTB) and PKU’s People’s Daily (PPD).
treebank is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Zhou, Yuping and Xue, Nianwen
Abstract
Our scheme, inspired by the Penn Discourse TreeBank (PDTB), adopts the lexically grounded approach; at the same time, it makes adaptations based on the linguistic and statistical characteristics of Chinese text.
Adapted scheme for Chinese
According to a rough count on 20 randomly selected files from Chinese Treebank (Xue et al., 2005), 82% are tokens of implicit relation, compared to 54.5% in the PDTB 2.0.
Annotation experiment
The data set consists of 98 files taken from the Chinese Treebank (Xue et al., 2005).
Introduction
In the realm of discourse annotation, the Penn Discourse TreeBank (PDTB) (Prasad et al., 2008) separates itself by adopting a lexically grounded approach: Discourse relations are lexically anchored by discourse connectives (e.g., because, but, therefore), which are viewed as predicates that take abstract objects such as propositions, events and states as their arguments.
treebank is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Chen, Xiao and Kit, Chunyu
Abstract
Experiments on English and Chinese treebanks confirm its advantage over its first-order version.
Experiment
Our parsing models are evaluated on both English and Chinese treebanks, i.e., the WSJ section of Penn Treebank 3.0 (LDC99T42) and the Chinese Treebank 5.1 (LDC2005T01U01).
Experiment
For parser combination, we follow the setting of Fossum and Knight (2009), using Section 24 instead of Section 22 of WSJ treebank as development set.
Introduction
Evaluated on the PTB WSJ and Chinese Treebank, it achieves its best F1 scores of 91.86% and 85.58%, respectively.
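The F1 figures quoted above are labeled bracketing scores in the PARSEVAL tradition: precision and recall over (label, start, end) constituent spans, combined as F1 = 2PR/(P+R). A simplified sketch, not the evalb tool itself and ignoring its usual exclusions such as punctuation:

  from collections import Counter

  def bracketing_f1(gold_spans, predicted_spans):
      """Labeled bracketing F1 over (label, start, end) constituent spans,
      matching duplicate spans with multiplicity."""
      gold, pred = Counter(gold_spans), Counter(predicted_spans)
      matched = sum((gold & pred).values())
      precision = matched / sum(pred.values()) if pred else 0.0
      recall = matched / sum(gold.values()) if gold else 0.0
      if precision + recall == 0.0:
          return 0.0
      return 2 * precision * recall / (precision + recall)

  gold = [("NP", 0, 2), ("VP", 2, 5), ("S", 0, 5)]
  pred = [("NP", 0, 2), ("VP", 3, 5), ("S", 0, 5)]
  print(round(bracketing_f1(gold, pred), 3))  # 0.667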
treebank is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Sun, Weiwei and Uszkoreit, Hans
Abstract
Experiments on the Penn Chinese Treebank demonstrate the importance of both paradigmatic and syntagmatic relations.
Introduction
We conduct experiments on the Penn Chinese Treebank and Chinese Gigaword.
State-of-the-Art
Their evaluations on the Chinese Treebank show that Chinese POS tagging obtains an accuracy of about 93-94%.
State-of-the-Art
Penn Chinese Treebank (CTB) (Xue et al., 2005) is a popular data set to evaluate a number of Chinese NLP tasks, including word segmentation (Sun and
treebank is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Feng, Vanessa Wei and Hirst, Graeme
Discourse-annotated corpora
2.1 The RST Discourse Treebank
Discourse-annotated corpora
The RST Discourse Treebank (RST-DT) (Carlson et al., 2001), is a corpus annotated in the framework of RST.
Discourse-annotated corpora
2.2 The Penn Discourse Treebank
treebank is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Lippincott, Thomas and Korhonen, Anna and Ó Séaghdha, Diarmuid
Introduction
However, the treebanks necessary for training a high-accuracy parsing model are expensive to build for new domains.
Methodology
An unlexicalized parser cannot distinguish these based just on POS tags, while a lexicalized parser requires a large treebank.
Previous work
These typically rely on language-specific knowledge, either directly through heuristics, or indirectly through parsing models trained on treebanks .
treebank is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Kummerfeld, Jonathan K. and Klein, Dan and Curran, James R.
Evaluation
Using sections 00-21 of the treebanks, we handcrafted instructions for 527 lexical categories, a process that took under 100 hours, and includes all the categories used by the C&C parser.
Evaluation
Figure 3: For each sentence in the treebank, we plot the converted parser output against gold conversion (left), and the original parser evaluation against gold conversion (right).
Introduction
Converting the Penn Treebank (PTB, Marcus et al., 1993) to other formalisms, such as HPSG (Miyao et al., 2004), LFG (Cahill et al., 2008), LTAG (Xia, 1999), and CCG (Hockenmaier, 2003), is a complex process that renders linguistic phenomena in formalism-specific ways.
treebank is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Hatori, Jun and Matsuzaki, Takuya and Miyao, Yusuke and Tsujii, Jun'ichi
Abstract
In experiments using the Chinese Treebank (CTB), we show that the accuracies of the three tasks can be improved significantly over the baseline models, particularly by 0.6% for POS tagging and 2.4% for dependency parsing.
Introduction
We perform experiments using the Chinese Treebank (CTB) corpora, demonstrating that the accuracies of the three tasks can be improved significantly over the pipeline combination of the state-of-the-art joint segmentation and POS tagging model, and the dependency parser.
Model
We use the Chinese Penn Treebank ver.
treebank is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Yamangil, Elif and Shieber, Stuart
Abstract
We use the Penn treebank for our experiments and find that our proposed Bayesian TIG model not only has competitive parsing performance but also finds compact yet linguistically rich TIG representations of the data.
Evaluation Results
We use the standard Penn treebank methodology of training on sections 2-21 and testing on section 23.
Evaluation Results
carried out a small treebank experiment where we train on Section 2, and a large one where we train on the full training set.
treebank is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Chen, Wenliang and Zhang, Min and Li, Haizhou
Experiments
For English, we used the Penn Treebank (Marcus et al., 1993) in our experiments.
Experiments
For Chinese, we used the Chinese Treebank (CTB) version 4.04 in the experiments.
Experiments
We ensured that the text used for extracting subtrees did not include the sentences of the Penn Treebank.
treebank is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: