Abstract | We perform experiments on three data sets: versions 1.0 and 2.0 of the Google Universal Dependency Treebanks, and the treebanks from the CoNLL shared tasks, across ten languages.
Data and Tools | Our experiments rely on two kinds of data sets: (i) monolingual treebanks with a consistent annotation scheme, where the English treebank is used to train the English parsing model and the treebanks for the target languages are used to evaluate the parsing performance of our approach.
Data and Tools | The monolingual treebanks in our experiments are from the Google Universal Dependency Treebanks (McDonald et al., 2013), because the treebanks of the different languages in this collection have consistent syntactic representations.
Data and Tools | The treebanks from CoNLL shared-tasks on dependency parsing (Buchholz and Marsi, 2006; Nivre et al., 2007) appear to be another reasonable choice. |
Introduction | Several supervised dependency parsing algorithms (Nivre and Scholz, 2004; McDonald et al., 2005a; McDonald et al., 2005b; McDonald and Pereira, 2006; Carreras, 2007; Koo and Collins, 2010; Ma and Zhao, 2012; Zhang et al., 2013) have been proposed and achieved high parsing accuracies on several treebanks, due in large part to the availability of dependency treebanks in a number of languages (McDonald et al., 2013).
Introduction | However, the manually annotated treebanks that these parsers rely on are highly expensive to create, in particular when we want to build treebanks for resource-poor languages. |
Introduction | However, most bilingual text parsing approaches require bilingual treebanks: treebanks that have manually annotated tree structures on both the source and target sides (Smith and Smith, 2004; Burkett and Klein, 2008), or that have tree structures on the source side and translated sentences on the target side (Huang et
Our Approach | Table 1: Data statistics of two versions of Google Universal Treebanks for the target languages. |
Abstract | We use a sentiment treebank to show that these existing heuristics are poor estimators of sentiment. |
Experiment setup | Data As described earlier, the Stanford Sentiment Treebank (Socher et al., 2013) has manually annotated, real-valued sentiment values for all phrases in parse trees. |
Experiment setup | We search these negators in the Stanford Sentiment Treebank and normalize the same negators to a single form; e.g., “is n’t”, “isn’t”, and “is not” are all normalized to “is_not”. |
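The normalization step described above can be sketched as follows; the variant lists and function names are illustrative assumptions, not the paper's actual negator inventory:

```python
# Hypothetical sketch of negator normalization: surface variants of the same
# negator are mapped to a single canonical form before counting occurrences.
NEGATOR_FORMS = {
    "is_not":  {"is n't", "isn't", "is not"},
    "do_not":  {"do n't", "don't", "do not"},
    "can_not": {"ca n't", "can't", "cannot", "can not"},
}

# Invert to a lookup table: variant -> canonical form.
CANONICAL = {
    variant: canon
    for canon, variants in NEGATOR_FORMS.items()
    for variant in variants
}

def normalize_negator(span: str) -> str:
    """Return the canonical form of a negator, or the span unchanged."""
    return CANONICAL.get(span.lower(), span)
```

With such a table, "is n't", "isn't", and "is not" all map to the single form "is_not" before occurrences are counted.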
Experiment setup | Each occurrence of a negator and the phrase it is directly composed with in the treebank, i.e., (7,071,217), is considered a data point in our study.
Experimental results | Table 1: Mean absolute errors (MAE) of fitting different models to the Stanford Sentiment Treebank.
Experimental results | The figure includes the five most frequently used negators found in the sentiment treebank.
Experimental results | Below, we take a closer look at the fitting errors made at different depths of the sentiment treebank.
Introduction | Figure 1: Effect of a list of common negators in modifying sentiment values in the Stanford Sentiment Treebank.
Introduction | Each dot in the figure corresponds to a text span being modified by (composed with) a negator in the treebank.
Introduction | The recently available Stanford Sentiment Treebank (Socher et al., 2013) renders manually annotated, real-valued sentiment scores for all phrases in parse trees. |
Abstract | We propose treating the induced word boundaries as soft constraints to bias the continued learning of a supervised CRF model, trained on the labeled treebank data, over the unlabeled bilingual data.
Experiments | The monolingual segmented data, trainTB, is extracted from the Penn Chinese Treebank (CTB-7) (Xue et al., 2005), containing 51,447 sentences. |
Experiments | Supervised Monolingual Segmenter (SMS): this model is trained with CRFs on the treebank training data (trainTB).
Introduction | The standard practice in state-of-the-art MT systems is that Chinese sentences are tokenized by a monolingual supervised word segmentation model trained on hand-annotated treebank data, e.g., the Chinese Treebank
Introduction | But one outstanding problem is that these models may leave out some segmentation features crucial for SMT, since the output words conform to the treebank segmentation standard, which is designed from monolingual linguistic intuition rather than for the SMT task.
Introduction | Crucially, the GP expression with the bilingual knowledge is then used as side information to regularize the learning of a conditional random field (CRF) model over treebank and bitext data, based on the posterior regularization (PR) framework (Ganchev et al., 2010).
Methodology | The input comprises two types of training resources: segmented Chinese sentences from the treebank, and parallel unsegmented sentences of Chinese and a foreign language.
Methodology | Algorithm 1: CWS model induction with bilingual constraints. Require: segmented Chinese sentences from the treebank; parallel sentences of Chinese and a foreign language.
Methodology | As in conventional GP examples (Das and Smith, 2012), a similarity graph G = (V, E) is constructed over N types extracted from the Chinese training data, including the treebank and the bitexts.
Abstract | In this paper, we report our preliminary efforts in building an English-Turkish parallel treebank corpus for statistical machine translation. |
Abstract | In the corpus, we manually generated parallel trees for about 5,000 sentences from the Penn Treebank.
Introduction | In recent years, many efforts have been made to annotate parallel corpora with syntactic structure to build parallel treebanks.
Introduction | A parallel treebank is a parallel corpus in which the sentences in each language are syntactically (and, if necessary, morphologically) annotated, and the sentences and words are aligned.
Introduction | In parallel treebanks, the syntactic annotation usually follows constituent and/or dependency structure.
Introduction | However, most evaluations of syntactic treebanks use simple accuracy measures such as bracket F1 scores for constituent trees (NEGRA, Brants, 2000; TIGER, Brants and Hansen, 2002; Cat3LB, Civit et al., 2003; the Arabic Treebank, Maamouri et al., 2008) or labelled or unlabelled attachment scores for dependency syntax (PDT, Hajič, 2004; PCEDT, Mikulová and Štěpánek, 2010; Norwegian Dependency Treebank, Skjærholt, 2013).
Introduction | In grammar-driven treebanking (or parsebanking), the problems encountered are slightly different.
Introduction | In HPSG and LFG treebanking, annotators do not annotate structure directly.
Real-world corpora | Three of the data sets are dependency treebanks (NDT, CDT, PCEDT) and one is a phrase-structure treebank (SSD); of the dependency treebanks, the PCEDT contains semantic dependencies, while the other two have traditional syntactic dependencies.
Real-world corpora | We contacted a number of treebank projects, among them the Penn Treebank and the Prague Dependency Treebank, but not all of them had data available.
Synthetic experiments | An already annotated corpus, in our case 100 randomly selected sentences from the Norwegian Dependency Treebank (Solberg et al., 2014), is taken as correct and then permuted to produce “annotations” of different quality.
Add arc <eC,ej> to GC with | We use the syntactic trees from the Penn Treebank to find the dominating nodes.
Add arc <eC,ej> to GC with | But we think that the MST algorithm has more potential in discourse dependency parsing, because our converted discourse dependency treebank contains only projective trees, which somewhat prevents the MST algorithm from exhibiting its advantage in parsing non-projective trees.
Add arc <eC,ej> to GC with | In fact, we observe that some non-projective dependencies produced by the MST algorithm are even more reasonable than their counterparts in the dependency treebank.
Discourse Dependency Structure and Tree Bank | Section 2 formally defines discourse dependency structure and introduces how to build a discourse dependency treebank from the existing RST corpus. |
Discourse Dependency Structure and Tree Bank | 2.2 Our Discourse Dependency Treebank |
Discourse Dependency Structure and Tree Bank | To automatically conduct discourse dependency parsing, constructing a discourse dependency treebank is fundamental. |
Annotations | Table 2: Results for the Penn Treebank development set, sentences of length ≤ 40, for different annotation schemes implemented on top of the X-bar grammar.
Annotations | Table 3: Final Parseval results for the v = 1, h = 0 parser on Section 23 of the Penn Treebank.
Annotations | Finally, Table 3 shows our final evaluation on Section 23 of the Penn Treebank . |
Features | Table 1 shows the results of incrementally building up our feature set on the Penn Treebank development set. |
Features | Because constituents in the treebank can be quite long, we bin our length features into 8 buckets, of |
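The binning described above can be sketched as follows; the paper states only that lengths fall into 8 buckets, so the cut points below are illustrative assumptions:

```python
def length_bucket(span_length: int, boundaries=(1, 2, 3, 5, 8, 13, 21)) -> int:
    """Map a constituent length to one of 8 buckets (0..7).

    Seven boundary values define eight buckets; the values here are
    hypothetical, since the source does not specify where the cuts lie.
    """
    for i, b in enumerate(boundaries):
        if span_length <= b:
            return i
    return len(boundaries)  # final bucket catches all longer spans
```

Bucketed lengths then replace raw lengths as feature values, keeping the feature space small for long constituents.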
Introduction | Naïve context-free grammars, such as those embodied by standard treebank annotations, do not parse well because their symbols have too little context to constrain their syntactic behavior.
Introduction | Our parser can be easily adapted to this task by replacing the X-bar grammar over treebank symbols with a grammar over the sentiment values to encode the output variables and then adding n-gram indicators to our feature set to capture the bulk of the lexical effects. |
Other Languages | Historically, many annotation schemes for parsers have required language-specific engineering: for example, lexicalized parsers require a set of head rules and manually-annotated grammars require detailed analysis of the treebank itself (Klein and Manning, 2003). |
Parsing Model | Because the X-bar grammar is so minimal, this grammar does not parse very accurately, scoring just 73 F1 on the standard English Penn Treebank task. |
Surface Feature Framework | Throughout this and the following section, we will draw on motivating examples from the English Penn Treebank, though similar examples could be equally argued for other languages.
Surface Feature Framework | There are a great number of spans in a typical treebank; extracting features for every possible combination of span and rule is prohibitive.
Abstract | We study the effect of semantic classes in three dependency parsers, using two types of constituency-to-dependency conversions of the English Penn Treebank . |
Abstract | In addition, we explore parser combinations, showing that the semantically enhanced parsers yield a small but significant gain only on the more semantically oriented LTH treebank conversion.
Experimental Framework | 3.1 Treebank conversions |
Experimental Framework | Penn2Malt performs a simple and direct conversion from the constituency-based PTB to a dependency treebank.
Experimental Framework | This is a semi-supervised approach that makes use of cluster features induced from unlabeled data, providing significant performance improvements for supervised dependency parsers on the Penn Treebank for English and the Prague Dependency Treebank for Czech.
Introduction | Most experiments for English were evaluated on the Penn2Malt conversion of the constituency-based Penn Treebank . |
Introduction | Section 3 describes the treebank conversions, parsers, and semantic features.
Related work | The results showed a significant improvement, giving the first results over both WordNet and the Penn Treebank (PTB) to show that semantics helps parsing.
Related work | They demonstrated its effectiveness in dependency parsing experiments on the PTB and the Prague Dependency Treebank . |
Results | Looking at Table 2, we can say that the differences in baseline parser performance are accentuated when using the LTH treebank conversion, as ZPar clearly outperforms the other two parsers by more than 4 absolute points.
Results | We can also conclude that automatically acquired clusters are especially effective with the MST parser in both treebank conversions, which suggests that the type of semantic information has a direct relation to the parsing algorithm.
Abstract | The resulting shift-reduce discourse parser obtains substantial improvements over the previous state-of-the-art in predicting relations and nuclearity on the RST Treebank . |
Experiments | We evaluate DPLP on the RST Discourse Treebank (Carlson et al., 2001), comparing against state-of-the-art results. |
Experiments | Dataset: The RST Discourse Treebank (RST-DT) consists of 385 documents, with 347 for training
Implementation | We consider the values K ∈ {30, 60, 90, 150}, λ ∈ {1, 10, 50, 100}, and τ ∈ {1.0, 0.1, 0.01, 0.001}, and search over this space using a development set of thirty documents randomly selected from within the RST Treebank training data.
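The exhaustive search over this small hyperparameter space can be sketched as follows; the symbol names and the lower-is-better evaluation function are illustrative assumptions:

```python
import itertools

# Illustrative grid mirroring the ranges in the text (key names assumed).
grid = {
    "K": [30, 60, 90, 150],
    "lambda": [1, 10, 50, 100],
    "tau": [1.0, 0.1, 0.01, 0.001],
}

def grid_search(evaluate, grid):
    """Exhaustively score every configuration and return the best one.

    `evaluate` maps a configuration dict to a development-set error
    (lower is better); all 4 * 4 * 4 = 64 combinations are tried.
    """
    best_cfg, best_score = None, float("inf")
    for values in itertools.product(*grid.values()):
        cfg = dict(zip(grid.keys(), values))
        score = evaluate(cfg)  # e.g., error on the thirty dev documents
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```

With 64 configurations, a full sweep on a small development set is cheap enough that no smarter search strategy is needed.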
Introduction | Unfortunately, the performance of discourse parsing is still relatively weak: the state-of-the-art F-measure for text-level relation detection in the RST Treebank is only slightly above 55% (Joty
Introduction | In addition, we show that the latent representation coheres well with the characterization of discourse connectives in the Penn Discourse Treebank (Prasad et al., 2008). |
Model | (2010) show that there is a long tail of alternative lexicalizations for discourse relations in the Penn Discourse Treebank, posing obvious challenges for approaches based on directly matching lexical features observed in the training data.
Model | We apply transition-based (incremental) structured prediction to obtain a discourse parse, training a predictor to make the correct incremental moves to match the annotations of training data in the RST Treebank . |
Related Work | (2009) in the context of the Penn Discourse Treebank (Prasad et al., 2008). |
Abstract | We explore the extent to which high-resource manual annotations such as treebanks are necessary for the task of semantic role labeling (SRL). |
Experiments | To compare with prior approaches that use semantic supervision for grammar induction, we utilize Section 23 of the WSJ portion of the Penn Treebank (Marcus et al., 1993). |
Experiments | We contrast the low-resource (D) and high-resource (E) settings, where the latter uses a treebank.
Experiments | We therefore turn to an analysis of other approaches to grammar induction in Table 8, evaluated on the Penn Treebank . |
Introduction | However, richly annotated data such as that provided in parsing treebanks is expensive to produce, and may be tied to specific domains (e.g., newswire). |
Related Work | (2012) observe that syntax may be treated as latent when a treebank is not available. |
Related Work | (2011) require an oracle CCG tag dictionary extracted from a treebank . |
Related Work | There has not yet been a comparison of techniques for SRL that do not rely on a syntactic treebank , and no exploration of probabilistic models for unsupervised grammar induction within an SRL pipeline that we have been able to find. |
Abstract | In this paper, we investigate various strategies to perform both syntactic dependency parsing and contiguous multiword expression (MWE) recognition, testing them on the dependency version of the French Treebank (Abeillé and Barrier, 2004), as instantiated in the SPMRL Shared Task (Seddah et al., 2013).
Conclusion | We experimented with strategies to predict both MWE analysis and dependency structure, and tested them on the dependency version of the French Treebank (Abeillé and Barrier, 2004), as instantiated in the SPMRL Shared Task (Seddah et al., 2013).
Data: MWEs in Dependency Trees | It contains projective dependency trees that were automatically derived from the latest version of the French Treebank (Abeillé and Barrier, 2004), which consists of constituency trees for sentences from the
Data: MWEs in Dependency Trees | For instance, in the French Treebank , population active (lit. |
Introduction | The French dataset is the only one containing MWEs: the French treebank has the particularity of containing a high ratio of tokens belonging to an MWE (12.7% of non-numerical tokens).
Related work | Our representation also resembles that of light-verb constructions (LVC) in the Hungarian dependency treebank (Vincze et al., 2010): the construction has regular syntax, and a suffix is used on labels to express that it is an LVC (Vincze et al., 2013).
Use of external MWE resources | In order to compare the MWEs present in the lexicons and those encoded in the French treebank , we applied the following procedure (hereafter called lexicon |
Use of external MWE resources | We had to convert the DELA POS tagset to that of the French Treebank . |
Abstract | This paper introduces a new technique for phrase-structure parser analysis, categorizing possible treebank structures by integrating regular expressions into derivation trees. |
Abstract | We analyze the performance of the Berkeley parser on OntoNotes WSJ and the English Web Treebank . |
Analysis of parsing results | The high coverage (%) reinforces the point that there is a limited number of core structures in the treebank . |
Framework for analyzing parsing performance | We refer only to the WSJ treebank portion of OntoNotes, which is roughly a subset of the Penn Treebank (Marcus et al., 1999), with annotation revisions including the addition of NML nodes.
Framework for analyzing parsing performance | We derived the regexes via an iterative process of inspection of tree decomposition on dataset (a), together with taking advantage of the treebanking experience from some of the coauthors. |
Introduction | Second, we use a set of regular expressions (henceforth “regexes”) that categorize the possible structures in the treebank . |
Introduction | After describing in more detail the basic framework, we show some aspects of the resulting analysis of the performance of the Berkeley parser (Petrov et al., 2008) on three datasets: (a) OntoNotes WSJ sections 2-21 (Weischedel et al., 2011), (b) OntoNotes WSJ section 22, and (c) the “Answers” section of the English Web Treebank (Bies et al., 2012).
Experiments | Our sentiment analysis datasets consist of movie reviews from the Stanford sentiment treebank (Socher et al., 2013), and floor speeches by U.S.
Experiments | Congressmen alongside “yea”/“nay” votes on the bill under discussion (Thomas et al., 2006). For the Stanford sentiment treebank, we only predict binary classifications (positive or negative) and exclude neutral reviews.
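The filtering step for the binary task can be sketched as follows; the neutral band used here is a common convention for this treebank but is an assumption in this sketch, as are all names:

```python
def binarize_labels(examples, neutral_range=(0.4, 0.6)):
    """Keep only positive/negative reviews, dropping neutral ones.

    Sentiment scores are assumed to lie in [0, 1]; anything inside the
    (0.4, 0.6] band is treated as neutral and excluded, mirroring the
    binary positive/negative setup described in the text.
    """
    kept = []
    for text, score in examples:
        if neutral_range[0] < score <= neutral_range[1]:
            continue  # neutral review: excluded from the binary task
        kept.append((text, 1 if score > neutral_range[1] else 0))
    return kept
```

The remaining examples carry a hard 0/1 label, which is what a binary classifier is trained and evaluated on.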
Structured Regularizers for Text | Figure 1: An example of a parse tree from the Stanford sentiment treebank, which annotates sentiment at the level of every constituent (indicated here by + and ++; no marking indicates neutral sentiment).
Structured Regularizers for Text | The Stanford sentiment treebank has an annotation of sentiments at the constituent level. |
Structured Regularizers for Text | Figure 1 illustrates the group structures derived from an example sentence from the Stanford sentiment treebank (Socher et al., 2013). |
Analyzing System Performance | We replicated the Treebank for the 100,000-sentence pass.
Anatomy of a Dense GPU Parser | Table 1: Performance numbers for computing Viterbi inside charts on 20,000 sentences of length ≤ 40 from the Penn Treebank.
Introduction | As with other grammars with a parse/derivation distinction, the grammars of Petrov and Klein (2007) only achieve their full accuracy using minimum-Bayes-risk parsing, with improvements of over 1.5 F1 over best-derivation Viterbi parsing on the Penn Treebank (Marcus et al., 1993). |
Minimum Bayes risk parsing | Table 2: Performance numbers for computing max constituent (Goodman, 1996) trees on 20,000 sentences of length 40 or less from the Penn Treebank . |
Minimum Bayes risk parsing | Therefore, in the fine pass, we normalize the inside scores at the leaves to sum to 1.0. Using this slight modification, no sentences from the Treebank under- or overflow.
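A toy sketch of this renormalization, with plain Python lists standing in for per-leaf inside scores (all names are illustrative):

```python
def normalize_leaf_scores(leaf_scores):
    """Rescale each leaf's inside scores so they sum to 1.0.

    This mirrors the trick described in the text: renormalizing at the
    leaves keeps downstream products of inside scores within
    floating-point range, and the constant factor introduced per leaf
    does not change which tree maximizes the decoding objective.
    """
    normalized = []
    for scores in leaf_scores:
        total = sum(scores)
        normalized.append([s / total for s in scores])
    return normalized
```

Even when the raw scores are tiny (say, around 1e-300), the normalized values are well-scaled probabilities, so later chart computations neither underflow nor overflow.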
Minimum Bayes risk parsing | We measured parsing accuracy on sentences of length ≤ 40 from section 22 of the Penn Treebank.
GB-grounded GR Extraction | structure treebank, namely CTB.
GB-grounded GR Extraction | Our treebank conversion algorithm borrows key insights from Lexical Functional Grammar (LFG; Bresnan and Kaplan, 1982; Dalrymple, 2001). |
GB-grounded GR Extraction | There are two sources of errors in treebank conversion: (1) inadequate conversion rules and (2) wrong or inconsistent original annotations. |
Introduction | To acquire a high-quality GR corpus, we propose a linguistically-motivated algorithm to translate a Government and Binding (GB; Chomsky, 1981; Carnie, 2007) grounded phrase structure treebank, i.e.
Introduction | Chinese Treebank (CTB; Xue et al., 2005) to a deep dependency bank where GRs are explicitly represented. |
Transition-based GR Parsing | The availability of large-scale treebanks has contributed to the blossoming of statistical approaches to build accurate shallow constituency and dependency parsers. |
Experimental setup | Experiments are conducted on the Wall Street Journal portion of the English Penn Treebank . |
Experimental setup | We prepare three training sets: the complete training set of 39,832 sentences from the treebank (sections 2 through 21), a smaller training set, consisting of the first 3000 sentences, and an even smaller set of the first 300. |
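The three nested training sets described above can be carved out with a one-line sketch (names are illustrative, assuming sentences are held in a list in treebank order):

```python
def training_subsets(sentences):
    """Return the three nested training sets used in the experiments:
    the full set, the first 3,000 sentences, and the first 300."""
    return sentences, sentences[:3000], sentences[:300]
```

Because the subsets are prefixes of the same list, the smaller sets are strictly contained in the larger ones, which makes learning curves across the three sizes directly comparable.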
Results | test on the French treebank (the “French” column). |
Three possible benefits of word embeddings | Example: the infrequently-occurring treebank tag UH dominates greetings (among other interjections). |
Three possible benefits of word embeddings | Example: individual first names are also rare in the treebank, but tend to cluster together in distributional representations.
Abstract | Using parse accuracy in a simple reranking strategy for self-monitoring, we find that with a state-of-the-art averaged perceptron realization ranking model, BLEU scores cannot be improved with any of the well-known Treebank parsers we tested, since these parsers too often make errors that human readers would be unlikely to make. |
Analysis and Discussion | A limitation of the experiments reported in this paper is that OpenCCG’s input semantic dependency graphs are not the same as the Stanford dependencies used with the Treebank parsers, and thus we have had to rely on the gold parses in the PTB to derive gold dependencies for measuring accuracy of parser dependency recovery. |
Introduction | With this simple reranking strategy and each of three different Treebank parsers, we find that it is possible to improve BLEU scores on Penn Treebank development data with White & Rajkumar’s (2011; 2012) baseline generative model, but not with their averaged perceptron model. |
Simple Reranking | We ran two OpenCCG surface realization models on the CCGbank dev set (derived from Section 00 of the Penn Treebank) and obtained n-best (n = 10) realizations.
Introduction | In a syntactic tree, each node indicates a clause/phrase/word and is only labeled with a Treebank tag (Marcus et al., 1993). |
Introduction | The Treebank tag, unfortunately, is usually too coarse or too general to capture semantic information. |
Introduction | where Ln is its phrase label (i.e., its Treebank tag), and Fn is a feature vector indicating the characteristics of node n, which is represented as:
Abstract | This data sparsity problem is quite severe; for example, the Penn treebank (Marcus et al., 1993) has a total of 43,498 sentences with 42,246 unique POS tag sequences, averaging 1.04 sentences per sequence.
Abstract | For English we use the Penn treebank (Marcus et al., 1993), with sections 2-21 for training and section 23 for final testing.
Abstract | For German and Chinese we use the Negra treebank and the Chinese treebank, respectively; the first 80% of the sentences are used for training and the last 20% for testing.
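The 80/20 split described above is a simple prefix split over the sentence list; a minimal sketch (names are illustrative):

```python
def split_80_20(sentences):
    """Split a treebank into the first 80% of sentences for training
    and the last 20% for testing, mirroring the German/Chinese setup."""
    cut = int(len(sentences) * 0.8)
    return sentences[:cut], sentences[cut:]
```

Note that this is a deterministic positional split rather than a random one, so results are exactly reproducible from the same treebank ordering.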
Experiment | We conducted experiments on the Penn Chinese Treebank (CTB) version 5.1 (Xue et al., 2005): Articles 001-270 and 400-1151 were used as the training set, Articles 301-325 were used as the development set, and Articles 271-300 were used as the test set.
Experiment | To check whether more labeled data can further improve our parsing system, we evaluated our Nonlocal&Cluster system on the Chinese TreeBank version 6.0 (CTB6), which is a superset of CTB5 and contains more annotated data.
Transition-based Constituent Parsing | However, parse trees in Treebanks often contain an arbitrary number of branches. |
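A standard way to handle arbitrary branching in transition-based constituent parsing is to binarize the trees first; a minimal sketch with trees as `(label, children)` tuples (the starred-label convention for intermediate nodes is a common assumption here, not necessarily this parser's):

```python
def binarize(label, children):
    """Right-binarize an n-ary constituent into a cascade of binary nodes.

    Intermediate nodes get a starred label so the original n-ary tree
    can be recovered by collapsing starred nodes after parsing.
    """
    if len(children) <= 2:
        return (label, children)
    head, rest = children[0], children[1:]
    return (label, [head, binarize(label + "*", rest)])
```

After binarization every node has at most two children, so a fixed inventory of shift/reduce actions suffices regardless of the original branching factor.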
Experiments | 5.4 Penn Treebank data set |
Experiments | The Penn Treebank 2 data set is available through an LDC license at http://www.…edu/~treebank/ and contains 251,854 sentences with a total of 6,080,493 tokens and 45 different parts of speech.
Error Analysis | We classify 7 as typos and 26 as annotation inconsistencies, although the distinction between the two is murky: typos are intentionally preserved in the treebank data, but segmentation of typos varies depending on how well they can be reconciled with standard Arabic orthography. |
Error Analysis | The first example is segmented in the Egyptian treebank but is left unsegmented by our system; the second is left as a single token in the treebank but is split into the above three segments by our system. |
Experiments | We train and evaluate on three corpora: parts 1-3 of the newswire Arabic Treebank (ATB), the Broadcast News Arabic Treebank (BN), and parts 1-8 of the BOLT Phase 1 Egyptian Arabic Treebank (ARZ). These correspond respectively to the domains in Section 2.2.
Abstract | Experimental results on the Chinese Treebank demonstrate improved performances over word-based parsing methods. |
Character-Level Dependency Tree | We use the Chinese Penn Treebank 5.0, 6.0 and 7.0 to conduct the experiments, splitting the corpora into training, development and test sets according to previous work.
Introduction | Their results on the Chinese Treebank (CTB) showed that character-level constituent parsing can bring increased performances even with the pseudo word structures. |
Experiments | Note that, because these domains are considerably different from the RST Treebank , the parser fails to produce a tree on a large number of answer candidates: 6.2% for YA, and 41.1% for Bio. |
Related Work | RST Treebank |
Related Work | performance on a small sample of seven WSJ articles drawn from the RST Treebank (Carlson et al., 2003). |