Dependency-based Pre-ordering Rule Set | Figure 1 shows a constituent parse tree and its Stanford typed dependency parse tree for the same sentence.
Dependency-based Pre-ordering Rule Set | As shown in the figure, the number of nodes in the dependency parse tree (i.e.,
Dependency-based Pre-ordering Rule Set | 9) is far smaller than that in its corresponding constituent parse tree.
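The node-count difference noted above is easy to see on a toy pair of trees. The sketch below uses a hypothetical three-word sentence (not the figure's actual example): a constituent tree adds internal phrase-label nodes, while a dependency tree has exactly one node per word.

```python
def count_nodes(tree):
    """Count all nodes in a tree given as (label, [children])."""
    label, children = tree
    return 1 + sum(count_nodes(c) for c in children)

# Constituent parse of "dogs chase cats": S -> NP VP, VP -> V NP
constituent = ("S", [("NP", [("dogs", [])]),
                     ("VP", [("V", [("chase", [])]),
                             ("NP", [("cats", [])])])])

# Dependency parse of the same sentence: each head lists its dependents
dependency = ("chase", [("dogs", []), ("cats", [])])

print(count_nodes(constituent))  # 8: words plus phrase/POS labels
print(count_nodes(dependency))   # 3: one node per word
```

The gap widens with sentence length, which is one reason dependency-based pre-ordering rules can be more compact.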
Introduction | These pre-ordering approaches first parse the source language sentences to create parse trees.
Introduction | Then, syntactic reordering rules are applied to these parse trees with the goal of reordering the source language sentences into the word order of the target language. |
Introduction | (a) A constituent parse tree
Composite language model | Figure 1: A composite n-gram/m-SLM/PLSA language model where the hidden information is the parse tree T and semantic content g.
Training algorithm | the l-th sentence W_l with its parse tree structure T_l
Training algorithm | of tag t predicted by word w and the tags of the m most recent exposed headwords in parse tree T_l of the l-th sentence W_l in document d, and finally #(a, h_{1..m}, W_l, T_l, d) is the count of constructor move a conditioned on the m exposed headwords h_{1..m} in parse tree T_l of the l-th sentence W_l in document d.
Training algorithm | Because, for a given sentence, the parse tree and semantic content are hidden and the number of parse trees grows faster than exponentially with sentence length, Wang et al.
Abstract | Instead of only using 1-best parse trees as in previous work, our core idea is to utilize parse forests (ambiguous labelings) to combine multiple 1-best parse trees generated by diverse parsers on unlabeled data.
Abstract | 1) ambiguity encoded in parse forests mitigates the noise in 1-best parse trees.
Abstract | During training, the parser is aware of these ambiguous structures, and has the flexibility to distribute probability mass to its preferred parse trees as long as the likelihood improves. |
Introduction | Both works employ two parsers to process the unlabeled data, and select as extra training data only those sentences on which the 1-best parse trees of the two parsers are identical.
Introduction | Different from traditional self-/co-/tri-training, which uses only 1-best parse trees on unlabeled data, our approach adopts ambiguous labelings, represented by parse forests, as the gold standard for unlabeled sentences.
Introduction | The forest is formed by two parse trees, shown at the upper and lower sides of the sentence respectively.
Abstract | We propose a novel technique of learning how to transform the source parse trees to improve the translation quality of syntax-based translation models using synchronous context-free grammars.
Decoding | Given a grammar G and the input source parse tree π from a monolingual parser, we first construct the elementary tree for a source span, and then retrieve all the relevant subgraphs seen in the given grammar through the proposed operators.
Elementary Trees to String Grammar | We propose to use variations of an elementary tree, which is a connected subgraph fitted in the original monolingual parse tree.
Elementary Trees to String Grammar | where v_f is the set of frontier nodes, which contain nonterminals or words; v_i is the set of interior nodes with source labels/symbols; E is the set of edges connecting the nodes v = v_f ∪ v_i into a connected subgraph fitted in the source parse tree; and δ is the immediate common parent of the frontier nodes v_f.
Experiments | There are 16 thousand human parse trees with human alignment; an additional one thousand human-parsed and human-aligned sentence pairs are used as an unseen test set to verify our MaxEnt models and parsers.
Introduction | For instance, in Arabic-to-English translation, we find that only 45.5% of Arabic NP-SBJ structures are mapped to the English NP-SBJ with machine alignment and parse trees, and only 60.1% of NP-SBJs are mapped with human alignment and parse trees, as shown in §2.
Introduction | Mi and Huang (2008) introduced parse forests to blur the chunking decisions to a certain degree, to expand the search space and reduce parsing errors from 1-best trees (Mi et al., 2008); others tried to use the parse trees as soft constraints on top of an unlabeled grammar such as Hiero (Marton and Resnik, 2008; Chiang, 2010; Huang et al., 2010; Shen et al., 2010) without sufficiently leveraging rich tree context.
Introduction | Based on our study of the language divergence between Arabic and English using human-aligned and parsed data, we integrate several simple statistical operations to transform parse trees adaptively to serve the
The Projectable Structures | We carried out a controlled study on the projectable structures using human-annotated parse trees and word alignment for 5k Arabic-English sentence pairs.
The Projectable Structures | In Table 1, the unlabeled F-measures with machine alignment and parse trees show that, for only 48.71% of the time, the boundaries introduced by the source parses |
The Projectable Structures | Table 1: The labeled and unlabeled F-measures for projecting the source nodes onto the target side via alignments and parse trees; unlabeled F-measures show the bracketing accuracies for translating a source span contiguously.
Abstract | In this paper we propose a tree-kernel-based approach to automatically mine syntactic information from the parse trees for discourse analysis, applying a kernel function to the tree structures directly.
Incorporating Structural Syntactic Information | A parse tree that covers both discourse arguments can provide much syntactic information related to the pair.
Incorporating Structural Syntactic Information | Both the flat syntactic path connecting the connective and the arguments and the 2-level production rules in the parse tree used in previous studies can be directly described by the tree structure.
Incorporating Structural Syntactic Information | To present their syntactic properties and relations in a single tree structure, we construct a syntax tree for each paragraph by attaching the parse trees of all its sentences to an upper paragraph node.
Introduction | Nevertheless, Ben and James (2007) use only a flat syntactic path connecting the connective and the arguments in the parse tree.
Introduction | (2009) uses 2-level production rules to represent parse tree information. |
Introduction | information from the parse trees for discourse analysis, applying kernel function to the parse tree structures directly. |
Related Work | While the feature-based approach may not be able to fully utilize the syntactic information in a parse tree, tree kernel methods (Haussler, 1999), an alternative to feature-based methods, have been proposed to implicitly explore features in a high-dimensional space by employing a kernel function to calculate the similarity between two objects directly.
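The idea of counting shared structure implicitly can be made concrete with a subtree kernel in the spirit of Collins and Duffy (2002); the cited work may use a different variant, and the tree representation and decay parameter below are our own assumptions.

```python
# Trees are (label, [children]); leaves are words with no children.

def nodes(t):
    yield t
    for c in t[1]:
        yield from nodes(c)

def production(t):
    # A node's production: its label plus the labels of its children.
    return (t[0], tuple(c[0] for c in t[1]))

def c_delta(n1, n2, lam=1.0):
    """Weighted number of common subtree fragments rooted at n1 and n2."""
    if production(n1) != production(n2) or not n1[1]:
        return 0.0
    if all(not c[1] for c in n1[1]):          # pre-terminal over a word
        return lam
    score = lam
    for c1, c2 in zip(n1[1], n2[1]):
        score *= 1.0 + c_delta(c1, c2, lam)
    return score

def tree_kernel(t1, t2, lam=1.0):
    return sum(c_delta(n1, n2, lam) for n1 in nodes(t1) for n2 in nodes(t2))

t = ("S", [("A", [("a", [])]), ("B", [("b", [])])])
print(tree_kernel(t, t))  # 6.0: six shared subtree fragments
```

The kernel value is a dot product in the implicit space of all subtree fragments, which is why no explicit feature extraction is needed.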
The Recognition Framework | One advantage of SVMs is that we can use a tree kernel approach to capture syntactic parse tree information in a particular high-dimensional space.
Background and Related Work | Our SR-TSG work is built upon recent work on Bayesian TSG induction from parse trees (Post and Gildea, 2009; Cohn et al., 2010). |
Background and Related Work | A derivation is a process of forming a parse tree . |
Background and Related Work | Figure 1a shows an example parse tree and Figure 1b shows its example TSG derivation.
Inference | We use Markov chain Monte Carlo (MCMC) sampling to infer the SR-TSG derivations from parse trees.
Inference | We first infer latent symbol subcategories for every symbol in the parse trees, and then infer latent substitution sites stepwise.
Inference | After that, we unfix that assumption and infer latent substitution sites given the symbol-refined parse trees.
Symbol-Refined Tree Substitution Grammars | As with previous work on TSG induction, our task is the induction of SR-TSG derivations from a corpus of parse trees in an unsupervised fashion. |
Symbol-Refined Tree Substitution Grammars | That is, we wish to infer the symbol subcategories of every node and substitution site (i.e., nodes where substitution occurs) from parse trees.
Abstract | Our study reveals that the structural features embedded in a bilingual parse tree pair are very effective for subtree alignment and the bilingual tree kernels can well capture such features. |
Abstract | The experimental results show that our approach achieves a significant improvement on both gold-standard treebank and automatically parsed tree pairs against a heuristic similarity-based method.
Introduction | A subtree alignment process pairs up subtrees across bilingual parse trees whose contexts are semantically translationally equivalent.
Introduction | (2007), a subtree-aligned parse tree pair satisfies the following criteria:
Introduction | Each pair consists of both the lexical constituents and their maximum tree structures generated over the lexical sequences in the original parse trees.
Substructure Spaces for BTKs | The plain syntactic structural features can deal with the structural divergence of bilingual parse trees from a more general perspective.
Substructure Spaces for BTKs | The kernel is normalized by |in(S)| · |in(T)|, where S and T refer to the entire source and target parse trees respectively.
Substructure Spaces for BTKs | Therefore, |in(S)| and |in(T)| are the respective span lengths of the parse trees used for normalization.
Abstract | In this work, we further extend this line of exploration and propose a novel but simple approach, which utilizes a ranking model based on word order precedence in the target language to reposition nodes in the syntactic parse tree of a source sentence. |
Experiments | None means the original sentences without reordering; Oracle means the best permutation allowed by the source parse tree; ManR refers to manual reordering rules; Rank means the ranking reordering model.
Experiments | On the other hand, the performance of the ranking reordering model still falls far short of the oracle, which is the lowest crossing-link number over all possible permutations allowed by the parse tree.
Introduction | The most notable solution to this problem is adopting syntax-based SMT models, especially methods making use of source-side syntactic parse trees.
Introduction | One is tree-to-string model (Quirk et al., 2005; Liu et al., 2006) which directly uses source parse trees to derive a large set of translation rules and associated model parameters. |
Introduction | The other is called syntax pre-reordering, an approach that re-positions source words to approximate the target-language word order as much as possible based on features from source syntactic parse trees.
Word Reordering as Syntax Tree Node Ranking | Given a source-side parse tree T_e, the task of word reordering is to transform T_e into T_e', so that e' can match the word order of the target language as much as possible.
Word Reordering as Syntax Tree Node Ranking | By permuting tree nodes in the parse tree, the source sentence is reordered into the target-language order.
Word Reordering as Syntax Tree Node Ranking | parse tree, we can obtain the same word order as the Japanese translation.
Abstract | Therefore, it can not only utilize the forest structure that compactly encodes an exponential number of parse trees but also capture non-syntactic translation equivalences with linguistically structured information through tree sequences.
Forest-based tree sequence to string model | parse trees) for a given sentence under a context-free grammar (CFG).
Forest-based tree sequence to string model | The two parse trees T1 and T2 encoded in Fig. |
Forest-based tree sequence to string model | Different parse trees represent different derivations and explanations for a given sentence.
Introduction | In theory, one may worry that the advantage of tree sequences is already covered by forests, because a forest implicitly encodes a huge number of parse trees, and these parse trees may generate many different phrases and structure segmentations for a given source sentence.
Related work | Here, a tree sequence refers to a sequence of consecutive subtrees that are embedded in a full parse tree.
Related work | parse trees.
Analysis | This proportion, which we call the consistent constituent matching (CCM) rate, reflects the extent to which the translation output respects the source parse tree.
Experiments | We removed 15,250 sentences, for which the Chinese parser failed to produce syntactic parse trees . |
Introduction | Consider the following Chinese fragment with its parse tree: |
Introduction | However, the parse tree of the source fragment constrains the phrase “ER “13”” to be translated as a unit. |
Introduction | Without considering syntactic constraints from the parse tree, the decoder makes wrong decisions not only on phrase movement but also on the lexical selection for the multi-meaning word “75’”.
The Acquisition of Bracketing Instances | Let c and e be the source sentence and the target sentence, W be the word alignment between them, and T be the parse tree of c. We define a binary bracketing instance as a tuple ⟨b, τ(c_{i..j}), τ(c_{j+1..k}), τ(c_{i..k})⟩, where b ∈ {bracketable, unbracketable}, c_{i..j} and c_{j+1..k} are two neighboring source phrases, and τ(T, s) (τ(s) for short) is a subtree function which returns the minimal subtree covering the source sequence s from the source parse tree T. Note that τ(c_{i..k}) includes both τ(c_{i..j}) and τ(c_{j+1..k}).
The Acquisition of Bracketing Instances | 1: Input: sentence pair (c, e), the parse tree T of c and the word alignment W between c and e 2: B := ∅ 3: for each (i, j, k) ∈ c do 4: if there exist a target phrase e_{m..n} aligned to c_{i..j} and e_{p..q} aligned to c_{j+1..k} then
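The alignment-consistency test implied by line 4 of the pseudocode above can be sketched as follows: two neighboring source spans are "bracketable" when each (and their union) is aligned to a consistent target phrase. The function names and the alignment representation are our own assumptions, not the paper's.

```python
def target_span(align, i, j):
    """Target positions linked to source positions i..j (inclusive)."""
    return {t for (s, t) in align if i <= s <= j}

def consistent(align, i, j):
    """A source span is consistent if no target word inside its target
    span links back to a source word outside i..j."""
    ts = target_span(align, i, j)
    if not ts:
        return None
    lo, hi = min(ts), max(ts)
    for (s, t) in align:
        if lo <= t <= hi and not (i <= s <= j):
            return None
    return (lo, hi)

def bracketable(align, i, j, k):
    return (consistent(align, i, j) is not None
            and consistent(align, j + 1, k) is not None
            and consistent(align, i, k) is not None)

# Source words 0..2; alignment pairs (source_idx, target_idx).
align = [(0, 1), (1, 0), (2, 2)]
print(bracketable(align, 0, 1, 2))  # True: both spans map to phrases
```

With a crossing alignment such as [(0, 0), (1, 2), (2, 1)], the same call returns False, yielding an unbracketable instance.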
The Syntax-Driven Bracketing Model 3.1 The Model | These features capture syntactic “horizontal context”, which demonstrates the expansion trend of the source phrases s, s1 and s2 on the parse tree.
The Syntax-Driven Bracketing Model 3.1 The Model | The tree path σ(s1)..σ(s) connecting σ(s1) and σ(s), σ(s2)..σ(s) connecting σ(s2) and σ(s), and σ(s)..ρ connecting σ(s) and the root node ρ of the whole parse tree are used as features.
The Syntax-Driven Bracketing Model 3.1 The Model | These features provide syntactic “vertical context”, which shows the generation history of the source phrases on the parse tree.
Abstract | successful task completion) can be used as an alternative, experimentally demonstrating that its performance is comparable to training on gold-standard parse trees . |
Background | They also altered the processes for constructing productions and mapping parse trees to MRs in order to make the construction of semantic interpretations more compositional and allow the efficient construction of more complex representations.
Background | A simplified version of a sample parse tree for Kim and Mooney’s model is shown in Figure 2. |
Introduction | parse trees) to train the discriminative classifier.
Modified Reranking Algorithm | Therefore, we modify it to rerank the parse trees generated by the model of Kim and Mooney (2012).
Modified Reranking Algorithm | The approach requires three subcomponents: 1) a GEN function that returns the list of top n candidate parse trees for each NL sentence produced by the generative model, 2) a feature function Φ that maps an NL sentence e and a parse tree y into a real-valued feature vector Φ(e, y) ∈ R^d, and 3) a reference parse tree that is compared to the highest-scoring parse tree during training.
Modified Reranking Algorithm | However, grounded language learning tasks, such as our navigation task, do not provide reference parse trees for training examples. |
A grammar for semantic tagging | Two equivalent CFSG parse trees
A grammar for semantic tagging | Figure (7a) shows an example of a parse tree generated for the query “Canon vs Sony Camera”, in which B, Q, and T are abbreviations for Brand, Query, and Type, and U is a special tag for words that do not fall into any other tag category and have been left unlabeled in our corpus, such as a, the, for, etc.
A grammar for semantic tagging | A more careful look at the grammar shows that there is another parse tree for this query as shown in figure (7b). |
Abstract | In order to take contextual information into account, a discriminative model is used on top of the parser to re-rank the n-best parse trees generated by the parser.
Introduction | To overcome this limitation, we further present a discriminative re-ranking module on top of the parser to re-rank the n-best parse trees generated by the parser using contextual features. |
Our Grammar Model | A CFSG parse tree |
Our Grammar Model | (9) Σ_j P(A_i → χ_j) = 1. Consider a sentence w_1 w_2 ... w_n, a parse tree T of this sentence, and an interior node v in T labeled with A_v, and assume that v_1, v_2, ..., v_k are the children of the node v in T. We define:
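The normalization constraint above (each nonterminal's rule probabilities sum to one) is easy to check mechanically. A minimal sketch, using our own dictionary representation of a toy PCFG rather than the paper's grammar:

```python
# PCFG stored as {lhs: {rhs_tuple: probability}}.
toy_pcfg = {
    "S":  {("NP", "VP"): 1.0},
    "NP": {("dogs",): 0.5, ("cats",): 0.5},
    "VP": {("V", "NP"): 1.0},
    "V":  {("chase",): 1.0},
}

def is_normalized(pcfg, tol=1e-9):
    """Check sum_j P(A_i -> chi_j) = 1 for every nonterminal A_i."""
    return all(abs(sum(rules.values()) - 1.0) < tol
               for rules in pcfg.values())

print(is_normalized(toy_pcfg))  # True
```

A grammar violating the constraint (e.g. probabilities summing to 0.6) would fail this check and not define a proper distribution over trees.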
Parsing Algorithm | This information is necessary for the termination step in order to print the parse trees . |
Algorithm | As preprocessing, we use an unsupervised parser that generates an unlabeled parse tree for each sentence.
Algorithm | Second, they should be k-th degree cousins of the predicate in the parse tree . |
Algorithm | Our algorithm attempts to find subtrees within the parse tree whose structure resembles the structure of a full sentence.
Experimental Setup | A minimal clause is the lowest ancestor of the verb in the parse tree that has a syntactic label of a clause according to the gold standard parse of the PTB. |
Introduction | Initially, the set of possible arguments for a given verb consists of all the constituents in the parse tree that do not contain that predicate. |
Introduction | Using this information, it further reduces the possible arguments only to those contained in the minimal clause, and further prunes them according to their position in the parse tree . |
Related Work | In addition, most models assume that a syntactic representation of the sentence is given, commonly in the form of a parse tree, a dependency structure or a shallow parse.
Results | In our algorithm, the initial set of potential arguments consists of constituents in the Seginer parser’s parse tree . |
Abstract | A basic approach is template matching on parse trees . |
Abstract | To improve recall, irregularities in parse trees caused by verb form errors are taken into account; to improve precision, n-gram counts are utilized to filter proposed corrections. |
Data 5.1 Development Data | To investigate irregularities in parse tree patterns (see §3.2), we utilized the AQUAINT Corpus of English News Text. |
Introduction | We build on the basic approach of template-matching on parse trees in two ways. |
Introduction | To improve recall, irregularities in parse trees caused by verb form errors are considered; to improve precision, n-gram counts are utilized to filter proposed corrections. |
Previous Research | Similar strategies with parse trees are pursued in (Bender et al., 2004), and error templates are utilized in (Heidorn, 2000) for a word processor.
Previous Research | Relative to verb forms, errors in these categories do not “disturb” the parse tree as much. |
Research Issues | The success of this strategy, then, hinges on accurate identification of these items, for example, from parse trees.
Research Issues | In other words, sentences containing verb form errors are more likely to yield an “incorrect” parse tree, sometimes with significant differences.
Research Issues | One goal of this paper is to recognize irregularities in parse trees caused by verb form errors, in order to increase recall. |
Abstract | Among syntax-based translation models, the tree-based approach, which takes as input a parse tree of the source sentence, is a promising direction, being faster and simpler than its string-based counterpart.
Conclusion and future work | We have presented a novel forest-based translation approach which uses a packed forest rather than the 1-best parse tree (or k-best parse trees) to direct the translation.
Experiments | Using more than one parse tree apparently improves the BLEU score, but at the cost of much slower decoding, since each of the top-k trees has to be decoded individually although they share many common subtrees. |
Experiments | i (rank of the parse tree picked by the decoder)
Experiments | Figure 5: Percentage of the i-th best parse tree being picked in decoding.
Forest-based translation | Informally, a packed parse forest, or forest in short, is a compact representation of all the derivations (i.e., parse trees) for a given sentence under a context-free grammar (Billot and Lang, 1989).
Forest-based translation | The parse tree for the preposition case is shown in Figure 2(b) as the 1-best parse, while for the conjunction case, the two proper nouns (Basin and Shalong) are combined to form a coordinated NP
Forest-based translation | Shown in Figure 3(a), these two parse trees can be represented as a single forest by sharing common subtrees such as NPB0,1 and VPB3,6.
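Subtree sharing is what lets a forest encode many trees compactly. The sketch below models a packed forest as an and/or hypergraph and counts the trees it encodes; the node names are illustrative and do not reproduce the paper's actual Figure 3.

```python
from math import prod

def num_trees(forest, node):
    """Trees encoded at a node: sum over its incoming hyperedges of the
    product of its children's tree counts; leaves encode one tree."""
    edges = forest.get(node, [])
    if not edges:
        return 1
    return sum(prod(num_trees(forest, c) for c in children)
               for children in edges)

# IP node with two alternative analyses (preposition vs conjunction),
# both sharing the same NPB and VPB subtrees.
forest = {
    "IP":  [["NPB", "PP-analysis"], ["NPB", "NP-analysis"]],
    "PP-analysis": [["VPB"]],
    "NP-analysis": [["VPB"]],
}
print(num_trees(forest, "IP"))  # 2 parse trees packed into one forest
```

Because counts multiply across independent choices, a forest with a few ambiguous nodes can encode exponentially many trees while storing each shared subtree only once.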
Introduction | Depending on the type of input, these efforts can be divided into two broad categories: the string-based systems whose input is a string to be simultaneously parsed and translated by a synchronous grammar (Wu, 1997; Chiang, 2005; Galley et al., 2006), and the tree-based systems whose input is already a parse tree to be directly converted into a target tree or string (Lin, 2004; Ding and Palmer, 2005; Quirk et al., 2005; Liu et al., 2006; Huang et al., 2006). |
Introduction | However, despite these advantages, current tree-based systems suffer from a major drawback: they only use the 1-best parse tree to direct the translation, which potentially introduces translation mistakes due to parsing errors (Quirk and Corston-Oliver, 2006). |
Dependency parsing schemata | Parse tree: A partial dependency tree T ∈ D-trees is a parse tree for a given string w_1 ...
Dependency parsing schemata | w_n, we will say it is a projective parse tree for the string.
Dependency parsing schemata | Final items in this formalism will be those containing a forest F that contains a parse tree for some arbitrary string.
Introduction | Each item contains a piece of information about the sentence’s structure, and a successful parsing process will produce at least one final item containing a full parse tree for the sentence or guaranteeing its existence. |
Introduction | Items in parsing schemata are formally defined as sets of partial parse trees from a set denoted
Introduction | Trees(G), which is the set of all the possible partial parse trees that do not violate the constraints imposed by a grammar G. More formally, an item set I is defined by Sikkel as a quotient set associated with an equivalence relation on Trees(G).
Abstract | We introduce three linguistically motivated structured regularizers based on parse trees , topics, and hierarchical word clusters for text categorization. |
Structured Regularizers for Text | Figure 1: An example of a parse tree from the Stanford sentiment treebank, which annotates sentiment at the level of every constituent (indicated here by + and ++; no marking indicates neutral sentiment).
Structured Regularizers for Text | 4.2 Parse Tree Regularizer |
Structured Regularizers for Text | Sentence boundaries are a rather superficial kind of linguistic structure; syntactic parse trees provide more fine-grained information. |
Introduction | 1 A tree sequence refers to an ordered subtree sequence that covers a phrase or a consecutive tree fragment in a parse tree.
Related Work | Yamada and Knight (2001) use a noisy-channel model to transfer a target parse tree into a source sentence.
Related Work | (2006) propose a feature-based discriminative model for target language syntactic structures prediction, given a source parse tree . |
Related Work | (2006) create an xRS rule headed by a pseudo, non-syntactic nonterminal symbol that subsumes the phrase and its corresponding multi-headed syntactic structure; and one sibling xRS rule that explains how the pseudo symbol can be combined with other genuine non-terminals for acquiring the genuine parse trees.
Tree Sequence Alignment Model | source and target parse trees T(f_1^J) and T(e_1^I) in Fig.
Tree Sequence Alignment Model | 2 illustrates two examples of tree sequences derived from the two parse trees.
Tree Sequence Alignment Model | and their parse trees T(f_1^J) and T(e_1^I), the tree
Chinese Empty Category Prediction | For instance, Yang and Xue (2010) attempted to predict the existence of an EC before a word; Luo and Zhao (2011) predicted ECs on parse trees, but the position information of some ECs is partially lost in their representation.
Chinese Empty Category Prediction | Furthermore, Luo and Zhao (2011) conducted experiments on gold parse trees only. |
Chinese Empty Category Prediction | In our opinion, recovering ECs from machine parse trees is more meaningful, since that is what one would encounter when developing a downstream application such as machine translation.
Integrating Empty Categories in Machine Translation | As mentioned in the previous section, the output of our EC predictor is a new parse tree with the labels and positions |
Integrating Empty Categories in Machine Translation | In this work we also take advantage of the augmented Chinese parse trees (with ECs projected to the surface) and extract a tree-to-string grammar (Liu et al., 2006) for a tree-to-string MT system.
Integrating Empty Categories in Machine Translation | Due to the recovered ECs in the source parse trees, the tree-to-string grammar extracted from such trees can be more discriminative, with an increased capability of distinguishing different contexts.
Abstract | We propose a language model based on a precise, linguistically motivated grammar (a handcrafted Head-driven Phrase Structure Grammar) and a statistical model estimating the probability of a parse tree . |
Conclusions and Outlook | first step in this direction by estimating the probability of a parse tree . |
Conclusions and Outlook | However, our model only looks at the structure of a parse tree and does not take the actual words into account. |
Experiments | As P(T) does not directly apply to parse trees , all possible readings have to be unpacked. |
Experiments | For these lattices the grammar-based language model was simply switched off in the experiment, as no parse trees were produced for efficiency reasons. |
Language Model 2.1 The General Approach | (2) P_gram(W) is defined as the probability of the most likely parse tree of a word sequence W: (3) P_gram(W) = max_{T ∈ Parses(W)} P(T). Determining P_gram(W) is an expensive operation, as it involves parsing.
Language Model 2.1 The General Approach | 2.2 The Probability of a Parse Tree |
Language Model 2.1 The General Approach | The parse trees produced by our parser are binary-branching and rather deep. |
Features | Feature development Our features are inspired by analysis of patterns contained among our gold alignment data and automatically generated parse trees . |
Features | link (e, f) if the part-of-speech tag of e is t. The conditional probabilities in this table are computed from our parse trees and the baseline Model 4 alignments. |
Features | Features PP-NP-head, NP-DT-head, and VP-VP-head (Figure 6) all exploit headwords on the parse tree.
Introduction | Using a foreign string and an English parse tree as input, we formulate a bottom-up search on the parse tree , with the structure of the tree as a backbone for building a hypergraph of possible alignments. |
Word Alignment as a Hypergraph | Algorithm input: The input to our alignment algorithm is a sentence pair (e_1^n, f_1^m) and a parse tree over one of the input sentences.
Word Alignment as a Hypergraph | To generate parse trees , we use the Berkeley parser (Petrov et al., 2006), and use Collins head rules (Collins, 2003) to head-out binarize each tree. |
Word Alignment as a Hypergraph | Word alignments are built bottom-up on the parse tree . |
Abstract | Unlike previous methods, it exploits an existing syntactic parser to produce disambiguated parse trees that drive the compositional semantic interpretation.
Ensuring Meaning Composition | 3 only works if the syntactic parse tree strictly follows the predicate-argument structure of the MR, since meaning composition at each node is assumed to combine a predicate with one of its arguments. |
Introduction | Ge and Mooney (2005) use training examples with semantically annotated parse trees, and Zettlemoyer and Collins (2005) learn a probabilistic semantic parsing model
Introduction | This paper presents an approach to learning semantic parsers that uses parse trees from an existing syntactic analyzer to drive the interpretation process. |
Learning Semantic Knowledge | Next, each resulting parse tree is linearized to produce a sequence of predicates by using a top-down, left-to-right traversal of the parse tree.
Semantic Parsing Framework | The framework is composed of three components: 1) an existing syntactic parser to produce parse trees for NL sentences; 2) learned semantic knowledge
Semantic Parsing Framework | First, the syntactic parser produces a parse tree for the NL sentence. |
Semantic Parsing Framework | 3(a) shows one possible semantically-augmented parse tree (SAPT) (Ge and Mooney, 2005) for the condition part of the example in Fig. |
Dependency Parsing | Let us first describe formally the set of legal dependency parse trees.
Dependency Parsing | We define the set of legal dependency parse trees of x (denoted Y(x)) as the set of 0-arborescences of D, i.e., we admit each arborescence as a potential dependency tree.
Dependency Parsing | Combinatorial algorithms (Chu and Liu, 1965; Edmonds, 1967) can solve this problem in cubic time. If the dependency parse trees are restricted to be projective, cubic-time algorithms are available via dynamic programming (Eisner, 1996).
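What the Chu-Liu/Edmonds algorithm computes can be illustrated by brute force on a toy arc-weight matrix: search every head assignment, keep only valid arborescences rooted at 0, and return the highest-scoring one. The weights and function names below are our own illustration; exhaustive search is of course only feasible at this scale.

```python
from itertools import product

def is_arborescence(heads):
    """heads[i] is the head of word i+1 (words are 1..n, root is 0).
    Valid iff every word reaches the root without cycles."""
    for i in range(1, len(heads) + 1):
        seen, node = set(), i
        while node != 0:
            if node in seen:
                return False
            seen.add(node)
            node = heads[node - 1]
    return True

def best_parse(score):
    """score[(h, d)] is the weight of arc h -> d; words are 1..n."""
    n = max(d for (_, d) in score)
    best, best_heads = float("-inf"), None
    for heads in product(range(0, n + 1), repeat=n):
        if any(h == d + 1 for d, h in enumerate(heads)):
            continue                    # no self-loops
        if not is_arborescence(heads):
            continue
        total = sum(score[(h, d + 1)] for d, h in enumerate(heads))
        if total > best:
            best, best_heads = total, heads
    return best_heads, best

score = {(0, 1): 1.0, (0, 2): 0.1, (1, 2): 2.0, (2, 1): 0.5,
         (0, 3): 0.2, (1, 3): 0.1, (2, 3): 1.5, (3, 1): 0.0,
         (3, 2): 0.0}
heads, total = best_parse(score)
print(heads, total)  # (0, 1, 2) 4.5: word 1 attaches to root, 2 to 1, 3 to 2
```

The cited combinatorial algorithms find the same maximum over exponentially many arborescences in polynomial time.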
Dependency Parsing as an ILP | Our formulations rely on a concise polyhedral representation of the set of candidate dependency parse trees, as sketched in §2.1.
Dependency Parsing as an ILP | For most languages, dependency parse trees tend to be nearly projective (cf. |
Dependency Parsing as an ILP | It would be straightforward to adapt the constraints in §3.5 to allow only projective parse trees: simply force z_a = 0 for any a ∈ A.
Experiments | net/projects/mstparser Note that, unlike reranking approaches, there are still exponentially many candidate parse trees after pruning.
Experiments | Unlike our model, the hybrid models used here as baselines make use of the dependency labels at training time; indeed, the transition-based parser is trained to predict a labeled dependency parse tree, and the graph-based parser uses these predicted labels as input features.
Abstract | We present a sentence-compression-based framework for the task, and design a series of learning-based compression models built on parse trees . |
Related Work | Rather than attempt to derive a new parse tree like Knight and Marcu (2000) and Galley and McKeown (2007), we learn to safely remove a set of constituents in our parse tree-based compression model while preserving grammatical structure and essential content. |
Results | Those issues can be addressed by analyzing k-best parse trees, and we leave this for future work.
Sentence Compression | Our tree-based compression methods are in line with syntax-driven approaches (Galley and McKeown, 2007), where operations are carried out on parse tree constituents. |
Sentence Compression | Unlike previous work (Knight and Marcu, 2000; Galley and McKeown, 2007), we do not produce a new parse tree, |
Sentence Compression | Formally, given a parse tree T of the sentence to be compressed and a tree traversal algorithm, T can be presented as a list of ordered constituent nodes, T = t_0 t_1 ...
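Linearizing a tree into an ordered node list, as described above, is a one-line traversal. A minimal sketch with our own tuple representation; the choice of pre-order here is an assumption, since the paper only requires some fixed traversal:

```python
def preorder(tree):
    """Yield node labels of a (label, [children]) tree in pre-order."""
    label, children = tree
    yield label
    for child in children:
        yield from preorder(child)

t = ("S", [("NP", [("dogs", [])]),
           ("VP", [("V", [("chase", [])]), ("NP", [("cats", [])])])])
print(list(preorder(t)))
# ['S', 'NP', 'dogs', 'VP', 'V', 'chase', 'NP', 'cats']
```

Once the tree is a flat ordered list, compression decisions (keep or drop each constituent) can be made sequentially over it.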
Abstract | Cognitively, it is more plausible to assume that children obtain only terminal strings of parse trees and not the actual parse trees. |
Abstract | Most existing solutions treat the problem of unsupervised parsing by assuming a generative process over parse trees, e.g. |
Abstract | Unlike in phylogenetics and graphical models, where a single latent tree is constructed for all the data, in our case, each part of speech sequence is associated with its own parse tree . |
Joint POS Tagging and Parsing with Nonlocal Features | Assuming an input sentence contains n words, in order to reach a terminal state, the initial state requires n sh-x actions to consume all words in β, and n − 1 rl/rr-x actions to construct a complete parse tree by consuming all the subtrees in σ. |
Joint POS Tagging and Parsing with Nonlocal Features | For example, the parse tree in Figure 1a contains no ru-x action, while the parse tree for the same input sentence in Figure 1b contains four ru-x actions. |
Joint POS Tagging and Parsing with Nonlocal Features | Input: A word-segmented sentence, beam size k. Output: A constituent parse tree. |
Transition-based Constituent Parsing | empty stack σ and a queue β containing the entire input sentence (word-POS pairs), and the terminal states have an empty queue β and a stack σ containing only one complete parse tree. |
Transition-based Constituent Parsing | In order to construct lexicalized constituent parse trees , we define the following actions for the action set T according to (Sagae and Lavie, 2005; Wang et al., 2006; Zhang and Clark, 2009): |
Transition-based Constituent Parsing | For example, in Figure 1, for the input sentence w0 w1 w2 and its POS tags a b c, our parser can construct two parse trees using the action sequences given below these trees. |
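As a rough illustration of how such action sequences deterministically build a tree, here is a toy replayer for the sh-x / ru-x / rl-x / rr-x transitions described above. The (label, children) tuple encoding is our own; real parsers additionally track lexical heads and score each action with a model.

```python
from collections import deque

def run_actions(words, actions):
    """Replay a transition sequence on a stack and a queue.
    Trees are (label, children) tuples; a leaf is (POS, word)."""
    stack, queue = [], deque(words)
    for act, label in actions:
        if act == "sh":            # sh-x: shift the next word, tagged x
            stack.append((label, queue.popleft()))
        elif act == "ru":          # ru-x: unary reduce to label x
            stack.append((label, [stack.pop()]))
        elif act in ("rl", "rr"):  # rl-x / rr-x: binary reduce, head on left/right
            right, left = stack.pop(), stack.pop()
            stack.append((label, [left, right]))
    assert len(stack) == 1 and not queue, "sequence did not yield one full tree"
    return stack[0]

tree = run_actions(["I", "saw"],
                   [("sh", "PRP"), ("ru", "NP"),
                    ("sh", "VBD"), ("ru", "VP"), ("rl", "S")])
```

Note how the n = 2 shifts plus n − 1 = 1 binary reduce match the action counts stated above, with unary reduces adding extra steps.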
Bayesian inference for PCFGs | In the supervised setting the data D consists of a corpus of parse trees D = (t1, … |
Bayesian inference for PCFGs | Ignoring issues of tightness for the moment and setting P(t | θ) = Pθ(t), this means that in the supervised setting the posterior distribution P(θ | t, α) given a set of parse trees t = (t1, … |
Bayesian inference for PCFGs | The algorithms we give here are based on their Gibbs sampler, which in each iteration first samples parse trees |
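The θ-sampling half of such a Gibbs iteration is simple: conditioned on the current parse trees, the rule probabilities for each nonterminal are drawn from a Dirichlet whose parameters are the prior pseudo-counts plus the observed rule counts. A minimal sketch (the count-dictionary interface is our own, hypothetical one):

```python
import random

def sample_rule_probs(rule_counts, alpha=1.0):
    """Draw theta ~ Dirichlet(alpha + counts), independently per nonterminal.
    rule_counts maps each nonterminal to {right-hand side: count}.
    A Dirichlet draw is obtained by normalizing independent Gamma draws."""
    theta = {}
    for nt, counts in rule_counts.items():
        gammas = {rhs: random.gammavariate(alpha + c, 1.0)
                  for rhs, c in counts.items()}
        total = sum(gammas.values())
        theta[nt] = {rhs: g / total for rhs, g in gammas.items()}
    return theta

probs = sample_rule_probs({"NP": {("DT", "NN"): 30, ("NP", "PP"): 10}})
```

The other half of the iteration, resampling parse trees given θ, requires a sampling variant of the inside algorithm and is omitted here.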
Introduction | Cognitively it is implausible that children can perceive the parse trees of the language they are learning, but it is more reasonable to assume that they can obtain the terminal strings or yield of these trees. |
Introduction | (2007) proposed MCMC samplers for the posterior distribution over rule probabilities and the parse trees of the training data strings. |
Introduction | estimates are always tight for both the supervised case (where the input consists of parse trees) and the unsupervised case (where the input consists of yields or terminal strings). |
Abstract | The network does not rely on a parse tree and is easily applicable to any language. |
Background | A model that adopts a more general structure provided by an external parse tree is the Recursive Neural Network (RecNN) (Pollack, 1990; Küchler and Goller, 1996; Socher et al., 2011; Hermann and Blunsom, 2013). |
Background | It is sensitive to the order of the words in the sentence and it does not depend on external language-specific features such as dependency or constituency parse trees . |
Experiments | RECNTN is a recursive neural network with a tensor-based feature function, which relies on external structural features given by a parse tree and performs best among the RecNNs. |
Introduction | The feature graph induces a hierarchical structure somewhat akin to that in a syntactic parse tree . |
Properties of the Sentence Model | The recursive neural network follows the structure of an external parse tree . |
Properties of the Sentence Model | Likewise, the induced graph structure in a DCNN is more general than a parse tree in that it is not limited to syntactically dictated phrases; the graph structure can capture short or long-range semantic relations between words that do not necessarily correspond to the syntactic relations in a parse tree . |
Properties of the Sentence Model | The DCNN has internal input-dependent structure and does not rely on externally provided parse trees , which makes the DCNN directly applicable to hard-to-parse sentences such as tweets and to sentences from any language. |
Base Models | Figure 3c shows a parse tree representation of a semi-CRF. |
Base Models | Let t be a complete parse tree for sentence s, and each local subtree r ∈ t encodes both the rule from the grammar, and the span and split information (e.g., NP(7,9) → JJ(7,8) NN(8,9), which covers the last two words in Figure 1). |
Base Models | f(r, s)} (9) (summing over r ∈ t). To compute the partition function Zs, which serves to normalize the function, we must sum over T(s), the set of all possible parse trees for sentence s. |
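For a PCFG in Chomsky normal form, this sum over all trees is computable in cubic time with the inside algorithm. The sketch below uses a toy grammar encoding of our own; the returned total plays the same normalizing role as Zs above.

```python
def inside(words, grammar, lexicon, start="S"):
    """Sum of P(tree) over all parses of `words` under a toy CNF PCFG.
    grammar: {A: {(B, C): prob}};  lexicon: {A: {word: prob}}."""
    n = len(words)
    beta = [[{} for _ in range(n + 1)] for _ in range(n)]
    for i, w in enumerate(words):                      # width-1 spans
        for A, emit in lexicon.items():
            if w in emit:
                beta[i][i + 1][A] = emit[w]
    for width in range(2, n + 1):                      # wider spans, bottom-up
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):                  # split point
                for B, pb in beta[i][k].items():
                    for C, pc in beta[k][j].items():
                        for A, rules in grammar.items():
                            p = rules.get((B, C))
                            if p:
                                beta[i][j][A] = beta[i][j].get(A, 0.0) + p * pb * pc
    return beta[0][n].get(start, 0.0)

Z = inside(["I", "run"],
           grammar={"S": {("NP", "VP"): 1.0}},
           lexicon={"NP": {"I": 1.0}, "VP": {"run": 1.0}})
```

Replacing the sum-product with max-product in the same recursion yields Viterbi (1-best) parsing.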
Experiments and Discussion | For the hierarchical model, we used the CNN portion of the data (5093 sentences) for the extra named entity data (and ignored the parse trees ) and the remaining portions combined for the extra parse data (and ignored the named entity annotations). |
Introduction | When trained separately, these single-task models can produce outputs which are inconsistent with one another, such as named entities which do not correspond to any nodes in the parse tree (see Figure 1 for an example). |
Introduction | Because a named entity should correspond to a node in the parse tree , strong evidence about either aspect of the model should positively impact the other aspect |
Experiments | To obtain syntactic parse trees and semantic roles on the tuning and test datasets, we first parse the source sentences with the Berkeley Parser (Petrov and Klein, 2007), trained on the Chinese Treebank 7.0 (Xue et al., 2005). |
Experiments | In order to understand how well the MR08 system respects their reordering preference, we use the gold alignment dataset LDC2006E86, in which the source sentences are from the Chinese Treebank, and thus both the gold parse trees and gold predicate-argument structures are available. |
Related Work | (2012) obtained word order by using a reranking approach to reposition nodes in syntactic parse trees . |
Unified Linguistic Reordering Models | According to the annotation principles in (Chinese) PropBank (Palmer et al., 2005; Xue and Palmer, 2009), all the roles in a PAS map to a corresponding constituent in the parse tree, and these constituents (e.g., NPs and VBD in Figure 1) do not overlap with each other. |
Unified Linguistic Reordering Models | parse tree and its word alignment links to the target language. |
Unified Linguistic Reordering Models | Given a hypothesis H with its alignment a, it traverses all CFG rules in the parse tree and sees if two adjacent constituents are conditioned to trigger the reordering models (lines 2-4). |
Abstract | This paper presents a higher-order model for constituent parsing aimed at utilizing more local structural context to decide the score of a grammar rule instance in a parse tree . |
Conclusion | This paper has presented a higher-order model for constituent parsing that factorizes a parse tree into larger parts than before, in hopes of increasing its power of discriminating the true parse from the others without losing tractability. |
Higher-order Constituent Parsing | Figure 1: A part of a parse tree centered at NP → NP VP |
Higher-order Constituent Parsing | A part in a parse tree is illustrated in Figure 1. |
Introduction | Previous discriminative parsing models usually factor a parse tree into a set of parts. |
Introduction | It allows multiple adjacent grammar rules in each part of a parse tree , so as to utilize more local structural context to decide the plausibility of a grammar rule instance. |
Deriving Eisner Normal Form | (30) For a set S of semantically equivalent parse trees for a string ABC, admit the unique parse tree such that at least one of (i) or (ii) holds: |
Deriving Eisner Normal Form | (31) Theorem 1: For every parse tree α, there is a semantically equivalent parse tree NF(α) in which no node resulting from application of B or S functions as the primary functor in a rule application. |
Deriving Eisner Normal Form | (32) Theorem 2: If NF(α) and NF(α′) are distinct parse trees, then their model-theoretic interpretations are distinct. |
Experimental Setup | We split the sentence based on the ending punctuation, predict the parse tree for each segment and group the roots of resulting trees into a single node. |
Introduction | Because the number of alternatives is small, the scoring function could in principle involve arbitrary (global) features of parse trees . |
Related Work | Reranking can be combined with an arbitrary scoring function, and thus can easily incorporate global features over the entire parse tree . |
Sampling-Based Dependency Parsing with Global Features | Ideally, we would change multiple heads in the parse tree simultaneously, and sample those choices from the corresponding conditional distribution of p. While in general this is increasingly difficult with more heads, it is indeed tractable if |
Sampling-Based Dependency Parsing with Global Features | y′ is always a valid parse tree if we allow multiple children of the root and do not impose the projectivity constraint. |
Sampling-Based Dependency Parsing with Global Features | We extend our model such that it jointly learns how to predict a parse tree and also correct the predicted POS tags for a better parsing performance. |
Content Selection | Both the indicator and argument take the form of constituents in the parse tree . |
Surface Realization | It takes as input a set of relation instances (from the same cluster) R = {(indi, argi)} produced by the content selection component, a set of templates T = {tj} represented as parse trees, a transformation function F (described below), and a statistical ranker S for ranking the generated abstracts, whose description we defer until later in this section. |
Surface Realization | The transformation function F models the constituent-level transformations of relation instances and their mappings to the parse trees of templates. |
Surface Realization | Full-Constituent Mapping denotes that a source constituent is mapped directly to a target constituent of the template parse tree with the same tag. |
Background | We focus on methods that perform transformations over parse trees , and highlight the search challenge with which they are faced. |
Background | In our domain, each state is a parse tree , which is expanded by performing all applicable transformations. |
Search for Textual Inference | Let t be a parse tree, and let o be a transformation. |
Search for Textual Inference | Denoting by tT and tH the text parse tree and the hypothesis parse tree, a proof system has to find a sequence of transformations O with minimal cost such that tT ⊢O tH. This forms a search problem of finding the lowest-cost proof among all possible proofs. |
Search for Textual Inference | Next, for a transformation o applied on a parse tree t, we define required(t, o) as the subset of t's nodes required for applying o (i.e., in the absence of these nodes, o could not be applied). |
Experiments | We use the C&C parser (Clark and Curran, 2007) to generate CCG parse trees for the data used in our experiments. |
Experiments | We assume fixed parse trees for all of the compounds (Figure 6), and use these to compute compound level vectors for all word pairs. |
Experiments | Our experimental findings indicate a clear advantage for a deeper integration of syntax over models that use only the bracketing structure of the parse tree . |
Introduction | We achieve this goal by employing the CCG formalism to consider compositional structures at any point in a parse tree . |
Model | We use the parse tree to structure an RAE, so that each combinatory step is represented by an autoencoder function. |
Model | As an internal baseline we use model CCAE-A, which is an RAE structured along a CCG parse tree . |
Introduction | English sentences are usually analyzed by a full parser to make parse trees, and the trees are then trimmed (Knight and Marcu, 2002; Turner and Charniak, 2005; Unno et al., 2006). |
Introduction | For Japanese, dependency trees are trimmed instead of full parse trees (Takeuchi and Matsumoto, 2001; Oguro et al., 2002; Nomoto, 2008). This parsing approach is reasonable because the compressed output is grammatical if the |
Related work | For instance, most English sentence compression methods make full parse trees and trim them by applying the generative model (Knight and Marcu, 2002; Turner and Charniak, 2005) or the discriminative model (Knight and Marcu, 2002; Unno et al., 2006). |
Related work | For Japanese sentences, instead of using full parse trees, existing sentence compression methods trim dependency trees by the discriminative model (Takeuchi and Matsumoto, 2001; Nomoto, 2008) through the use of simple linear combined features (Oguro et al., 2002). |
Related work | They simply regard a sentence as a word sequence, and structural information, such as full parse trees or dependency trees, is encoded in the sequence as features. |
Approach | (2005) also note these problems and solve them by introducing dozens of rules to transform the transferred parse trees . |
Experiments | The baseline constructs a full parse tree from the incomplete and possibly conflicting transferred edges using a simple random process. |
Introduction | In particular, we address challenges (1) and (2) by avoiding commitment to an entire projected parse tree in the target language during training. |
Parsing Models | The parsing model defines a conditional distribution pθ(z | x) over each projective parse tree z for a particular sentence x, parameterized by a vector θ. |
Parsing Models | where z is a directed edge contained in the parse tree z and φ is a feature function. |
Parsing Models | where r(x) is the part-of-speech tag of the root of the parse tree z, z is an edge from parent zp to child zc in direction zd, either left or right, and vz indicates valency: false if zp has no other children further from it in direction zd than zc, true otherwise. |
Experiment | To relieve the negative effect of SRL errors, we get the multiple SRL results by providing the SRL system with 3-best parse trees of Berkeley parser (Petrov and Klein, 2007), 1-best parse tree of Bikel parser (Bikel, 2004) and Stanford parser (Klein and Manning, 2003). |
Experiment | Thus, the system using PASTRs can only attach the long phrase to the predicate “511:” according to the parse tree , and meanwhile, make use of a transformation rule as follows: |
Inside Context Integration | The stag sequence dominates the corresponding syntactic tree fragments in the parse tree . |
Inside Context Integration | (2012) attached the IC to its neighboring elements based on parse trees . |
Maximum Entropy PAS Disambiguation (MEPD) Model | These features include st(Ei), i.e., the highest syntax tag for each argument, and fst(PAS) which is the lowest father node of Sp in the parse tree . |
Background | While a detailed description of the respective parsing models is beyond the scope of this paper, it is worth noting that both parsers induce a context free grammar as well as a generative parsing model from a training set of parse trees , and use a development set to tune internal parameters. |
Experimental setting | One of the main requirements for our dataset is the availability of gold-standard sense and parse tree annotations. |
Experimental setting | The gold-standard parse tree annotations are required in order to carry out evaluation of parser and PP attachment performance. |
Experimental setting | Following Atterer and Schutze (2007), we wrote a script that, given a parse tree , identifies instances of PP attachment ambiguity and outputs the (v, n1 , p, n2) quadruple involved and the attachment decision. |
Introduction | Traditionally, parse disambiguation has relied on structural features extracted from syntactic parse trees , and made only limited use of semantic information. |
Experiment setup | Data As described earlier, the Stanford Sentiment Treebank (Socher et al., 2013) has manually annotated, real-valued sentiment values for all phrases in parse trees . |
Introduction | The recently available Stanford Sentiment Treebank (Socher et al., 2013) renders manually annotated, real-valued sentiment scores for all phrases in parse trees . |
Related work | Such models work in a bottom-up fashion over the parse tree of a sentence to infer the sentiment label of the sentence as a composition of the sentiment expressed by its constituting parts. |
Semantics-enriched modeling | A recursive neural tensor network (RNTN) is a specific form of feed-forward neural network based on a syntactic (phrase-structure) parse tree, used to conduct compositional sentiment analysis. |
Semantics-enriched modeling | Each node of the parse tree is a fixed-length vector that encodes compositional semantics and syntax, which can be used to predict the sentiment of this node. |
Abstract | The key assumption behind many approaches is that translation is guided by the source and/or target language parse, employing rules extracted from the parse tree or performing tree transformations. |
Conclusions | A further promising direction is broadening this set with labels taking advantage of both source and target-language linguistic annotation, or with categories exploring additional phrase-pair properties beyond the parse trees, such as semantic annotations. |
Experiments | The results in Table 2(a) indicate that a large part of the performance improvement can be attributed to the use of the linguistic annotations extracted from the source parse trees , indicating the potential of the LTS system to take advantage of such additional annotations to deliver better translations. |
Introduction | Recent research tries to address these issues, by restructuring training data parse trees to better suit syntax-based SMT training (Wang et al., 2010), or by moving from linguistically motivated synchronous grammars to systems where linguistic plausibility of the translation is assessed through additional features in a phrase-based system (Venugopal et al., 2009; Chiang et al., 2009), obscuring the impact of higher level syntactic processes. |
Related Work | Earlier approaches for linguistic syntax-based translation such as (Yamada and Knight, 2001; Galley et al., 2006; Huang et al., 2006; Liu et al., 2006) focus on memorising and reusing parts of the structure of the source and/or target parse trees and constraining decoding by the input parse tree . |
Experiments | might be incorrect due to errors in English parse trees . |
Experiments | Given a source sentence, the corresponding syntax parse tree T S is first constructed with an English parser. |
Experiments | The other problem comes from the English head word selection error introduced by using source parse trees . |
Model Training and Application 3.1 Training | Based on the source syntax parse tree , for each measure word, we identified its head word by using a toolkit from (Chiang and Bikel, 2002) which can heuristically identify head words for sub-trees. |
Our Method | The source head word feature is defined to be a function fl to indicate whether a word ei is the source head word in English according to a parse tree of the source sentence. |
Fine-grained rule extraction | Considering that a parse tree is a trivial packed forest, we only use the term forest to expand our discussion, hereafter. |
Introduction | To deal with the parse error problem and rule sparseness problem, Mi and Huang (2008) replaced the 1-best parse tree with a packed forest which compactly encodes exponentially many parses for tree-to-string rule extraction. |
Related Work | f1J is a sentence of a foreign language other than English, TE is a 1-best parse tree of an English sentence E = e1I, and A = {(j, i)} is an alignment between the words in F and E. |
Related Work | Considering the parse error problem in the 1-best or k-best parse trees, Mi and Huang (2008) extracted tree-to-string translation rules from aligned packed forest-string pairs. |
Related Work | In an HPSG parse tree , these lexical syntactic descriptions are included in the LEXENTRY feature (refer to Table 2) of a lexical node (Matsuzaki et al., 2007). |
Machine Translation Quality Prediction | We use the Stanford Lexicalized Parser (Klein and Manning, 2002) with the provided English PCFG model to parse a sentence into a parse tree. |
Machine Translation Quality Prediction | 1) Depth of the parse tree: It refers to the depth of the generated parse tree . |
Machine Translation Quality Prediction | 2) Number of SBARs in the parse tree: SBAR is defined as a clause introduced by a (possibly empty) subordinating conjunction. |
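Both of these features reduce to simple recursions over a bracketed tree; a sketch using our own (label, children) tuple encoding of parse trees:

```python
def tree_depth(tree):
    """Depth of a parse tree; a leaf is (POS, word) with a string child."""
    label, children = tree
    if isinstance(children, str):
        return 1
    return 1 + max(tree_depth(c) for c in children)

def count_label(tree, target):
    """Number of nodes labeled `target` (e.g. "SBAR") in the tree."""
    label, children = tree
    hits = 1 if label == target else 0
    if isinstance(children, str):
        return hits
    return hits + sum(count_label(c, target) for c in children)

t = ("S", [("NP", [("PRP", "I")]),
           ("VP", [("VBD", "said"),
                   ("SBAR", [("S", [("NP", [("PRP", "he")]),
                                    ("VP", [("VBD", "left")])])])])])
```

Deeper trees and more SBARs indicate more clausal embedding, which is why such counts are plausible complexity features for quality prediction.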
Experiments | The constituent parse trees were then transformed into dependency parse trees , using the head of each constituent (Jiang and Zhai, 2007b). |
Problem Statement | We extract features from a sequence representation and a parse tree representation of each relation instance. |
Problem Statement | Syntactic Features The syntactic parse tree of the relation instance sentence can be augmented to represent the relation instance. |
Problem Statement | Each node in the sequence or the parse tree is augmented by an argument tag that indicates whether the node corresponds to entity A, B, both, or neither. |
Learning | Inference A discriminative k-best parser was used to allow for arbitrary features in the parse tree . |
Learning | Unlike syntactic parsing, child types of a parse tree uniquely define the parent type of the rule; this is a direct consequence of our combination rules being functions with domains defined in terms of the temporal types, and therefore necessarily projecting their inputs into a single output type. |
Temporal Representation | The root of a parse tree should be one of these types. |
Temporal Representation | At the root of a parse tree , we recursively apply |
Improved hypotheses comparison | Unlike dependency parsing, constituent parse trees for the same sentence can have different numbers of nodes, mainly due to the existence of unary nodes. |
Improved hypotheses comparison | Figure 2: Example parse trees of the same sentence with different numbers of actions. |
Introduction | The pioneering models rely on a classifier to make local decisions, and search greedily for a transition sequence to build a parse tree . |
Introduction | One difference between phrase-structure parsing and dependency parsing is that for the former, parse trees with different numbers of unary rules require different numbers of actions to build. |
Introduction | In the sentence “He expected to receive a prize for winning,” the path from “win” to its ARG0, “he,” involves the verbs “expect” and “receive” and the preposition “for.” The corresponding path through the parse tree likely occurs a relatively small number of times (or not at all) in the training corpus. |
Simple Sentence Production | This procedure is quite expensive; we have to copy the entire parse tree at each step, and in general, this procedure could generate an exponential number of transformed parses. |
Simplification Data Structure | In our case, the AND nodes are similar to constituent nodes in a parse tree — each has a category (e.g. |
Transformation Rules | A transformation rule takes as input a parse tree and produces as output a different, changed parse tree . |
Discussion | Recall that the joint model finds the global optimal solution over a set of opinion entity and relation candidates, which are obtained from the n-best CRF predictions and constituents in the parse tree that satisfy certain syntactic patterns. |
Model | Phrase type: the syntactic category of the deepest constituent that covers the candidate in the parse tree , e.g. |
Model | 2We use the Stanford Parser to generate parse trees and dependency graphs. |
Model | Neighboring constituents: The words and grammatical roles of neighboring constituents of the opinion expression in the parse tree — the left and right sibling of the deepest constituent containing the opinion expression in the parse tree . |
Building a Discourse Parser | The algorithm starts with a list of all atomic discourse sub-trees (made of single edus in their text order) and recursively selects the best match between adjacent sub-trees (using binary classifier S), labels the newly created subtree (using multi-label classifier L) and updates scoring for S, until only one subtree is left: the complete rhetorical parse tree for the input text. |
Evaluation | In each case, parse trees are evaluated using the four following, increasingly complex, matching criteria: blank tree structure (‘S’), tree structure with nuclearity (‘N’), tree structure with rhetorical relations (‘R’) and our final goal: fully labeled structure with both nuclearity and rhetorical relation labels (‘F’). |
Features | A promising concept introduced by Soricut and Marcu (2003) in their sentence-level parser is the identification of ‘dominance sets’ in the syntax parse trees associated with each input sentence. |
Introduction | To the best of our knowledge, Reitter’s (2003b) was the only previous research based exclusively on feature-rich supervised learning to produce text-level RST discourse parse trees . |
Task definition | Following our previous work (Jiang and Zhai, 2007b), we extract features from a sequence representation and a parse tree representation of each relation instance. |
Task definition | Each node in the sequence or the parse tree is augmented by an argument tag that indicates whether the node subsumes arg-I, arg-2, both or neither. |
Task definition | (2008), we trim the parse tree of a relation instance so that it contains only the most essential components. |
Generating from the KBGen Knowledge-Base | To extract a Feature-Based Lexicalised Tree Adjoining Grammar (FB-LTAG) from the KBGen data, we parse the sentences of the training corpus; project the entity and event variables to the syntactic projection of the strings they are aligned with; and extract the elementary trees of the resulting FB-LTAG from the parse tree using semantic information. |
Generating from the KBGen Knowledge-Base | After alignment, the entity and event variables occurring in the input semantics are associated with substrings of the yield of the syntactic parse tree . |
Generating from the KBGen Knowledge-Base | Once entity and event variables have been projected up the parse trees , we extract elementary FB-LTAG trees and their semantics from the input scenario as follows. |
Introduction | (2006) statistically report that discontinuities are very useful for translational equivalence analysis using binary branching structures under word alignment and parse tree constraints. |
NonContiguous Tree sequence Alignment-based Model | Figure 2: A word-aligned parse tree pair |
NonContiguous Tree sequence Alignment-based Model | Given the source and target sentences f1J and e1I, as well as the corresponding parse trees T(f1J) and T(e1I), our approach directly approximates the posterior probability Pr(T(e1I) | T(f1J)) based on the log-linear framework: |
Tree Sequence Pair Extraction | (2006) also reports that allowing gaps in one side only is enough to eliminate the hierarchical alignment failure with word alignment and one side parse tree constraints. |
Human Language Project | It is also notoriously difficult to obtain agreement about how parse trees should be defined in one language, much less in many languages simultaneously. |
Human Language Project | Let us suppose that the purpose of a parse tree is to mediate interpretation. |
Human Language Project | consensus on parse trees is difficult, obtaining consensus on meaning representations is impossible. |
Mention Extraction System | We extract the label of the parse tree constituent (if it exists) that exactly covers the mention, and also labels of all constituents that covers the mention. |
Mention Extraction System | From a sentence, we gather the following as candidate mentions: all nouns and possessive pronouns, all named entities annotated by the NE tagger (Ratinov and Roth, 2009), all base noun phrase (NP) chunks, all chunks satisfying the pattern: NP (PP NP)+, all NP constituents in the syntactic parse tree, and from each of these constituents, all substrings consisting of two or more words, provided the substrings do not start nor end on punctuation marks. |
Syntactico-Semantic Structures | parse-label of the parse tree constituent that exactly covers mi |
Syntactico-Semantic Structures | parse-labels of parse tree constituents covering mi |
Projected Classification Instance | Suppose a bilingual sentence pair, composed of a source sentence e and its target translation f. ye is the parse tree of the source sentence. |
Projected Classification Instance | We define a boolean-valued function δ(y, i, j, r) to investigate the dependency relationship of word i and word j in parse tree y: |
Word-Pair Classification Model | Ideally, given the classification results for all candidate word pairs, the dependency parse tree can be composed of the candidate edges with higher score (1 for the boolean-valued classifier, and large p for the real-valued classifier). |
Word-Pair Classification Model | This strategy alleviates the classification errors to some degree and ensures a valid, complete dependency parse tree. |
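A naive version of this composition step might look as follows: pick the best-scoring head for each word, then repair any cycles by reattaching one node per cycle to the root. The score-matrix interface is our own, and the actual validity strategy in the cited work may differ; this is only a sketch of the idea.

```python
def greedy_heads(p):
    """p[h][m] is the classifier score for head h of word m; index 0 is root.
    Returns a valid (possibly non-optimal) head vector."""
    n = len(p)
    head = [0] * n
    for m in range(1, n):
        head[m] = max((h for h in range(n) if h != m), key=lambda h: p[h][m])
    for m in range(1, n):                  # break any cycle through m
        seen, h = {m}, head[m]
        while h != 0 and h not in seen:
            seen.add(h)
            h = head[h]
        if h != 0:                         # we walked back into `seen`: a cycle
            head[m] = 0
    return head

heads = greedy_heads([[0, 1, 1],
                      [0, 0, 9],
                      [0, 9, 0]])
```

Here words 1 and 2 initially choose each other as heads; the repair pass detaches word 1 to the root, leaving a valid tree.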
Introduction | By incorporating the syntactic annotations of parse trees from both or either side(s) of the bitext, they are believed better than phrase-based counterparts in reorderings. |
Introduction | Depending on the type of input, these models can be broadly divided into two categories (see Table 1): the string-based systems whose input is a string to be simultaneously parsed and translated by a synchronous grammar, and the tree-based systems whose input is already a parse tree to be directly converted into a target tree or string. |
Model | A constituency forest (in Figure 1 left) is a compact representation of all the derivations (i.e., parse trees ) for a given sentence under a context-free grammar (Billot and Lang, 1989). |
Model | The solid line in Figure 1 shows the best parse tree , while the dashed one shows the second best tree. |
Parsing Time Expressions | Figure 1: A CCG parse tree for the mention “one week ago.” The tree includes forward (>) and backward (<) application, as well as two type-shifting operations |
Parsing Time Expressions | The lexicon pairs words with categories and the combinators define how to combine categories to create complete parse trees . |
Parsing Time Expressions | For example, Figure 1 shows a CCG parse tree for the phrase “one week ago.” The parse tree is read top to bottom, starting from assigning categories to words using the lexicon. |
Resolution | Model: Let y be a context-dependent CCG parse, which includes a parse tree TR(y), a set of context operations CNTX(y) applied to the logical form at the root of the tree, a final context-dependent logical form LF(y), and a TIMEX3 value. Define φ(m, D, y) ∈ Rd to be a d-dimensional feature-vector representation and θ ∈ Rd to be a parameter vector. |
Conclusion and discussion | with the model of Zollmann and Venugopal (2006), using heuristically generated labels from parse trees . |
Introduction | (2006), target language parse trees are used to identify rules and label their nonterminal symbols, while Liu et al. |
Introduction | (2006) use source language parse trees instead. |
Introduction | Zollmann and Venugopal (2006) directly extend the rule extraction procedure from Chiang (2005) to heuristically label any phrase pair based on target language parse trees . |
Experiments | Most likely, this is because TextRunner’s heuristics rely on parse trees to label training examples, |
Experiments | The Stanford Parser is used to derive dependencies from CJ50 and gold parse trees . |
Related Work | Deep features are derived from parse trees with the hope of training better extractors (Zhang et al., 2006; Zhao and Grishman, 2005; Bunescu and Mooney, 2005; Wang, 2008). |
Wikipedia-based Open IE | Third, it discards the sentence if the subject and the attribute value do not appear in the same clause (or in parent/child clauses) in the parse tree.
Abstract | First, it is semantic based in that it takes as input a deep semantic representation rather than, e.g., a sentence or a parse tree.
Introduction | While previous simplification approaches start from either the input sentence or its parse tree, our model takes as input a deep semantic representation, namely the Discourse Representation Structure (DRS, (Kamp, 1981)) assigned by Boxer (Curran et al., 2007) to the input complex sentence.
Related Work | Their simplification model encodes the probabilities for four rewriting operations on the parse tree of an input sentence, namely substitution, reordering, splitting and deletion.
Related Work | (2010) and the edit history of Simple Wikipedia, Woodsend and Lapata (2011) learn a quasi-synchronous grammar (Smith and Eisner, 2006) describing a loose alignment between parse trees of complex and of simple sentences.
Background | We define an edge’s figure-of-merit (FOM) as an estimate of the product of its inside (β) and outside (α) scores, conceptually the relative merit the edge has to participate in the final parse tree (see Figure 1).
Background | predictions about the unlabeled constituent structure of the target parse tree.
Beam-Width Prediction | The optimal point will necessarily be very conservative, allowing outliers (sentences or sub-phrases with above average ambiguity) to stay within the beam and produce valid parse trees.
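As a sketch of how such a figure-of-merit can be used, the snippet below ranks chart edges by the product (a sum in log space) of an inside score and an outside estimate; the edge labels and scores are made-up numbers, not output of a real grammar:

```python
# Rank chart edges by FOM = inside * outside-estimate.
# Working in log space, the product becomes a sum.

def fom(log_inside, log_outside_estimate):
    return log_inside + log_outside_estimate

# (label, start, end) -> (log inside score, log outside estimate); toy values.
edges = {
    ("NP", 0, 2): (-1.2, -0.5),
    ("VP", 2, 5): (-2.0, -0.1),
    ("PP", 3, 5): (-0.7, -3.0),
}

# A best-first or beam parser would expand the highest-FOM edge first.
ranked = sorted(edges, key=lambda e: fom(*edges[e]), reverse=True)
```

This mirrors the idea that an edge's priority reflects its estimated merit of appearing in the final parse tree.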
Introduction | Exhaustive search for the maximum likelihood parse tree with a state-of-the-art grammar can require over a minute of processing for a single sentence of 25 words, an unacceptable amount of time for real-time applications or when processing millions of sentences. |
Heuristics-based pattern extraction | The inputs to the algorithm are a parse tree T and a set of target entities E. We first generate combinations of 1-3 elements of E (line 10), then for each combination C we identify all the nodes in T that mention any of the entities in C. We continue by constructing the MST of these nodes, and finally apply our heuristics to the nodes in the MST.
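The combination step (all 1- to 3-element subsets of E) can be sketched with itertools; the entity names below are invented for illustration:

```python
from itertools import combinations

def entity_combinations(entities):
    """All 1-, 2- and 3-element combinations of the target entity set."""
    ordered = sorted(entities)   # fix an order so output is deterministic
    combos = []
    for size in (1, 2, 3):
        combos.extend(combinations(ordered, size))
    return combos

# A set of three entities yields 3 + 3 + 1 = 7 combinations.
combos = entity_combinations({"OBAMA", "HAWAII", "US"})
```

Each returned combination C would then be matched against the nodes of T before building the MST over the matching nodes.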
Memory-based pattern extraction | We highlighted in bold the path corresponding to the linearized form (b) of the example parse tree (a). |
Pattern extraction by sentence compression | Instead, we chose to modify the method of Filippova and Altun (2013) because it relies on dependency parse trees and does not use any LM scoring. |
Approach | Most prior work on learning compositional semantic representations employs parse trees on their training data to structure their composition functions (Socher et al., 2012; Hermann and Blunsom, 2013, inter alia). |
Approach | While these methods have been shown to work in some cases, the need for parse trees and annotated data limits such approaches to resource-fortunate languages. |
Overview | This removes a number of constraints that normally come with CVM models, such as the need for syntactic parse trees, word alignment or annotated data as a training signal.
Introduction | Figure 1: An example of compositionality in ideological bias detection (red → conservative, blue → liberal, gray → neutral) in which modifier phrases and punctuation cause polarity switches at higher levels of the parse tree.
Recursive Neural Networks | Based on a parse tree , these words form phrases p (Figure 2). |
Where Compositionality Helps Detect Ideological Bias | The increased accuracy suggests that the trained RNNs are capable of detecting bias polarity switches at higher levels in parse trees . |
Background | Many of them focus on using tree kernels to learn parse tree structure related features (Collins and Duffy, 2001; Culotta and Sorensen, 2004; Bunescu and Mooney, 2005). |
Identifying Key Medical Relations | Figure 2: A Parse Tree Example |
Identifying Key Medical Relations | Consider the sentence: “Antibiotics are the standard therapy for Lyme disease”: MedicalESG first generates a dependency parse tree (Figure 2) to represent grammatical relations between the words in the sentence, and then associates the words with CUIs. |
Abstract | We first design two discourse-aware similarity measures, which use all-subtree kernels to compare discourse parse trees in accordance with the Rhetorical Structure Theory. |
Conclusions and Future Work | First, we defined two simple discourse-aware similarity metrics (lexicalized and un-lexicalized), which use the all-subtree kernel to compute similarity between discourse parse trees in accordance with the Rhetorical Structure Theory. |
Experimental Setup | Combination of four metrics based on syntactic information from constituency and dependency parse trees: ‘CP-STM-4’, ‘DP-HWCM_c-4’, ‘DP-HWCM1-4’, and ‘DP-Or(*)’.
Background | Figure 2: A parse tree (left) and its descending paths according to Definition 1 (l = length).
Methods | Definition 1 (Descending Path): Let T be a parse tree, v any nonterminal node in T, and d a descendant of v, including terminals.
Methods | Figure 2 illustrates a parse tree and its descending paths of different lengths. |
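One illustrative reading of that definition, enumerating every descending path from each nonterminal to each of its descendants, can be sketched with tuple-encoded toy trees (a nonterminal is (label, children...), a terminal is a string); this is a sketch of the definition, not the authors' code:

```python
def paths_from(tree):
    """Label paths from the root of `tree` to each of its descendants."""
    if not isinstance(tree, tuple):
        return []            # terminals start no paths of their own
    label = tree[0]
    paths = []
    for child in tree[1:]:
        child_label = child[0] if isinstance(child, tuple) else child
        paths.append([label, child_label])          # length-1 path
        for p in paths_from(child):
            paths.append([label] + p)               # extend longer paths
    return paths

def all_descending_paths(tree):
    """Descending paths starting at every nonterminal node of the tree."""
    if not isinstance(tree, tuple):
        return []
    paths = paths_from(tree)
    for child in tree[1:]:
        paths.extend(all_descending_paths(child))
    return paths

toy = ("S", ("NP", "John"), ("VP", "runs"))
paths = all_descending_paths(toy)   # 6 paths in this toy tree
```

Paths of different lengths, as in Figure 2, correspond to the different numbers of edges traversed from v down to d.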
Abstract | Therefore, the proposed approach can not only capture source-tree-to-target-chunk correspondences but can also use forest structures that compactly encode an exponential number of parse trees to properly generate target function words during decoding. |
Backgrounds | 3The forest includes three parse trees rooted at c0, c1, and c2.
Composed Rule Extraction | • C(v): the complement span of v, which is the union of corresponding spans of nodes v' that share an identical parse tree with v but are neither ancestors nor descendants of v;
Efficient Prediction | Variables z indicate edges in the parse tree that have been cut in order to remove subtrees.
Joint Model | Variables yn indicate the presence of parse tree nodes. |
Joint Model | We represent a compressive summary as a vector y = (yn : n ∈ s, s ∈ c) of indicators, one for each nonterminal node in each parse tree of the sentences in the document set c. A word is present in the output summary if and only if its parent parse tree node n has yn = 1 (see Figure 1b).
Experiments | In Figure 3 we show for an example from section 22 the parse trees produced by our generative model and our feature-based discriminative model, and the correct parse. |
The Model | of the parse tree, given the sentence, not joint likelihood of the tree and sentence; and (b) probabilities are normalized globally instead of locally: the graphical-model depiction of our trees is undirected.
The Model | We define τ(s) to be the set of all possible parse trees for the given sentence licensed by the grammar G.
A Generative PCFG Model | w1 . . . wn, and a morphological analyzer, we look for the most probable parse tree π s.t.
A Generative PCFG Model | Hence, our parser searches for a parse tree π over lexemes (l1 . . . lk) s.t.
A Generative PCFG Model | Thus our proposed model is a proper model assigning probability mass to all (π, L) pairs, where π is a parse tree and L is the one and only lattice that a sequence of characters (and spaces) W over our alphabet gives rise to.
Summary and Outlook | Furthermore, we aim to use the verb class model in NLP tasks, (i) as resource for lexical induction of verb senses, verb alternations, and collocations, and (ii) as a lexical resource for the statistical disambiguation of parse trees . |
Verb Class Model 2.1 Probabilistic Model | Figure 1: Example parse tree.
Verb Class Model 2.1 Probabilistic Model | (b) The training tuples are processed: For each tuple, a PCFG parse forest as indicated in Figure 1 is built, and the Inside-Outside algorithm is applied to estimate the frequencies of the “parse tree rules”, given the current model probabilities.
Proposed Method | Let SE be an English sentence, TE the parse tree of SE, e a word of SE; we define the subtree and partial subtree following the definitions in (Ouangraoua et al., 2007).
Proposed Method | If ei is a descendant of ej in the parse tree, we remove posi from PE(e).
Proposed Method | Note that the Chinese patterns are not extracted from parse trees . |
Algorithm | A sequence of words will be marked as an argument of the verb if it is a constituent that does not contain the verb (according to the unsupervised parse tree), whose parent is an ancestor of the verb.
Algorithm | Each word in the argument is now represented by its word form (without lemmatization), its unsupervised POS tag and its depth in the parse tree of the argument. |
Algorithm | Instead, only those appearing in the top level (depth = 1) of the argument under its unsupervised parse tree are taken.
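The depth feature described above can be sketched with tuple-encoded toy trees, where words directly under the argument's root have depth 1; the argument subtree below is hypothetical:

```python
def word_depths(tree, depth=0):
    """(word, depth) pairs for every terminal under `tree`."""
    if not isinstance(tree, tuple):
        return [(tree, depth)]      # a bare string is a word
    pairs = []
    for child in tree[1:]:
        pairs.extend(word_depths(child, depth + 1))
    return pairs

# Hypothetical argument subtree: "the" sits at the top level (depth = 1).
arg = ("ARG", "the", ("X", "big", "dog"))
depths = dict(word_depths(arg))
top_level_words = [w for w, d in word_depths(arg) if d == 1]
```

Keeping only the depth-1 words implements the "top level of the argument" restriction.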
Introduction | We then gradually add in less-sparse alternatives for the syntactic features that previous systems derive from parse trees . |
Introduction | In standard SRL systems, these path features usually consist of a sequence of constituent parse nodes representing the shortest path through the parse tree between a word and the predicate (Gildea and Jurafsky, 2002). |
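A minimal sketch of such a path feature, given the ancestor-label lists of the word and of the predicate (each list running from the leaf's own category up to the root): the chains are joined at their lowest common ancestor, with '^' and 'v' standing in for the usual up/down arrows. The tree labels are illustrative:

```python
def path_feature(word_ancestors, pred_ancestors):
    """Join the two ancestor chains at their lowest common ancestor."""
    shared = 0
    # count labels shared from the root downward
    while (shared < min(len(word_ancestors), len(pred_ancestors))
           and word_ancestors[-1 - shared] == pred_ancestors[-1 - shared]):
        shared += 1
    up = word_ancestors[:len(word_ancestors) - shared + 1]     # word up to LCA
    down = pred_ancestors[:len(pred_ancestors) - shared + 1][::-1][1:]  # LCA down
    return "^".join(up) + ("v" + "v".join(down) if down else "")

# "He saw ...": NP up to S, then down through VP to the verb.
feat = path_feature(["NP", "S"], ["VBD", "VP", "S"])  # "NP^SvVPvVBD"
```

The resulting string is the kind of sparse categorical feature that the syntax-light alternatives in this work aim to replace.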
Introduction | We substitute paths that do not depend on parse trees.
Evaluation | The major drawback of PER is that not all decisions in pruning would impact alignment quality, since certain F-spans are of little use to the entire ITG parse tree.
Pruning in ITG Parsing | Once the complete parse tree is built, the k-best list of the topmost span is obtained by minimally expanding the list of alignment hypotheses of minimal number of span pairs. |
The DPDI Framework | If the sentence-level annotation satisfies the alignment constraints of ITG, then each F-span will have only one E-span in the parse tree.
Cognitively Grounded Cost Modeling | As for syntactic complexity, we use two measures based on structural complexity including (a) the number of nodes of a constituency parse tree which are dominated by the annotation phrase (cf. |
Experimental Design | We defined two measures for the complexity of the annotation examples: The syntactic complexity was given by the number of nodes in the constituent parse tree which are dominated by the annotation phrase (Szmrecsanyi, 2004).1 According to a threshold on the number of nodes in such a parse tree, we classified CNPs as having either high or low syntactic complexity.
Introduction | Structural complexity emerges, e.g., from the static topology of phrase structure trees and procedural graph traversals exploiting the topology of parse trees (see Szmrecsanyi (2004) or Cheung and Kemper (1992) for a survey of metrics of this type).
Approach | Figure 1 shows a portion of the parse tree for Sentence (1) (from Section 1). |
Approach | We extract the scope of the reference from the parse tree as follows. |
Approach | For example, the parse tree shown in Figure 1 suggests that the scope of the reference is: |
Character-based Chinese Parsing | As shown in Figure 5, a state ST consists of a stack S and a queue Q, where S = (..., s1, s0) contains partially constructed parse trees, and Q = (q0, q1, ..., qn−j) = (cj, cj+1, ..., cn) is the sequence of input characters that have not been processed.
Character-based Chinese Parsing | parse tree s0 must correspond to a full word
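A minimal sketch of that state representation, with an ASCII example in place of Chinese characters and only a SHIFT action shown (the class and action names are illustrative):

```python
class State:
    """Stack of partial subtrees plus queue of unprocessed characters."""

    def __init__(self, characters):
        self.stack = []                  # ..., s1, s0 (partial parse trees)
        self.queue = list(characters)    # cj, cj+1, ..., cn (unprocessed)

    def shift(self):
        """Move the next character from the queue onto the stack."""
        if self.queue:
            self.stack.append(self.queue.pop(0))

state = State("abc")
state.shift()    # stack now holds "a"; "b" and "c" remain queued
```

A full transition system would add actions that combine stack items into subtrees, so that s0 can come to span a complete word.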
Introduction | With richer information than word-level trees, this form of parse trees can be useful for all the aforementioned Chinese NLP applications. |
Forest-to-string Translation | The search problem is finding the derivation with the highest probability in the space of all derivations for all parse trees for an input sentence. |
Introduction | Second, the parse tree is restructured using our binarization algorithm, resulting in a binary packed forest. |
Source Tree Binarization | In a correct English parse tree , however, the subject-verb boundary is between “There” and “is”. |
Evaluation | In order to compare both approaches, parse trees generated by BKYc were automatically transformed in trees with the same MWE annotation scheme as the trees generated by BKY. |
MWE-dedicated Features | The reranker templates are instantiated only for the nodes of the candidate parse tree, which are leaves dominated by a MWE node (i.e.
MWE-dedicated Features | dominated by a MWE node m in the current parse tree p, |
Method | Since EDU boundaries are highly correlated with the syntactic structures embedded in the sentences, EDU segmentation is a relatively trivial step: using machine-generated syntactic parse trees, HILDA achieves an F-score of 93.8% for EDU segmentation.
Method | HILDA’s features: We incorporate the original features used in the HILDA discourse parser with slight modification, which include the following four types of features occurring in SL, SR, or both: (1) N-gram prefixes and suffixes; (2) syntactic tag prefixes and suffixes; (3) lexical heads in the constituent parse tree; and (4) POS tag of the dominating nodes.
Related work | They showed that the production rules extracted from constituent parse trees are the most effective features, while contextual features are the weakest. |
Experiments | We used syntactic features (i.e., features obtained from the dependency parse tree of a sentence) and lexical features, and entity types, which essentially correspond to the ones developed by Mintz et al.
Knowledge-based Distant Supervision | Since two entities mentioned in a sentence do not always have a relation, we select entity pairs from a corpus when: (i) the path of the dependency parse tree between the corresponding two named entities in the sentence is no longer than 4 and (ii) the path does not contain a sentence-like boundary, such as a relative clause1 (Banko et al., 2007; Banko and Etzioni, 2008). |
Wrong Label Reduction | We define a pattern as the entity types of an entity pair2 as well as the sequence of words on the path of the dependency parse tree from the first entity to the second one. |
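That pattern definition, combined with the path-length constraint stated in the selection criteria, can be sketched as follows; the entity types, words, and the helper name are invented for illustration:

```python
def make_pattern(type1, type2, path_words, max_len=4):
    """Entity-type pair plus dependency-path words; None if path too long."""
    if len(path_words) > max_len:
        return None                       # path longer than 4: pair rejected
    return (type1, type2, tuple(path_words))

pattern = make_pattern("PERSON", "LOCATION", ["was", "born", "in"])
# -> ("PERSON", "LOCATION", ("was", "born", "in"))
too_long = make_pattern("PERSON", "LOCATION", ["a", "b", "c", "d", "e"])  # None
```

Grouping sentences by such patterns is what lets the method spot entity pairs whose pattern disagrees with the knowledge-base relation.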
Inference | Following previous work, we design a blocked Metropolis-Hastings sampler that samples derivations per entire parse trees all at once in a joint fashion (Cohn and Blunsom, 2010; Shindo et al., 2011). |
Introduction | Recent work that incorporated Dirichlet process (DP) nonparametric models into TSGs has provided an efficient solution to the problem of segmenting training data trees into elementary parse tree fragments to form the grammar (Cohn et al., 2009; Cohn and Blunsom, 2010; Post and Gildea, 2009). |
Introduction | Figure 2: TIG-to-TSG transform: (a) and (b) illustrate transformed TSG derivations for two different TIG derivations of the same parse tree structure. |
Decoding | 12Theoretically, this allows the decoder to ignore unary parser nonterminals, which could also disappear when we make our rules shallow; e.g., the parse tree left in the pre-translation of Figure 5 can be matched by a rule with left-hand side NP(Official, forecasts).
Theoretical Model | Since we utilize syntactic parse trees , let us introduce trees first. |
Translation Model | 8Actually, t must embed in the parse tree of e; see Section 4.
Introduction | While parsing algorithms can be used to parse partial translations in phrase-based decoding, the search space is significantly enlarged since there are exponentially many parse trees for exponentially many translations. |
Introduction | They suffice to operate on well-formed structures and produce projective dependency parse trees.
Introduction | In addition, their algorithm produces phrasal dependency parse trees, while the leaves of our dependency trees are words, so that dependency language models can be directly used.
Introduction | The goal of supervised parsing is to learn a function g : X → Y, where X is the set of sentences and Y is the set of all possible labeled binary parse trees.
Introduction | The loss increases the more incorrect the proposed parse tree is (Goodman, 1998). |
Introduction | Assume, for now, we are given a labeled parse tree as shown in Fig. |
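One common instantiation of such a loss counts the labeled spans of the proposed tree that are missing from the gold tree; the sketch below uses (label, start, end) triples with invented values, and is one reasonable choice rather than the loss used in any particular paper:

```python
def span_loss(proposed_spans, gold_spans):
    """Number of proposed labeled spans not present in the gold tree."""
    return len(set(proposed_spans) - set(gold_spans))

gold = {("S", 0, 3), ("NP", 0, 1), ("VP", 1, 3)}
perfect = {("S", 0, 3), ("NP", 0, 1), ("VP", 1, 3)}    # loss 0
wrong = {("S", 0, 3), ("NP", 0, 2), ("VP", 2, 3)}      # two bad spans
```

The loss is zero for the correct tree and grows as the proposed bracketing diverges from the gold one, matching the requirement that more incorrect trees incur larger loss.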