Cohesive Decoding | The key to checking for interruptions quickly is knowing which subtrees T(r) to check for qualities (1:a,b,c).
Cohesive Decoding | A naïve approach would check every subtree that has begun translation; Figure 3a highlights the roots of all such subtrees for a hypothetical T. Fortunately, with a little analysis that accounts for f_{h+1}, we can show that at most two subtrees need to be checked.
Cohesive Decoding | For a given interruption-free f_1^h, we call subtrees that have begun translation, but are not yet complete, open subtrees.
Cohesive Phrasal Output | This alignment is used to project the spans of subtrees from the source tree onto the target sentence. |
Introduction | Equivalently, one can say that phrases in the source, defined by subtrees in its parse, remain contiguous after translation. |
Abstract | Adaptor grammars (Johnson et al., 2007b) are a nonparametric Bayesian extension of Probabilistic Context-Free Grammars (PCFGs) which in effect learn the probabilities of entire subtrees.
From PCFGs to Adaptor Grammars | An adaptor grammar can be viewed as a kind of PCFG in which each subtree of each adapted nonterminal A ∈ M is a potential rule, with its own probability, so an adaptor grammar is nonparametric if there are infinitely many possible adapted subtrees.
From PCFGs to Adaptor Grammars | But any finite set of sample parses for any finite corpus can only involve a finite number of such subtrees, so the corresponding PCFG approximation only involves a finite number of rules, which permits us to build MCMC samplers for adaptor grammars.
From PCFGs to Adaptor Grammars | this independence assumption by, in effect, learning the probability of the subtrees rooted in a specified subset M of the nonterminals known as the adapted nonterminals.
Introduction | Second, we can generalize over arbitrary subtrees rather than local trees in much the way done in DOP or tree substitution grammar (Bod, 1998; Joshi, 2003), which leads to adaptor grammars.
Introduction | Informally, the units of generalization of adaptor grammars are entire subtrees, rather than just local trees, as in PCFGs.
Introduction | Just as in tree substitution grammars, each of these subtrees behaves as a new context-free rule that expands the subtree’s root node to its leaves, but unlike a tree substitution grammar, in which the subtrees are specified in advance, in an adaptor grammar the subtrees, as well as their probabilities, are learnt from the training data.
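The caching intuition behind adaptor grammars can be sketched in a few lines. This is a minimal illustration, not the papers' actual sampler: it uses a plain Chinese-restaurant-style reuse rule with a single concentration parameter `alpha`, and `base_sampler` is a hypothetical function that draws a fresh subtree from the underlying PCFG.

```python
import random
from collections import Counter

class AdaptedNonterminal:
    """Sketch of adaptor-grammar caching for one adapted nonterminal.
    Cached subtrees behave like learned rules with their own
    probabilities; new subtrees are drawn from the base grammar."""

    def __init__(self, base_sampler, alpha=1.0):
        self.base_sampler = base_sampler  # hypothetical PCFG sampler
        self.alpha = alpha
        self.cache = Counter()            # subtree -> reuse count
        self.total = 0

    def sample(self):
        # Reuse a cached subtree with probability total / (total + alpha),
        # choosing among cached subtrees in proportion to their counts;
        # otherwise generate a fresh subtree from the base PCFG.
        if self.total and random.random() < self.total / (self.total + self.alpha):
            subtree = random.choices(list(self.cache),
                                     weights=self.cache.values())[0]
        else:
            subtree = self.base_sampler()
        self.cache[subtree] += 1
        self.total += 1
        return subtree
```

Frequently reused subtrees accumulate probability mass, which is exactly the sense in which an adaptor grammar "learns" whole subtrees as rules.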
Word segmentation with adaptor grammars | Depending on the run, between 1,100 and 1,400 subtrees (i.e., new rules) were found for Word.
Bilingual subtree constraints | We use large-scale auto-parsed data to obtain subtrees on the target side. |
Bilingual subtree constraints | Then we generate the mapping rules to map the source subtrees onto the extracted target subtrees.
Bilingual subtree constraints | These features encode the constraints between bilingual subtrees, which are called bilingual subtree constraints.
Dependency parsing | We design bilingual subtree features, as described in Section 4, based on the constraints between the source subtrees and the target subtrees that are verified by the subtree list on the target side. |
Dependency parsing | The source subtrees are from the possible dependency relations. |
Introduction | The subtrees are extracted from large-scale auto-parsed monolingual data on the target side. |
Abstract | We translate the text summarization task into a problem of extracting a set of dependency subtrees in the document cluster. |
Budgeted Submodular Maximization with Cost Function | Let V be the finite set of all valid subtrees in the source documents, where valid subtrees are defined to be the ones that can be regarded as grammatical sentences. |
Budgeted Submodular Maximization with Cost Function | In this paper, we regard subtrees containing the root node of the sentence as valid. |
Budgeted Submodular Maximization with Cost Function | Accordingly, V denotes a set of all rooted subtrees in all sentences. |
Introduction | In this study, we avoid this difficulty by reducing the task to one of extracting dependency subtrees from sentences in the source documents. |
Introduction | The reduction replaces the difficulty of numerous linear constraints with another difficulty wherein two subtrees can share the same word token when they are selected from the same sentence, and as a result, the cost of the union of the two subtrees is not always the mere sum of their costs.
Introduction | by (Chi et al., 2004) for discovering frequently occurring subtrees in a database of labelled unordered trees. |
Introduction | Section 3 shows how to adapt this algorithm to mine the SR dependency trees for subtrees with high suspicion rate. |
Mining Dependency Trees | Since we work with subtrees of arbitrary length, we also need to check whether constructing a longer subtree is useful, that is, whether its suspicion rate is equal to or higher than the suspicion rate of any of the subtrees it contains.
Mining Dependency Trees | In that way, we avoid computing all subtrees (thus saving time and space). |
Mining Dependency Trees | Because we use a milder condition, however (we accept bigger trees whose suspicion rate is equal to the suspicion rate of any of their subtrees), some amount of
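The pruned enumeration described in the two lines above can be summarized in a short sketch. The interfaces are assumptions, not the paper's code: `extend(t)` yields candidate subtrees one node larger than `t`, and `suspicion(t)` returns its suspicion rate.

```python
def mine_suspicious_subtrees(seeds, extend, suspicion):
    """Grow subtrees only while their suspicion rate does not drop,
    so the full set of subtrees is never materialized."""
    frontier, kept = list(seeds), []
    while frontier:
        t = frontier.pop()
        kept.append(t)
        for bigger in extend(t):
            # Milder condition: equality is accepted, which admits some
            # redundant trees but keeps useful longer subtrees.
            if suspicion(bigger) >= suspicion(t):
                frontier.append(bigger)
    return kept
```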
Mining Trees | Mining for frequent subtrees is an important problem that has many applications such as XML data mining, web usage analysis and RNA classification. |
Mining Trees | The HybridTreeMiner (HTM) algorithm presented in (Chi et al., 2004) provides a complete and computationally efficient method for discovering frequently occurring subtrees in a database of labelled unordered trees and counting them.
Mining Trees | Second, the subtrees of the BFCF trees are enumerated in increasing size order using two tree operations called join and extension, and their support (the number of trees in the database that contain each subtree) is recorded.
Conclusion | We also discussed the effectiveness of sentence subtree selection that was not restricted to rooted subtrees.
Experiment | Rooted sentence subtree only selects rooted sentence subtrees.
Experiment | As we can see, subtree selection only selected important subtrees that did not include the parser’s root, e.g., purpose-clauses and that-clauses. |
Generating summary from nested tree | In particular, we extract a rooted document subtree from the document tree, and sentence subtrees from sentence trees in the document tree. |
Generating summary from nested tree | to extract non-rooted sentence subtrees, as we previously mentioned.
Generating summary from nested tree | Constraints (6)-(10) allow the model to extract subtrees that have an arbitrary root node. |
Introduction | Our method jointly utilizes relations between sentences and relations between words, and extracts a rooted document subtree from a document tree whose nodes are arbitrary subtrees of the sentence tree. |
Related work | However, these studies have only extracted rooted subtrees from sentences. |
Related work | The method of Filippova and Strube (2008) allows the model to extract non-rooted subtrees in sentence compression tasks that compress a single sentence with a given compression ratio. |
Introduction | Therefore, it runs in linear time and can take advantage of arbitrarily complex structural features from already constructed subtrees . |
Joint POS Tagging and Parsing with Nonlocal Features | Assuming an input sentence contains n words, in order to reach a terminal state, the initial state requires n sh-x actions to consume all words in δ, and n − 1 rl/rr-x actions to construct a complete parse tree by consuming all the subtrees in σ.
Joint POS Tagging and Parsing with Nonlocal Features | One advantage of transition-based constituent parsing is that it is capable of incorporating arbitrarily complex structural features from the already constructed subtrees in σ and unprocessed words in δ.
Joint POS Tagging and Parsing with Nonlocal Features | Instead, we attempt to extract nonlocal features from newly constructed subtrees during the decoding process as they become incrementally available and score newly generated parser states with them. |
Transition-based Constituent Parsing | A parser state s ∈ S is defined as a tuple s = (σ, δ), where σ is a stack which is maintained to hold partial subtrees that are already constructed, and δ is a queue which is used for storing word-POS pairs that remain unprocessed.
Transition-based Constituent Parsing | • REDUCE-BINARY-{L/R}-X (rl/rr-X): pop the top two subtrees from σ, combine them into a new tree with a node labeled with X, then push the new subtree back onto σ.
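A toy replay of this transition system makes the bookkeeping concrete. This is a sketch under simplified assumptions (gold action sequence, subtrees as nested tuples), not the parser's implementation:

```python
from collections import deque

def replay(actions, word_pos_pairs):
    """Replay sh-x / rl-x / rr-x actions: shifts move words from the
    queue onto the stack; binary reductions pop the top two subtrees
    and combine them under a node labeled x."""
    stack, queue = [], deque(word_pos_pairs)
    for act, label in actions:
        if act == "sh":                 # shift-x: consume one queue word
            word, _pos = queue.popleft()
            stack.append((label, word))
        elif act in ("rl", "rr"):       # reduce-binary-{L/R}-x
            right = stack.pop()
            left = stack.pop()
            stack.append((label, left, right))
    assert len(stack) == 1 and not queue   # terminal state reached
    return stack[0]

# e.g. replay([("sh", "NN"), ("sh", "VB"), ("rr", "S")],
#             [("dogs", "NN"), ("run", "VB")])
#      == ("S", ("NN", "dogs"), ("VB", "run"))
```

For n words this needs exactly n shifts and n − 1 binary reductions, matching the action count stated above.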
Experiments | Using more than one parse tree apparently improves the BLEU score, but at the cost of much slower decoding, since each of the top-k trees has to be decoded individually although they share many common subtrees . |
Forest-based translation | As shown in Figure 3(a), these two parse trees can be represented as a single forest by sharing common subtrees such as NPB0,1 and VPB3,6.
Introduction | However, a k-best list, with its limited scope, often has too few variations and too many redundancies; for example, a 50-best list typically encodes a combination of 5 or 6 binary ambiguities (since 2^5 < 50 < 2^6), and many subtrees are repeated across different parses (Huang, 2008).
Introduction | Large-scale experiments (Section 4) show an improvement of 1.7 BLEU points over the 1-best baseline, which is also 0.8 points higher than decoding with 30-best trees, and takes even less time thanks to the sharing of common subtrees.
Tree-based systems | (Liu et al., 2007) was a misnomer which actually refers to a set of several unrelated subtrees over disjoint spans, and should not be confused with the standard concept of packed forest. |
Tree-based systems | which results in two unfinished subtrees in (c). |
Tree-based systems | which perform phrasal translations for the two remaining subtrees, respectively, and get the Chinese translation in (e).
Shift-Reduce with Beam-Search | Following Zhang and Clark (2011), we define each item in the parser as a pair (s, q), where q is a queue of remaining input, consisting of words and a set of possible lexical categories for each word (with q0 being the front word), and s is the stack that holds subtrees s0, s1, … (with s0 at the top).
Shift-Reduce with Beam-Search | Subtrees on the stack are partial derivations.
The Dependency Model | If |s| > 0 and the subtrees on s can lead to a correct derivation of G using further actions, we say s is a partial-realization of G, denoted as s ~ G. And we define s ~ G for |s| = 0.
The Dependency Model | 2a; then a stack containing the two subtrees in Fig. |
The Dependency Model | 3a is a partial-realization, while a stack containing the three subtrees in Fig. |
The effect of the Italian connectives on the LIS translation | In effect, since we have hypothesized that the presence of a connective can affect the translation of the two subtrees that it connects, we would like to be able to align each of those subtrees to its translation. |
The effect of the Italian connectives on the LIS translation | We make the observation that, if two words belong to two different subtrees linked by a connective, so that the path between the two words goes through the connective, then the frontier between the LIS counterparts of those two subtrees should also lie along the path between the signs aligned with those two words. |
The effect of the Italian connectives on the LIS translation | Then, each pair of words belonging to different subtrees is linked by a path that goes through the connective in the original tree. |
Experiments | Indeed, this demonstrates severe redundancy as another disadvantage of n-best lists, where many subtrees are repeated across different parses, while the packed forest reduces space dramatically by sharing common sub-derivations (see Fig.
Forest Reranking | unit NGramTree instance is for the pair (w_{j-1}, w_j) on the boundary between the two subtrees, whose smallest common ancestor is the current node.
Forest Reranking | Other unit NGramTree instances within this span have already been computed in the subtrees, except those for the boundary words of the whole node, w_i and w_{k-1}, which will be computed when this node is further combined with other nodes in the future.
Introduction | The key idea is to compute nonlocal features incrementally from bottom up, so that we can rerank the n-best subtrees at all internal nodes, instead of only at the root node as in conventional reranking (see Table 1). |
Supporting Forest Algorithms | In other words, the optimal F-score tree in a forest is not guaranteed to be composed of two optimal F-score subtrees . |
Supervised Dependency Parsing | Under the graph-based model, the score of a dependency tree is factored into the scores of small subtrees p. |
Supervised Dependency Parsing | Figure 2: Two types of scoring subtrees in our second-order graph-based parsers. |
Supervised Dependency Parsing | We adopt the second-order graph-based dependency parsing model of McDonald and Pereira (2006) as our core parser, which incorporates features from the two kinds of subtrees in Fig. |
Introduction | They define distributions over the trees specified by a context-free grammar, but unlike probabilistic context-free grammars, they “learn” distributions over the possible subtrees of a user-specified set of “adapted” nonterminals. |
Introduction | set of parameters, if the set of possible subtrees of the adapted nonterminals is infinite). |
Introduction | Informally, Adaptor Grammars can be viewed as caching entire subtrees of the adapted nonterminals. |
Word segmentation with Adaptor Grammars | Because Word is an adapted nonterminal, the adaptor grammar memoises Word subtrees , which corresponds to learning the phone sequences for the words of the language. |
Generation Systems | Similar to the syntax component, the REG module is implemented as a ranker that selects surface RE subtrees for a given referential slot in a deep or shallow dependency tree. |
The Data Set | In the final representation of our data set, we integrate the RE and deep syntax annotation by replacing subtrees corresponding to an RE span. |
The Data Set | All RE subtrees for a referent in a text are collected in a candidate list which is initialized with three default REs: (i) a pronoun, (ii) a default nominal (e.g., “the Victim”), (iii) the empty RE.
The Data Set | In contrast to the GREC data sets, our RE candidates are not represented as the original surface strings, but as non-linearized subtrees.
Experiments | First, only subtrees containing no more than 10 words were used to induce English patterns. |
Proposed Method | To induce the aligned patterns, we first induce the English patterns using the subtrees and partial subtrees.
Proposed Method | Note that a subtree may contain several partial subtrees.
Proposed Method | In this paper, all the possible partial subtrees are considered when extracting paraphrase patterns. |
Hierarchical Coreference | More specifically, for each MH step, we first randomly select two subtrees headed by node-
Hierarchical Coreference | Otherwise, if r_i and r_j are subtrees in the same entity tree, then the following proposals are used instead: • Split Right – make the subtree r_j the root of a new entity by detaching it from its parent; • Collapse – if r_i has a parent, then move r_i's children to r_i's parent and then delete r_i.
Introduction | each step of inference is computationally efficient because evaluating the cost of attaching (or detaching) subtrees requires computing just a single compatibility function (as seen in Figure 1). |
Introduction | Finally, if memory is limited, redundant mentions can be pruned by replacing subtrees with their roots. |
Conclusion | The system reuses standard techniques for building projective trees by combining adjacent nodes (representing subtrees with adjacent yields), but adds a simple mechanism for swapping the order of nodes on the stack, which gives a system that is sound and complete for the set of all dependency trees over a given label set but behaves exactly like the standard system for the subset of projective trees. |
Introduction | This is not the case for the tree in Figure 1, where the subtrees rooted at node 2 (hearing) and node 4 (scheduled) both have discontinuous yields.
Transitions for Dependency Parsing | The fact that we can swap the order of nodes, implicitly representing subtrees, means that we can construct non-projective trees by applying LEFT-ARC_l or RIGHT-ARC_l to subtrees whose yields are not adjacent according to the original word order.
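The transition system with swapping can be sketched compactly. This follows the arc-standard system with online reordering of Nivre (2009) in spirit; the data representation is an assumption:

```python
def apply(transition, stack, buffer, arcs, label=None):
    """One step of an arc-standard system with a SWAP transition.
    SWAP moves the second-topmost stack node back to the buffer, so
    later LEFT-ARC/RIGHT-ARC steps can combine subtrees whose yields
    were not adjacent in the original word order."""
    if transition == "SHIFT":
        stack.append(buffer.pop(0))
    elif transition == "SWAP":          # permitted only if stack[-2] < stack[-1]
        buffer.insert(0, stack.pop(-2))
    elif transition == "LEFT-ARC":      # head = top, dependent = second
        dep = stack.pop(-2)
        arcs.append((stack[-1], label, dep))
    elif transition == "RIGHT-ARC":     # head = second, dependent = top
        dep = stack.pop()
        arcs.append((stack[-1], label, dep))
```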
Model | This algorithm proceeds in top-down fashion by sampling individual split points using the marginal probabilities of all possible subtrees . |
Model | To do so directly would involve simultaneously marginalizing over all possible subtrees as well as all possible alignments between such subtrees when sampling upper-level split points. |
Model | For every pair of nodes n1 ∈ T1, n2 ∈ T2, a table stores the marginal probability of the subtrees rooted at n1 and n2, respectively.
Efficient Prediction | Variables z indicate edges in the parse tree that have been cut in order to remove subtrees.
Efficient Prediction | Constraint 9 encodes the requirement that only full subtrees may be deleted. |
Joint Model | In our complete model, which jointly extracts and compresses sentences, we choose whether or not to cut individual subtrees in the constituency parses |
Joint Model | This ensures that only subtrees may be deleted. |
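The "full subtrees only" requirement reduces to one inequality per tree edge. A minimal sketch in PuLP (an illustrative solver choice; variable and function names are hypothetical, not the papers' formulation):

```python
from pulp import LpProblem, LpVariable, LpMaximize, LpBinary, lpSum, value

def compress(nodes, parent, score, max_tokens):
    """keep[n] = 1 iff node n survives.  Requiring
    keep[n] <= keep[parent[n]] means a deleted node takes its whole
    subtree with it, i.e. only full subtrees may be cut."""
    prob = LpProblem("compression", LpMaximize)
    keep = {n: LpVariable(f"keep_{n}", cat=LpBinary) for n in nodes}
    prob += lpSum(score[n] * keep[n] for n in nodes)      # objective
    for n in nodes:
        if parent[n] is not None:
            prob += keep[n] <= keep[parent[n]]            # full-subtree cuts
    prob += lpSum(keep.values()) <= max_tokens            # length budget
    prob.solve()
    return [n for n in nodes if value(keep[n]) > 0.5]
```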
The Acquisition of Bracketing Instances | From a binary bracketing instance, we derive a unary bracketing instance ⟨s, τ(c_{i..j})⟩, ignoring the subtrees τ(c_{i..n_j}) and τ(c_{j+1..q}).
The Syntax-Driven Bracketing Model 3.1 The Model | These features are to capture the relationship between a source phrase s and τ(s) or τ(s)'s subtrees.
The Syntax-Driven Bracketing Model 3.1 The Model | There are three different scenarios: 1) exact match, where s exactly matches the boundaries of τ(s) (figure 3(a)); 2) inside match, where s exactly spans a sequence of τ(s)'s subtrees (figure 3(b)); and 3) crossing, where s crosses the boundaries of one or two subtrees of τ(s) (figure 3(c)).
The Syntax-Driven Bracketing Model 3.1 The Model | The source phrase s2 exactly spans two subtrees VV and AS of VP, therefore CBMF is “VP-I”.
Composed Rule Extraction | In order to ensure that this rule is used during decoding, we must generate subtrees with a height of 7 for CO. |
Composed Rule Extraction | For one (binarized) hyperedge e of a node, suppose there are x subtrees in the left tail node and y subtrees in the right tail node.
Composed Rule Extraction | Then the number of subtrees guided by e is (x + 1) × (y + 1).
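The (x + 1) × (y + 1) count has a simple reading: each tail can either be cut off (treated as a frontier leaf, the +1 option) or replaced by any of its own subtrees. A sketch for the single-tree case (one hyperedge per node; the tuple encoding is an assumption):

```python
def subtree_count(node):
    """Number of subtrees rooted at `node` in a binarized tree:
    a leaf contributes 1; an internal node contributes
    (count(left) + 1) * (count(right) + 1), the +1 being the option
    of cutting that tail and keeping it as a frontier leaf."""
    if len(node) == 2:                  # (label, word) leaf
        return 1
    _label, left, right = node
    return (subtree_count(left) + 1) * (subtree_count(right) + 1)

# e.g. a node over two leaves has (1+1) * (1+1) = 4 subtrees:
# cut both tails, keep only the left, only the right, or keep both.
```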
Generating from the KBGen Knowledge-Base | First, the subtrees whose root node is indexed with an entity variable are extracted. |
Generating from the KBGen Knowledge-Base | Second, the subtrees capturing relations between variables are extracted. |
Generating from the KBGen Knowledge-Base | The minimal tree containing all and only the dependent variables D(X) of a variable X is then extracted and associated with the set of literals Φ such that Φ = {R(Y,Z) | (Y = X ∧ Z ∈ D(X)) ∨ (Y, Z ∈ D(X))}. This procedure extracts the subtrees relating the argument variables of a semantic functor such as an event or a role, e.g., a tree describing a verb and its arguments as shown in the top
Bottom-up tree-building | Rather, each relation node R_j attempts to model the relation of one single constituent U_j, by taking U_j's left and right subtrees U_{jL} and U_{jR} as its first-layer nodes; if U_j is a single EDU, then the first-layer node of R_j is simply U_j, and R_j is a special relation symbol LEAF.
Conclusions | In future work, we wish to further explore the idea of post-editing, since currently we use only the depth of the subtrees as upper-level information. |
Features | Substructure features: The root node of the left and right discourse subtrees of each unit. |
Introduction | However, while the former problem can be solved efficiently using the dynamic programming approach of McDonald (2006), there are no efficient algorithms to recover maximum weighted non-projective subtrees in a general directed graph. |
Multi-Structure Sentence Compression | 2.4 Dependency subtrees |
Multi-Structure Sentence Compression | In addition, to avoid producing multiple disconnected subtrees, only one dependency is permitted to attach to the ROOT pseudo-token.
Discourse Dependency Parsing | The algorithm begins by initializing all length-one subtrees to a score of 0.0. |
Discourse Dependency Parsing | over all the internal indices (i ≤ q < j) in the span, and calculating the value of merging the two subtrees and adding one new arc.
Discourse Dependency Parsing | This algorithm considers all the possible subtrees.
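A sketch of this bottom-up span DP, in the style of the first-order Eisner algorithm (the paper's algorithm is described only at this level of detail here; `score(h, m)` is an assumed arc score from head h to modifier m, with token 0 acting as the root):

```python
def eisner(n, score):
    """First-order Eisner-style DP: single-token spans start at 0.0,
    and longer spans are built by merging two adjacent subtrees at an
    internal split point, adding one new arc per merge."""
    NEG = float("-inf")
    # Direction 0 = head on the left, 1 = head on the right.
    C = [[[NEG, NEG] for _ in range(n)] for _ in range(n)]  # complete spans
    I = [[[NEG, NEG] for _ in range(n)] for _ in range(n)]  # incomplete spans
    for i in range(n):                   # length-one subtrees score 0.0
        C[i][i][0] = C[i][i][1] = 0.0
    for length in range(1, n):
        for i in range(n - length):
            j = i + length
            for q in range(i, j):        # all internal split indices
                both = C[i][q][0] + C[q + 1][j][1]
                I[i][j][0] = max(I[i][j][0], both + score(i, j))  # arc i -> j
                I[i][j][1] = max(I[i][j][1], both + score(j, i))  # arc j -> i
            for q in range(i, j):
                C[i][j][1] = max(C[i][j][1], C[i][q][1] + I[q][j][1])
            for q in range(i + 1, j + 1):
                C[i][j][0] = max(C[i][j][0], I[i][q][0] + C[q][j][0])
    return C[0][n - 1][0]                # best tree headed at the root
```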
Adaptor Grammars | Adaptor grammars are an example of this approach (Johnson et al., 2007b), where entire subtrees generated by a “base grammar” can be viewed as distinct rules (in that we learn a separate probability for each subtree). |
Adaptor Grammars | The inference task is nonparametric if there are an unbounded number of such subtrees . |
Adaptor Grammars | (Word s i) (Word D 6) (Word b U k) Because the Word nonterminal is adapted (indicated here by underlining) the adaptor grammar learns the probability of the entire Word subtrees (e.g., the probability that b U k is a Word); see Johnson (2008) for further details.
Our Discourse-Based Measures | In the present work, we use the convolution TK defined in (Collins and Duffy, 2001), which efficiently calculates the number of common subtrees in two trees. |
Our Discourse-Based Measures | Note that this kernel was originally designed for syntactic parsing, where the subtrees are subject to the constraint that their nodes are taken with either all or none of the children. |
Our Discourse-Based Measures | the nuclearity and the relations, in order to allow the tree kernel to give partial credit to subtrees that differ in labels but match in their skeletons. |
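The all-or-none-children constraint is exactly what the recursive kernel computation enforces. A sketch of the Collins and Duffy (2001) recursion (the tree encoding and decay parameter are illustrative assumptions):

```python
def tree_kernel(t1, t2, lam=0.5):
    """Convolution tree kernel: sums, over all node pairs, the number
    of common subtrees, where each counted subtree takes all or none
    of a node's children.  Trees are (label, child, ...) tuples with
    string leaves; lam down-weights large subtrees."""
    def nodes(t):
        yield t
        for c in t[1:]:
            if isinstance(c, tuple):
                yield from nodes(c)

    def C(n1, n2):
        # Productions must match: same label, same child labels/terminals.
        prod1 = [c[0] if isinstance(c, tuple) else c for c in n1[1:]]
        prod2 = [c[0] if isinstance(c, tuple) else c for c in n2[1:]]
        if n1[0] != n2[0] or prod1 != prod2:
            return 0.0
        result = lam
        for k1, k2 in zip(n1[1:], n2[1:]):
            if isinstance(k1, tuple) and isinstance(k2, tuple):
                result *= 1.0 + C(k1, k2)   # each child may stop or continue
        return result

    return sum(C(a, b) for a in nodes(t1) for b in nodes(t2))

# e.g. with lam = 1, the tree ("NP", ("DT", "the"), ("NN", "dog")) shares
# 6 subtrees with itself: DT -> the, NN -> dog, and 4 NP fragments.
```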
Introduction | Note, question decomposition only operates on the original question and question spans covered by complete dependency subtrees . |
Introduction | • h_syntax_subtree(Q), which counts the number of spans in Q that are (1) converted to formal triples, whose predicates are not Null, and (2) covered by complete dependency subtrees at the same time.
Introduction | The underlying intuition is that, dependency subtrees of Q should be treated as units for question translation. |
Experiments | General setup: In order to test the accuracy of structured prediction on medium-sized full-domain taxonomies, we extracted from WordNet 3.0 all bottomed-out full subtrees which had a tree-height of 3 (i.e., 4 nodes from root to leaf), and contained (10, 50] terms. This gives us 761 non-overlapping trees, which we partition into
Experiments | We tried this training regimen as different from that of the general setup (which contains only bottomed-out subtrees), so as to match the animal test tree, which is of depth 12 and has intermediate nodes from higher up in WordNet.
Experiments | For scaling the 2nd order sibling model, one can use approximations, e.g., pruning the set of sibling factors based on 1st-order link marginals, or a hierarchical coarse-to-fine approach based on taxonomy induction on subtrees, or a greedy approach of adding a few sibling factors at a time.
Analysis Scheme | We use “term” to refer to text expressions, and “components” to refer to nodes, edges, and subtrees . |
Integrating Discourse References into Entailment Recognition | Figure 1: The Substitution transformation, demonstrated on the relevant subtrees of Example (i). |
Integrating Discourse References into Entailment Recognition | For each bridging relation, it adds a specific subtree s_r via an edge labeled with lab_r.
Introduction | Our hypothesis is a generalization of the original hypothesis since it allows a reducible sequence to form several adjacent subtrees . |
Related Work | Our dependency model contained a submodel which directly prioritized subtrees that form reducible sequences of POS tags. |
STOP-probability estimation | Or, in terms of dependency structure: A reducible sequence consists of one or more adjacent subtrees . |
Notation | We also need a substitution that replaces subtrees.
Notation | Then t[p_i ← t_i | 1 ≤ i ≤ n] denotes the tree that is obtained from t by replacing (in parallel) the subtrees at p_i by t_i for every i ∈ [n].
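Read operationally, the substitution replaces the subtree at each position p_i with t_i. A minimal sketch for one position, with positions as Gorn addresses (tuples of 0-based child indices); parallel substitution at pairwise-disjoint positions just applies this once per address:

```python
def replace_at(t, pos, sub):
    """Return t with the subtree at address `pos` replaced by `sub`.
    Trees are (label, child, ...) tuples; child k of a node is
    element k + 1 of its tuple."""
    if not pos:                          # empty address: replace the root
        return sub
    k, rest = pos[0], pos[1:]
    child = replace_at(t[k + 1], rest, sub)
    return t[:k + 1] + (child,) + t[k + 2:]

# e.g. replace_at(("S", ("NP", "it"), ("VP", "ran")), (1,), ("VP", "slept"))
#      == ("S", ("NP", "it"), ("VP", "slept"))
```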
Preservation of regularity | This is necessary because those two mentioned subtrees must reproduce t1 and t2 from the end of the ‘X’-chain.
IRTG binarization | 5b): (i) they are equivalent to h1(a) and h2(a), respectively, and (ii) at each node at most two subtrees contain variables. |
IRTG binarization | The variable set of t is the set of all variables that occur in t. The set S(t) of subtree variables of t consists of the nonempty variable sets of all subtrees of t. We represent S(t) as a tree v(t), which we call the variable tree, as follows.
IRTG binarization | at most two subtrees with variables; and (iii) the terms t1, …
Background | The PTB tree is constructed from the CCG bottom-up, creating leaves with lexical schemas, then merging/adding subtrees using rule schemas at each step.
Our Approach | (S* f {a}) or default to X f, Place subtrees (PP f0 (S f1 k a))
Our Approach | The subscripts indicate which subtrees to place where. |