Cohesive Decoding | The key to checking for interruptions quickly is knowing which subtrees T(r) to check for qualities (1:a,b,c).
Cohesive Decoding | A naïve approach would check every subtree that has begun translation; Figure 3a highlights the roots of all such subtrees for a hypothetical T and f_1^h. Fortunately, with a little analysis that accounts for f_{h+1}, we can show that at most two subtrees need to be checked.
Cohesive Decoding | For a given interruption-free f_1^h, we call subtrees that have begun translation, but are not yet complete, open subtrees.
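As a concrete rendering of this bookkeeping, here is a naive sketch in Python; the parent-array encoding of the source dependency tree and the `covered` set of translated source positions are my conventions, not the paper's. It enumerates every open subtree, whereas the analysis above shows a real decoder needs to inspect at most two.

```python
def subtree_span(parent, root):
    """Return the set of source positions inside the subtree T(root)."""
    n = len(parent)
    children = [[] for _ in range(n)]
    for i, p in enumerate(parent):
        if p >= 0:                       # -1 marks the tree root
            children[p].append(i)
    span, stack = set(), [root]
    while stack:
        node = stack.pop()
        span.add(node)
        stack.extend(children[node])
    return span

def open_subtrees(parent, covered):
    """Roots r whose subtree T(r) is partially, but not fully, covered."""
    roots = []
    for r in range(len(parent)):
        span = subtree_span(parent, r)
        inside = len(span & covered)
        if 0 < inside < len(span):       # begun but not complete
            roots.append(r)
    return roots
```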
Cohesive Phrasal Output | This alignment is used to project the spans of subtrees from the source tree onto the target sentence. |
Introduction | Equivalently, one can say that phrases in the source, defined by subtrees in its parse, remain contiguous after translation. |
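A small sketch of both ideas, assuming the word alignment is given as a collection of (source, target) index pairs and a subtree's source span as a set of source positions; the names are illustrative, not from either paper.

```python
def project_span(src_span, alignment):
    """Target span covered by words aligned to the source subtree."""
    tgt = [t for s, t in alignment if s in src_span]
    return (min(tgt), max(tgt)) if tgt else None

def stays_contiguous(src_span, alignment):
    """True if no target word inside the projected span aligns outside
    the subtree, i.e. the source phrase survives translation intact."""
    proj = project_span(src_span, alignment)
    if proj is None:
        return True
    lo, hi = proj
    return all(s in src_span for s, t in alignment if lo <= t <= hi)
```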
Abstract | Adaptor grammars (Johnson et al., 2007b) are a nonparametric Bayesian extension of Probabilistic Context-Free Grammars (PCFGs) which in effect learn the probabilities of entire subtrees.
From PCFGs to Adaptor Grammars | An adaptor grammar can be viewed as a kind of PCFG in which each subtree of each adapted nonterminal A ∈ M is a potential rule, with its own probability, so an adaptor grammar is nonparametric if there are infinitely many possible adapted subtrees.
From PCFGs to Adaptor Grammars | But any finite set of sample parses for any finite corpus can only involve a finite number of such subtrees, so the corresponding PCFG approximation only involves a finite number of rules, which permits us to build MCMC samplers for adaptor grammars.
From PCFGs to Adaptor Grammars | Adaptor grammars relax this independence assumption by in effect learning the probability of the subtrees rooted in a specified subset M of the nonterminals, known as the adapted nonterminals.
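For concreteness, the cache behind each adapted nonterminal is usually a Pitman-Yor process; the formulation below is the standard one (with discount a, concentration b, subtree counts n_t, table counts K_t, and totals n and K), not a formula quoted from this paper.

```latex
P(t_{n+1} = t \mid t_1, \ldots, t_n)
  \;=\; \frac{n_t - a\,K_t}{n + b}
  \;+\; \frac{a\,K + b}{n + b}\, P_{\mathrm{PCFG}}(t)
```

The first term reuses a cached subtree in proportion to how often it has been generated before; the second backs off to the underlying PCFG, which is how new subtrees enter the cache.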
Introduction | Second, we can generalize over arbitrary subtrees rather than local trees in much the way done in DOP or tree substitution grammar (Bod, 1998; Joshi, 2003), which leads to adaptor grammars.
Introduction | Informally, the units of generalization of adaptor grammars are entire subtrees, rather than just local trees, as in PCFGs.
Introduction | Just as in tree substitution grammars, each of these subtrees behaves as a new context-free rule that expands the subtree's root node to its leaves, but unlike a tree substitution grammar, in which the subtrees are specified in advance, in an adaptor grammar the subtrees, as well as their probabilities, are learnt from the training data.
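To make the rule reading concrete, this toy function (my illustration, using nested tuples for trees) maps a cached subtree to the context-free rule that expands its root directly to its leaves:

```python
def subtree_to_rule(tree):
    """tree: nested tuples, e.g. ('NP', ('DT', 'the'), ('NN', 'dog')).
    Returns (root, leaves) -- the implied rule root -> leaves."""
    def leaves(t):
        if isinstance(t, str):                    # terminal symbol
            return [t]
        return [w for child in t[1:] for w in leaves(child)]
    return tree[0], tuple(leaves(tree))

# subtree_to_rule(('NP', ('DT', 'the'), ('NN', 'dog')))
# -> ('NP', ('the', 'dog'))
```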
Word segmentation with adaptor grammars | Depending on the run, between 1,100 and 1,400 subtrees (i.e., new rules) were found for Word.
Experiments | Using more than one parse tree apparently improves the BLEU score, but at the cost of much slower decoding, since each of the top-k trees has to be decoded individually although they share many common subtrees.
Forest-based translation | Shown in Figure 3(a), these two parse trees can be represented as a single forest by sharing common subtrees such as NPB_{0,1} and VPB_{3,6}.
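A minimal sketch of such a packed forest as a hypergraph, where a node is a labeled span and each hyperedge records one way to build it from child nodes; shared nodes are stored once and referenced by every parse that uses them. The layout and example edges are mine, loosely echoing the figure.

```python
from collections import defaultdict

class Forest:
    """Packed forest: node -> list of alternative child tuples."""
    def __init__(self):
        self.edges = defaultdict(list)

    def add_edge(self, head, children):
        self.edges[head].append(tuple(children))

forest = Forest()
# The VP node packs two readings; NPB_{0,1} and VPB_{3,6} are each
# stored once and shared by both parses:
forest.add_edge(('VP', 1, 6), [('VV', 1, 2), ('NP', 2, 6)])    # reading 1
forest.add_edge(('VP', 1, 6), [('PP', 1, 3), ('VPB', 3, 6)])   # reading 2
forest.add_edge(('IP', 0, 6), [('NPB', 0, 1), ('VP', 1, 6)])
```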
Introduction | However, a k-best list, with its limited scope, often has too few variations and too many redundancies; for example, a 50-best list typically encodes a combination of 5 or 6 binary ambiguities (since 2^5 < 50 < 2^6), and many subtrees are repeated across different parses (Huang, 2008).
Introduction | Large-scale experiments (Section 4) show an improvement of 1.7 BLEU points over the 1-best baseline, which is also 0.8 points higher than decoding with 30-best trees, and takes even less time thanks to the sharing of common subtrees.
Tree-based systems | The "forest" in (Liu et al., 2007) was a misnomer which actually refers to a set of several unrelated subtrees over disjoint spans, and should not be confused with the standard concept of packed forest.
Tree-based systems | which results in two unfinished subtrees in (c). |
Tree-based systems | which perform phrasal translations for the two remaining subtrees, respectively, and get the Chinese translation in (e).
Experiments | Indeed, this demonstrates severe redundancy as another disadvantage of n-best lists: many subtrees are repeated across different parses, while the packed forest reduces space dramatically by sharing common sub-derivations (see Fig.
Forest Reranking | The unit NGramTree instance is for the pair (w_{j-1}, w_j) on the boundary between the two subtrees, whose smallest common ancestor is the current node.
Forest Reranking | Other unit NGramTree instances within this span have already been computed in the subtrees, except those for the boundary words of the whole node, w_i and w_{k-1}, which will be computed when this node is further combined with other nodes in the future.
Introduction | The key idea is to compute nonlocal features incrementally from bottom up, so that we can rerank the n-best subtrees at all internal nodes, instead of only at the root node as in conventional reranking (see Table 1). |
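A sketch of that incremental computation, simplified to bigrams: each subderivation carries only its boundary words, and combining two adjacent subtrees exposes exactly one new cross-boundary pair, matching the (w_{j-1}, w_j) bookkeeping above. The representation is my own, not the paper's.

```python
def combine(left, right):
    """left/right: (first_word, last_word, pairs_scored_so_far)."""
    lf, ll, lpairs = left
    rf, rl, rpairs = right
    new_pair = (ll, rf)          # the only bigram crossing the boundary
    return (lf, rl, lpairs + rpairs + [new_pair])

# Leaves carry themselves as both boundary words:
the, cat, sat = ('the', 'the', []), ('cat', 'cat', []), ('sat', 'sat', [])
np = combine(the, cat)   # scores ('the', 'cat')
s = combine(np, sat)     # scores ('cat', 'sat'); 'the'/'sat' stay open
```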
Supporting Forest Algorithms | In other words, the optimal F-score tree in a forest is not guaranteed to be composed of two optimal F-score subtrees.
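The non-decomposability is visible in the metric itself. Writing y for the predicted brackets and y* for the gold brackets,

```latex
F_1(y, y^*) \;=\; \frac{2\,\lvert y \cap y^* \rvert}{\lvert y \rvert + \lvert y^* \rvert}
```

the denominator sums brackets over the whole tree, so a tree's score is not a sum of subtree scores: a subtree with the best local ratio can contribute the wrong (match, size) trade-off globally, which is why only the pair of counts, not the ratio, can be propagated bottom-up.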
Experiments | First, only subtrees containing no more than 10 words were used to induce English patterns. |
Proposed Method | To induce the aligned patterns, we first induce the English patterns using the subtrees and partial subtrees.
Proposed Method | Note that a subtree may contain several partial subtrees.
Proposed Method | In this paper, all the possible partial subtrees are considered when extracting paraphrase patterns. |
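One concrete reading of "all the possible partial subtrees": at every node, each child may either be expanded further or cut off at its label, becoming a frontier node. A sketch under that assumption; the (label, children) encoding is mine.

```python
from itertools import product

def partial_subtrees(tree):
    """Enumerate partial subtrees rooted at `tree`.
    A tree is (label, [children]); a leaf is (label, [])."""
    label, children = tree
    if not children:
        return [(label, [])]
    options = []
    for child in children:
        opts = list(partial_subtrees(child))
        if child[1]:                 # internal child: may also be cut off
            opts = [(child[0], [])] + opts
        options.append(opts)
    return [(label, list(kids)) for kids in product(*options)]

# partial_subtrees(('VP', [('VB', []), ('NP', [('DT', []), ('NN', [])])]))
# -> the full subtree and the variant with NP cut to a frontier node
```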