Abstract | As the algorithm generates dependency trees for partial translations left-to-right in decoding, it allows for efficient integration of both n-gram and dependency language models. |
Introduction | 4. exploiting syntactic information: as the shift-reduce parsing algorithm generates target language dependency trees in decoding, dependency language models (Shen et al., 2008; Shen et al., 2010) can be used to encourage linguistically-motivated reordering. |
Introduction | 5. resolving local parsing ambiguity: as dependency trees for phrases are memorized in rules, our approach avoids resolving local parsing ambiguity and explores a smaller search space than parsing word-by-word on the fly in decoding (Galley and Manning, 2009). |
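Since items 4 and 5 both hinge on shift-reduce parsing of the target side during decoding, a minimal sketch may help. The toy arc-standard transition system below (action names and data layout are ours, not the paper's) shows how SHIFT/LEFT/RIGHT actions assemble a dependency tree left-to-right:

```python
# Minimal arc-standard shift-reduce sketch: builds a dependency tree
# over target words left-to-right. Illustrative only; a real decoder
# interleaves these actions with rule application and LM scoring.

def shift_reduce(words, actions):
    """words: target tokens; actions: sequence of
    'SH' (shift), 'LA' (left-arc), 'RA' (right-arc)."""
    buffer = list(range(len(words)))   # token indices, left to right
    stack, arcs = [], []               # arcs: (head, dependent) pairs
    for act in actions:
        if act == 'SH':
            stack.append(buffer.pop(0))
        elif act == 'LA':              # top heads second-from-top
            dep = stack.pop(-2)
            arcs.append((stack[-1], dep))
        elif act == 'RA':              # second-from-top heads top
            dep = stack.pop()
            arcs.append((stack[-1], dep))
    return arcs

# e.g. "saw her" with 'saw' heading 'her':
print(shift_reduce(["saw", "her"], ["SH", "SH", "RA"]))  # [(0, 1)]
```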
Introduction | Figure 1 shows a training example consisting of a (romanized) Chinese sentence, an English dependency tree, and the word alignment between them. |
Background | Poon and Domingos (2009, 2010) induce a meaning representation by clustering synonymous lambda-calculus forms stemming from partitions of dependency trees. |
Background | USP defines a probabilistic model over the dependency tree and semantic parse using Markov logic (Domingos and Lowd, 2009), and recursively clusters and composes synonymous dependency treelets using a hard EM-like procedure. |
Background | Top: the dependency tree of the sentence is annotated with latent semantic states by GUSP. |
Grounded Unsupervised Semantic Parsing | GUSP produces a semantic parse of the question by annotating its dependency tree with latent semantic states. |
Grounded Unsupervised Semantic Parsing | Second, in contrast to most existing approaches for semantic parsing, GUSP starts directly from dependency trees and focuses on translating them into semantic parses. |
Grounded Unsupervised Semantic Parsing | To combat this problem, GUSP introduces a novel dependency-based meaning representation with an augmented state space to account for semantic relations that are nonlocal in the dependency tree. |
Introduction | GUSP starts with the dependency tree of a sentence and produces a semantic parse by annotating the nodes and edges with latent semantic states derived from the database. |
Dependency Parser | In this section we present a novel transition-based parser for projective dependency trees, implementing a dynamic parsing strategy. |
Dependency Parser | A dependency tree for $w$ is a directed, ordered tree $T_w = (V_w, A_w)$, where $V_w = \{w_i \mid i \in [n]\}$ is the set of nodes. |
Dependency Parser | Figure 2: A dependency tree with left spine $\langle w_4, w_2, w_1 \rangle$ and right spine $\langle w_4, w_7 \rangle$. |
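For illustration, here is a minimal sketch of spine extraction, assuming the left (right) spine is the root followed by the chain of successive leftmost (rightmost) dependents, consistent with the caption above; the data layout is ours:

```python
# Sketch of spine extraction, assuming the left (right) spine of a
# subtree is the root followed by the chain of leftmost (rightmost)
# dependents, as in the figure caption above.

def spine(root, children, leftmost=True):
    """children: dict mapping each head to its dependents in word order."""
    path, node = [root], root
    while children.get(node):
        node = children[node][0] if leftmost else children[node][-1]
        path.append(node)
    return path

children = {'w4': ['w2', 'w7'], 'w2': ['w1']}   # roughly the tree of Figure 2
print(spine('w4', children, leftmost=True))     # ['w4', 'w2', 'w1']
print(spine('w4', children, leftmost=False))    # ['w4', 'w7']
```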
Model and Training | The training data set consists of pairs $(w, A_g)$, where $w$ is a sentence and $A_g$ is the set of arcs of the gold (desired) dependency tree for $w$. |
Model and Training | We remark here that this abstraction also makes the feature representation more similar to the ones typically found in graph-based parsers, which are centered on arcs or subgraphs of the dependency tree. |
Abstract | This paper proposes a nonparametric Bayesian method for inducing Part-of-Speech (POS) tags in dependency trees to improve the performance of statistical machine translation (SMT). |
Abstract | In particular, we extend the monolingual infinite tree model (Finkel et al., 2007) to a bilingual scenario: each hidden state (POS tag) of a source-side dependency tree emits a source word together with its aligned target word, either jointly (joint model), or independently (independent model). |
Abstract | Evaluations of Japanese-to-English translation on the NTCIR-9 data show that our induced Japanese POS tags for dependency trees improve the performance of a forest-to-string SMT system. |
Bilingual Infinite Tree Model | Specifically, the proposed model introduces bilingual observations by embedding the aligned target words in the source-side dependency trees. |
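The joint and independent emission variants described in the abstract can be written compactly as follows (notation ours): with hidden state $z$, source word $s$, and its aligned target word $t$, the joint model emits the pair from one distribution, $P(s, t \mid z)$, while the independent model factorizes it as $P(s \mid z)\, P(t \mid z)$.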
Bilingual Infinite Tree Model | We have assumed a completely unsupervised way of inducing POS tags in dependency trees. |
Bilingual Infinite Tree Model | Specifically, we introduce an auxiliary variable $u_t$ for each node in a dependency tree to limit the number of possible transitions. |
Discussion | Note that the dependency accuracies are measured on the automatically parsed dependency trees, not on the syntactically correct gold standard trees. |
Experiment | Since we focus on word-level POS induction, each bunsetsu-based dependency tree is converted into its corresponding word-based dependency tree using the following heuristic: first, the last function word inside each bunsetsu is identified as the head word; then, the remaining words are treated as dependents of the head word in the same bunsetsu; finally, a bunsetsu-based dependency structure is transformed into a word-based dependency structure by preserving the head/modifier relationships of the determined head words. |
Experiment | We could use other word-based dependency trees, such as trees produced by the infinite PCFG model (Liang et al., 2007) or the syntactic-head and semantic-head dependency trees of Nakazawa and Kurohashi (2012), although this is not our major focus. |
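A sketch of the bunsetsu-to-word conversion heuristic, under an assumed data layout (each bunsetsu as a list of (word, is_function_word) pairs; all names are illustrative):

```python
# Sketch of the three-step bunsetsu-to-word conversion heuristic.
# bunsetsus: list of chunks, each a list of (word, is_function_word);
# bunsetsu_heads[i]: index of the bunsetsu that bunsetsu i depends on
# (-1 for the root bunsetsu).

def bunsetsu_to_word_tree(bunsetsus, bunsetsu_heads):
    word_heads, bunsetsu_head_word, offset = {}, [], 0
    for chunk in bunsetsus:
        # 1. the last function word in the bunsetsu is its head word
        #    (fall back to the last word if no function word exists)
        func = [j for j, (_, is_f) in enumerate(chunk) if is_f]
        head_j = offset + (func[-1] if func else len(chunk) - 1)
        bunsetsu_head_word.append(head_j)
        # 2. the remaining words depend on that head word
        for j in range(len(chunk)):
            if offset + j != head_j:
                word_heads[offset + j] = head_j
        offset += len(chunk)
    # 3. carry over bunsetsu-level dependencies between head words
    for i, h in enumerate(bunsetsu_heads):
        word_heads[bunsetsu_head_word[i]] = (
            bunsetsu_head_word[h] if h >= 0 else -1)
    return word_heads   # word index -> head word index, -1 marks the root
```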
Experiment | In this step, we train a Japanese dependency parser on the 10,000 Japanese dependency trees with the induced POS tags derived from Step 2. |
Introduction | Experiments are carried out on the NTCIR-9 Japanese-to-English task using a binarized forest-to-string SMT system with dependency trees as its source side. |
Experiments | This is quite high, but it is still one of the lowest among the more frequent tags, and thus verbs tend to be the roots of the dependency trees. |
Inference | A random projective dependency tree is assigned to each sentence in the corpus. |
Inference | For each sentence, we sample a new dependency tree based on all other trees that are currently in the corpus. |
Inference | Parsing: Based on the collected counts, we compute the final dependency trees using the Chu-Liu/Edmonds algorithm (1965) for finding maximum directed spanning trees. |
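A sketch of this final decoding step; networkx's Edmonds implementation stands in for a hand-rolled Chu-Liu/Edmonds, and the weight function is a toy stand-in for the collected counts:

```python
# Turn accumulated arc weights into a maximum spanning arborescence
# (Chu-Liu/Edmonds). networkx is used as a stand-in implementation.
import networkx as nx

def decode_tree(n_words, arc_weight):
    """arc_weight(h, d): accumulated weight for arc h -> d.
    Node 0 is an artificial root; words are 1..n_words."""
    G = nx.DiGraph()
    for d in range(1, n_words + 1):
        for h in range(0, n_words + 1):
            if h != d:
                G.add_edge(h, d, weight=arc_weight(h, d))
    tree = nx.maximum_spanning_arborescence(G)
    return sorted(tree.edges())        # list of (head, dependent) arcs

# Toy example: 3 words, with left-to-right arcs preferred.
print(decode_tree(3, lambda h, d: 1.0 if h < d else 0.1))
```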
Introduction | The adjacent-word baseline is a dependency tree in which each word is attached to the previous (or the following) word. |
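The two baselines in code form (a trivial sketch):

```python
# Adjacent-word baselines: every word attaches to its previous
# (or following) neighbor; the first (last) word becomes the root.
def attach_to_previous(n):  return [i - 1 for i in range(n)]               # -1 marks the root
def attach_to_following(n): return [i + 1 for i in range(n - 1)] + [-1]
```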
Introduction | Figure 1: Example of a dependency tree. |
Introduction | Figure 1 shows an example of a dependency tree . |
Model | The probability of the whole dependency tree T is the following: |
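Under a standard edge-factored assumption, the tree probability would take a form like $P(T) = \prod_{(w_h \rightarrow w_d) \in T} P(w_d \mid w_h)$, i.e., a product over head-dependent arcs, though the paper's exact factorization may differ.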
Bk ← BESTS(x, Bk-1, W) | The algorithm proceeds until only one subtree is left, which is the dependency tree of the input sentence (see the example in Figure 2). |
Bk ← BESTS(x, Bk-1, W) | Once an incorrect action is selected, it can never yield the correct dependency tree. |
Bk ← BESTS(x, Bk-1, W) | Finally, it returns the dependency tree built by the top action sequence in Bn-1. |
Easy-first dependency parsing | The easy-first dependency parsing algorithm (Goldberg and Elhadad, 2010) builds a dependency tree by performing two types of actions, LEFT(i) and RIGHT(i), on a list of subtree structures p1, ..., pn. |
Easy-first dependency parsing | Input: sentence x of n words, beam width s Output: one best dependency tree |
Training | As shown in Goldberg and Nivre (2012), most transition-based dependency parsers (Nivre et al., 2003; Huang and Sagae, 2010; Zhang and Clark, 2008) ignore spurious ambiguity by using a static oracle that maps a dependency tree to a single action sequence. |
Abstract | Unlike previous work, which primarily leverages syntactic analysis through dependency tree matching, we focus on improving the performance using models of lexical semantic resources. |
Abstract | Moreover, our best system also outperforms previous work that makes use of the dependency tree structure by a wide margin. |
Experiments | Heilman and Smith (2010) proposed a discriminative approach that first computes a tree kernel function between the dependency trees of the question and candidate sentence, and then learns a classifier based on the tree-edit features extracted. |
Problem Definition | Typically, the “ideal” alignment structure is not available in the data, and previous work exploited mostly syntactic analysis (e.g., dependency trees) to reveal the latent mapping structure. |
Related Work | (2007) proposed a syntax-driven approach, where each pair of question and sentence is matched by their dependency trees. |
Related Work | Heilman and Smith (2010) proposed a discriminative approach that first computes a tree kernel function between the dependency trees of the question and candidate sentence, and then learns a classifier based on the tree-edit features extracted. |
The Constrained Optimization Task | The linear constraints in (3) will ensure that the arc variables for each sentence $e_s$ encode a valid latent dependency tree, and that the f variables count up the features of these trees. |
The Constrained Optimization Task | This generative model defines a joint distribution over the sentences and their dependency trees . |
The Constrained Optimization Task | The constraints must declaratively specify that the arcs form a valid dependency tree and that the resulting feature values are as defined by the DMV. |
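One standard way to encode these requirements, sketched here in our notation (the paper's exact constraints may differ), pairs a single-head constraint with single-commodity flow for connectivity: $\sum_{h \ne m} a_{h,m} = 1 \;\; \forall m$ (every word picks exactly one head), $\sum_{h} \phi_{h,m} - \sum_{d} \phi_{m,d} = 1 \;\; \forall m$ and $0 \le \phi_{h,m} \le n\, a_{h,m}$ (the root emits $n$ units of flow, every word consumes one, and flow may only cross selected arcs), with each feature value tied to the arcs that fire it, $f_k = \sum_{h,m} a_{h,m}\, \mathbb{1}[\text{arc } (h,m) \text{ fires feature } k]$.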
Bilingually-Guided Dependency Grammar Induction | In lines 4-9, the objective is optimized with a generic optimization step in the subroutine climb(·, ·, ·, ·). For each sentence we parse its dependency tree and update the tree in the treebank (step 3). |
Experiments | The source sentences are then parsed by an implementation of the second-order MST model of McDonald and Pereira (2006), which is trained on dependency trees extracted from the Penn Treebank. |
Introduction | The monolingual likelihood is similar to the optimization objectives of conventional unsupervised models, while the bilingually-projected likelihood is the product of the projected probabilities of dependency trees . |
Unsupervised Dependency Grammar Induction | $s(de_{ij}) = \sum_{n} \lambda_n \cdot f_n(de_{ij}, y) \quad (2)$ Given a sentence $E$, parsing a dependency tree is to find the dependency tree $D_E$ with maximum probability $P_E$: |
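The maximization after the colon presumably takes the usual edge-factored form (our notation, consistent with Equation 2): $D_E = \operatorname{argmax}_{D} P_E(D)$, with $P_E(D) \propto \prod_{de_{ij} \in D} \exp\big(\sum_n \lambda_n f_n(de_{ij}, y)\big)$.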
Unsupervised Dependency Grammar Induction | Figure 2: Projecting a Chinese dependency tree to the English side according to the DPA. |
Joint Model of Extraction and Compression | In Japanese, syntactic subtrees that contain the root of the dependency tree of the original sentence often make grammatical sentences. |
Joint Model of Extraction and Compression | In this joint model, we generate a compressed sentence by extracting an arbitrary subtree from a dependency tree of a sentence. |
Joint Model of Extraction and Compression | To avoid generating such ungrammatical sentences, we need to detect and retain the obligatory dependency relations in the dependency tree. |
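A sketch of the validity check implied above: a compression is kept only if the retained words form a connected subtree containing the root and no obligatory dependent of a retained head is dropped. The relation set and data layout are illustrative:

```python
# Compression as rooted-subtree extraction: verify that the kept words
# form a connected subtree containing the root, and that obligatory
# relations are retained. OBLIGATORY is a stand-in relation set.
OBLIGATORY = {'nsubj', 'obj'}   # illustrative relation labels

def valid_compression(keep, heads, rels):
    """keep: set of kept word indices; heads[i]: head of i (-1 = root);
    rels[i]: label of the arc heads[i] -> i."""
    for i in keep:
        if heads[i] != -1 and heads[i] not in keep:
            return False                    # disconnected: head was dropped
    if heads.index(-1) not in keep:
        return False                        # must contain the sentence root
    for i, h in enumerate(heads):
        if h in keep and i not in keep and rels[i] in OBLIGATORY:
            return False                    # dropped an obligatory dependent
    return True
```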
Introduction | The dominant solution in treebank design is to introduce artificial rules for encoding coordination structures within dependency trees using the same means that express dependencies, i.e., by using edges and by labeling nodes or edges. |
Related work | Even this simplest case is difficult to represent within a dependency tree because, in the words of Lombardo and Lesmo (1998): Dependency paradigms exhibit obvious difficulties with coordination because, differently from most linguistic structures, it is not possible to characterize the coordination construct with a general schema involving a head and some modifiers of it. |
Variations in representing coordination structures | In accordance with the usual conventions, we assume that each sentence is represented by one dependency tree, in which each node corresponds to one token (word or punctuation mark). |
Variations in representing coordination structures | Apart from that, we deliberately limit ourselves to CS representations that take the shape of connected subgraphs of dependency trees. |
Variations in representing coordination structures | We limit our inventory of means of expressing CSs within dependency trees to (i) tree topology (presence or absence of a directed edge between two nodes, Section 3.1), and (ii) node labeling (additional attributes stored inside nodes, Section 3.2). Further, we expect that the set of possible variations can be structured along several dimensions, each of which corresponds to a certain simple characteristic (such as choosing the leftmost conjunct as the CS head, or attaching shared modifiers below the nearest conjunct). |
Semi-supervised Parsing with Large Data | To simplify the extraction process, we can convert auto-parsed constituency trees into dependency trees by using Penn2Malt. |
Semi-supervised Parsing with Large Data | From the dependency trees, we extract bigram lexical dependencies $(w_1, w_2, L/R)$, where the symbol L (R) means that $w_1$ ($w_2$) is the head of $w_2$ ($w_1$). |
Semi-supervised Parsing with Large Data | Formally, given a dependency tree $y$ of an input sentence $x$, we can denote by $H$ the set of words that have at least one dependent. |
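A sketch of this extraction step under the conventions just defined; the tree encoding (heads array with -1 for the root) is an assumption:

```python
# Extract bigram lexical dependencies (w1, w2, L/R) from auto-parsed
# trees, and collect H, the set of words with at least one dependent.
from collections import Counter

def extract(trees):
    """trees: iterable of (words, heads) with heads[i] = -1 for the root."""
    counts, H = Counter(), set()
    for words, heads in trees:
        for d, h in enumerate(heads):
            if h < 0:
                continue
            H.add(words[h])
            if h < d:   # w1 precedes w2 and heads it -> label L
                counts[(words[h], words[d], 'L')] += 1
            else:       # the later word (w2) heads the earlier -> label R
                counts[(words[d], words[h], 'R')] += 1
    return counts, H
```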
The First Stage: Sentiment Graph Walking Algorithm | For a given sentence, we first obtain its dependency tree. |
The First Stage: Sentiment Graph Walking Algorithm | Figure 1 gives a dependency tree example generated by Minipar (Lin, 1998). |
The First Stage: Sentiment Graph Walking Algorithm | Figure 1: The dependency tree of the sentence “The style of the screen is gorgeous”. |
Comparative Study | This structure is represented by a source sentence dependency tree . |
Comparative Study | The algorithm is as follows: given the source sentence and its dependency tree, during the translation process, once a hypothesis is extended, check if the source dependency tree contains a subtree T such that: |
Introduction | Cherry (2008) uses information from dependency trees to make the decoding process keep syntactic cohesion. |
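A sketch of one common cohesion check: precompute each subtree's projected span, then flag any hypothesis whose source coverage overlaps a subtree span only partially. Cherry's (2008) exact conditions may differ:

```python
# Cohesion check sketch: a hypothesis covering source span [i, j]
# violates cohesion if some subtree span overlaps [i, j] only partially.

def subtree_spans(heads):
    """heads[i]: head of word i, -1 for the root."""
    n = len(heads)
    lo, hi = list(range(n)), list(range(n))
    for d in range(n):                  # propagate each word up to ancestors
        a = heads[d]
        while a != -1:
            lo[a], hi[a] = min(lo[a], d), max(hi[a], d)
            a = heads[a]
    return list(zip(lo, hi))

def cohesive(heads, i, j):
    for lo, hi in subtree_spans(heads):
        intersects = max(lo, i) <= min(hi, j)
        contained = i <= lo and hi <= j
        contains = lo <= i and j <= hi
        if intersects and not (contained or contains):
            return False                # partial overlap: cohesion violated
    return True
```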
Introduction | That is to say, although they are all dependency treebanks, annotating each sentence with a dependency tree, they subscribe to different annotation schemes. |
Towards A Universal Treebank | A sample dependency tree from the French data set is shown in Figure 1. |
Towards A Universal Treebank | For English, we used the Stanford parser (v1.6.8) (Klein and Manning, 2003) to convert the Wall Street Journal section of the Penn Treebank (Marcus et al., 1993) to basic dependency trees, including punctuation and with the copula verb as head in copula constructions. |
Sentence Compression | “Dependency Tree Features” encode the grammatical relations in which each word is involved as a dependent. |
Sentence Compression | For the “Syntactic Tree”, “Dependency Tree” and “Rule-Based” features, we also include features for the two words that precede and the two that follow the current word. |
The Framework | Dependency Tree Features: in NP/VP/ADVP/ADJP chunk? |
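A sketch of these feature templates; template names and the data layout are ours:

```python
# Feature template sketch: each word's grammatical relation as a
# dependent ("Dependency Tree Features"), replicated for the two
# preceding and two following words, as described above.

def word_features(idx, tokens, rels):
    """tokens: words of the sentence; rels[i]: relation in which
    token i participates as a dependent (e.g. 'nsubj')."""
    feats = {f'dep_rel={rels[idx]}'}
    for off in (-2, -1, 1, 2):          # +/-2 word window
        j = idx + off
        if 0 <= j < len(tokens):
            feats.add(f'dep_rel[{off:+d}]={rels[j]}')
            feats.add(f'word[{off:+d}]={tokens[j]}')
    return feats
```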