Abstract | This algorithm handles global structures, such as clause and coordination, better than shift-reduce or other bottom-up algorithms.
Experiments | length, comparing with top-down, 2nd-MST and shift-reduce parsers (beam size: 8, pred size: 5) |
Experiments | We also implemented and experimented with Huang and Sagae (2010)’s arc-standard shift-reduce parser.
Experiments | We used an early-update version of the averaged perceptron algorithm (Collins and Roark, 2004) for training the shift-reduce and top-down parsers.
Introduction | Transition-based parsing algorithms, such as shift-reduce algorithms (Nivre, 2004; Zhang and Clark, 2008), are widely used for dependency analysis because of their efficiency and comparatively good performance.
Introduction | (2004) pointed out that the drawbacks of shift-reduce parsers could be resolved by incorporating top-down information such as root finding.
Introduction | This work presents an O(n²) top-down head-driven transition-based parsing algorithm which can parse complex structures that are not trivial for shift-reduce parsers.
Related Work | Yamada and Matsumoto (2003) applied a shift-reduce algorithm to dependency analysis, which is known as arc-standard transition-based algorithm (Nivre, 2004). |
Evaluations | Table 2: Features for the shift-reduce parser (SRP) and the graph-based maximum spanning tree (MST) parser. |
Evaluations | The Shift-Reduce parser (SRP; Section 4.1) and the graph-based, maximum spanning tree parser (MST; Section 4.2) are compared to these baselines. |
Evaluations | In terms of labeled attachment score, both dependency parsing models outperformed the baseline models — the maximum spanning tree parser achieved 0.614 LAS, and the shift-reduce parser achieved 0.647 LAS. |
Feature Design | The shift-reduce parser (SRP) trains a machine learning classifier as the oracle o ∈ (C → T) to predict a transition t from a parser configuration c = (L1, L2, Q, E), using node features such as the heads of L1, L2 and Q, and edge features from the already predicted temporal relations in E. The graph-based maximum spanning tree (MST) parser trains a machine learning model to predict SCORE(e) for an edge e = (wj, rj, wk), using features of the nodes wj and wk.
Feature Design | Note that both models share features that look at the nodes, while only the shift-reduce parser has features for previously classified edges. |
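The feature split described above can be sketched as follows; the feature names and helper functions here are illustrative stand-ins, not the authors' exact templates. Both models draw on node features, but only the transition-based parser can condition on previously predicted edges.

```python
# Sketch of the SRP-vs-MST feature split (illustrative names only):
# node features are shared, edge features exist only for the
# shift-reduce parser, which sees its own earlier predictions.

def node_features(word):
    """Features of a single node (word)."""
    return {f"word={word['text']}", f"pos={word['pos']}"}

def srp_features(stack_top, queue_head, predicted_edges):
    """SRP features: node features plus previously predicted edges."""
    feats = node_features(stack_top) | node_features(queue_head)
    for head, rel, dep in predicted_edges:   # edge features: SRP only
        feats.add(f"edge={head}-{rel}->{dep}")
    return feats

def mst_features(w_j, w_k):
    """MST scores each candidate edge independently, so it can only
    use features of the two nodes, never previously predicted edges."""
    return node_features(w_j) | node_features(w_k)
```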
Parsing Models | We consider two different approaches to learning a temporal dependency parser: a shift-reduce model (Nivre, 2008) and a graph-based model (McDonald et al., 2005). |
Parsing Models | 4.1 Shift-Reduce Parsing Model |
Parsing Models | Shift-reduce dependency parsers start with an input queue of unlinked words, and link them into a tree by repeatedly choosing and performing actions like shifting a node to a stack, or popping two nodes from the stack and linking them. |
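The queue-and-stack process described above can be sketched as a minimal arc-standard transition system. The `oracle` argument is a hypothetical stand-in for the trained classifier; action names follow common usage rather than any one paper's notation.

```python
# Minimal arc-standard shift-reduce dependency parser sketch.
# `oracle` is any callable returning one of SHIFT / LEFT-ARC / RIGHT-ARC
# given the current stack, queue, and predicted arcs.

def parse(words, oracle):
    """Parse `words` into a dict mapping dependent index -> head index."""
    queue = list(range(len(words)))   # input queue of unlinked word indices
    stack = []
    heads = {}

    while queue or len(stack) > 1:
        action = oracle(stack, queue, heads)
        if action == "SHIFT" and queue:
            stack.append(queue.pop(0))
        elif action == "LEFT-ARC" and len(stack) >= 2:
            dep = stack.pop(-2)       # second-top becomes dependent of top
            heads[dep] = stack[-1]
        elif action == "RIGHT-ARC" and len(stack) >= 2:
            dep = stack.pop()         # top becomes dependent of second-top
            heads[dep] = stack[-1]
        else:
            raise ValueError(f"invalid action {action!r}")
    return heads

# Usage with a scripted oracle for a three-word sentence:
actions = iter(["SHIFT", "SHIFT", "LEFT-ARC", "SHIFT", "RIGHT-ARC"])
heads = parse(["news", "attracted", "attention"],
              lambda s, q, h: next(actions))
# → {0: 1, 2: 1}: word 1 heads both its neighbours
```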
Abstract | We introduce a shift-reduce parsing algorithm for phrase-based string-to-dependency translation. |
Abstract | To resolve conflicts in shift-reduce parsing, we propose a maximum entropy model trained on the derivation graph of training data. |
Introduction | In this paper, we propose a shift-reduce parsing algorithm for phrase-based string-to-dependency translation. |
Introduction | 4. exploiting syntactic information: as the shift-reduce parsing algorithm generates target language dependency trees in decoding, dependency language models (Shen et al., 2008; Shen et al., 2010) can be used to encourage linguistically-motivated reordering. |
Introduction | 2 Shift-Reduce Parsing for Phrase-based String-to-Dependency Translation |
Abstract | Shift-reduce dependency parsers give comparable accuracies to their chart-based counterparts, yet the best shift-reduce constituent parsers still lag behind the state-of-the-art. |
Abstract | One important reason is the existence of unary nodes in phrase structure trees, which leads to different numbers of shift-reduce actions between different outputs for the same input. |
Abstract | We propose a simple yet effective extension to the shift-reduce process, which eliminates size differences between action sequences in beam-search. |
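The length-equalizing idea can be sketched as padding: because unary reductions make gold action sequences differ in length, shorter finished sequences are extended with a no-op action (called IDLE here; the name is illustrative) until every candidate in the beam has taken the same number of actions, keeping beam scores comparable.

```python
# Hedged sketch of equalizing action-sequence lengths for beam search:
# pad every finished sequence with a no-op IDLE action up to the
# length of the longest sequence.

def pad_actions(sequences):
    target = max(len(s) for s in sequences)
    return [s + ["IDLE"] * (target - len(s)) for s in sequences]

seqs = [
    ["SHIFT", "SHIFT", "REDUCE-L", "SHIFT", "REDUCE-R"],   # no unary nodes
    ["SHIFT", "UNARY", "SHIFT", "REDUCE-L", "UNARY",
     "SHIFT", "REDUCE-R"],                                 # two unary reductions
]
padded = pad_actions(seqs)
# both sequences now have the same length
```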
Baseline parser | We adopt the parser of Zhang and Clark (2009) for our baseline, which is based on the shift-reduce process of Sagae and Lavie (2005), and employs global perceptron training and beam search. |
Baseline parser | 2.1 Vanilla Shift-Reduce |
Baseline parser | Shift-reduce parsing is based on a left-to-right scan of the input sentence. |
Introduction | Transition-based parsers employ a set of shift-reduce actions and perform parsing using a sequence of state transitions. |
Introduction | We propose an extension to the shift-reduce process to address this problem, which gives significant improvements to the parsing accuracies. |
Semi-supervised Parsing with Large Data | This section discusses how to extract information from unlabeled data or auto-parsed data to further improve shift-reduce parsing accuracies. |
Semi-supervised Parsing with Large Data | Based on the information, we propose a set of novel features specifically designed for shift-reduce constituent parsing. |
Abstract | The resulting shift-reduce discourse parser obtains substantial improvements over the previous state-of-the-art in predicting relations and nuclearity on the RST Treebank. |
Introduction | Our method is implemented as a shift-reduce discourse parser (Marcu, 1999; Sagae, 2009). |
Model | 2.1 Shift-reduce discourse parsing |
Model | We construct RST Trees using shift-reduce parsing, as first proposed by Marcu (1999). |
Model | C: Total number of classes, which correspond to possible shift-reduce operations
Abstract | This paper presents the first dependency model for a shift-reduce CCG parser. |
Introduction | In this paper, we fill a gap in the literature by developing the first dependency model for a shift-reduce CCG parser. |
Introduction | Shift-reduce parsing applies naturally to CCG (Zhang and Clark, 2011), and the left-to-right, incremental nature of the decoding fits with CCG’s cognitive claims. |
Introduction | Results on the standard CCGBank tests show that our parser achieves absolute labeled F-score gains of up to 0.5 over the shift-reduce parser of Zhang and Clark (2011); and up to 1.05 and 0.64 over the normal-form and hybrid models of Clark and Curran (2007), respectively. |
Shift-Reduce with Beam-Search | This section describes how shift-reduce techniques can be applied to CCG, following Zhang and Clark (2011). |
Shift-Reduce with Beam-Search | First we describe the deterministic process which a parser would follow when tracing out a single, correct derivation; then we describe how a model of normal-form derivations (or, more accurately, of a sequence of shift-reduce actions leading to a normal-form derivation) can be used with beam-search to develop a nondeterministic parser which selects the highest-scoring sequence of actions.
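The search procedure described here can be sketched generically; `step` and `score_action` are hypothetical stand-ins for the transition system and the trained model, and the beam size is a free parameter rather than any paper's setting.

```python
# Generic beam search over shift-reduce action sequences: each beam
# item is a (score, state, actions) triple; at every step we expand
# all items with their legal actions and keep the top `beam_size`.

def beam_search(init_state, step, score_action, n_steps, beam_size=8):
    beam = [(0.0, init_state, [])]
    for _ in range(n_steps):
        candidates = []
        for score, state, actions in beam:
            for action, next_state in step(state):
                candidates.append((score + score_action(state, action),
                                   next_state, actions + [action]))
        # keep only the highest-scoring partial action sequences
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:beam_size]
    return max(beam, key=lambda b: b[0])
```

With a deterministic `step` and a fixed number of steps, `beam_size=1` recovers greedy decoding; larger beams trade time for search accuracy.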
Shift-Reduce with Beam-Search | Note this section only describes a normal-form derivation model for shift-reduce parsing. |
Dependency Parsing: Baseline | In detail, a shift-reduce method is adopted as in (Nivre, 2003), where a classifier is used to make a parsing decision step by step. |
Dependency Parsing: Baseline | While memory-based and margin-based learning approaches such as support vector machines are popularly applied to shift-reduce parsing, we apply a maximum entropy model as the learning model for efficient training and for adopting overlapped features, as in our work (Zhao and Kit, 2008), especially the character-level ones for Chinese parsing.
Dependency Parsing: Baseline | Without this type of feature, a shift-reduce parser may directly scan through an input sequence in linear time.
Evaluation Results | beam search algorithm with width 5 is used for parsing; otherwise, a simple shift-reduce decoding is used.
Incorporating Syntactic Structures | A similar idea of equivalent states is used by Huang and Sagae (2010), except they use equivalence to facilitate dynamic programming for shift-reduce parsing, whereas we generalize it for improving the processing time of similar hypotheses in general models. |
Incorporating Syntactic Structures | We use the best-first probabilistic shift-reduce dependency parser of Sagae and Tsujii (2007), a transition-based parser (Kübler et al., 2009) with a MaxEnt classifier.
Incorporating Syntactic Structures | Algorithm 1 Best-first shift-reduce dependency parsing
Related Work | While Huang and Sagae (2010) use the notion of “equivalent states”, they do so for dynamic programming in a shift-reduce parser to broaden the search space. |
Introduction | With regard to the task of parsing itself, an important advantage of the character-level syntax trees is that they allow word segmentation, part-of-speech (POS) tagging and parsing to be performed jointly, using an efficient CKY-style or shift-reduce algorithm.
Introduction | Our model is based on the discriminative shift-reduce parser of Zhang and Clark (2009; 2011), which is a state-of-the-art word-based phrase-structure parser for Chinese. |
Introduction | We extend their shift-reduce framework, adding more transition actions for word segmentation and POS tagging, and defining novel features that capture character information. |
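An extended action set of this kind could be sketched as follows; the action names (SHIFT-SEP, SHIFT-APP) and the legality conditions are illustrative assumptions loosely following joint segmentation-and-parsing transition systems, not the authors' exact inventory.

```python
# Illustrative action set for joint segmentation, POS tagging and
# parsing over characters: SHIFT-SEP(t) starts a new word with POS
# tag t, SHIFT-APP appends the next character to the current word,
# and the arc actions link completed words as usual.

from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    name: str              # "SHIFT-SEP", "SHIFT-APP", "LEFT-ARC", "RIGHT-ARC"
    pos_tag: str = ""      # only used by SHIFT-SEP

def legal_actions(stack, char_queue, pos_tags):
    """Enumerate the legal actions for a configuration."""
    actions = []
    if char_queue:
        actions += [Action("SHIFT-SEP", t) for t in pos_tags]
        if stack:                     # can only append to an existing word
            actions.append(Action("SHIFT-APP"))
    if len(stack) >= 2:
        actions += [Action("LEFT-ARC"), Action("RIGHT-ARC")]
    return actions
```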
Related Work | Our work is based on the shift-reduce operations of their work, while we introduce additional operations for segmentation and POS tagging. |
Decoding | The algorithm bears a close resemblance to the shift-reduce algorithm, where a stack is used to accumulate (partial) information about a, M_L and M_R for each a ∈ A in the derivation.
Introduction | We implement an efficient shift-reduce algorithm that facilitates the accumulation of partial context in a bottom-up fashion, allowing our model to influence the translation process even in the absence of full context. |
Introduction | In Section 6, we describe our shift-reduce algorithm which inte- |