Introduction | We propose a novel joint event extraction algorithm that predicts triggers and arguments simultaneously, and we use the structured perceptron (Collins, 2002) to train the joint model. |
Introduction | Therefore we employ beam search in decoding, and train the model using the early-update perceptron variant tailored for beam search (Collins and Roark, 2004; Huang et al., 2012). |
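The excerpt above pairs beam-search decoding with the early-update perceptron variant. The following is a minimal sketch of that idea, assuming a toy two-action search space (SHIFT/REDUCE), a user-supplied feature function `feats`, and a dict-based weight vector; these names are illustrative, not the cited papers' implementation.

```python
def beam_search_early_update(x, gold, weights, feats, beam_size=4):
    """Decode x with a beam over action sequences; at the first step where
    the gold prefix falls out of the beam, return (gold_prefix, best_prefix)
    so the caller can perform an early update."""
    beam = [()]  # partial action sequences
    for step in range(len(gold)):
        candidates = [p + (a,) for p in beam for a in ("SHIFT", "REDUCE")]
        candidates.sort(key=lambda p: sum(weights.get(f, 0.0) for f in feats(x, p)),
                        reverse=True)
        beam = candidates[:beam_size]
        gold_prefix = tuple(gold[:step + 1])
        if gold_prefix not in beam:  # early-update point
            return gold_prefix, beam[0]
    return tuple(gold), beam[0]

def perceptron_update(x, gold_seq, pred_seq, weights, feats, lr=1.0):
    """Standard structured-perceptron update: toward gold, away from prediction."""
    if gold_seq == pred_seq:
        return
    for f in feats(x, gold_seq):
        weights[f] = weights.get(f, 0.0) + lr
    for f in feats(x, pred_seq):
        weights[f] = weights.get(f, 0.0) - lr
```

The point of stopping at the early-update step is that the model is never updated against search states the beam could not have reached, which is what makes training with inexact search sound.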
Joint Framework for Event Extraction | Based on the hypothesis that facts are interdependent, we propose to use structured perceptron with inexact search to jointly extract triggers and arguments that co-occur in the same sentence. |
Joint Framework for Event Extraction | 3.1 Structured perceptron with beam search |
Joint Framework for Event Extraction | The structured perceptron (Collins, 2002) extends the standard linear perceptron to structured prediction. |
Character Classification Model | Algorithm 1 Perceptron training algorithm. |
Character Classification Model | The classifier can be trained with online learning algorithms such as the perceptron, or with offline learning models such as support vector machines. |
Character Classification Model | We choose the perceptron algorithm (Collins, 2002) to train the classifier for the character classification-based word segmentation model. |
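As a rough illustration of the character-classification approach described above, the sketch below trains a multiclass perceptron that tags each character with a B/M/E/S position label. The feature template (current character plus its two neighbors) is a toy choice, far simpler than the templates used in the cited work.

```python
def char_features(sent, i):
    """Toy feature template: current character and its immediate neighbors."""
    prev_c = sent[i - 1] if i > 0 else "<s>"
    next_c = sent[i + 1] if i < len(sent) - 1 else "</s>"
    return [f"c0={sent[i]}", f"c-1={prev_c}", f"c+1={next_c}"]

def train_perceptron(data, tags=("B", "M", "E", "S"), epochs=5):
    """Multiclass perceptron over (feature, tag) weights; data is a list of
    (sentence, gold_tag_sequence) pairs."""
    w = {}
    for _ in range(epochs):
        for sent, gold in data:
            for i, g in enumerate(gold):
                fs = char_features(sent, i)
                pred = max(tags, key=lambda t: sum(w.get((f, t), 0.0) for f in fs))
                if pred != g:  # mistake-driven update
                    for f in fs:
                        w[(f, g)] = w.get((f, g), 0.0) + 1.0
                        w[(f, pred)] = w.get((f, pred), 0.0) - 1.0
    return w
```

A segmentation is then read off the predicted tag sequence: word boundaries fall after every E or S tag.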
Experiments | Figure 3: Learning curve of the averaged perceptron classifier on the CTB development set. |
Experiments | We train the baseline perceptron classifier for word segmentation on the training set of CTB 5.0, using the development set to determine the optimal number of training iterations. |
Experiments | Figure 3 shows the learning curve of the averaged perceptron on the development set. |
Knowledge in Natural Annotations | Algorithm 2 Perceptron learning with natural annotations. |
Learning with Natural Annotations | Algorithm 3 Online version of perceptron learning with natural annotations. |
Learning with Natural Annotations | Given the online nature of the perceptron algorithm, if we can leverage much more naturally annotated data than the Chinese Wikipedia provides, the online learning procedure shown in Algorithm 3 would be a better choice. |
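One online step of learning from natural annotations might look like the sketch below: a markup span (e.g. a hyperlink) supplies partial word-boundary constraints, and the model is updated only when its output violates them. `decode`, `constrained_decode`, and `feats` are hypothetical components standing in for the paper's segmenter, not its actual implementation.

```python
def natural_annotation_step(sent, boundaries, decode, constrained_decode, feats, w):
    """One perceptron step driven by partial boundary constraints.
    `boundaries` is a set of positions that the natural annotation marks as
    word boundaries; segmentations are represented as sets of positions."""
    pred = decode(sent, w)
    if all(b in pred for b in boundaries):
        return w  # prediction already agrees with the natural annotation
    # best segmentation consistent with the annotated boundaries
    target = constrained_decode(sent, boundaries, w)
    for f in feats(sent, target):
        w[f] = w.get(f, 0.0) + 1.0
    for f in feats(sent, pred):
        w[f] = w.get(f, 0.0) - 1.0
    return w
```

Because the annotation is partial, the update target is not a gold segmentation but the model's own best output subject to the constraints, which is what makes this a weakly supervised step.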
Conclusions | In addition, since this response-based supervision is weak and ambiguous, we have also presented a method for using multiple reference parses to perform perceptron weight updates and shown a clear further improvement in end-task performance with this approach. |
Experimental Evaluation | Table 2 presents reranking results for our proposed response-based weight update (Single) for the averaged perceptron (cf. |
Modified Reranking Algorithm | Reranking using an averaged perceptron (Collins, 2002a) has been successfully applied to a variety of NLP tasks. |
Modified Reranking Algorithm | Algorithm 1 Averaged perceptron training with response-based update. Input: a set of training examples (e_i, y_i*), where e_i is an NL sentence and y_i* = argmax_{y ∈ GEN(e_i)} Exec(y). Output: the parameter vector w, averaged over all iterations 1..T. |
Modified Reranking Algorithm | 1: procedure PERCEPTRON |
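The core of the response-based update can be sketched as follows: instead of updating toward a gold parse, the GEN candidate whose execution yields the best end-task response serves as the update target. `gen`, `feats`, and `exec_score` are placeholders for the components in the cited setup, not its actual interfaces.

```python
def response_based_update(x, gen, feats, exec_score, w, lr=1.0):
    """One perceptron step where the target is the candidate with the best
    task response rather than an annotated gold structure."""
    candidates = gen(x)
    # model's current best candidate under w
    pred = max(candidates,
               key=lambda y: sum(w.get(f, 0.0) * v for f, v in feats(x, y).items()))
    # candidate with the best response stands in for the gold parse
    target = max(candidates, key=lambda y: exec_score(x, y))
    if pred != target:
        for f, v in feats(x, target).items():
            w[f] = w.get(f, 0.0) + lr * v
        for f, v in feats(x, pred).items():
            w[f] = w.get(f, 0.0) - lr * v
    return w
```

This is what makes the supervision weak: the update target depends only on the observable response, so several candidates may be equally good and the signal is ambiguous.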
Abstract | We present a new perceptron learning algorithm using antagonistic adversaries and compare it to previous proposals on 12 multilingual cross-domain part-of-speech tagging datasets. |
Conclusion | Our approach was superior to previous approaches across 12 multilingual cross-domain POS tagging datasets, with an average error reduction of 4% over a structured perceptron baseline. |
Experiments | Learning with antagonistic adversaries performs significantly better than structured perceptron (SP) learning, L∞-regularization, and LRA across the board. |
Experiments | We note that on the in-domain dataset (PTB-biomedical), L∞-regularization performs best, but our approach also performs better than the structured perceptron baseline on this dataset. |
Experiments | The number of zero weights or very small weights is significantly lower for learning with antagonistic adversaries than for the baseline structured perceptron. |
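A loose sketch of the intuition behind learning with an antagonistic adversary: before each update, the adversary hides the features the current model relies on most, forcing weight mass onto a broader set of features (consistent with the observation above about fewer near-zero weights). This is a simplification for illustration, not the cited paper's exact corruption model.

```python
def adversarial_perceptron_step(feats_gold, feats_pred, w, k=1):
    """One mistake-driven update in which an adversary removes the k
    highest-weighted gold features before the positive update is applied."""
    if feats_gold == feats_pred:
        return w
    # adversary: hide the features the model currently depends on most
    hidden = set(sorted(feats_gold, key=lambda f: -abs(w.get(f, 0.0)))[:k])
    for f in feats_gold:
        if f not in hidden:
            w[f] = w.get(f, 0.0) + 1.0
    for f in feats_pred:
        w[f] = w.get(f, 0.0) - 1.0
    return w
```

The effect is a form of robustness: the learner cannot concentrate its score on a few strong features, which helps under domain shift where those features may be absent.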
Introduction | Section 2 introduces previous work on robust perceptron learning, as well as the methods discussed in the paper. |
Robust perceptron learning | Our framework will be averaged perceptron learning (Freund and Schapire, 1999; Collins, 2002). |
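The averaged perceptron of Freund and Schapire (1999) returns the weight vector averaged over all time steps rather than the final one. A common way to implement this is the lazy averaging trick sketched below (the class and method names are illustrative): a running total is brought up to date only when a feature's weight actually changes.

```python
class AveragedPerceptron:
    def __init__(self):
        self.w = {}       # current weights
        self.totals = {}  # weight summed over time steps (lazily maintained)
        self.stamps = {}  # last time step at which each feature was updated
        self.t = 0

    def update(self, feature, delta):
        # bring the running total up to date before changing the weight
        self.totals[feature] = (self.totals.get(feature, 0.0)
                                + (self.t - self.stamps.get(feature, 0))
                                * self.w.get(feature, 0.0))
        self.stamps[feature] = self.t
        self.w[feature] = self.w.get(feature, 0.0) + delta

    def tick(self):
        """Advance the clock; call once per training example."""
        self.t += 1

    def average(self):
        """Weights averaged over all time steps seen so far."""
        if self.t == 0:
            return dict(self.w)
        avg = {}
        for f, wf in self.w.items():
            total = self.totals.get(f, 0.0) + (self.t - self.stamps.get(f, 0)) * wf
            avg[f] = total / self.t
        return avg
```

The averaged weights typically generalize better than the final ones, since averaging damps the oscillations of the mistake-driven updates.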
Experiment | The feature templates of Zhao et al. (2006) and Zhang and Clark (2007) are used in training the CRFs model and the perceptron model, respectively. |
Segmentation Models | This section briefly reviews two supervised models in these categories: a character-based CRFs model and a word-based perceptron model, which are used in our approach. |
Segmentation Models | 2.2 Word-based Perceptron Model |
Segmentation Models | Zhang and Clark (2007) first proposed a word-based segmentation model using a discriminative perceptron algorithm. |
Semi-supervised Learning via Co-regularizing Both Models | The model induction process is described in Algorithm 1: given a labeled dataset D_l and an unlabeled dataset D_u, the first two steps train a CRFs (character-based) model and a perceptron (word-based) model on the labeled data D_l, respectively. |
Semi-supervised Learning via Co-regularizing Both Models | Afterwards, the agreements A are used as a set of constraints to bias the learning of the CRFs (§3.2) and the perceptron (§3.3) on the unlabeled data. |
Semi-supervised Learning via Co-regularizing Both Models | 3.3 Perceptron with Constraints |
Abstract | Our approach automatically generalizes a seed lexicon and includes a scalable, parallelized perceptron parameter estimation scheme. |
Data | Number of perceptron epochs T. |
Data | A function PerceptronEpoch(T, θ, L) that runs a single epoch of the hidden-variable structured perceptron algorithm on training set T with initial parameters θ, returning a new parameter vector θ′. |
Learning | We use the hidden-variable structured perceptron algorithm to learn θ from a list of (question x, query z) training examples. |
Learning | We adopt the iterative parameter mixing variation of the perceptron (McDonald et al., 2010) to scale to a large number of training examples. |
Learning | Because the number of training examples is large, we adopt a parallel perceptron approach. |
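The iterative parameter mixing scheme of McDonald et al. (2010) mentioned above can be sketched as follows: each shard of the training data runs one perceptron epoch starting from the mixed weights, and the shard-local results are averaged into the next starting point. `perceptron_epoch` stands in for the single-epoch trainer described in the text; in practice each call runs on its own worker in parallel.

```python
def mix(param_list):
    """Uniformly average a list of weight dicts."""
    mixed = {}
    for params in param_list:
        for f, v in params.items():
            mixed[f] = mixed.get(f, 0.0) + v / len(param_list)
    return mixed

def iterative_parameter_mixing(shards, perceptron_epoch, epochs):
    """Alternate shard-local perceptron epochs with uniform weight mixing."""
    theta = {}
    for _ in range(epochs):
        # each shard trains from a copy of the current mixed weights
        theta = mix([perceptron_epoch(shard, dict(theta)) for shard in shards])
    return theta
```

Unlike training independent models and averaging once at the end, re-mixing after every epoch lets information flow between shards, which is what preserves convergence-like behavior at scale.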
Baseline parser | We adopt the parser of Zhang and Clark (2009) for our baseline, which is based on the shift-reduce process of Sagae and Lavie (2005), and employs global perceptron training and beam search. |
Baseline parser | The model parameter vector w is trained with the averaged perceptron algorithm, applied to state items (sequences of actions) globally. |
Experiments | The optimal number of iterations for perceptron learning is determined |
Improved hypotheses comparison | This turns out to have a significant empirical influence on perceptron training with early update, where the training of the model interacts with search (Daumé III, 2006). |
Abstract | In this paper, we combine easy-first dependency parsing and POS tagging algorithms with beam search and structured perceptron. |
Abstract | The proposed solution can also be applied to combine beam search and structured perceptron with other systems that exhibit spurious ambiguity. |
Training | Recent work (Huang et al., 2012) rigorously showed that only valid updates guarantee the convergence of perceptron variants. |