Index of papers in Proc. ACL that mention
  • perceptron
Huang, Liang
Conclusion
With efficient approximate decoding, perceptron training on the whole Treebank becomes practical, which can be done in about a day even with a Python implementation.
Experiments
This result confirms that our feature set design is appropriate, and the averaged perceptron learner is a reasonable candidate for reranking.
Experiments
We use the development set to determine the optimal number of iterations for averaged perceptron, and report the F1 score on the test set.
Experiments
column is for feature extraction, and training column shows the number of perceptron iterations that achieved best results on the dev set, and average time per iteration.
Forest Reranking
3.1 Generic Reranking with the Perceptron
Forest Reranking
In this work we use the averaged perceptron algorithm (Collins, 2002), since it is an online algorithm that is much simpler and orders of magnitude faster than Boosting and MaxEnt methods.
Forest Reranking
Shown in Pseudocode 1, the perceptron algorithm makes several passes over the whole training data, and in each iteration, for each sentence s_i, it tries to predict a best parse ŷ_i among the candidates cand(s_i) using the current weight setting.
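As an illustration of the generic reranking loop described above, here is a minimal sketch of averaged perceptron reranking. It assumes the candidate list and a sparse feature extractor are already available; the names (averaged_perceptron_rerank, feats) are illustrative, not taken from the paper.

```python
from collections import defaultdict

def dot(weights, features):
    """Score a candidate as the dot product of sparse weights and features."""
    return sum(weights.get(f, 0.0) * v for f, v in features.items())

def averaged_perceptron_rerank(train, feats, epochs=10):
    """train: list of (candidates, gold) pairs, where candidates is a list of
    parses and gold is the oracle parse among them (parses are assumed to
    support equality comparison); feats(parse) returns a sparse feature dict.
    Returns averaged weights (Collins, 2002)."""
    w = defaultdict(float)        # current weights
    w_sum = defaultdict(float)    # running sum of weights, for averaging
    seen = 0
    for _ in range(epochs):
        for candidates, gold in train:
            # predict the best parse among the candidates under current weights
            pred = max(candidates, key=lambda y: dot(w, feats(y)))
            if pred != gold:
                # standard perceptron update toward the gold (oracle) parse
                for f, v in feats(gold).items():
                    w[f] += v
                for f, v in feats(pred).items():
                    w[f] -= v
            for f, v in w.items():
                w_sum[f] += v
            seen += 1
    return {f: total / seen for f, total in w_sum.items()}
```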
Introduction
his parser, and Wenbin Jiang for guidance on perceptron averaging.
perceptron is mentioned in 11 sentences in this paper.
Topics mentioned in this paper:
Beigman, Eyal and Beigman Klebanov, Beata
Abstract
We show that these are relatively unproblematic for an algorithm operating under the 0-1 loss model, whereas for the commonly used voted perceptron algorithm, hard training cases could result in incorrect prediction on the uncontroversial cases at test time.
Introduction
For example, the perceptron family of algorithms handles random classification noise well (Cohen, 1997).
Introduction
We show in section 3.4 that the widely used Freund and Schapire (1999) voted perceptron algorithm could face a constant hard case bias when confronted with annotation noise in training data, irrespective of the size of the dataset.
Introduction
3 Voted Perceptron
perceptron is mentioned in 24 sentences in this paper.
Topics mentioned in this paper:
Duan, Manjuan and White, Michael
Abstract
Using parse accuracy in a simple reranking strategy for self-monitoring, we find that with a state-of-the-art averaged perceptron realization ranking model, BLEU scores cannot be improved with any of the well-known Treebank parsers we tested, since these parsers too often make errors that human readers would be unlikely to make.
Background
Using the averaged perceptron algorithm (Collins, 2002), White & Rajkumar (2009) trained a structured prediction ranking model to combine these existing syntactic models with several n-gram language models.
Introduction
Rajkumar & White (2011; 2012) have recently shown that some rather egregious surface realization errors—in the sense that the reader would likely end up with the wrong interpretation—can be avoided by making use of features inspired by psycholinguistics research together with an otherwise state-of-the-art averaged perceptron realization ranking model (White and Rajkumar, 2009), as reviewed in the next section.
Introduction
With this simple reranking strategy and each of three different Treebank parsers, we find that it is possible to improve BLEU scores on Penn Treebank development data with White & Rajkumar’s (2011; 2012) baseline generative model, but not with their averaged perceptron model.
Introduction
Therefore, to develop a more nuanced self-monitoring reranker that is more robust to such parsing mistakes, we trained an SVM using dependency precision and recall features for all three parses, their n-best parsing results, and per-label precision and recall for each type of dependency, together with the realizer’s normalized perceptron model score as a feature.
Simple Reranking
The first one is the baseline generative model (hereafter, generative model) used in training the averaged perceptron model.
Simple Reranking
The second one is the averaged perceptron model (hereafter, perceptron model), which uses all the features reviewed in Section 2.
Simple Reranking
Table 2: Devset BLEU scores for simple ranking on top of n-best perceptron model realizations
perceptron is mentioned in 29 sentences in this paper.
Topics mentioned in this paper:
Kaji, Nobuhiro and Fujiwara, Yasuhiro and Yoshinaga, Naoki and Kitsuregawa, Masaru
Introduction
In the past decade, sequence labeling algorithms such as HMMs, CRFs, and Collins’ perceptrons have been extensively studied in the field of NLP (Rabiner, 1989; Lafferty et al., 2001; Collins, 2002).
Introduction
Among them, we focus on the perceptron algorithm (Collins, 2002).
Introduction
In the perceptron, the score function f(x, y) is given as f(x, y) = w · φ(x, y), where w is the weight vector and φ(x, y) is the feature vector representation of the pair (x, y). By making the first-order Markov assumption, we have
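The factorization that follows "we have" is not included in this excerpt. For reference, the standard first-order decomposition of such a score is shown below; this is a reconstruction under the usual notation, not necessarily the paper's exact formula.

```latex
f(x, y) \;=\; \mathbf{w} \cdot \phi(x, y) \;=\; \sum_{t=1}^{|x|} \mathbf{w} \cdot \phi(x, y_{t-1}, y_t, t)
```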
perceptron is mentioned in 13 sentences in this paper.
Topics mentioned in this paper:
Björkelund, Anders and Kuhn, Jonas
Abstract
We investigate different ways of learning structured perceptron models for coreference resolution when using nonlocal features and beam search.
Conclusion
We evaluated standard perceptron learning techniques for this setting both using early updates and LaSO.
Conclusion
In the special case where only local features are used, this method coincides with standard structured perceptron learning that uses exact search.
Experimental Setup
Unless otherwise stated we use 25 iterations of perceptron training and a beam size of 20.
Introduction
This paper studies and extends previous work using the structured perceptron (Collins, 2002) for complex NLP tasks.
Related Work
Perceptrons for coreference.
Related Work
The perceptron has previously been used to train coreference resolvers either by casting the problem as a binary classification problem that considers pairs of mentions in isolation (Bengtson and Roth, 2008; Stoyanov et al., 2009; Chang et al., 2012, inter alia) or in the structured manner, where a clustering for an entire document is predicted in one go (Fernandes et al., 2012).
Related Work
Stoyanov and Eisner (2012) train an Easy-First coreference system with the perceptron to learn a sequence of join operations between arbitrary mentions in a document and access nonlocal features through previous merge operations in later stages.
Representation and Learning
We find the weight vector w by online learning using a variant of the structured perceptron (Collins, 2002).
Representation and Learning
The structured perceptron iterates over training instances (x_i, y_i), where x_i are inputs and y_i are outputs.
Representation and Learning
If τ is set to 1, the update reduces to the standard structured perceptron update.
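For reference, the standard structured perceptron update (Collins, 2002) that the τ = 1 case reduces to is:

```latex
\mathbf{w} \;\leftarrow\; \mathbf{w} + \Phi(x_i, y_i) - \Phi(x_i, \hat{y}_i),
\qquad \hat{y}_i = \arg\max_{y} \; \mathbf{w} \cdot \Phi(x_i, y)
```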
perceptron is mentioned in 11 sentences in this paper.
Topics mentioned in this paper:
Li, Qi and Ji, Heng and Huang, Liang
Introduction
We propose a novel joint event extraction algorithm to predict the triggers and arguments simultaneously, and use the structured perceptron (Collins, 2002) to train the joint model.
Introduction
Therefore we employ beam search in decoding, and train the model using the early-update perceptron variant tailored for beam search (Collins and Roark, 2004; Huang et al., 2012).
Joint Framework for Event Extraction
Based on the hypothesis that facts are interdependent, we propose to use structured perceptron with inexact search to jointly extract triggers and arguments that co-occur in the same sentence.
Joint Framework for Event Extraction
3.1 Structured perceptron with beam search
Joint Framework for Event Extraction
Structured perceptron is an extension to the standard linear perceptron for structured prediction, which was proposed in (Collins, 2002).
perceptron is mentioned in 15 sentences in this paper.
Topics mentioned in this paper:
Jiang, Wenbin and Sun, Meng and Lü, Yajuan and Yang, Yating and Liu, Qun
Character Classification Model
Algorithm 1 Perceptron training algorithm.
Character Classification Model
The classifier can be trained with online learning algorithms such as the perceptron, or offline learning models such as support vector machines.
Character Classification Model
We choose the perceptron algorithm (Collins, 2002) to train the classifier for the character classification-based word segmentation model.
Experiments
Figure 3: Learning curve of the averaged perceptron classifier on the CTB development set.
Experiments
We train the baseline perceptron classifier for word segmentation on the training set of CTB 5.0, using the development set to determine the best number of training iterations.
Experiments
Figure 3 shows the learning curve of the averaged perceptron on the development set.
Knowledge in Natural Annotations
Algorithm 2 Perceptron learning with natural annotations.
Learning with Natural Annotations
Algorithm 3 Online version of perceptron learning with natural annotations.
Learning with Natural Annotations
Considering the online characteristic of the perceptron algorithm, if we are able to leverage much more (than the Chinese wikipedia) data with natural annotations, an online version of learning procedure shown in Algorithm 3 would be a better choice.
perceptron is mentioned in 10 sentences in this paper.
Topics mentioned in this paper:
Kim, Joohyun and Mooney, Raymond
Conclusions
In addition, since this response-based supervision is weak and ambiguous, we have also presented a method for using multiple reference parses to perform perceptron weight updates and shown a clear further improvement in end-task performance with this approach.
Experimental Evaluation
Table 2 presents reranking results for our proposed response-based weight update (Single) for the averaged perceptron (cf.
Modified Reranking Algorithm
Reranking using an averaged perceptron (Collins, 2002a) has been successfully applied to a variety of NLP tasks.
Modified Reranking Algorithm
Algorithm 1 AVERAGED PERCEPTRON TRAINING WITH RESPONSE-BASED UPDATE. Input: a set of training examples (e_i, y_i*), where e_i is an NL sentence and y_i* = argmax_{y ∈ GEN(e_i)} EXEC(...). Output: the parameter vector W, averaged over all iterations.
Modified Reranking Algorithm
1: procedure PERCEPTRON
perceptron is mentioned in 10 sentences in this paper.
Topics mentioned in this paper:
Sogaard, Anders
Abstract
We present a new perceptron learning algorithm using antagonistic adversaries and compare it to previous proposals on 12 multilingual cross-domain part-of-speech tagging datasets.
Conclusion
Our approach was superior to previous approaches across 12 multilingual cross-domain POS tagging datasets, with an average error reduction of 4% over a structured perceptron baseline.
Experiments
Learning with antagonistic adversaries performs significantly better than structured perceptron (SP) learning, L∞-regularization and LRA across the board.
Experiments
We note that on the in-domain dataset (PTB-biomedical), L∞-regularization performs best, but our approach also performs better than the structured perceptron baseline on this dataset.
Experiments
The number of zero weights or very small weights is significantly lower for learning with antagonistic adversaries than for the baseline structured perceptron.
Introduction
Section 2 introduces previous work on robust perceptron learning, as well as the methods discussed in the paper.
Robust perceptron learning
Our framework will be averaged perceptron learning (Freund and Schapire, 1999; Collins, 2002).
perceptron is mentioned in 10 sentences in this paper.
Topics mentioned in this paper:
Zeng, Xiaodong and Wong, Derek F. and Chao, Lidia S. and Trancoso, Isabel
Experiment
The feature templates in (Zhao et al., 2006) and (Zhang and Clark, 2007) are used in training the CRFs model and Perceptrons model, respectively.
Segmentation Models
This section briefly reviews two supervised models in these categories, a character-based CRFs model, and a word-based Perceptrons model, which are used in our approach.
Segmentation Models
2.2 Word-based Perceptrons Model
Segmentation Models
Zhang and Clark (2007) first proposed a word-based segmentation model using a discriminative Perceptrons algorithm.
Semi-supervised Learning via Co-regularizing Both Models
The model induction process is described in Algorithm 1: given labeled dataset D_l and unlabeled dataset D_u, the first two steps are training a CRFs (character-based) and Perceptrons (word-based) model on the labeled data D_l, respectively.
Semi-supervised Learning via Co-regularizing Both Models
Afterwards, the agreements A are used as a set of constraints to bias the learning of CRFs (§ 3.2) and Perceptron (§ 3.3) on the unlabeled data.
Semi-supervised Learning via Co-regularizing Both Models
3.3 Perceptrons with Constraints
perceptron is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Mohler, Michael and Bunescu, Razvan and Mihalcea, Rada
Answer Grading System
The scoring function is trained on a small set of manually aligned graphs using the averaged perceptron algorithm.
Answer Grading System
In order to learn the parameter vector w, we use the averaged version of the perceptron algorithm (Freund and Schapire, 1999; Collins, 2002).
Answer Grading System
The pseudocode for the learning algorithm is shown in Table 1. After training the perceptron, these 32 student answers are removed from the dataset, are not used as training data further along in the pipeline, and are not included in the final results.
Data Set
In addition, the student answers used to train the perceptron are removed from the pipeline after the perceptron training stage.
Related Work
Following the same line of work in the textual entailment world are (Raina et al., 2005), (MacCartney et al., 2006), (de Marneffe et al., 2007), and (Chambers et al., 2007), which experiment variously with using diverse knowledge sources, using a perceptron to learn alignment decisions, and exploiting natural logic.
Results
We independently test two components of our overall grading system: the node alignment detection scores found by training the perceptron , and the overall grades produced in the final stage.
Results
5.1 Perceptron Alignment
Results
However, as the perceptron is designed to minimize error rate, this may not reflect an optimal objective when seeking to detect matches.
perceptron is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Cao, Yuan and Khudanpur, Sanjeev
Experiments
For comparison, we also investigated training the reranker with Perceptron and MIRA.
Experiments
The f-scores of the held-out and evaluation set given by T-MIRA as well as the Perceptron and
Experiments
When very few labeled data are available for training (compared with the number of features), T-MIRA performs much better than the vector-based models MIRA and Perceptron.
Introduction
Many learning algorithms applied to NLP problems, such as the Perceptron (Collins,
Tensor Model Construction
As a way out, we first run a simple vector-model based learning algorithm (say the Perceptron) on the training data and estimate a weight vector, which serves as a “surro-
perceptron is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Li, Qi and Ji, Heng
Abstract
We present an incremental joint framework to simultaneously extract entity mentions and relations using structured perceptron with efficient beam-search.
Algorithm 3.1 The Model
To estimate the feature weights, we use structured perceptron (Collins, 2002), an extension of the standard perceptron for structured prediction, as the learning framework.
Algorithm 3.1 The Model
(2012) proved the convergence of the structured perceptron when inexact search is applied with violation-fixing update methods such as early-update (Collins and Roark, 2004).
Algorithm 3.1 The Model
Figure 4 shows the pseudocode for structured perceptron training with early-update.
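Figure 4 itself is not reproduced in this index. As an illustration of the early-update scheme the excerpt refers to (Collins and Roark, 2004), here is a minimal sketch; beam_search and feats are assumed helpers with the contract described in the comments, and all names are illustrative rather than taken from the paper.

```python
from collections import defaultdict

def early_update_train(train, feats, beam_search, epochs=10, beam_size=8):
    """Structured perceptron with early update (Collins and Roark, 2004).
    train: list of (x, gold) pairs, where gold is the gold action sequence.
    beam_search(x, w, beam_size, gold) is an assumed helper returning
    (best_prefix, gold_prefix, fell_off): fell_off is True if the gold item
    dropped out of the beam before decoding finished, in which case both
    prefixes cover only the steps decoded so far; otherwise they are the
    full best hypothesis and the full gold sequence."""
    w = defaultdict(float)
    for _ in range(epochs):
        for x, gold in train:
            best, gold_prefix, fell_off = beam_search(x, w, beam_size, gold)
            if fell_off or best != gold_prefix:
                # Early update: as soon as the gold item falls off the beam,
                # stop decoding and update on the partial hypotheses; otherwise
                # update on the full sequences when the prediction is wrong.
                for f, v in feats(x, gold_prefix).items():
                    w[f] += v
                for f, v in feats(x, best).items():
                    w[f] -= v
    return w
```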
Conclusions and Future Work
For the first time, we addressed this challenging task by an incremental beam-search algorithm in conjunction with structured perceptron.
Introduction
Following the above intuitions, we introduce a joint framework based on structured perceptron (Collins, 2002; Collins and Roark, 2004) with beam-search to extract entity mentions and relations simultaneously.
Introduction
Our previous work (Li et al., 2013) used perceptron model with token-based tagging to jointly extract event triggers and arguments.
Related Work
Our previous work (Li et al., 2013) used structured perceptron with token-based decoder to jointly predict event triggers and arguments based on the assumption that entity mentions and other argument candidates are given as part of the input.
perceptron is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Bansal, Mohit and Klein, Dan
Baseline System
(2009) in using a decision tree classifier rather than an averaged linear perceptron.
Baseline System
We find the decision tree classifier to work better than the default averaged perceptron (used by Stoyanov et al.
Experiments
AvgPerc is the averaged perceptron baseline, DecTree is the decision tree baseline, and the +Feature rows show the effect of adding a particular feature incrementally (not in isolation) to the DecTree baseline.
Experiments
We start with the Reconcile baseline but employ the decision tree (DT) classifier, because it has significantly better performance than the default averaged perceptron classifier used in Stoyanov et al.
Experiments
(2009). Table 2 compares the baseline perceptron results to the DT results and then shows the incremental addition of the Web features to the DT baseline (on the ACE04 development set).
perceptron is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Simianer, Patrick and Riezler, Stefan and Dyer, Chris
Experiments
The baseline learner in our experiments is a pairwise ranking perceptron that is used on various features and training data and plugged into various meta-
Experiments
The perceptron algorithm itself compares favorably to related learning techniques such as the MIRA adaptation of Chiang et al.
Experiments
In contrast, the perceptron is deterministic when started from a zero-vector of weights and achieves favorable 28.0 BLEU on the news-commentary test set.
Joint Feature Selection in Distributed Stochastic Learning
The resulting algorithms can be seen as variants of the perceptron algorithm.
Joint Feature Selection in Distributed Stochastic Learning
subgradient leads to the perceptron algorithm for pairwise ranking (Shen and Joshi, 2005):
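The update itself is not included in this excerpt. A hedged sketch of the standard pairwise-ranking perceptron step, for a preference pair in which x⁺ should outrank x⁻, is shown below; the paper's exact formulation (e.g., with a margin term) may differ.

```latex
\text{if } \mathbf{w} \cdot \phi(x^{+}) - \mathbf{w} \cdot \phi(x^{-}) \le 0:
\qquad \mathbf{w} \;\leftarrow\; \mathbf{w} + \phi(x^{+}) - \phi(x^{-})
```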
Joint Feature Selection in Distributed Stochastic Learning
(2010) also present an iterative mixing algorithm where weights are mixed from each shard after training a single epoch of the perceptron in parallel on each shard.
Related Work
Examples for adapted algorithms include Maximum-Entropy Models (Och and Ney, 2002; Blunsom et al., 2008), Pairwise Ranking Perceptrons (Shen et al., 2004; Watanabe et al., 2006; Hopkins and May, 2011), Structured Perceptrons (Liang et al., 2006a), Boosting (Duh and Kirchhoff, 2008; Wellington et al., 2009), Structured SVMs (Tillmann and Zhang, 2006; Hayashi et al., 2009), MIRA (Watanabe et al., 2007; Chiang et al., 2008; Chiang et al., 2009), and others.
perceptron is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Fader, Anthony and Zettlemoyer, Luke and Etzioni, Oren
Abstract
Our approach automatically generalizes a seed lexicon and includes a scalable, parallelized perceptron parameter estimation scheme.
Data
Number of perceptron epochs T.
Data
A function PerceptronEpoch(T, θ, L) that runs a single epoch of the hidden-variable structured perceptron algorithm on training set T with initial parameters θ, returning a new parameter vector θ′.
Learning
We use the hidden-variable structured perceptron algorithm to learn θ from a list of (question, query) training examples.
Learning
We adopt the iterative parameter mixing variation of the perceptron (McDonald et al., 2010) to scale to a large number of training examples.
Learning
Because the number of training examples is large, we adopt a parallel perceptron approach.
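As an illustration of the iterative parameter mixing scheme of McDonald et al. (2010) referred to in these excerpts, here is a minimal sketch; train_epoch and the sharding of the data are assumed helpers, uniform mixing weights are used for simplicity, and all names are illustrative.

```python
def iterative_parameter_mixing(shards, train_epoch, epochs=10):
    """Iterative parameter mixing (McDonald et al., 2010), sketched.
    shards: list of training-data shards; train_epoch(shard, w) is an assumed
    helper that runs one perceptron epoch on a shard starting from weights w
    (a sparse dict) and returns the resulting weights."""
    w = {}
    for _ in range(epochs):
        # In practice each call runs in parallel (e.g., one worker per shard).
        shard_weights = [train_epoch(shard, dict(w)) for shard in shards]
        # Mix: average the per-shard weights and redistribute them as the
        # starting point for the next round.
        mixed = {}
        for sw in shard_weights:
            for f, v in sw.items():
                mixed[f] = mixed.get(f, 0.0) + v / len(shard_weights)
        w = mixed
    return w
```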
perceptron is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Sauper, Christina and Barzilay, Regina
Abstract
We augment the standard perceptron algorithm with a global integer linear programming formulation to optimize both local fit of information into each topic and global coherence across the entire overview.
Introduction
We estimate the parameters of our model using the perceptron algorithm augmented with an integer linear programming (ILP) formulation, run over a training set of example articles in the given domain.
Method
Using the perceptron framework augmented with an ILP formulation for global optimization, the system is trained to select the best excerpt for each document d_i and each topic t_j.
Method
We implement this algorithm using the perceptron framework, as it can be easily modified for structured prediction while preserving convergence guarantees (Daume III and Marcu, 2005; Snyder and Barzilay, 2007).
Method
Training Procedure Our algorithm is a modification of the perceptron ranking algorithm (Collins, 2002), which allows for joint learning across several ranking problems (Daumé III and Marcu, 2005; Snyder and Barzilay, 2007).
Rank(e_ij1 ... e_ijr, w_j)
Our third baseline, Disjoint, uses the ranking perceptron framework as in our full system; however, rather than perform an optimization step during training and decoding, we simply select the highest-ranked excerpt for each topic.
perceptron is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Huang, Zhiheng and Chang, Yi and Long, Bo and Crespo, Jean-Francois and Dong, Anlei and Keerthi, Sathiya and Wu, Su-Lin
Experiments
Due to the long CRF training time (days to weeks even for stochastic gradient descent training) for these large label size datasets, we choose the perceptron algorithm for training.
Experiments
We note that the selection of training algorithm does not affect the decoding process: the decoding is identical for both CRF and perceptron training algorithms.
Introduction
Sequence tagging algorithms including HMMs (Rabiner, 1989), CRFs (Lafferty et al., 2001), and Collins’s perceptron (Collins, 2002) have been widely employed in NLP applications.
Problem formulation
In this section, we formulate the sequential decoding problem in the context of perceptron algorithm (Collins, 2002) and CRFs (Lafferty et al., 2001).
Problem formulation
Formally, a perceptron model is
Problem formulation
If x is given, the decoding is to find the best y which maximizes the score of f (y, x) for perceptron or the probability of p(y|x) for CRFs.
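In symbols, the two decoding objectives contrasted here are (standard formulations, not necessarily the paper's exact notation):

```latex
\hat{y} = \arg\max_{y} \; \mathbf{w} \cdot \Phi(y, x) \quad \text{(perceptron)},
\qquad
\hat{y} = \arg\max_{y} \; p(y \mid x) \quad \text{(CRF)}
```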
perceptron is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Xu, Wenduan and Clark, Stephen and Zhang, Yue
Introduction
The discriminative model is global and trained with the structured perceptron.
Introduction
We also show how perceptron learning with beam-search (Collins and Roark, 2004) can be extended to handle the additional ambiguity, by adapting the “violation-fixing” perceptron of Huang et al.
The Dependency Model
We also show, in Section 3.3, how perceptron training with early-update (Collins and Roark, 2004) can be used in this setting.
The Dependency Model
We use the averaged perceptron (Collins, 2002) to train a global linear model and score each action.
The Dependency Model
Since there are potentially many gold items, and one gold item is required for the perceptron update, a decision needs
perceptron is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Rastrow, Ariya and Dredze, Mark and Khudanpur, Sanjeev
Incorporating Syntactic Structures
The choice of action is given by a learning algorithm, such as a maximum-entropy classifier, support vector machine, or Perceptron, trained on labeled data.
Incorporating Syntactic Structures
(2011), a transition based model with a Perceptron and a lookahead heuristic process.
Introduction
We demonstrate our approach on a local Perceptron based part of speech tagger (Tsuruoka et al., 2011) and a shift reduce dependency parser (Sagae and Tsujii, 2007), yielding significantly faster tagging and parsing of ASR hypotheses.
Syntactic Language Models
(2005) uses the Perceptron algorithm to train a global linear discriminative model
Syntactic Language Models
We use a global linear model with Perceptron training.
Syntactic Language Models
Perceptron training learns the parameters α.
perceptron is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Konstas, Ioannis and Lapata, Mirella
Conclusions
We find the best scoring derivation via forest reranking using both local and nonlocal features, that we train using the perceptron algorithm.
Conclusions
Finally, distributed training strategies have been developed for the perceptron algorithm (McDonald et al., 2010), which would allow our generator to scale to even larger datasets.
Problem Formulation
Algorithm 1: Averaged Structured Perceptron
Problem Formulation
We estimate the weights α using the averaged structured perceptron algorithm (Collins, 2002), which is well known for its speed and good performance in similar large-parameter NLP tasks (Liang et al., 2006; Huang, 2008).
Problem Formulation
As shown in Algorithm 1, the perceptron makes several passes over the training scenarios, and in each iteration it computes the best-scoring (w, h) among the candidate derivations, given the current weights α.
Related Work
Our model is closest to Huang (2008) who also performs forest reranking on a hypergraph, using both local and nonlocal features, whose weights are tuned with the averaged perceptron algorithm (Collins, 2002).
perceptron is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Sassano, Manabu and Kurohashi, Sadao
Conclusion
It is observed that active learning of parsing with the averaged perceptron, which is one of the large-margin classifiers, also works well for Japanese dependency analysis.
Experimental Evaluation and Discussion
6.2 Averaged Perceptron
Experimental Evaluation and Discussion
We used the averaged perceptron (AP) (Freund and Schapire, 1999) with polynomial kernels.
Experimental Evaluation and Discussion
We found the best value of the epoch T of the averaged perceptron by using the development set.
perceptron is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Riesa, Jason and Marcu, Daniel
Discriminative training
We incorporate all our new features into a linear model and learn weights for each using the online averaged perceptron algorithm (Collins, 2002) with a few modifications for structured outputs inspired by Chiang et al.
Experiments
We use 1,000 sentence pairs and gold alignments from LDC2006E86 to train model parameters: 800 sentences for training, 100 for testing, and 100 as a second held-out development set to decide when to stop perceptron training.
Experiments
Figure 8: Learning curves for 10 random restarts over time for parallel averaged perceptron training.
Experiments
Perceptron training here is quite stable, converging to the same general neighborhood each time.
Introduction
We train the parameters of the model using the averaged perceptron (Collins, 2002) modified for structured outputs, but it can easily fit into a max-margin or related framework.
perceptron is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Jiang, Wenbin and Huang, Liang and Liu, Qun
Experiments
5.1 Baseline Perceptron Classifier
Experiments
We first report experimental results of the single perceptron classifier on CTB 5.0.
Experiments
The first 3 rows in each subtable of Table 3 show the performance of the single perceptron
Segmentation and Tagging as Character Classification
Algorithm 1 Perceptron training algorithm.
Segmentation and Tagging as Character Classification
Several classification models can be adopted here, however, we choose the averaged perceptron algorithm (Collins, 2002) because of its simplicity and high accuracy.
perceptron is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Sun, Xu and Wang, Houfeng and Li, Wenjie
Related Work
A few other state-of-the-art CWS systems are using semi-Markov perceptron methods or voting systems based on multiple semi-Markov
Related Work
perceptron segmenters (Zhang and Clark, 2007; Sun, 2010).
Related Work
Those semi-Markov perceptron systems are moderately faster than the heavy probabilistic systems using semi-Markov conditional random fields or latent variable conditional random fields.
perceptron is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Zhu, Muhua and Zhang, Yue and Chen, Wenliang and Zhang, Min and Zhu, Jingbo
Baseline parser
We adopt the parser of Zhang and Clark (2009) for our baseline, which is based on the shift-reduce process of Sagae and Lavie (2005), and employs global perceptron training and beam search.
Baseline parser
The model parameters are trained with the averaged perceptron algorithm, applied to state items (sequences of actions) globally.
Experiments
The optimal iteration number of perceptron learning is determined
Improved hypotheses comparison
This turns out to have a significant empirical influence on perceptron training with early-update, where the training of the model interacts with search (Daume III, 2006).
perceptron is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Anzaroot, Sam and Passos, Alexandre and Belanger, David and McCallum, Andrew
Citation Extraction Data
We then use the development set to learn the penalties for the soft constraints, using the perceptron algorithm described in section 3.1.
Soft Constraints in Dual Decomposition
All we need to employ the structured perceptron algorithm (Collins, 2002) or the structured SVM algorithm (Tsochantaridis et al., 2004) is a black-box procedure for performing MAP inference in the structured linear model given an arbitrary cost vector.
Soft Constraints in Dual Decomposition
This can be ensured by simple modifications of the perceptron and subgradient descent optimization of the structured SVM objective simply by truncating c coordinate-wise to be nonnegative at every learning iteration.
Soft Constraints in Dual Decomposition
Intuitively, the perceptron update increases the penalty for a constraint if it is satisfied in the ground truth and not in an inferred prediction, and decreases the penalty if the constraint is satisfied in the prediction and not the ground truth.
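As an illustration of this penalty update, here is a minimal sketch; satisfied(c, y) is a hypothetical helper indicating whether constraint c holds in output y, and the coordinate-wise truncation to nonnegative penalties mentioned above is included. Names are illustrative, not taken from the paper.

```python
def update_constraint_penalties(penalties, constraints, gold, pred, satisfied, lr=1.0):
    """Perceptron-style update of soft-constraint penalties, sketched.
    penalties: dict mapping each constraint to its (nonnegative) penalty.
    satisfied(c, y): hypothetical helper, True if constraint c holds in output y."""
    for c in constraints:
        in_gold = satisfied(c, gold)
        in_pred = satisfied(c, pred)
        if in_gold and not in_pred:
            # the constraint holds in the ground truth but not in the prediction
            penalties[c] += lr
        elif in_pred and not in_gold:
            # the constraint holds in the prediction but not in the ground truth
            penalties[c] -= lr
        # truncate coordinate-wise so penalties stay nonnegative
        penalties[c] = max(0.0, penalties[c])
    return penalties
```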
perceptron is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Srivastava, Shashank and Hovy, Eduard
Introduction
In this case, learning can follow the online structured perceptron learning procedure by Collins (2002), where the weight updates for the k’th training example (x^(k), y^(k)) are given as:
Introduction
While the Viterbi algorithm can be used for tagging optimal state-sequences given the weights, the structured perceptron can learn optimal model weights given gold-standard sequence labels.
Introduction
In the M-step, we take the decoded state-sequences in the E-step as observed, and run perceptron learning to update the feature weights w_i.
perceptron is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Koo, Terry and Carreras, Xavier and Collins, Michael
Experiments
We trained the parsers using the averaged perceptron (Freund and Schapire, 1999; Collins, 2002), which represents a balance between strong performance and fast training times.
Experiments
of iterations of perceptron training, we performed up to 30 iterations and chose the iteration which optimized accuracy on the development set.
Experiments
Due to the sparsity of the perceptron updates, however, only a small fraction of the possible features were active in our trained models.
perceptron is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Zhao, Qiuye and Marcus, Mitch
Abstract
parameter estimation of θ, we use the averaged perceptron as described in (Collins, 2002).
Abstract
POS-/+: perceptron trained without/with POS.
Abstract
These results are compared to the core perceptron trained without POS in (Jiang et al., 2008a).
perceptron is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Ma, Ji and Zhu, Jingbo and Xiao, Tong and Yang, Nan
Abstract
In this paper, we combine easy-first dependency parsing and POS tagging algorithms with beam search and structured perceptron .
Abstract
The proposed solution can also be applied to combine beam search and structured perceptron with other systems that exhibit spurious ambiguity.
Training
Current work (Huang et al., 2012) rigorously explained that only valid updates ensure the convergence of any perceptron variant.
perceptron is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Kummerfeld, Jonathan K. and Roesner, Jessika and Dawborn, Tim and Haggerty, James and Curran, James R. and Clark, Stephen
Results
In our first experiment, we trained supertagger models using Generalised Iterative Scaling (GIS) (Darroch and Ratcliff, 1972), the limited memory BFGS method (BFGS) (Nocedal and Wright, 1999), the averaged perceptron (Collins, 2002), and the margin infused relaxed algorithm (MIRA) (Crammer and Singer, 2003).
Results
GIS 96.34 96.43 96.53 96.62 85.3
Perceptron 95.82 95.99 96.30 - 85.2
MIRA 96.23 96.29 96.46 96.63 85.4
Results
For all four algorithms the training time is proportional to the amount of data, but the GIS and BFGS models trained on only CCGbank took 4,500 and 4,200 seconds to train, while the equivalent perceptron and MIRA models took 90 and 95 seconds to train.
perceptron is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Koo, Terry and Collins, Michael
Parsing experiments
7.2 Averaged perceptron training
Parsing experiments
We chose the averaged structured perceptron (Freund and Schapire, 1999; Collins, 2002) as it combines highly competitive performance with fast training times, typically converging in 5—10 iterations.
Parsing experiments
Pass = % of dependencies surviving the beam in training data, Orac = maximum achievable UAS on validation data, Acc1/Acc2 = UAS of Models 1/2 on validation data, and Time1/Time2 = minutes per perceptron training iteration for Models 1/2, averaged over all 10 iterations.
perceptron is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Okuma, Hideharu and Hara, Kazuo and Shimbo, Masashi and Matsumoto, Yuji
Alignment-based coordinate structure analysis
Shimbo and Hara defined this measure as a linear function of many features associated to arcs, and used perceptron training to optimize the weight coefficients for these features from corpora.
Improvements
The weight of these features, which eventually determines the score of the bypass, is tuned by perceptron just like the weights of other features.
Introduction
Recently, Shimbo and Hara (2007) proposed to use a large number of features to model this symmetry, and optimize the feature weights with perceptron training.
perceptron is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Surdeanu, Mihai and Ciaramita, Massimiliano and Zaragoza, Hugo
Approach
For this reason we choose as a ranking algorithm the Perceptron, which is both accurate and efficient and can be trained with online protocols.
Approach
Specifically, we implement the ranking Perceptron proposed by Shen and Joshi (2005), which reduces the ranking problem to a binary classification problem.
Approach
For regularization purposes, we use as a final model the average of all Perceptron models posited
perceptron is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: