Index of papers in Proc. ACL 2012 that mention
  • perceptron
Bansal, Mohit and Klein, Dan
Baseline System
(2009) in using a decision tree classifier rather than an averaged linear perceptron.
Baseline System
We find the decision tree classifier to work better than the default averaged perceptron (used by Stoyanov et al.
Experiments
AvgPerc is the averaged perceptron baseline, DecTree is the decision tree baseline, and the +Feature rows show the effect of adding a particular feature incrementally (not in isolation) to the DecTree baseline.
Experiments
We start with the Reconcile baseline but employ the decision tree (DT) classifier, because it has significantly better performance than the default averaged perceptron classifier used in Stoyanov et al. (2009).
Experiments
Table 2 compares the baseline perceptron results to the DT results and then shows the incremental addition of the Web features to the DT baseline (on the ACE04 development set).
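Across these excerpts, the point of comparison is Reconcile's default learner, an averaged linear perceptron, versus a decision tree classifier. As a rough sketch of what such an averaged perceptron baseline looks like (not the paper's or Reconcile's actual code; the feature vectors, labels, and function name are hypothetical), a binary averaged perceptron classifier can be written as:

    # Minimal averaged perceptron for binary classification (Collins, 2002 averaging).
    # Purely illustrative: feature vectors, labels, and names are hypothetical.
    import numpy as np

    def train_averaged_perceptron(X, y, epochs=10):
        """X: (n, d) feature matrix; y: labels in {-1, +1}."""
        n, d = X.shape
        w = np.zeros(d)        # current weight vector
        w_sum = np.zeros(d)    # running sum of weights for averaging
        for _ in range(epochs):
            for i in range(n):
                if y[i] * np.dot(w, X[i]) <= 0:    # misclassified: perceptron update
                    w = w + y[i] * X[i]
                w_sum += w                         # accumulate after every example
        return w_sum / (epochs * n)                # averaged weights

    # Hypothetical usage: four mention pairs with three features each.
    X = np.array([[1.0, 0.0, 1.0],
                  [0.0, 1.0, 0.0],
                  [1.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
    y = np.array([1, -1, 1, -1])
    print(train_averaged_perceptron(X, y))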
perceptron is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Simianer, Patrick and Riezler, Stefan and Dyer, Chris
Experiments
The baseline learner in our experiments is a pairwise ranking perceptron that is used on various features and training data and plugged into various meta-
Experiments
The perceptron algorithm itself compares favorably to related learning techniques such as the MIRA adaptation of Chiang et al.
Experiments
In contrast, the perceptron is deterministic when started from a zero vector of weights and achieves a favorable 28.0 BLEU on the news-commentary test set.
Joint Feature Selection in Distributed Stochastic Learning
The resulting algorithms can be seen as variants of the perceptron algorithm.
Joint Feature Selection in Distributed Stochastic Learning
subgradient leads to the perceptron algorithm for pairwise ranking (Shen and Joshi, 2005):
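The update this excerpt alludes to, the pairwise ranking perceptron of Shen and Joshi (2005), can be sketched as follows; the data, learning rate, and function names are illustrative assumptions, not the paper's implementation:

    # Pairwise ranking perceptron in the spirit of Shen and Joshi (2005): the
    # subgradient of the pairwise hinge loss yields a perceptron-style update.
    # Data, dimensions, and names below are illustrative assumptions.
    import numpy as np

    def rank_perceptron_epoch(pairs, w, eta=1.0):
        """pairs: list of (phi_better, phi_worse); the first hypothesis of each
        pair should outrank the second (e.g., it has higher sentence-level BLEU)."""
        for phi_better, phi_worse in pairs:
            diff = phi_better - phi_worse
            if np.dot(w, diff) <= 0:    # ranking violated: update toward the better one
                w = w + eta * diff
        return w

    # Toy usage, starting from a zero weight vector (which, as the excerpt above
    # notes, makes the procedure deterministic).
    pairs = [(np.array([1.0, 0.2]), np.array([0.5, 0.4])),
             (np.array([0.3, 0.9]), np.array([0.2, 0.1]))]
    w = np.zeros(2)
    for _ in range(5):
        w = rank_perceptron_epoch(pairs, w)
    print(w)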
Joint Feature Selection in Distributed Stochastic Learning
(2010) also present an iterative mixing algorithm where weights are mixed from each shard after training a single epoch of the perceptron in parallel on each shard.
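A minimal sketch of the iterative mixing scheme described here (in the spirit of McDonald et al., 2010) might look like the following; shards are processed sequentially as a stand-in for parallel workers, the per-shard epoch reuses the pairwise ranking update sketched above, and none of the names come from the paper:

    # Iterative parameter mixing: one perceptron epoch per shard, then average
    # ("mix") the shard weights before the next epoch. All names are illustrative.
    import numpy as np

    def shard_epoch(shard, w):
        for phi_better, phi_worse in shard:
            diff = phi_better - phi_worse
            if np.dot(w, diff) <= 0:
                w = w + diff
        return w

    def iterative_mixing(shards, dim, epochs=5):
        w = np.zeros(dim)
        for _ in range(epochs):
            shard_weights = [shard_epoch(shard, w.copy()) for shard in shards]
            w = np.mean(shard_weights, axis=0)   # uniform mixing of shard weights
        return w

    # Example: two shards of preference pairs in two dimensions.
    shards = [[(np.array([1.0, 0.0]), np.array([0.0, 1.0]))],
              [(np.array([0.5, 0.5]), np.array([0.0, 0.2]))]]
    print(iterative_mixing(shards, dim=2))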
Related Work
Examples for adapted algorithms include Maximum-Entropy Models (Och and Ney, 2002; Blunsom et al., 2008), Pairwise Ranking Perceptrons (Shen et al., 2004; Watanabe et al., 2006; Hopkins and May, 2011), Structured Perceptrons (Liang et al., 2006a), Boosting (Duh and Kirchhoff, 2008; Wellington et al., 2009), Structured SVMs (Tillmann and Zhang, 2006; Hayashi et al., 2009), MIRA (Watanabe et al., 2007; Chiang et al., 2008; Chiang et al., 2009), and others.
perceptron is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Huang, Zhiheng and Chang, Yi and Long, Bo and Crespo, Jean-Francois and Dong, Anlei and Keerthi, Sathiya and Wu, Su-Lin
Experiments
Due to the long CRF training time (days to weeks even for stochastic gradient descent training) for these large label size datasets, we choose the perceptron algorithm for training.
Experiments
We note that the selection of training algorithm does not affect the decoding process: the decoding is identical for both CRF and perceptron training algorithms.
Introduction
Sequence tagging algorithms including HMMs (Rabiner, 1989), CRFs (Lafferty et al., 2001), and Collins’s perceptron (Collins, 2002) have been widely employed in NLP applications.
Problem formulation
In this section, we formulate the sequential decoding problem in the context of the perceptron algorithm (Collins, 2002) and CRFs (Lafferty et al., 2001).
Problem formulation
Formally, a perceptron model is
Problem formulation
If x is given, decoding is to find the best y which maximizes the score f(y, x) for the perceptron or the probability p(y|x) for CRFs.
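Reading the last two excerpts together, the scoring and decoding they describe correspond to the standard formulation (our notation; Φ denotes the feature map and w the weight vector):

    f(y, x) = \mathbf{w} \cdot \Phi(x, y),
    \qquad \hat{y} = \operatorname*{argmax}_{y} f(y, x)
    \quad \text{(perceptron)}

    p(y \mid x) = \frac{\exp\{\mathbf{w} \cdot \Phi(x, y)\}}{\sum_{y'} \exp\{\mathbf{w} \cdot \Phi(x, y')\}},
    \qquad \hat{y} = \operatorname*{argmax}_{y} p(y \mid x)
    \quad \text{(CRF)}

Both decoders solve the same argmax over w · Φ(x, y), which matches the earlier remark that decoding is identical under CRF and perceptron training.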
perceptron is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Konstas, Ioannis and Lapata, Mirella
Conclusions
We find the best scoring derivation via forest reranking using both local and nonlocal features, whose weights we train with the perceptron algorithm.
Conclusions
Finally, distributed training strategies have been developed for the perceptron algorithm (McDonald et al., 2010), which would allow our generator to scale to even larger datasets.
Problem Formulation
Algorithm 1: Averaged Structured Perceptron
Problem Formulation
We estimate the weights α using the averaged structured perceptron algorithm (Collins, 2002), which is well known for its speed and good performance in similar large-parameter NLP tasks (Liang et al., 2006; Huang, 2008).
Problem Formulation
As shown in Algorithm 1, the perceptron makes several passes over the training scenarios, and in each iteration it computes the best scoring (w, h) among the candidate derivations, given the current weights α.
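The training loop that Algorithm 1 and these excerpts describe is the averaged structured perceptron of Collins (2002); a generic sketch follows, where feats and decode are placeholders for the paper's feature extractor and forest-reranking decoder rather than its actual code:

    # Generic averaged structured perceptron (Collins, 2002), the pattern behind
    # Algorithm 1. `feats` and `decode` are placeholders, not the generator's code.
    import numpy as np

    def averaged_structured_perceptron(data, feats, decode, dim, epochs=5):
        """data: list of (x, y_gold); feats(x, y) -> np.ndarray of length dim;
        decode(x, w) -> highest-scoring candidate derivation under weights w."""
        w = np.zeros(dim)
        w_sum = np.zeros(dim)
        for _ in range(epochs):                    # several passes over the training data
            for x, y_gold in data:
                y_hat = decode(x, w)               # best-scoring derivation under current w
                if y_hat != y_gold:
                    w = w + feats(x, y_gold) - feats(x, y_hat)   # move toward the gold derivation
                w_sum += w
        return w_sum / (epochs * len(data))        # averaged weights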
Related Work
Our model is closest to Huang (2008) who also performs forest reranking on a hypergraph, using both local and nonlocal features, whose weights are tuned with the averaged perceptron algorithm (Collins, 2002).
perceptron is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Rastrow, Ariya and Dredze, Mark and Khudanpur, Sanjeev
Incorporating Syntactic Structures
The choice of action w is given by a learning algorithm, such as a maximum-entropy classifier, support vector machine or Perceptron, trained on labeled data.
Incorporating Syntactic Structures
(2011), a transition based model with a Perceptron and a lookahead heuristic process.
Introduction
We demonstrate our approach on a local Perceptron-based part-of-speech tagger (Tsuruoka et al., 2011) and a shift-reduce dependency parser (Sagae and Tsujii, 2007), yielding significantly faster tagging and parsing of ASR hypotheses.
Syntactic Language Models
(2005) uses the Perceptron algorithm to train a global linear discriminative model
Syntactic Language Models
We use a global linear model with Perceptron training.
Syntactic Language Models
Perceptron training learns the parameters α.
perceptron is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Sun, Xu and Wang, Houfeng and Li, Wenjie
Related Work
A few other state-of-the-art CWS systems are using semi-Markov perceptron methods or voting systems based on multiple semi-Markov perceptron segmenters (Zhang and Clark, 2007; Sun, 2010).
Related Work
Those semi-Markov perceptron systems are moderately faster than the heavy probabilistic systems using semi-Markov conditional random fields or latent variable conditional random fields.
perceptron is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Zhao, Qiuye and Marcus, Mitch
Abstract
parameter estimation of θ, we use the averaged perceptron as described in (Collins, 2002).
Abstract
POS-/+: perceptron trained without/with POS.
Abstract
These results are compared to the core perceptron trained without POS in (Jiang et al., 2008a).
perceptron is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: