Index of papers in Proc. ACL 2010 that mention
  • perceptron
Kaji, Nobuhiro and Fujiwara, Yasuhiro and Yoshinaga, Naoki and Kitsuregawa, Masaru
Introduction
In the past decade, sequence labeling algorithms such as HMMs, CRFs, and Collins’ perceptrons have been extensively studied in the field of NLP (Rabiner, 1989; Lafferty et al., 2001; Collins, 2002).
Introduction
Among them, we focus on the perceptron algorithm (Collins, 2002).
Introduction
In the perceptron, the score function f(x, y) is given as f(x, y) = w · φ(x, y), where w is the weight vector and φ(x, y) is the feature vector representation of the pair (x, y). By making the first-order Markov assumption, we have
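To make the decomposition concrete, here is a minimal Python sketch (an illustration, not the authors' code; the feature templates are assumptions) of how the score w · φ(x, y) factors over adjacent label pairs under the first-order Markov assumption.

```python
from collections import defaultdict

def local_features(x, t, prev_tag, tag):
    """Illustrative first-order feature templates for position t:
    an emission feature (word, tag) and a transition feature
    (previous tag, tag)."""
    return [("emit", x[t], tag), ("trans", prev_tag, tag)]

def score(w, x, y):
    """Compute w . phi(x, y), where phi(x, y) decomposes over
    positions under the first-order Markov assumption."""
    total = 0.0
    prev = "<s>"
    for t, tag in enumerate(y):
        for f in local_features(x, t, prev, tag):
            total += w[f]
        prev = tag
    return total

# Tiny example with a hand-set weight vector.
w = defaultdict(float)
w[("emit", "dog", "NOUN")] = 1.0
w[("trans", "DET", "NOUN")] = 0.5
print(score(w, ["the", "dog"], ["DET", "NOUN"]))  # 1.5
```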
perceptron is mentioned in 13 sentences in this paper.
Topics mentioned in this paper:
Riesa, Jason and Marcu, Daniel
Discriminative training
We incorporate all our new features into a linear model and learn weights for each using the online averaged perceptron algorithm (Collins, 2002) with a few modifications for structured outputs inspired by Chiang et al.
Experiments
We use 1,000 sentence pairs and gold alignments from LDC2006E86 to train model parameters: 800 sentences for training, 100 for testing, and 100 as a second held-out development set to decide when to stop perceptron training.
Experiments
Figure 8: Learning curves for 10 random restarts over time for parallel averaged perceptron training.
Experiments
Perceptron training here is quite stable, converging to the same general neighborhood each time.
Introduction
We train the parameters of the model using the averaged perceptron (Collins, 2002), modified for structured outputs, but the model can easily fit into a max-margin or related framework.
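As a rough sketch of the Collins (2002)-style averaged perceptron training that these excerpts refer to, the following Python code shows the structured update and weight averaging; the decode and phi hooks are placeholders, not the alignment model described in the paper.

```python
from collections import defaultdict

def averaged_perceptron(train, phi, decode, epochs=10):
    """Sketch of Collins-style averaged perceptron training.

    train:  list of (x, gold_y) pairs
    phi:    feature map phi(x, y) -> dict of feature counts
    decode: function (w, x) -> argmax_y of w . phi(x, y)
    Returns the averaged weight vector."""
    w = defaultdict(float)       # current weights
    total = defaultdict(float)   # running sum used for averaging
    seen = 0
    for _ in range(epochs):
        for x, gold in train:
            pred = decode(w, x)
            if pred != gold:
                # Standard structured perceptron update.
                for f, v in phi(x, gold).items():
                    w[f] += v
                for f, v in phi(x, pred).items():
                    w[f] -= v
            # Accumulate after every example so the final weights are
            # the average over all intermediate weight vectors.
            for f, v in w.items():
                total[f] += v
            seen += 1
    return {f: v / seen for f, v in total.items()}
```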
perceptron is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Sassano, Manabu and Kurohashi, Sadao
Conclusion
It is observed that active learning of parsing with the averaged perceptron, which is one of the large-margin classifiers, also works well for Japanese dependency analysis.
Experimental Evaluation and Discussion
6.2 Averaged Perceptron
Experimental Evaluation and Discussion
We used the averaged perceptron (AP) (Freund and Schapire, 1999) with polynomial kernels.
Experimental Evaluation and Discussion
We found the best value of the epoch T of the averaged perceptron by using the development set.
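For illustration only, a plain (unaveraged) binary kernel perceptron with a polynomial kernel might look like the sketch below; the kernel degree and data format are assumptions, and in the paper's setup the number of epochs T would be chosen by evaluating on the development set.

```python
import numpy as np

def poly_kernel(a, b, degree=2):
    """Polynomial kernel (1 + a . b)^d."""
    return (1.0 + np.dot(a, b)) ** degree

def train_kernel_perceptron(X, y, T, degree=2):
    """Train a binary kernel perceptron for T epochs.
    X: array of shape (n, d); y: labels in {-1, +1}.
    Returns dual coefficients alpha (mistake counts per example)."""
    n = len(X)
    alpha = np.zeros(n)
    K = np.array([[poly_kernel(X[i], X[j], degree) for j in range(n)]
                  for i in range(n)])
    for _ in range(T):
        for i in range(n):
            s = np.sum(alpha * y * K[:, i])
            pred = 1.0 if s >= 0 else -1.0
            if pred != y[i]:
                alpha[i] += 1.0
    return alpha

def predict(alpha, X, y, x_new, degree=2):
    s = sum(a * yi * poly_kernel(xi, x_new, degree)
            for a, yi, xi in zip(alpha, y, X))
    return 1.0 if s >= 0 else -1.0
```

Averaging in this dual form requires tracking how long each coefficient survives; the sketch omits that step for brevity.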
perceptron is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Koo, Terry and Collins, Michael
Parsing experiments
7.2 Averaged perceptron training
Parsing experiments
We chose the averaged structured perceptron (Freund and Schapire, 1999; Collins, 2002) as it combines highly competitive performance with fast training times, typically converging in 5-10 iterations.
Parsing experiments
Pass = %dependencies surviving the beam in training data, Orac = maximum achievable UAS on validation data, Acc1/Acc2 = UAS of Models 1/2 on validation data, and Time1/Time2 = minutes per perceptron training iteration for Models 1/2, averaged over all 10 iterations.
perceptron is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Kummerfeld, Jonathan K. and Roesner, Jessika and Dawborn, Tim and Haggerty, James and Curran, James R. and Clark, Stephen
Results
In our first experiment, we trained supertagger models using Generalised Iterative Scaling (GIS) (Darroch and Ratcliff, 1972), the limited memory BFGS method (BFGS) (Nocedal and Wright, 1999), the averaged perceptron (Collins, 2002), and the margin infused relaxed algorithm (MIRA) (Crammer and Singer, 2003).
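As a point of reference for the MIRA entry, a hedged sketch of a single 1-best MIRA update is shown below: the smallest weight change that separates the gold output from the prediction by a margin equal to its loss, with the step size clipped. The dense numpy vectors, loss argument, and clip constant C are illustrative simplifications, not the paper's implementation.

```python
import numpy as np

def mira_update(w, feat_gold, feat_pred, loss, C=1.0):
    """Single 1-best MIRA update: the smallest change to w that makes
    the gold structure outscore the prediction by a margin of `loss`,
    with the step size clipped at C."""
    delta = feat_gold - feat_pred           # feature difference vector
    norm_sq = float(np.dot(delta, delta))
    if norm_sq == 0.0:
        return w
    margin_violation = loss - np.dot(w, delta)
    tau = min(C, max(0.0, margin_violation / norm_sq))
    return w + tau * delta
```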
Results
GIS         96.34   96.43   96.53   96.62   85.3
Perceptron  95.82   95.99   96.30   -       85.2
MIRA        96.23   96.29   96.46   96.63   85.4
Results
For all four algorithms the training time is proportional to the amount of data, but the GIS and BFGS models trained on only CCGbank took 4,500 and 4,200 seconds to train, while the equivalent perceptron and MIRA models took 90 and 95 seconds to train.
perceptron is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: