Baseline System | Our system differs from Stoyanov et al. (2009) in using a decision tree classifier rather than an averaged linear perceptron.
Baseline System | We find the decision tree classifier to work better than the default averaged perceptron used by Stoyanov et al. (2009).
Experiments | AvgPerc is the averaged perceptron baseline, DecTree is the decision tree baseline, and the +Feature rows show the effect of adding a particular feature incrementally (not in isolation) to the DecTree baseline.
Experiments | We start with the Reconcile baseline but employ the decision tree (DT) classifier, because it has significantly better performance than the default averaged perceptron classifier used in Stoyanov et al.
Experiments | (2009). Table 2 compares the baseline perceptron results to the DT results and then shows the incremental addition of the Web features to the DT baseline (on the ACE04 development set).
Experiments | The baseline learner in our experiments is a pairwise ranking perceptron that is used on various features and training data and plugged into various meta- |
Experiments | The perceptron algorithm itself compares favorably to related learning techniques such as the MIRA adaptation of Chiang et al.
Experiments | In contrast, the perceptron is deterministic when started from a zero-vector of weights and achieves a favorable 28.0 BLEU on the news-commentary test set.
Joint Feature Selection in Distributed Stochastic Learning | The resulting algorithms can be seen as variants of the perceptron algorithm. |
Joint Feature Selection in Distributed Stochastic Learning | subgradient leads to the perceptron algorithm for pairwise ranking (Shen and Joshi, 2005):
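A minimal sketch of such a pairwise ranking perceptron update, assuming pairs of feature vectors where the first element belongs to the higher-ranked candidate (function and variable names are illustrative, not taken from Shen and Joshi, 2005):

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pairwise_ranking_perceptron(pairs, dim, epochs=5):
    # `pairs`: list of (f_hi, f_lo) feature vectors, where f_hi comes
    # from the higher-ranked candidate. Names here are illustrative.
    w = [0.0] * dim
    for _ in range(epochs):
        for f_hi, f_lo in pairs:
            # Subgradient step: update only when the ranking is violated.
            if dot(w, f_hi) <= dot(w, f_lo):
                w = [wi + hi - lo for wi, hi, lo in zip(w, f_hi, f_lo)]
    return w
```

The update is exactly the perceptron rule applied to the difference vector f_hi - f_lo, which is what taking the subgradient of the pairwise hinge loss yields.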
Joint Feature Selection in Distributed Stochastic Learning | (2010) also present an iterative mixing algorithm where weights are mixed from each shard after training a single epoch of the perceptron in parallel on each shard. |
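The iterative mixing scheme can be sketched as follows; this is a simplified sequential simulation (the per-shard epochs would run in parallel in a real system), with illustrative names and a plain binary perceptron standing in for the structured learner:

```python
def perceptron_epoch(w, shard):
    # One epoch of a binary perceptron on one shard; labels y are in {-1, +1}.
    w = list(w)
    for x, y in shard:
        if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:
            w = [wi + y * xi for wi, xi in zip(w, x)]
    return w

def iterative_mixing(shards, dim, epochs=10):
    # After each round of parallel epochs, average (mix) the shard weights
    # and redistribute the mixture as the starting point for the next round.
    w = [0.0] * dim
    for _ in range(epochs):
        # In a real system these per-shard epochs run in parallel.
        shard_weights = [perceptron_epoch(w, s) for s in shards]
        w = [sum(ws[i] for ws in shard_weights) / len(shards)
             for i in range(dim)]
    return w
```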
Related Work | Examples for adapted algorithms include Maximum-Entropy Models (Och and Ney, 2002; Blunsom et al., 2008), Pairwise Ranking Perceptrons (Shen et al., 2004; Watanabe et al., 2006; Hopkins and May, 2011), Structured Perceptrons (Liang et al., 2006a), Boosting (Duh and Kirchhoff, 2008; Wellington et al., 2009), Structured SVMs (Tillmann and Zhang, 2006; Hayashi et al., 2009), MIRA (Watanabe et al., 2007; Chiang et al., 2008; Chiang et al., 2009), and others. |
Experiments | Due to the long CRF training time (days to weeks, even with stochastic gradient descent) on these datasets with large label sets, we choose the perceptron algorithm for training.
Experiments | We note that the selection of training algorithm does not affect the decoding process: the decoding is identical for both CRF and perceptron training algorithms. |
Introduction | Sequence tagging algorithms including HMMs (Rabiner, 1989), CRFs (Lafferty et al., 2001), and Collins’s perceptron (Collins, 2002) have been widely employed in NLP applications.
Problem formulation | In this section, we formulate the sequential decoding problem in the context of the perceptron algorithm (Collins, 2002) and CRFs (Lafferty et al., 2001).
Problem formulation | Formally, a perceptron model scores an output y for an input x as f(y, x) = w · Φ(x, y), where Φ(x, y) is a feature vector and w the weight vector.
Problem formulation | If x is given, decoding finds the best y, i.e., the y that maximizes the score f(y, x) for the perceptron or the probability p(y|x) for CRFs.
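For a first-order sequence model, this argmax decoding can be sketched with standard Viterbi dynamic programming; this is a generic illustration (not code from the papers), where the score matrices stand in for the feature dot products w · Φ:

```python
def viterbi_decode(emit, trans):
    # emit[t][s]: score of label s at position t; trans[a][b]: score of
    # the transition a -> b. Both stand in for dot products w · Φ(x, y).
    T, S = len(emit), len(emit[0])
    delta = list(emit[0])                  # best prefix score ending in s
    back = [[0] * S for _ in range(T)]     # back-pointers
    for t in range(1, T):
        new = []
        for s in range(S):
            best = max(range(S), key=lambda a: delta[a] + trans[a][s])
            back[t][s] = best
            new.append(delta[best] + trans[best][s] + emit[t][s])
        delta = new
    # Follow back-pointers from the best final label.
    y = [max(range(S), key=lambda s: delta[s])]
    for t in range(T - 1, 0, -1):
        y.append(back[t][y[-1]])
    return y[::-1]
```

The same routine serves both model types: for a CRF the max over label sequences of unnormalized log-probabilities coincides with the max over scores, which is why decoding is identical for CRF- and perceptron-trained models.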
Conclusions | We find the best-scoring derivation via forest reranking with both local and nonlocal features, whose weights we train using the perceptron algorithm.
Conclusions | Finally, distributed training strategies have been developed for the perceptron algorithm (McDonald et al., 2010), which would allow our generator to scale to even larger datasets. |
Problem Formulation | Algorithm 1: Averaged Structured Perceptron
Problem Formulation | We estimate the weights α using the averaged structured perceptron algorithm (Collins, 2002), which is well known for its speed and good performance in similar large-parameter NLP tasks (Liang et al., 2006; Huang, 2008).
Problem Formulation | As shown in Algorithm 1, the perceptron makes several passes over the training scenarios, and in each iteration it computes the best-scoring (w, h) among the candidate derivations, given the current weights α.
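The averaged structured perceptron loop described above can be sketched as follows; this is a generic rendering of Collins (2002), where the `decode` and `feats` callables are placeholders for the task-specific argmax and feature extraction, not names from the paper:

```python
def averaged_structured_perceptron(data, feats, decode, dim, epochs=5):
    # `data`: list of (x, y_gold); `decode(w, x)` returns the best-scoring
    # output under weights w; `feats(x, y)` maps a full (input, output)
    # pair to a feature vector. All names here are illustrative.
    w = [0.0] * dim
    w_sum = [0.0] * dim          # running sum for the averaged weights
    n = 0
    for _ in range(epochs):
        for x, y_gold in data:
            y_hat = decode(w, x)
            if y_hat != y_gold:  # update only on a decoding mistake
                f_gold, f_hat = feats(x, y_gold), feats(x, y_hat)
                w = [wi + g - h for wi, g, h in zip(w, f_gold, f_hat)]
            w_sum = [s + wi for s, wi in zip(w_sum, w)]
            n += 1
    return [s / n for s in w_sum]   # averaging stabilizes the final model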
Related Work | Our model is closest to Huang (2008) who also performs forest reranking on a hypergraph, using both local and nonlocal features, whose weights are tuned with the averaged perceptron algorithm (Collins, 2002). |
Incorporating Syntactic Structures | The choice of action w is given by a learning algorithm, such as a maximum-entropy classifier, support vector machine, or Perceptron, trained on labeled data.
Incorporating Syntactic Structures | (2011), a transition-based model with a Perceptron and a lookahead heuristic process.
Introduction | We demonstrate our approach on a local Perceptron-based part-of-speech tagger (Tsuruoka et al., 2011) and a shift-reduce dependency parser (Sagae and Tsujii, 2007), yielding significantly faster tagging and parsing of ASR hypotheses.
Syntactic Language Models | (2005) uses the Perceptron algorithm to train a global linear discriminative model |
Syntactic Language Models | We use a global linear model with Perceptron training. |
Syntactic Language Models | Perceptron training learns the parameters α.
Related Work | A few other state-of-the-art CWS systems use semi-Markov perceptron methods or voting systems based on multiple semi-Markov perceptron segmenters (Zhang and Clark, 2007; Sun, 2010).
Related Work | Those semi-Markov perceptron systems are moderately faster than the heavy probabilistic systems using semi-Markov conditional random fields or latent variable conditional random fields. |
Abstract | For the parameter estimation of θ, we use the averaged perceptron as described in (Collins, 2002).
Abstract | POS-/+: perceptron trained without/with POS. |
Abstract | These results are compared to the core perceptron trained without POS in (Jiang et al., 2008a).