Index of papers in Proc. ACL 2014 that mention

**perceptron**

Abstract | We investigate different ways of learning structured perceptron models for coreference resolution when using nonlocal features and beam search. |

Conclusion | We evaluated standard perceptron learning techniques for this setting, using both early updates and LaSO. |

Conclusion | In the special case where only local features are used, this method coincides with standard structured perceptron learning that uses exact search. |

Experimental Setup | Unless otherwise stated we use 25 iterations of perceptron training and a beam size of 20. |

Introduction | This paper studies and extends previous work using the structured perceptron (Collins, 2002) for complex NLP tasks. |

Related Work | Perceptrons for coreference. |

Related Work | The perceptron has previously been used to train coreference resolvers either by casting the problem as a binary classification problem that considers pairs of mentions in isolation (Bengtson and Roth, 2008; Stoyanov et al., 2009; Chang et al., 2012, inter alia) or in the structured manner, where a clustering for an entire document is predicted in one go (Fernandes et al., 2012). |

Related Work | Stoyanov and Eisner (2012) train an Easy-First coreference system with the perceptron to learn a sequence of join operations between arbitrary mentions in a document, accessing nonlocal features through previous merge operations in later stages. |

Representation and Learning | We find the weight vector w by online learning using a variant of the structured perceptron (Collins, 2002). |

Representation and Learning | The structured perceptron iterates over training instances (x_i, y_i), where x_i are inputs and y_i are outputs. |

Representation and Learning | If τ is set to 1, the update reduces to the standard structured perceptron update. |
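The standard structured perceptron update referred to in these snippets can be sketched as follows. This is a minimal illustrative version, not the paper's implementation; `phi` and `argmax_y` are hypothetical stand-ins for the feature function and the decoder:

```python
def perceptron_epoch(examples, w, phi, argmax_y):
    """One pass over (x, y) pairs; update w only on a mistake."""
    for x, y in examples:
        y_hat = argmax_y(x, w)  # highest-scoring output under current w
        if y_hat != y:
            # Promote the gold structure's features...
            for f, v in phi(x, y).items():
                w[f] = w.get(f, 0.0) + v
            # ...and demote the predicted structure's features.
            for f, v in phi(x, y_hat).items():
                w[f] = w.get(f, 0.0) - v
    return w
```

On a mistake, the weights move toward the gold output's features and away from the prediction's, which is the entirety of the learning rule.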

perceptron is mentioned in 11 sentences in this paper.

Topics mentioned in this paper:

- coreference (25)
- CoNLL (19)
- weight vector (14)

Abstract | Using parse accuracy in a simple reranking strategy for self-monitoring, we find that with a state-of-the-art averaged perceptron realization ranking model, BLEU scores cannot be improved with any of the well-known Treebank parsers we tested, since these parsers too often make errors that human readers would be unlikely to make. |

Background | Using the averaged perceptron algorithm (Collins, 2002), White & Rajkumar (2009) trained a structured prediction ranking model to combine these existing syntactic models with several n-gram language models. |

Introduction | Rajkumar & White (2011; 2012) have recently shown that some rather egregious surface realization errors—in the sense that the reader would likely end up with the wrong interpretation—can be avoided by making use of features inspired by psycholinguistics research together with an otherwise state-of-the-art averaged perceptron realization ranking model (White and Rajkumar, 2009), as reviewed in the next section. |

Introduction | With this simple reranking strategy and each of three different Treebank parsers, we find that it is possible to improve BLEU scores on Penn Treebank development data with White & Rajkumar’s (2011; 2012) baseline generative model, but not with their averaged perceptron model. |

Introduction | Therefore, to develop a more nuanced self-monitoring reranker that is more robust to such parsing mistakes, we trained an SVM using dependency precision and recall features for all three parses, their n-best parsing results, and per-label precision and recall for each type of dependency, together with the realizer’s normalized perceptron model score as a feature. |

Simple Reranking | The first one is the baseline generative model (hereafter, generative model) used in training the averaged perceptron model. |

Simple Reranking | The second one is the averaged perceptron model (hereafter, perceptron model), which uses all the features reviewed in Section 2. |

Simple Reranking | Table 2: Devset BLEU scores for simple ranking on top of n-best perceptron model realizations |

perceptron is mentioned in 29 sentences in this paper.

Topics mentioned in this paper:

- perceptron (29)
- SVM (23)
- BLEU (20)

Experiments | For comparison, we also investigated training the reranker with Perceptron and MIRA. |

Experiments | The f -scores of the held-out and evaluation set given by T-MIRA as well as the Perceptron and |

Experiments | When very few labeled data are available for training (compared with the number of features), T-MIRA performs much better than the vector-based models MIRA and Perceptron. |

Introduction | Many learning algorithms applied to NLP problems, such as the Perceptron (Collins, |

Tensor Model Construction | As a way out, we first run a simple vector-model based learning algorithm (say the Perceptron) on the training data and estimate a weight vector, which serves as a “surrogate |

perceptron is mentioned in 9 sentences in this paper.

Topics mentioned in this paper:

- feature weights (18)
- learning algorithm (9)
- Perceptron (9)

Abstract | We present an incremental joint framework to simultaneously extract entity mentions and relations using structured perceptron with efficient beam-search. |

Algorithm 3.1 The Model | To estimate the feature weights, we use structured perceptron (Collins, 2002), an extension of the standard perceptron for structured prediction, as the learning framework. |

Algorithm 3.1 The Model | Huang et al. (2012) proved the convergence of the structured perceptron when inexact search is applied with violation-fixing update methods such as early-update (Collins and Roark, 2004). |

Algorithm 3.1 The Model | Figure 4 shows the pseudocode for structured perceptron training with early-update. |
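As a rough illustration of early-update training with beam search (not the paper's actual pseudocode): when the gold partial hypothesis falls off the beam, an update is made immediately on the partial structures and decoding stops. Here `expand`, `phi`, and `gold_prefix` are hypothetical helpers:

```python
def early_update_train(x, y, w, expand, phi, gold_prefix, beam_size):
    beam = [()]  # start from the empty partial output
    for step in range(len(y)):
        cands = [h for b in beam for h in expand(x, b)]
        cands.sort(key=lambda h: score(h, x, w, phi), reverse=True)
        beam = cands[:beam_size]
        gold = gold_prefix(y, step + 1)
        if gold not in beam:  # gold fell off the beam: early update
            update(w, phi(x, gold), +1.0)
            update(w, phi(x, beam[0]), -1.0)
            return w
    if beam[0] != gold_prefix(y, len(y)):  # full-sequence update
        update(w, phi(x, gold_prefix(y, len(y))), +1.0)
        update(w, phi(x, beam[0]), -1.0)
    return w

def score(h, x, w, phi):
    return sum(w.get(f, 0.0) * v for f, v in phi(x, h).items())

def update(w, feats, sign):
    for f, v in feats.items():
        w[f] = w.get(f, 0.0) + sign * v
```

The point of updating at the moment the gold item leaves the beam is that the prefix violation is still guaranteed, which is what makes learning with inexact search well-behaved.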

Conclusions and Future Work | For the first time, we addressed this challenging task by an incremental beam-search algorithm in conjunction with the structured perceptron. |

Introduction | Following the above intuitions, we introduce a joint framework based on structured perceptron (Collins, 2002; Collins and Roark, 2004) with beam-search to extract entity mentions and relations simultaneously. |

Introduction | Our previous work (Li et al., 2013) used a perceptron model with token-based tagging to jointly extract event triggers and arguments. |

Related Work | Our previous work (Li et al., 2013) used structured perceptron with token-based decoder to jointly predict event triggers and arguments based on the assumption that entity mentions and other argument candidates are given as part of the input. |

perceptron is mentioned in 9 sentences in this paper.

Topics mentioned in this paper:

- entity mentions (64)
- entity type (17)
- relation extraction (16)

Introduction | The discriminative model is global and trained with the structured perceptron. |

Introduction | We also show how perceptron learning with beam-search (Collins and Roark, 2004) can be extended to handle the additional ambiguity, by adapting the “violation-fixing” perceptron of Huang et al. |

The Dependency Model | We also show, in Section 3.3, how perceptron training with early-update (Collins and Roark, 2004) can be used in this setting. |

The Dependency Model | We use the averaged perceptron (Collins, 2002) to train a global linear model and score each action. |
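The weight averaging in Collins' (2002) averaged perceptron can be sketched as below. This is an illustrative version under the same hypothetical `phi`/`argmax_y` interface, not the paper's implementation: a running sum of the weight vector is kept after every example, and the mean is returned, which tends to generalize better than the final weights.

```python
def averaged_perceptron(examples, epochs, phi, argmax_y):
    w, total, count = {}, {}, 0
    for _ in range(epochs):
        for x, y in examples:
            y_hat = argmax_y(x, w)
            if y_hat != y:
                for f, v in phi(x, y).items():
                    w[f] = w.get(f, 0.0) + v
                for f, v in phi(x, y_hat).items():
                    w[f] = w.get(f, 0.0) - v
            for f, v in w.items():  # accumulate for the average
                total[f] = total.get(f, 0.0) + v
            count += 1
    return {f: v / count for f, v in total.items()}
```

Real implementations avoid the per-example accumulation loop with a lazy-update trick, but the averaged result is the same.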

The Dependency Model | Since there are potentially many gold items, and one gold item is required for the perceptron update, a decision needs |

perceptron is mentioned in 6 sentences in this paper.

Topics mentioned in this paper:

- CCG (31)
- gold-standard (23)
- shift-reduce (21)

Citation Extraction Data | We then use the development set to learn the penalties for the soft constraints, using the perceptron algorithm described in section 3.1. |

Soft Constraints in Dual Decomposition | All we need to employ the structured perceptron algorithm (Collins, 2002) or the structured SVM algorithm (Tsochantaridis et al., 2004) is a black-box procedure for performing MAP inference in the structured linear model given an arbitrary cost vector. |

Soft Constraints in Dual Decomposition | This can be ensured by simple modifications of the perceptron and subgradient descent optimization of the structured SVM objective simply by truncating c coordinate-wise to be nonnegative at every learning iteration. |

Soft Constraints in Dual Decomposition | Intuitively, the perceptron update increases the penalty for a constraint if it is satisfied in the ground truth and not in an inferred prediction, and decreases the penalty if the constraint is satisfied in the prediction and not the ground truth. |
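The penalty update this snippet describes, combined with the coordinate-wise truncation mentioned earlier in the section, might look roughly like the following (all names are illustrative, not the paper's code):

```python
def update_penalties(penalties, satisfied_gold, satisfied_pred, lr=1.0):
    """penalties: dict constraint -> nonnegative weight;
    satisfied_*: sets of constraints satisfied in gold / prediction."""
    # Constraint holds in the gold output but not the prediction:
    # raise its penalty so the decoder is pushed toward satisfying it.
    for c in satisfied_gold - satisfied_pred:
        penalties[c] = penalties.get(c, 0.0) + lr
    # Constraint holds in the prediction but not the gold output:
    # lower its penalty.
    for c in satisfied_pred - satisfied_gold:
        penalties[c] = penalties.get(c, 0.0) - lr
    # Coordinate-wise truncation keeps every penalty nonnegative.
    for c in penalties:
        penalties[c] = max(0.0, penalties[c])
    return penalties
```

Truncating to zero rather than allowing negative penalties is what keeps the soft constraints interpretable as costs for violation.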

perceptron is mentioned in 4 sentences in this paper.

Topics mentioned in this paper:

- soft constraints (25)
- CRF (19)
- ground truth (10)

Introduction | In this case, learning can follow the online structured perceptron learning procedure by Collins (2002), where weight updates for the k-th training example (x^(k), y^(k)) are given as: w ← w + Φ(x^(k), y^(k)) − Φ(x^(k), ŷ^(k)) |

Introduction | While the Viterbi algorithm can be used for tagging optimal state-sequences given the weights, the structured perceptron can learn optimal model weights given gold-standard sequence labels. |

Introduction | In the M-step, we take the decoded state-sequences from the E-step as observed, and run perceptron learning to update the feature weights w_i. |
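The alternating procedure these snippets describe, Viterbi decoding as the E-step and a perceptron pass as the M-step, can be sketched as follows; `viterbi` and `perceptron_pass` are hypothetical stand-ins for the paper's components:

```python
def em_perceptron(inputs, w, viterbi, perceptron_pass, rounds=5):
    for _ in range(rounds):
        # E-step: decode state-sequences under the current weights.
        decoded = [(x, viterbi(x, w)) for x in inputs]
        # M-step: treat decoded sequences as observed and run
        # perceptron learning on them to update the weights.
        w = perceptron_pass(decoded, w)
    return w
```

Because the perceptron trains on its own decoded output, the procedure is a hard-EM-style self-training loop rather than a likelihood-based EM.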

perceptron is mentioned in 4 sentences in this paper.

Topics mentioned in this paper:

- segmentation model (17)
- embeddings (8)
- distributional semantics (7)