Abstract | We show that these are relatively unproblematic for an algorithm operating under the 0-1 loss model, whereas for the commonly used voted perceptron algorithm, hard training cases can result in incorrect predictions on the uncontroversial cases at test time.
Introduction | For example, the perceptron family of algorithms handles random classification noise well (Cohen, 1997).
Introduction | We show in section 3.4 that the widely used voted perceptron algorithm of Freund and Schapire (1999) can suffer a constant hard-case bias when confronted with annotation noise in the training data, irrespective of the size of the dataset.
Introduction | 3 Voted Perceptron |
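The voted perceptron referenced above can be sketched as follows. This is an illustrative implementation of the Freund and Schapire (1999) scheme, not code from any of the cited papers: every intermediate weight vector is stored with a count of how many examples it survived, and prediction is a survival-weighted vote.

```python
import numpy as np

def train_voted_perceptron(X, y, epochs=1):
    """Voted perceptron (Freund & Schapire, 1999), illustrative sketch.

    X: (n, d) feature array; y: labels in {-1, +1}.
    Returns a list of (weight_vector, survival_count) pairs.
    """
    n, d = X.shape
    w = np.zeros(d)
    c = 1                              # survival count of current weights
    voters = []
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (w @ xi) <= 0:     # mistake: retire old weights, update
                voters.append((w.copy(), c))
                w = w + yi * xi
                c = 1
            else:
                c += 1
    voters.append((w.copy(), c))
    return voters

def predict_voted(voters, x):
    """Prediction is the sign of the survival-weighted vote."""
    s = sum(c * np.sign(w @ x) for w, c in voters)
    return 1 if s >= 0 else -1
```

Because every intermediate hypothesis votes, a weight vector pushed astray by a hard (noisy) training case keeps contributing at test time, which is the mechanism behind the bias discussed above.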
Abstract | We augment the standard perceptron algorithm with a global integer linear programming formulation to optimize both the local fit of information within each topic and global coherence across the entire overview.
Introduction | We estimate the parameters of our model using the perceptron algorithm augmented with an integer linear programming (ILP) formulation, run over a training set of example articles in the given domain. |
Method | Using the perceptron framework augmented with an ILP formulation for global optimization, the system is trained to select the best excerpt for each document d and each topic tj.
Method | We implement this algorithm using the perceptron framework, as it can be easily modified for structured prediction while preserving convergence guarantees (Daumé III and Marcu, 2005; Snyder and Barzilay, 2007).
Method | Training Procedure Our algorithm is a modification of the perceptron ranking algorithm (Collins, 2002), which allows for joint learning across several ranking problems (Daumé III and Marcu, 2005; Snyder and Barzilay, 2007). |
Rank(e_ij1 ... e_ijr, w_j) | Our third baseline, Disjoint, uses the ranking perceptron framework as in our full system; however, rather than perform an optimization step during training and decoding, we simply select the highest-ranked excerpt for each topic.
Experiments | 5.1 Baseline Perceptron Classifier |
Experiments | We first report experimental results of the single perceptron classifier on CTB 5.0. |
Experiments | The first 3 rows in each subtable of Table 3 show the performance of the single perceptron classifier.
Segmentation and Tagging as Character Classification | Algorithm 1 Perceptron training algorithm. |
Segmentation and Tagging as Character Classification | Several classification models can be adopted here, however, we choose the averaged perceptron algorithm (Collins, 2002) because of its simplicity and high accuracy. |
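The averaged perceptron mentioned above can be sketched as a multiclass learner; this is an illustrative version (feature extraction from characters is assumed to happen upstream, and all names are hypothetical). Averaging the weight vector over all updates is what gives the method its robustness at negligible extra cost.

```python
import numpy as np

def train_averaged_perceptron(X, y, classes, epochs=3):
    """Averaged multiclass perceptron (cf. Collins, 2002), sketch.

    X: (n, d) feature array; y: gold class indices into `classes`.
    Returns the averaged weight matrix of shape (n_classes, d).
    """
    n, d = X.shape
    w = np.zeros((len(classes), d))
    w_sum = np.zeros_like(w)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            pred = int(np.argmax(w @ xi))
            if pred != yi:              # promote gold class, demote prediction
                w[yi] += xi
                w[pred] -= xi
            w_sum += w                  # accumulate for averaging
    return w_sum / (epochs * n)
```

At decoding time, each character is assigned `argmax` over the averaged class scores, which is what makes the classifier both simple and accurate in practice.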
Alignment-based coordinate structure analysis | Shimbo and Hara defined this measure as a linear function of many features associated with arcs, and used perceptron training to optimize the weight coefficients for these features from corpora.
Improvements | The weights of these features, which eventually determine the score of the bypass, are tuned by the perceptron just like the weights of the other features.
Introduction | Recently, Shimbo and Hara (2007) proposed to use a large number of features to model this symmetry, and optimize the feature weights with perceptron training. |