Introduction | Following the common theme of “more data is better data,” we also use both a limited labeled corpus and a plentiful unlabeled data resource. |
Introduction | McClosky et al. (2006a) successfully applied self-training to parsing by exploiting available unlabeled data, and obtained remarkable results when the same technique was applied to parser adaptation (McClosky et al., 2006b). |
Introduction | However, the standard objective of an S3VM is non-convex on the unlabeled data, thus requiring sophisticated global optimization heuristics to obtain reasonable solutions. |
Semi-supervised Convex Training for Structured SVM | for structured large margin training, whose objective is a combination of two convex terms: the supervised structured large margin loss on labeled data and the cheap least squares loss on unlabeled data. |
Semi-supervised Structured Large Margin Objective | The objective of a standard semi-supervised structured SVM is a combination of structured large margin losses on both labeled and unlabeled data. |
Semi-supervised Structured Large Margin Objective | We introduce an efficient approximation—least squares loss—for the structured large margin loss on unlabeled data below. |
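The convex combination described above can be sketched as follows; the notation ($w$, $\phi$, $\Delta$, $C_1$, $C_2$, $\ell_{\mathrm{ls}}$) is illustrative and not taken from the paper:

```latex
\min_{w}\;\; \frac{1}{2}\|w\|^2
\;+\; C_1 \sum_{i \in \mathcal{L}} \max_{y}\Bigl[\Delta(y_i, y) + w^\top\phi(x_i, y) - w^\top\phi(x_i, y_i)\Bigr]
\;+\; C_2 \sum_{j \in \mathcal{U}} \ell_{\mathrm{ls}}(w; x_j)
```

Here the first sum is the standard convex structured hinge loss on the labeled set $\mathcal{L}$, and $\ell_{\mathrm{ls}}$ stands for the least squares surrogate on the unlabeled set $\mathcal{U}$. Because both terms are convex in $w$, the combined objective remains convex, avoiding the non-convexity of the standard S3VM objective.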
Introduction | In general, semi-supervised learning can be motivated by two concerns: first, given a fixed amount of supervised data, we might wish to leverage additional unlabeled data to make better use of the supervised corpus, increasing the performance of the model in absolute terms. |
Introduction | Second, given a fixed target performance level, we might wish to use unlabeled data to reduce the amount of annotated data necessary to reach this target. |
Related Work | Crucially, however, these methods do not exploit unlabeled data when learning their representations. |
Data | The set of 3,123 words that were not annotated served as the unlabeled data for the EM algorithm. |
Experiments and Results | Our post-experimental analysis reveals that updating the parameters with the unlabeled data has the effect of over-separating the two overlapping distributions. |
Finding the Homographs in a Lexicon | In Model II, the semi-supervised setup, the training data is used to initialize the Expectation-Maximization (EM) algorithm (Dempster et al., 1977) and the unlabeled data, described in Section 3.1, updates the initial estimates. |
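The Model II setup — labeled data initializes the parameters, then EM on unlabeled data updates them — can be sketched as a minimal two-component 1-D Gaussian mixture. This is a hypothetical illustration; the paper's actual model, features, and distributions are not specified here:

```python
import math

def semi_supervised_em(labeled, unlabeled, n_iters=50):
    """Semi-supervised EM sketch: labeled (x, k) pairs initialize two
    Gaussian components; EM on unlabeled points updates the estimates.
    (Hypothetical 1-D mixture, not the paper's actual model.)"""
    # Initialize each component's mean, variance, and prior from labels.
    params = []
    for k in (0, 1):
        xs = [x for x, y in labeled if y == k]
        mu = sum(xs) / len(xs)
        var = sum((x - mu) ** 2 for x in xs) / len(xs) or 1e-6
        params.append([mu, var, len(xs) / len(labeled)])

    def pdf(x, mu, var):
        return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

    for _ in range(n_iters):
        # E-step: posterior responsibility of each component per point.
        resp = []
        for x in unlabeled:
            ps = [pi * pdf(x, mu, var) for mu, var, pi in params]
            z = sum(ps)
            resp.append([p / z for p in ps])
        # M-step: re-estimate parameters from the responsibilities.
        for k in (0, 1):
            rk = [r[k] for r in resp]
            tot = sum(rk)
            mu = sum(r * x for r, x in zip(rk, unlabeled)) / tot
            var = sum(r * (x - mu) ** 2 for r, x in zip(rk, unlabeled)) / tot
            params[k] = [mu, var or 1e-6, tot / len(unlabeled)]
    return params
```

Note that, as the post-experimental analysis above suggests, the M-step re-estimates parameters from unlabeled data alone, which can drive the two components further apart than the labeled initialization warranted.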