Index of papers in Proc. ACL 2009 that mention
  • labeled data
Celikyilmaz, Asli and Thint, Marcus and Huang, Zhiheng
Abstract
With a new representation of graph-based SSL on QA datasets using only a handful of features, and under limited amounts of labeled data, we show improvement in generalization performance over state-of-the-art QA models.
Experiments
In the first part, we randomly selected subsets of the labeled training dataset X_l ⊂ X_L with different sample sizes, n_l ∈ {1% * n_L, 5% * n_L, 10% * n_L, 25% * n_L, 50% * n_L, 100% * n_L}, where n_L represents the sample size of X_L. At each random selection, the rest of the labeled dataset is hypothetically used as unlabeled data to verify the performance of our SSL using different sizes of labeled data.
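A minimal sketch of this evaluation protocol, assuming Python and illustrative names (labeled_subsets, X_L are not from the paper): each fraction of X_L is sampled as the labeled subset, and the held-out remainder plays the role of unlabeled data for that run.

```python
import random

def labeled_subsets(X_L, fractions=(0.01, 0.05, 0.10, 0.25, 0.50, 1.00), seed=0):
    """For each fraction, sample that share of X_L as the labeled subset X_l;
    the held-out remainder of X_L is treated as unlabeled data for that run."""
    rng = random.Random(seed)
    n_L = len(X_L)
    for frac in fractions:
        n_l = max(1, int(round(frac * n_L)))
        idx = set(rng.sample(range(n_L), n_l))
        labeled = [X_L[i] for i in idx]
        hidden = [X_L[i] for i in range(n_L) if i not in idx]  # plays the role of unlabeled data
        yield frac, labeled, hidden
```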
Experiments
Note from Table 2 that, when the number of labeled data is small (n_l < 10% * n_L), graph-based SSL, gSum SSL, has a better performance compared to SVM.
Experiments
Especially in Hybrid graph-Summary SSL, Hybrid gSum SSL, when the number of labeled data is small (n_l < 25% * n_L), the performance improvement is better than the rest.
Graph Summarization
The labeled data points, i.e., X_L, are appended to each of these selected X_S datasets, X_S = {x_1^s, ..., x_{n_b}^s} ∪ X_L.
Graph Summarization
The local density constraints become crucial for inference where summarized labeled data are used instead of overall dataset.
Graph Summarization
As a result, q summary datasets X_S, each with n_b labeled data points, are combined to form a representative sample of X, X̂ = {X_s}_{s=1..q}, reducing the number of data points from n to a much smaller number, p = q * n_b << n. So the new summary of X can be represented with X̂ = {x_i}_{i=1..p}.
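The reduction described above can be outlined as follows; this is a hypothetical sketch in which select_summary stands in for the paper's graph-based selection of summary points, not an implementation of it.

```python
def summarize_dataset(X, X_L, q, n_b, select_summary):
    """Hypothetical outline: build q summary sets of n_b points each from X
    (select_summary stands in for the paper's graph-based selection), append
    the labeled points X_L to each, and pool them so roughly p = q * n_b
    summary points replace the original n = len(X)."""
    summaries = [list(select_summary(X, n_b)) + list(X_L) for _ in range(q)]
    return [x for X_s in summaries for x in X_s]
```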
Introduction
One of the challenges we face is that we have a very limited amount of labeled data, i.e., correctly labeled (true/false entailment) sentences.
Introduction
We consider situations where there is much more unlabeled data, X_U, than labeled data, X_L, i.e., n_L << n_U.
Introduction
— application of a graph-summarization method to enable learning from very large unlabeled and rather small labeled data, which would not have been feasible for most sophisticated learning tools, in Section 4.
labeled data is mentioned in 13 sentences in this paper.
Topics mentioned in this paper:
Dasgupta, Sajib and Ng, Vincent
Evaluation
Owing to the randomness involved in the choice of labeled data, all baseline results are averaged over ten independent runs for each fold.
Evaluation
We implemented Kamvar et al.’s (2003) semi-supervised spectral clustering algorithm, which incorporates labeled data into the clustering framework in the form of must-link and cannot-link constraints.
Evaluation
We employ as our second baseline a transductive SVM trained using 100 points randomly sampled from the training folds as labeled data and the remaining 1900 points as unlabeled data.
Introduction
Experimental results on five sentiment classification datasets demonstrate that our system can generate high-quality labeled data from unambiguous reviews, which, together with a small number of manually labeled reviews selected by the active learner, can be used to effectively classify ambiguous reviews in a discriminative fashion.
Our Approach
However, in the absence of labeled data, it is not easy to assess feature relevance.
Our Approach
Even if labeled data were present, the ambiguous points might be better handled by a discriminative learning system than a clustering algorithm, as discriminative learners are more sophisticated and can handle an ambiguous feature space more effectively.
Our Approach
In self-training, we iteratively train a classifier on the data labeled so far, use it to classify the unlabeled instances, and augment the labeled data with the most confidently labeled instances.
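A generic self-training loop matching this description might look as follows; train_fn and the classifier's predict/confidence interface are assumptions, and the selection parameters are illustrative rather than the authors' settings.

```python
def self_train(train_fn, labeled, unlabeled, n_iters=10, k=50):
    """Self-training sketch: train on the data labeled so far, label the
    unlabeled pool, and move the k most confidently labeled instances into
    the labeled set each round. train_fn(labeled) is assumed to return a
    classifier with .predict(x) and .confidence(x)."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    clf = train_fn(labeled)
    for _ in range(n_iters):
        if not unlabeled:
            break
        ranked = sorted(unlabeled, key=clf.confidence, reverse=True)
        confident, unlabeled = ranked[:k], ranked[k:]
        labeled += [(x, clf.predict(x)) for x in confident]
        clf = train_fn(labeled)  # retrain on the augmented labeled set
    return clf
```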
labeled data is mentioned in 14 sentences in this paper.
Topics mentioned in this paper:
Manshadi, Mehdi and Li, Xiao
A grammar for semantic tagging
For these reasons, we evaluate our grammar model on the task of automatic tagging of queries for which we have labeled data available.
A grammar for semantic tagging
The model, however, extends the lexicon by including words discovered from labeled data (if available).
A grammar for semantic tagging
It is true that the word “vs” plays a critical role in this query, representing that the user’s intention is to compare the two brands; but, as mentioned above, in our labeled data such words have been left unlabeled.
Discriminative re-ranking
In particular, when there is no or a very small amount of labeled data, a parser could still work by using unsupervised learning approaches to learn the rules, or by simply using a set of hand-built rules (as we did above for the task of semantic tagging).
Discriminative re-ranking
When there is enough labeled data, then a discriminative model can be trained on the labeled data to learn contextual information and to further enhance the tagging performance.
Introduction
Preparing labeled data, however, is very expensive.
Introduction
Therefore in cases where there is no or a small amount of labeled data available, these models do a poor job.
Introduction
As seen later, in the case where there is not a large amount of labeled data available, the parser part is the dominant part of the module and performs reasonably well.
Summary
This is a big advantage of the parser model, because in practice providing labeled data is very expensive but very often the lexicons can be easily extracted from the structured data on the web (for example extracting movie titles from imdb or book titles from Amazon).
labeled data is mentioned in 11 sentences in this paper.
Topics mentioned in this paper:
Persing, Isaac and Ng, Vincent
Abstract
To improve the performance of a cause identification system for the minority classes, we present a bootstrapping algorithm that automatically augments a training set by learning from a small amount of labeled data and a large amount of unlabeled data.
Abstract
Experimental results show that our algorithm yields a relative error reduction of 6.3% in F-measure for the minority classes in comparison to a baseline that learns solely from the labeled data.
Baseline Approaches
Since our ultimate goal is to evaluate the effectiveness of our bootstrapping algorithm, the baseline approaches only make use of small amounts of labeled data for acquiring classifiers.
Baseline Approaches
To ensure a fair comparison with the first baseline, we do not employ additional labeled data for parameter tuning; rather, we reserve 25% of the available training data for tuning, and use the remaining 75% for classifier training.
Introduction
The difficulty of a text classification task depends on various factors, but typically, the task can be difficult if (1) the amount of labeled data available for learning the task is small; (2) it involves multiple classes; (3) it involves multi-label categorization, where more than one label can be assigned to each document; (4) the class distributions are skewed, with some categories significantly outnumbering the others; and (5) the documents belong to the same domain (e.g., movie review classification).
Introduction
Such methods, however, are unlikely to perform equally well for our cause identification task given our small labeled set, as the minority class prediction problem is complicated by the scarcity of labeled data.
Introduction
More specifically, given the scarcity of labeled data , many words that are potentially correlated with a shaper (especially a minority shaper) may not appear in the training set, and the lack of such useful indicators could hamper the acquisition of an accurate classifier via supervised learning techniques.
Our Bootstrapping Algorithm
One of the potential weaknesses of the two baselines described in the previous section is that the classifiers are trained on only a small amount of labeled data .
Our Bootstrapping Algorithm
The situation is somewhat aggravated by the fact that we are adopting a one-versus-all scheme for generating training instances for a particular shaper, which, together with the small amount of labeled data, implies that only a couple of positive instances may be available for training the classifier for a minority class.
Our Bootstrapping Algorithm
The reason we impose the “at least three” requirement is precision: we want to ensure, with a reasonable level of confidence, that the unlabeled documents chosen to augment P should indeed be labeled with the shaper under consideration, as incorrectly labeled documents would contaminate the labeled data, thus accelerating the deterioration of the quality of the automatically labeled data in subsequent bootstrapping iterations and adversely affecting the accuracy of the classifier trained on it (Pierce and Cardie, 2001).
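The "at least three" filter could be sketched as below; the assumption here (not stated in the excerpt) is that hits are counted against a learned list of indicator words, and the helper names are hypothetical.

```python
def augment_positives(P, unlabeled_docs, indicators, min_hits=3):
    """Precision-oriented augmentation: an unlabeled document joins the
    positive set P for a shaper only if it matches at least min_hits
    indicator words (the matching criterion is an assumed stand-in)."""
    added = [doc for doc in unlabeled_docs
             if sum(1 for w in indicators if w in doc) >= min_hits]
    return P + added
```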
labeled data is mentioned in 12 sentences in this paper.
Topics mentioned in this paper:
Haffari, Gholamreza and Sarkar, Anoop
AL-SMT: Multilingual Setting
When (re-)training the models, two phrase tables are learned for each SMT model: one from the labeled data L and the other one from the pseudo-labeled data U+ (which we call the main and auxiliary phrase tables, respectively).
Experiments
We subsampled 5,000 sentences as the labeled data L and 20,000 sentences as U for the pool of untranslated sentences (while hiding the English part).
Introduction
However, if we start with only a small amount of initial parallel data for the new target language, then translation quality is very poor and requires a very large injection of human labeled data to be effective.
Introduction
In self-training each MT system is retrained using human labeled data plus its own noisy translation output on the unlabeled data.
Introduction
In co-training each MT system is retrained using human labeled data plus noisy translation output from the other MT systems in the ensemble.
Sentence Selection: Single Language Pair
The more frequent a phrase is in the labeled data, the less important it is, since we have probably observed most of its translations.
Sentence Selection: Single Language Pair
In the labeled data L, phrases are the ones which are extracted by the SMT models; but what are the candidate phrases in the unlabeled data U?
Sentence Selection: Single Language Pair
two multinomials, one for labeled data and the other one for unlabeled data.
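An illustrative phrase-based score capturing the intuition quoted above: prefer sentences whose phrases are common in the unlabeled pool U but rare in the labeled data L. The paper defines its own selection scores; this ratio and smoothing are assumptions for the sketch.

```python
def sentence_score(sentence_phrases, counts_L, counts_U, smooth=1.0):
    """Illustrative score: high when a sentence's candidate phrases are common
    in the unlabeled pool U but rare in the labeled data L. counts_L and
    counts_U are phrase-frequency dicts (e.g., collections.Counter)."""
    vocab = len(set(counts_L) | set(counts_U)) or 1
    total_L = sum(counts_L.values()) + smooth * vocab
    total_U = sum(counts_U.values()) + smooth * vocab
    score = 0.0
    for ph in sentence_phrases:
        p_L = (counts_L.get(ph, 0) + smooth) / total_L
        p_U = (counts_U.get(ph, 0) + smooth) / total_U
        score += p_U / p_L  # phrases underrepresented in L raise the score
    return score / max(1, len(sentence_phrases))
```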
labeled data is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Huang, Fei and Yates, Alexander
Experiments
In chunking, there is a clear trend toward larger increases in performance as words become rarer in the labeled data set, from a 0.02 improvement on words of frequency 2, to an improvement of 0.21 on OOV words.
Experiments
To measure the sample complexity of the supervised CRF, we use the same experimental setup as in the chunking experiment on WSJ text, but we vary the amount of labeled data available to the CRF.
Experiments
Thus smoothing is optimizing performance for the case where unlabeled data is plentiful and labeled data is scarce, as we would hope.
Related Work
Several researchers have previously studied methods for using unlabeled data for tagging and chunking, either alone or as a supplement to labeled data.
Related Work
Our technique lets the HMM find parameters that maximize cross-entropy, and then uses labeled data to learn the best mapping from the HMM categories to the POS categories.
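One simple way to learn such a mapping from labeled data is a majority vote over co-occurring (HMM state, POS tag) pairs; the paper may use a different procedure, so treat this as a sketch.

```python
from collections import Counter, defaultdict

def map_states_to_tags(state_seqs, tag_seqs):
    """Majority-vote mapping from induced HMM states to POS tags, learned on
    labeled data: each state is assigned the tag it co-occurs with most."""
    votes = defaultdict(Counter)
    for states, tags in zip(state_seqs, tag_seqs):
        for s, t in zip(states, tags):
            votes[s][t] += 1
    return {s: cnt.most_common(1)[0][0] for s, cnt in votes.items()}
```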
Related Work
Our technique uses unlabeled training data from the target domain, and is thus applicable more generally, including in web processing, where the domain and vocabulary is highly variable, and it is extremely difficult to obtain labeled data that is representative of the test distribution.
labeled data is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Jeon, Je Hun and Liu, Yang
Abstract
In our experiments on the Boston University radio news corpus, using only a small amount of the labeled data as the initial training set, our proposed labeling method combined with most-confident sample selection can effectively use unlabeled data to improve performance and finally reach performance closer to that of the supervised method using all the training data.
Co-training strategy for prosodic event detection
Given a set L of labeled data and a set U of unlabeled data, the algorithm first creates a smaller pool U’ containing u unlabeled examples.
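A bare-bones co-training loop along these lines is sketched below; the two per-view trainers, the confidence/predict interface, and the pool and batch sizes are assumptions for illustration, not the paper's configuration.

```python
def co_train(train_view1, train_view2, L, U, u=75, k=20, n_iters=30):
    """Co-training sketch: keep a small pool U' of u unlabeled examples,
    let each view's classifier label its most confident pool items for the
    other view, then refill U' from U. Interfaces and sizes are assumed."""
    L1, L2, U = list(L), list(L), list(U)
    pool = [U.pop() for _ in range(min(u, len(U)))]  # the pool U'
    for _ in range(n_iters):
        c1, c2 = train_view1(L1), train_view2(L2)
        picks1 = sorted(pool, key=c1.confidence, reverse=True)[:k]
        picks2 = sorted(pool, key=c2.confidence, reverse=True)[:k]
        L2 += [(x, c1.predict(x)) for x in picks1]  # view 1 teaches view 2
        L1 += [(x, c2.predict(x)) for x in picks2]  # view 2 teaches view 1
        taken = set(map(id, picks1)) | set(map(id, picks2))
        pool = [x for x in pool if id(x) not in taken]
        while U and len(pool) < u:
            pool.append(U.pop())  # refill U' from the unlabeled set
    return train_view1(L1), train_view2(L2)
```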
Conclusions
In our experiment, we used some labeled data as a development set to estimate some parameters.
Experiments and results
Among the labeled data, 102 utterances of all f1a and m1b speakers are used for testing, 20 utterances randomly chosen from f2b, f3b, m2b, m3b, and m4b are used as a development set to optimize parameters such as λ and the confidence level threshold, 5 utterances are used as the initial training set L, and the rest of the data is used as the unlabeled set U, which has 1027 unlabeled utterances (we removed the human labels for the co-training experiments).
Experiments and results
We can see that the performance of co-training for these three tasks is slightly worse than supervised learning using all the labeled data, but is significantly better than the original performance using 3% of the hand-labeled data.
labeled data is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Lin, Dekang and Wu, Xiaoyun
Discussion and Related Work
Ando and Zhang (2005) defined an objective function that combines the original problem on the labeled data with a set of auxiliary problems on unlabeled data.
Discussion and Related Work
One is to leverage a large amount of unsupervised data to train an adequate classifier with a small amount of labeled data.
Introduction
While the labeled data is generally very costly to obtain, there is a vast amount of unlabeled textual data freely available on the web.
Introduction
Under this approach, even if a word is not found in the training data, it may still fire cluster-based features as long as it shares cluster assignments with some words in the labeled data .
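A small sketch of cluster-based feature generation consistent with this idea; the feature templates and the word_to_cluster mapping are hypothetical, with the mapping assumed to come from clustering unlabeled text.

```python
def cluster_features(tokens, word_to_cluster):
    """Emit word and cluster-membership features for each token, so an unseen
    word still shares cluster features with in-vocabulary words from the same
    cluster. word_to_cluster would be induced from unlabeled text."""
    feats = []
    for i, w in enumerate(tokens):
        feats.append(f"word[{i}]={w}")
        cid = word_to_cluster.get(w.lower())
        if cid is not None:
            feats.append(f"cluster[{i}]={cid}")
    return feats
```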
Introduction
Since the clusters are obtained without any labeled data , they may not correspond directly to concepts that are useful for decision making in the problem domain.
labeled data is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Niu, Zheng-Yu and Wang, Haifeng and Wu, Hua
Experiments of Parsing
Recent studies on parsing indicate that the use of unlabeled data by self-training can help parsing on the WSJ data, even when labeled data is relatively large (McClosky et al., 2006a; Reichart and Rappoport, 2007).
Experiments of Parsing
Table 7 shows the performance of the self-trained generative parser and updated reranker on the test set, with CTB and CDTfs as labeled data.
Experiments of Parsing
All the works in Table 8 used CTB articles 1-270 as labeled data .
Introduction
It is important to acquire additional labeled data for the target grammar parsing through exploitation of existing source treebanks since there is often a shortage of labeled data.
Introduction
When coupled with self-training technique, a reranking parser with CTB and converted CDT as labeled data achieves 85.2% f-score on CTB test set, an absolute 1.0% improvement (6% error reduction) over the previous best result for Chinese parsing.
labeled data is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Druck, Gregory and Mann, Gideon and McCallum, Andrew
Generalized Expectation Criteria
In general, the objective function could also include the likelihood of available labeled data, but throughout this paper we assume we have no parsed sentences.
Generalized Expectation Criteria
If there are constraint functions G for all model feature functions F, and the target expectations G̃ are estimated from labeled data, then the globally optimal parameter setting under the GE objective function is equivalent to the maximum likelihood solution.
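For reference, a generalized-expectation objective is often written in the following form, with Δ a divergence between target and model expectations; the notation here is approximate and not necessarily the paper's exact formulation.

```latex
% Approximate form of a GE objective: \tilde{G}_k are target expectations
% (here estimated from labeled data), E_{p_\theta}[G_k] are model expectations,
% and \Delta is a divergence such as squared error or KL.
\mathcal{O}(\theta) \;=\; -\sum_{k} \Delta\!\left(\tilde{G}_k,\ \mathbb{E}_{p_\theta}\!\left[G_k(\mathbf{x},\mathbf{y})\right]\right)
```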
Linguistic Prior Knowledge
For some experiments that follow we use “oracle” constraints that are estimated from labeled data.
Related Work
(2006) both use modified forms of self-training to bootstrap parsers from limited labeled data.
labeled data is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Jiang, Jing
A multitask transfer learning solution
We now present a multitask transfer learning solution to the weakly-supervised relation extraction problem, which makes use of the labeled data from the auxiliary relation types.
A multitask transfer learning solution
It is general for any transfer learning problem with auxiliary labeled data from similar tasks.
Introduction
However, supervised learning heavily relies on a sufficient amount of labeled data for training, which is not always available in practice due to the labor-intensive nature of human annotation.
Introduction
Inspired by recent work on transfer learning and domain adaptation, in this paper, we study how we can leverage labeled data of some old relation types to help the extraction of a new relation type in a weakly-supervised setting, where only a few seed instances of the new relation type are available.
labeled data is mentioned in 4 sentences in this paper.
Topics mentioned in this paper: