Index of papers in Proc. ACL 2009 that mention
  • unlabeled data
Celikyilmaz, Asli and Thint, Marcus and Huang, Zhiheng
Abstract
We implement a semi-supervised learning (SSL) approach to demonstrate that utilization of more unlabeled data points can improve the answer-ranking task of QA.
Abstract
We create a graph for labeled and unlabeled data using match-scores of textual entailment features as similarity weights between data points.
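A minimal sketch of how such a similarity graph could be assembled; the k-nearest-neighbour cutoff, the `match_score` callable, and the cosine stand-in are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def build_similarity_graph(feature_vectors, match_score, k=10):
    """Build a k-nearest-neighbour similarity graph over labeled and unlabeled
    points, using a match-score function as the edge weights.

    feature_vectors : list of per-example textual-entailment feature vectors
    match_score     : callable(f_i, f_j) -> float, higher = more similar
                      (illustrative stand-in for the paper's entailment scores)
    """
    n = len(feature_vectors)
    W = np.zeros((n, n))
    for i in range(n):
        scores = [(match_score(feature_vectors[i], feature_vectors[j]), j)
                  for j in range(n) if j != i]
        for s, j in sorted(scores, reverse=True)[:k]:
            W[i, j] = W[j, i] = s   # symmetric weighted edge
    return W

def cosine_match(a, b):
    """One possible match score: cosine similarity of dense feature vectors."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
```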
Experiments
We show that as we increase the number of unlabeled data points, with our graph summarization it is feasible to extract information that can improve the performance of QA models.
Experiments
In the first part, we randomly selected subsets of the labeled training dataset, X_l ⊂ X_L, with different sample sizes {1% × n_L, 5% × n_L, 10% × n_L, 25% × n_L, 50% × n_L, 100% × n_L}, where n_L represents the sample size of X_L. At each random selection, the rest of the labeled dataset is hypothetically used as unlabeled data to verify the performance of our SSL using different sizes of labeled data.
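A sketch of the subsampling protocol described above, working over indices so the held-out labeled points can be treated as unlabeled; the fractions mirror the listed sample sizes, and the function name is illustrative.

```python
import random

def labeled_subsets(n_L, fractions=(0.01, 0.05, 0.10, 0.25, 0.50, 1.00), seed=0):
    """Yield (fraction, labeled_indices, held_out_indices) splits of a labeled
    set of size n_L; the held-out labeled points play the role of unlabeled
    data when evaluating the SSL model."""
    rng = random.Random(seed)
    indices = list(range(n_L))
    for frac in fractions:
        sample = set(rng.sample(indices, max(1, int(frac * n_L))))
        rest = [i for i in indices if i not in sample]
        yield frac, sorted(sample), rest
```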
Experiments
Table 3: The effect of the number of unlabeled data points on MRR for the hybrid graph-summarization SSL.
Graph Summarization
Using the graph-based SSL method on the new representative dataset, X' = X ∪ X_Te, which is comprised of the summarized dataset X = {x_i} as labeled data points, and the testing dataset X_Te as unlabeled data points.
Graph Summarization
Since we do not know the estimated local density constraints of the unlabeled data points, we use constants to construct the local density constraint column vector for the X' dataset as follows:
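The excerpt above cuts off before the formula. As background, a generic iterative label-propagation pass over X' = (summarized labeled points) ∪ (test points) could look like the sketch below; this is a standard formulation, not the paper's exact graph-summarization objective or its density-constraint construction.

```python
import numpy as np

def propagate_labels(W, y_labeled, n_labeled, iterations=100):
    """Simple iterative label propagation on a weighted graph W.

    W         : (n, n) symmetric similarity matrix over X' = labeled ∪ test points
    y_labeled : (n_labeled,) labels in {0, 1} for the summarized labeled points
    The first n_labeled rows/columns of W correspond to the labeled points;
    the remaining rows/columns are the unlabeled test points.
    """
    n = W.shape[0]
    D_inv = 1.0 / np.maximum(W.sum(axis=1), 1e-12)
    P = W * D_inv[:, None]            # row-normalised transition matrix
    f = np.zeros(n)
    f[:n_labeled] = y_labeled
    for _ in range(iterations):
        f = P @ f                     # diffuse label scores along graph edges
        f[:n_labeled] = y_labeled     # clamp the labeled points
    return f[n_labeled:]              # scores for the unlabeled test points
```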
Introduction
Recent research indicates that using labeled and unlabeled data in a semi-supervised learning (SSL) environment, with an emphasis on graph-based methods, can improve the performance of information extraction from data for tasks such as question classification (Tri et al., 2006), web classification (Liu et al., 2006), relation extraction (Chen et al., 2006), passage-retrieval (Otterbacher et al., 2009), various natural language processing tasks such as part-of-speech tagging and named-entity recognition (Suzuki and Isozaki, 2008), and word-sense disambiguation.
Introduction
We consider situations where there are much more unlabeled data, X_U, than labeled data, X_L, i.e., n_L << n_U.
unlabeled data is mentioned in 14 sentences in this paper.
Jeon, Je Hun and Liu, Yang
Abstract
We propose a confidence-based method to assign labels to unlabeled data and demonstrate improved results using this method compared to the widely used agreement-based method.
Abstract
In our experiments on the Boston University radio news corpus, using only a small amount of the labeled data as the initial training set, our proposed labeling method combined with most-confident sample selection can effectively use unlabeled data to improve performance and finally reach performance closer to that of the supervised method using all the training data.
Co-training strategy for prosodic event detection
Given a set L of labeled data and a set U of unlabeled data, the algorithm first creates a smaller pool U' containing u unlabeled data.
Co-training strategy for prosodic event detection
There are two issues: (1) the accurate self-labeling method for unlabeled data and (2) effective heuristics to select informative samples.
Co-training strategy for prosodic event detection
Given a set L of labeled training data and a set U of unlabeled data
Randomly select U' from U, |U'| = u
while iteration < k do
    Use L to train classifiers h1 and h2
    Apply h1 and h2 to assign labels for all examples in U'
    Select n self-labeled samples and add to L
    Remove these n samples from U
    Recreate U' by choosing u instances randomly from U
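A Python rendering of the reconstructed co-training pseudocode above; the classifier-training and sample-selection callables are placeholders for the two feature views and the confidence- or agreement-based selection discussed in the paper.

```python
import random

def co_train(L, U, train1, train2, select_n, u=200, k=10, seed=0):
    """Co-training sketch following the pseudocode above.

    L, U      : lists of labeled / unlabeled examples
    train1/2  : callables fitting a classifier on L over one feature view each,
                returning an object with .predict(example) -> label
    select_n  : callable choosing n self-labeled (example, label) pairs to add
                (e.g. most-confident or agreement-based selection)
    """
    rng = random.Random(seed)
    U = list(U)
    U_pool = rng.sample(U, min(u, len(U)))
    for _ in range(k):
        h1, h2 = train1(L), train2(L)
        self_labeled = [(x, h1.predict(x), h2.predict(x)) for x in U_pool]
        chosen = select_n(self_labeled)                 # [(example, label), ...]
        L.extend(chosen)                                # grow the labeled set
        chosen_ids = {id(x) for x, _ in chosen}
        U = [x for x in U if id(x) not in chosen_ids]   # remove from U
        U_pool = rng.sample(U, min(u, len(U)))          # refill the pool
    return L
```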
Conclusions
We introduced a confidence-based method to assign possible labels to unlabeled data and evaluated the performance combined with informative sample selection methods.
Conclusions
This suggests that the use of unlabeled data can lead to significant improvement for prosodic event detection.
Experiments and results
Our goal is to determine whether the co-training algorithm described above could successfully use the unlabeled data for prosodic event detection.
Introduction
We propose a confidence-based method to assign labels to unlabeled data in training iterations and evaluate its performance combined with different informative sample selection methods.
Introduction
Our experiments on the Boston Radio News corpus show that the use of unlabeled data can lead to significant improvement of prosodic event detection compared to using the original small training set, and that the semi-supervised learning result is comparable with supervised learning with a similar amount of training data.
unlabeled data is mentioned in 10 sentences in this paper.
Haffari, Gholamreza and Sarkar, Anoop
AL-SMT: Multilingual Setting
(Ueffing et al., 2007; Haffari et al., 2009) show that treating U+ as a source for a new feature function in a log-linear model for SMT (Och and Ney, 2004) allows us to maximally take advantage of unlabeled data by finding a weight for this feature using minimum error-rate training (MERT) (Och, 2003).
Introduction
In self-training each MT system is retrained using human labeled data plus its own noisy translation output on the unlabeled data.
Related Work
(Reichart et al., 2008) introduce multitask active learning, where unlabeled data require annotations for multiple tasks; e.g., they consider named entities and parse trees, and show that multiple tasks help selection compared to individual tasks.
Sentence Selection: Multiple Language Pairs
Using this method, we rank the entries in the unlabeled data U for each translation task defined by the language pair (F_d, E). This results in several ranking lists, each of which represents the importance of entries with respect to a particular translation task.
Sentence Selection: Single Language Pair
The more frequent a phrase (not a phrase pair) is in the unlabeled data, the more important it is to know its translation, since it is more likely to see it in the test data (especially when the test data is in-domain with respect to the unlabeled data).
Sentence Selection: Single Language Pair
In the labeled data L, phrases are the ones which are extracted by the SMT models; but what are the candidate phrases in the unlabeled data U?
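One simple reading of the selection criterion above: treat short n-grams as candidate phrases in U and rank unlabeled sentences by the corpus frequency of the phrases they contain. The n-gram notion of a candidate phrase is an assumption for illustration, not necessarily the paper's answer to the question above.

```python
from collections import Counter

def candidate_phrases(sentence, max_len=3):
    """All n-grams up to max_len words: one simple notion of candidate phrase."""
    words = sentence.split()
    return [" ".join(words[i:i + n])
            for n in range(1, max_len + 1)
            for i in range(len(words) - n + 1)]

def rank_unlabeled_sentences(unlabeled_sentences, max_len=3):
    """Rank unlabeled sentences by the total corpus frequency of the candidate
    phrases they contain: frequent, untranslated phrases score highest."""
    phrase_counts = Counter()
    for s in unlabeled_sentences:
        phrase_counts.update(candidate_phrases(s, max_len))
    def score(s):
        return sum(phrase_counts[p] for p in candidate_phrases(s, max_len))
    return sorted(unlabeled_sentences, key=score, reverse=True)
```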
unlabeled data is mentioned in 9 sentences in this paper.
Dasgupta, Sajib and Ng, Vincent
Evaluation
We employ as our second baseline a transductive SVM trained using 100 points randomly sampled from the training folds as labeled data and the remaining 1900 points as unlabeled data.
Evaluation
This could be attributed to (1) the unlabeled data, which may have provided the transductive learner with useful information that is not accessible to the other learners, and (2) the ensemble, which is more tolerant of noise from the imperfect seeds.
Evaluation
Specifically, we used the 500 seeds to guide the selection of active learning points, but trained a transductive SVM using only the active learning points as labeled data (and the rest as unlabeled data).
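To illustrate the labeled/unlabeled split in this baseline: scikit-learn ships no transductive SVM, so the sketch below uses a self-training SVM as a rough stand-in; the function name and parameter choices are assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.semi_supervised import SelfTrainingClassifier

def semi_supervised_baseline(X, y, labeled_idx):
    """Mimic the 100-labeled / 1900-unlabeled split: reveal labels only at
    labeled_idx and let a self-training SVM label everything else.

    X : (n, d) feature matrix for all points; y : (n,) true labels,
    consulted only at the labeled indices.
    """
    y_train = np.full(len(y), -1)            # -1 marks unlabeled points
    y_train[labeled_idx] = y[labeled_idx]
    model = SelfTrainingClassifier(SVC(probability=True, kernel="linear"),
                                   threshold=0.8)
    model.fit(X, y_train)
    return model.predict(X)                  # inferred labels for every point
```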
Our Approach
Given that we now have a labeled set (composed of 100 manually labeled points selected by active learning and 500 unambiguous points) as well as a larger set of points that are yet to be labeled (i.e., the remaining unlabeled points in the training folds and those in the test fold), we aim to train a better classifier by using a weakly supervised learner to learn from both the labeled and unlabeled data.
Our Approach
points in L_j, where i ≠ j) as unlabeled data.
Our Approach
Since the points in the test fold are included in the unlabeled data, they are all classified in this step.
unlabeled data is mentioned in 7 sentences in this paper.
Lin, Dekang and Wu, Xiaoyun
Discussion and Related Work
In earlier work on semi-supervised learning, e.g., (Blum and Mitchell, 1998), the classifiers learned from unlabeled data were used directly.
Discussion and Related Work
Recent research shows that it is better to use whatever is learned from the unlabeled data as features in a discriminative classifier.
Discussion and Related Work
Wong and Ng (2007) and Suzuki and Isozaki (2008) are similar in that they run a baseline discriminative classifier on unlabeled data to generate pseudo examples, which are then used to train a different type of classifier for the same problem.
Introduction
two-stage strategy: first create word clusters with unlabeled data and then use the clusters as features in supervised training.
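A toy version of the two-stage strategy above, with k-means over word co-occurrence vectors standing in for the paper's large-scale distributed clustering; all names and sizes are illustrative assumptions.

```python
from collections import Counter
import numpy as np
from sklearn.cluster import KMeans

def word_clusters(sentences, n_clusters=50, vocab_size=5000):
    """Stage 1: cluster words by their sentence co-occurrence vectors.
    sentences is a list of token lists; k-means is only a stand-in for the
    paper's clustering of a much larger unlabeled corpus."""
    counts = Counter(w for s in sentences for w in s)
    vocab = [w for w, _ in counts.most_common(vocab_size)]
    index = {w: i for i, w in enumerate(vocab)}
    cooc = np.zeros((len(vocab), len(vocab)))
    for s in sentences:
        ids = [index[w] for w in s if w in index]
        for i in ids:
            for j in ids:
                if i != j:
                    cooc[i, j] += 1
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(cooc)
    return {w: int(c) for w, c in zip(vocab, labels)}

def cluster_features(sentence, clusters):
    """Stage 2: add each token's cluster id as a feature for the supervised
    learner (unknown words fall into a catch-all cluster)."""
    return [f"cluster={clusters.get(w, -1)}" for w in sentence]
```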
unlabeled data is mentioned in 7 sentences in this paper.
Niu, Zheng-Yu and Wang, Haifeng and Wu, Hua
Abstract
Evaluation on the Penn Chinese Treebank indicates that a converted dependency treebank helps constituency parsing, and the use of unlabeled data by self-training further increases the parsing f-score to 85.2%, resulting in a 6% error reduction over the previous best result.
Experiments of Parsing
4.3 Using Unlabeled Data for Parsing
Experiments of Parsing
Recent studies on parsing indicate that the use of unlabeled data by self-training can help parsing on the WSJ data, even when labeled data is relatively large (McClosky et al., 2006a; Reichart and Rappoport, 2007).
Experiments of Parsing
2000) (PDC) as unlabeled data for parsing.
unlabeled data is mentioned in 6 sentences in this paper.
Huang, Fei and Yates, Alexander
Experiments
No peak in performance is reached, so further improvements are possible with more unlabeled data.
Experiments
Thus smoothing is optimizing performance for the case where unlabeled data is plentiful and labeled data is scarce, as we would hope.
Related Work
Several researchers have previously studied methods for using unlabeled data for tagging and chunking, either alone or as a supplement to labeled data.
Related Work
Unlike these systems, our efforts are aimed at using unlabeled data to find distributional representations that work well on rare terms, making the supervised systems more applicable to other domains and decreasing their sample complexity.
Smoothing Natural Language Sequences
Using Expectation-Maximization (Dempster et al., 1977), it is possible to estimate the distributions P(x_i | y_i) and P(y_i | y_{i-1}) from unlabeled data.
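For context, these are the standard Baum-Welch (EM) re-estimation formulas for an HMM's emission and transition distributions over unlabeled sequences; the notation here is generic and may differ from the paper's.

```latex
% E-step: posterior state occupancies from forward-backward on unlabeled text
\gamma_t(y) = P(y_t = y \mid x_{1:T}), \qquad
\xi_t(y', y) = P(y_{t-1} = y',\, y_t = y \mid x_{1:T})

% M-step: re-estimate the emission and transition distributions
P(x \mid y) = \frac{\sum_t \gamma_t(y)\, \mathbb{1}[x_t = x]}{\sum_t \gamma_t(y)},
\qquad
P(y \mid y') = \frac{\sum_t \xi_t(y', y)}{\sum_t \gamma_{t-1}(y')}
```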
unlabeled data is mentioned in 5 sentences in this paper.
Li, Tao and Zhang, Yi and Sindhwani, Vikas
Abstract
We propose a novel approach to learn from lexical prior knowledge in the form of domain-independent sentiment-laden terms, in conjunction with domain-dependent unlabeled data and a few labeled documents.
Conclusion
To more effectively utilize unlabeled data and induce domain-specific adaptation of our models, several extensions are possible: facilitating learning from related domains, incorporating hyperlinks between documents, incorporating synonyms or co-occurrences between words, etc.
Incorporating Lexical Knowledge
It should be noted that this list was constructed without a specific domain in mind, which is further motivation for using training examples and unlabeled data to learn domain-specific connotations.
Related Work
The goal of the former theme is to learn from few labeled examples by making use of unlabeled data, while the goal of the latter theme is to utilize weak prior knowledge about term-class affinities (e.g., the term “awful” indicates negative sentiment and therefore may be considered as a negatively labeled feature).
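One simple way to combine the two themes, shown only as an illustration (the paper's actual model is different): inject lexicon terms as pseudo-labeled one-word documents next to the few labeled documents, then run an off-the-shelf graph-based semi-supervised learner over everything.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.semi_supervised import LabelSpreading

def lexicon_seeded_ssl(labeled_docs, labeled_y, unlabeled_docs,
                       pos_terms, neg_terms):
    """Combine a domain-independent sentiment lexicon with a few labeled and
    many unlabeled documents. Each lexicon term becomes a tiny pseudo-labeled
    'document'; LabelSpreading then propagates labels to the unlabeled docs."""
    docs = list(labeled_docs) + list(pos_terms) + list(neg_terms) + list(unlabeled_docs)
    y = (list(labeled_y)
         + [1] * len(pos_terms) + [0] * len(neg_terms)
         + [-1] * len(unlabeled_docs))            # -1 marks unlabeled documents
    X = TfidfVectorizer().fit_transform(docs).toarray()
    model = LabelSpreading(kernel="knn", n_neighbors=7).fit(X, np.array(y))
    return model.transduction_[-len(unlabeled_docs):]   # predicted labels
```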
unlabeled data is mentioned in 4 sentences in this paper.
Persing, Isaac and Ng, Vincent
Abstract
To improve the performance of a cause identification system for the minority classes, we present a bootstrapping algorithm that automatically augments a training set by learning from a small amount of labeled data and a large amount of unlabeled data.
Evaluation
To get a sense of the accuracy of the bootstrapped documents without further manual labeling, recall that our experimental setup resembles a transductive setting where the test documents are part of the unlabeled data, and consequently, some of them may have been automatically labeled by the bootstrapping algorithm.
Our Bootstrapping Algorithm
To alleviate the data scarcity problem and improve the accuracy of the classifiers, we propose in this section a bootstrapping algorithm that automatically augments a training set by exploiting a large amount of unlabeled data.
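A sketch of a generic bootstrapping loop of this kind: repeatedly add unlabeled documents that the current classifier assigns to a minority class with high confidence. The `fit` interface and `predict_proba_one` method are assumed for illustration and are not part of the paper.

```python
def bootstrap_minority(train_docs, train_labels, unlabeled_docs,
                       fit, minority_label, threshold=0.9, rounds=5):
    """Grow the training set with confidently self-labeled minority-class docs.

    fit : callable(docs, labels) -> model with .predict_proba_one(doc) -> dict
          mapping label -> probability (an assumed interface, not a real API).
    """
    unlabeled = list(unlabeled_docs)
    for _ in range(rounds):
        model = fit(train_docs, train_labels)
        added, remaining = [], []
        for doc in unlabeled:
            probs = model.predict_proba_one(doc)
            if probs.get(minority_label, 0.0) >= threshold:
                added.append(doc)
            else:
                remaining.append(doc)
        if not added:
            break                         # nothing confident left to add
        train_docs.extend(added)
        train_labels.extend([minority_label] * len(added))
        unlabeled = remaining
    return train_docs, train_labels
```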
Related Work
Minority classes can be expanded without the availability of unlabeled data as well.
unlabeled data is mentioned in 4 sentences in this paper.
Zhao, Hai and Song, Yan and Kit, Chunyu and Zhou, Guodong
The Related Work
Typical domain adaptation tasks often assume that annotated data in the new domain are absent or insufficient and that large-scale unlabeled data are available.
The Related Work
Where unlabeled data are concerned, semi-supervised or unsupervised methods are naturally adopted.
The Related Work
The first usually focuses on exploiting automatically generated labeled data from the unlabeled data (Steedman et al., 2003; McClosky et al., 2006; Reichart and Rappoport, 2007; Sagae and Tsujii, 2007; Chen et al., 2008); the second focuses on combining supervised and unsupervised methods, where only unlabeled data are considered (Smith and Eisner, 2006; Wang and Schuurmans, 2008; Koo et al., 2008).
unlabeled data is mentioned in 4 sentences in this paper.