Index of papers in Proc. ACL 2009 that mention
  • labeled data
Celikyilmaz, Asli and Thint, Marcus and Huang, Zhiheng
Abstract
With a new representation of graph-based SSL on QA datasets using only a handful of features, and under limited amounts of labeled data, we show improvement in generalization performance over state-of-the-art QA models.
Experiments
In the first part, we randomly selected subsets of the labeled training dataset X_l ⊂ X_L with different sample sizes, n_l ∈ {1% * n_L, 5% * n_L, 10% * n_L, 25% * n_L, 50% * n_L, 100% * n_L}, where n_L represents the sample size of X_L. At each random selection, the rest of the labeled dataset is hypothetically used as unlabeled data to verify the performance of our SSL using different sizes of labeled data.
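A minimal sketch of this evaluation protocol, assuming Python and illustrative names (labeled_subsets, X_L are not from the paper): each fraction of X_L is sampled as the labeled subset, and the held-out remainder plays the role of unlabeled data for that run.

```python
import random

def labeled_subsets(X_L, fractions=(0.01, 0.05, 0.10, 0.25, 0.50, 1.00), seed=0):
    """For each fraction, sample that share of X_L as the labeled subset X_l;
    the held-out remainder of X_L is treated as unlabeled data for that run."""
    rng = random.Random(seed)
    n_L = len(X_L)
    for frac in fractions:
        n_l = max(1, int(round(frac * n_L)))
        idx = set(rng.sample(range(n_L), n_l))
        labeled = [X_L[i] for i in idx]
        hidden = [X_L[i] for i in range(n_L) if i not in idx]  # plays the role of unlabeled data
        yield frac, labeled, hidden
```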
Experiments
Note from Table 2 that, when the number of labeled data is small (n_l < 10% * n_L), graph-based SSL, gSum SSL, has a better performance compared to SVM.
Experiments
Especially in Hybrid graph-Summary SSL, Hybrid gSum SSL, when the number of labeled data is small (n_l < 25% * n_L), the performance improvement is better than the rest.
Graph Summarization
The labeled data points, i.e., X_L, are appended to each of these selected X_S datasets, X_S = {x_1^s, ..., x_{n_b}^s} ∪ X_L.
Graph Summarization
The local density constraints become crucial for inference where summarized labeled data are used instead of overall dataset.
Graph Summarization
As a result, q summary datasets X_S, each with n_b labeled data points, are combined to form a representative sample of X, X̂ = {X_s}_{s=1..q}, reducing the number of data points from n to a much smaller number, p = q * n_b << n. So the new summary of X can be represented with X̂ = {x_i}_{i=1..p}.
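The reduction described above can be outlined as follows; this is a hypothetical sketch in which select_summary stands in for the paper's graph-based selection of summary points, not an implementation of it.

```python
def summarize_dataset(X, X_L, q, n_b, select_summary):
    """Hypothetical outline: build q summary sets of n_b points each from X
    (select_summary stands in for the paper's graph-based selection), append
    the labeled points X_L to each, and pool them so roughly p = q * n_b
    summary points replace the original n = len(X)."""
    summaries = [list(select_summary(X, n_b)) + list(X_L) for _ in range(q)]
    return [x for X_s in summaries for x in X_s]
```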
Introduction
One of the challenges we face is that we have a very limited amount of labeled data, i.e., correctly labeled (true/false entailment) sentences.
Introduction
We consider situations where there is much more unlabeled data, X_U, than labeled data, X_L, i.e., n_L << n_U.
Introduction
— application of a graph-summarization method to enable learning from very large unlabeled and rather small labeled data, which would not have been feasible for most sophisticated learning tools, in Section 4.
labeled data is mentioned in 13 sentences in this paper.
Topics mentioned in this paper:
Dasgupta, Sajib and Ng, Vincent
Evaluation
Owing to the randomness involved in the choice of labeled data, all baseline results are averaged over ten independent runs for each fold.
Evaluation
We implemented Kamvar et al.’s (2003) semi-supervised spectral clustering algorithm, which incorporates labeled data into the clustering framework in the form of must-link and cannot-link constraints.
Evaluation
We employ as our second baseline a transductive SVM trained using 100 points randomly sampled from the training folds as labeled data and the remaining 1900 points as unlabeled data.
Introduction
Experimental results on five sentiment classification datasets demonstrate that our system can generate high-quality labeled data from unambiguous reviews, which, together with a small number of manually labeled reviews selected by the active learner, can be used to effectively classify ambiguous reviews in a discriminative fashion.
Our Approach
However, in the absence of labeled data, it is not easy to assess feature relevance.
Our Approach
Even if labeled data were present, the ambiguous points might be better handled by a discriminative learning system than a clustering algorithm, as discriminative learners are more sophisticated and can handle an ambiguous feature space more effectively.
Our Approach
In self-training, we iteratively train a classifier on the data labeled so far, use it to classify the unlabeled instances, and augment the labeled data with the most confidently labeled instances.
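A generic self-training loop matching this description might look as follows; train_fn and the classifier's predict/confidence interface are assumptions, and the selection parameters are illustrative rather than the authors' settings.

```python
def self_train(train_fn, labeled, unlabeled, n_iters=10, k=50):
    """Self-training sketch: train on the data labeled so far, label the
    unlabeled pool, and move the k most confidently labeled instances into
    the labeled set each round. train_fn(labeled) is assumed to return a
    classifier with .predict(x) and .confidence(x)."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    clf = train_fn(labeled)
    for _ in range(n_iters):
        if not unlabeled:
            break
        ranked = sorted(unlabeled, key=clf.confidence, reverse=True)
        confident, unlabeled = ranked[:k], ranked[k:]
        labeled += [(x, clf.predict(x)) for x in confident]
        clf = train_fn(labeled)  # retrain on the augmented labeled set
    return clf
```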
labeled data is mentioned in 14 sentences in this paper.
Topics mentioned in this paper:
Manshadi, Mehdi and Li, Xiao
A grammar for semantic tagging
For these reasons, we evaluate our grammar model on the task of automatic tagging of queries for which we have labeled data available.
A grammar for semantic tagging
The model, however, extends the lexicon by including words discovered from labeled data (if available).
A grammar for semantic tagging
It is true that the word “vs” plays a critical role in this query, representing that the user’s intention is to compare the two brands; but, as mentioned above, in our labeled data such words have been left unlabeled.
Discriminative re-ranking
In particular, when there is no or a very small amount of labeled data, a parser could still work by using unsupervised learning approaches to learn the rules, or by simply using a set of hand-built rules (as we did above for the task of semantic tagging).
Discriminative re-ranking
When there is enough labeled data, then a discriminative model can be trained on the labeled data to learn contextual information and to further enhance the tagging performance.
Introduction
Preparing labeled data, however, is very expensive.
Introduction
Therefore in cases where there is no or a small amount of labeled data available, these models do a poor job.
Introduction
As seen later, in the case where there is not a large amount of labeled data available, the parser part is the dominant part of the module and performs reasonably well.
Summary
This is a big advantage of the parser model, because in practice providing labeled data is very expensive but very often the lexicons can be easily extracted from the structured data on the web (for example extracting movie titles from imdb or book titles from Amazon).
labeled data is mentioned in 11 sentences in this paper.
Topics mentioned in this paper:
Persing, Isaac and Ng, Vincent
Abstract
To improve the performance of a cause identification system for the minority classes, we present a bootstrapping algorithm that automatically augments a training set by learning from a small amount of labeled data and a large amount of unlabeled data.
Abstract
Experimental results show that our algorithm yields a relative error reduction of 6.3% in F-measure for the minority classes in comparison to a baseline that learns solely from the labeled data.
Baseline Approaches
Since our ultimate goal is to evaluate the effectiveness of our bootstrapping algorithm, the baseline approaches only make use of small amounts of labeled data for acquiring classifiers.
Baseline Approaches
To ensure a fair comparison with the first baseline, we do not employ additional labeled data for parameter tuning; rather, we reserve 25% of the available training data for tuning, and use the remaining 75% for classifier training.
Introduction
The difficulty of a text classification task depends on various factors, but typically, the task can be difficult if (1) the amount of labeled data available for learning the task is small; (2) it involves multiple classes; (3) it involves multi-label categorization, where more than one label can be assigned to each document; (4) the class distributions are skewed, with some categories significantly outnumbering the others; and (5) the documents belong to the same domain (e.g., movie review classification).
Introduction
Such methods, however, are unlikely to perform equally well for our cause identification task given our small labeled set, as the minority class prediction problem is complicated by the scarcity of labeled data.
Introduction
More specifically, given the scarcity of labeled data , many words that are potentially correlated with a shaper (especially a minority shaper) may not appear in the training set, and the lack of such useful indicators could hamper the acquisition of an accurate classifier via supervised learning techniques.
Our Bootstrapping Algorithm
One of the potential weaknesses of the two baselines described in the previous section is that the classifiers are trained on only a small amount of labeled data .
Our Bootstrapping Algorithm
The situation is somewhat aggravated by the fact that we are adopting a one-versus-all scheme for generating training instances for a particular shaper, which, together with the small amount of labeled data, implies that only a couple of positive instances may be available for training the classifier for a minority class.
Our Bootstrapping Algorithm
The reason we impose the “at least three” requirement is precision: we want to ensure, with a reasonable level of confidence, that the unlabeled documents chosen to augment P should indeed be labeled with the shaper under consideration, as incorrectly labeled documents would contaminate the labeled data, thus accelerating the deterioration of the quality of the automatically labeled data in subsequent bootstrapping iterations and adversely affecting the accuracy of the classifier trained on it (Pierce and Cardie, 2001).
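The "at least three" filter could be sketched as below; the assumption here (not stated in the excerpt) is that hits are counted against a learned list of indicator words, and the helper names are hypothetical.

```python
def augment_positives(P, unlabeled_docs, indicators, min_hits=3):
    """Precision-oriented augmentation: an unlabeled document joins the
    positive set P for a shaper only if it matches at least min_hits
    indicator words (the matching criterion is an assumed stand-in)."""
    added = [doc for doc in unlabeled_docs
             if sum(1 for w in indicators if w in doc) >= min_hits]
    return P + added
```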
labeled data is mentioned in 12 sentences in this paper.
Topics mentioned in this paper:
Haffari, Gholamreza and Sarkar, Anoop
AL-SMT: Multilingual Setting
When (re-)training the models, two phrase tables are learned for each SMT model: one from the labeled data L and the other one from the pseudo-labeled data U+ (which we call the main and auxiliary phrase tables, respectively).
Experiments
We subsampled 5,000 sentences as the labeled data L and 20,000 sentences as U for the pool of untranslated sentences (while hiding the English part).
Introduction
However, if we start with only a small amount of initial parallel data for the new target language, then translation quality is very poor and requires a very large injection of human labeled data to be effective.
Introduction
In self-training each MT system is retrained using human labeled data plus its own noisy translation output on the unlabeled data.
Introduction
In co-training each MT system is retrained using human labeled data plus noisy translation output from the other MT systems in the ensemble.
Sentence Selection: Single Language Pair
The more frequent a phrase is in the labeled data, the less important it is, since we have probably observed most of its translations.
Sentence Selection: Single Language Pair
In the labeled data L, phrases are the ones which are extracted by the SMT models; but what are the candidate phrases in the unlabeled data U?
Sentence Selection: Single Language Pair
two multinomials, one for labeled data and the other one for unlabeled data.
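An illustrative phrase-based score capturing the intuition quoted above: prefer sentences whose phrases are common in the unlabeled pool U but rare in the labeled data L. The paper defines its own selection scores; this ratio and smoothing are assumptions for the sketch.

```python
def sentence_score(sentence_phrases, counts_L, counts_U, smooth=1.0):
    """Illustrative score: high when a sentence's candidate phrases are common
    in the unlabeled pool U but rare in the labeled data L. counts_L and
    counts_U are phrase-frequency dicts (e.g., collections.Counter)."""
    vocab = len(set(counts_L) | set(counts_U)) or 1
    total_L = sum(counts_L.values()) + smooth * vocab
    total_U = sum(counts_U.values()) + smooth * vocab
    score = 0.0
    for ph in sentence_phrases:
        p_L = (counts_L.get(ph, 0) + smooth) / total_L
        p_U = (counts_U.get(ph, 0) + smooth) / total_U
        score += p_U / p_L  # phrases underrepresented in L raise the score
    return score / max(1, len(sentence_phrases))
```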
labeled data is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Huang, Fei and Yates, Alexander
Experiments
In chunking, there is a clear trend toward larger increases in performance as words become rarer in the labeled data set, from a 0.02 improvement on words of frequency 2, to an improvement of 0.21 on OOV words.
Experiments
To measure the sample complexity of the supervised CRF, we use the same experimental setup as in the chunking experiment on WSJ text, but we vary the amount of labeled data available to the CRF.
Experiments
Thus smoothing is optimizing performance for the case where unlabeled data is plentiful and labeled data is scarce, as we would hope.
Related Work
Several researchers have previously studied methods for using unlabeled data for tagging and chunking, either alone or as a supplement to labeled data.
Related Work
Our technique lets the HMM find parameters that maximize cross-entropy, and then uses labeled data to learn the best mapping from the HMM categories to the POS categories.
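One simple way to learn such a mapping from labeled data is a majority vote over co-occurring (HMM state, POS tag) pairs; the paper may use a different procedure, so treat this as a sketch.

```python
from collections import Counter, defaultdict

def map_states_to_tags(state_seqs, tag_seqs):
    """Majority-vote mapping from induced HMM states to POS tags, learned on
    labeled data: each state is assigned the tag it co-occurs with most."""
    votes = defaultdict(Counter)
    for states, tags in zip(state_seqs, tag_seqs):
        for s, t in zip(states, tags):
            votes[s][t] += 1
    return {s: cnt.most_common(1)[0][0] for s, cnt in votes.items()}
```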
Related Work
Our technique uses unlabeled training data from the target domain, and is thus applicable more generally, including in web processing, where the domain and vocabulary is highly variable, and it is extremely difficult to obtain labeled data that is representative of the test distribution.
labeled data is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Jeon, Je Hun and Liu, Yang
Abstract
In our experiments on the Boston University radio news corpus, using only a small amount of the labeled data as the initial training set, our proposed labeling method combined with most-confident sample selection can effectively use unlabeled data to improve performance and finally reach performance closer to that of the supervised method using all the training data.
Co-training strategy for prosodic event detection
Given a set L of labeled data and a set U of unlabeled data, the algorithm first creates a smaller pool U’ containing u unlabeled examples.
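A bare-bones co-training loop along these lines is sketched below; the two per-view trainers, the confidence/predict interface, and the pool and batch sizes are assumptions for illustration, not the paper's configuration.

```python
def co_train(train_view1, train_view2, L, U, u=75, k=20, n_iters=30):
    """Co-training sketch: keep a small pool U' of u unlabeled examples,
    let each view's classifier label its most confident pool items for the
    other view, then refill U' from U. Interfaces and sizes are assumed."""
    L1, L2, U = list(L), list(L), list(U)
    pool = [U.pop() for _ in range(min(u, len(U)))]  # the pool U'
    for _ in range(n_iters):
        c1, c2 = train_view1(L1), train_view2(L2)
        picks1 = sorted(pool, key=c1.confidence, reverse=True)[:k]
        picks2 = sorted(pool, key=c2.confidence, reverse=True)[:k]
        L2 += [(x, c1.predict(x)) for x in picks1]  # view 1 teaches view 2
        L1 += [(x, c2.predict(x)) for x in picks2]  # view 2 teaches view 1
        taken = set(map(id, picks1)) | set(map(id, picks2))
        pool = [x for x in pool if id(x) not in taken]
        while U and len(pool) < u:
            pool.append(U.pop())  # refill U' from the unlabeled set
    return train_view1(L1), train_view2(L2)
```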
Conclusions
In our experiment, we used some labeled data as a development set to estimate some parameters.
Experiments and results
Among the labeled data, 102 utterances of all f1a and m1b speakers are used for testing, 20 utterances randomly chosen from f2b, f3b, m2b, m3b, and m4b are used as a development set to optimize parameters such as λ and the confidence level threshold, 5 utterances are used as the initial training set L, and the rest of the data is used as the unlabeled set U, which has 1027 unlabeled utterances (we removed the human labels for the co-training experiments).
Experiments and results
We can see that the performance of co-training for these three tasks is slightly worse than supervised learning using all the labeled data, but is significantly better than the original performance using 3% of the hand-labeled data.
labeled data is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Lin, Dekang and Wu, Xiaoyun
Discussion and Related Work
Ando and Zhang (2005) defined an objective function that combines the original problem on the labeled data with a set of auxiliary problems on unlabeled data.
Discussion and Related Work
One is to leverage a large amount of unsupervised data to train an adequate classifier with a small amount of labeled data.
Introduction
While the labeled data is generally very costly to obtain, there is a vast amount of unlabeled textual data freely available on the web.
Introduction
Under this approach, even if a word is not found in the training data, it may still fire cluster-based features as long as it shares cluster assignments with some words in the labeled data .
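A small sketch of cluster-based feature generation consistent with this idea; the feature templates and the word_to_cluster mapping are hypothetical, with the mapping assumed to come from clustering unlabeled text.

```python
def cluster_features(tokens, word_to_cluster):
    """Emit word and cluster-membership features for each token, so an unseen
    word still shares cluster features with in-vocabulary words from the same
    cluster. word_to_cluster would be induced from unlabeled text."""
    feats = []
    for i, w in enumerate(tokens):
        feats.append(f"word[{i}]={w}")
        cid = word_to_cluster.get(w.lower())
        if cid is not None:
            feats.append(f"cluster[{i}]={cid}")
    return feats
```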
Introduction
Since the clusters are obtained without any labeled data , they may not correspond directly to concepts that are useful for decision making in the problem domain.
labeled data is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Niu, Zheng-Yu and Wang, Haifeng and Wu, Hua
Experiments of Parsing
Recent studies on parsing indicate that the use of unlabeled data by self-training can help parsing on the WSJ data, even when labeled data is relatively large (McClosky et al., 2006a; Reichart and Rappoport, 2007).
Experiments of Parsing
Table 7 shows the performance of the self-trained generative parser and updated reranker on the test set, with CTB and CDTfs as labeled data.
Experiments of Parsing
All the works in Table 8 used CTB articles 1-270 as labeled data .
Introduction
It is important to acquire additional labeled data for the target grammar parsing through exploitation of existing source treebanks since there is often a shortage of labeled data.
Introduction
When coupled with self-training technique, a reranking parser with CTB and converted CDT as labeled data achieves 85.2% f-score on CTB test set, an absolute 1.0% improvement (6% error reduction) over the previous best result for Chinese parsing.
labeled data is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Druck, Gregory and Mann, Gideon and McCallum, Andrew
Generalized Expectation Criteria
In general, the objective function could also include the likelihood of available labeled data, but throughout this paper we assume we have no parsed sentences.
Generalized Expectation Criteria
If there are constraint functions G for all model feature functions F, and the target expectations G̃ are estimated from labeled data, then the globally optimal parameter setting under the GE objective function is equivalent to the maximum likelihood solution.
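For reference, a generalized-expectation objective is often written in the following form, with Δ a divergence between target and model expectations; the notation here is approximate and not necessarily the paper's exact formulation.

```latex
% Approximate form of a GE objective: \tilde{G}_k are target expectations
% (here estimated from labeled data), E_{p_\theta}[G_k] are model expectations,
% and \Delta is a divergence such as squared error or KL.
\mathcal{O}(\theta) \;=\; -\sum_{k} \Delta\!\left(\tilde{G}_k,\ \mathbb{E}_{p_\theta}\!\left[G_k(\mathbf{x},\mathbf{y})\right]\right)
```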
Linguistic Prior Knowledge
For some experiments that follow we use “oracle” constraints that are estimated from labeled data.
Related Work
(2006) both use modified forms of self-training to bootstrap parsers from limited labeled data.
labeled data is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Jiang, Jing
A multitask transfer learning solution
We now present a multitask transfer learning solution to the weakly-supervised relation extraction problem, which makes use of the labeled data from the auxiliary relation types.
A multitask transfer learning solution
It is general for any transfer learning problem with auxiliary labeled data from similar tasks.
Introduction
However, supervised learning heavily relies on a sufficient amount of labeled data for training, which is not always available in practice due to the labor-intensive nature of human annotation.
Introduction
Inspired by recent work on transfer learning and domain adaptation, in this paper, we study how we can leverage labeled data of some old relation types to help the extraction of a new relation type in a weakly-supervised setting, where only a few seed instances of the new relation type are available.
labeled data is mentioned in 4 sentences in this paper.
Topics mentioned in this paper: