Index of papers in Proc. ACL that mention

labeled data

Seen in text as:

labeled data (476)
Labeled Data (17)
labelled data (5)
Labeled data (4)

Seen in 463 sentences in 54 papers.

1. Semantic Tagging of Web Search Queries

Manshadi, Mehdi and Li, Xiao

In Proc. ACL 2009, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

A grammar for semantic tagging	For these reasons, we evaluate our grammar model on the task of automatic tagging of queries for which we have labeled data available.
A grammar for semantic tagging	The model, however, extends the lexicon by including words discovered from labeled data (if available).
A grammar for semantic tagging	It is true that the word “vs” plays a critical role in this query, representing that the user’s intention is to compare the two brands; but as mentioned above in our labeled data such words has left unlabeled.
Discriminative re-ranking	In particular, when there is no or a very small amount of labeled data, a parser could still work by using unsupervised learning approaches to learn the rules, or by simply using a set of hand-built rules (as we did above for the task of semantic tagging).
Discriminative re-ranking	When there is enough labeled data, then a discriminative model can be trained on the labeled data to learn contextual information and to further enhance the tagging performance.
Introduction	Preparing labeled data, however, is very expensive.
Introduction	Therefore in cases where there is no or a small amount of labeled data available, these models do a poor job.
Introduction	As seen later, in the case where there is not a large amount of labeled data available, the parser part is the dominant part of the module and performs reasonably well.
Summary	This is a big advantage of the parser model, because in practice providing labeled data is very expensive but very often the lexicons can be easily extracted from the structured data on the web (for example extracting movie titles from imdb or book titles from Amazon).

labeled data is mentioned in 11 sentences in this paper.

2. Cross-Lingual Mixture Model for Sentiment Classification

Meng, Xinfan and Wei, Furu and Liu, Xiaohua and Zhou, Ming and Xu, Ge and Wang, Houfeng

In Proc. ACL 2012, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	Chinese) using labeled data in the source language (e.g.
Abstract	Most existing work relies on machine translation engines to directly adapt labeled data from the source language to the target language.
Abstract	Experiments on multiple data sets show that CLMM is consistently effective in two settings: (1) labeled data in the target language are unavailable; and (2) labeled data in the target language are also available.
Introduction	Therefore, it is desirable to use the English labeled data to improve sentiment classification of documents in other languages.
Introduction	One direct approach to leveraging the labeled data in English is to use machine translation engines as a black box to translate the labeled data from English to the target language (e.g.
Introduction	First, the vocabulary covered by the translated labeled data is limited, hence many sentiment indicative words can not be learned from the translated labeled data.

labeled data is mentioned in 58 sentences in this paper.

3. Cross-Domain Co-Extraction of Sentiment and Topic Lexicons

Li, Fangtao and Pan, Sinno Jialin and Jin, Ou and Yang, Qiang and Zhu, Xiaoyan

In Proc. ACL 2012, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	In this paper, we propose a domain adaptation framework for sentiment- and topic- lexicon co-extraction in a domain of interest where we do not require any labeled data, but have lots of labeled data in another related domain.
Introduction	In this paper, we focus on the co-extraction task of sentiment and topic lexicons in a target domain where we do not have any labeled data, but have plenty of labeled data in a source domain.
Introduction	utilize useful labeled data from the source domain as well as exploit the relationships between the topic and sentiment words to propagate information for lexicon construction in the target domain.
Introduction	labeled data be available in the target domain.

labeled data is mentioned in 16 sentences in this paper.

4. Using Multiple Sources to Construct a Sentiment Sensitive Thesaurus for Cross-Domain Sentiment Classification

Bollegala, Danushka and Weir, David and Carroll, John

In Proc. ACL 2011, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	We describe a sentiment classification method that is applicable when we do not have any labeled data for a target domain but have some labeled data for multiple other domains, designated as the source domains.
Experiments	Figure 3: Effect of source domain labeled data.
Experiments	To investigate the impact of the quantity of source domain labeled data on our method, we vary the amount of data from zero to 800 reviews, with equal amounts of positive and negative labeled data.
Experiments	Note that source domain labeled data is used both to create the sentiment sensitive thesaurus as well as to train the sentiment classifier.
Introduction	Supervised learning algorithms that require labeled data have been successfully used to build sentiment classifiers for a specific domain (Pang et al., 2002).
Introduction	positive or negative sentiment) given a small set of labeled data for the source domain, and unlabeled data for both source and target domains.
Introduction	In particular, no labeled data is provided for the target domain.

labeled data is mentioned in 16 sentences in this paper.

5. Employing Personal/Impersonal Views in Supervised and Semi-Supervised Sentiment Classification

Li, Shoushan and Huang, Chu-Ren and Zhou, Guodong and Lee, Sophia Yat Mei

In Proc. ACL 2010, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Introduction	It is well-known that sentiment classification is very domain-specific (Blitzer et al., 2007), so it is critical to eliminate its dependence on a large-scale labeled data for its wide applications.
Related Work	Supervised methods consider sentiment classification as a standard classification problem in which labeled data in a domain are used to train a domain-specific classifier.
Unsupervised Mining of Personal and Impersonal Views	The co-training algorithm is a specific semi-supervised learning approach which starts with a set of labeled data and increases the amount of labeled data using the unlabeled data by bootstrapping (Blum and Mitchell, 1998).
Unsupervised Mining of Personal and Impersonal Views	Input: The labeled data L containing personal sentence set S_{L-personal} and impersonal sentence set
Unsupervised Mining of Personal and Impersonal Views	S_{U-personal}, S_{U-impersonal}. Output: New labeled data L. Procedure:

labeled data is mentioned in 13 sentences in this paper.

6. Semi-Supervised Cause Identification from Aviation Safety Reports

Persing, Isaac and Ng, Vincent

In Proc. ACL 2009, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	To improve the performance of a cause identification system for the minority classes, we present a bootstrapping algorithm that automatically augments a training set by learning from a small amount of labeled data and a large amount of unlabeled data.
Abstract	Experimental results show that our algorithm yields a relative error reduction of 6.3% in F-measure for the minority classes in comparison to a baseline that learns solely from the labeled data.
Baseline Approaches	mate goal is to evaluate the effectiveness of our bootstrapping algorithm, the baseline approaches only make use of small amounts of labeled data for acquiring classifiers.
Baseline Approaches	To ensure a fair comparison with the first baseline, we do not employ additional labeled data for parameter tuning; rather, we reserve 25% of the available training data for tuning, and use the remaining 75% for classifier
Introduction	The difficulty of a text classification task depends on various factors, but typically, the task can be difficult if (1) the amount of labeled data available for learning the task is small; (2) it involves multiple classes; (3) it involves multi-label categorization, where more than one label can be assigned to each document; (4) the class distributions are skewed, with some categories significantly outnumbering the others; and (5) the documents belong to the same domain (e.g., movie review classification).
Introduction	Such methods, however, are unlikely to perform equally well for our cause identification task given our small labeled set, as the minority class prediction problem is complicated by the scarcity of labeled data.
Introduction	More specifically, given the scarcity of labeled data, many words that are potentially correlated with a shaper (especially a minority shaper) may not appear in the training set, and the lack of such useful indicators could hamper the acquisition of an accurate classifier via supervised learning techniques.
Our Bootstrapping Algorithm	One of the potential weaknesses of the two baselines described in the previous section is that the classifiers are trained on only a small amount of labeled data.
Our Bootstrapping Algorithm	The situation is somewhat aggravated by the fact that we are adopting a one-versus-all scheme for generating training instances for a particular shaper, which, together with the small amount of labeled data, implies that only a couple of positive instances may be available for training the classifier for a minority class.
Our Bootstrapping Algorithm	The reason we impose the “at least three” requirement is precision: we want to ensure, with a reasonable level of confidence, that the unlabeled documents chosen to augment P should indeed be labeled with the shaper under consideration, as incorrectly labeled documents would contaminate the labeled data, thus accelerating the deterioration of the quality of the automatically labeled data in subsequent bootstrapping iterations and adversely affecting the accuracy of the classifier trained on it (Pierce and Cardie, 2001).

labeled data is mentioned in 12 sentences in this paper.

7. Joint Bilingual Sentiment Classification with Unlabeled Parallel Corpora

Lu, Bin and Tan, Chenhao and Cardie, Claire and K. Tsou, Benjamin

In Proc. ACL 2011, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

A Joint Model with Unlabeled Parallel Text	where v ∈ {1, 2} denotes L1 or L2; the first term on the right-hand side is the likelihood of labeled data for both D1 and D2; and the second term is the likelihood of the unlabeled parallel data U.
A Joint Model with Unlabeled Parallel Text	By further considering the weight to ascribe to the unlabeled data vs. the labeled data (and the weight for the L2-norm regularization), we get the following regularized joint log likelihood to be maximized:
A Joint Model with Unlabeled Parallel Text	where the first term on the right-hand side is the log likelihood of the labeled data from both D1 and D2; the second is the log likelihood of the unlabeled parallel data U, multiplied by λ1 ≥ 0, a constant that controls the contribution of the unlabeled data; and λ2 ≥ 0 is a regularization constant that penalizes model complexity or large feature weights.
Abstract	We present a novel approach for joint bilingual sentiment classification at the sentence level that augments available labeled data in each language with unlabeled parallel data.
Introduction	Given the labeled data in each language, we propose an approach that exploits an unlabeled parallel corpus with the following
Introduction	The proposed maximum entropy-based EM approach jointly learns two monolingual sentiment classifiers by treating the sentiment labels in the unlabeled parallel text as unobserved latent variables, and maximizes the regularized joint likelihood of the language-specific labeled data together with the inferred sentiment labels of the parallel text.

labeled data is mentioned in 37 sentences in this paper.

8. Collective Tweet Wikification based on Semi-supervised Graph Regularization

Huang, Hongzhao and Cao, Yunbo and Huang, Xiaojiang and Ji, Heng and Lin, Chin-Yew

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	In addition, it is challenging to generate sufficient high quality labeled data for supervised models with low cost.
Abstract	Compared to the state-of-the-art supervised model trained from 100% labeled data, our proposed approach achieves comparable performance with 31% labeled data and obtains 5% absolute F1 gain with 50% labeled data.
Conclusions	By studying three novel fine-grained relations, detecting semantically-related information with semantic meta paths, and exploiting the data manifolds in both unlabeled and labeled data for collective inference, our work can dramatically save annotation cost and achieve better performance, thus shed light on the challenging wikification task for tweets.
Experiments	In comparison with the supervised baseline proposed by (Meij et al., 2012), our model SSRega1 relying on local compatibility already achieves comparable performance with 50% of labeled data.
Experiments	We can easily see that our proposed approach using 50% labeled data achieves similar performance with the state-of-the-art supervised model with 100% labeled data.
Experiments	6.4 Effect of Labeled Data Size
Introduction	Sufficient labeled data is crucial for supervised models.
Introduction	In order to address these unique challenges for wikification for the short tweets, we employ graph-based semi-supervised learning algorithms (Zhu et al., 2003; Smola and Kondor, 2003; Blum et al., 2004; Zhou et al., 2004; Talukdar and Crammer, 2009) for collective inference by exploiting the manifold (cluster) structure in both unlabeled and labeled data.

labeled data is mentioned in 11 sentences in this paper.

9. Robust Domain Adaptation for Relation Extraction via Clustering Consistency

Nguyen, Minh Luan and Tsang, Ivor W. and Chai, Kian Ming A. and Chieu, Hai Leong

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	We address two challenges: negative transfer when knowledge in source domains is used without considering the differences in relation distributions; and lack of adequate labeled samples for rarer relations in the new domain, due to a small labeled data set and imbalance relation distributions.
Introduction	However, most supervised learning algorithms require adequate labeled data for every relation type to be extracted.
Introduction	Instead, it can be more cost-effective to adapt an existing relation extraction system to the new domain using a small set of labeled data.
Introduction	Together with imbalanced relation distributions inherent in the domain, this can cause some rarer relations to constitute only a very small proportion of the labeled data set.
Problem Statement	The target domain has a few labeled data D_l = {(x_i, y_i)}.
Problem Statement	For the sth source domain, we have an adequate labeled data set DS.
Related Work	However, purely supervised relation extraction methods assume the availability of sufficient labeled data, which may be costly to obtain for new domains.
Related Work	We address this by augmenting a small labeled data set with other information in the domain adaptation setting.
Related Work	To create labeled data, the texts are dependency-parsed, and the domain-independent patterns on the parses form the basis for extractions.
Robust Domain Adaptation	By augmenting with unlabeled data D_u, we aim to alleviate the effect of imbalanced relation distribution, which causes a lack of labeled samples for rarer classes in a small set of labeled data.

labeled data is mentioned in 22 sentences in this paper.

10. Mine the Easy, Classify the Hard: A Semi-Supervised Approach to Automatic Sentiment Classification

Dasgupta, Sajib and Ng, Vincent

In Proc. ACL 2009, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Evaluation	Owing to the randomness involved in the choice of labeled data, all baseline results are averaged over ten independent runs for each fold.
Evaluation	We implemented Kamvar et al.’s (2003) semi-supervised spectral clustering algorithm, which incorporates labeled data into the clustering framework in the form of must-link and cannot-link constraints.
Evaluation	We employ as our second baseline a transductive SVM trained using 100 points randomly sampled from the training folds as labeled data and the remaining 1900 points as unlabeled data.
Introduction	perimental results on five sentiment classification datasets demonstrate that our system can generate high-quality labeled data from unambiguous reviews, which, together with a small number of manually labeled reviews selected by the active learner, can be used to effectively classify ambiguous reviews in a discriminative fashion.
Our Approach	However, in the absence of labeled data, it is not easy to assess feature relevance.
Our Approach	Even if labeled data were present, the ambiguous points might be better handled by a discriminative learning system than a clustering algorithm, as discriminative learners are more sophisticated, and can handle ambiguous feature space more effectively.
Our Approach	In self-training, we iteratively train a classifier on the data labeled so far, use it to classify the unlabeled instances, and augment the labeled data with the most confidently labeled instances.
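
The self-training loop quoted in the last sentence above can be sketched roughly as follows. This is a minimal illustration, not the authors' system: the toy nearest-centroid classifier, the one-dimensional features, and the names `train_centroids`, `predict_with_confidence`, `self_train`, `per_round`, and `max_rounds` are all assumptions made for the example.

```python
from statistics import mean


def train_centroids(labeled):
    """Fit a toy nearest-centroid classifier: one mean value per class."""
    by_class = {}
    for x, y in labeled:
        by_class.setdefault(y, []).append(x)
    return {y: mean(xs) for y, xs in by_class.items()}


def predict_with_confidence(centroids, x):
    """Return (label, margin); margin is the distance gap to the runner-up class."""
    ranked = sorted(centroids, key=lambda y: abs(x - centroids[y]))
    best = ranked[0]
    if len(ranked) == 1:
        return best, float("inf")
    margin = abs(x - centroids[ranked[1]]) - abs(x - centroids[best])
    return best, margin


def self_train(labeled, unlabeled, per_round=2, max_rounds=10):
    """Iteratively move the most confidently labeled points into the labeled set."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        if not pool:
            break
        # Train on everything labeled so far (hypothetical toy classifier).
        centroids = train_centroids(labeled)
        scored = [(predict_with_confidence(centroids, x), x) for x in pool]
        # Most confident predictions first.
        scored.sort(key=lambda item: item[0][1], reverse=True)
        for (label, _margin), x in scored[:per_round]:
            labeled.append((x, label))
        pool = [x for _, x in scored[per_round:]]
    return labeled


if __name__ == "__main__":
    seed = [(0.0, "neg"), (10.0, "pos")]
    print(self_train(seed, [1.0, 9.0, 4.0, 6.5]))
```

Adding only a few high-margin points per round is one simple way to limit the label noise that accumulates across iterations; a real system would use a proper classifier and a tuned confidence threshold.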

labeled data is mentioned in 14 sentences in this paper.

11. A Graph-based Semi-Supervised Learning for Question-Answering

Celikyilmaz, Asli and Thint, Marcus and Huang, Zhiheng

In Proc. ACL 2009, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	With a new representation of graph-based SSL on QA datasets using only a handful of features, and under limited amounts of labeled data, we show improvement in generalization performance over state-of-the-art QA models.
Experiments	In the first part, we randomly selected subsets of labeled training dataset X_L^i ⊂ X_L with different sample sizes, n_L^i = {1% * n_L, 5% * n_L, 10% * n_L, 25% * n_L, 50% * n_L, 100% * n_L}, where n_L represents the sample size of X_L. At each random selection, the rest of the labeled dataset is hypothetically used as unlabeled data to verify the performance of our SSL using different sizes of labeled data.
Experiments	Note from Table 2 that, when the number of labeled data is small (n_L^i < 10% * n_L), graph based SSL, gSum SSL, has a better performance compared to SVM.
Experiments	Especially in Hybrid graph-Summary SSL, Hybrid gSum SSL, when the number of labeled data is small (n_L^i < 25% * n_L) performance improvement is better than rest
Graph Summarization	The labeled data points, i.e., X L, are appended to each of these selected X 5 datasets, X5 = {mi,...mfn_l} U XL.
Graph Summarization	The local density constraints become crucial for inference where summarized labeled data are used instead of overall dataset.
Graph Summarization	As a result q number of summary datasets XS each of which with nb labeled data points are combined to form a representative sample of X, X = {X8 }:=1 reducing the number of data from n to a much smaller number of data, p = q * nb << n. So the new summary of the X can be represented with X = {Xi}§=1.
Introduction	One of the challenges we face with is that we have very limited amount of labeled data, i.e., correctly labeled (true/false entailment) sentences.
Introduction	We consider situations where there are much more unlabeled data, X_U, than labeled data, X_L, i.e., n_L << n_U.
Introduction	— application of a graph-summarization method to enable learning from a very large unlabeled and rather small labeled data, which would not have been feasible for most sophisticated learning tools in section 4.

labeled data is mentioned in 13 sentences in this paper.

12. Learning to Predict Distributions of Words Across Domains

Bollegala, Danushka and Weir, David and Carroll, John

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Datasets	(2006), we use sections 2 — 21 from Wall Street Journal (WSJ) as the source domain labeled data.
Distribution Prediction	Our distribution prediction learning method is unsupervised in the sense that it does not require manually labeled data for a particular task from any of the domains.
Domain Adaptation	The main reason that a model trained only on the source domain labeled data performs poorly in the target domain is the feature mismatch — few features in target domain test instances appear in source domain training instances.
Experiments and Results	For each domain, the accuracy obtained by a classifier trained using labeled data from that
Experiments and Results	This upper baseline represents the classification accuracy we could hope to obtain if we were to have labeled data for the target domain.
Introduction	Our proposed cross-domain word distribution prediction method is unsupervised in the sense that it does not require any labeled data in either of the two steps.
O \	Unlike our distribution prediction method, which is unsupervised, SST requires labeled data for the source domain to learn a feature mapping between a source and a target domain in the form of a thesaurus.
Related Work	(2006) append the source domain labeled data with predicted pivots (i.e.
Related Work	The unsupervised DA setting that we consider does not assume the availability of labeled data for the target domain.
Related Work	However, if a small amount of labeled data is available for the target domain, it can be used to further improve the performance of DA tasks (Xiao et al., 2013; Daume III, 2007).

labeled data is mentioned in 10 sentences in this paper.

13. Scaling Semi-supervised Naive Bayes with Feature Marginals

Lucas, Michael and Downey, Doug

In Proc. ACL 2013, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	SSL techniques are often effective in text classification, where labeled data is scarce but large unlabeled corpora are readily available.
Introduction	Semi-supervised Learning (SSL) is a Machine Learning (ML) approach that utilizes large amounts of unlabeled data, combined with a smaller amount of labeled data, to learn a target function (Zhu, 2006; Chapelle et al., 2006).
Introduction	Then, for a given target class and labeled data set, we utilize the statistics to improve a classifier.
Introduction	The marginal statistics are used as a constraint to improve the class-conditional probability estimates P(w | +) and P(w | −) for the positive and negative classes, which are often noisy when estimated over sparse labeled data sets.
Problem Definition	In particular, SFE uses the equality P(+|w) = P(+, w)/P(w), and estimates the rhs using P(w) computed over all the unlabeled data, rather than using only labeled data as in standard MNB.
Problem Definition	Further, it can be shown that as P(w) of a word w in the unlabeled data becomes larger than that in the labeled data, SFE’s estimate of the ratio P(w|+)/P(w|−) approaches one.
Problem Definition	Depending on the labeled data, such an estimate can be arbitrarily inaccurate.

labeled data is mentioned in 10 sentences in this paper.

14. Ambiguity-aware Ensemble Training for Semi-supervised Dependency Parsing

Li, Zhenghua and Zhang, Min and Chen, Wenliang

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	With a conditional random field based probabilistic dependency parser, our training objective is to maximize mixed likelihood of labeled data and auto-parsed unlabeled data with ambiguous labelings.
Ambiguity-aware Ensemble Training	not sufficiently covered in manually labeled data.
Ambiguity-aware Ensemble Training	Since D' contains much more instances than D (1.7M vs. 40K for English, and 4M vs. 16K for Chinese), it is likely that the unlabeled data may overwhelm the labeled data during SGD training.
Ambiguity-aware Ensemble Training	Input: Labeled data D = {(x_i, d_i)}, and unlabeled data D' = {(u_j, v_j)}; Parameters: I, N1, M1, b. Output: w. Initialization: w^(0) = 0, k = 0. For i = 1 to I do {iterations}: Randomly select N1 instances from D and M1 instances from D' to compose a new dataset D_i, and shuffle it.
Conclusions	The training objective is to maximize the mixed likelihood of both the labeled data and the auto-parsed unlabeled data with ambiguous labelings.
Introduction	Such sentences can provide more discriminative instances for training which may be unavailable in labeled data.
Introduction	Evaluation on labeled data shows the oracle accuracy of parse forest is much higher than that of 1-best outputs of single parsers (see Table 3).
Introduction	Finally, using a conditional random field (CRF) based probabilistic parser, we train a better model by maximizing mixed likelihood of labeled data and auto-parsed unlabeled data with ambiguous labelings.
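
The per-iteration sampling step quoted above for this paper (draw N1 labeled and M1 unlabeled instances, then shuffle) can be sketched as below. This is only an illustration of the sampling idea under stated assumptions: the function name `mixed_batches`, its parameters, and the tagged toy instances are hypothetical, and the parser update that would consume each batch is omitted.

```python
import random


def mixed_batches(labeled, auto_parsed, n_labeled, n_unlabeled, iterations, seed=0):
    """Yield one shuffled dataset per iteration with a fixed labeled/unlabeled mix.

    Capping the unlabeled share in every iteration keeps a much larger
    auto-parsed corpus from overwhelming the small labeled set.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    for _ in range(iterations):
        batch = rng.sample(labeled, min(n_labeled, len(labeled)))
        batch += rng.sample(auto_parsed, min(n_unlabeled, len(auto_parsed)))
        rng.shuffle(batch)
        yield batch


if __name__ == "__main__":
    gold = [("gold", i) for i in range(5)]    # stand-in for labeled data D
    auto = [("auto", i) for i in range(50)]   # stand-in for auto-parsed data D'
    for batch in mixed_batches(gold, auto, n_labeled=3, n_unlabeled=4, iterations=2):
        print(batch)
```

Each yielded batch would then be passed to whatever SGD update the model uses; the ratio between `n_labeled` and `n_unlabeled` is the knob that controls how much weight the auto-parsed data receives.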

labeled data is mentioned in 9 sentences in this paper.

15. Active Learning for Multilingual Statistical Machine Translation

Haffari, Gholamreza and Sarkar, Anoop

In Proc. ACL 2009, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

AL-SMT: Multilingual Setting	When (re-)training the models, two phrase tables are learned for each SMT model: one from the labeled data L and the other one from pseudo-labeled data U+ (which we call the main and auxiliary phrase tables respectively).
Experiments	We subsampled 5,000 sentences as the labeled data L and 20,000 sentences as U for the pool of untranslated sentences (while hiding the English part).
Introduction	However, if we start with only a small amount of initial parallel data for the new target language, then translation quality is very poor and requires a very large injection of human labeled data to be effective.
Introduction	In self-training each MT system is retrained using human labeled data plus its own noisy translation output on the unlabeled data.
Introduction	In co-training each MT system is retrained using human labeled data plus noisy translation output from the other MT systems in the ensemble.
Sentence Selection: Single Language Pair	The more frequent a phrase is in the labeled data, the more unimportant it is; since probably we have observed most of its translations.
Sentence Selection: Single Language Pair	In the labeled data L, phrases are the ones which are extracted by the SMT models; but what are the candidate phrases in the unlabeled data U?
Sentence Selection: Single Language Pair	two multinomials, one for labeled data and the other one for unlabeled data.

labeled data is mentioned in 9 sentences in this paper.

16. Infusion of Labeled Data into Distant Supervision for Relation Extraction

Pershina, Maria and Min, Bonan and Xu, Wei and Grishman, Ralph

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	However, in some cases a small amount of human labeled data is available.
Available at http://nlp.stanford.edu/software/mimlre.shtml.	Thus, our approach outperforms state-of-the-art model for relation extraction using much less labeled data that was used by Zhang et al., (2012) to outper-
Introduction	In this paper, we present the first effective approach, Guided DS (distant supervision), to incorporate labeled data into distant supervision for extracting relations from sentences.
Introduction	(2012), we generalize the labeled data through feature selection and model this additional information directly in the latent variable approaches.
Introduction	While prior work employed tens of thousands of human labeled examples (Zhang et al., 2012) and only got a 6.5% increase in F-score over a logistic regression baseline, our approach uses much less labeled data (about 1/8) but achieves much higher improvement on performance over stronger baselines.
The Challenge	Simply taking the union of the hand-labeled data and the corpus labeled by distant supervision is not effective since hand-labeled data will be swamped by a larger amount of distantly labeled data.
The Challenge	An effective approach must recognize that the hand-labeled data is more reliable than the automatically labeled data and so must take precedence in cases of conflict.
The Challenge	Instead we propose to perform feature selection to generalize human labeled data into training guidelines, and integrate them into latent variable model.
Training	Upsampling the labeled data did not improve the performance either.

labeled data is mentioned in 9 sentences in this paper.

17. Reducing Wrong Labels in Distant Supervision for Relation Extraction

Takamatsu, Shingo and Sato, Issei and Nakagawa, Hiroshi

In Proc. ACL 2012, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	This noisy labeled data causes poor extraction performance.
Experiments	We compared the following methods: logistic regression with the labeled data cleaned by the proposed method (PROP), logistic regression with the standard DS labeled data (LR), and MultiR proposed in (Hoffmann et al., 2011) as a state-of-the-art multi-instance learning system. For logistic regression, when more than one relation is assigned to a sentence, we simply copied the feature vector and created a training example for each relation.
Introduction	Supervised approaches are limited in scalability because labeled data is expensive to produce.
Introduction	A particularly attractive approach, called distant supervision (DS), creates labeled data by heuristically aligning entities in text with those in a knowledge base, such as Freebase (Mintz et al., 2009).
Introduction	However, the DS assumption can fail, which results in noisy labeled data and this causes poor extraction performance.
Knowledge-based Distant Supervision	DS uses a knowledge base to create labeled data for relation extraction by heuristically matching entity pairs.
Related Work	Previous works (Hoffmann et al., 2010; Yao et al., 2010) have pointed out that the DS assumption generates noisy labeled data, but did not directly address the problem.
Wrong Label Reduction	labeled data generated by DS: LD
Wrong Label Reduction	For relation extraction, we train a classifier for entity pairs using the resultant labeled data.
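
The distant supervision heuristic described in the sentences above (label a sentence with a relation whenever its entity pair appears in a knowledge base) can be sketched as follows. The function name `distant_label`, the tuple layout, and the toy knowledge base and sentences are assumptions made for illustration; they are not taken from the paper or from Freebase.

```python
def distant_label(sentences, knowledge_base):
    """Label sentences by matching their entity pair against a knowledge base.

    sentences: list of (text, entity1, entity2) tuples.
    knowledge_base: dict mapping (entity1, entity2) to a relation name.
    Returns a list of (text, entity1, entity2, relation) training examples.
    """
    labeled = []
    for text, e1, e2 in sentences:
        relation = knowledge_base.get((e1, e2))
        # The DS assumption: any sentence mentioning both entities expresses the relation.
        if relation is not None and e1 in text and e2 in text:
            labeled.append((text, e1, e2, relation))
    return labeled


if __name__ == "__main__":
    kb = {("Ada", "London"): "place_of_birth"}  # hypothetical toy knowledge base
    corpus = [
        ("Ada was born in London.", "Ada", "London"),
        ("Ada gave a lecture in London.", "Ada", "London"),  # gets a wrong label
        ("Ada wrote to Charles.", "Ada", "Charles"),         # pair not in the KB
    ]
    for example in distant_label(corpus, kb):
        print(example)
```

The second toy sentence receives the `place_of_birth` label even though it does not express that relation, which is the kind of noisy label the quoted sentences refer to.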

labeled data is mentioned in 9 sentences in this paper.

18. Domain Adaptation by Constraining Inter-Domain Variability of Latent Feature Representation

Titov, Ivan

In Proc. ACL 2011, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Empirical Evaluation	For every pair, the semi-supervised methods use labeled data from the source domain and unlabeled data from both domains.
Empirical Evaluation	We compare them with two supervised methods: a supervised model (Base) which is trained on the source domain data only, and another supervised model (In-domain) which is learned on the labeled data from the target domain.
Introduction	In addition to the labeled data from the source domain, they also exploit small amounts of labeled data and/or unlabeled data from the target domain to estimate a more predictive model for the target domain.
Introduction	We use generative latent variable models (LVMs) learned on all the available data: unlabeled data for both domains and on the labeled data for the source domain.
Related Work	Second, their expectation constraints are estimated from labeled data , whereas we are trying to match expectations computed on unlabeled data for two domains.
Related Work	This approach bears some similarity to the adaptation methods standard for the setting where labelled data is available for both domains (Chelba and Acero, 2004; Daumé and Marcu, 2006).
The Latent Variable Model	Among the versions which do not exploit labeled data from the target domain.
The Latent Variable Model	The parameters of this model $\theta = (v, w)$ can be estimated by maximizing joint likelihood $L(\theta)$ of labeled data for the source domain $\{x^{(l)}, y^{(l)}\}_{l \in S_L}$
The Latent Variable Model	However, given that, first, amount of unlabeled data $|S_U \cup T_U|$ normally vastly exceeds the amount of labeled data $|S_L|$ and, second, the number of features for each example $|x^{(l)}|$ is usually large, the label $y$ will have only a minor effect on the mapping from the initial features $x$ to the latent representation $z$ (i.e.

labeled data is mentioned in 9 sentences in this paper.

19. A Joint Model for Discovery of Aspects in Utterances

Celikyilmaz, Asli and Hakkani-Tur, Dilek

In Proc. ACL 2012, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	Using utterances from five domains, our approach shows up to 4.5% improvement on domain and dialog act performance over cascaded approach in which each semantic component is learned sequentially and a supervised joint learning model (which requires fully labeled data ).
Data and Approach Overview	Our algorithm assigns domain/dialog-act/slot labels to each topic at each layer in the hierarchy using labeled data (explained in §4.)
Experiments	Here, we not only want to demonstrate the performance of each component of MCM but also their performance under limited amount of labeled data .
Experiments	When the number of labeled data is small ($n_L^i \le 25\% \times n_L$), our WebPrior-MCM has a better performance on domain and act predictions compared to the two baselines.
Experiments	Adding labeled data improves the performance of all models however supervised models benefit more compared to MCM models.
Introduction	The contributions of this paper are as follows: (i) construction of a novel Bayesian framework for semantic parsing of natural language (NL) utterances in a unifying framework in §4, (ii) representation of seed labeled data and information from web queries as informative prior to design a novel utterance understanding model in §3 & §4, (iii) comparison of our results to supervised sequential and joint learning methods on NL utterances in §5.
Introduction	We conclude that our generative model achieves noticeable improvement compared to discriminative models when labeled data is scarce.

labeled data is mentioned in 8 sentences in this paper.

20. Bootstrapping coreference resolution using word associations

Kobdani, Hamidreza and Schuetze, Hinrich and Schiehlen, Michael and Kamp, Hans

In Proc. ACL 2011, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	We show that this unsupervised system has better CoRe performance than other learning approaches that do not use manually labeled data .
Introduction	Until recently, most approaches tried to solve the problem by binary classification, where the probability of a pair of markables being coreferent is estimated from labeled data .
Introduction	Self-training approaches usually include the use of some manually labeled data .
Introduction	In contrast, our self-trained system is not trained on any manually labeled data and is therefore a completely unsupervised system.
Related Work	not with approaches that make some limited use of labeled data .
Results and Discussion	Thus, this comparison of ACE-2/Ontonotes results is evidence that in a realistic scenario using association information in an unsupervised self-trained system is almost as good as a system trained on manually labeled data .
System Architecture	Automatically Labeled Data

labeled data is mentioned in 8 sentences in this paper.

21. Semi-Supervised Semantic Tagging of Conversational Understanding using Markov Topic Regression

Celikyilmaz, Asli and Hakkani-Tur, Dilek and Tur, Gokhan and Sarikaya, Ruhi

In Proc. ACL 2013, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	Finding concepts in natural language utterances is a challenging task, especially given the scarcity of labeled data for learning semantic ambiguity.
Experiments	First a supervised learning algorithm is used to build a CRF model based on the labeled data .
Introduction	Thus, each latent semantic class corresponds to one of the semantic tags found in labeled data .
Markov Topic Regression - MTR	We assume a fixed K topics corresponding to semantic tags of labeled data .
Markov Topic Regression - MTR	K latent topics to the K semantic tags of our labeled data .
Markov Topic Regression - MTR	labeled data , 712?, based on the log-linear model in Eq.
Semi-Supervised Semantic Labeling	(5) is the loss on the labeled data and $\ell_2$ regularization on parameters, $\Lambda^{(n)}$, from nth iteration, same as standard CRF.
Semi-Supervised Semantic Labeling	The labeled rows $m^l$ of the vocabulary matrix, $m = \{m^l, m^u\}$, contain only {0,1} values, indicating the word’s observed semantic tags in the labeled data .

labeled data is mentioned in 8 sentences in this paper.

22. Leveraging Synthetic Discourse Data via Multi-task Learning for Implicit Discourse Relation Recognition

Lan, Man and Xu, Yu and Niu, Zhengyu

In Proc. ACL 2013, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	To overcome the shortage of labeled data for implicit discourse relation recognition, previous works attempted to automatically generate training data by removing explicit discourse connectives from sentences and then built models on these synthetic implicit examples.
Implementation Details of Multitask Learning Method	connectives and relations in PDTB and generate synthetic labeled data by removing the connectives.
Implementation Details of Multitask Learning Method	BLLIP North American News Text (Complete) is used as unlabeled data source to generate synthetic labeled data .
Implementation Details of Multitask Learning Method	In comparison with the synthetic labeled data generated from the explicit relations in PDTB, the synthetic labeled data from BLLIP contains more noise.
Multitask Learning for Discourse Relation Prediction	Following these two principles, we create the auxiliary tasks by generating automatically labeled data as follows.
Multitask Learning for Discourse Relation Prediction	Previous work (Marcu and Echihabi, 2002) and (Sporleder and Lascarides, 2008) adopted predefined pattern-based approach to generate synthetic labeled data , where each predefined pattern has one discourse relation label.
Multitask Learning for Discourse Relation Prediction	In contrast, we adopt an automatic approach to generate synthetic labeled data , where each discourse connective between two texts serves as their relation label.

labeled data is mentioned in 8 sentences in this paper.

23. Medical Relation Extraction with Manifold Models

Wang, Chang and Fan, James

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Background	Recently, “distant supervision” has emerged to be a popular choice for training relation extractors without using manually labeled data (Mintz et al., 2009; Jiang, 2009; Chan and Roth, 2010; Wang et al., 2011; Riedel et al., 2010; Ji et al., 2011; Hoffmann et al., 2011; Surdeanu et al., 2012; Takamatsu et al., 2012; Min et al., 2013).
Experiments	(1) Manifold Unlabeled: We combined the labeled data and unlabeled set 1 in training.
Experiments	(2) Manifold Predicted Labels: We combined labeled data and unlabeled set 2 in training.
Experiments	beled data and the data from unlabeled set 2 was used as labeled data (With Weights).
Identifying Key Medical Relations	Our current strategy is to integrate all associated types, and rely on the relation detector trained with the labeled data to decide how to weight different types based upon the context.
Introduction	When we build a naive model to detect relations, the model tends to overfit for the labeled data .
Relation Extraction with Manifold Models	Integration of the unlabeled data can help solve overfitting problems when the labeled data is not sufficient.

labeled data is mentioned in 8 sentences in this paper.

24. Graph-based Semi-Supervised Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging

Zeng, Xiaodong and Wong, Derek F. and Chao, Lidia S. and Trancoso, Isabel

In Proc. ACL 2013, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Introduction	In the past years, several proposed supervised joint models (Ng and Low, 2004; Zhang and Clark, 2008; Jiang et al., 2009; Zhang and Clark, 2010) achieved reasonably accurate results, but the outstanding problem among these models is that they rely heavily on a large amount of labeled data , i.e., segmented texts with POS tags.
Introduction	However, the production of such labeled data is extremely time-consuming and expensive (Jiao et al., 2006; Jiang et al., 2009).
Introduction	Motivated by the works in (Subramanya et al., 2010; Das and Smith, 2011), for structured problems, graph-based label propagation can be employed to infer valuable syntactic information (n-gram-level label distributions) from labeled data to unlabeled data.
Method	It is especially helpful for the graph to make connections with trigrams that may not have been seen in labeled data but have similar label information.
Method	The first term in Equation (5) is the same as Equation (2), which is the traditional CRFs learning objective function on the labeled data .
Method	To satisfy the characteristic of the semi-supervised learning problem, the train set, i.e., the labeled data , is formed by a relatively small amount of annotated texts sampled from CTB-7.

labeled data is mentioned in 7 sentences in this paper.

25. Semi-Supervised Convex Training for Dependency Parsing

Wang, Qin Iris and Schuurmans, Dale and Lin, Dekang

In Proc. ACL 2008, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Conclusion and Future Work	One obvious direction is to use the whole Penn Treebank as labeled data and use some other unannotated data source as unlabeled data for semi-supervised training.
Efficient Optimization Strategy	Step 2, based on the learned parameter weights from the labeled data , update $\theta$ and $Y_j$ on each unlabeled sentence alternatively:
Introduction	However, a key drawback of supervised training algorithms is their dependence on labeled data , which is usually very difficult to obtain.
Introduction	This loss function has the advantage that the entire training objective on both the labeled and unlabeled data now becomes convex, since it consists of a convex structured large margin loss on labeled data and a convex least squares loss on unlabeled data.
Introduction	In particular, we investigate a semi-supervised approach for structured large margin training, where the objective is a combination of two convex functions, the structured large margin loss on labeled data and the least squares loss on unlabeled data.
Semi-supervised Convex Training for Structured SVM	for structured large margin training, whose objective is a combination of two convex terms: the supervised structured large margin loss on labeled data and the cheap least squares loss on unlabeled data.
Semi-supervised Convex Training for Structured SVM	By combining the convex structured SVM loss on labeled data (shown in Equation (5)) and the convex least squares loss on unlabeled data (shown in Equation (8)), we obtain a semi-supervised structured large margin loss

labeled data is mentioned in 7 sentences in this paper.

26. Spice it up? Mining Refinements to Online Instructions from User Generated Content

Druck, Gregory and Pang, Bo

In Proc. ACL 2012, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	Labeled data is not readily available for these tasks, so we focus on the unsupervised setting.
Experiments	Note that we do not consider this performance to be the upper-bound of supervised approaches; clearly, supervised approaches could benefit from additional labeled data .
Experiments	However, labeled data is relatively expensive to obtain for this task.
Introduction	There is no existing labeled data for the tasks of interest, and we would like the methods we develop to be easily applied in multiple domains.
Introduction	Motivated by this, we propose a generative model for solving these tasks jointly without labeled data .
Models	To identify refinements without labeled data , we propose a generative model of reviews (or more generally documents) with latent variables.

labeled data is mentioned in 6 sentences in this paper.

27. Context-aware Learning for Sentence-level Sentiment Analysis with Posterior Regularization

Yang, Bishan and Cardie, Claire

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	The context-aware constraints provide additional power to the CRF model and can guide semi-supervised learning when labeled data is limited.
Approach	PR makes the assumption that the labeled data we have is not enough for learning good model parameters, but we have a set of constraints on the posterior distribution of the labels.
Experiments	For the MD dataset, we also used the dvd domain as additional labeled data for developing the constraints.
Experiments	We found that the PR model is able to correct many CRF errors caused by the lack of labeled data .
Experiments	However, with limited labeled data , the CRF learner can only associate very weak sentiment signals to these features.
Related Work	Compared to the existing work on semi-supervised learning for sentence-level sentiment classification (Täckström and McDonald, 2011a; Täckström and McDonald, 2011b; Qu et al., 2012), our work does not rely on a large amount of coarse-grained (document-level) labeled data , instead, distant supervision mainly comes from linguistically-motivated constraints.

labeled data is mentioned in 6 sentences in this paper.

28. Distributional Representations for Handling Sparsity in Supervised Sequence-Labeling

Huang, Fei and Yates, Alexander

In Proc. ACL 2009, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Experiments	In chunking, there is a clear trend toward larger increases in performance as words become rarer in the labeled data set, from a 0.02 improvement on words of frequency 2, to an improvement of 0.21 on OOV words.
Experiments	To measure the sample complexity of the supervised CRF, we use the same experimental setup as in the chunking experiment on WSJ text, but we vary the amount of labeled data available to the CRF.
Experiments	Thus smoothing is optimizing performance for the case where unlabeled data is plentiful and labeled data is scarce, as we would hope.
Related Work	Several researchers have previously studied methods for using unlabeled data for tagging and chunking, either alone or as a supplement to labeled data .
Related Work	Our technique lets the HMM find parameters that maximize cross-entropy, and then uses labeled data to learn the best mapping from the HMM categories to the POS categories.
Related Work	Our technique uses unlabeled training data from the target domain, and is thus applicable more generally, including in web processing, where the domain and vocabulary is highly variable, and it is extremely difficult to obtain labeled data that is representative of the test distribution.

labeled data is mentioned in 6 sentences in this paper.

29. Together We Can: Bilingual Bootstrapping for WSD

Khapra, Mitesh M. and Joshi, Salil and Chatterjee, Arindam and Bhattacharyya, Pushpak

In Proc. ACL 2011, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Bilingual Bootstrapping	Algorithm 1 Bilingual Bootstrapping: LD1 := Seed Labeled Data from L1; LD2 := Seed Labeled Data from L2; UD1 := Unlabeled Data from L1; UD2 := Unlabeled Data from L2
Bilingual Bootstrapping	These projected models are then applied to the untagged data of L1 and L2 and the instances which get labeled with a high confidence are added to the labeled data of the respective languages.
Bilingual Bootstrapping	Algorithm 2 Monolingual Bootstrapping: LD1 := Seed Labeled Data from L1; LD2 := Seed Labeled Data from L2; UD1 := Unlabeled Data from L1; UD2 := Unlabeled Data from L2
Experimental Setup	In each iteration only those words for which P(assigned_sense|word) > 0.6 get moved to the labeled data .
Experimental Setup	Hence, we used a fixed threshold of 0.6 so that in each iteration only those words get moved to the labeled data for which the assigned sense is clearly a majority sense (P > 0.6).

labeled data is mentioned in 5 sentences in this paper.

Topics mentioned in this paper:

Wordnet (14)
synset (13)
F-score (12)

30. Capturing Paradigmatic and Syntagmatic Lexical Relations: Towards Accurate Chinese Part-of-Speech Tagging

Sun, Weiwei and Uszkoreit, Hans

In Proc. ACL 2012, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Capturing Paradigmatic Relations via Word Clustering	Previous research shows that character-based segmentation models trained on labeled data are reasonably accurate (Sun, 2010).
Capturing Paradigmatic Relations via Word Clustering	Table 5 summarizes the accuracies of the systems when trained on smaller portions of the labeled data .
Capturing Paradigmatic Relations via Word Clustering	In other words, the word cluster features can significantly reduce the amount of labeled data required by the learning algorithm.
State-of-the-Art	In this paper, we use CTB 6.0 as the labeled data for the study.

labeled data is mentioned in 5 sentences in this paper.

31. Reducing Approximation and Estimation Errors for Chinese Lexical Processing with Heterogeneous Annotations

Sun, Weiwei and Wan, Xiaojun

In Proc. ACL 2012, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Conclusion	We employ stacking models to incorporate features derived from heterogeneous analysis and apply them to convert heterogeneous labeled data for retraining.
Data-driven Annotation Conversion	It is possible to acquire high quality labeled data for a specific annotation standard by exploring existing heterogeneous corpora, since the annotations are normally highly compatible.
Data-driven Annotation Conversion	Moreover, the exploitation of additional (pseudo) labeled data aims to reduce the estimation error and enhances a NLP system in a different way from stacking.
Data-driven Annotation Conversion	_ CTa / _ gppd—wtb qu1re a new labeled data set D Ctb — Dppdflctb

labeled data is mentioned in 5 sentences in this paper.

32. Semi-Supervised SimHash for Efficient Document Similarity Search

Jiang, Qixia and Sun, Maosong

In Proc. ACL 2011, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Introduction	This is implemented by maximizing the empirical accuracy on the prior knowledge ( labeled data ) and the entropy of hash functions (estimated over labeled and unlabeled data).
Semi-Supervised SimHash	Let $X_L = \{(x_1, c_1) \ldots (x_u, c_u)\}$ be the labeled data , $c \in \{1 \ldots C\}$, $x \in \mathbb{R}^M$, and $X_U = \{x_{u+1} \ldots x_N\}$ the unlabeled data.
Semi-Supervised SimHash	Given the labeled data $X_L$, we construct two sets, attraction set $\Theta_a$ and repulsion set $\Theta_r$.
Semi-Supervised SimHash	Furthermore, we also hope to maximize the empirical accuracy on the labeled data $\Theta_a$ and $\Theta_r$ and
The direction is determined by concatenating w L times.	This is implemented by maximizing the empirical accuracy on labeled data together with the entropy of hash functions.

labeled data is mentioned in 5 sentences in this paper.

33. Semi-supervised Learning for Automatic Prosodic Event Detection Using Co-training Algorithm

Jeon, Je Hun and Liu, Yang

In Proc. ACL 2009, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	In our experiments on the Boston University radio news corpus, using only a small amount of the labeled data as the initial training set, our proposed labeling method combined with most confidence sample selection can effectively use unlabeled data to improve performance and finally reach performance closer to that of the supervised method using all the training data.
Co-training strategy for prosodic event detection	Given a set L of labeled data and a set U of unlabeled data, the algorithm first creates a smaller pool U’ containing u unlabeled data.
Conclusions	In our experiment, we used some labeled data as development set to estimate some parameters.
Experiments and results	Among labeled data , 102 utterances of all f1a and m1b speakers are used for testing, 20 utterances randomly chosen from f2b, f3b, m2b, m3b, and m4b are used as development set to optimize parameters such as $\lambda$ and confidence level threshold, 5 utterances are used as the initial training set L, and the rest of the data is used as unlabeled set U, which has 1027 unlabeled utterances (we removed the human labels for co-training experiments).
Experiments and results	We can see that the performance of co-training for these three tasks is slightly worse than supervised learning using all the labeled data, but is significantly better than the original performance using 3% of hand labeled data .

labeled data is mentioned in 5 sentences in this paper.

34. Query Weighting for Ranking Model Adaptation

Cai, Peng and Gao, Wei and Zhou, Aoying and Wong, Kam-Fai

In Proc. ACL 2011, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Evaluation	All ranking models above were trained only on source domain training data and the labeled data of target domain was just used for testing.
Instance Weighting Scheme Review	(Jiang and Zhai, 2007) used a small number of labeled data from target domain to weight source instances.
Introduction	To alleviate the lack of training data in the target domain, many researchers have proposed to transfer ranking knowledge from the source domain with plenty of labeled data to the target domain where only a few or no labeled data is available, which is known as ranking model adaptation (Chen et al., 2008a; Chen et al., 2010; Chen et al., 2008b; Geng et al., 2009; Gao et al., 2009).
Related Work	In (Geng et al., 2009; Chen et al., 2008b), the parameters of ranking model trained on the source domain was adjusted with the small set of labeled data in the target domain.
Related Work	al., 2008a) weighted source instances by using small amount of labeled data in the target domain.

labeled data is mentioned in 5 sentences in this paper.

35. Phrase Clustering for Discriminative Learning

Lin, Dekang and Wu, Xiaoyun

In Proc. ACL 2009, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Discussion and Related Work	Ando and Zhang (2005) defined an objective function that combines the original problem on the labeled data with a set of auxiliary problems on unlabeled data.
Discussion and Related Work	One is to leverage a large amount of unsupervised data to train an adequate classifier with a small amount of labeled data .
Introduction	While the labeled data is generally very costly to obtain, there is a vast amount of unlabeled textual data freely available on the web.
Introduction	Under this approach, even if a word is not found in the training data, it may still fire cluster-based features as long as it shares cluster assignments with some words in the labeled data .
Introduction	Since the clusters are obtained without any labeled data , they may not correspond directly to concepts that are useful for decision making in the problem domain.

labeled data is mentioned in 5 sentences in this paper.

36. Active Learning with Efficient Feature Weighting Methods for Improving Data Quality and Classification Accuracy

Martineau, Justin and Chen, Lu and Cheng, Doreen and Sheth, Amit

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Experiments	Due to these reasons, there is a lack of sufficient and high quality labeled data for emotion research.
Experiments	Since in real world applications people are primarily concerned with how well the algorithm will work for new TV shows or movies that may not be included in the training data, we defined a test fold for each TV show or movie in our labeled data set.
Experiments	Each test fold corresponded to a training fold containing all the labeled data from all the other TV shows and movies.
Introduction	An active learner uses a small set of labeled data to iteratively select the most informative instances from a large pool of unlabeled data for human annotators to label (Settles, 2010).
Related Work	In Active Learning (Settles, 2010) a small set of labeled data is used to find documents that should be annotated from a large pool of unlabeled documents.

labeled data is mentioned in 5 sentences in this paper.

37. Exploiting Heterogeneous Treebanks for Parsing

Niu, Zheng-Yu and Wang, Haifeng and Wu, Hua

In Proc. ACL 2009, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Experiments of Parsing	Recent studies on parsing indicate that the use of unlabeled data by self-training can help parsing on the WSJ data, even when labeled data is relatively large (McClosky et al., 2006a; Reichart and Rappoport, 2007).
Experiments of Parsing	Table 7 shows the performance of self-trained generative parser and updated reranker on the test set, with CTB and CDTfs as labeled data .
Experiments of Parsing	All the works in Table 8 used CTB articles 1-270 as labeled data .
Introduction	It is important to acquire additional labeled data for the target grammar parsing through exploitation of existing source treebanks since there is often a shortage of labeled data .
Introduction	When coupled with self-training technique, a reranking parser with CTB and converted CDT as labeled data achieves 85.2% f-score on CTB test set, an absolute 1.0% improvement (6% error reduction) over the previous best result for Chinese parsing.

labeled data is mentioned in 5 sentences in this paper.

38. Which Noun Phrases Denote Which Concepts?

Krishnamurthy, Jayant and Mitchell, Tom

In Proc. ACL 2011, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	ConceptResolver performs both word sense induction and synonym resolution on relations extracted from text using an ontology and a small amount of labeled data .
ConceptResolver	$\lambda$ is re-selected on each iteration because the initial labeled data set is extremely small, so the initial validation set is not necessarily representative of the actual data.
ConceptResolver	1. Initialize labeled data L with 10 positive and 50 negative examples (pairs of senses)
Prior Work	These approaches use large amounts of labeled data , which can be difficult to create.

labeled data is mentioned in 4 sentences in this paper.

39. Tagging The Web: Building A Robust Web Tagger with Neural Network

Ma, Ji and Zhang, Yue and Zhu, Jingbo

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Experiments	The data set consists of labelled data for both the source (Wall Street Journal portion of the Penn Treebank) and target (web) domains.
Experiments	Participants are not allowed to use web-domain labelled data for training.
Experiments	In addition to labelled data , a large amount of unlabelled data on the web domain is also provided.
Introduction	The problem we face here can be considered as a special case of domain adaptation, where we have access to labelled data on the source domain (PTB) and unlabelled data on the target domain (web data).

labeled data is mentioned in 4 sentences in this paper.

40. Multi-Task Transfer Learning for Weakly-Supervised Relation Extraction

Jiang, Jing

In Proc. ACL 2009, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

A multitask transfer learning solution	We now present a multitask transfer learning solution to the weakly-supervised relation extraction problem, which makes use of the labeled data from the auxiliary relation types.
A multitask transfer learning solution	It is general for any transfer learning problem with auxiliary labeled data from similar tasks.
Introduction	However, supervised learning heavily relies on a sufficient amount of labeled data for training, which is not always available in practice due to the labor-intensive nature of human annotation.
Introduction	Inspired by recent work on transfer learning and domain adaptation, in this paper, we study how we can leverage labeled data of some old relation types to help the extraction of a new relation type in a weakly-supervised setting, where only a few seed instances of the new relation type are available.

labeled data is mentioned in 4 sentences in this paper.

41. Co-regularizing character-based and word-based models for semi-supervised Chinese word segmentation

Zeng, Xiaodong and Wong, Derek F. and Chao, Lidia S. and Trancoso, Isabel

In Proc. ACL 2013, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	The proposed approach trains a character-based and word-based model on labeled data , respectively, as the initial models.
Introduction	The proposed approach begins by training a character-based and word-based model on labeled data respectively, and then both models are regularized from each view by their segmentation agreements, i.e., the identical outputs, of unlabeled data.
Semi-supervised Learning via Co-regularizing Both Models	This study proposes a co-regularized CWS model based on character-based and word-based models, built on a small amount of segmented sentences ( labeled data ) and a large amount of raw sentences (unlabeled data).
Semi-supervised Learning via Co-regularizing Both Models	The model induction process is described in Algorithm 1: given labeled dataset $D_l$ and unlabeled dataset $D_u$, the first two steps are training a CRFs (character-based) and Perceptrons (word-based) model on the labeled data $D_l$ , respectively.

labeled data is mentioned in 4 sentences in this paper.

42. Semi-supervised Learning of Dependency Parsers using Generalized Expectation Criteria

Druck, Gregory and Mann, Gideon and McCallum, Andrew

In Proc. ACL 2009, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Generalized Expectation Criteria	In general, the objective function could also include the likelihood of available labeled data, but throughout this paper we assume we have no parsed sentences.
Generalized Expectation Criteria	If there are constraint functions G for all model feature functions F3, and the target expectations G are estimated from labeled data, then the globally optimal parameter setting under the GE objective function is equivalent to the maximum likelihood solution.
Linguistic Prior Knowledge	For some experiments that follow we use “oracle” constraints that are estimated from labeled data .
Related Work	(2006) both use modified forms of self-training to bootstrap parsers from limited labeled data .

labeled data is mentioned in 4 sentences in this paper.

Topics mentioned in this paper:

43. Cross-Language Text Classification Using Structural Correspondence Learning

Prettenhofer, Peter and Stein, Benno

In Proc. ACL 2010, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Cross-Language Structural Correspondence Learning	The confidence, however, can only be determined for $w_S$ since the setting gives us access to labeled data from $S$ only.
Related Work	In the basic domain adaptation setting we are given labeled data from the source domain and unlabeled data from the target domain, and the goal is to train a classifier for the target domain.
Related Work	Beyond this setting one can further distinguish whether a small amount of labeled data from the target domain is available (Daume, 2007; Finkel and Manning, 2009) or not (Blitzer et al., 2006; Jiang and Zhai, 2007).
Related Work	(2007) apply structural learning to image classification in settings where little labeled data is given.

labeled data is mentioned in 4 sentences in this paper.

Topics mentioned in this paper:

44. Template-Based Information Extraction without the Templates

Chambers, Nathanael and Jurafsky, Dan

In Proc. ACL 2011, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	Standard algorithms for template-based information extraction (IE) require predefined template schemas, and often labeled data , to learn to extract their slot fillers (e.g., an embassy is the Target of a Bombing template).
Previous Work	Weakly supervised approaches remove some of the need for fully labeled data .
Previous Work	Shinyama and Sekine (2006) describe an approach to template learning without labeled data .
Standard Evaluation	Our precision is as good as (and our F1 score near) two algorithms that require knowledge of the templates and/or labeled data.

labeled data is mentioned in 4 sentences in this paper.

Topics mentioned in this paper:

45. Bilingual Active Learning for Relation Classification via Pseudo Parallel Corpora

Qian, Longhua and Hui, Haotian and Hu, Ya'nan and Zhou, Guodong and Zhu, Qiaoming

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	Usually the extraction performance depends heavily on the quality and quantity of the labeled data , however, the manual annotation of a large-scale corpus is labor-intensive and time-consuming.
Abstract	During iterations a batch of unlabeled instances are chosen in terms of their informativeness to the current classifier, labeled by an oracle and in turn added into the labeled data to retrain the classifier.
Abstract	Input: - L, labeled data set - U, unlabeled data set - n, batch size Output: - SVM, classifier Repeat: 1.

labeled data is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

46. A Joint Model of Text and Aspect Ratings for Sentiment Summarization

Titov, Ivan and McDonald, Ryan

In Proc. ACL 2008, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	Our model achieves high accuracy, without any explicitly labeled data except the user provided opinion ratings.
Introduction	When labeled data exists, this problem can be solved effectively using a wide variety of methods available for text classification and information extraction (Manning and Schutze, 1999).
Introduction	However, labeled data is often hard to come by, especially when one considers all possible domains of products and services.

labeled data is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

47. Toward Better Chinese Word Segmentation for SMT via Bilingual Constraints

Zeng, Xiaodong and Chao, Lidia S. and Wong, Derek F. and Trancoso, Isabel and Tian, Liang

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Experiments	Self-training Segmenters (STS): two variant models were defined by the approach reported in (Subramanya et al., 2010) that uses the supervised CRFs model’s decodings, incorporating empirical and constraint information, for unlabeled examples as additional labeled data to retrain a CRFs model.
Introduction	They leverage such mappings to either constitute a Chinese word dictionary for maximum-matching segmentation (Xu et al., 2004), or form labeled data for training a sequence labeling model (Paul et al., 2011).
Methodology	Our learning problem belongs to semi-supervised learning (SSL), as the training is done on treebank labeled data $(X_L, Y_L) = \{(X_1, y_1), \ldots, (X_l, y_l)\}$ and bilingual unlabeled data $(X_U) = \{X_1, \ldots, X_u\}$, where $X_i = \{x_1, \ldots, x_m\}$ is an input word sequence and $y_i = \{y_1, \ldots, y_m\}$, $y \in T$, is its corresponding label sequence.

labeled data is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

CRFs (30)
segmentations (18)
treebank (16)

48. Improving Chinese Word Segmentation on Micro-blog Using Rich Punctuations

Zhang, Longkai and Li, Li and He, Zhengyan and Wang, Houfeng and Sun, Ni

In Proc. ACL 2013, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Experiment	We use the benchmark datasets provided by the second International Chinese Word Segmentation Bakeoff2 as the labeled data .
Our method	We randomly reuse some characters labeling ’N’ from labeled data until ratio η is reached.
Our method	In summary our algorithm tackles the problem by duplicating labeled data in source domain.

labeled data is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

49. Part-of-speech tagging with antagonistic adversaries

Søgaard, Anders

In Proc. ACL 2013, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Experiments	Letting an antagonistic adversary corrupt our labeled data - somewhat surprisingly, maybe - leads to better cross-domain performance.
Introduction	The problem with out-of-vocabulary effects can be illustrated using a small labeled data set: {X1 = <1, <0,1,0>>, X2 = <1, <0,1,1>>, X3 = <0, <0,0,0>>, X4 = <1, <0,0, ... Say we train our model on X1-X3 and evaluate it on the fourth data point.
Robust perceptron learning	Globerson and Roweis (2006) let an adversary corrupt labeled data during training to learn better models of test data with missing features.

labeled data is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

50. Sentiment Relevance

Scheible, Christian and Schütze, Hinrich

In Proc. ACL 2013, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Distant Supervision	To summarize, the results of our experiments using distant supervision show that a sentiment relevance classifier can be trained successfully by labeling data with a few simple feature rules, with
Methods	Supervised optimization is impossible as we do not have any labeled data .
Related Work	In general, it is not possible to know what the underlying concepts of a statistical classification are if no detailed annotation guidelines exist and no direct evaluation of manually labeled data is performed.

labeled data is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

51. The Haves and the Have-Nots: Leveraging Unlabelled Corpora for Sentiment Analysis

Popat, Kashyap and A.R, Balamurali and Bhattacharyya, Pushpak and Haffari, Gholamreza

In Proc. ACL 2013, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Clustering for Cross Lingual Sentiment Analysis	Algorithm 1 Projection based on sense Input: Polarity labeled data in source language (S) and data in target language (T) to be labeled Output: Classified documents 1: Sense mark the polarity labeled data from S 2: Project the sense marked corpora from S to T using a Multidict 3: Model the sentiment classifier using the data obtained in step-2 4: Sense mark the unlabelled data from T 5: Test the sentiment classifier on data obtained in step-4 using model obtained in step-3
Introduction	Popular approaches for Cross-Lingual Sentiment Analysis (CLSA) (Wan, 2009; Duh et al., 2011) depend on Machine Translation (MT) for converting the labeled data from one language to the other (Hiroshi et al., 2004; Banea et al., 2008; Wan, 2009).
Related Work	In situations where labeled data is not present in a language, approaches based on cross-lingual sentiment analysis are used.

labeled data is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

52. Embedding Semantic Similarity in Tree Kernels for Domain Adaptation of Relation Extraction

Plank, Barbara and Moschitti, Alessandro

In Proc. ACL 2013, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	The clear drawback of supervised methods is the need of training data: labeled data is expensive to obtain, and there is often a mismatch between the training data and the data the system will be applied to.
Introduction	However, the clear drawback of supervised methods is the need of training data, which can slow down the delivery of commercial applications in new domains: labeled data is expensive to obtain, and there is often a mismatch between the training data and the data the system will be applied to.
Related Work	These correspondences are then integrated as new features in the labeled data of the source domain.

labeled data is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

53. Leveraging Domain-Independent Information in Semantic Parsing

Goldwasser, Dan and Roth, Dan

In Proc. ACL 2013, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Experimental Settings	The dataset was collected for the purpose of constructing semantic parsers from ambiguous supervision and consists of both “noisy” and gold labeled data.
Experimental Settings	The gold labeled data consists of pairs (X, y).
Semantic Interpretation Model	Moreover, since learning this layer is a byproduct of the learning process (as it does not use any labeled data ) forcing the connection between the decisions is the mechanism that drives learning this model.

labeled data is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

54. Aspect Extraction through Semi-Supervised Modeling

Mukherjee, Arjun and Liu, Bing

In Proc. ACL 2012, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Experiments	Since ME-LDA used manually labeled training data for Max-Ent, we again randomly sampled 1000 terms from our corpus appearing at least 20 times and labeled them as aspect terms or sentiment terms, so this labeled data clearly has less noise than our automatically labeled data .
Proposed Seeded Models	Note that unlike traditional Max-Ent training, we do not need manually labeled data for training (see Section 4 for details).
Related Work	We adopt this method as well but with no use of manually labeled data in training.

labeled data is mentioned in 3 sentences in this paper.

Topics mentioned in this paper: