Index of papers in Proc. ACL 2012 that mention
  • labeled data
Li, Fangtao and Pan, Sinno Jialin and Jin, Ou and Yang, Qiang and Zhu, Xiaoyan
Abstract
In this paper, we propose a domain adaptation framework for sentiment- and topic-lexicon co-extraction in a domain of interest where we do not require any labeled data, but have lots of labeled data in another related domain.
Introduction
In this paper, we focus on the co-extraction task of sentiment and topic lexicons in a target domain where we do not have any labeled data, but have plenty of labeled data in a source domain.
Introduction
utilize useful labeled data from the source domain as well as exploit the relationships between the topic and sentiment words to propagate information for lexicon construction in the target domain.
Introduction
labeled data be available in the target domain.
labeled data is mentioned in 16 sentences in this paper.
Meng, Xinfan and Wei, Furu and Liu, Xiaohua and Zhou, Ming and Xu, Ge and Wang, Houfeng
Abstract
conduct sentiment classification in the target language (e.g. Chinese) using labeled data in the source language (e.g. English).
Abstract
Most existing work relies on machine translation engines to directly adapt labeled data from the source language to the target language.
Abstract
Experiments on multiple data sets show that CLMM is consistently effective in two settings: (1) labeled data in the target language are unavailable; and (2) labeled data in the target language are also available.
Introduction
Therefore, it is desirable to use the English labeled data to improve sentiment classification of documents in other languages.
Introduction
One direct approach to leveraging the labeled data in English is to use machine translation engines as a black box to translate the labeled data from English to the target language (e.g. Chinese).
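As a hedged illustration of this black-box approach, the Python sketch below projects an English labeled corpus into the target language; the `translate` callable is a hypothetical stand-in for any MT engine, and nothing here is from the paper's implementation.

```python
from typing import Callable, List, Tuple

def project_labeled_data(
    english_data: List[Tuple[str, str]],   # (document, sentiment label) pairs
    translate: Callable[[str], str],       # black-box MT engine, e.g. English -> Chinese
) -> List[Tuple[str, str]]:
    # Translate each labeled document; the label carries over unchanged.
    return [(translate(doc), label) for doc, label in english_data]
```

A target-language classifier is then trained on the projected corpus; as the next snippet notes, its vocabulary is limited to what the MT engine produces.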
Introduction
First, the vocabulary covered by the translated labeled data is limited; hence many sentiment-indicative words cannot be learned from the translated labeled data.
labeled data is mentioned in 58 sentences in this paper.
Takamatsu, Shingo and Sato, Issei and Nakagawa, Hiroshi
Abstract
This noisy labeled data causes poor extraction performance.
Experiments
We compared the following methods: logistic regression with the labeled data cleaned by the proposed method (PROP), logistic regression with the standard DS labeled data (LR), and MultiR, proposed in (Hoffmann et al., 2011), as a state-of-the-art multi-instance learning system. For logistic regression, when more than one relation is assigned to a sentence, we simply copied the feature vector and created a training example for each relation.
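A minimal sketch of that duplication step, under assumed data structures (illustrative only, not the authors' code): each sentence aligned with several relations yields one training example per relation.

```python
from typing import Dict, List, Tuple

def expand_multi_relation(
    sentences: List[Tuple[Dict[str, float], List[str]]],  # (feature vector, assigned relations)
) -> List[Tuple[Dict[str, float], str]]:
    examples = []
    for features, relations in sentences:
        for relation in relations:
            # Copy the feature vector and emit one example per assigned relation.
            examples.append((dict(features), relation))
    return examples
```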
Introduction
Supervised approaches are limited in scalability because labeled data is expensive to produce.
Introduction
A particularly attractive approach, called distant supervision (DS), creates labeled data by heuristically aligning entities in text with those in a knowledge base, such as Freebase (Mintz et al., 2009).
Introduction
However, the DS assumption can fail, which results in noisy labeled data and causes poor extraction performance.
Knowledge-based Distant Supervision
DS uses a knowledge base to create labeled data for relation extraction by heuristically matching entity pairs.
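A minimal sketch of this heuristic matching, assuming the knowledge base is a mapping from entity pairs to relations (the data structures are illustrative assumptions, not the authors' implementation):

```python
from typing import Dict, List, Tuple

def distant_supervision(
    sentences: List[Tuple[str, str, str]],   # (sentence, entity1, entity2)
    kb: Dict[Tuple[str, str], str],          # (entity1, entity2) -> relation
) -> List[Tuple[str, str]]:
    labeled = []
    for sent, e1, e2 in sentences:
        relation = kb.get((e1, e2))
        if relation is not None:
            # Heuristic: assume the sentence expresses the KB relation.
            # This is the assumption that can fail and produce noisy labels.
            labeled.append((sent, relation))
    return labeled
```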
Related Work
Previous work (Hoffmann et al., 2010; Yao et al., 2010) has pointed out that the DS assumption generates noisy labeled data, but did not directly address the problem.
Wrong Label Reduction
labeled data generated by DS: L_D
Wrong Label Reduction
For relation extraction, we train a classifier for entity pairs using the resultant labeled data.
labeled data is mentioned in 9 sentences in this paper.
Celikyilmaz, Asli and Hakkani-Tur, Dilek
Abstract
Using utterances from five domains, our approach shows up to 4.5% improvement in domain and dialog-act performance over a cascaded approach, in which each semantic component is learned sequentially, and over a supervised joint learning model (which requires fully labeled data).
Data and Approach Overview
Our algorithm assigns domain/dialog-act/slot labels to each topic at each layer in the hierarchy using labeled data (explained in §4).
Experiments
Here, we not only want to demonstrate the performance of each component of MCM but also their performance under a limited amount of labeled data.
Experiments
When the amount of labeled data is small (n^i_L ≤ 25% × n_L), our WebPrior-MCM performs better on domain and act predictions than the two baselines.
Experiments
Adding labeled data improves the performance of all models; however, supervised models benefit more than MCM models.
Introduction
The contributions of this paper are as follows: (i) construction of a novel Bayesian framework for semantic parsing of natural language (NL) utterances in a unifying framework in §4, (ii) representation of seed labeled data and information from web queries as an informative prior to design a novel utterance understanding model in §3 & §4, (iii) comparison of our results to supervised sequential and joint learning methods on NL utterances in §5.
Introduction
We conclude that our generative model achieves noticeable improvement compared to discriminative models when labeled data is scarce.
labeled data is mentioned in 8 sentences in this paper.
Druck, Gregory and Pang, Bo
Abstract
Labeled data is not readily available for these tasks, so we focus on the unsupervised setting.
Experiments
Note that we do not consider this performance to be the upper bound of supervised approaches; clearly, supervised approaches could benefit from additional labeled data.
Experiments
However, labeled data is relatively expensive to obtain for this task.
Introduction
There is no existing labeled data for the tasks of interest, and we would like the methods we develop to be easily applied in multiple domains.
Introduction
Motivated by this, we propose a generative model for solving these tasks jointly without labeled data .
Models
To identify refinements without labeled data , we propose a generative model of reviews (or more generally documents) with latent variables.
labeled data is mentioned in 6 sentences in this paper.
Sun, Weiwei and Uszkoreit, Hans
Capturing Paradigmatic Relations via Word Clustering
Previous research shows that character-based segmentation models trained on labeled data are reasonably accurate (Sun, 2010).
Capturing Paradigmatic Relations via Word Clustering
Table 5 summarizes the accuracies of the systems when trained on smaller portions of the labeled data.
Capturing Paradigmatic Relations via Word Clustering
In other words, the word cluster features can significantly reduce the amount of labeled data required by the learning algorithm.
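For concreteness, a sketch of how a cluster feature might be appended to a segmenter's feature set (the cluster map and feature template are hypothetical, not taken from the paper):

```python
from typing import Dict, List

def add_cluster_feature(
    base_features: List[str],    # e.g. character n-gram features
    word: str,
    clusters: Dict[str, str],    # word -> cluster id, e.g. from Brown clustering
) -> List[str]:
    # Unseen words fall back to a shared UNK cluster.
    return base_features + [f"cluster={clusters.get(word, 'UNK')}"]
```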
State-of-the-Art
In this paper, we use CTB 6.0 as the labeled data for the study.
labeled data is mentioned in 5 sentences in this paper.
Sun, Weiwei and Wan, Xiaojun
Conclusion
We employ stacking models to incorporate features derived from heterogeneous analysis and apply them to convert heterogeneous labeled data for retraining.
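A rough sketch of the stacking idea under assumed interfaces (the `predict` method and feature template are illustrative assumptions): a model trained on one annotation standard tags the input, and its predictions become extra features for the model targeting the other standard.

```python
from typing import List

def stacked_features(
    tokens: List[str],
    source_model,                     # model trained on the other annotation standard
    base_features: List[List[str]],   # per-token features for the target model
) -> List[List[str]]:
    source_tags = source_model.predict(tokens)  # assumed predict() interface
    # Append the source-standard prediction as a guide feature per token.
    return [feats + [f"src_tag={tag}"]
            for feats, tag in zip(base_features, source_tags)]
```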
Data-driven Annotation Conversion
It is possible to acquire high quality labeled data for a specific annotation standard by exploring existing heterogeneous corpora, since the annotations are normally highly compatible.
Data-driven Annotation Conversion
Moreover, the exploitation of additional (pseudo) labeled data aims to reduce the estimation error and enhances an NLP system in a different way from stacking.
Data-driven Annotation Conversion
acquire a new labeled data set D_ctb^(ppd→ctb).
labeled data is mentioned in 5 sentences in this paper.
Mukherjee, Arjun and Liu, Bing
Experiments
Since ME-LDA used manually labeled training data for Max-Ent, we again randomly sampled 1000 terms from our corpus appearing at least 20 times and labeled them as aspect terms or sentiment terms; this labeled data clearly has less noise than our automatically labeled data.
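A minimal sketch of that sampling step (the corpus handling is an assumption for illustration): draw 1000 terms that occur at least 20 times.

```python
import random
from collections import Counter
from typing import List

def sample_frequent_terms(corpus_tokens: List[str], k: int = 1000, min_count: int = 20) -> List[str]:
    # Keep only terms with at least min_count occurrences, then sample k of them.
    counts = Counter(corpus_tokens)
    frequent = [t for t, c in counts.items() if c >= min_count]
    return random.sample(frequent, min(k, len(frequent)))
```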
Proposed Seeded Models
Note that unlike traditional Max-Ent training, we do not need manually labeled data for training (see Section 4 for details).
Related Work
We adopt this method as well, but without using manually labeled data in training.
labeled data is mentioned in 3 sentences in this paper.