Abstract | In this paper, we propose a domain adaptation framework for sentiment- and topic-lexicon co-extraction in a domain of interest where we do not require any labeled data, but have plenty of labeled data in another related domain. |
Introduction | In this paper, we focus on the co-extraction task of sentiment and topic lexicons in a target domain where we do not have any labeled data, but have plenty of labeled data in a source domain. |
Introduction | … utilize useful labeled data from the source domain as well as exploit the relationships between the topic and sentiment words to propagate information for lexicon construction in the target domain.
Introduction | labeled data be available in the target domain. |
Abstract | Chinese) using labeled data in the source language (e.g. |
Abstract | Most existing work relies on machine translation engines to directly adapt labeled data from the source language to the target language. |
Abstract | Experiments on multiple data sets show that CLMM is consistently effective in two settings: (1) labeled data in the target language are unavailable; and (2) labeled data in the target language are also available. |
Introduction | Therefore, it is desirable to use the English labeled data to improve sentiment classification of documents in other languages. |
Introduction | One direct approach to leveraging the labeled data in English is to use machine translation engines as a black box to translate the labeled data from English to the target language (e.g. |
Introduction | First, the vocabulary covered by the translated labeled data is limited; hence many sentiment-indicative words cannot be learned from the translated labeled data.
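The direct machine-translation approach and its coverage limitation can be sketched as follows. This is a minimal illustration, not the pipeline from the paper: `translate` is a hypothetical stub standing in for a real MT engine, and the toy lexicon and examples are invented for demonstration.

```python
# Sketch of the black-box MT approach: translate English labeled data
# into the target language, then train on the translated examples.
# `translate` is a hypothetical stub standing in for a real MT engine.

def translate(text, target_lang="zh"):
    # Stub: a real system would call an MT engine here.
    toy_lexicon = {"good": "好", "movie": "电影", "bad": "差"}
    return " ".join(toy_lexicon.get(w, w) for w in text.split())

english_labeled = [("good movie", "pos"), ("bad movie", "neg")]
translated_labeled = [(translate(x), y) for (x, y) in english_labeled]

# The limitation noted above: the translated data only covers words that
# happened to appear in the source-side training set, so many
# target-language sentiment words are never observed during training.
vocab = {w for (x, _) in translated_labeled for w in x.split()}
```

Any sentiment word outside `vocab` cannot receive a weight from a classifier trained on `translated_labeled`, which is the vocabulary-coverage problem described above.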
Abstract | This noisy labeled data causes poor extraction performance. |
Experiments | We compared the following methods: logistic regression with the labeled data cleaned by the proposed method (PROP), logistic regression with the standard DS labeled data (LR), and MultiR, proposed in (Hoffmann et al., 2011), as a state-of-the-art multi-instance learning system. For logistic regression, when more than one relation is assigned to a sentence, we simply copied the feature vector and created a training example for each relation.
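The per-relation example expansion described above can be sketched as below. This is only an illustration of the copying step: the bag-of-words extractor and the example sentence are toy stand-ins, not the features or data used in the paper.

```python
# Sketch of the training-example expansion: when a sentence is assigned
# more than one relation, its feature vector is copied once per relation.
# The feature extractor is a toy bag-of-words stand-in.

def bag_of_words(sentence):
    return {tok: 1 for tok in sentence.lower().split()}

def expand_examples(sentence, relations):
    """Return one (features, label) training example per assigned
    relation, all sharing the same (copied) feature vector."""
    feats = bag_of_words(sentence)
    return [(dict(feats), rel) for rel in relations]

# A sentence heuristically assigned two relations yields two examples
# with identical features but different labels.
examples = expand_examples(
    "Obama was born and raised in Honolulu",
    ["born_in", "lived_in"],
)
```

Each relation thus contributes its own training example to the classifier, at the cost of the same feature vector appearing under multiple labels.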
Introduction | Supervised approaches are limited in scalability because labeled data is expensive to produce. |
Introduction | A particularly attractive approach, called distant supervision (DS), creates labeled data by heuristically aligning entities in text with those in a knowledge base, such as Freebase (Mintz et al., 2009). |
Introduction | However, the DS assumption can fail, which results in noisy labeled data and, in turn, poor extraction performance.
Knowledge-based Distant Supervision | DS uses a knowledge base to create labeled data for relation extraction by heuristically matching entity pairs. |
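The heuristic matching described above can be sketched as follows. The knowledge base, sentences, and substring-matching rule here are illustrative stand-ins, not the actual resources used in the paper.

```python
# Minimal sketch of distant supervision (DS) labeling: any sentence
# mentioning both entities of a knowledge-base pair is labeled with
# that pair's relation. KB and sentences are toy examples.

# Toy knowledge base: (entity1, entity2) -> relation
KB = {
    ("Barack Obama", "Honolulu"): "born_in",
    ("Google", "Mountain View"): "headquartered_in",
}

def ds_label(sentences):
    """Heuristically align KB entity pairs with sentences to create
    labeled data for relation extraction."""
    labeled = []
    for sent in sentences:
        for (e1, e2), rel in KB.items():
            if e1 in sent and e2 in sent:
                labeled.append((sent, e1, e2, rel))
    return labeled

sentences = [
    "Barack Obama was born in Honolulu.",
    "Barack Obama visited Honolulu last week.",  # co-occurrence, wrong label
    "Google opened a new office.",
]
data = ds_label(sentences)
# The second sentence shows how the DS assumption fails: both entities
# co-occur, so it receives the 'born_in' label even though it does not
# express that relation -- this is the noise discussed in the text.
```

The second sentence is exactly the kind of wrongly labeled example that motivates the wrong-label reduction step discussed in this paper.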
Related Work | Previous work (Hoffmann et al., 2010; Yao et al., 2010) has pointed out that the DS assumption generates noisy labeled data, but did not directly address the problem.
Wrong Label Reduction | labeled data generated by DS: LD |
Wrong Label Reduction | For relation extraction, we train a classifier for entity pairs using the resultant labeled data.
Abstract | Using utterances from five domains, our approach shows up to 4.5% improvement in domain and dialog act performance over a cascaded approach, in which each semantic component is learned sequentially, and over a supervised joint learning model (which requires fully labeled data).
Data and Approach Overview | Our algorithm assigns domain/dialog-act/slot labels to each topic at each layer in the hierarchy using labeled data (explained in §4).
Experiments | Here, we not only want to demonstrate the performance of each component of MCM but also their performance under a limited amount of labeled data.
Experiments | When the amount of labeled data is small (n_L^l ≤ 25% * n_L), our WebPrior-MCM has better performance on domain and act predictions compared to the two baselines.
Experiments | Adding labeled data improves the performance of all models; however, supervised models benefit more than MCM models.
Introduction | The contributions of this paper are as follows: (i) construction of a novel Bayesian framework for semantic parsing of natural language (NL) utterances in a unifying framework in §4, (ii) representation of seed labeled data and information from web queries as informative priors to design a novel utterance understanding model in §3 & §4, (iii) comparison of our results to supervised sequential and joint learning methods on NL utterances in §5.
Introduction | We conclude that our generative model achieves noticeable improvement compared to discriminative models when labeled data is scarce. |
Abstract | Labeled data is not readily available for these tasks, so we focus on the unsupervised setting. |
Experiments | Note that we do not consider this performance to be the upper bound of supervised approaches; clearly, supervised approaches could benefit from additional labeled data.
Experiments | However, labeled data is relatively expensive to obtain for this task. |
Introduction | There is no existing labeled data for the tasks of interest, and we would like the methods we develop to be easily applied in multiple domains. |
Introduction | Motivated by this, we propose a generative model for solving these tasks jointly without labeled data . |
Models | To identify refinements without labeled data , we propose a generative model of reviews (or more generally documents) with latent variables. |
Capturing Paradigmatic Relations via Word Clustering | Previous research shows that character-based segmentation models trained on labeled data are reasonably accurate (Sun, 2010). |
Capturing Paradigmatic Relations via Word Clustering | Table 5 summarizes the accuracies of the systems when trained on smaller portions of the labeled data.
Capturing Paradigmatic Relations via Word Clustering | In other words, the word cluster features can significantly reduce the amount of labeled data required by the learning algorithm. |
State-of-the-Art | In this paper, we use CTB 6.0 as the labeled data for the study. |
Conclusion | We employ stacking models to incorporate features derived from heterogeneous analysis and apply them to convert heterogeneous labeled data for retraining. |
Data-driven Annotation Conversion | It is possible to acquire high quality labeled data for a specific annotation standard by exploring existing heterogeneous corpora, since the annotations are normally highly compatible. |
Data-driven Annotation Conversion | Moreover, the exploitation of additional (pseudo-)labeled data aims to reduce the estimation error and enhance an NLP system in a different way from stacking.
Data-driven Annotation Conversion | … acquire a new labeled data set D^{ctb} ∪ D^{ppd→ctb}.
Experiments | Since ME-LDA used manually labeled training data for Max-Ent, we again randomly sampled 1000 terms from our corpus appearing at least 20 times and labeled them as aspect terms or sentiment terms; this labeled data clearly has less noise than our automatically labeled data.
Proposed Seeded Models | Note that unlike traditional Max-Ent training, we do not need manually labeled data for training (see Section 4 for details). |
Related Work | We adopt this method as well but with no use of manually labeled data in training. |