Abstract | In this paper, we propose a domain adaptation framework for sentiment- and topic-lexicon co-extraction in a domain of interest where we do not require any labeled data, but have plenty of labeled data in another related domain. |
Introduction | In this paper, we focus on the co-extraction task of sentiment and topic lexicons in a target domain where we do not have any labeled data, but have plenty of labeled data in a source domain. |
Introduction | … utilize useful labeled data from the source domain as well as exploit the relationships between the topic and sentiment words to propagate information for lexicon construction in the target domain.
Introduction | labeled data be available in the target domain. |
Abstract | Chinese) using labeled data in the source language (e.g. |
Abstract | Most existing work relies on machine translation engines to directly adapt labeled data from the source language to the target language. |
Abstract | Experiments on multiple data sets show that CLMM is consistently effective in two settings: (1) labeled data in the target language are unavailable; and (2) labeled data in the target language are also available. |
Introduction | Therefore, it is desirable to use the English labeled data to improve sentiment classification of documents in other languages. |
Introduction | One direct approach to leveraging the labeled data in English is to use machine translation engines as a black box to translate the labeled data from English to the target language (e.g. |
Introduction | First, the vocabulary covered by the translated labeled data is limited; hence many sentiment-indicative words cannot be learned from the translated labeled data.
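The direct machine-translation approach and its coverage limitation can be sketched as follows. This is a minimal illustration, not the pipeline from the paper: `translate` is a hypothetical stub standing in for a real MT engine, and the toy lexicon and examples are invented for demonstration.

```python
# Sketch of the black-box MT approach: translate English labeled data
# into the target language, then train on the translated examples.
# `translate` is a hypothetical stub standing in for a real MT engine.

def translate(text, target_lang="zh"):
    # Stub: a real system would call an MT engine here.
    toy_lexicon = {"good": "好", "movie": "电影", "bad": "差"}
    return " ".join(toy_lexicon.get(w, w) for w in text.split())

english_labeled = [("good movie", "pos"), ("bad movie", "neg")]
translated_labeled = [(translate(x), y) for (x, y) in english_labeled]

# The limitation noted above: the translated data only covers words that
# happened to appear in the source-side training set, so many
# target-language sentiment words are never observed during training.
vocab = {w for (x, _) in translated_labeled for w in x.split()}
```

Any sentiment word outside `vocab` cannot receive a weight from a classifier trained on `translated_labeled`, which is the vocabulary-coverage problem described above.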
Abstract | This noisy labeled data causes poor extraction performance. |
Experiments | We compared the following methods: logistic regression with the labeled data cleaned by the proposed method (PROP), logistic regression with the standard DS labeled data (LR), and MultiR, proposed in (Hoffmann et al., 2011), as a state-of-the-art multi-instance learning system. For logistic regression, when more than one relation is assigned to a sentence, we simply copied the feature vector and created a training example for each relation.
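The per-relation example expansion described above can be sketched as below. This is only an illustration of the copying step: the bag-of-words extractor and the example sentence are toy stand-ins, not the features or data used in the paper.

```python
# Sketch of the training-example expansion: when a sentence is assigned
# more than one relation, its feature vector is copied once per relation.
# The feature extractor is a toy bag-of-words stand-in.

def bag_of_words(sentence):
    return {tok: 1 for tok in sentence.lower().split()}

def expand_examples(sentence, relations):
    """Return one (features, label) training example per assigned
    relation, all sharing the same (copied) feature vector."""
    feats = bag_of_words(sentence)
    return [(dict(feats), rel) for rel in relations]

# A sentence heuristically assigned two relations yields two examples
# with identical features but different labels.
examples = expand_examples(
    "Obama was born and raised in Honolulu",
    ["born_in", "lived_in"],
)
```

Each relation thus contributes its own training example to the classifier, at the cost of the same feature vector appearing under multiple labels.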
Introduction | Supervised approaches are limited in scalability because labeled data is expensive to produce. |
Introduction | A particularly attractive approach, called distant supervision (DS), creates labeled data by heuristically aligning entities in text with those in a knowledge base, such as Freebase (Mintz et al., 2009). |
Introduction | However, the DS assumption can fail, which results in noisy labeled data and, in turn, poor extraction performance.
Knowledge-based Distant Supervision | DS uses a knowledge base to create labeled data for relation extraction by heuristically matching entity pairs. |
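The heuristic matching described above can be sketched as follows. The knowledge base, sentences, and substring-matching rule here are illustrative stand-ins, not the actual resources used in the paper.

```python
# Minimal sketch of distant supervision (DS) labeling: any sentence
# mentioning both entities of a knowledge-base pair is labeled with
# that pair's relation. KB and sentences are toy examples.

# Toy knowledge base: (entity1, entity2) -> relation
KB = {
    ("Barack Obama", "Honolulu"): "born_in",
    ("Google", "Mountain View"): "headquartered_in",
}

def ds_label(sentences):
    """Heuristically align KB entity pairs with sentences to create
    labeled data for relation extraction."""
    labeled = []
    for sent in sentences:
        for (e1, e2), rel in KB.items():
            if e1 in sent and e2 in sent:
                labeled.append((sent, e1, e2, rel))
    return labeled

sentences = [
    "Barack Obama was born in Honolulu.",
    "Barack Obama visited Honolulu last week.",  # co-occurrence, wrong label
    "Google opened a new office.",
]
data = ds_label(sentences)
# The second sentence shows how the DS assumption fails: both entities
# co-occur, so it receives the 'born_in' label even though it does not
# express that relation -- this is the noise discussed in the text.
```

The second sentence is exactly the kind of wrongly labeled example that motivates the wrong-label reduction step discussed in this paper.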
Related Work | Previous work (Hoffmann et al., 2010; Yao et al., 2010) has pointed out that the DS assumption generates noisy labeled data, but did not directly address the problem.
Wrong Label Reduction | labeled data generated by DS: LD |
Wrong Label Reduction | For relation extraction, we train a classifier for entity pairs using the resultant labeled data.
Abstract | Using utterances from five domains, our approach shows up to 4.5% improvement in domain and dialog act performance over a cascaded approach, in which each semantic component is learned sequentially, and over a supervised joint learning model (which requires fully labeled data).
Data and Approach Overview | Our algorithm assigns domain/dialog-act/slot labels to each topic at each layer in the hierarchy using labeled data (explained in §4).
Experiments | Here, we not only want to demonstrate the performance of each component of MCM but also their performance under a limited amount of labeled data.
Experiments | When the amount of labeled data is small (n_L^l ≤ 25% * n_L), our WebPrior-MCM has better performance on domain and act predictions compared to the two baselines.
Experiments | Adding labeled data improves the performance of all models; however, supervised models benefit more than MCM models.
Introduction | The contributions of this paper are as follows: (i) construction of a novel Bayesian framework for semantic parsing of natural language (NL) utterances in a unifying framework in §4, (ii) representation of seed labeled data and information from web queries as informative priors to design a novel utterance understanding model in §3 & §4, (iii) comparison of our results to supervised sequential and joint learning methods on NL utterances in §5.
Introduction | We conclude that our generative model achieves noticeable improvement compared to discriminative models when labeled data is scarce. |
Abstract | Labeled data is not readily available for these tasks, so we focus on the unsupervised setting. |
Experiments | Note that we do not consider this performance to be the upper bound of supervised approaches; clearly, supervised approaches could benefit from additional labeled data.
Experiments | However, labeled data is relatively expensive to obtain for this task. |
Introduction | There is no existing labeled data for the tasks of interest, and we would like the methods we develop to be easily applied in multiple domains. |
Introduction | Motivated by this, we propose a generative model for solving these tasks jointly without labeled data . |
Models | To identify refinements without labeled data , we propose a generative model of reviews (or more generally documents) with latent variables. |
Capturing Paradigmatic Relations via Word Clustering | Previous research shows that character-based segmentation models trained on labeled data are reasonably accurate (Sun, 2010). |
Capturing Paradigmatic Relations via Word Clustering | Table 5 summarizes the accuracies of the systems when trained on smaller portions of the labeled data.
Capturing Paradigmatic Relations via Word Clustering | In other words, the word cluster features can significantly reduce the amount of labeled data required by the learning algorithm. |
State-of-the-Art | In this paper, we use CTB 6.0 as the labeled data for the study. |
Conclusion | We employ stacking models to incorporate features derived from heterogeneous analysis and apply them to convert heterogeneous labeled data for retraining. |
Data-driven Annotation Conversion | It is possible to acquire high quality labeled data for a specific annotation standard by exploring existing heterogeneous corpora, since the annotations are normally highly compatible. |
Data-driven Annotation Conversion | Moreover, the exploitation of additional (pseudo-)labeled data aims to reduce the estimation error and enhance an NLP system in a different way from stacking.
Data-driven Annotation Conversion | … acquire a new labeled data set D^{ctb} ∪ D^{ppd→ctb}.
Experiments | Since ME-LDA used manually labeled training data for Max-Ent, we again randomly sampled 1000 terms from our corpus appearing at least 20 times and labeled them as aspect terms or sentiment terms; this labeled data clearly has less noise than our automatically labeled data.
Proposed Seeded Models | Note that unlike traditional Max-Ent training, we do not need manually labeled data for training (see Section 4 for details). |
Related Work | We adopt this method as well but with no use of manually labeled data in training. |