Abstract | We describe a sentiment classification method that is applicable when we do not have any labeled data for a target domain but have some labeled data for multiple other domains, designated as the source domains. |
Experiments | Figure 3: Effect of source domain labeled data.
Experiments | To investigate the impact of the quantity of source domain labeled data on our method, we vary the amount of data from zero to 800 reviews, with equal amounts of positive and negative labeled data.
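The balanced subsampling behind this learning-curve experiment can be sketched as follows; the pool sizes, names, and helper function here are hypothetical stand-ins, not the paper's implementation.

```python
import random

def balanced_subsample(pos, neg, n_total, seed=0):
    """Draw n_total/2 positive and n_total/2 negative labeled reviews."""
    rng = random.Random(seed)
    half = n_total // 2
    return rng.sample(pos, half) + rng.sample(neg, half)

# hypothetical pools of 400 positive and 400 negative source-domain reviews
pos = [f"pos_review_{i}" for i in range(400)]
neg = [f"neg_review_{i}" for i in range(400)]

# vary the amount of source-domain labeled data from 0 to 800 reviews;
# each subsample would feed both thesaurus construction and classifier training
curves = {n: balanced_subsample(pos, neg, n) for n in (0, 200, 400, 600, 800)}
```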
Experiments | Note that source domain labeled data is used both to create the sentiment sensitive thesaurus as well as to train the sentiment classifier. |
Introduction | Supervised learning algorithms that require labeled data have been successfully used to build sentiment classifiers for a specific domain (Pang et al., 2002). |
Introduction | positive or negative sentiment) given a small set of labeled data for the source domain, and unlabeled data for both source and target domains. |
Introduction | In particular, no labeled data is provided for the target domain. |
A Joint Model with Unlabeled Parallel Text | where v ∈ {1, 2} denotes L1 or L2; the first term on the right-hand side is the likelihood of labeled data for both D1 and D2; and the second term is the likelihood of the unlabeled parallel data U.
A Joint Model with Unlabeled Parallel Text | By further considering the weight to ascribe to the unlabeled data vs. the labeled data (and the weight for the L2-norm regularization), we get the following regularized joint log likelihood to be maximized: |
A Joint Model with Unlabeled Parallel Text | where the first term on the right-hand side is the log likelihood of the labeled data from both D1 and D2; the second is the log likelihood of the unlabeled parallel data U, multiplied by λ1 ≥ 0, a constant that controls the contribution of the unlabeled data; and λ2 ≥ 0 is a regularization constant that penalizes model complexity or large feature weights.
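The structure of this objective can be rendered numerically; the sketch below assumes the labeled and unlabeled log likelihoods have already been computed, and the function name is illustrative, not from the paper.

```python
import numpy as np

def regularized_joint_loglik(ll_labeled, ll_unlabeled, theta, lam1, lam2):
    """Regularized joint log likelihood to be maximized:
    labeled term + lam1 * unlabeled term - lam2 * ||theta||^2.
    lam1 >= 0 controls the contribution of the unlabeled parallel data;
    lam2 >= 0 penalizes model complexity (large feature weights)."""
    return ll_labeled + lam1 * ll_unlabeled - lam2 * float(np.dot(theta, theta))
```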
Abstract | We present a novel approach for joint bilingual sentiment classification at the sentence level that augments available labeled data in each language with unlabeled parallel data. |
Introduction | Given the labeled data in each language, we propose an approach that exploits an unlabeled parallel corpus with the following |
Introduction | The proposed maximum entropy-based EM approach jointly learns two monolingual sentiment classifiers by treating the sentiment labels in the unlabeled parallel text as unobserved latent variables, and maximizes the regularized joint likelihood of the language-specific labeled data together with the inferred sentiment labels of the parallel text. |
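A bare skeleton of such an EM loop is sketched below; the constant toy classifiers stand in for the maximum entropy models, and all names and data here are illustrative assumptions, not the paper's code.

```python
def e_step(clf1, clf2, parallel_pairs):
    """E-step: treat the sentiment labels of parallel sentences as latent
    variables and infer a shared posterior from both monolingual classifiers."""
    return [0.5 * (clf1(s1) + clf2(s2)) for s1, s2 in parallel_pairs]

def m_step(posteriors):
    """M-step: refit both classifiers on labeled data plus the soft-labeled
    parallel text; the toy classifiers here just predict the mean posterior."""
    mean_p = sum(posteriors) / max(len(posteriors), 1)
    return (lambda s: mean_p), (lambda s: mean_p)

clf1 = lambda s: 0.7  # toy initial monolingual classifiers
clf2 = lambda s: 0.5
pairs = [("good movie", "bon film"), ("dull plot", "intrigue terne")]
for _ in range(3):  # a few EM iterations
    posteriors = e_step(clf1, clf2, pairs)
    clf1, clf2 = m_step(posteriors)
```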
Empirical Evaluation | For every pair, the semi-supervised methods use labeled data from the source domain and unlabeled data from both domains. |
Empirical Evaluation | We compare them with two supervised methods: a supervised model (Base) which is trained on the source domain data only, and another supervised model (In-domain) which is learned on the labeled data from the target domain. |
Introduction | In addition to the labeled data from the source domain, they also exploit small amounts of labeled data and/or unlabeled data from the target domain to estimate a more predictive model for the target domain. |
Introduction | We use generative latent variable models (LVMs) learned on all the available data: unlabeled data for both domains and labeled data for the source domain.
Related Work | Second, their expectation constraints are estimated from labeled data, whereas we are trying to match expectations computed on unlabeled data for two domains.
Related Work | This approach bears some similarity to the adaptation methods standard for the setting where labelled data is available for both domains (Chelba and Acero, 2004; Daume and Marcu, 2006). |
The Latent Variable Model | Among the versions which do not exploit labeled data from the target domain.
The Latent Variable Model | The parameters of this model θ can be estimated by maximizing the joint likelihood L(θ) of the labeled data for the source domain {x^(l), y^(l)}, l ∈ S_L.
The Latent Variable Model | However, given that, first, the amount of unlabeled data |S_U ∪ T_U| normally vastly exceeds the amount of labeled data |S_L| and, second, the number of features for each example |x^(l)| is usually large, the label y will have only a minor effect on the mapping from the initial features x to the latent representation z (i.e.
Abstract | We show that this unsupervised system has better CoRe performance than other learning approaches that do not use manually labeled data.
Introduction | Until recently, most approaches tried to solve the problem by binary classification, where the probability of a pair of markables being coreferent is estimated from labeled data . |
Introduction | Self-training approaches usually include the use of some manually labeled data.
Introduction | In contrast, our self-trained system is not trained on any manually labeled data and is therefore a completely unsupervised system. |
Related Work | not with approaches that make some limited use of labeled data.
Results and Discussion | Thus, this comparison of ACE-2/Ontonotes results is evidence that in a realistic scenario using association information in an unsupervised self-trained system is almost as good as a system trained on manually labeled data.
System Architecture | Automatically Labeled Data |
Evaluation | All the ranking models above were trained only on source domain training data; the labeled data of the target domain was used only for testing.
Instance Weighting Scheme Review | (Jiang and Zhai, 2007) used a small amount of labeled data from the target domain to weight source instances.
Introduction | To alleviate the lack of training data in the target domain, many researchers have proposed transferring ranking knowledge from a source domain with plenty of labeled data to a target domain where little or no labeled data is available, which is known as ranking model adaptation (Chen et al., 2008a; Chen et al., 2010; Chen et al., 2008b; Geng et al., 2009; Gao et al., 2009).
Related Work | In (Geng et al., 2009; Chen et al., 2008b), the parameters of the ranking model trained on the source domain were adjusted with a small set of labeled data in the target domain.
Related Work | (Chen et al., 2008a) weighted source instances by using a small amount of labeled data in the target domain.
Introduction | This is implemented by maximizing the empirical accuracy on the prior knowledge (labeled data) and the entropy of hash functions (estimated over labeled and unlabeled data).
Semi-Supervised SimHash | Let X_L = {(x_1, c_1), ..., (x_u, c_u)} be the labeled data, c ∈ {1, ..., C}, x ∈ R^M, and X_U = {x_{u+1}, ..., x_N} the unlabeled data.
Semi-Supervised SimHash | Given the labeled data X_L, we construct two sets, an attraction set Θ_a and a repulsion set Θ_r.
Semi-Supervised SimHash | Furthermore, we also hope to maximize the empirical accuracy on the labeled data Θ_a and Θ_r and
The direction is determined by concatenating w L times. | This is implemented by maximizing the empirical accuracy on labeled data together with the entropy of hash functions. |
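The two terms of that objective can be illustrated roughly as below; the toy 1-bit hash, the pair-based accuracy, and all names are assumptions for exposition, not the paper's formulation.

```python
import math

def hash_bit(w, x):
    """One SimHash-style bit: the sign of the projection of x onto w."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1

def objective(w, attraction, repulsion, all_points):
    """Empirical accuracy on labeled pairs plus entropy of the hash bit."""
    # accuracy: attraction pairs should share a bit, repulsion pairs differ
    acc = sum(hash_bit(w, a) == hash_bit(w, b) for a, b in attraction)
    acc += sum(hash_bit(w, a) != hash_bit(w, b) for a, b in repulsion)
    # entropy: a balanced bit (half +1, half -1 over the data) is maximal
    p = sum(hash_bit(w, x) == 1 for x in all_points) / len(all_points)
    ent = 0.0 if p in (0.0, 1.0) else -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    return acc, ent
```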
Bilingual Bootstrapping | Algorithm 1 Bilingual Bootstrapping: LD1 := Seed Labeled Data from L1; LD2 := Seed Labeled Data from L2; UD1 := Unlabeled Data from L1; UD2 := Unlabeled Data from L2
Bilingual Bootstrapping | These projected models are then applied to the untagged data of L1 and L2 and the instances which get labeled with a high confidence are added to the labeled data of the respective languages. |
Bilingual Bootstrapping | Algorithm 2 Monolingual Bootstrapping: LD1 := Seed Labeled Data from L1; LD2 := Seed Labeled Data from L2; UD1 := Unlabeled Data from L1; UD2 := Unlabeled Data from L2
Experimental Setup | In each iteration only those words for which P(assigned_sense|word) > 0.6 get moved to the labeled data.
Experimental Setup | Hence, we used a fixed threshold of 0.6 so that in each iteration only those words get moved to the labeled data for which the assigned sense is clearly a majority sense (P > 0.6). |
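The confidence-thresholded move into the labeled data can be sketched as a simple filter; the data layout and function name below are hypothetical, only the 0.6 threshold comes from the text.

```python
THRESHOLD = 0.6  # assigned sense must be a clear majority sense

def select_confident(predictions, threshold=THRESHOLD):
    """predictions: list of (word, sense, probability) triples.
    Returns the instances confident enough to join the labeled data."""
    return [(word, sense) for word, sense, p in predictions if p > threshold]

preds = [("bank", "finance", 0.9), ("bank", "river", 0.55), ("bass", "fish", 0.61)]
newly_labeled = select_confident(preds)  # only the clear-majority senses move
```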
Abstract | Standard algorithms for template-based information extraction (IE) require predefined template schemas, and often labeled data, to learn to extract their slot fillers (e.g., an embassy is the Target of a Bombing template).
Previous Work | Weakly supervised approaches remove some of the need for fully labeled data.
Previous Work | Shinyama and Sekine (2006) describe an approach to template learning without labeled data . |
Standard Evaluation | Our precision is as good as (and our F1 score near) two algorithms that require knowledge of the templates and/or labeled data.
Abstract | ConceptResolver performs both word sense induction and synonym resolution on relations extracted from text using an ontology and a small amount of labeled data.
ConceptResolver | λ is re-selected on each iteration because the initial labeled data set is extremely small, so the initial validation set is not necessarily representative of the actual data.
ConceptResolver | 1. Initialize labeled data L with 10 positive and 50 negative examples (pairs of senses)
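The seed-initialization step and the per-iteration re-split can be sketched as below; only the 10/50 seed counts come from the text, and the split fraction and function names are assumptions.

```python
import random

def init_labeled(pos_pairs, neg_pairs):
    """Seed the labeled set L with 10 positive and 50 negative sense pairs."""
    return [(p, 1) for p in pos_pairs[:10]] + [(n, 0) for n in neg_pairs[:50]]

def resplit(labeled, frac=0.2, seed=0):
    """Re-draw the validation set each iteration: the seed set is so small
    that a fixed validation split need not be representative."""
    rng = random.Random(seed)
    data = labeled[:]
    rng.shuffle(data)
    k = max(1, int(frac * len(data)))
    return data[k:], data[:k]  # (train, validation)
```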
Prior Work | These approaches use large amounts of labeled data, which can be difficult to create.