Abstract | We automatically create a sentiment sensitive thesaurus using both labeled and unlabeled data from multiple source domains to find the association between words that express similar sentiments in different domains. |
Experiments | Figure 4: Effect of source domain unlabeled data.
Experiments | The amount of unlabeled data is held constant, so that any change in classification accuracy
Experiments | Figure 5: Effect of target domain unlabeled data.
Introduction | positive or negative sentiment) given a small set of labeled data for the source domain, and unlabeled data for both source and target domains. |
Introduction | We use labeled data from multiple source domains and unlabeled data from source and target domains to represent the distribution of features. |
Introduction | Unlabeled data is cheaper to collect than labeled data and is often available in large quantities.
A Joint Model with Unlabeled Parallel Text | where y_i is the unobserved class label for the i-th instance in the unlabeled data.
A Joint Model with Unlabeled Parallel Text | By further considering the weight to ascribe to the unlabeled data vs. the labeled data (and the weight for the L2-norm regularization), we get the following regularized joint log likelihood to be maximized: |
A Joint Model with Unlabeled Parallel Text | where the first term on the right-hand side is the log likelihood of the labeled data from both D1 and D2; the second is the log likelihood of the unlabeled parallel data U, multiplied by λ1 ≥ 0, a constant that controls the contribution of the unlabeled data; and λ2 ≥ 0 is a regularization constant that penalizes model complexity or large feature weights.
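As a sketch of how these three terms combine (hypothetical function and variable names, not the authors' code; `lam1` and `lam2` stand for λ1 and λ2 above):

```python
def joint_objective(ll_labeled, ll_unlabeled, weights, lam1, lam2):
    """Regularized joint log likelihood to be maximized.

    ll_labeled   -- log likelihood of the labeled data from D1 and D2
    ll_unlabeled -- log likelihood of the unlabeled parallel data U
    lam1         -- constant (>= 0) weighting the unlabeled term
    lam2         -- L2-norm regularization constant (>= 0)
    """
    l2 = sum(w * w for w in weights)
    return ll_labeled + lam1 * ll_unlabeled - lam2 * l2

# Example: the unlabeled term is scaled by lam1, the L2 penalty by lam2.
print(joint_objective(-10.0, -4.0, [1.0, -2.0], 0.5, 0.1))  # -12.5
```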
Abstract | Experiments on multiple data sets show that the proposed approach (1) outperforms the monolingual baselines, significantly improving the accuracy for both languages by 3.44%-8.12%; (2) outperforms two standard approaches for leveraging unlabeled data; and (3) produces (albeit smaller) performance gains when employing pseudo-parallel data from machine translation engines.
Experimental Setup 4.1 Data Sets and Preprocessing | MaxEnt: This method learns a MaxEnt classifier for each language given the monolingual labeled data; the unlabeled data is not used. |
Experimental Setup 4.1 Data Sets and Preprocessing | SVM: This method learns an SVM classifier for each language given the monolingual labeled data; the unlabeled data is not used. |
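For illustration, a supervised baseline of this kind can be sketched as a tiny self-contained binary MaxEnt (logistic regression) classifier trained on made-up labeled data only; the data and hyperparameters below are invented, and the papers' real baselines presumably use standard toolkits:

```python
import math

def train_maxent(X, y, lr=0.5, epochs=200):
    """Minimal binary MaxEnt trained by stochastic gradient ascent
    on the labeled data alone; unlabeled data is never used."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-score))
            g = yi - p                      # gradient of the log likelihood
            w = [wj + lr * g * xj for wj, xj in zip(w, xi)]
            b += lr * g
    return w, b

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0

# Toy monolingual labeled data: 4 documents, 3 features, binary sentiment.
X = [[1, 0, 2], [0, 1, 0], [2, 0, 1], [0, 2, 0]]
y = [1, 0, 1, 0]
w, b = train_maxent(X, y)
print([predict(w, b, x) for x in X])  # recovers the training labels
```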
Introduction | maximum entropy and SVM classifiers) as well as two alternative methods for leveraging unlabeled data (transductive SVMs (Joachims, 1999b) and co-training (Blum and Mitchell, 1998)). |
Introduction | To our knowledge, this is the first multilingual sentiment analysis study to focus on methods for simultaneously improving sentiment classification for a pair of languages based on unlabeled data rather than resource adaptation from one language to another. |
Related Work | Another line of related work is semi-supervised learning, which combines labeled and unlabeled data to improve the performance of the task of interest (Zhu and Goldberg, 2009). |
Abstract | We consider a semi-supervised setting for domain adaptation where only unlabeled data is available for the target domain. |
Empirical Evaluation | For every pair, the semi-supervised methods use labeled data from the source domain and unlabeled data from both domains. |
Empirical Evaluation | Also, it is important to point out that the SCL method uses auxiliary tasks to induce the shared feature representation; these tasks are constructed on the basis of unlabeled data.
Introduction | In addition to the labeled data from the source domain, they also exploit small amounts of labeled data and/or unlabeled data from the target domain to estimate a more predictive model for the target domain. |
Introduction | In this paper we focus on a more challenging and arguably more realistic version of the domain-adaptation problem where only unlabeled data is available for the target domain. |
Introduction | (2006) use auxiliary tasks based on unlabeled data for both domains (called pivot features) and a dimensionality reduction technique to induce such shared representation. |
Learning and Inference | Intuitively, maximizing the likelihood of unlabeled data is closely related to minimizing the reconstruction error, that is, training the model to discover mapping parameters such that z encodes all the information necessary to accurately reproduce x^(i) from z for every training example x^(i).
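This reconstruction intuition can be sketched with a tiny tied-weight linear autoencoder; the toy data, learning rate, and function names below are invented for illustration, and this is a generic sketch, not the paper's actual model:

```python
import random

def train_autoencoder(X, k, lr=0.05, epochs=500, seed=0):
    """Tiny tied-weight linear autoencoder: z = W x, x_hat = W^T z.
    Minimizing the squared reconstruction error trains W so that the
    latent z keeps the information needed to reproduce x from z."""
    rng = random.Random(seed)
    d = len(X[0])
    W = [[rng.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(k)]
    for _ in range(epochs):
        for x in X:
            z = [sum(W[j][i] * x[i] for i in range(d)) for j in range(k)]
            x_hat = [sum(W[j][i] * z[j] for j in range(k)) for i in range(d)]
            err = [x_hat[i] - x[i] for i in range(d)]
            # gradient of 0.5 * ||x_hat - x||^2 with tied encoder/decoder weights
            grad = [[err[i] * z[j] + x[i] * sum(err[t] * W[j][t] for t in range(d))
                     for i in range(d)] for j in range(k)]
            for j in range(k):
                for i in range(d):
                    W[j][i] -= lr * grad[j][i]
    return W

X = [[1.0, 2.0], [-1.0, -2.0], [0.5, 1.0]]   # toy inputs on a 1-D subspace
W = train_autoencoder(X, k=1)
z = sum(W[0][i] * X[0][i] for i in range(2))
x_hat = [W[0][i] * z for i in range(2)]
print(x_hat)  # close to [1.0, 2.0] once reconstruction error is minimized
```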
The Latent Variable Model | and unlabeled data for the source and target domains {x^(l)}, l ∈ SU ∪ TU, where SU and TU stand for the unlabeled datasets for the source and target domains, respectively.
The Latent Variable Model | However, given that, first, the amount of unlabeled data |SU ∪ TU| normally vastly exceeds the amount of labeled data |SL| and, second, the number of features for each example |x^(l)| is usually large, the label y will have only a minor effect on the mapping from the initial features x to the latent representation z (i.e.
Introduction | (2010), which introduces a high-level rule language, called NERL, to build general and domain-specific NER systems; and 2) semi-supervised learning, which aims to use the abundant unlabeled data to compensate for the lack of annotated data.
Related Work | Semi-supervised learning exploits both labeled and unlabeled data.
Related Work | It proves useful when labeled data is scarce and hard to construct while unlabeled data is abundant and easy to access. |
Related Work | Another representative of semi-supervised learning is learning a robust representation of the input from unlabeled data . |
Introduction | This is implemented by maximizing the empirical accuracy on the prior knowledge (labeled data) and the entropy of hash functions (estimated over labeled and unlabeled data).
Semi-Supervised SimHash | Let X_L = {(x_1, c_1), ..., (x_u, c_u)} be the labeled data, c ∈ {1...C}, x ∈ R^M, and X_U = {x_{u+1}, ..., x_N} the unlabeled data.
Semi-Supervised SimHash | the labeled and unlabeled data, X_L and X_U.
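For background, plain SimHash assigns each point one bit per random hyperplane, namely the sign of its projection, so near-duplicate points agree in most bits. A generic sketch (this is vanilla SimHash on made-up vectors, not the semi-supervised variant discussed here):

```python
import random

def simhash(x, planes):
    """One bit per random hyperplane: the sign of x's projection onto it."""
    return tuple(1 if sum(p_i * x_i for p_i, x_i in zip(p, x)) >= 0 else 0
                 for p in planes)

rng = random.Random(0)
planes = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(8)]  # 8-bit hash
a = simhash([1.0, 0.0, 2.0], planes)
b = simhash([1.02, 0.0, 1.98], planes)  # near-duplicate of the first vector
print(sum(i != j for i, j in zip(a, b)))  # small Hamming distance
```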
Related Work | Co-training puts features into disjoint subsets when learning from labeled and unlabeled data and tries to leverage this split for better performance. |
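The co-training loop can be sketched as follows, using a deliberately simple 1-nearest-neighbour base learner on each view; all names and data are invented for illustration:

```python
def nearest_label(labeled, x):
    """1-nearest-neighbour on a single feature view (toy base learner)."""
    return min(labeled,
               key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], x)))[1]

def co_train(labeled1, labeled2, unlabeled, rounds=3):
    """Toy co-training: labeled1/labeled2 hold (features, label) pairs for two
    disjoint feature views; each view's learner labels unlabeled points and
    hands them to the other view's training set."""
    pool = list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        x1, x2 = pool.pop(0)
        y1 = nearest_label(labeled1, x1)   # view-1 learner labels the point
        y2 = nearest_label(labeled2, x2)   # view-2 learner labels the point
        labeled2.append((x2, y1))          # each learner teaches the other
        labeled1.append((x1, y2))
    return labeled1, labeled2

# Two-view toy data: view 1 lives near 0/1, view 2 near 10/20.
labeled1 = [([0.0], 0), ([1.0], 1)]
labeled2 = [([10.0], 0), ([20.0], 1)]
unlabeled = [([0.1], [10.5]), ([0.9], [19.0])]
co_train(labeled1, labeled2, unlabeled)
print([y for _, y in labeled1])  # the two unlabeled points get labels 0 and 1
```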
Results and Discussion | For an unsupervised approach, which only needs unlabeled data, there is little cost to creating large training sets.
Experiments | We also compare our method with semi-supervised approaches. These approaches achieve very high accuracies by incorporating large amounts of unlabeled data directly into the systems for joint learning and decoding, whereas our method only explores N-gram features to further improve supervised dependency parsing performance.
Experiments | Some previous studies also found a log-linear relationship between the amount of unlabeled data and performance (Suzuki and Isozaki, 2008; Suzuki et al., 2009; Bergsma et al., 2010; Pitler et al., 2010).
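A log-linear relationship means performance grows roughly linearly in the logarithm of the unlabeled-data size. A quick least-squares illustration with made-up numbers (not results from the cited papers):

```python
import math

# Hypothetical (size, accuracy) points that grow linearly in log10(size).
sizes = [10**6, 10**7, 10**8, 10**9]
accuracy = [88.0, 89.5, 91.0, 92.5]   # +1.5 points per 10x more unlabeled data

# Least-squares fit of accuracy = a * log10(size) + b.
xs = [math.log10(s) for s in sizes]
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(accuracy) / n
a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, accuracy)) / \
    sum((x - mean_x) ** 2 for x in xs)
b = mean_y - a * mean_x
print(a, b)  # slope: accuracy points gained per order of magnitude of data
```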
Web-Derived Selectional Preference Features | All of our selectional preference features described in this paper rely on probabilities derived from unlabeled data . |