Abstract | Instead of using only 1-best parse trees as in previous work, our core idea is to utilize parse forests (ambiguous labelings) to combine multiple 1-best parse trees generated by diverse parsers on unlabeled data.
Abstract | With a conditional random field-based probabilistic dependency parser, our training objective is to maximize the mixed likelihood of labeled data and auto-parsed unlabeled data with ambiguous labelings.
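Abstract | A plausible form of this mixed objective (a sketch reconstructed from the description above, not necessarily the authors' exact formula): labeled sentences contribute the standard CRF log-likelihood, while each auto-parsed sentence contributes the marginal probability of its ambiguous forest Y(x):

```latex
\mathcal{L}(\theta) \;=\; \sum_{(x,\,y)\,\in\,\mathcal{D}_L} \log p_\theta(y \mid x)
\;+\; \sum_{(x,\,\mathcal{Y}(x))\,\in\,\mathcal{D}_U} \log \sum_{y\,\in\,\mathcal{Y}(x)} p_\theta(y \mid x)
```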
Introduction | In contrast, semi-supervised approaches, which can make use of large-scale unlabeled data, have attracted more and more interest.
Introduction | Previously, unlabeled data was explored to derive useful local-context features such as word clusters (Koo et al., 2008), subtree frequencies (Chen et al., 2009; Chen et al., 2013), and word co-occurrence counts (Zhou et al., 2011; Bansal and Klein, 2011).
Introduction | A few effective learning methods have also been proposed for dependency parsing to implicitly utilize distributions on unlabeled data (Smith and Eisner, 2007; Wang et al., 2008; Suzuki et al., 2009).
Abstract | We consider a semi-supervised setting for domain adaptation where only unlabeled data is available for the target domain. |
Empirical Evaluation | For every pair, the semi-supervised methods use labeled data from the source domain and unlabeled data from both domains. |
Empirical Evaluation | Also, it is important to point out that the SCL method uses auxiliary tasks to induce the shared feature representation; these tasks are constructed on the basis of unlabeled data.
Introduction | In addition to the labeled data from the source domain, they also exploit small amounts of labeled data and/or unlabeled data from the target domain to estimate a more predictive model for the target domain. |
Introduction | In this paper we focus on a more challenging and arguably more realistic version of the domain-adaptation problem where only unlabeled data is available for the target domain. |
Introduction | (2006) use auxiliary tasks based on unlabeled data for both domains (called pivot features) and a dimensionality reduction technique to induce such shared representation. |
Learning and Inference | Intuitively, maximizing the likelihood of unlabeled data is closely related to minimizing the reconstruction error, that is, training a model to discover mapping parameters u such that z encodes all the necessary information to accurately reproduce x^(i) from z for every training example x^(i).
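Learning and Inference | To make the reconstruction view concrete, here is a minimal sketch (assumptions: a one-layer sigmoid encoder, a linear decoder, and squared error; this is an illustration, not the paper's actual model):

```python
import numpy as np

def reconstruction_error(X, W_enc, b_enc, W_dec, b_dec):
    """Mean squared reconstruction error of a toy autoencoder: the latent
    representation z must retain enough information to reproduce each x."""
    Z = 1.0 / (1.0 + np.exp(-(X @ W_enc + b_enc)))  # latent representation z
    X_hat = Z @ W_dec + b_dec                       # decode z back to input space
    return ((X - X_hat) ** 2).mean()
```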
The Latent Variable Model | and unlabeled data for the source and target domain {x^(l)}, l ∈ SU ∪ TU, where SU and TU stand for the unlabeled datasets for the source and target domains, respectively.
The Latent Variable Model | However, given that, first, the amount of unlabeled data |SU ∪ TU| normally vastly exceeds the amount of labeled data |SL| and, second, the number of features for each example |x^(l)| is usually large, the label y will have only a minor effect on the mapping from the initial features x to the latent representation z (i.e.
A Joint Model with Unlabeled Parallel Text | where y_i' is the unobserved class label for the i-th instance in the unlabeled data.
A Joint Model with Unlabeled Parallel Text | By further considering the weight to ascribe to the unlabeled data vs. the labeled data (and the weight for the L2-norm regularization), we get the following regularized joint log likelihood to be maximized: |
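A Joint Model with Unlabeled Parallel Text | (A plausible reconstruction of the displayed objective from the description below; the sum over the unobserved labels y' follows the earlier definition, and the exact notation is an assumption:)

```latex
\mathcal{L}(\theta) \;=\; \sum_{(x,\,y)\,\in\,D_1 \cup D_2} \log p_\theta(y \mid x)
\;+\; \lambda_1 \sum_{i\,\in\,U} \log \sum_{y'} p_\theta\big(y',\, x_i^{(1)},\, x_i^{(2)}\big)
\;-\; \lambda_2 \lVert \theta \rVert_2^2
```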
A Joint Model with Unlabeled Parallel Text | where the first term on the right-hand side is the log likelihood of the labeled data from both D1 and D2; the second is the log likelihood of the unlabeled parallel data U, multiplied by λ1 ≥ 0, a constant that controls the contribution of the unlabeled data; and λ2 ≥ 0 is a regularization constant that penalizes model complexity or large feature weights.
Abstract | Experiments on multiple data sets show that the proposed approach (1) outperforms the monolingual baselines, significantly improving the accuracy for both languages by 3.44%-8.12%; (2) outperforms two standard approaches for leveraging unlabeled data; and (3) produces (albeit smaller) performance gains when employing pseudo-parallel data from machine translation engines.
Experimental Setup 4.1 Data Sets and Preprocessing | MaxEnt: This method learns a MaxEnt classifier for each language given the monolingual labeled data; the unlabeled data is not used. |
Experimental Setup 4.1 Data Sets and Preprocessing | SVM: This method learns an SVM classifier for each language given the monolingual labeled data; the unlabeled data is not used. |
Introduction | maximum entropy and SVM classifiers) as well as two alternative methods for leveraging unlabeled data (transductive SVMs (Joachims, 1999b) and co-training (Blum and Mitchell, 1998)). |
Introduction | To our knowledge, this is the first multilingual sentiment analysis study to focus on methods for simultaneously improving sentiment classification for a pair of languages based on unlabeled data rather than resource adaptation from one language to another. |
Related Work | Another line of related work is semi-supervised learning, which combines labeled and unlabeled data to improve the performance of the task of interest (Zhu and Goldberg, 2009). |
Experiments | By integrating unlabeled data, the manifold model under setting (1) made a 15% improvement over the linear regression model on F1 score, where the improvement was significant across all relations.
Experiments | This tells us that estimating the label of unlabeled examples based upon the sampling result is one way to utilize unlabeled data and may help improve the relation extraction results. |
Experiments | On one hand, this result shows that using more unlabeled data can further improve the result. |
Identifying Key Medical Relations | Part of the resulting data was manually vetted by our annotators, and the remainder was held out as unlabeled data for further experiments.
Introduction | • From the perspective of relation extraction methodologies, we present a manifold model for relation extraction utilizing both labeled and unlabeled data.
Relation Extraction with Manifold Models | Given a few labeled examples and many unlabeled examples for a relation, we want to build a relation detector leveraging both labeled and unlabeled data.
Relation Extraction with Manifold Models | Integration of the unlabeled data can help solve overfitting problems when the labeled data is not sufficient.
Relation Extraction with Manifold Models | When μ = 0, the model disregards the unlabeled data, and the data manifold topology is not respected.
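Relation Extraction with Manifold Models | For reference, a standard manifold-regularization objective in which μ plays exactly this role (a sketch in the style of Belkin et al.'s Laplacian regularization; the notation here is an assumption, not necessarily the authors' exact formula):

```latex
\min_{f}\;\; \frac{1}{n_l}\sum_{i=1}^{n_l}\big(f(x_i)-y_i\big)^2
\;+\; \gamma\,\lVert f\rVert_K^2
\;+\; \mu\,\mathbf{f}^{\top} L\, \mathbf{f},
\qquad \mathbf{f} = \big(f(x_1),\dots,f(x_{n_l+n_u})\big)^{\top}
```

Relation Extraction with Manifold Models | Here L is the graph Laplacian built over both labeled and unlabeled examples; setting μ = 0 removes the only term through which the unlabeled data and the manifold topology enter the objective.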
Abstract | We automatically create a sentiment sensitive thesaurus using both labeled and unlabeled data from multiple source domains to find the association between words that express similar sentiments in different domains. |
Experiments | Figure 4: Effect of source domain unlabeled data.
Experiments | The amount of unlabeled data is held constant, so that any change in classification accuracy
Experiments | Figure 5: Effect of target domain unlabeled data.
Introduction | positive or negative sentiment) given a small set of labeled data for the source domain, and unlabeled data for both source and target domains. |
Introduction | We use labeled data from multiple source domains and unlabeled data from source and target domains to represent the distribution of features. |
Introduction | Unlabeled data is cheaper to collect compared to labeled data and is often available in large quantities. |
Introduction | Since the unlabeled data is ample and easy to collect, a successful semi-supervised sentiment classification system would significantly minimize the involvement of labor and time. |
Introduction | Therefore, given the two different views mentioned above, one promising application is to adopt them in co-training algorithms, which have been proven to be an effective semi-supervised learning strategy for incorporating unlabeled data to further improve the classification performance (Zhu, 2005).
Introduction | Finally, a co-training algorithm is proposed to incorporate unlabeled data for semi-supervised sentiment classification. |
Related Work | Semi-supervised methods combine unlabeled data with labeled training data (often small-scale) to improve the models.
Unsupervised Mining of Personal and Impersonal Views | Semi-supervised learning is a strategy which combines unlabeled data with labeled training data to improve the models. |
Unsupervised Mining of Personal and Impersonal Views | The co-training algorithm is a specific semi-supervised learning approach which starts with a set of labeled data and increases the amount of labeled data using the unlabeled data by bootstrapping (Blum and Mitchell, 1998). |
Unsupervised Mining of Personal and Impersonal Views | The unlabeled data U contains a personal sentence set and an impersonal sentence set
Abstract | Semi-supervised learning (SSL) methods augment standard machine learning (ML) techniques to leverage unlabeled data.
Abstract | However, existing SSL techniques typically require multiple passes over the entirety of the unlabeled data, meaning the techniques are not applicable to the large corpora being produced today.
Abstract | In this paper, we show that improving marginal word frequency estimates using unlabeled data can enable semi-supervised text classification that scales to massive unlabeled data sets. |
Introduction | Semi-supervised Learning (SSL) is a Machine Learning (ML) approach that utilizes large amounts of unlabeled data, combined with a smaller amount of labeled data, to learn a target function (Zhu, 2006; Chapelle et al., 2006).
Introduction | Typically, for each target concept to be learned, a semi-supervised classifier is trained using iterative techniques that execute multiple passes over the unlabeled data (e.g., Expectation-Maximization (Nigam et al., 2000) or Label Propagation (Zhu and Ghahramani, 2002)).
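Introduction | As an illustration of the iterative, multiple-pass nature of such techniques, here is a minimal sketch of Zhu & Ghahramani-style label propagation (a dense numpy version for clarity; the names and dense representation are assumptions):

```python
import numpy as np

def label_propagation(W, Y_init, labeled_mask, n_iter=100):
    """Label propagation sketch (after Zhu & Ghahramani, 2002).
    W: (n, n) symmetric similarity matrix.
    Y_init: (n, k) rows are one-hot for labeled nodes, zero elsewhere.
    labeled_mask: boolean (n,) marking the labeled nodes."""
    row_sums = np.maximum(W.sum(axis=1, keepdims=True), 1e-12)
    Y = Y_init.astype(float).copy()
    for _ in range(n_iter):
        Y = (W @ Y) / row_sums                  # propagate label mass to neighbors
        Y[labeled_mask] = Y_init[labeled_mask]  # clamp the labeled nodes
    return Y / np.maximum(Y.sum(axis=1, keepdims=True), 1e-12)
```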
Introduction | Instead of utilizing unlabeled examples directly for each given target concept, our approach is to pre-compute a small set of statistics over the unlabeled data in advance. |
Problem Definition | MNB-FM attempts to improve MNB’s estimates of θw+ and θw−, using statistics computed over the unlabeled data.
Problem Definition | Equation 2 can be estimated in advance, without knowledge of the target class, simply by counting the number of tokens of each word in the unlabeled data.
Problem Definition | The above example illustrates how MNB-FM can leverage frequency marginal statistics computed over unlabeled data to improve MNB’s conditional probability estimates. |
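Problem Definition | A minimal sketch of the frequency-marginal idea (this illustrates only the constraint P(w) = θw+ P(+) + θw− P(−); the actual MNB-FM optimization is more involved, and all names here are hypothetical):

```python
from collections import Counter

def frequency_marginals(unlabeled_docs):
    """One pass over the unlabeled corpus: token-frequency marginals P(w)."""
    counts, total = Counter(), 0
    for doc in unlabeled_docs:
        for tok in doc:
            counts[tok] += 1
            total += 1
    return {w: c / total for w, c in counts.items()}

def implied_negative_conditional(theta_pos, prior_pos, marginal):
    """Solve the constraint P(w) = P(w|+)P(+) + P(w|-)P(-) for P(w|-),
    clipping to keep a valid probability."""
    prior_neg = 1.0 - prior_pos
    return {w: min(max((p_w - theta_pos.get(w, 0.0) * prior_pos) / prior_neg,
                       1e-9), 1.0)
            for w, p_w in marginal.items()}
```

Problem Definition | The marginal P(w) is the small set of statistics that can be precomputed once over the unlabeled data and reused for any target class.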
Abstract | One constructs a nearest-neighbor similarity graph over all trigrams of labeled and unlabeled data for propagating syntactic information, i.e., label distributions. |
Abstract | The derived label distributions are regarded as virtual evidences to regularize the learning of linear conditional random fields (CRFs) on unlabeled data.
Introduction | Therefore, semi-supervised joint S&T appears to be a natural solution for easily incorporating accessible unlabeled data to improve the joint S&T model. |
Introduction | Motivated by the works in (Subramanya et al., 2010; Das and Smith, 2011), for structured problems, graph-based label propagation can be employed to propagate valuable syntactic information (n-gram-level label distributions) from labeled data to unlabeled data.
Introduction | labeled and unlabeled data to achieve semi-supervised learning.
Method | The emphasis of this work is on building a joint S&T model based on two different kinds of data sources, labeled and unlabeled data.
Method | In essence, this learning problem can be treated as incorporating certain gainful information, e.g., prior knowledge or label constraints, of unlabeled data into the supervised model. |
Related Work | Sun and Xu (2011) enhanced a CWS model by interpolating statistical features of unlabeled data into the CRFs model. |
Related Work | (2011) proposed a semi-supervised pipeline S&T model by incorporating n-gram and lexicon features derived from unlabeled data . |
Related Work | Different from their concern, our emphasis is to learn the semi-supervised model by injecting the label information from a similarity graph constructed from labeled and unlabeled data.
Abstract | Similarly to multi-view learning, the “segmentation agreements” between the two different types of views are used to overcome the scarcity of the label information on unlabeled data.
Abstract | The agreements are regarded as a set of valuable constraints for regularizing the learning of both models on unlabeled data.
Introduction | Sun and Xu (2011) enhanced the segmentation results by interpolating the statistics-based features derived from unlabeled data into a CRFs model.
Introduction | The crux of solving the semi-supervised learning problem is the learning on unlabeled data.
Introduction | Inspired by multi-view learning that exploits redundant views of the same input data (Ganchev et al., 2008), this paper proposes a semi-supervised CWS model that co-regularizes two different views (intrinsically two different models), character-based and word-based, on unlabeled data.
Semi-supervised Learning via Co-regularizing Both Models | As mentioned earlier, the primary challenge of semi-supervised CWS concentrates on the unlabeled data.
Semi-supervised Learning via Co-regularizing Both Models | Obviously, the learning on unlabeled data does not come for “free”. |
Semi-supervised Learning via Co-regularizing Both Models | Very often, it is necessary to discover certain gainful information, e.g., label constraints of unlabeled data, that is incorporated to guide the learner toward a desired solution.
Experiments | In addition to labelled data, a large amount of unlabelled data on the web domain is also provided. |
Experiments | Statistics about labelled and unlabelled data are summarized in Table 1 and Table 2, respectively.
Introduction | The problem we face here can be considered as a special case of domain adaptation, where we have access to labelled data on the source domain (PTB) and unlabelled data on the target domain (web data). |
Introduction | The idea of learning representations from unlabelled data and then fine-tuning a model with such representations according to some supervised criterion has been studied before (Turian et al., 2010; Collobert et al., 2011; Glorot et al., 2011). |
Related Work | (2011) propose to learn representations from the mixture of both source and target domain unlabelled data to improve cross-domain sentiment classification. |
Related Work | Such high-dimensional input gives rise to high computational cost, and it is not clear whether those approaches can be applied to large-scale unlabelled data with hundreds of millions of training examples.
Related Work | The new representations are induced based on the auxiliary tasks defined on unlabelled data together with a dimensionality reduction technique. |
Experiments | We strip the dependency annotations off the training portion of each treebank, and use that as the unlabeled data for that target language.
Experiments | -U: Our approach, trained on only parallel data, without unlabeled data for the target language.
Experiments | +U: Our approach, trained on both parallel and unlabeled data.
Introduction | We train probabilistic parsing models for resource-poor languages by maximizing a combination of likelihood on parallel data and confidence on unlabeled data . |
Our Approach | Another advantage of the learning framework is that it combines both the likelihood on parallel data and confidence on unlabeled data, so that both parallel text and unlabeled data can be utilized in our approach. |
Our Approach | However, in our scenario we have no labeled training data for target languages but we have some parallel and unlabeled data plus an English dependency parser. |
Our Approach | In our scenario, we have a set of aligned parallel data P = {(x_i^s, x_i^t, a_i)}, where a_i is the word alignment for the pair of source-target sentences (x_i^s, x_i^t), and a set of unlabeled sentences of the target language U = {x_j}. We also have a trained English parsing model p_λE. Then the K in equation (7) can be divided into two cases, according to whether x_i belongs to the parallel data set P or the unlabeled data set U.
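Our Approach | One plausible shape of the combined objective described above (a sketch: the projected-parse likelihood term and the trade-off weight η are assumptions, not the authors' exact formulation):

```latex
\lambda^{*} \;=\; \arg\max_{\lambda}\;
\underbrace{\sum_{(x_i,\,a_i)\,\in\,P} \log p_{\lambda}\big(\tilde{y}_i \mid x_i\big)}_{\text{likelihood on parallel data}}
\;+\; \eta \underbrace{\sum_{x_j\,\in\,U} \max_{y}\, \log p_{\lambda}\big(y \mid x_j\big)}_{\text{confidence on unlabeled data}}
```

Our Approach | where ỹ_i denotes the dependency tree projected from the English side via the alignment a_i.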
Abstract | We implement a semi-supervised learning (SSL) approach to demonstrate that utilization of more unlabeled data points can improve the answer-ranking task of QA. |
Abstract | We create a graph for labeled and unlabeled data using match-scores of textual entailment features as similarity weights between data points. |
Experiments | We show that as we increase the amount of unlabeled data, with our graph summarization, it is feasible to extract information that can improve the performance of QA models.
Experiments | In the first part, we randomly selected subsets of the labeled training dataset X_L' ⊂ X_L with different sample sizes, {1% × n_L, 5% × n_L, 10% × n_L, 25% × n_L, 50% × n_L, 100% × n_L}, where n_L represents the sample size of X_L. At each random selection, the rest of the labeled dataset is hypothetically used as unlabeled data to verify the performance of our SSL using different sizes of labeled data.
Experiments | Table 3: The effect of the number of unlabeled data points on MRR from Hybrid graph Summarization SSL.
Graph Summarization | Using the graph-based SSL method on the new representative dataset, X' = X̄ ∪ X_Te, which is comprised of the summarized dataset X̄ = {x̄_i} as labeled data points and the testing dataset X_Te as unlabeled data points.
Graph Summarization | Since we do not know the estimated local density constraints of unlabeled data points, we use constants to construct the local density constraint column vector for the X' dataset as follows:
Introduction | Recent research indicates that using labeled and unlabeled data in a semi-supervised learning (SSL) environment, with an emphasis on graph-based methods, can improve the performance of information extraction from data for tasks such as question classification (Tri et al., 2006), web classification (Liu et al., 2006), relation extraction (Chen et al., 2006), passage-retrieval (Otterbacher et al., 2009), various natural language processing tasks such as part-of-speech tagging and named-entity recognition (Suzuki and Isozaki, 2008), and word-sense disambiguation
Introduction | We consider situations where there is much more unlabeled data, X_U, than labeled data, X_L, i.e., n_L << n_U.
Introduction | Following the common theme of “more data is better data” we also use both a limited labeled corpora and a plentiful unlabeled data resource. |
Introduction | (2006a) successfully applied self-training to parsing by exploiting available unlabeled data , and obtained remarkable results when the same technique was applied to parser adaptation (McClosky et al., 2006b). |
Introduction | However, the standard objective of an S3VM is non-convex on the unlabeled data, thus requiring sophisticated global optimization heuristics to obtain reasonable solutions.
Semi-supervised Convex Training for Structured SVM | for structured large margin training, whose objective is a combination of two convex terms: the supervised structured large margin loss on labeled data and the cheap least squares loss on unlabeled data.
Semi-supervised Structured Large Margin Objective | The objective of standard semi-supervised structured SVM is a combination of structured large margin losses on both labeled and unlabeled data.
Semi-supervised Structured Large Margin Objective | We introduce an efficient approximation—least squares loss—for the structured large margin loss on unlabeled data below. |
Abstract | We propose a confidence-based method to assign labels to unlabeled data and demonstrate improved results using this method compared to the widely used agreement-based method. |
Abstract | In our experiments on the Boston University radio news corpus, using only a small amount of the labeled data as the initial training set, our proposed labeling method combined with most confidence sample selection can effectively use unlabeled data to improve performance and finally reach performance closer to that of the supervised method using all the training data. |
Co-training strategy for prosodic event detection | Given a set L of labeled data and a set U of unlabeled data, the algorithm first creates a smaller pool U’ containing u unlabeled examples.
Co-training strategy for prosodic event detection | There are two issues: (1) an accurate self-labeling method for unlabeled data and (2) effective heuristics to select informative samples.
Co-training strategy for prosodic event detection | Given a set L of labeled training data and a set U of unlabeled data: randomly select U’ from U with |U’| = u; while iteration < k do: use L to train classifiers h1 and h2; apply h1 and h2 to assign labels for all examples in U’; select n self-labeled samples and add them to L; remove these n samples from U; recreate U’ by choosing u instances randomly from U.
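Co-training strategy for prosodic event detection | A runnable sketch of this loop, using confidence-based self-labeling (the `train_view` and `predict_one` names are hypothetical placeholders for the two view-specific learners):

```python
import random

def co_train(L, U, train_view, k=30, u=75, n=10):
    """Co-training sketch (after Blum & Mitchell, 1998). Each example has two
    views: L holds (x1, x2, y) triples, U holds (x1, x2) pairs. `train_view`
    fits a classifier whose predict_one(x) returns (label, confidence)."""
    U = list(U)
    pool = random.sample(U, min(u, len(U)))
    for _ in range(k):
        h1 = train_view([(x1, y) for x1, _, y in L])
        h2 = train_view([(x2, y) for _, x2, y in L])
        # self-label the pool, keeping the more confident view's prediction
        scored = []
        for x1, x2 in pool:
            y1, c1 = h1.predict_one(x1)
            y2, c2 = h2.predict_one(x2)
            scored.append((max(c1, c2), (x1, x2, y1 if c1 >= c2 else y2)))
        # move the n most confidently self-labeled examples into L
        scored.sort(key=lambda s: s[0], reverse=True)
        for _, (x1, x2, y) in scored[:n]:
            L.append((x1, x2, y))
            U.remove((x1, x2))
        # recreate the pool by drawing u fresh instances from U
        pool = random.sample(U, min(u, len(U)))
    return h1, h2
```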
Conclusions | We introduced a confidence-based method to assign possible labels to unlabeled data and evaluated the performance combined with informative sample selection methods. |
Conclusions | This suggests that the use of unlabeled data can lead to significant improvement for prosodic event detection. |
Experiments and results | Our goal is to determine whether the co-training algorithm described above could successfully use the unlabeled data for prosodic event detection. |
Introduction | We propose a confidence-based method to assign labels to unlabeled data in training iterations and evaluate its performance combined with different informative sample selection methods. |
Introduction | Our experiments on the Boston Radio News corpus show that the use of unlabeled data can lead to significant improvement of prosodic event detection compared to using the original small training set, and that the semi-supervised learning result is comparable with supervised learning with a similar amount of training data.
Abstract | Our framework leverages both labeled and unlabeled data in the target domain.
Abstract | To overcome the lack of labeled samples for rarer relations, these clusterings operate on both the labeled and unlabeled data in the target domain. |
Experiments | negative transfer from irrelevant sources by relying on similarity of feature vectors between source and target domains based on labeled and unlabeled data.
Introduction | In the first phase, Supervised Voting is used to determine the relevance of each source domain to each region in the target domain, using both labeled and unlabeled data in the target domain. |
Introduction | By also using unlabeled data, we alleviate the lack of labeled samples for rarer relations due to imbalanced distributions in relation types.
Introduction | reasonable predictive performance even when all the source domains are irrelevant and augments the rarer classes with examples in the unlabeled data.
Problem Statement | labeled data D_l = {(x_i, y_i)} and plenty of unlabeled data D_u = {x_i}, where n_l and n_u are the number of labeled and unlabeled samples respectively, x_i is the feature vector, and y_i is the corresponding label (if available).
Robust Domain Adaptation | With data from both the labeled and unlabeled data sets, we apply transductive inference or semi-supervised learning (Zhou et al., 2003) to achieve both (i) and (ii). |
Robust Domain Adaptation | By augmenting with unlabeled data D_u, we aim to alleviate the effect of imbalanced relation distribution, which causes a lack of labeled samples for rarer classes in a small set of labeled data.
Robust Domain Adaptation | Here, we have multiple objectives: the first term controls the training error; the second regularizes the complexity of the functions f_j in the Reproducing Kernel Hilbert Space (RKHS) H; and the third prefers the predicted labels of the unlabeled data D_u to be close to the reference predictions.
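Robust Domain Adaptation | A plausible rendering of these three objectives (a sketch from the description above; γ1, γ2, and the reference predictions ỹ_i are assumed notation, not the paper's exact symbols):

```latex
\min_{f_j \in \mathcal{H}}\;\;
\sum_{i \in D_l} \ell\big(f_j(x_i),\, y_i\big)
\;+\; \gamma_1 \lVert f_j \rVert_{\mathcal{H}}^2
\;+\; \gamma_2 \sum_{i \in D_u} \big(f_j(x_i) - \tilde{y}_i\big)^2
```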
Abstract | This paper addresses the problem of collecting labeled training documents, especially annotating negative training documents, and presents a method of text classification from positive and unlabeled data.
Conclusion | The research described in this paper involved text classification using positive and unlabeled data.
Experiments | The rest of the positive and negative documents are used as unlabeled data.
Experiments | The number of positive training data in the other three methods depends on the value of θ, and the rest of the positive and negative documents were used as unlabeled data.
Experiments | Our goal is to achieve classification accuracy from only positive documents and unlabeled data as high as that from labeled positive and negative data. |
Framework of the System | First, we randomly select documents from the unlabeled data (U) where the number of documents is equal to that of the initial positive training documents (P1).
Framework of the System | For the result of correction (Roc1), we train SVM classifiers, and classify the remaining unlabeled data (U \ N1).
Introduction | Several authors have attempted to improve classification accuracy using only positive and unlabeled data (Yu et al., 2002; Ho et al., 2011). |
Introduction | Our goal is to eliminate the need for manually collecting training documents, and hopefully achieve classification accuracy from positive and unlabeled data as high as that from labeled positive and labeled negative data. |
Introduction | Like much previous work on semi-supervised ML, we apply SVM to the positive and unlabeled data, and add the classification results to the training data.
Experiments | As for unlabeled data, we crawled the web and collected around 100,000 questions that are similar in style and length to the ones in QuestionBank, e.g.
Experiments | A CRF model is used to decode the unlabeled data to generate more labeled examples for retraining. |
Experiments | smooth the semantic tag posteriors of the unlabeled data decoded by the CRF model using Eq.
Introduction | To the best of our knowledge, our work is the first to explore the unlabeled data to iteratively adapt the semantic tagging models for target domains, preserving information from the previous iterations. |
Related Work and Motivation | Adapting the source domain using unlabeled data is the key to achieving good performance across domains. |
Semi-Supervised Semantic Labeling | The last term is the loss on unlabeled data from the target domain with a hyper-parameter τ.
Semi-Supervised Semantic Labeling | After we decode the unlabeled data, we retrain a new CRF model at each iteration.
Semi-Supervised Semantic Labeling | Each iteration makes predictions on the semantic tags of unlabeled data with varying posterior probabilities. |
Conclusion and Future Work | First, the proposed model can learn previously unseen sentiment words from large unlabeled data, which are not covered by the limited vocabulary in machine translation of the labeled data.
Cross-Lingual Mixture Model for Sentiment Classification | where λs(di) and λt(di) are weighting factors to control the influence of the unlabeled data.
Cross-Lingual Mixture Model for Sentiment Classification | We set λs(di) (λt(di)) to λs (λt) when di belongs to the unlabeled data, and to 1 otherwise.
Cross-Lingual Mixture Model for Sentiment Classification | When di belongs to the unlabeled data, P(cj | di) is computed according to Equation 5 or 6.
Experiment | Larger weights indicate larger influence from the unlabeled data.
Experiment | Second, the two classifiers make predictions on the Chinese unlabeled data and their English translation,
Experiment | Figure 3: Accuracy with different sizes of unlabeled data for NTCIR-EN+NTCIR-CH
Related Work | After that, the co-training approach (Blum and Mitchell, 1998) is adopted to leverage Chinese unlabeled data and their English translation to improve the SVM classifier for Chinese sentiment classification.
AL-SMT: Multilingual Setting | (Ueffing et al., 2007; Haffari et al., 2009) show that treating L+ as a source for a new feature function in a log-linear model for SMT (Och and Ney, 2004) allows us to maximally take advantage of unlabeled data by finding a weight for this feature using minimum error-rate training (MERT) (Och, 2003).
Introduction | In self-training, each MT system is retrained using human-labeled data plus its own noisy translation output on the unlabeled data.
Related Work | (Reichart et al., 2008) introduces multitask active learning, where unlabeled data require annotations for multiple tasks, e.g. they consider named entities and parse trees, and showed that multiple tasks help selection compared to individual tasks.
Sentence Selection: Multiple Language Pairs | Using this method, we rank the entries in unlabeled data U for each translation task defined by a language pair (F_d, E). This results in several ranking lists, each of which represents the importance of entries with respect to a particular translation task.
Sentence Selection: Single Language Pair | The more frequent a phrase (not a phrase pair) is in the unlabeled data, the more important it is to know its translation, since it is more likely to be seen in the test data (especially when the test data is in-domain with respect to the unlabeled data).
Sentence Selection: Single Language Pair | In the labeled data L, phrases are the ones which are extracted by the SMT models; but what are the candidate phrases in the unlabeled data U? |
Abstract | We report on analyses that reveal quantitative insights about the use of unlabeled data and the complexity of inter-language correspondence modeling. |
Cross-Language Structural Correspondence Learning | Note that the support of w_s and w_t can be determined from the unlabeled data D_u.
Cross-Language Structural Correspondence Learning | Input: Labeled source data D_s; unlabeled data D_u = D_{s,u} ∪ D_{t,u}
Experiments | Due to the use of task-specific unlabeled data, relevant characteristics are captured by the pivot classifiers.
Experiments | Unlabeled Data The first row of Figure 2 shows the performance of CL-SCL as a function of the ratio of labeled and unlabeled documents. |
Related Work | In the basic domain adaptation setting we are given labeled data from the source domain and unlabeled data from the target domain, and the goal is to train a classifier for the target domain. |
Related Work | SCL then models the correlation between the pivots and all other features by training linear classifiers on the unlabeled data from both domains. |
Related Work | Ando and Zhang (2005b) present a semi-supervised learning method based on this paradigm, which generates related tasks from unlabeled data.
Introduction | By using unlabeled data to reduce data sparsity in the labeled training data, semi-supervised approaches improve generalization accuracy.
Unlabeled Data | Unlabeled data is used for inducing the word representations.
Unlabeled Data | (2009), we found that all word representations performed better on the supervised task when they were induced on the clean unlabeled data, both embeddings and Brown clusters.
Unlabeled Data | Note that cleaning is applied only to the unlabeled data, not to the labeled data used in the supervised tasks.
Abstract | Paradigmatic lexical relations are explicitly captured by word clustering on large-scale unlabeled data and are used to design new features to enhance a discriminative tagger. |
Capturing Paradigmatic Relations via Word Clustering | Brown Clustering Our first choice is the bottom-up agglomerative word clustering algorithm of (Brown et al., 1992) which derives a hierarchical clustering of words from unlabeled data . |
Capturing Paradigmatic Relations via Word Clustering | The large-scale unlabeled data we use in our experiments comes from the Chinese Gigaword (LDC2005T14). |
Capturing Paradigmatic Relations via Word Clustering | Furthermore, as shown in (Sun and Xu, 2011), appropriate string knowledge acquired from large-scale unlabeled data can significantly enhance a supervised model, especially for the prediction of out-of-vocabulary (OOV) words. |
Introduction | First, we employ unsupervised word clustering to explore paradigmatic relations that are encoded in large-scale unlabeled data . |
Experiments | by the character-based alignment (VES-NO-GP) and the graph propagation (VES-GP-PL), are regarded as virtual evidences to bias the CRFs model's learning on the unlabeled data.
Methodology | Our learning problem belongs to semi-supervised learning (SSL), as the training is done on treebank labeled data (X_L, Y_L) = {(x_1, y_1), ..., (x_l, y_l)} and bilingual unlabeled data X_U = {x_1, ..., x_u}, where x_i = {w_1, ..., w_m} is an input word sequence and y_i = {y_1, ..., y_m}, y ∈ T, is its corresponding label sequence.
Methodology | In our setting, the CRFs model is required to learn from unlabeled data . |
Methodology | This work employs the posterior regularization (PR) framework (Ganchev et al., 2010) to bias the CRFs model's learning on unlabeled data, under a constraint encoded by the graph propagation expression.
Related Work | (2014), proposed GP for inferring the label information of unlabeled data, and then leveraging these GP outcomes to learn a semi-supervised scalable model (e.g., CRFs).
Related Work | One of our main objectives is to bias the CRFs model's learning on unlabeled data, under a nonlinear GP constraint encoding the bilingual knowledge.
Related Work | (2008) described constraint driven learning (CODL) that augments model learning on unlabeled data by adding a cost for violating expectations of constraint features designed by domain knowledge. |
Evaluation | We employ as our second baseline a transductive SVM trained using 100 points randomly sampled from the training folds as labeled data and the remaining 1900 points as unlabeled data.
Evaluation | This could be attributed to (1) the unlabeled data, which may have provided the transductive learner with useful information that is not accessible to the other learners, and (2) the ensemble, which is more noise-tolerant to the imperfect seeds.
Evaluation | Specifically, we used the 500 seeds to guide the selection of active learning points, but trained a transductive SVM using only the active learning points as labeled data (and the rest as unlabeled data).
Our Approach | Given that we now have a labeled set (composed of 100 manually labeled points selected by active learning and 500 unambiguous points) as well as a larger set of points that are yet to be labeled (i.e., the remaining unlabeled points in the training folds and those in the test fold), we aim to train a better classifier by using a weakly supervised learner to learn from both the labeled and unlabeled data.
Our Approach | points in L_j, where i ≠ j) as unlabeled data.
Our Approach | Since the points in the test fold are included in the unlabeled data, they are all classified in this step.
Discussion and Related Work | In earlier work on semi-supervised learning, e.g., (Blum and Mitchell 1998), the classifiers learned from unlabeled data were used directly.
Discussion and Related Work | Recent research shows that it is better to use whatever is learned from the unlabeled data as features in a discriminative classifier. |
Discussion and Related Work | Wong and Ng (2007) and Suzuki and Isozaki (2008) are similar in that they run a baseline discriminative classifier on unlabeled data to generate pseudo examples, which are then used to train a different type of classifier for the same problem. |
Introduction | A two-stage strategy: first create word clusters with unlabeled data and then use the clusters as features in supervised training.
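Introduction | A minimal sketch of the second stage (assuming the common `bitstring<TAB>word<TAB>count` output format of Brown clustering tools; the prefix lengths follow the style of Koo et al. (2008) but the exact values are assumptions):

```python
def load_brown_clusters(path):
    """Read a word -> bit-string mapping in the common
    `bitstring<TAB>word<TAB>count` output format of Brown clustering tools."""
    clusters = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            bits, word, _count = line.rstrip("\n").split("\t")
            clusters[word] = bits
    return clusters

def cluster_features(word, clusters, prefixes=(4, 6, 10)):
    """Bit-string prefix features expose the cluster hierarchy at several
    granularities for use in a supervised tagger or parser."""
    bits = clusters.get(word)
    if bits is None:
        return ["cluster=UNK"]
    return ["cluster%d=%s" % (p, bits[:p]) for p in prefixes]
```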
Experiments | At each random selection, the rest of the utterances are used as unlabeled data to boost the performance of MCM. |
Experiments | Being Bayesian, our model can incorporate unlabeled data at training time. |
Experiments | Here, we evaluate the performance gain on domain, act and slot predictions as more unlabeled data is introduced at learning time. |
Abstract | Evaluation on the Penn Chinese Treebank indicates that a converted dependency treebank helps constituency parsing and the use of unlabeled data by self-training further increases parsing f-score to 85.2%, resulting in 6% error reduction over the previous best result. |
Experiments of Parsing | 4.3 Using Unlabeled Data for Parsing |
Experiments of Parsing | Recent studies on parsing indicate that the use of unlabeled data by self-training can help parsing on the WSJ data, even when labeled data is relatively large (McClosky et al., 2006a; Reichart and Rappoport, 2007). |
Experiments of Parsing | 2000) (PDC) as unlabeled data for parsing. |
Experiments | In the supervised setting, we treated the test data as unlabeled data and performed transductive learning. |
Experiments | In the semi-supervised setting, our unlabeled data consists of both the available unlabeled data and the test data.
Introduction | In this paper, we propose a sentence-level sentiment classification method that can (1) incorporate rich discourse information at both local and global levels; (2) encode discourse knowledge as soft constraints during learning; (3) make use of unlabeled data to enhance learning. |
Experiments | No peak in performance is reached, so further improvements are possible with more unlabeled data.
Experiments | Thus smoothing is optimizing performance for the case where unlabeled data is plentiful and labeled data is scarce, as we would hope. |
Related Work | Several researchers have previously studied methods for using unlabeled data for tagging and chunking, either alone or as a supplement to labeled data. |
Related Work | Unlike these systems, our efforts are aimed at using unlabeled data to find distributional representations that work well on rare terms, making the supervised systems more applicable to other domains and decreasing their sample complexity. |
Smoothing Natural Language Sequences | Using Expectation-Maximization (Dempster et al., 1977), it is possible to estimate the distributions for P(x_i | y_i) and P(y_i | y_{i-1}) from unlabeled data.
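Smoothing Natural Language Sequences | For reference, the standard EM (Baum-Welch) re-estimates of these two HMM distributions, with γ_t(y) = P(y_t = y | x) and ξ_t(y, y') = P(y_t = y, y_{t+1} = y' | x) computed by the forward-backward algorithm over the unlabeled sequences:

```latex
\hat{P}(x \mid y) = \frac{\sum_{i}\sum_{t}\gamma^{(i)}_{t}(y)\,\mathbf{1}\big[x^{(i)}_{t}=x\big]}
                         {\sum_{i}\sum_{t}\gamma^{(i)}_{t}(y)},
\qquad
\hat{P}(y' \mid y) = \frac{\sum_{i}\sum_{t}\xi^{(i)}_{t}(y,\,y')}
                          {\sum_{i}\sum_{t}\gamma^{(i)}_{t}(y)}
```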
Experiment | To keep the experiment tractable, we first randomly choose 50,000 of all the texts as unlabeled data, which contain 2,420,037 characters.
Experiment | We also experimented with different sizes of unlabeled data to evaluate the performance when adding unlabeled target domain data.
Experiment | Table 5 shows the different F-scores and OOV recalls on the different unlabeled data sets.
Implementation Details of Multitask Learning Method | BLLIP North American News Text (Complete) is used as the unlabeled data source to generate synthetic labeled data.
Related Work | Due to the lack of benchmark data for implicit discourse relation analysis, earlier work used unlabeled data to generate synthetic implicit discourse data. |
Related Work | Research work in this category exploited both labeled and unlabeled data for discourse relation prediction. |
Related Work | Hernault et al. (2010) presented a semi-supervised method based on the analysis of co-occurring features in labeled and unlabeled data.
Abstract | We propose a novel approach to learn from lexical prior knowledge in the form of domain-independent sentiment-laden terms, in conjunction with domain-dependent unlabeled data and a few labeled documents. |
Conclusion | To more effectively utilize unlabeled data and induce domain-specific adaptation of our models, several extensions are possible: facilitating learning from related domains, incorporating hyperlinks between documents, incorporating synonyms or co-occurrences between words, etc.
Incorporating Lexical Knowledge | It should be noted that this list was constructed without a specific domain in mind, which is further motivation for using training examples and unlabeled data to learn domain-specific connotations.
Related Work | The goal of the former theme is to learn from few labeled examples by making use of unlabeled data, while the goal of the latter theme is to utilize weak prior knowledge about term-class affinities (e.g., the term “awful” indicates negative sentiment and therefore may be considered as a negatively labeled feature).
Abstract | Starting with a domain-independent, high-precision sentiment lexicon and a large pool of unlabeled data, we bootstrap Twitter-specific sentiment lexicons, using a small amount of labeled data to guide the process.
Lexicon Bootstrapping | To create a Twitter-specific sentiment lexicon for a given language, we start with a general-purpose, high-precision sentiment lexicon and bootstrap from the unlabeled data (BOOT), using the labeled development data (DEV) to guide the process.
Lexicon Bootstrapping | On each iteration i ≥ 1, tweets in the unlabeled data are labeled using the lexicon
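Lexicon Bootstrapping | A schematic sketch of such a bootstrapping loop (a deliberate simplification using lexicon-vote labeling, frequency-based candidate mining, and dev-accuracy gating; none of these names or design choices come from the paper):

```python
from collections import Counter

def lexicon_vote(tweet, lex):
    """+1 / -1 by summed lexicon polarity, None on a tie or no hits."""
    score = sum(lex.get(w, 0) for w in tweet)
    return None if score == 0 else (1 if score > 0 else -1)

def dev_accuracy(lex, dev_tweets, dev_labels):
    pairs = [(lexicon_vote(t, lex), y) for t, y in zip(dev_tweets, dev_labels)]
    scored = [(p, y) for p, y in pairs if p is not None]
    return sum(p == y for p, y in scored) / max(len(scored), 1)

def bootstrap(lex, unlabeled, dev_tweets, dev_labels, n_iter=10, top_k=50):
    lex = dict(lex)  # word -> +1 / -1
    for _ in range(n_iter):
        # label the unlabeled pool with the current lexicon
        pos, neg = Counter(), Counter()
        for tweet in unlabeled:
            p = lexicon_vote(tweet, lex)
            if p == 1:
                pos.update(w for w in tweet if w not in lex)
            elif p == -1:
                neg.update(w for w in tweet if w not in lex)
        # add the most frequent new candidates only if dev accuracy holds up
        trial = dict(lex)
        trial.update({w: 1 for w, _ in pos.most_common(top_k)})
        trial.update({w: -1 for w, _ in neg.most_common(top_k)})
        if dev_accuracy(trial, dev_tweets, dev_labels) >= dev_accuracy(lex, dev_tweets, dev_labels):
            lex = trial
        else:
            break
    return lex
```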
Related Work | Corpus-based methods extract subjectivity and sentiment lexicons from large amounts of unlabeled data using different similarity metrics to measure the relatedness between words. |
Abstract | To improve the performance of a cause identification system for the minority classes, we present a bootstrapping algorithm that automatically augments a training set by learning from a small amount of labeled data and a large amount of unlabeled data.
Evaluation | To get a sense of the accuracy of the bootstrapped documents without further manual labeling, recall that our experimental setup resembles a transductive setting where the test documents are part of the unlabeled data, and consequently, some of them may have been automatically labeled by the bootstrapping algorithm.
Our Bootstrapping Algorithm | To alleviate the data scarcity problem and improve the accuracy of the classifiers, we propose in this section a bootstrapping algorithm that automatically augments a training set by exploiting a large amount of unlabeled data.
Related Work | Minority classes can be expanded without the availability of unlabeled data as well. |
The Related Work | Typical domain adaptation tasks often assume that annotated data in the new domain is absent or insufficient, while large-scale unlabeled data is available.
The Related Work | Where unlabeled data is concerned, semi-supervised or unsupervised methods are naturally adopted.
The Related Work | The first usually focuses on exploiting automatically generated labeled data from the unlabeled data (Steedman et al., 2003; McClosky et al., 2006; Reichart and Rappoport, 2007; Sagae and Tsujii, 2007; Chen et al., 2008); the second focuses on combining supervised and unsupervised methods, where only unlabeled data is considered (Smith and Eisner, 2006; Wang and Schuurmans, 2008; Koo et al., 2008).
A ← METRICLEARNER(X, S, D) | In this case, we treat the set of test instances (without their gold labels) as the unlabeled data.
Introduction | IDML-IT (Dhillon et al., 2010) is another such method which exploits labeled as well as unlabeled data during metric learning. |
Metric Learning | The ITML metric learning algorithm, which we reviewed in Section 2.2, is supervised in nature, and hence it does not exploit widely available unlabeled data.
Metric Learning | In this section, we review Inference Driven Metric Learning (IDML) (Algorithm 1) (Dhillon et al., 2010), a recently proposed metric learning framework which combines an existing supervised metric learning algorithm (such as ITML) along with transductive graph-based label inference to learn a new distance metric from labeled as well as unlabeled data combined. |
Inference | An inference algorithm for an unsupervised model should be efficient enough to handle vast amounts of unlabeled data, as it can easily be obtained and is likely to improve results.
Introduction | This suggests that the models scale to much larger corpora, which is an important property for a successful unsupervised learning method, as unlabeled data is abundant. |
Related Work | Most of the SRL research has focused on the supervised setting, however, lack of annotated resources for most languages and insufficient coverage provided by the existing resources motivates the need for using unlabeled data or other forms of weak supervision. |
Related Work | This includes methods based on graph alignment between labeled and unlabeled data (Furstenau and Lapata, 2009), using unlabeled data to improve lexical generalization (Deschacht and Moens, 2009), and projection of annotation across languages (Pado and Lapata, 2009; van der Plas et al., 2011). |
Introduction | (2010), which introduces a high-level rule language, called NERL, to build the general and domain specific NER systems; and 2) semi-supervised learning, which aims to use the abundant unlabeled data to compensate for the lack of annotated data. |
Related Work | Semi-supervised learning exploits both labeled and unlabeled data.
Related Work | It proves useful when labeled data is scarce and hard to construct, while unlabeled data is abundant and easy to access.
Related Work | Another representative of semi-supervised learning is learning a robust representation of the input from unlabeled data.
Experiment | Previous work found that the performance can be improved by pre-training the character embeddings on large unlabeled data and using the obtained embeddings to initialize the character lookup table instead of random initialization.
Experiment | There are several ways to learn the embeddings on unlabeled data.
Experiment | (2013), the bigram embeddings are pre-trained on unlabeled data with character embeddings, which significantly improves the model performance. |
Experiments | We also compare our method with semi-supervised approaches, which achieved very high accuracies by leveraging large unlabeled data directly in the systems for joint learning and decoding; in our method, we only explore the N-gram features to further improve supervised dependency parsing performance.
Experiments | Some previous studies also found a log-linear relationship between the amount of unlabeled data and performance (Suzuki and Isozaki, 2008; Suzuki et al., 2009; Bergsma et al., 2010; Pitler et al., 2010).
Web-Derived Selectional Preference Features | All of our selectional preference features described in this paper rely on probabilities derived from unlabeled data.
Abstract | The results suggest that it is possible to learn domain-specific entity types from unlabeled data.
Conclusion | We evaluated an approach to learning domain-specific interpretable entity types from unlabeled data.
Introduction | • we empirically evaluate an approach to learning types from unlabeled data
Discussion and Future Work | We utilize unlabeled data in both generative and discriminative models for dependency syntax and in generative word clustering. |
Discussion and Future Work | Our discriminative joint models treat latent syntax as a structured feature to be optimized for the end-task of SRL, while our other grammar induction techniques optimize for unlabeled data likelihood, optionally with distant supervision.
Discussion and Future Work | We observe that careful use of these unlabeled data resources can improve performance on the end task. |
Abstract | Recently, with an increase in computing resources, it became possible to learn rich word embeddings from massive amounts of unlabeled data.
Introduction | Traditionally, we would learn the embeddings for the target task jointly with whatever unlabeled data we may have, in an instance of semi-supervised learning, and/or we may leverage labels from multiple other related tasks in a multitask approach. |
Related Work | Our method is different in that the (potentially) massive amount of unlabeled data is not required a priori, but only the resultant embedding.
Introduction | In general, semi-supervised learning can be motivated by two concerns: first, given a fixed amount of supervised data, we might wish to leverage additional unlabeled data to facilitate the utilization of the supervised corpus, increasing the performance of the model in absolute terms. |
Introduction | Second, given a fixed target performance level, we might wish to use unlabeled data to reduce the amount of annotated data necessary to reach this target. |
Related Work | Crucially, however, these methods do not exploit unlabeled data when learning their representations.
Related Work | Co-training puts features into disjoint subsets when learning from labeled and unlabeled data and tries to leverage this split for better performance. |
Results and Discussion | For an unsupervised approach, which only needs unlabeled data, there is little cost to creating large training sets.
Introduction | This is implemented by maximizing the empirical accuracy on the prior knowledge (labeled data) and the entropy of hash functions (estimated over labeled and unlabeled data).
Semi-Supervised SimHash | Let X_L = {(x_1, c_1), ..., (x_u, c_u)} be the labeled data, c ∈ {1, ..., C}, x ∈ R^M, and X_U = {x_{u+1}, ..., x_N} the unlabeled data.
Semi-Supervised SimHash | the labeled and unlabeled data, X_L and X_U.
Data | The set of 3,123 words that were not annotated was the unlabeled data for the EM algorithm. |
Experiments and Results | Our post-experimental analysis reveals that the parameter update process using the unlabeled data has the effect of overly separating the two overlapping distributions.
Finding the Homographs in a Lexicon | In Model II, the semi-supervised setup, the training data is used to initialize the Expectation-Maximization (EM) algorithm (Dempster et al., 1977), and the unlabeled data, described in Section 3.1, updates the initial estimates.