Index of papers in Proc. ACL that mention
  • unlabeled data
Li, Zhenghua and Zhang, Min and Chen, Wenliang
Abstract
Instead of only using 1-best parse trees as in previous work, our core idea is to utilize parse forests (ambiguous labelings) to combine multiple 1-best parse trees generated by diverse parsers on unlabeled data.
Abstract
With a conditional random field based probabilistic dependency parser, our training objective is to maximize mixed likelihood of labeled data and auto-parsed unlabeled data with ambiguous labelings.
Introduction
In contrast, semi-supervised approaches, which can make use of large-scale unlabeled data, have attracted more and more interest.
Introduction
Previously, unlabeled data has been explored to derive useful local-context features such as word clusters (Koo et al., 2008), subtree frequencies (Chen et al., 2009; Chen et al., 2013), and word co-occurrence counts (Zhou et al., 2011; Bansal and Klein, 2011).
Introduction
A few effective learning methods have also been proposed for dependency parsing to implicitly utilize distributions on unlabeled data (Smith and Eisner, 2007; Wang et al., 2008; Suzuki et al., 2009).
unlabeled data is mentioned in 58 sentences in this paper.
Topics mentioned in this paper:
Titov, Ivan
Abstract
We consider a semi-supervised setting for domain adaptation where only unlabeled data is available for the target domain.
Empirical Evaluation
For every pair, the semi-supervised methods use labeled data from the source domain and unlabeled data from both domains.
Empirical Evaluation
Also, it is important to point out that the SCL method uses auxiliary tasks to induce the shared feature representation; these tasks are constructed on the basis of unlabeled data.
Introduction
In addition to the labeled data from the source domain, they also exploit small amounts of labeled data and/or unlabeled data from the target domain to estimate a more predictive model for the target domain.
Introduction
In this paper we focus on a more challenging and arguably more realistic version of the domain-adaptation problem where only unlabeled data is available for the target domain.
Introduction
(2006) use auxiliary tasks based on unlabeled data for both domains (called pivot features) and a dimensionality reduction technique to induce such shared representation.
Learning and Inference
Intuitively, maximizing the likelihood of unlabeled data is closely related to minimizing the reconstruction error, that is, training a model to discover mapping parameters u such that z encodes all the information necessary to accurately reproduce x^(l) from z for every training example x^(l).
The Latent Variable Model
and unlabeled data for the source and target domains {x^(l)}, l ∈ S_U ∪ T_U, where S_U and T_U stand for the unlabeled datasets for the source and target domains, respectively.
The Latent Variable Model
However, given that, first, the amount of unlabeled data |S_U ∪ T_U| normally vastly exceeds the amount of labeled data |S_L| and, second, the number of features for each example |x^(l)| is usually large, the label y will have only a minor effect on the mapping from the initial features x to the latent representation z (i.e.
unlabeled data is mentioned in 12 sentences in this paper.
Topics mentioned in this paper:
Lu, Bin and Tan, Chenhao and Cardie, Claire and K. Tsou, Benjamin
A Joint Model with Unlabeled Parallel Text
where y_i' is the unobserved class label for the i-th instance in the unlabeled data.
A Joint Model with Unlabeled Parallel Text
By further considering the weight to ascribe to the unlabeled data vs. the labeled data (and the weight for the L2-norm regularization), we get the following regularized joint log likelihood to be maximized:
A Joint Model with Unlabeled Parallel Text
where the first term on the right-hand side is the log likelihood of the labeled data from both D1 and D2; the second is the log likelihood of the unlabeled parallel data U, multiplied by λ1 ≥ 0, a constant that controls the contribution of the unlabeled data; and λ2 ≥ 0 is a regularization constant that penalizes model complexity, i.e., large feature weights.
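The objective described in this excerpt can be written schematically as follows (a reconstruction with generic symbols Θ and y_d; the paper's exact likelihood terms are not reproduced here):

```latex
% Regularized joint log likelihood: labeled term, weighted
% unlabeled term, and an L2 penalty on the feature weights.
\mathcal{J}(\Theta) \;=\; \sum_{d \in D_1 \cup D_2} \log p_\Theta(y_d \mid d)
\;+\; \lambda_1 \sum_{u \in U} \log p_\Theta(u)
\;-\; \lambda_2 \,\lVert \Theta \rVert_2^2
```

The λ1 term trades off the unlabeled parallel data against the labeled data, and the λ2 term is the L2-norm regularizer mentioned in the excerpt.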
Abstract
Experiments on multiple data sets show that the proposed approach (1) outperforms the monolingual baselines, significantly improving the accuracy for both languages by 3.44%-8.12%; (2) outperforms two standard approaches for leveraging unlabeled data; and (3) produces (albeit smaller) performance gains when employing pseudo-parallel data from machine translation engines.
Experimental Setup 4.1 Data Sets and Preprocessing
MaxEnt: This method learns a MaxEnt classifier for each language given the monolingual labeled data; the unlabeled data is not used.
Experimental Setup 4.1 Data Sets and Preprocessing
SVM: This method learns an SVM classifier for each language given the monolingual labeled data; the unlabeled data is not used.
Introduction
maximum entropy and SVM classifiers) as well as two alternative methods for leveraging unlabeled data (transductive SVMs (Joachims, 1999b) and co-training (Blum and Mitchell, 1998)).
Introduction
To our knowledge, this is the first multilingual sentiment analysis study to focus on methods for simultaneously improving sentiment classification for a pair of languages based on unlabeled data rather than resource adaptation from one language to another.
Related Work
Another line of related work is semi-supervised learning, which combines labeled and unlabeled data to improve the performance of the task of interest (Zhu and Goldberg, 2009).
unlabeled data is mentioned in 38 sentences in this paper.
Topics mentioned in this paper:
Wang, Chang and Fan, James
Experiments
By integrating unlabeled data, the manifold model under setting (1) made a 15% improvement over the linear regression model on F1 score, where the improvement was significant across all relations.
Experiments
This tells us that estimating the label of unlabeled examples based upon the sampling result is one way to utilize unlabeled data and may help improve the relation extraction results.
Experiments
On one hand, this result shows that using more unlabeled data can further improve the result.
Identifying Key Medical Relations
Part of the resulting data was manually vetted by our annotators, and the remaining was held as unlabeled data for further experiments.
Introduction
• From the perspective of relation extraction methodologies, we present a manifold model for relation extraction utilizing both labeled and unlabeled data.
Relation Extraction with Manifold Models
Given a few labeled examples and many unlabeled examples for a relation, we want to build a relation detector leveraging both labeled and unlabeled data .
Relation Extraction with Manifold Models
Integration of the unlabeled data can help solve overfitting problems when the labeled data is not sufficient.
Relation Extraction with Manifold Models
When μ = 0, the model disregards the unlabeled data, and the data manifold topology is not respected.
unlabeled data is mentioned in 12 sentences in this paper.
Topics mentioned in this paper:
Bollegala, Danushka and Weir, David and Carroll, John
Abstract
We automatically create a sentiment sensitive thesaurus using both labeled and unlabeled data from multiple source domains to find the association between words that express similar sentiments in different domains.
Experiments
Figure 4: Effect of source domain unlabeled data.
Experiments
The amount of unlabeled data is held constant, so that any change in classification accuracy
Experiments
Figure 5: Effect of target domain unlabeled data.
Introduction
positive or negative sentiment) given a small set of labeled data for the source domain, and unlabeled data for both source and target domains.
Introduction
We use labeled data from multiple source domains and unlabeled data from source and target domains to represent the distribution of features.
Introduction
Unlabeled data is cheaper to collect compared to labeled data and is often available in large quantities.
unlabeled data is mentioned in 16 sentences in this paper.
Topics mentioned in this paper:
Li, Shoushan and Huang, Chu-Ren and Zhou, Guodong and Lee, Sophia Yat Mei
Introduction
Since the unlabeled data is ample and easy to collect, a successful semi-supervised sentiment classification system would significantly minimize the involvement of labor and time.
Introduction
Therefore, given the two different views mentioned above, one promising application is to adopt them in co-training algorithms, which have been proven to be an effective semi-supervised learning strategy for incorporating unlabeled data to further improve classification performance (Zhu, 2005).
Introduction
Finally, a co-training algorithm is proposed to incorporate unlabeled data for semi-supervised sentiment classification.
Related Work
Semi-supervised methods combine unlabeled data with labeled training data (often small-scaled) to improve the models.
Unsupervised Mining of Personal and Impersonal Views
Semi-supervised learning is a strategy which combines unlabeled data with labeled training data to improve the models.
Unsupervised Mining of Personal and Impersonal Views
The co-training algorithm is a specific semi-supervised learning approach which starts with a set of labeled data and increases the amount of labeled data using the unlabeled data by bootstrapping (Blum and Mitchell, 1998).
Unsupervised Mining of Personal and Impersonal Views
The unlabeled data U containing the personal sentence set and the impersonal sentence set
unlabeled data is mentioned in 13 sentences in this paper.
Topics mentioned in this paper:
Lucas, Michael and Downey, Doug
Abstract
Semi-supervised learning (SSL) methods augment standard machine learning (ML) techniques to leverage unlabeled data.
Abstract
However, existing SSL techniques typically require multiple passes over the entirety of the unlabeled data, meaning the techniques are not applicable to large corpora being produced today.
Abstract
In this paper, we show that improving marginal word frequency estimates using unlabeled data can enable semi-supervised text classification that scales to massive unlabeled data sets.
Introduction
Semi-supervised Learning (SSL) is a Machine Learning (ML) approach that utilizes large amounts of unlabeled data, combined with a smaller amount of labeled data, to learn a target function (Zhu, 2006; Chapelle et al., 2006).
Introduction
Typically, for each target concept to be learned, a semi-supervised classifier is trained using iterative techniques that execute multiple passes over the unlabeled data (e.g., Expectation-Maximization (Nigam et al., 2000) or Label Propagation (Zhu and Ghahramani, 2002)).
Introduction
Instead of utilizing unlabeled examples directly for each given target concept, our approach is to pre-compute a small set of statistics over the unlabeled data in advance.
Problem Definition
MNB-FM attempts to improve MNB’s estimates of θ_w^+ and θ_w^-, using statistics computed over the unlabeled data.
Problem Definition
Equation 2 can be estimated in advance, without knowledge of the target class, simply by counting the number of tokens of each word in the unlabeled data.
Problem Definition
The above example illustrates how MNB-FM can leverage frequency marginal statistics computed over unlabeled data to improve MNB’s conditional probability estimates.
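The constraint this excerpt alludes to can be illustrated with a small sketch: the word marginal P(w) estimated from unlabeled data must equal the mixture of the class conditionals, so one conditional can be solved for from the other. The function and numbers below are illustrative assumptions, not the paper's exact MNB-FM update:

```python
def solve_conditional(p_w_pos, p_pos, p_w_marginal):
    """Solve P(w) = P(w|+)P(+) + P(w|-)P(-) for P(w|-),
    where P(w) is estimated by counting w in the unlabeled data."""
    p_neg = 1.0 - p_pos
    p_w_neg = (p_w_marginal - p_w_pos * p_pos) / p_neg
    # Clip to a valid probability in case the estimates are inconsistent.
    return min(max(p_w_neg, 0.0), 1.0)

# Example: with P(w|+) = 0.2, P(+) = 0.5, and marginal P(w) = 0.15,
# the implied P(w|-) is (0.15 - 0.1) / 0.5 = 0.1.
```

The marginal on the right-hand side needs only token counts over the unlabeled corpus, which is why it can be precomputed once, independently of any target class.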
unlabeled data is mentioned in 25 sentences in this paper.
Topics mentioned in this paper:
Zeng, Xiaodong and Wong, Derek F. and Chao, Lidia S. and Trancoso, Isabel
Abstract
One constructs a nearest-neighbor similarity graph over all trigrams of labeled and unlabeled data for propagating syntactic information, i.e., label distributions.
Abstract
The derived label distributions are regarded as virtual evidences to regularize the learning of linear conditional random fields (CRFs) on unlabeled data .
Introduction
Therefore, semi-supervised joint S&T appears to be a natural solution for easily incorporating accessible unlabeled data to improve the joint S&T model.
Introduction
Motivated by the works in (Subramanya et al., 2010; Das and Smith, 2011), for structured problems, graph-based label propagation can be employed to infer valuable syntactic information (n-gram-level label distributions) from labeled data to unlabeled data.
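Graph-based label propagation of this kind can be sketched generically (a Zhu–Ghahramani-style clamped iteration on a toy graph; the trigram graph construction and the exact updates of the cited works are not reproduced):

```python
def propagate(graph, seeds, iters=50):
    """graph: {node: {neighbor: weight}}, a symmetric similarity graph.
    seeds: {node: label distribution} for labeled nodes (kept clamped).
    Returns a label distribution for every node."""
    k = len(next(iter(seeds.values())))
    dist = {v: seeds.get(v, [1.0 / k] * k)[:] for v in graph}
    for _ in range(iters):
        new = {}
        for v in graph:
            if v in seeds:                  # clamp labeled nodes
                new[v] = seeds[v][:]
                continue
            acc = [0.0] * k
            z = sum(graph[v].values()) or 1.0
            for u, w in graph[v].items():
                for c in range(k):
                    acc[c] += w * dist[u][c]
            new[v] = [a / z for a in acc]   # weighted neighbor average
        dist = new
    return dist
```

On a three-node chain whose endpoints are seeded with opposite classes, the middle node converges to an even split, which is the intuition behind propagating label distributions from labeled to unlabeled n-grams.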
Introduction
labeled and unlabeled data to achieve the semi-supervised learning.
Method
The emphasis of this work is on building a joint S&T model based on two different kinds of data sources, labeled and unlabeled data .
Method
In essence, this learning problem can be treated as incorporating certain gainful information, e.g., prior knowledge or label constraints, of unlabeled data into the supervised model.
Related Work
Sun and Xu (2011) enhanced a CWS model by interpolating statistical features of unlabeled data into the CRFs model.
Related Work
(2011) proposed a semi-supervised pipeline S&T model by incorporating n-gram and lexicon features derived from unlabeled data.
Related Work
Different from their concern, our emphasis is to learn the semi-supervised model by injecting the label information from a similarity graph constructed from labeled and unlabeled data .
unlabeled data is mentioned in 33 sentences in this paper.
Topics mentioned in this paper:
Zeng, Xiaodong and Wong, Derek F. and Chao, Lidia S. and Trancoso, Isabel
Abstract
Similarly to multi-view learning, the “segmentation agreements” between the two different types of view are used to overcome the scarcity of the label information on unlabeled data.
Abstract
The agreements are regarded as a set of valuable constraints for regularizing the learning of both models on unlabeled data.
Introduction
Sun and Xu (2011) enhanced the segmentation results by interpolating the statistics-based features derived from unlabeled data to a CRFs model.
Introduction
The crux of solving the semi-supervised learning problem is the learning on unlabeled data.
Introduction
Inspired by multi-view learning that exploits redundant views of the same input data (Ganchev et al., 2008), this paper proposes a semi-supervised CWS model of co-regularizing from two different views (intrinsically two different models), character-based and word-based, on unlabeled data.
Semi-supervised Learning via Co-regularizing Both Models
As mentioned earlier, the primary challenge of semi-supervised CWS concentrates on the unlabeled data .
Semi-supervised Learning via Co-regularizing Both Models
Obviously, the learning on unlabeled data does not come for “free”.
Semi-supervised Learning via Co-regularizing Both Models
Very often, it is necessary to discover certain gainful information, e.g., label constraints of unlabeled data, that is incorporated to guide the learner toward a desired solution.
unlabeled data is mentioned in 17 sentences in this paper.
Topics mentioned in this paper:
Ma, Ji and Zhang, Yue and Zhu, Jingbo
Experiments
In addition to labelled data, a large amount of unlabelled data on the web domain is also provided.
Experiments
about labelled and unlabelled data are summarized in Table 1 and Table 2, respectively.
Experiments
unlabelled data.
Introduction
The problem we face here can be considered as a special case of domain adaptation, where we have access to labelled data on the source domain (PTB) and unlabelled data on the target domain (web data).
Introduction
The idea of learning representations from unlabelled data and then fine-tuning a model with such representations according to some supervised criterion has been studied before (Turian et al., 2010; Collobert et al., 2011; Glorot et al., 2011).
Related Work
(2011) propose to learn representations from the mixture of both source and target domain unlabelled data to improve cross-domain sentiment classification.
Related Work
Such high dimensional input gives rise to high computational cost, and it is not clear whether those approaches can be applied to large-scale unlabelled data, with hundreds of millions of training examples.
Related Work
The new representations are induced based on the auxiliary tasks defined on unlabelled data together with a dimensionality reduction technique.
unlabeled data is mentioned in 11 sentences in this paper.
Topics mentioned in this paper:
Ma, Xuezhe and Xia, Fei
Experiments
the dependency annotations off the training portion of each treebank, and use that as the unlabeled data for that target language.
Experiments
-U: Our approach training on only parallel data without unlabeled data for the target language.
Experiments
+U: Our approach training on both parallel and unlabeled data.
Introduction
We train probabilistic parsing models for resource-poor languages by maximizing a combination of likelihood on parallel data and confidence on unlabeled data .
Our Approach
Another advantage of the learning framework is that it combines both the likelihood on parallel data and confidence on unlabeled data, so that both parallel text and unlabeled data can be utilized in our approach.
Our Approach
However, in our scenario we have no labeled training data for target languages but we have some parallel and unlabeled data plus an English dependency parser.
Our Approach
In our scenario, we have a set of aligned parallel data P, where a_i is the word alignment for the i-th pair of source-target sentences, and a set of unlabeled sentences of the target language U. We also have a trained English parsing model. Then the K in equation (7) can be divided into two cases, according to whether x_i belongs to the parallel data set P or the unlabeled data set U.
unlabeled data is mentioned in 16 sentences in this paper.
Topics mentioned in this paper:
Celikyilmaz, Asli and Thint, Marcus and Huang, Zhiheng
Abstract
We implement a semi-supervised learning (SSL) approach to demonstrate that utilization of more unlabeled data points can improve the answer-ranking task of QA.
Abstract
We create a graph for labeled and unlabeled data using match-scores of textual entailment features as similarity weights between data points.
Experiments
We show that as we increase the amount of unlabeled data, with our graph-summarization it is feasible to extract information that can improve the performance of QA models.
Experiments
In the first part, we randomly selected subsets of the labeled training dataset X_l ⊂ X_L with different sample sizes, {1% × n_L, 5% × n_L, 10% × n_L, 25% × n_L, 50% × n_L, 100% × n_L}, where n_L represents the sample size of X_L. At each random selection, the rest of the labeled dataset is hypothetically used as unlabeled data to verify the performance of our SSL using different sizes of labeled data.
Experiments
Table 3: The effect of number of unlabeled data on MRR from Hybrid graph Summarization SSL.
Graph Summarization
Using the graph-based SSL method on the new representative dataset X’ = X ∪ X_Te, which is comprised of the summarized dataset X as labeled data points and the testing dataset X_Te as unlabeled data points.
Graph Summarization
Since we do not know estimated local density constraints of unlabeled data points, we use constants to construct local density constraint column vector for X’ dataset as follows:
Introduction
Recent research indicates that using labeled and unlabeled data in semi-supervised learning (SSL) environment, with an emphasis on graph-based methods, can improve the performance of information extraction from data for tasks such as question classification (Tri et al., 2006), web classification (Liu et al., 2006), relation extraction (Chen et al., 2006), passage-retrieval (Otterbacher et al., 2009), various natural language processing tasks such as part-of-speech tagging, and named-entity recognition (Suzuki and Isozaki, 2008), word-sense disambiguation
Introduction
We consider situations where there are much more unlabeled data, X_U, than labeled data, X_L, i.e., n_L << n_U.
unlabeled data is mentioned in 14 sentences in this paper.
Topics mentioned in this paper:
Wang, Qin Iris and Schuurmans, Dale and Lin, Dekang
Introduction
Following the common theme of “more data is better data”, we also use both a limited labeled corpus and a plentiful unlabeled data resource.
Introduction
(2006a) successfully applied self-training to parsing by exploiting available unlabeled data , and obtained remarkable results when the same technique was applied to parser adaptation (McClosky et al., 2006b).
Introduction
However, the standard objective of an S3VM is non-convex on the unlabeled data, thus requiring sophisticated global optimization heuristics to obtain reasonable solutions.
Semi-supervised Convex Training for Structured SVM
for structured large margin training, whose objective is a combination of two convex terms: the supervised structured large margin loss on labeled data and the cheap least squares loss on unlabeled data .
Semi-supervised Structured Large Margin Objective
The objective of standard semi-supervised structured SVM is a combination of structured large margin losses on both labeled and unlabeled data .
Semi-supervised Structured Large Margin Objective
We introduce an efficient approximation—least squares loss—for the structured large margin loss on unlabeled data below.
unlabeled data is mentioned in 18 sentences in this paper.
Topics mentioned in this paper:
Jeon, Je Hun and Liu, Yang
Abstract
We propose a confidence-based method to assign labels to unlabeled data and demonstrate improved results using this method compared to the widely used agreement-based method.
Abstract
In our experiments on the Boston University radio news corpus, using only a small amount of the labeled data as the initial training set, our proposed labeling method combined with most confidence sample selection can effectively use unlabeled data to improve performance and finally reach performance closer to that of the supervised method using all the training data.
Co-training strategy for prosodic event detection
Given a set L of labeled data and a set U of unlabeled data, the algorithm first creates a smaller pool U’ containing u unlabeled examples.
Co-training strategy for prosodic event detection
There are two issues: (1) the accurate self-labeling method for unlabeled data and (2) effective heuristics to select informative samples.
Co-training strategy for prosodic event detection
Given a set L of labeled training data and a set U of unlabeled data
Randomly select U’ from U, |U’| = u
while iteration < k do
    Use L to train classifiers h1 and h2
    Apply h1 and h2 to assign labels for all examples in U’
    Select n self-labeled samples and add to L
    Remove these n samples from U
    Recreate U’ by choosing u instances randomly from U
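The loop quoted above can be sketched in Python. The nearest-class-mean classifier and the agreement-based self-labeling below are illustrative stand-ins, not the paper's prosodic-event classifiers or its confidence-based selection:

```python
import random

class MeanClassifier:
    """Labels a scalar feature by the closer class mean."""
    def fit(self, xs, ys):
        pos = [x for x, y in zip(xs, ys) if y == 1]
        neg = [x for x, y in zip(xs, ys) if y == 0]
        self.pos_mean = sum(pos) / len(pos)
        self.neg_mean = sum(neg) / len(neg)
        return self

    def predict(self, x):
        return 1 if abs(x - self.pos_mean) < abs(x - self.neg_mean) else 0

def co_train(labeled, unlabeled, rounds=3, pool=4, add_n=2, seed=0):
    """labeled: [((view1, view2), y)]; unlabeled: [(view1, view2)]."""
    rng = random.Random(seed)
    L, U = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not U:
            break
        pool_u = rng.sample(U, min(pool, len(U)))
        h1 = MeanClassifier().fit([x[0] for x, _ in L], [y for _, y in L])
        h2 = MeanClassifier().fit([x[1] for x, _ in L], [y for _, y in L])
        # Self-label pool examples on which the two views agree.
        agreed = [(x, h1.predict(x[0])) for x in pool_u
                  if h1.predict(x[0]) == h2.predict(x[1])]
        for x, y in agreed[:add_n]:
            L.append((x, y))
            U.remove(x)
    return L, U
```

Each round retrains one classifier per view on the growing labeled set, self-labels a small random pool, and moves a few newly labeled examples from U to L, which mirrors the structure of the pseudocode above.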
Conclusions
We introduced a confidence-based method to assign possible labels to unlabeled data and evaluated the performance combined with informative sample selection methods.
Conclusions
This suggests that the use of unlabeled data can lead to significant improvement for prosodic event detection.
Experiments and results
Our goal is to determine whether the co-training algorithm described above could successfully use the unlabeled data for prosodic event detection.
Introduction
We propose a confidence-based method to assign labels to unlabeled data in training iterations and evaluate its performance combined with different informative sample selection methods.
Introduction
Our experiments on the Boston Radio News corpus show that the use of unlabeled data can lead to significant improvement of prosodic event detection compared to using the original small training set, and that the semi-supervised learning result is comparable with supervised learning with similar amount of training data.
unlabeled data is mentioned in 10 sentences in this paper.
Topics mentioned in this paper:
Nguyen, Minh Luan and Tsang, Ivor W. and Chai, Kian Ming A. and Chieu, Hai Leong
Abstract
Our framework leverages on both labeled and unlabeled data in the target domain.
Abstract
To overcome the lack of labeled samples for rarer relations, these clusterings operate on both the labeled and unlabeled data in the target domain.
Experiments
negative transfer from irrelevant sources by relying on similarity of feature vectors between source and target domains based on labeled and unlabeled data.
Introduction
In the first phase, Supervised Voting is used to determine the relevance of each source domain to each region in the target domain, using both labeled and unlabeled data in the target domain.
Introduction
By also using unlabeled data, we alleviate the lack of labeled samples for rarer relations due to imbalanced distributions in relation types.
Introduction
reasonable predictive performance even when all the source domains are irrelevant and augments the rarer classes with examples in the unlabeled data .
Problem Statement
and plenty of unlabeled data D_u, where n_l and n_u are the numbers of labeled and unlabeled samples respectively, x_i is the feature vector, and y_i is the corresponding label (if available).
Robust Domain Adaptation
With data from both the labeled and unlabeled data sets, we apply transductive inference or semi-supervised learning (Zhou et al., 2003) to achieve both (i) and (ii).
Robust Domain Adaptation
By augmenting with the unlabeled data D_u, we aim to alleviate the effect of imbalanced relation distribution, which causes a lack of labeled samples for rarer classes in a small set of labeled data.
Robust Domain Adaptation
Here, we have multiple objectives: the first term controls the training error; the second regularizes the complexity of the functions f_j in the Reproducing Kernel Hilbert Space (RKHS) H; and the third prefers the predicted labels of the unlabeled data D_u to be close to the reference predictions.
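The three-term objective paraphrased in this excerpt can be written schematically as follows (my reconstruction with generic symbols ℓ, γ_A, γ_I, and reference predictions ŷ_u; the paper's exact regularizers may differ):

```latex
% Training error + RKHS complexity penalty + closeness of
% unlabeled-data predictions to the reference predictions.
\min_{f_j \in \mathcal{H}} \;
\sum_{(x_i, y_i) \in D_l} \ell\big(f_j(x_i), y_i\big)
\;+\; \gamma_A \,\lVert f_j \rVert_{\mathcal{H}}^2
\;+\; \gamma_I \sum_{x_u \in D_u} \big(f_j(x_u) - \hat{y}_u\big)^2
```

The first sum runs over the labeled set, the norm term penalizes complex functions in H, and the final sum ties predictions on the unlabeled set D_u to the reference predictions.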
unlabeled data is mentioned in 10 sentences in this paper.
Topics mentioned in this paper:
Fukumoto, Fumiyo and Suzuki, Yoshimi and Matsuyoshi, Suguru
Abstract
This paper addresses the problem of dealing with a collection of labeled training documents, especially annotating negative training documents, and presents a method of text classification from positive and unlabeled data.
Conclusion
The research described in this paper involved text classification using positive and unlabeled data .
Experiments
The rest of the positive and negative documents are used as unlabeled data.
Experiments
The number of positive training data in the other three methods depends on the value of δ, and the rest of the positive and negative documents were used as unlabeled data.
Experiments
Our goal is to achieve classification accuracy from only positive documents and unlabeled data as high as that from labeled positive and negative data.
Framework of the System
First, we randomly select documents from unlabeled data (U) where the number of documents is equal to that of the initial positive training documents (P1).
Framework of the System
For the result of correction, we train SVM classifiers, and classify the remaining unlabeled data (U \ N1).
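The sample-then-classify step described in these excerpts can be sketched generically. The callback-based design, names, and the toy threshold learner used in the test are illustrative assumptions, not the paper's SVM setup:

```python
import random

def pu_bootstrap(positives, unlabeled, train, seed=0):
    """Draw |P| pseudo-negatives from the unlabeled pool, train a
    classifier on positives vs. pseudo-negatives via the supplied
    `train` callback, and label the remaining unlabeled examples."""
    rng = random.Random(seed)
    pseudo_neg = rng.sample(unlabeled, min(len(positives), len(unlabeled)))
    classify = train(positives, pseudo_neg)
    remaining = [x for x in unlabeled if x not in pseudo_neg]
    return [(x, classify(x)) for x in remaining]
```

The newly labeled examples can then be folded back into the training set and the step repeated, as the surrounding excerpts describe.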
Introduction
Several authors have attempted to improve classification accuracy using only positive and unlabeled data (Yu et al., 2002; Ho et al., 2011).
Introduction
Our goal is to eliminate the need for manually collecting training documents, and hopefully achieve classification accuracy from positive and unlabeled data as high as that from labeled positive and labeled negative data.
Introduction
Like much previous work on semi-supervised ML, we apply SVM to the positive and unlabeled data, and add the classification results to the training data.
unlabeled data is mentioned in 10 sentences in this paper.
Topics mentioned in this paper:
Celikyilmaz, Asli and Hakkani-Tur, Dilek and Tur, Gokhan and Sarikaya, Ruhi
Experiments
As for unlabeled data we crawled the web and collected around 100,000 questions that are similar in style and length to the ones in QuestionBank, e.g.
Experiments
A CRF model is used to decode the unlabeled data to generate more labeled examples for retraining.
Experiments
smooth the semantic tag posteriors of the unlabeled data decoded by the CRF model using Eq.
Introduction
To the best of our knowledge, our work is the first to explore the unlabeled data to iteratively adapt the semantic tagging models for target domains, preserving information from the previous iterations.
Related Work and Motivation
Adapting the source domain using unlabeled data is the key to achieving good performance across domains.
Semi-Supervised Semantic Labeling
The last term is the loss on unlabeled data from the target domain with a hyper-parameter τ.
Semi-Supervised Semantic Labeling
After we decode the unlabeled data, we retrain a new CRF model at each iteration.
Semi-Supervised Semantic Labeling
Each iteration makes predictions on the semantic tags of unlabeled data with varying posterior probabilities.
unlabeled data is mentioned in 10 sentences in this paper.
Topics mentioned in this paper:
Meng, Xinfan and Wei, Furu and Liu, Xiaohua and Zhou, Ming and Xu, Ge and Wang, Houfeng
Conclusion and Future Work
First, the proposed model can learn previously unseen sentiment words from large unlabeled data , which are not covered by the limited vocabulary in machine translation of the labeled data.
Cross-Lingual Mixture Model for Sentiment Classification
where λ_s and λ_t are weighting factors to control the influence of the unlabeled data.
Cross-Lingual Mixture Model for Sentiment Classification
We set λ_i to λ_s (λ_t) when d_i belongs to the unlabeled data, and to 1 otherwise.
Cross-Lingual Mixture Model for Sentiment Classification
When d_i belongs to the unlabeled data, P(c_j|d_i) is computed according to Equation 5 or 6.
Experiment
Larger weights indicate larger influence from the unlabeled data .
Experiment
Second, the two classifiers make prediction on Chinese unlabeled data and their English translation,
Experiment
Figure 3: Accuracy with different sizes of unlabeled data for NTCIR-EN+NTCIR-CH
Related Work
After that, co-training (Blum and Mitchell, 1998) approach is adopted to leverage Chinese unlabeled data and their English translation to improve the SVM classifier for Chinese sentiment classification.
unlabeled data is mentioned in 10 sentences in this paper.
Topics mentioned in this paper:
Haffari, Gholamreza and Sarkar, Anoop
AL-SMT: Multilingual Setting
(Ueffing et al., 2007; Haffari et al., 2009) show that treating [0+ as a source for a new feature function in a log-linear model for SMT (Och and Ney, 2004) allows us to maximally take advantage of unlabeled data by finding a weight for this feature using minimum error-rate training (MERT) (Och, 2003).
Introduction
In self-training each MT system is retrained using human labeled data plus its own noisy translation output on the unlabeled data .
Related Work
(Reichart et al., 2008) introduces multitask active learning where unlabeled data require annotations for multiple tasks, e.g. they consider named entities and parse trees, and showed that multiple tasks help selection compared to individual tasks.
Sentence Selection: Multiple Language Pairs
Using this method, we rank the entries in unlabeled data U for each translation task defined by a language pair (F_d, E). This results in several ranking lists, each of which represents the importance of entries with respect to a particular translation task.
Sentence Selection: Single Language Pair
The more frequent a phrase (not a phrase pair) is in the unlabeled data, the more important it is to know its translation, since it is more likely to be seen in test data (especially when the test data is in-domain with respect to the unlabeled data).
Sentence Selection: Single Language Pair
In the labeled data L, phrases are the ones which are extracted by the SMT models; but what are the candidate phrases in the unlabeled data U?
unlabeled data is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Prettenhofer, Peter and Stein, Benno
Abstract
We report on analyses that reveal quantitative insights about the use of unlabeled data and the complexity of inter-language correspondence modeling.
Cross-Language Structural Correspondence Learning
Note that the support of wS and wT can be determined from the unlabeled data Du.
Cross-Language Structural Correspondence Learning
Input: Labeled source data DS, unlabeled data Du = DS,u ∪ DT,u
Experiments
Due to the use of task-specific, unlabeled data , relevant characteristics are captured by the pivot classifiers.
Experiments
Unlabeled Data The first row of Figure 2 shows the performance of CL-SCL as a function of the ratio of labeled and unlabeled documents.
Related Work
In the basic domain adaptation setting we are given labeled data from the source domain and unlabeled data from the target domain, and the goal is to train a classifier for the target domain.
Related Work
SCL then models the correlation between the pivots and all other features by training linear classifiers on the unlabeled data from both domains.
Related Work
Ando and Zhang (2005b) present a semi-supervised learning method based on this paradigm, which generates related tasks from unlabeled data .
unlabeled data is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Turian, Joseph and Ratinov, Lev-Arie and Bengio, Yoshua
Introduction
By using unlabeled data to reduce data sparsity in the labeled training data, semi-supervised approaches improve generalization accuracy.
Unlabeled Data
Unlabeled data is used for inducing the word representations.
Unlabeled Data
(2009), we found that all word representations performed better on the supervised task when they were induced on the clean unlabeled data , both embeddings and Brown clusters.
Unlabeled Data
Note that cleaning is applied only to the unlabeled data , not to the labeled data used in the supervised tasks.
unlabeled data is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Sun, Weiwei and Uszkoreit, Hans
Abstract
Paradigmatic lexical relations are explicitly captured by word clustering on large-scale unlabeled data and are used to design new features to enhance a discriminative tagger.
Capturing Paradigmatic Relations via Word Clustering
Brown Clustering Our first choice is the bottom-up agglomerative word clustering algorithm of (Brown et al., 1992) which derives a hierarchical clustering of words from unlabeled data .
Capturing Paradigmatic Relations via Word Clustering
The large-scale unlabeled data we use in our experiments comes from the Chinese Gigaword (LDC2005T14).
Capturing Paradigmatic Relations via Word Clustering
Furthermore, as shown in (Sun and Xu, 2011), appropriate string knowledge acquired from large-scale unlabeled data can significantly enhance a supervised model, especially for the prediction of out-of-vocabulary (OOV) words.
Introduction
First, we employ unsupervised word clustering to explore paradigmatic relations that are encoded in large-scale unlabeled data .
unlabeled data is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Zeng, Xiaodong and Chao, Lidia S. and Wong, Derek F. and Trancoso, Isabel and Tian, Liang
Experiments
by the character-based alignment (VES-NO-GP), and the graph propagation (VES-GP-PL), are regarded as virtual evidences to bias CRFs model’s learning on the unlabeled data .
Methodology
Our learning problem belongs to semi-supervised learning (SSL), as the training is done on treebank labeled data (XL, YL) = {(x1, y1), ..., (xl, yl)} and bilingual unlabeled data XU = {x1, ..., xu}, where xi = {x1, ..., xm} is an input word sequence and yi = {y1, ..., ym}, y ∈ T, is its corresponding label sequence.
Methodology
In our setting, the CRFs model is required to learn from unlabeled data .
Methodology
This work employs the posterior regularization (PR) framework (Ganchev et al., 2010) to bias the CRFs model’s learning on unlabeled data , under a constraint encoded by the graph propagation expression.
Related Work
(2014), proposed GP for inferring the label information of unlabeled data , and then leveraged these GP outcomes to learn a semi-supervised scalable model (e.g., CRFs).
Related Work
One of our main objectives is to bias CRFs model’s learning on unlabeled data , under a nonlinear GP constraint encoding the bilingual knowledge.
Related Work
(2008) described constraint driven learning (CODL) that augments model learning on unlabeled data by adding a cost for violating expectations of constraint features designed by domain knowledge.
unlabeled data is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Dasgupta, Sajib and Ng, Vincent
Evaluation
We employ as our second baseline a transductive SVM trained using 100 points randomly sampled from the training folds as labeled data and the remaining 1900 points as unlabeled data .
Evaluation
This could be attributed to (1) the unlabeled data , which may have provided the transductive learner with useful information that is not accessible to the other learners, and (2) the ensemble, which is more noise-tolerant to the imperfect seeds.
Evaluation
Specifically, we used the 500 seeds to guide the selection of active learning points, but trained a transductive SVM using only the active learning points as labeled data (and the rest as unlabeled data ).
Our Approach
Given that we now have a labeled set (composed of 100 manually labeled points selected by active learning and 500 unambiguous points) as well as a larger set of points that are yet to be labeled (i.e., the remaining unlabeled points in the training folds and those in the test fold), we aim to train a better classifier by using a weakly supervised learner to learn from both the labeled and unlabeled data .
Our Approach
points in Lj, where i ≠ j) as unlabeled data .
Our Approach
Since the points in the test fold are included in the unlabeled data , they are all classified in this step.
unlabeled data is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Lin, Dekang and Wu, Xiaoyun
Discussion and Related Work
In earlier work on semi-supervised learning, e.g., (Blum and Mitchell, 1998), the classifiers learned from unlabeled data were used directly.
Discussion and Related Work
Recent research shows that it is better to use whatever is learned from the unlabeled data as features in a discriminative classifier.
Discussion and Related Work
Wong and Ng (2007) and Suzuki and Isozaki (2008) are similar in that they run a baseline discriminative classifier on unlabeled data to generate pseudo examples, which are then used to train a different type of classifier for the same problem.
Introduction
two-stage strategy: first create word clusters with unlabeled data and then use the clusters as features in supervised training.
unlabeled data is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Celikyilmaz, Asli and Hakkani-Tur, Dilek
Experiments
At each random selection, the rest of the utterances are used as unlabeled data to boost the performance of MCM.
Experiments
Being Bayesian, our model can incorporate unlabeled data at training time.
Experiments
Here, we evaluate the performance gain on domain, act and slot predictions as more unlabeled data is introduced at learning time.
unlabeled data is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Niu, Zheng-Yu and Wang, Haifeng and Wu, Hua
Abstract
Evaluation on the Penn Chinese Treebank indicates that a converted dependency treebank helps constituency parsing and the use of unlabeled data by self-training further increases parsing f-score to 85.2%, resulting in 6% error reduction over the previous best result.
Experiments of Parsing
4.3 Using Unlabeled Data for Parsing
Experiments of Parsing
Recent studies on parsing indicate that the use of unlabeled data by self-training can help parsing on the WSJ data, even when labeled data is relatively large (McClosky et al., 2006a; Reichart and Rappoport, 2007).
Experiments of Parsing
2000) (PDC) as unlabeled data for parsing.
unlabeled data is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Yang, Bishan and Cardie, Claire
Experiments
In the supervised setting, we treated the test data as unlabeled data and performed transductive learning.
Experiments
In the semi-supervised setting, our unlabeled data consists of
Experiments
both the available unlabeled data and the test data.
Introduction
In this paper, we propose a sentence-level sentiment classification method that can (1) incorporate rich discourse information at both local and global levels; (2) encode discourse knowledge as soft constraints during learning; (3) make use of unlabeled data to enhance learning.
unlabeled data is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Huang, Fei and Yates, Alexander
Experiments
No peak in performance is reached, so further improvements are possible with more unlabeled data .
Experiments
Thus smoothing is optimizing performance for the case where unlabeled data is plentiful and labeled data is scarce, as we would hope.
Related Work
Several researchers have previously studied methods for using unlabeled data for tagging and chunking, either alone or as a supplement to labeled data.
Related Work
Unlike these systems, our efforts are aimed at using unlabeled data to find distributional representations that work well on rare terms, making the supervised systems more applicable to other domains and decreasing their sample complexity.
Smoothing Natural Language Sequences
Using Expectation-Maximization (Dempster et al., 1977), it is possible to estimate the distributions P(xi|yi) and P(yi|yi−1) from unlabeled data .
unlabeled data is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Zhang, Longkai and Li, Li and He, Zhengyan and Wang, Houfeng and Sun, Ni
Experiment
To keep the experiment tractable, we first randomly choose 50,000 of all the texts as unlabeled data , which contain 2,420,037 characters.
Experiment
We also experimented on different sizes of unlabeled data to evaluate the performance when adding unlabeled target-domain data.
Experiment
TABLE 5 shows different f-scores and OOV-recalls on different unlabeled data sets.
unlabeled data is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Lan, Man and Xu, Yu and Niu, Zhengyu
Implementation Details of Multitask Learning Method
BLLIP North American News Text (Complete) is used as the unlabeled data source to generate synthetic labeled data.
Related Work
Due to the lack of benchmark data for implicit discourse relation analysis, earlier work used unlabeled data to generate synthetic implicit discourse data.
Related Work
Research work in this category exploited both labeled and unlabeled data for discourse relation prediction.
Related Work
(Hernault et al., 2010) presented a semi-supervised method based on the analysis of co-occurring features in labeled and unlabeled data .
unlabeled data is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Li, Tao and Zhang, Yi and Sindhwani, Vikas
Abstract
We propose a novel approach to learn from lexical prior knowledge in the form of domain-independent sentiment-laden terms, in conjunction with domain-dependent unlabeled data and a few labeled documents.
Conclusion
To more effectively utilize unlabeled data and induce domain-specific adaptation of our models, several extensions are possible: facilitating learning from related domains, incorporating hyperlinks between documents, incorporating synonyms or co-occurrences between words, etc.
Incorporating Lexical Knowledge
It should be noted that this list was constructed without a specific domain in mind, which is further motivation for using training examples and unlabeled data to learn domain-specific connotations.
Related Work
The goal of the former theme is to learn from few labeled examples by making use of unlabeled data , while the goal of the latter theme is to utilize weak prior knowledge about term-class affinities (e.g., the term “awful” indicates negative sentiment and therefore may be considered as a negatively labeled feature).
unlabeled data is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Volkova, Svitlana and Wilson, Theresa and Yarowsky, David
Abstract
Starting with a domain-independent, high-precision sentiment lexicon and a large pool of unlabeled data , we bootstrap Twitter-specific sentiment lexicons, using a small amount of labeled data to guide the process.
Lexicon Bootstrapping
To create a Twitter-specific sentiment lexicon for a given language, we start with a general-purpose, high-precision sentiment lexicon and bootstrap from the unlabeled data (BOOT) using the labeled development data (DEV) to guide the process.
Lexicon Bootstrapping
On each iteration i ≥ 1, tweets in the unlabeled data are labeled using the lexicon
Related Work
Corpus-based methods extract subjectivity and sentiment lexicons from large amounts of unlabeled data using different similarity metrics to measure the relatedness between words.
unlabeled data is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Persing, Isaac and Ng, Vincent
Abstract
To improve the performance of a cause identification system for the minority classes, we present a bootstrapping algorithm that automatically augments a training set by learning from a small amount of labeled data and a large amount of unlabeled data .
Evaluation
To get a sense of the accuracy of the bootstrapped documents without further manual labeling, recall that our experimental setup resembles a transductive setting where the test documents are part of the unlabeled data , and consequently, some of them may have been automatically labeled by the bootstrapping algorithm.
Our Bootstrapping Algorithm
To alleviate the data scarcity problem and improve the accuracy of the classifiers, we propose in this section a bootstrapping algorithm that automatically augments a training set by exploiting a large amount of unlabeled data .
Related Work
Minority classes can be expanded without the availability of unlabeled data as well.
unlabeled data is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Zhao, Hai and Song, Yan and Kit, Chunyu and Zhou, Guodong
The Related Work
Typical domain adaptation tasks often assume that annotated data in the new domain are absent or insufficient and that large-scale unlabeled data are available.
The Related Work
Where unlabeled data are concerned, semi-supervised or unsupervised methods will naturally be adopted.
The Related Work
The first usually focuses on exploiting automatically generated labeled data from the unlabeled data (Steedman et al., 2003; McClosky et al., 2006; Reichart and Rappoport, 2007; Sagae and Tsujii, 2007; Chen et al., 2008); the second combines supervised and unsupervised methods, where only unlabeled data are considered (Smith and Eisner, 2006; Wang and Schuurmans, 2008; Koo et al., 2008).
unlabeled data is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Dhillon, Paramveer S. and Talukdar, Partha Pratim and Crammer, Koby
A ← METRICLEARNER(X, S, D)
In this case, we treat the set of test instances (without their gold labels) as the unlabeled data .
Introduction
IDML-IT (Dhillon et al., 2010) is another such method which exploits labeled as well as unlabeled data during metric learning.
Metric Learning
The ITML metric learning algorithm, which we reviewed in Section 2.2, is supervised in nature, and hence it does not exploit widely available unlabeled data .
Metric Learning
In this section, we review Inference Driven Metric Learning (IDML) (Algorithm 1) (Dhillon et al., 2010), a recently proposed metric learning framework which combines an existing supervised metric learning algorithm (such as ITML) with transductive graph-based label inference to learn a new distance metric from both labeled and unlabeled data .
unlabeled data is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Titov, Ivan and Klementiev, Alexandre
Inference
An inference algorithm for an unsupervised model should be efficient enough to handle vast amounts of unlabeled data , as it can easily be obtained and is likely to improve results.
Introduction
This suggests that the models scale to much larger corpora, which is an important property for a successful unsupervised learning method, as unlabeled data is abundant.
Related Work
Most of the SRL research has focused on the supervised setting, however, lack of annotated resources for most languages and insufficient coverage provided by the existing resources motivates the need for using unlabeled data or other forms of weak supervision.
Related Work
This includes methods based on graph alignment between labeled and unlabeled data (Furstenau and Lapata, 2009), using unlabeled data to improve lexical generalization (Deschacht and Moens, 2009), and projection of annotation across languages (Pado and Lapata, 2009; van der Plas et al., 2011).
unlabeled data is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
LIU, Xiaohua and ZHANG, Shaodian and WEI, Furu and ZHOU, Ming
Introduction
(2010), which introduces a high-level rule language, called NERL, to build general and domain-specific NER systems; and 2) semi-supervised learning, which aims to use the abundant unlabeled data to compensate for the lack of annotated data.
Related Work
Semi-supervised learning exploits both labeled and unlabeled data .
Related Work
It proves useful when labeled data is scarce and hard to construct while unlabeled data is abundant and easy to access.
Related Work
Another representative of semi-supervised learning is learning a robust representation of the input from unlabeled data .
unlabeled data is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Pei, Wenzhe and Ge, Tao and Chang, Baobao
Experiment
Previous work found that the performance can be improved by pre-training the character embeddings on large unlabeled data and using the obtained embeddings to initialize the character lookup table instead of random initialization.
Experiment
There are several ways to learn the embeddings on unlabeled data .
Experiment
(2013), the bigram embeddings are pre-trained on unlabeled data with character embeddings, which significantly improves the model performance.
unlabeled data is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Zhou, Guangyou and Zhao, Jun and Liu, Kang and Cai, Li
Experiments
We also compare our method with the semi-supervised approaches, which achieved very high accuracies by directly leveraging large unlabeled data in the systems for joint learning and decoding, while in our method we only explore the N-gram features to further improve supervised dependency parsing performance.
Experiments
Some previous studies also found a log-linear relationship between the amount of unlabeled data and performance (Suzuki and Isozaki, 2008; Suzuki et al., 2009; Bergsma et al., 2010; Pitler et al., 2010).
Web-Derived Selectional Preference Features
All of our selectional preference features described in this paper rely on probabilities derived from unlabeled data .
unlabeled data is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Hovy, Dirk
Abstract
The results suggest that it is possible to learn domain-specific entity types from unlabeled data .
Conclusion
We evaluated an approach to learning domain-specific interpretable entity types from unlabeled data .
Introduction
• we empirically evaluate an approach to learning types from unlabeled data
unlabeled data is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Gormley, Matthew R. and Mitchell, Margaret and Van Durme, Benjamin and Dredze, Mark
Discussion and Future Work
We utilize unlabeled data in both generative and discriminative models for dependency syntax and in generative word clustering.
Discussion and Future Work
Our discriminative joint models treat latent syntax as a structured feature to be optimized for the end task of SRL, while our other grammar induction techniques optimize for unlabeled data likelihood, optionally with distant supervision.
Discussion and Future Work
We observe that careful use of these unlabeled data resources can improve performance on the end task.
unlabeled data is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Labutov, Igor and Lipson, Hod
Abstract
Recently, with an increase in computing resources, it became possible to learn rich word embeddings from massive amounts of unlabeled data .
Introduction
Traditionally, we would learn the embeddings for the target task jointly with whatever unlabeled data we may have, in an instance of semi-supervised learning, and/or we may leverage labels from multiple other related tasks in a multitask approach.
Related Work
Our method is different in that the (potentially) massive amount of unlabeled data is not required a priori, but only the resultant embedding.
unlabeled data is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Koo, Terry and Carreras, Xavier and Collins, Michael
Introduction
In general, semi-supervised learning can be motivated by two concerns: first, given a fixed amount of supervised data, we might wish to leverage additional unlabeled data to facilitate the utilization of the supervised corpus, increasing the performance of the model in absolute terms.
Introduction
Second, given a fixed target performance level, we might wish to use unlabeled data to reduce the amount of annotated data necessary to reach this target.
Related Work
Crucially, however, these methods do not exploit unlabeled data when learning their representations.
unlabeled data is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Kobdani, Hamidreza and Schuetze, Hinrich and Schiehlen, Michael and Kamp, Hans
Related Work
Co-training puts features into disjoint subsets when learning from labeled and unlabeled data and tries to leverage this split for better performance.
Results and Discussion
For an unsupervised approach, which only needs unlabeled data , there is little cost to creating large training sets.
System Architecture
Unlabeled Data
unlabeled data is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Jiang, Qixia and Sun, Maosong
Introduction
This is implemented by maximizing the empirical accuracy on the prior knowledge (labeled data) and the entropy of hash functions (estimated over labeled and unlabeled data ).
Semi-Supervised SimHash
Let XL = {(x1, c1), ..., (xu, cu)} be the labeled data, c ∈ {1...C}, x ∈ RM, and XU = {xu+1, ..., xN} the unlabeled data .
Semi-Supervised SimHash
the labeled and unlabeled data , XL and XU.
unlabeled data is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Kulkarni, Anagha and Callan, Jamie
Data
The set of 3,123 words that were not annotated was the unlabeled data for the EM algorithm.
Experiments and Results
Our post-experimental analysis reveals that the parameter update process using the unlabeled data has the effect of overly separating the two overlapping distributions.
Finding the Homographs in a Lexicon
In Model II, the semi-supervised setup, the training data is used to initialize the Expectation-Maximization (EM) algorithm (Dempster et al., 1977) and the unlabeled data , described in Section 3.1, updates the initial estimates.
unlabeled data is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: