Abstract | Instead of using only 1-best parse trees as in previous work, our core idea is to utilize parse forests (ambiguous labelings) to combine multiple 1-best parse trees generated by diverse parsers on unlabeled data.
Abstract | With a conditional random field-based probabilistic dependency parser, our training objective is to maximize the mixed likelihood of labeled data and auto-parsed unlabeled data with ambiguous labelings.
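Abstract | A plausible form of this mixed objective (a sketch reconstructed from the description above, not necessarily the authors' exact formula): labeled sentences contribute the standard CRF log-likelihood, while each auto-parsed sentence contributes the marginal probability of its ambiguous forest Y(x):

```latex
\mathcal{L}(\theta) \;=\; \sum_{(x,\,y)\,\in\,\mathcal{D}_L} \log p_\theta(y \mid x)
\;+\; \sum_{(x,\,\mathcal{Y}(x))\,\in\,\mathcal{D}_U} \log \sum_{y\,\in\,\mathcal{Y}(x)} p_\theta(y \mid x)
```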
Introduction | In contrast, semi-supervised approaches, which can make use of large-scale unlabeled data, have attracted more and more interest.
Introduction | Previously, unlabeled data was explored to derive useful local-context features such as word clusters (Koo et al., 2008), subtree frequencies (Chen et al., 2009; Chen et al., 2013), and word co-occurrence counts (Zhou et al., 2011; Bansal and Klein, 2011).
Introduction | A few effective learning methods have also been proposed for dependency parsing to implicitly utilize distributions on unlabeled data (Smith and Eisner, 2007; Wang et al., 2008; Suzuki et al., 2009).
Abstract | We consider a semi-supervised setting for domain adaptation where only unlabeled data is available for the target domain. |
Empirical Evaluation | For every pair, the semi-supervised methods use labeled data from the source domain and unlabeled data from both domains. |
Empirical Evaluation | Also, it is important to point out that the SCL method uses auxiliary tasks to induce the shared feature representation; these tasks are constructed on the basis of unlabeled data.
Introduction | In addition to the labeled data from the source domain, they also exploit small amounts of labeled data and/or unlabeled data from the target domain to estimate a more predictive model for the target domain. |
Introduction | In this paper we focus on a more challenging and arguably more realistic version of the domain-adaptation problem where only unlabeled data is available for the target domain. |
Introduction | (2006) use auxiliary tasks based on unlabeled data for both domains (called pivot features) and a dimensionality reduction technique to induce such shared representation. |
Learning and Inference | Intuitively, maximizing the likelihood of unlabeled data is closely related to minimizing the reconstruction error, that is, training a model to discover mapping parameters u such that z encodes all the necessary information to accurately reproduce x^(i) from z for every training example x^(i).
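Learning and Inference | To make the reconstruction view concrete, here is a minimal sketch (assumptions: a one-layer sigmoid encoder, a linear decoder, and squared error; this is an illustration, not the paper's actual model):

```python
import numpy as np

def reconstruction_error(X, W_enc, b_enc, W_dec, b_dec):
    """Mean squared reconstruction error of a toy autoencoder: the latent
    representation z must retain enough information to reproduce each x."""
    Z = 1.0 / (1.0 + np.exp(-(X @ W_enc + b_enc)))  # latent representation z
    X_hat = Z @ W_dec + b_dec                       # decode z back to input space
    return ((X - X_hat) ** 2).mean()
```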
The Latent Variable Model | and unlabeled data for the source and target domain {x^(l)}, l ∈ SU ∪ TU, where SU and TU stand for the unlabeled datasets for the source and target domains, respectively.
The Latent Variable Model | However, given that, first, the amount of unlabeled data |SU ∪ TU| normally vastly exceeds the amount of labeled data |SL| and, second, the number of features for each example |x^(l)| is usually large, the label y will have only a minor effect on the mapping from the initial features x to the latent representation z (i.e.
A Joint Model with Unlabeled Parallel Text | where y_i' is the unobserved class label for the i-th instance in the unlabeled data.
A Joint Model with Unlabeled Parallel Text | By further considering the weight to ascribe to the unlabeled data vs. the labeled data (and the weight for the L2-norm regularization), we get the following regularized joint log likelihood to be maximized: |
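A Joint Model with Unlabeled Parallel Text | (A plausible reconstruction of the displayed objective from the description below; the sum over the unobserved labels y' follows the earlier definition, and the exact notation is an assumption:)

```latex
\mathcal{L}(\theta) \;=\; \sum_{(x,\,y)\,\in\,D_1 \cup D_2} \log p_\theta(y \mid x)
\;+\; \lambda_1 \sum_{i\,\in\,U} \log \sum_{y'} p_\theta\big(y',\, x_i^{(1)},\, x_i^{(2)}\big)
\;-\; \lambda_2 \lVert \theta \rVert_2^2
```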
A Joint Model with Unlabeled Parallel Text | where the first term on the right-hand side is the log likelihood of the labeled data from both D1 and D2; the second is the log likelihood of the unlabeled parallel data U, multiplied by λ1 ≥ 0, a constant that controls the contribution of the unlabeled data; and λ2 ≥ 0 is a regularization constant that penalizes model complexity or large feature weights.
Abstract | Experiments on multiple data sets show that the proposed approach (1) outperforms the monolingual baselines, significantly improving the accuracy for both languages by 3.44%-8.12%; (2) outperforms two standard approaches for leveraging unlabeled data; and (3) produces (albeit smaller) performance gains when employing pseudo-parallel data from machine translation engines.
Experimental Setup 4.1 Data Sets and Preprocessing | MaxEnt: This method learns a MaxEnt classifier for each language given the monolingual labeled data; the unlabeled data is not used. |
Experimental Setup 4.1 Data Sets and Preprocessing | SVM: This method learns an SVM classifier for each language given the monolingual labeled data; the unlabeled data is not used. |
Introduction | maximum entropy and SVM classifiers) as well as two alternative methods for leveraging unlabeled data (transductive SVMs (Joachims, 1999b) and co-training (Blum and Mitchell, 1998)). |
Introduction | To our knowledge, this is the first multilingual sentiment analysis study to focus on methods for simultaneously improving sentiment classification for a pair of languages based on unlabeled data rather than resource adaptation from one language to another. |
Related Work | Another line of related work is semi-supervised learning, which combines labeled and unlabeled data to improve the performance of the task of interest (Zhu and Goldberg, 2009). |
Experiments | By integrating unlabeled data, the manifold model under setting (1) made a 15% improvement over the linear regression model on F1 score, where the improvement was significant across all relations.
Experiments | This tells us that estimating the label of unlabeled examples based upon the sampling result is one way to utilize unlabeled data and may help improve the relation extraction results. |
Experiments | On one hand, this result shows that using more unlabeled data can further improve the result. |
Identifying Key Medical Relations | Part of the resulting data was manually vetted by our annotators, and the remainder was held out as unlabeled data for further experiments.
Introduction | • From the perspective of relation extraction methodologies, we present a manifold model for relation extraction utilizing both labeled and unlabeled data.
Relation Extraction with Manifold Models | Given a few labeled examples and many unlabeled examples for a relation, we want to build a relation detector leveraging both labeled and unlabeled data.
Relation Extraction with Manifold Models | Integration of the unlabeled data can help solve overfitting problems when the labeled data is not sufficient.
Relation Extraction with Manifold Models | When μ = 0, the model disregards the unlabeled data, and the data manifold topology is not respected.
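Relation Extraction with Manifold Models | For reference, a standard manifold-regularization objective in which μ plays exactly this role (a sketch in the style of Belkin et al.'s Laplacian regularization; the notation here is an assumption, not necessarily the authors' exact formula):

```latex
\min_{f}\;\; \frac{1}{n_l}\sum_{i=1}^{n_l}\big(f(x_i)-y_i\big)^2
\;+\; \gamma\,\lVert f\rVert_K^2
\;+\; \mu\,\mathbf{f}^{\top} L\, \mathbf{f},
\qquad \mathbf{f} = \big(f(x_1),\dots,f(x_{n_l+n_u})\big)^{\top}
```

Relation Extraction with Manifold Models | Here L is the graph Laplacian built over both labeled and unlabeled examples; setting μ = 0 removes the only term through which the unlabeled data and the manifold topology enter the objective.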
Abstract | We automatically create a sentiment sensitive thesaurus using both labeled and unlabeled data from multiple source domains to find the association between words that express similar sentiments in different domains. |
Experiments | Figure 4: Effect of source domain unlabeled data.
Experiments | The amount of unlabeled data is held constant, so that any change in classification accuracy
Experiments | Figure 5: Effect of target domain unlabeled data.
Introduction | positive or negative sentiment) given a small set of labeled data for the source domain, and unlabeled data for both source and target domains. |
Introduction | We use labeled data from multiple source domains and unlabeled data from source and target domains to represent the distribution of features. |
Introduction | Unlabeled data is cheaper to collect compared to labeled data and is often available in large quantities. |
Introduction | Since the unlabeled data is ample and easy to collect, a successful semi-supervised sentiment classification system would significantly minimize the involvement of labor and time. |
Introduction | Therefore, given the two different views mentioned above, one promising application is to adopt them in co-training algorithms, which have been proven to be an effective semi-supervised learning strategy for incorporating unlabeled data to further improve the classification performance (Zhu, 2005).
Introduction | Finally, a co-training algorithm is proposed to incorporate unlabeled data for semi-supervised sentiment classification. |
Related Work | Semi-supervised methods combine unlabeled data with labeled training data (often small-scale) to improve the models.
Unsupervised Mining of Personal and Impersonal Views | Semi-supervised learning is a strategy which combines unlabeled data with labeled training data to improve the models. |
Unsupervised Mining of Personal and Impersonal Views | The co-training algorithm is a specific semi-supervised learning approach which starts with a set of labeled data and increases the amount of labeled data using the unlabeled data by bootstrapping (Blum and Mitchell, 1998). |
Unsupervised Mining of Personal and Impersonal Views | The unlabeled data U contains a personal sentence set and an impersonal sentence set
Abstract | Semi-supervised learning (SSL) methods augment standard machine learning (ML) techniques to leverage unlabeled data.
Abstract | However, existing SSL techniques typically require multiple passes over the entirety of the unlabeled data, meaning the techniques are not applicable to the large corpora being produced today.
Abstract | In this paper, we show that improving marginal word frequency estimates using unlabeled data can enable semi-supervised text classification that scales to massive unlabeled data sets. |
Introduction | Semi-supervised Learning (SSL) is a Machine Learning (ML) approach that utilizes large amounts of unlabeled data, combined with a smaller amount of labeled data, to learn a target function (Zhu, 2006; Chapelle et al., 2006).
Introduction | Typically, for each target concept to be learned, a semi-supervised classifier is trained using iterative techniques that execute multiple passes over the unlabeled data (e.g., Expectation-Maximization (Nigam et al., 2000) or Label Propagation (Zhu and Ghahramani, 2002)).
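Introduction | As an illustration of the iterative, multiple-pass nature of such techniques, here is a minimal sketch of Zhu & Ghahramani-style label propagation (a dense numpy version for clarity; the names and dense representation are assumptions):

```python
import numpy as np

def label_propagation(W, Y_init, labeled_mask, n_iter=100):
    """Label propagation sketch (after Zhu & Ghahramani, 2002).
    W: (n, n) symmetric similarity matrix.
    Y_init: (n, k) rows are one-hot for labeled nodes, zero elsewhere.
    labeled_mask: boolean (n,) marking the labeled nodes."""
    row_sums = np.maximum(W.sum(axis=1, keepdims=True), 1e-12)
    Y = Y_init.astype(float).copy()
    for _ in range(n_iter):
        Y = (W @ Y) / row_sums                  # propagate label mass to neighbors
        Y[labeled_mask] = Y_init[labeled_mask]  # clamp the labeled nodes
    return Y / np.maximum(Y.sum(axis=1, keepdims=True), 1e-12)
```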
Introduction | Instead of utilizing unlabeled examples directly for each given target concept, our approach is to pre-compute a small set of statistics over the unlabeled data in advance. |
Problem Definition | MNB-FM attempts to improve MNB’s estimates of θw+ and θw−, using statistics computed over the unlabeled data.
Problem Definition | Equation 2 can be estimated in advance, without knowledge of the target class, simply by counting the number of tokens of each word in the unlabeled data.
Problem Definition | The above example illustrates how MNB-FM can leverage frequency marginal statistics computed over unlabeled data to improve MNB’s conditional probability estimates. |
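Problem Definition | A minimal sketch of the frequency-marginal idea (this illustrates only the constraint P(w) = θw+ P(+) + θw− P(−); the actual MNB-FM optimization is more involved, and all names here are hypothetical):

```python
from collections import Counter

def frequency_marginals(unlabeled_docs):
    """One pass over the unlabeled corpus: token-frequency marginals P(w)."""
    counts, total = Counter(), 0
    for doc in unlabeled_docs:
        for tok in doc:
            counts[tok] += 1
            total += 1
    return {w: c / total for w, c in counts.items()}

def implied_negative_conditional(theta_pos, prior_pos, marginal):
    """Solve the constraint P(w) = P(w|+)P(+) + P(w|-)P(-) for P(w|-),
    clipping to keep a valid probability."""
    prior_neg = 1.0 - prior_pos
    return {w: min(max((p_w - theta_pos.get(w, 0.0) * prior_pos) / prior_neg,
                       1e-9), 1.0)
            for w, p_w in marginal.items()}
```

Problem Definition | The marginal P(w) is the small set of statistics that can be precomputed once over the unlabeled data and reused for any target class.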
Abstract | One constructs a nearest-neighbor similarity graph over all trigrams of labeled and unlabeled data for propagating syntactic information, i.e., label distributions. |
Abstract | The derived label distributions are regarded as virtual evidences to regularize the learning of linear conditional random fields (CRFs) on unlabeled data.
Introduction | Therefore, semi-supervised joint S&T appears to be a natural solution for easily incorporating accessible unlabeled data to improve the joint S&T model. |
Introduction | Motivated by the works in (Subramanya et al., 2010; Das and Smith, 2011), for structured problems, graph-based label propagation can be employed to propagate valuable syntactic information (n-gram-level label distributions) from labeled data to unlabeled data.
Introduction | labeled and unlabeled data to achieve semi-supervised learning.
Method | The emphasis of this work is on building a joint S&T model based on two different kinds of data sources, labeled and unlabeled data.
Method | In essence, this learning problem can be treated as incorporating certain gainful information, e.g., prior knowledge or label constraints, of unlabeled data into the supervised model. |
Related Work | Sun and Xu (2011) enhanced a CWS model by interpolating statistical features of unlabeled data into the CRFs model. |
Related Work | (2011) proposed a semi-supervised pipeline S&T model by incorporating n-gram and lexicon features derived from unlabeled data . |
Related Work | Different from their concern, our emphasis is to learn the semi-supervised model by injecting the label information from a similarity graph constructed from labeled and unlabeled data.
Abstract | Similarly to multi-view learning, the “segmentation agreements” between the two different types of views are used to overcome the scarcity of the label information on unlabeled data.
Abstract | The agreements are regarded as a set of valuable constraints for regularizing the learning of both models on unlabeled data.
Introduction | Sun and Xu (2011) enhanced the segmentation results by interpolating the statistics-based features derived from unlabeled data into a CRFs model.
Introduction | The crux of solving the semi-supervised learning problem is the learning on unlabeled data.
Introduction | Inspired by multi-view learning that exploits redundant views of the same input data (Ganchev et al., 2008), this paper proposes a semi-supervised CWS model that co-regularizes two different views (intrinsically two different models), character-based and word-based, on unlabeled data.
Semi-supervised Learning via Co-regularizing Both Models | As mentioned earlier, the primary challenge of semi-supervised CWS concentrates on the unlabeled data.
Semi-supervised Learning via Co-regularizing Both Models | Obviously, the learning on unlabeled data does not come for “free”. |
Semi-supervised Learning via Co-regularizing Both Models | Very often, it is necessary to discover certain gainful information, e.g., label constraints of unlabeled data, that is incorporated to guide the learner toward a desired solution.
Experiments | In addition to labelled data, a large amount of unlabelled data on the web domain is also provided. |
Experiments | Statistics about labelled and unlabelled data are summarized in Table 1 and Table 2, respectively.
Introduction | The problem we face here can be considered as a special case of domain adaptation, where we have access to labelled data on the source domain (PTB) and unlabelled data on the target domain (web data). |
Introduction | The idea of learning representations from unlabelled data and then fine-tuning a model with such representations according to some supervised criterion has been studied before (Turian et al., 2010; Collobert et al., 2011; Glorot et al., 2011). |
Related Work | (2011) propose to learn representations from the mixture of both source and target domain unlabelled data to improve cross-domain sentiment classification. |
Related Work | Such high-dimensional input gives rise to high computational cost, and it is not clear whether those approaches can be applied to large-scale unlabelled data with hundreds of millions of training examples.
Related Work | The new representations are induced based on the auxiliary tasks defined on unlabelled data together with a dimensionality reduction technique. |
Experiments | We strip the dependency annotations off the training portion of each treebank, and use that as the unlabeled data for that target language.
Experiments | -U: Our approach, trained on only parallel data, without unlabeled data for the target language.
Experiments | +U: Our approach, trained on both parallel and unlabeled data.
Introduction | We train probabilistic parsing models for resource-poor languages by maximizing a combination of likelihood on parallel data and confidence on unlabeled data . |
Our Approach | Another advantage of the learning framework is that it combines both the likelihood on parallel data and confidence on unlabeled data, so that both parallel text and unlabeled data can be utilized in our approach. |
Our Approach | However, in our scenario we have no labeled training data for target languages but we have some parallel and unlabeled data plus an English dependency parser. |
Our Approach | In our scenario, we have a set of aligned parallel data P = {(x_i^s, x_i^t, a_i)}, where a_i is the word alignment for the pair of source-target sentences (x_i^s, x_i^t), and a set of unlabeled sentences of the target language U = {x_j}. We also have a trained English parsing model p_λE. Then the K in equation (7) can be divided into two cases, according to whether x_i belongs to the parallel data set P or the unlabeled data set U.
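Our Approach | One plausible shape of the combined objective described above (a sketch: the projected-parse likelihood term and the trade-off weight η are assumptions, not the authors' exact formulation):

```latex
\lambda^{*} \;=\; \arg\max_{\lambda}\;
\underbrace{\sum_{(x_i,\,a_i)\,\in\,P} \log p_{\lambda}\big(\tilde{y}_i \mid x_i\big)}_{\text{likelihood on parallel data}}
\;+\; \eta \underbrace{\sum_{x_j\,\in\,U} \max_{y}\, \log p_{\lambda}\big(y \mid x_j\big)}_{\text{confidence on unlabeled data}}
```

Our Approach | where ỹ_i denotes the dependency tree projected from the English side via the alignment a_i.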
Abstract | We implement a semi-supervised learning (SSL) approach to demonstrate that utilization of more unlabeled data points can improve the answer-ranking task of QA. |
Abstract | We create a graph for labeled and unlabeled data using match-scores of textual entailment features as similarity weights between data points. |
Experiments | We show that as we increase the amount of unlabeled data, with our graph summarization, it is feasible to extract information that can improve the performance of QA models.
Experiments | In the first part, we randomly selected subsets of the labeled training dataset X_L' ⊂ X_L with different sample sizes, {1% × n_L, 5% × n_L, 10% × n_L, 25% × n_L, 50% × n_L, 100% × n_L}, where n_L represents the sample size of X_L. At each random selection, the rest of the labeled dataset is hypothetically used as unlabeled data to verify the performance of our SSL using different sizes of labeled data.
Experiments | Table 3: The effect of the number of unlabeled data points on MRR from Hybrid graph Summarization SSL.
Graph Summarization | Using the graph-based SSL method on the new representative dataset, X' = X̄ ∪ X_Te, which is comprised of the summarized dataset X̄ = {x̄_i} as labeled data points and the testing dataset X_Te as unlabeled data points.
Graph Summarization | Since we do not know the estimated local density constraints of unlabeled data points, we use constants to construct the local density constraint column vector for the X' dataset as follows:
Introduction | Recent research indicates that using labeled and unlabeled data in a semi-supervised learning (SSL) environment, with an emphasis on graph-based methods, can improve the performance of information extraction from data for tasks such as question classification (Tri et al., 2006), web classification (Liu et al., 2006), relation extraction (Chen et al., 2006), passage-retrieval (Otterbacher et al., 2009), various natural language processing tasks such as part-of-speech tagging and named-entity recognition (Suzuki and Isozaki, 2008), and word-sense disambiguation
Introduction | We consider situations where there is much more unlabeled data, X_U, than labeled data, X_L, i.e., n_L << n_U.
Introduction | Following the common theme of “more data is better data” we also use both a limited labeled corpora and a plentiful unlabeled data resource. |
Introduction | (2006a) successfully applied self-training to parsing by exploiting available unlabeled data , and obtained remarkable results when the same technique was applied to parser adaptation (McClosky et al., 2006b). |
Introduction | However, the standard objective of an S3VM is non-convex on the unlabeled data, thus requiring sophisticated global optimization heuristics to obtain reasonable solutions.
Semi-supervised Convex Training for Structured SVM | for structured large margin training, whose objective is a combination of two convex terms: the supervised structured large margin loss on labeled data and the cheap least squares loss on unlabeled data.
Semi-supervised Structured Large Margin Objective | The objective of standard semi-supervised structured SVM is a combination of structured large margin losses on both labeled and unlabeled data.
Semi-supervised Structured Large Margin Objective | We introduce an efficient approximation—least squares loss—for the structured large margin loss on unlabeled data below. |
Abstract | We propose a confidence-based method to assign labels to unlabeled data and demonstrate improved results using this method compared to the widely used agreement-based method. |
Abstract | In our experiments on the Boston University radio news corpus, using only a small amount of the labeled data as the initial training set, our proposed labeling method combined with most confidence sample selection can effectively use unlabeled data to improve performance and finally reach performance closer to that of the supervised method using all the training data. |
Co-training strategy for prosodic event detection | Given a set L of labeled data and a set U of unlabeled data, the algorithm first creates a smaller pool U’ containing u unlabeled examples.
Co-training strategy for prosodic event detection | There are two issues: (1) an accurate self-labeling method for unlabeled data and (2) effective heuristics to select informative samples.
Co-training strategy for prosodic event detection | Given a set L of labeled training data and a set U of unlabeled data: randomly select U’ from U with |U’| = u; while iteration < k do: use L to train classifiers h1 and h2; apply h1 and h2 to assign labels for all examples in U’; select n self-labeled samples and add them to L; remove these n samples from U; recreate U’ by choosing u instances randomly from U.
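Co-training strategy for prosodic event detection | A runnable sketch of this loop, using confidence-based self-labeling (the `train_view` and `predict_one` names are hypothetical placeholders for the two view-specific learners):

```python
import random

def co_train(L, U, train_view, k=30, u=75, n=10):
    """Co-training sketch (after Blum & Mitchell, 1998). Each example has two
    views: L holds (x1, x2, y) triples, U holds (x1, x2) pairs. `train_view`
    fits a classifier whose predict_one(x) returns (label, confidence)."""
    U = list(U)
    pool = random.sample(U, min(u, len(U)))
    for _ in range(k):
        h1 = train_view([(x1, y) for x1, _, y in L])
        h2 = train_view([(x2, y) for _, x2, y in L])
        # self-label the pool, keeping the more confident view's prediction
        scored = []
        for x1, x2 in pool:
            y1, c1 = h1.predict_one(x1)
            y2, c2 = h2.predict_one(x2)
            scored.append((max(c1, c2), (x1, x2, y1 if c1 >= c2 else y2)))
        # move the n most confidently self-labeled examples into L
        scored.sort(key=lambda s: s[0], reverse=True)
        for _, (x1, x2, y) in scored[:n]:
            L.append((x1, x2, y))
            U.remove((x1, x2))
        # recreate the pool by drawing u fresh instances from U
        pool = random.sample(U, min(u, len(U)))
    return h1, h2
```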
Conclusions | We introduced a confidence-based method to assign possible labels to unlabeled data and evaluated the performance combined with informative sample selection methods. |
Conclusions | This suggests that the use of unlabeled data can lead to significant improvement for prosodic event detection. |
Experiments and results | Our goal is to determine whether the co-training algorithm described above could successfully use the unlabeled data for prosodic event detection. |
Introduction | We propose a confidence-based method to assign labels to unlabeled data in training iterations and evaluate its performance combined with different informative sample selection methods. |
Introduction | Our experiments on the Boston Radio News corpus show that the use of unlabeled data can lead to significant improvement of prosodic event detection compared to using the original small training set, and that the semi-supervised learning result is comparable with supervised learning with a similar amount of training data.
Abstract | Our framework leverages both labeled and unlabeled data in the target domain.
Abstract | To overcome the lack of labeled samples for rarer relations, these clusterings operate on both the labeled and unlabeled data in the target domain. |
Experiments | negative transfer from irrelevant sources by relying on similarity of feature vectors between source and target domains based on labeled and unlabeled data.
Introduction | In the first phase, Supervised Voting is used to determine the relevance of each source domain to each region in the target domain, using both labeled and unlabeled data in the target domain. |
Introduction | By also using unlabeled data, we alleviate the lack of labeled samples for rarer relations due to imbalanced distributions in relation types.
Introduction | reasonable predictive performance even when all the source domains are irrelevant and augments the rarer classes with examples in the unlabeled data.
Problem Statement | labeled data D_l = {(x_i, y_i)} and plenty of unlabeled data D_u = {x_i}, where n_l and n_u are the number of labeled and unlabeled samples respectively, x_i is the feature vector, and y_i is the corresponding label (if available).
Robust Domain Adaptation | With data from both the labeled and unlabeled data sets, we apply transductive inference or semi-supervised learning (Zhou et al., 2003) to achieve both (i) and (ii). |
Robust Domain Adaptation | By augmenting with unlabeled data D_u, we aim to alleviate the effect of imbalanced relation distribution, which causes a lack of labeled samples for rarer classes in a small set of labeled data.
Robust Domain Adaptation | Here, we have multiple objectives: the first term controls the training error; the second regularizes the complexity of the functions f_j in the Reproducing Kernel Hilbert Space (RKHS) H; and the third prefers the predicted labels of the unlabeled data D_u to be close to the reference predictions.
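Robust Domain Adaptation | A plausible rendering of these three objectives (a sketch from the description above; γ1, γ2, and the reference predictions ỹ_i are assumed notation, not the paper's exact symbols):

```latex
\min_{f_j \in \mathcal{H}}\;\;
\sum_{i \in D_l} \ell\big(f_j(x_i),\, y_i\big)
\;+\; \gamma_1 \lVert f_j \rVert_{\mathcal{H}}^2
\;+\; \gamma_2 \sum_{i \in D_u} \big(f_j(x_i) - \tilde{y}_i\big)^2
```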
Abstract | This paper addresses the problem of collecting labeled training documents, especially annotating negative training documents, and presents a method of text classification from positive and unlabeled data.
Conclusion | The research described in this paper involved text classification using positive and unlabeled data.
Experiments | The rest of the positive and negative documents are used as unlabeled data.
Experiments | The number of positive training data in the other three methods depends on the value of θ, and the rest of the positive and negative documents were used as unlabeled data.
Experiments | Our goal is to achieve classification accuracy from only positive documents and unlabeled data as high as that from labeled positive and negative data. |
Framework of the System | First, we randomly select documents from the unlabeled data (U) where the number of documents is equal to that of the initial positive training documents (P1).
Framework of the System | For the result of correction (Roc1), we train SVM classifiers, and classify the remaining unlabeled data (U \ N1).
Introduction | Several authors have attempted to improve classification accuracy using only positive and unlabeled data (Yu et al., 2002; Ho et al., 2011). |
Introduction | Our goal is to eliminate the need for manually collecting training documents, and hopefully achieve classification accuracy from positive and unlabeled data as high as that from labeled positive and labeled negative data. |
Introduction | Like much previous work on semi-supervised ML, we apply SVM to the positive and unlabeled data, and add the classification results to the training data.
Experiments | As for unlabeled data, we crawled the web and collected around 100,000 questions that are similar in style and length to the ones in QuestionBank, e.g.
Experiments | A CRF model is used to decode the unlabeled data to generate more labeled examples for retraining. |
Experiments | smooth the semantic tag posteriors of the unlabeled data decoded by the CRF model using Eq.
Introduction | To the best of our knowledge, our work is the first to explore the unlabeled data to iteratively adapt the semantic tagging models for target domains, preserving information from the previous iterations. |
Related Work and Motivation | Adapting the source domain using unlabeled data is the key to achieving good performance across domains. |
Semi-Supervised Semantic Labeling | The last term is the loss on unlabeled data from the target domain with a hyper-parameter τ.
Semi-Supervised Semantic Labeling | After we decode the unlabeled data, we retrain a new CRF model at each iteration.
Semi-Supervised Semantic Labeling | Each iteration makes predictions on the semantic tags of unlabeled data with varying posterior probabilities. |
Conclusion and Future Work | First, the proposed model can learn previously unseen sentiment words from large unlabeled data, which are not covered by the limited vocabulary in machine translation of the labeled data.
Cross-Lingual Mixture Model for Sentiment Classification | where λs(di) and λt(di) are weighting factors to control the influence of the unlabeled data.
Cross-Lingual Mixture Model for Sentiment Classification | We set λs(di) (λt(di)) to λs (λt) when di belongs to the unlabeled data, and to 1 otherwise.
Cross-Lingual Mixture Model for Sentiment Classification | When di belongs to the unlabeled data, P(cj | di) is computed according to Equation 5 or 6.
Experiment | Larger weights indicate larger influence from the unlabeled data.
Experiment | Second, the two classifiers make predictions on the Chinese unlabeled data and their English translation,
Experiment | Figure 3: Accuracy with different sizes of unlabeled data for NTCIR-EN+NTCIR-CH
Related Work | After that, the co-training approach (Blum and Mitchell, 1998) is adopted to leverage Chinese unlabeled data and their English translation to improve the SVM classifier for Chinese sentiment classification.
AL-SMT: Multilingual Setting | (Ueffing et al., 2007; Haffari et al., 2009) show that treating L+ as a source for a new feature function in a log-linear model for SMT (Och and Ney, 2004) allows us to maximally take advantage of unlabeled data by finding a weight for this feature using minimum error-rate training (MERT) (Och, 2003).
Introduction | In self-training, each MT system is retrained using human-labeled data plus its own noisy translation output on the unlabeled data.
Related Work | (Reichart et al., 2008) introduces multitask active learning, where unlabeled data require annotations for multiple tasks, e.g. they consider named entities and parse trees, and showed that multiple tasks help selection compared to individual tasks.
Sentence Selection: Multiple Language Pairs | Using this method, we rank the entries in unlabeled data U for each translation task defined by a language pair (F_d, E). This results in several ranking lists, each of which represents the importance of entries with respect to a particular translation task.
Sentence Selection: Single Language Pair | The more frequent a phrase (not a phrase pair) is in the unlabeled data, the more important it is to know its translation, since it is more likely to be seen in the test data (especially when the test data is in-domain with respect to the unlabeled data).
Sentence Selection: Single Language Pair | In the labeled data L, phrases are the ones which are extracted by the SMT models; but what are the candidate phrases in the unlabeled data U? |
Abstract | We report on analyses that reveal quantitative insights about the use of unlabeled data and the complexity of inter-language correspondence modeling. |
Cross-Language Structural Correspondence Learning | Note that the support of w_s and w_t can be determined from the unlabeled data D_u.
Cross-Language Structural Correspondence Learning | Input: Labeled source data D_s; unlabeled data D_u = D_{s,u} ∪ D_{t,u}
Experiments | Due to the use of task-specific unlabeled data, relevant characteristics are captured by the pivot classifiers.
Experiments | Unlabeled Data The first row of Figure 2 shows the performance of CL-SCL as a function of the ratio of labeled and unlabeled documents. |
Related Work | In the basic domain adaptation setting we are given labeled data from the source domain and unlabeled data from the target domain, and the goal is to train a classifier for the target domain. |
Related Work | SCL then models the correlation between the pivots and all other features by training linear classifiers on the unlabeled data from both domains. |
Related Work | Ando and Zhang (2005b) present a semi-supervised learning method based on this paradigm, which generates related tasks from unlabeled data.
Introduction | By using unlabeled data to reduce data sparsity in the labeled training data, semi-supervised approaches improve generalization accuracy.
Unlabeled Data | Unlabeled data is used for inducing the word representations.
Unlabeled Data | (2009), we found that all word representations performed better on the supervised task when they were induced on the clean unlabeled data, both embeddings and Brown clusters.
Unlabeled Data | Note that cleaning is applied only to the unlabeled data, not to the labeled data used in the supervised tasks.
Abstract | Paradigmatic lexical relations are explicitly captured by word clustering on large-scale unlabeled data and are used to design new features to enhance a discriminative tagger. |
Capturing Paradigmatic Relations via Word Clustering | Brown Clustering Our first choice is the bottom-up agglomerative word clustering algorithm of (Brown et al., 1992) which derives a hierarchical clustering of words from unlabeled data . |
Capturing Paradigmatic Relations via Word Clustering | The large-scale unlabeled data we use in our experiments comes from the Chinese Gigaword (LDC2005T14). |
Capturing Paradigmatic Relations via Word Clustering | Furthermore, as shown in (Sun and Xu, 2011), appropriate string knowledge acquired from large-scale unlabeled data can significantly enhance a supervised model, especially for the prediction of out-of-vocabulary (OOV) words. |
Introduction | First, we employ unsupervised word clustering to explore paradigmatic relations that are encoded in large-scale unlabeled data . |
Experiments | by the character-based alignment (VES-NO-GP) and the graph propagation (VES-GP-PL), are regarded as virtual evidences to bias the CRFs model's learning on the unlabeled data.
Methodology | Our learning problem belongs to semi-supervised learning (SSL), as the training is done on treebank labeled data (X_L, Y_L) = {(x_1, y_1), ..., (x_l, y_l)} and bilingual unlabeled data X_U = {x_1, ..., x_u}, where x_i = {w_1, ..., w_m} is an input word sequence and y_i = {y_1, ..., y_m}, y ∈ T, is its corresponding label sequence.
Methodology | In our setting, the CRFs model is required to learn from unlabeled data . |
Methodology | This work employs the posterior regularization (PR) framework (Ganchev et al., 2010) to bias the CRFs model's learning on unlabeled data, under a constraint encoded by the graph propagation expression.
Related Work | (2014), proposed GP for inferring the label information of unlabeled data, and then leveraging these GP outcomes to learn a semi-supervised scalable model (e.g., CRFs).
Related Work | One of our main objectives is to bias the CRFs model's learning on unlabeled data, under a nonlinear GP constraint encoding the bilingual knowledge.
Related Work | (2008) described constraint driven learning (CODL) that augments model learning on unlabeled data by adding a cost for violating expectations of constraint features designed by domain knowledge. |
Evaluation | We employ as our second baseline a transductive SVM trained using 100 points randomly sampled from the training folds as labeled data and the remaining 1900 points as unlabeled data.
Evaluation | This could be attributed to (1) the unlabeled data, which may have provided the transductive learner with useful information that is not accessible to the other learners, and (2) the ensemble, which is more noise-tolerant to the imperfect seeds.
Evaluation | Specifically, we used the 500 seeds to guide the selection of active learning points, but trained a transductive SVM using only the active learning points as labeled data (and the rest as unlabeled data).
Our Approach | Given that we now have a labeled set (composed of 100 manually labeled points selected by active learning and 500 unambiguous points) as well as a larger set of points that are yet to be labeled (i.e., the remaining unlabeled points in the training folds and those in the test fold), we aim to train a better classifier by using a weakly supervised learner to learn from both the labeled and unlabeled data.
Our Approach | points in L_j, where i ≠ j) as unlabeled data.
Our Approach | Since the points in the test fold are included in the unlabeled data, they are all classified in this step.
Discussion and Related Work | In earlier work on semi-supervised learning, e.g., (Blum and Mitchell 1998), the classifiers learned from unlabeled data were used directly.
Discussion and Related Work | Recent research shows that it is better to use whatever is learned from the unlabeled data as features in a discriminative classifier. |
Discussion and Related Work | Wong and Ng (2007) and Suzuki and Isozaki (2008) are similar in that they run a baseline discriminative classifier on unlabeled data to generate pseudo examples, which are then used to train a different type of classifier for the same problem. |
Introduction | A two-stage strategy: first create word clusters with unlabeled data and then use the clusters as features in supervised training.
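Introduction | A minimal sketch of the second stage (assuming the common `bitstring<TAB>word<TAB>count` output format of Brown clustering tools; the prefix lengths follow the style of Koo et al. (2008) but the exact values are assumptions):

```python
def load_brown_clusters(path):
    """Read a word -> bit-string mapping in the common
    `bitstring<TAB>word<TAB>count` output format of Brown clustering tools."""
    clusters = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            bits, word, _count = line.rstrip("\n").split("\t")
            clusters[word] = bits
    return clusters

def cluster_features(word, clusters, prefixes=(4, 6, 10)):
    """Bit-string prefix features expose the cluster hierarchy at several
    granularities for use in a supervised tagger or parser."""
    bits = clusters.get(word)
    if bits is None:
        return ["cluster=UNK"]
    return ["cluster%d=%s" % (p, bits[:p]) for p in prefixes]
```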
Experiments | At each random selection, the rest of the utterances are used as unlabeled data to boost the performance of MCM. |
Experiments | Being Bayesian, our model can incorporate unlabeled data at training time. |
Experiments | Here, we evaluate the performance gain on domain, act and slot predictions as more unlabeled data is introduced at learning time. |
Abstract | Evaluation on the Penn Chinese Treebank indicates that a converted dependency treebank helps constituency parsing and the use of unlabeled data by self-training further increases parsing f-score to 85.2%, resulting in 6% error reduction over the previous best result. |
Experiments of Parsing | 4.3 Using Unlabeled Data for Parsing |
Experiments of Parsing | Recent studies on parsing indicate that the use of unlabeled data by self-training can help parsing on the WSJ data, even when labeled data is relatively large (McClosky et al., 2006a; Reichart and Rappoport, 2007). |
Experiments of Parsing | 2000) (PDC) as unlabeled data for parsing. |
Experiments | In the supervised setting, we treated the test data as unlabeled data and performed transductive learning. |
Experiments | In the semi-supervised setting, our unlabeled data consists of both the available unlabeled data and the test data.
Introduction | In this paper, we propose a sentence-level sentiment classification method that can (1) incorporate rich discourse information at both local and global levels; (2) encode discourse knowledge as soft constraints during learning; (3) make use of unlabeled data to enhance learning. |
Experiments | No peak in performance is reached, so further improvements are possible with more unlabeled data.
Experiments | Thus smoothing is optimizing performance for the case where unlabeled data is plentiful and labeled data is scarce, as we would hope. |
Related Work | Several researchers have previously studied methods for using unlabeled data for tagging and chunking, either alone or as a supplement to labeled data. |
Related Work | Unlike these systems, our efforts are aimed at using unlabeled data to find distributional representations that work well on rare terms, making the supervised systems more applicable to other domains and decreasing their sample complexity. |
Smoothing Natural Language Sequences | Using Expectation-Maximization (Dempster et al., 1977), it is possible to estimate the distributions for P(x_i | y_i) and P(y_i | y_{i-1}) from unlabeled data.
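Smoothing Natural Language Sequences | For reference, the standard EM (Baum-Welch) re-estimates of these two HMM distributions, with γ_t(y) = P(y_t = y | x) and ξ_t(y, y') = P(y_t = y, y_{t+1} = y' | x) computed by the forward-backward algorithm over the unlabeled sequences:

```latex
\hat{P}(x \mid y) = \frac{\sum_{i}\sum_{t}\gamma^{(i)}_{t}(y)\,\mathbf{1}\big[x^{(i)}_{t}=x\big]}
                         {\sum_{i}\sum_{t}\gamma^{(i)}_{t}(y)},
\qquad
\hat{P}(y' \mid y) = \frac{\sum_{i}\sum_{t}\xi^{(i)}_{t}(y,\,y')}
                          {\sum_{i}\sum_{t}\gamma^{(i)}_{t}(y)}
```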
Experiment | To keep the experiment tractable, we first randomly choose 50,000 of all the texts as unlabeled data, which contain 2,420,037 characters.
Experiment | We also experimented with different sizes of unlabeled data to evaluate the performance when adding unlabeled target domain data.
Experiment | Table 5 shows the different F-scores and OOV recalls on the different unlabeled data sets.
Implementation Details of Multitask Learning Method | BLLIP North American News Text (Complete) is used as the unlabeled data source to generate synthetic labeled data.
Related Work | Due to the lack of benchmark data for implicit discourse relation analysis, earlier work used unlabeled data to generate synthetic implicit discourse data. |
Related Work | Research work in this category exploited both labeled and unlabeled data for discourse relation prediction. |
Related Work | Hernault et al. (2010) presented a semi-supervised method based on the analysis of co-occurring features in labeled and unlabeled data.
Abstract | We propose a novel approach to learn from lexical prior knowledge in the form of domain-independent sentiment-laden terms, in conjunction with domain-dependent unlabeled data and a few labeled documents. |
Conclusion | To more effectively utilize unlabeled data and induce domain-specific adaptation of our models, several extensions are possible: facilitating learning from related domains, incorporating hyperlinks between documents, incorporating synonyms or co-occurrences between words, etc.
Incorporating Lexical Knowledge | It should be noted that this list was constructed without a specific domain in mind, which is further motivation for using training examples and unlabeled data to learn domain-specific connotations.
Related Work | The goal of the former theme is to learn from few labeled examples by making use of unlabeled data, while the goal of the latter theme is to utilize weak prior knowledge about term-class affinities (e.g., the term “awful” indicates negative sentiment and therefore may be considered as a negatively labeled feature).
Abstract | Starting with a domain-independent, high-precision sentiment lexicon and a large pool of unlabeled data, we bootstrap Twitter-specific sentiment lexicons, using a small amount of labeled data to guide the process.
Lexicon Bootstrapping | To create a Twitter-specific sentiment lexicon for a given language, we start with a general-purpose, high-precision sentiment lexicon and bootstrap from the unlabeled data (BOOT), using the labeled development data (DEV) to guide the process.
Lexicon Bootstrapping | On each iteration i ≥ 1, tweets in the unlabeled data are labeled using the lexicon
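Lexicon Bootstrapping | A schematic sketch of such a bootstrapping loop (a deliberate simplification using lexicon-vote labeling, frequency-based candidate mining, and dev-accuracy gating; none of these names or design choices come from the paper):

```python
from collections import Counter

def lexicon_vote(tweet, lex):
    """+1 / -1 by summed lexicon polarity, None on a tie or no hits."""
    score = sum(lex.get(w, 0) for w in tweet)
    return None if score == 0 else (1 if score > 0 else -1)

def dev_accuracy(lex, dev_tweets, dev_labels):
    pairs = [(lexicon_vote(t, lex), y) for t, y in zip(dev_tweets, dev_labels)]
    scored = [(p, y) for p, y in pairs if p is not None]
    return sum(p == y for p, y in scored) / max(len(scored), 1)

def bootstrap(lex, unlabeled, dev_tweets, dev_labels, n_iter=10, top_k=50):
    lex = dict(lex)  # word -> +1 / -1
    for _ in range(n_iter):
        # label the unlabeled pool with the current lexicon
        pos, neg = Counter(), Counter()
        for tweet in unlabeled:
            p = lexicon_vote(tweet, lex)
            if p == 1:
                pos.update(w for w in tweet if w not in lex)
            elif p == -1:
                neg.update(w for w in tweet if w not in lex)
        # add the most frequent new candidates only if dev accuracy holds up
        trial = dict(lex)
        trial.update({w: 1 for w, _ in pos.most_common(top_k)})
        trial.update({w: -1 for w, _ in neg.most_common(top_k)})
        if dev_accuracy(trial, dev_tweets, dev_labels) >= dev_accuracy(lex, dev_tweets, dev_labels):
            lex = trial
        else:
            break
    return lex
```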
Related Work | Corpus-based methods extract subjectivity and sentiment lexicons from large amounts of unlabeled data using different similarity metrics to measure the relatedness between words. |
Abstract | To improve the performance of a cause identification system for the minority classes, we present a bootstrapping algorithm that automatically augments a training set by learning from a small amount of labeled data and a large amount of unlabeled data.
Evaluation | To get a sense of the accuracy of the bootstrapped documents without further manual labeling, recall that our experimental setup resembles a transductive setting where the test documents are part of the unlabeled data, and consequently, some of them may have been automatically labeled by the bootstrapping algorithm.
Our Bootstrapping Algorithm | To alleviate the data scarcity problem and improve the accuracy of the classifiers, we propose in this section a bootstrapping algorithm that automatically augments a training set by exploiting a large amount of unlabeled data.
Related Work | Minority classes can be expanded without the availability of unlabeled data as well. |
The Related Work | Typical domain adaptation tasks often assume that annotated data in the new domain is absent or insufficient, while large-scale unlabeled data is available.
The Related Work | Where unlabeled data is concerned, semi-supervised or unsupervised methods are naturally adopted.
The Related Work | The first usually focuses on exploiting automatically generated labeled data from the unlabeled data (Steedman et al., 2003; McClosky et al., 2006; Reichart and Rappoport, 2007; Sagae and Tsujii, 2007; Chen et al., 2008); the second focuses on combining supervised and unsupervised methods, where only unlabeled data is considered (Smith and Eisner, 2006; Wang and Schuurmans, 2008; Koo et al., 2008).
A ← METRICLEARNER(X, S, D) | In this case, we treat the set of test instances (without their gold labels) as the unlabeled data.
Introduction | IDML-IT (Dhillon et al., 2010) is another such method which exploits labeled as well as unlabeled data during metric learning. |
Metric Learning | The ITML metric learning algorithm, which we reviewed in Section 2.2, is supervised in nature, and hence it does not exploit widely available unlabeled data.
Metric Learning | In this section, we review Inference Driven Metric Learning (IDML) (Algorithm 1) (Dhillon et al., 2010), a recently proposed metric learning framework which combines an existing supervised metric learning algorithm (such as ITML) along with transductive graph-based label inference to learn a new distance metric from labeled as well as unlabeled data combined. |
Inference | An inference algorithm for an unsupervised model should be efficient enough to handle vast amounts of unlabeled data, as it can easily be obtained and is likely to improve results.
Introduction | This suggests that the models scale to much larger corpora, which is an important property for a successful unsupervised learning method, as unlabeled data is abundant. |
Related Work | Most of the SRL research has focused on the supervised setting, however, lack of annotated resources for most languages and insufficient coverage provided by the existing resources motivates the need for using unlabeled data or other forms of weak supervision. |
Related Work | This includes methods based on graph alignment between labeled and unlabeled data (Furstenau and Lapata, 2009), using unlabeled data to improve lexical generalization (Deschacht and Moens, 2009), and projection of annotation across languages (Pado and Lapata, 2009; van der Plas et al., 2011). |
Introduction | (2010), which introduces a high-level rule language, called NERL, to build the general and domain specific NER systems; and 2) semi-supervised learning, which aims to use the abundant unlabeled data to compensate for the lack of annotated data. |
Related Work | Semi-supervised learning exploits both labeled and unlabeled data.
Related Work | It proves useful when labeled data is scarce and hard to construct, while unlabeled data is abundant and easy to access.
Related Work | Another representative of semi-supervised learning is learning a robust representation of the input from unlabeled data.
Experiment | Previous work found that the performance can be improved by pre-training the character embeddings on large unlabeled data and using the obtained embeddings to initialize the character lookup table instead of random initialization.
Experiment | There are several ways to learn the embeddings on unlabeled data.
Experiment | (2013), the bigram embeddings are pre-trained on unlabeled data with character embeddings, which significantly improves the model performance. |
Experiments | We also compare our method with semi-supervised approaches, which achieved very high accuracies by leveraging large unlabeled data directly in the systems for joint learning and decoding; in our method, we only explore the N-gram features to further improve supervised dependency parsing performance.
Experiments | Some previous studies also found a log-linear relationship between the amount of unlabeled data and performance (Suzuki and Isozaki, 2008; Suzuki et al., 2009; Bergsma et al., 2010; Pitler et al., 2010).
Web-Derived Selectional Preference Features | All of our selectional preference features described in this paper rely on probabilities derived from unlabeled data.
Abstract | The results suggest that it is possible to learn domain-specific entity types from unlabeled data.
Conclusion | We evaluated an approach to learning domain-specific interpretable entity types from unlabeled data.
Introduction | • we empirically evaluate an approach to learning types from unlabeled data
Discussion and Future Work | We utilize unlabeled data in both generative and discriminative models for dependency syntax and in generative word clustering. |
Discussion and Future Work | Our discriminative joint models treat latent syntax as a structured feature to be optimized for the end-task of SRL, while our other grammar induction techniques optimize for unlabeled data likelihood, optionally with distant supervision.
Discussion and Future Work | We observe that careful use of these unlabeled data resources can improve performance on the end task. |
Abstract | Recently, with an increase in computing resources, it became possible to learn rich word embeddings from massive amounts of unlabeled data.
Introduction | Traditionally, we would learn the embeddings for the target task jointly with whatever unlabeled data we may have, in an instance of semi-supervised learning, and/or we may leverage labels from multiple other related tasks in a multitask approach. |
Related Work | Our method is different in that the (potentially) massive amount of unlabeled data is not required a priori, but only the resultant embedding.
Introduction | In general, semi-supervised learning can be motivated by two concerns: first, given a fixed amount of supervised data, we might wish to leverage additional unlabeled data to facilitate the utilization of the supervised corpus, increasing the performance of the model in absolute terms. |
Introduction | Second, given a fixed target performance level, we might wish to use unlabeled data to reduce the amount of annotated data necessary to reach this target. |
Related Work | Crucially, however, these methods do not exploit unlabeled data when learning their representations.
Related Work | Co-training puts features into disjoint subsets when learning from labeled and unlabeled data and tries to leverage this split for better performance. |
Results and Discussion | For an unsupervised approach, which only needs unlabeled data, there is little cost to creating large training sets.
Introduction | This is implemented by maximizing the empirical accuracy on the prior knowledge (labeled data) and the entropy of hash functions (estimated over labeled and unlabeled data).
Semi-Supervised SimHash | Let X_L = {(x_1, c_1), ..., (x_u, c_u)} be the labeled data, c ∈ {1, ..., C}, x ∈ R^M, and X_U = {x_{u+1}, ..., x_N} the unlabeled data.
Semi-Supervised SimHash | the labeled and unlabeled data, X_L and X_U.
Data | The set of 3,123 words that were not annotated was the unlabeled data for the EM algorithm. |
Experiments and Results | Our post-experimental analysis reveals that the parameter update process using the unlabeled data has the effect of overly separating the two overlapping distributions.
Finding the Homographs in a Lexicon | In Model II, the semi-supervised setup, the training data is used to initialize the Expectation-Maximization (EM) algorithm (Dempster et al., 1977), and the unlabeled data, described in Section 3.1, updates the initial estimates.