Abstract | Semi-supervised learning (SSL) methods augment standard machine learning (ML) techniques to leverage unlabeled data.
Abstract | In this paper, we show that improving marginal word frequency estimates using unlabeled data can enable semi-supervised text classification that scales to massive unlabeled data sets.
Introduction | Semi-supervised Learning (SSL) is a Machine Learning (ML) approach that utilizes large amounts of unlabeled data, combined with a smaller amount of labeled data, to learn a target function (Zhu, 2006; Chapelle et al., 2006). |
Introduction | Typically, for each target concept to be learned, a semi-supervised classifier is trained using iterative techniques that execute multiple passes over the unlabeled data (e.g., Expectation-Maximization (Nigam et al., 2000) or Label Propagation (Zhu and Ghahramani, 2002)).
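Introduction | To make this concrete, the following is a minimal hard-EM sketch for semi-supervised Multinomial Naive Bayes in the spirit of Nigam et al. (2000); the function and variable names are illustrative, and dense count matrices are assumed.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

def em_naive_bayes(X_lab, y_lab, X_unlab, n_iters=10):
    """Hard-EM semi-supervised Naive Bayes (simplified sketch).

    X_lab, X_unlab: dense count matrices; y_lab: labels for X_lab.
    """
    clf = MultinomialNB()
    clf.fit(X_lab, y_lab)                      # initialize from labeled data
    for _ in range(n_iters):
        y_unlab = clf.predict(X_unlab)         # E-step: label the unlabeled pool
        X_all = np.vstack([X_lab, X_unlab])    # M-step: refit on the union
        y_all = np.concatenate([y_lab, y_unlab])
        clf.fit(X_all, y_all)
    return clf
```

Introduction | Note that every iteration makes a full pass over the unlabeled pool, which is precisely what limits the scalability of such iterative methods on massive unlabeled corpora.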
Problem Definition | We consider a semi-supervised classification task, in which the goal is to produce a mapping from an instance space $\mathcal{X}$ consisting of $T$-tuples of nonnegative integer-valued features $\mathbf{w} = (w_1, \ldots, w_T)$.
Problem Definition | Our semi-supervised technique utilizes statistics computed over the labeled corpus, denoted as follows. |
Problem Definition | In addition to Multinomial Naive Bayes (discussed in Section 3), we evaluate against a variety of supervised and semi-supervised techniques from previous work, which provide a representation of the state of the art. |
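Problem Definition | As a rough illustration of how unlabeled data can inform MNB parameters (a generic sketch of the idea behind marginal-constrained estimation, not necessarily the paper's exact optimization), one can rescale the per-word class conditionals so that their class-prior mixture matches the word marginals observed in the unlabeled corpus:

```python
import numpy as np

def project_to_marginals(theta_pos, theta_neg, p_pos, marginal_u):
    """Rescale per-word conditionals theta_pos, theta_neg so that the
    mixture p_pos * theta_pos + (1 - p_pos) * theta_neg matches the
    word marginals estimated from unlabeled data, while preserving each
    word's positive/negative likelihood ratio.  Illustrative only."""
    ratio = theta_pos / theta_neg                        # class evidence per word
    theta_neg_new = marginal_u / (p_pos * ratio + 1.0 - p_pos)
    theta_pos_new = ratio * theta_neg_new
    # Renormalize so each class-conditional distribution sums to 1
    # (this trades exact marginal matching for valid distributions).
    return (theta_pos_new / theta_pos_new.sum(),
            theta_neg_new / theta_neg_new.sum())
```

Problem Definition | Estimating the marginals requires only a single counting pass over the unlabeled corpus, which is what allows this family of methods to scale to massive unlabeled data sets.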
Abstract | This paper introduces a graph-based semi-supervised joint model of Chinese word segmentation and part-of-speech tagging. |
Abstract | Empirical results on the Chinese Treebank (CTB-7) and Microsoft Research (MSR) corpora show that the proposed model yields better results than the supervised baselines and other competitive semi-supervised CRFs on this task.
Introduction | Therefore, semi-supervised joint S&T appears to be a natural solution for easily incorporating accessible unlabeled data to improve the joint S&T model. |
Introduction | This study focuses on using a graph-based label propagation method to build a semi-supervised joint S&T model. |
Introduction | labeled and unlabeled data to achieve semi-supervised learning.
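Introduction | For reference, the label propagation procedure of Zhu and Ghahramani (2002) that underlies such graph-based methods can be sketched as follows; the graph construction itself (e.g., over character n-grams), on which the joint S&T model depends, is omitted.

```python
import numpy as np

def label_propagation(W, Y_init, labeled_mask, n_iters=50):
    """Label propagation (Zhu and Ghahramani, 2002), dense sketch.

    W: (n, n) symmetric nonnegative similarity matrix (every node is
    assumed to have at least one neighbor, so row sums are positive).
    Y_init: (n, k) initial label distributions; unlabeled rows may be uniform.
    labeled_mask: boolean (n,) array marking nodes whose labels are clamped.
    """
    T = W / W.sum(axis=1, keepdims=True)         # row-normalized transitions
    Y = Y_init.astype(float).copy()
    for _ in range(n_iters):
        Y = T @ Y                                # propagate labels along edges
        Y[labeled_mask] = Y_init[labeled_mask]   # clamp the labeled nodes
        Y /= Y.sum(axis=1, keepdims=True)        # renormalize each row
    return Y
```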
Related Work | There have been few explorations of semi-supervised approaches for CWS or POS tagging in previous work.
Related Work | Xu et al. (2008) described a Bayesian semi-supervised CWS model by treating the segmentation as a hidden variable in machine translation.
Related Work | Wang et al. (2011) proposed a semi-supervised pipeline S&T model by incorporating n-gram and lexicon features derived from unlabeled data.
Abstract | This paper presents a semi-supervised Chinese word segmentation (CWS) approach that co-regularizes character-based and word-based models. |
Abstract | The evaluation on the Chinese Treebank shows that our model yields greater gains than the state-of-the-art semi-supervised models reported in the literature.
Experiment | The “ours” row reports the performance of our semi-supervised model with the tuned parameters.
Experiment | It can be observed that our semi-supervised model is able to benefit from unlabeled data and greatly improves the results over the supervised baseline. |
Experiment | We also compare our model with two state-of-the-art semi-supervised methods of Wang ’11 (Wang et al., 2011) and Sun ’11 (Sun and Xu, 2011). |
Introduction | This naturally motivates semi-supervised approaches that use easily accessible raw text to enhance supervised CWS models.
Introduction | In recent years, however, few semi-supervised CWS models have been proposed.
Introduction | Xu et al. (2008) described a Bayesian semi-supervised model by treating the segmentation as a hidden variable in machine translation.
Semi-supervised Learning via Co-regularizing Both Models | As mentioned earlier, the primary challenge of semi-supervised CWS lies in how to exploit the unlabeled data.
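Semi-supervised Learning via Co-regularizing Both Models | A generic agreement-based co-regularizer illustrates the idea; this sketch assumes both models' posteriors on unlabeled data can be mapped into a shared label space, and the squared-error disagreement penalty is our choice rather than necessarily the paper's.

```python
import numpy as np

def coregularized_loss(loss_char, loss_word, p_char_u, p_word_u, lam=1.0):
    """Joint objective: each model's supervised loss on labeled data plus
    a penalty on posterior disagreement over the unlabeled data.

    p_char_u, p_word_u: (n_unlab, k) posteriors of the character-based and
    word-based models, expressed in a shared label space (e.g., boundaries).
    """
    disagreement = np.mean(np.sum((p_char_u - p_word_u) ** 2, axis=1))
    return loss_char + loss_word + lam * disagreement
```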
Experiments | Table 6: Experimental results on the English and Chinese development sets with different types of semi-supervised features added incrementally to the extended parser. |
Experiments | Based on the extended parser, we experimented with different types of semi-supervised features by adding them incrementally.
Experiments | Comparing the results in Table 5 with those in Table 6, we can see that the semi-supervised features achieve an overall improvement of 1.0% on the English data and a further improvement on the Chinese data.
Introduction | In addition to the above contributions, we apply a variety of semi-supervised learning techniques to our transition-based parser. |
Introduction | Experimental results show that semi-supervised methods give a further improvement of 0.9% in F-score on the English data and 2.4% on the Chinese data. |
Semi-supervised Parsing with Large Data | 4.4 Semi-supervised Features |
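Semi-supervised Parsing with Large Data | Features of this kind typically back off from words to cluster identifiers induced from large unlabeled text (e.g., Brown clusters); the sketch below is generic rather than the paper's exact feature templates, and `word2cluster` is assumed to be precomputed offline.

```python
def cluster_features(words, word2cluster):
    """Augment each token with a cluster id induced from unlabeled text
    (e.g., a Brown-cluster bit string) plus a coarse prefix feature."""
    feats = []
    for w in words:
        cid = word2cluster.get(w.lower(), "<unk>")   # back off for unseen words
        feats.append({"word": w, "cluster": cid, "cluster_prefix4": cid[:4]})
    return feats
```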
Abstract | To deal with these issues, we describe an efficient semi-supervised learning (SSL) approach which has two components: (i) Markov Topic Regression is a new probabilistic model to cluster words into semantic tags (concepts). |
Abstract | Our new SSL approach improves semantic tagging performance by 3% absolute over the baseline models, and also compares favorably on semi-supervised syntactic tagging. |
Introduction | To deal with these issues, we present a new semi-supervised learning (SSL) approach, which mainly has two components. |
Related Work and Motivation | (I) Semi-Supervised Tagging. |
Related Work and Motivation | Several studies (Wang et al., 2009; Li et al., 2009; Li, 2010; Liu et al., 2011) investigate web query tagging using semi-supervised sequence models.
Semi-Supervised Semantic Labeling | 4.2 Retrospective Semi-Supervised CRF |
Semi-Supervised Semantic Labeling | Algorithm 2: Retrospective Semi-Supervised CRF. Input: labeled data $\mathcal{L}^l$ and unlabeled data $\mathcal{L}^u$.
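Semi-Supervised Semantic Labeling | The sketch below shows a plain self-training loop for a CRF tagger using the `sklearn_crfsuite` package; the retrospective variant additionally revisits its own earlier automatic labels, which this simplification does not reproduce.

```python
import sklearn_crfsuite

def self_train_crf(X_lab, y_lab, X_unlab, n_rounds=3):
    """Plain self-training for a CRF tagger (simplified sketch).

    X_*: lists of sentences, each a list of per-token feature dicts;
    y_lab: corresponding per-token label sequences.
    """
    crf = sklearn_crfsuite.CRF(algorithm='lbfgs', max_iterations=100)
    crf.fit(X_lab, y_lab)                         # train on gold labels
    for _ in range(n_rounds):
        y_auto = crf.predict(X_unlab)             # tag the unlabeled pool
        crf = sklearn_crfsuite.CRF(algorithm='lbfgs', max_iterations=100)
        crf.fit(X_lab + X_unlab, y_lab + y_auto)  # retrain on gold + auto labels
    return crf
```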
Abstract | While a variety of semi-supervised methods exist for training from incomplete data, there are open questions regarding what types of training data should be used and how much is necessary. |
Abstract | Our results show that annotation of word types is the most important, provided a sufficiently capable semi-supervised learning infrastructure is in place to project type information onto a raw corpus. |
Conclusions and Future Work | Most importantly, it is clear that type annotations are the most useful input one can obtain from a linguist—provided a semi-supervised algorithm for projecting that information reliably onto raw tokens is available. |
Data | While we do not explore a rule-writing approach to POS-tagging, we do consider the impact of rule-based morphological analyzers as a component in our semi-supervised POS-tagging system. |
Experiments | In addition to annotations, semi-supervised tagger training requires a corpus of raw text.
Introduction | The overwhelming takeaway from our results is that type supervision, when backed by an effective semi-supervised learning approach, is the most important source of linguistic information.
Abstract | Since no large amount of labeled training data for our new notion of sentiment relevance is available, we investigate two semi-supervised methods for creating sentiment relevance classifiers: a distant supervision approach that leverages structured information about the domain of the reviews; and transfer learning on feature representations based on lexical taxonomies that enables knowledge transfer. |
Conclusion | Since a large labeled sentiment relevance resource does not yet exist, we investigated semi-supervised approaches to S-relevance classification that do not require S-relevance-labeled data. |
Distant Supervision | Since a large labeled resource for sentiment relevance classification is not yet available, we investigate semi-supervised methods for creating sentiment relevance classifiers. |
Introduction | For this reason, we investigate two semi-supervised approaches to S-relevance classification that do not require S-relevance-labeled data. |
Transfer Learning | To address the problem that we do not have enough labeled SR data we now investigate a second semi-supervised method for SR classification, transfer learning (TL). |
Experiments | resorting to complicated features, system combination, and other semi-supervised techniques.
Related Work | Lots of efforts have been devoted to semi-supervised methods in sequence labeling and word segmentation (Xu et al., 2008; Suzuki and Isozaki, 2008; Haffari and Sarkar, 2008; Tomanek and Hahn, 2009; Wang et al., 2011). |
Related Work | A semi-supervised method tries to find an optimal hyperplane over both annotated and raw data, resulting in a model with better coverage and higher accuracy.
Related Work | It is fundamentally different from semi-supervised and unsupervised methods in that we aim to exploit a different kind of knowledge: the natural annotations implied by the structural information in web text.
Multitask Learning for Discourse Relation Prediction | ASO has been shown to be useful in a semi-supervised learning configuration for several NLP applications, such as text chunking (Ando and Zhang, 2005b) and text classification (Ando and Zhang, 2005a).
Related Work | 2.1.3 Semi-supervised approaches |
Related Work | Hernault et al. (2010) presented a semi-supervised method based on the analysis of co-occurring features in labeled and unlabeled data.
Related Work | Very recently, Hernault et al. (2011) introduced a semi-supervised approach using a structure learning method for discourse relation classification, which is closely related to our work.
Experiment | Another baseline is Li and Sun (2009), which also uses punctuation in its semi-supervised framework.
INTRODUCTION | We build a semi-supervised learning (SSL) framework which can iteratively incorporate newly labeled instances from unlabeled micro-blog data during the training process. |
Related Work | Meanwhile, semi-supervised methods have been applied to NLP applications.
Related Work | Similar semi-supervised applications include Shen et al. |
Introduction | Many learning techniques, e.g., semi-supervised learning, self-training, and active learning, have been proposed.
Introduction | Blum et al. (2001) proposed a semi-supervised learning approach called the Graph Mincut algorithm, which uses a small number of positive and negative examples and assigns values to unlabeled examples in a way that optimizes consistency in a nearest-neighbor sense.
Introduction | Like much previous work on semi-supervised ML, we apply an SVM to the positive and unlabeled data, and add the classification results to the training data.
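Introduction | A minimal sketch of this positive-unlabeled self-training scheme with a linear SVM follows; the confidence threshold and round count are assumptions for illustration, not values from the paper.

```python
import numpy as np
from sklearn.svm import LinearSVC

def pu_self_training(X_pos, X_unlab, n_rounds=5, threshold=1.0):
    """PU-style self-training sketch: treat unlabeled data as negative,
    train an SVM, then promote confidently positive unlabeled examples."""
    pos, unlab = X_pos, X_unlab
    clf = None
    for _ in range(n_rounds):
        if len(unlab) == 0:
            break
        X = np.vstack([pos, unlab])
        y = np.concatenate([np.ones(len(pos)), np.zeros(len(unlab))])
        clf = LinearSVC().fit(X, y)
        confident = clf.decision_function(unlab) > threshold
        if not confident.any():
            break                                 # nothing new to add
        pos = np.vstack([pos, unlab[confident]])  # grow the positive set
        unlab = unlab[~confident]
    return clf
```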
Introduction | Traditionally, we would learn the embeddings for the target task jointly with whatever unlabeled data we may have, in an instance of semi-supervised learning, and/or we may leverage labels from multiple other related tasks in a multitask approach. |
Related Work | Embeddings are learned in a semi-supervised fashion, and the components of the embedding are given an explicit probabilistic interpretation. |
Related Work | In the machine learning literature, joint semi-supervised embedding takes the form of methods such as the Laplacian SVM (LapSVM) (Belkin et al., 2006) and Label Propagation (Zhu and Ghahramani, 2002), to which our approach is related.
Conclusion | A novel technique was also proposed to rank n-gram phrases, where relevance-based ranking was used in conjunction with a semi-supervised generative model.
Introduction | We employ a semi-supervised generative model called JTE-P to jointly model AD-expressions, pair interactions, and discussion topics simultaneously in a single framework. |
Model | JTE-P is a semi-supervised generative model motivated by the joint occurrence of expression types (agreement and disagreement), topics in discussion posts, and user pairwise interactions. |