Abstract | To tackle these challenges, we propose a novel semi-supervised graph regularization model to incorporate both local and global evidence from multiple tweets through three fine-grained relations.
Introduction | Therefore, it is challenging to create sufficiently many high-quality labeled tweets for supervised models, and it is worth considering semi-supervised learning that exploits unlabeled data.
Introduction | However, when selecting semi-supervised learning frameworks, we noticed another challenge that tweets pose to wikification due to their informal writing style, brevity, and noisiness.
Introduction | Therefore, a collective inference model over multiple tweets in the semi-supervised setting is desirable. |
Principles and Approach Overview | The label assignment is obtained by our semi-supervised graph regularization framework based on a relational graph, which is constructed from local compatibility, coreference, and semantic relatedness relations. |
Relational Graph Construction | They control the contributions of these three relations in our semi-supervised graph regularization model.
Relational Graph Construction | (ii) It is more appropriate for our graph-based semi-supervised model since it is difficult to assign labels to a pair of mention and concept in the referent graph. |
Semi-supervised Graph Regularization | We propose a novel semi-supervised graph regularization framework based on the graph-based semi-supervised learning algorithm (Zhu et al., 2003): |
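The graph-based algorithm of Zhu et al. (2003) that this framework builds on admits a simple iterative solution: repeatedly average each node's label distribution over its graph neighbors while clamping the labeled nodes to their known labels. A minimal sketch, with an invented toy chain graph and labels (not the relational graph of the paper):

```python
import numpy as np

def harmonic_label_propagation(W, y, labeled, n_iter=200):
    """Graph-based semi-supervised learning in the style of Zhu et al. (2003).

    W: symmetric (n, n) edge-weight matrix.
    y: (n, k) initial label distributions (unlabeled rows may be uniform).
    labeled: boolean mask of labeled nodes, whose rows of y are clamped.
    """
    # Row-normalize W to obtain the transition matrix P.
    P = W / W.sum(axis=1, keepdims=True)
    f = y.astype(float).copy()
    for _ in range(n_iter):
        f = P @ f                  # propagate: each node averages its neighbors
        f[labeled] = y[labeled]    # clamp labeled nodes to their true labels
    return f

# Toy chain graph: node 0 labeled class A, node 3 labeled class B.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
y = np.array([[1, 0], [0.5, 0.5], [0.5, 0.5], [0, 1]])
labeled = np.array([True, False, False, True])
f = harmonic_label_propagation(W, y, labeled)
```

On this chain, the unlabeled node nearer the class-A seed ends up leaning toward A and the one nearer the class-B seed toward B, which is the harmonic-function solution.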
Empirical Evaluation | In this section, we consider the semi-supervised setup and present an evaluation of our approach on the problem of aligning weather forecast reports to the formal representation of weather.
Empirical Evaluation | Only then, in the semi-supervised learning scenarios, we added unlabeled data and ran 5 additional iterations of EM. |
Empirical Evaluation | We compare our approach (Semi-superv, non-contr) with two baselines: the basic supervised training on 100 labeled forecasts (Supervised BL) and with the semi-supervised training which disregards the non-contradiction relations (Semi-superv BL). |
Inference with NonContradictory Documents | However, in a semi-supervised or unsupervised case variational techniques, such as the EM algorithm (Dempster et al., 1977), are often used to estimate the model. |
Introduction | Such annotated resources are scarce and expensive to create, motivating the need for unsupervised or semi-supervised techniques (Poon and Domingos, 2009). |
Introduction | This compares favorably with 69.1% shown by a semi-supervised learning approach, though, as expected, does not reach the score of the model which, in training, observed semantics states for all the 750 documents (77.7% F1). |
Summary and Future Work | Our approach resulted in an improvement over the scores of both the supervised baseline and of the traditional semi-supervised learning.
Abstract | This paper proposes a simple yet effective framework for semi-supervised dependency parsing at entire tree level, referred to as ambiguity-aware ensemble training. |
Abstract | Experimental results on benchmark data show that our method significantly outperforms the baseline supervised parser and other entire-tree based semi-supervised methods, such as self-training, co-training and tri-training. |
Ambiguity-aware Ensemble Training | In standard entire-tree based semi-supervised methods such as self/co/tri-training, automatically parsed unlabeled sentences are used as additional training data, and noisy 1-best parse trees are considered as gold-standard.
Ambiguity-aware Ensemble Training | We apply L2-norm regularized SGD training to iteratively learn feature weights w for our CRF-based baseline and semi-supervised parsers. |
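The L2-norm regularized SGD training referred to above takes a standard form: each update follows the loss gradient plus a weight-decay term. A minimal sketch on an invented one-dimensional toy loss (a stand-in, not the paper's CRF objective):

```python
def sgd_l2_step(w, grad, eta=0.1, lam=0.01):
    """One SGD step on an L2-regularized objective: loss(w) + (lam/2)*||w||^2.

    w, grad: lists of floats (feature weights and the loss gradient at w).
    """
    return [wi - eta * (gi + lam * wi) for wi, gi in zip(w, grad)]

# Toy quadratic loss (w - 3)^2 per coordinate; its gradient is 2*(w - 3).
w = [0.0]
for _ in range(500):
    w = sgd_l2_step(w, [2 * (w[0] - 3.0)])
# w converges near the regularized optimum 2*3/(2 + 0.01) ≈ 2.985
```

The weight-decay term `lam * wi` is what the L2 regularizer contributes to the gradient; with `lam = 0`, this reduces to plain SGD.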
Experiments and Analysis | For the semi-supervised parsers trained with Algorithm 1, we use N1 = 20K and M1 = 50K for English, and N1 = 15K and M1 = 50K for Chinese, based on a few preliminary experiments. |
Experiments and Analysis | For semi-supervised cases, one iteration takes about 2 hours on an IBM server having 2.0 GHz Intel Xeon CPUs and 72G memory. |
Introduction | In contrast, semi-supervised approaches, which can make use of large-scale unlabeled data, have attracted more and more interest. |
Introduction | To solve the above issues, this paper proposes a more general and effective framework for semi-supervised dependency parsing, referred to as ambiguity-aware ensemble training.
Introduction | We propose a generalized ambiguity-aware ensemble training framework for semi-supervised dependency parsing, which can |
Abstract | In this paper, we adopt two views, personal and impersonal views, and systematically employ them in both supervised and semi-supervised sentiment classification. |
Abstract | On this basis, an ensemble method and a co-training algorithm are explored to employ the two views in supervised and semi-supervised sentiment classification respectively.
Introduction | Since the unlabeled data is ample and easy to collect, a successful semi-supervised sentiment classification system would significantly minimize the involvement of labor and time. |
Introduction | Therefore, given the two different views mentioned above, one promising application is to adopt them in co-training algorithms, which have been proven to be an effective semi-supervised learning strategy for incorporating unlabeled data to further improve classification performance (Zhu, 2005).
Introduction | In this paper, we systematically employ personal/impersonal views in supervised and semi-supervised sentiment classification. |
Related Work | Generally, document-level sentiment classification methods can be categorized into three types: unsupervised, supervised, and semi-supervised.
Related Work | Semi-supervised methods combine unlabeled data with labeled training data (often small-scaled) to improve the models. |
Related Work | Compared to supervised and unsupervised methods, semi-supervised methods for sentiment classification are relatively new and far less studied.
Abstract | We consider a semi-supervised setting for domain adaptation where only unlabeled data is available for the target domain. |
Constraints on Inter-Domain Variability | As we discussed in the introduction, our goal is to provide a method for domain adaptation based on semi-supervised learning of models with distributed representations. |
Constraints on Inter-Domain Variability | In this section, we first discuss the shortcomings of domain adaptation with the above-described semi-supervised approach and motivate constraints on inter-domain variability of |
Empirical Evaluation | For every pair, the semi-supervised methods use labeled data from the source domain and unlabeled data from both domains. |
Empirical Evaluation | All the methods, supervised and semi-supervised, are based on the model described in Section 2.
Empirical Evaluation | This does not seem to have an adverse effect on the accuracy but makes learning very efficient: the average training time for the semi-supervised methods was about 20 minutes on a standard PC. |
Introduction | The danger of this semi-supervised approach in the domain-adaptation setting is that some of the latent variables will correspond to clusters of features specific only to the source domain, and consequently, the classifier relying on this latent variable will be badly affected when tested on the target domain. |
Related Work | Various semi-supervised techniques for domain-adaptation have also been considered, one example being self-training (McClosky et al., 2006). |
Related Work | Semi-supervised learning with distributed representations and its application to domain adaptation has previously been considered in (Huang and Yates, 2009), but no attempt has been made to address problems specific to the domain-adaptation setting.
Abstract | Using the unsupervised pre-trained deep belief net (DBN) to initialize DAE’s parameters and using the input original phrase features as a teacher for semi-supervised fine-tuning, we learn new semi-supervised DAE features, which are more effective and stable than the unsupervised DBN features. |
Abstract | On two Chinese-English tasks, our semi-supervised DAE features obtain statistically significant improvements of 1.34/2.45 (IWSLT) and 0.82/1.52 (NIST) BLEU points over the unsupervised DBN features and the baseline features, respectively.
Introduction | al., 2010), and speech spectrograms (Deng et al., 2010), we propose new feature learning using a semi-supervised DAE for the phrase-based translation model.
Introduction | By using the input data as the teacher, the “semi-supervised” fine-tuning process of DAE addresses the problem of “back-propagation without a teacher” (Rumelhart et al., 1986), which makes the DAE learn more powerful and abstract features (Hinton and Salakhutdinov, 2006).
Introduction | For our semi-supervised DAE feature learning task, we use the unsupervised pre-trained DBN to initialize DAE’s parameters and use the input original phrase features as the “teacher” for semi-supervised back-propagation. |
Semi-Supervised Deep Auto-encoder Features Learning for SMT | Each translation rule in the phrase-based translation model has a set number of features that are combined in the log-linear model (Och and Ney, 2002), and our semi-supervised DAE features can also be combined in this model. |
Semi-Supervised Deep Auto-encoder Features Learning for SMT | Figure 2: After the unsupervised pre-training, the DBNs are “unrolled” to create a semi-supervised DAE, which is then fine-tuned using back-propagation of error derivatives. |
Semi-Supervised Deep Auto-encoder Features Learning for SMT | To learn a semi-supervised DAE, we first “unroll” the above n-layer DBN by using its weight matrices to create a deep, 2n−1 layer network whose lower layers use the matrices to “encode” the input and whose upper layers use the matrices in reverse order to “decode” the input (Hinton and Salakhutdinov, 2006; Salakhutdinov and Hinton, 2009; Deng et al., 2010), as shown in Figure 2.
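The unrolling idea can be illustrated at toy scale: a one-hidden-layer auto-encoder with tied encoder/decoder weights, fine-tuned by back-propagation with the input itself as the teacher. The data, dimensions, and random initialization below are invented; the actual model initializes these weights from a pre-trained DBN rather than at random:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "phrase feature" vectors standing in for the real phrase features.
X = rng.random((20, 6))

# One-hidden-layer auto-encoder: encode with W, decode with W.T (tied weights,
# mimicking the "unrolled" encoder/decoder structure).
W = rng.normal(scale=0.1, size=(6, 3))
b_h, b_o = np.zeros(3), np.zeros(6)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(X):
    H = sigmoid(X @ W + b_h)
    R = H @ W.T + b_o
    return ((R - X) ** 2).mean()

eta = 0.5
before = loss(X)
for _ in range(300):
    H = sigmoid(X @ W + b_h)          # encode
    R = H @ W.T + b_o                 # decode ("reconstruct the teacher")
    dR = 2 * (R - X) / X.size         # d(loss)/dR
    dH = (dR @ W) * H * (1 - H)       # backprop through decoder and sigmoid
    W -= eta * (X.T @ dH + dR.T @ H)  # tied-weight gradient: encoder + decoder parts
    b_h -= eta * dH.sum(axis=0)
    b_o -= eta * dR.sum(axis=0)
after = loss(X)
```

Fine-tuning drives the reconstruction error down, which is exactly the "input as teacher" training signal described above.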
Abstract | This paper presents a semi-supervised Chinese word segmentation (CWS) approach that co-regularizes character-based and word-based models. |
Abstract | The evaluation on the Chinese Treebank reveals that our model results in better gains over the state-of-the-art semi-supervised models reported in the literature.
Experiment | The line of “ours” reports the performance of our semi-supervised model with the tuned parameters. |
Experiment | It can be observed that our semi-supervised model is able to benefit from unlabeled data and greatly improves the results over the supervised baseline. |
Experiment | We also compare our model with two state-of-the-art semi-supervised methods of Wang ’11 (Wang et al., 2011) and Sun ’11 (Sun and Xu, 2011). |
Introduction | This naturally provides motivation for using easily accessible raw texts to enhance supervised CWS models, in semi-supervised approaches. |
Introduction | In the past years, however, few semi-supervised CWS models have been proposed. |
Introduction | (2008) described a Bayesian semi-supervised model by considering the segmentation as the hidden variable in machine translation. |
Semi-supervised Learning via Co-regularizing Both Models | As mentioned earlier, the primary challenge of semi-supervised CWS concentrates on the unlabeled data. |
Abstract | This paper introduces a graph-based semi-supervised joint model of Chinese word segmentation and part-of-speech tagging. |
Abstract | Empirical results on the Chinese Treebank (CTB-7) and Microsoft Research corpora (MSR) reveal that the proposed model can yield better results than the supervised baselines and other competitive semi-supervised CRFs in this task.
Introduction | Therefore, semi-supervised joint S&T appears to be a natural solution for easily incorporating accessible unlabeled data to improve the joint S&T model. |
Introduction | This study focuses on using a graph-based label propagation method to build a semi-supervised joint S&T model. |
Introduction | labeled and unlabeled data to achieve the semi-supervised learning. |
Related Work | There are few explorations of semi-supervised approaches for CWS or POS tagging in previous works. |
Related Work | (2008) described a Bayesian semi-supervised CWS model by considering the segmentation as the hidden variable in machine translation. |
Related Work | (2011) proposed a semi-supervised pipeline S&T model by incorporating n-gram and lexicon features derived from unlabeled data. |
Abstract | We propose a semi-supervised AL approach for sequence labeling where only highly uncertain subsequences are presented to human annotators, while all others in the selected sequences are automatically labeled. |
Active Learning for Sequence Labeling | This section, first, describes a common approach to AL for sequential data, and then presents our approach to semi-supervised AL. |
Active Learning for Sequence Labeling | 3.2 Semi-Supervised Active Learning |
Active Learning for Sequence Labeling | We call this semi-supervised Active Learning (SeSAL) for sequence labeling. |
Introduction | Accordingly, our approach is a combination of AL and self-training to which we will refer as semi-supervised Active Learning (SeSAL) for sequence labeling. |
Introduction | After a brief overview of the formal underpinnings of Conditional Random Fields, our base classifier for sequence labeling tasks (Section 2), a fully supervised approach to AL for sequence labeling is introduced and complemented by our semi-supervised approach in Section 3. |
Introduction | Our experiments are laid out in Section 5 where we compare fully and semi-supervised AL for NER on two corpora, the newspaper selection of MUC7 and PENNBIOIE, a biological abstracts corpus. |
Related Work | Self-training (Yarowsky, 1995) is a form of semi-supervised learning. |
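Self-training in this sense is a simple loop: train on the labeled pool, pseudo-label the unlabeled examples the classifier is confident about, absorb them into the pool, and repeat. A toy sketch with an invented nearest-centroid classifier and 2-D points (a stand-in for the real base learner):

```python
def centroid(points):
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def self_train(labeled, unlabeled, rounds=5, margin=0.5):
    """Self-training sketch with a nearest-centroid classifier.

    labeled: dict class -> list of points; unlabeled: list of points.
    Each round, confidently classified unlabeled points (distance margin
    between the two nearest centroids exceeds `margin`) join the pool.
    """
    pool = {c: list(ps) for c, ps in labeled.items()}
    remaining = list(unlabeled)
    for _ in range(rounds):
        cents = {c: centroid(ps) for c, ps in pool.items()}
        still = []
        for p in remaining:
            ranked = sorted(pool, key=lambda c: dist(p, cents[c]))
            best, second = ranked[0], ranked[1]
            if dist(p, cents[second]) - dist(p, cents[best]) > margin:
                pool[best].append(p)   # confident: pseudo-label and absorb
            else:
                still.append(p)        # uncertain: leave unlabeled
        remaining = still
    return pool, remaining

labeled = {"A": [[0.0, 0.0]], "B": [[4.0, 4.0]]}
unlabeled = [[0.5, 0.2], [3.8, 4.1], [2.0, 2.0]]
pool, rest = self_train(labeled, unlabeled)
```

The point midway between the two classes never clears the confidence margin and stays unlabeled, which is the behavior that distinguishes self-training from blindly labeling everything.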
Related Work | A combination of active and semi-supervised learning has first been proposed by McCallum and Nigam (1998) for text classification. |
Related Work | Similarly, co-testing (Muslea et al., 2002), a multi-view AL algorithm, selects examples for the multi-view, semi-supervised Co-EM algorithm.
Introduction | By using unlabeled data to reduce data sparsity in the labeled training data, semi-supervised approaches improve generalization accuracy.
Introduction | Semi-supervised models such as Ando and Zhang (2005), Suzuki and Isozaki (2008), and Suzuki et al. |
Introduction | It can be tricky and time-consuming to adapt an existing supervised NLP system to use these semi-supervised techniques. |
Supervised evaluation tasks | This technique for turning a supervised approach into a semi-supervised one is general and task-agnostic. |
Supervised evaluation tasks | We apply clustering and distributed representations to NER and chunking, which allows us to compare our semi-supervised models to those of Ando and Zhang (2005) and Suzuki and Isozaki (2008). |
Unlabeled Data | Ando and Zhang (2005) present a semi-supervised learning algorithm called alternating structure optimization (ASO).
Unlabeled Data | Suzuki and Isozaki (2008) present a semi-supervised extension of CRFs.
Unlabeled Data | (2009), they extend their semi-supervised approach to more general conditional models.)
Abstract | Third, to enhance the power of parsing models, we enlarge the feature set with nonlocal features and semi-supervised word cluster features. |
Experiment | In this subsection, we examined the usefulness of the new nonlocal features and the semi-supervised word cluster features described in Subsection 3.3. |
Experiment | We built three new parsing systems based on the StateAlign system: Nonlocal system extends the feature set of StateAlign system with nonlocal features, Cluster system extends the feature set with semi-supervised word cluster features, and Nonlocal & Cluster system extends the feature set with both groups of features.
Experiment | Compared with the StateAlign system, which takes only the baseline features, the nonlocal features improved parsing F1 by 0.8%, while the semi-supervised word cluster features resulted in an improvement of 2.3% in parsing F1 and a 1.1% improvement in POS tagging accuracy.
Introduction | Third, we take into account two groups of complex structural features that have not been previously used in transition-based parsing: nonlocal features (Charniak and Johnson, 2005) and semi-supervised word cluster features (Koo et al., 2008). |
Introduction | After integrating semi-supervised word cluster features, the parsing accuracy is further improved to 86.3% when trained on CTB 5.1 and 87.1% when trained on CTB 6.0, and this is the best reported performance for Chinese. |
Joint POS Tagging and Parsing with Nonlocal Features | To further improve the performance of our transition-based constituent parser, we consider two groups of complex structural features: nonlocal features (Charniak and Johnson, 2005; Collins and Koo, 2005) and semi-supervised word cluster features (Koo et al., 2008).
Joint POS Tagging and Parsing with Nonlocal Features | Semi-supervised word cluster features have been successfully applied to many NLP tasks (Miller et al., 2004; Koo et al., 2008; Zhu et al., 2013). |
Joint POS Tagging and Parsing with Nonlocal Features | Using these two types of clusters, we construct semi-supervised word cluster features by mimicking the template structure of the original baseline features in Table 1. |
Abstract | The context-aware constraints provide additional power to the CRF model and can guide semi-supervised learning when labeled data is limited. |
Abstract | Experiments on standard product review datasets show that our method outperforms the state-of-the-art methods in both the supervised and semi-supervised settings.
Approach | Global Sentiment Previous studies have demonstrated the value of document-level sentiment in guiding the semi-supervised learning of sentence-level sentiment (Täckström and McDonald, 2011b; Qu et al., 2012).
Experiments | We evaluated our method in two settings: supervised and semi-supervised.
Experiments | In the semi-supervised setting, our unlabeled data consists of |
Experiments | Table 3: Accuracy results (%) for semi-supervised sentiment classification (three-way) on the MD dataset |
Introduction | Semi-supervised techniques have been proposed for sentence-level sentiment classification (Täckström and McDonald, 2011a; Qu et al., 2012).
Introduction | Experimental results show that our model outperforms state-of-the-art methods in both the supervised and semi-supervised settings. |
Related Work | Our approach is also semi-supervised.
Related Work | Compared to the existing work on semi-supervised learning for sentence-level sentiment classification (Täckström and McDonald, 2011a; Täckström and McDonald, 2011b; Qu et al., 2012), our work does not rely on a large amount of coarse-grained (document-level) labeled data; instead, distant supervision comes mainly from linguistically motivated constraints.
Bilingually-constrained Recursive Auto-encoders | And the semi-supervised phrase embedding (Socher et al., 2011; Socher et al., 2013a; Li et al., 2013) further indicates that phrase embedding can be tuned with respect to the label. |
Bilingually-constrained Recursive Auto-encoders | We will first briefly present the unsupervised phrase embedding, and then describe the semi-supervised framework. |
Bilingually-constrained Recursive Auto-encoders | 3.2 Semi-supervised Phrase Embedding |
Related Work | This kind of semi-supervised phrase embedding is in fact performing phrase clustering with respect to the phrase label. |
Related Work | Obviously, such semi-supervised phrase embedding methods do not fully capture the semantic meaning of the phrases.
Abstract | Synonym detection exploits redundant information to train several domain-specific synonym classifiers in a semi-supervised fashion. |
Background: Never-Ending Language Learner | NELL is an information extraction system that has been running 24x7 for over a year, using coupled semi-supervised learning to populate an ontology from unstructured text found on the web. |
ConceptResolver | After mapping each noun phrase to one or more senses (each with a distinct category type), ConceptResolver performs semi-supervised clustering to find synonymous senses. |
ConceptResolver | For each category, ConceptResolver trains a semi-supervised synonym classifier then uses its predictions to cluster word senses. |
Introduction | Train a semi-supervised classifier to predict synonymy. |
Introduction | used to train a semi-supervised classifier. |
Prior Work | However, our evaluation shows that ConceptResolver has higher synonym resolution precision than Resolver, which we attribute to our semi-supervised approach and the known relation schema. |
Prior Work | ConceptResolver’s approach lies between these two extremes: we label a small number of synonyms (10 pairs), then use semi-supervised training to learn a similarity function. |
Prior Work | ConceptResolver uses a novel algorithm for semi-supervised clustering which is conceptually similar to other work in the area. |
Abstract | We propose to combine a K-Nearest Neighbors (KNN) classifier with a linear Conditional Random Fields (CRF) model under a semi-supervised learning framework to tackle these challenges. |
Abstract | The semi-supervised learning plus the gazetteers alleviate the lack of training data. |
Abstract | Extensive experiments show the advantages of our method over the baselines as well as the effectiveness of KNN and semi-supervised learning. |
Introduction | (2010), which introduces a high-level rule language, called NERL, to build the general and domain specific NER systems; and 2) semi-supervised learning, which aims to use the abundant unlabeled data to compensate for the lack of annotated data. |
Introduction | Indeed, it is the combination of KNN and CRF under a semi-supervised learning framework that differentiates our method from existing ones.
Introduction | It is also demonstrated that integrating KNN classified results into the CRF model and semi-supervised learning considerably boost the performance. |
Related Work | Related work can be roughly divided into three categories: NER on tweets, NER on non-tweets (e.g., news, biological medicine, and clinical notes), and semi-supervised learning for NER. |
Related Work | To achieve this, a KNN classifier with a CRF model is combined to leverage cross tweets information, and the semi-supervised learning is adopted to leverage unlabeled tweets. |
Related Work | 2.3 Semi-supervised Learning for NER |
Abstract | Semi-supervised learning (SSL) methods augment standard machine learning (ML) techniques to leverage unlabeled data. |
Abstract | In this paper, we show that improving marginal word frequency estimates using unlabeled data can enable semi-supervised text classification that scales to massive unlabeled data sets. |
Introduction | Semi-supervised Learning (SSL) is a Machine Learning (ML) approach that utilizes large amounts of unlabeled data, combined with a smaller amount of labeled data, to learn a target function (Zhu, 2006; Chapelle et al., 2006). |
Introduction | Typically, for each target concept to be learned, a semi-supervised classifier is trained using iterative techniques that execute multiple passes over the unlabeled data (e.g., Expectation-Maximization (Nigam et al., 2000) or Label Propagation (Zhu and Ghahramani, 2002)).
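The EM variant cited here (Nigam et al., 2000) alternates between inferring soft class posteriors for the unlabeled documents and re-estimating the Naive Bayes parameters from the combined fractional counts. A small numpy sketch with an invented four-word vocabulary and toy documents:

```python
import numpy as np

def nb_em(Xl, yl, Xu, k, n_iter=10, alpha=1.0):
    """Semi-supervised multinomial Naive Bayes via EM (after Nigam et al., 2000).

    Xl: (nl, V) labeled word-count matrix, yl: labels in {0..k-1},
    Xu: (nu, V) unlabeled counts. Returns (log priors, log word probs).
    """
    nl, V = Xl.shape
    R = np.zeros((nl, k))
    R[np.arange(nl), yl] = 1.0                  # hard labels, kept fixed
    Q = np.full((Xu.shape[0], k), 1.0 / k)      # soft labels for unlabeled docs
    for _ in range(n_iter):
        # M-step: fractional counts from labeled (fixed) + unlabeled (soft) docs
        prior = R.sum(0) + Q.sum(0) + alpha
        log_prior = np.log(prior / prior.sum())
        counts = R.T @ Xl + Q.T @ Xu + alpha    # (k, V) smoothed word counts
        log_word = np.log(counts / counts.sum(1, keepdims=True))
        # E-step: class posteriors for the unlabeled documents
        log_post = Xu @ log_word.T + log_prior
        log_post -= log_post.max(1, keepdims=True)
        Q = np.exp(log_post)
        Q /= Q.sum(1, keepdims=True)
    return log_prior, log_word

# Toy vocabulary of 4 words; classes 0 and 1 prefer disjoint word pairs.
Xl = np.array([[3, 1, 0, 0], [0, 0, 2, 3]])
yl = np.array([0, 1])
Xu = np.array([[4, 2, 0, 1], [0, 1, 3, 4], [5, 3, 0, 0]])
log_prior, log_word = nb_em(Xl, yl, Xu, k=2)
```

Each pass over the unlabeled matrix is one E-step, which is what makes naive EM-based SSL expensive at massive scale, the problem the paper's single-pass estimates aim to avoid.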
Problem Definition | We consider a semi-supervised classification task, in which the goal is to produce a mapping from an instance space X consisting of T-tuples of nonnegative integer-valued features w = (w1, …
Problem Definition | Our semi-supervised technique utilizes statistics computed over the labeled corpus, denoted as follows. |
Problem Definition | In addition to Multinomial Naive Bayes (discussed in Section 3), we evaluate against a variety of supervised and semi-supervised techniques from previous work, which provide a representation of the state of the art. |
Abstract | We present a novel semi-supervised training algorithm for learning dependency parsers. |
Abstract | By combining a supervised large margin loss with an unsupervised least squares loss, a discriminative, convex, semi-supervised learning algorithm can be obtained that is applicable to large-scale problems.
Introduction | heavy dependence on annotated corpora—many researchers have investigated semi-supervised learning techniques that can take both labeled and unlabeled training data as input. |
Introduction | Unfortunately, although significant recent progress has been made in the area of semi-supervised learning, the performance of semi-supervised learning algorithms still falls far short of expectations, particularly in challenging real-world tasks such as natural language parsing or machine translation.
Introduction | A large number of distinct approaches to semi-supervised training algorithms have been investigated in the literature (Bennett and Demiriz, 1998; Zhu et al., 2003; Altun et al., 2005; Mann and McCallum, 2007). |
Experiments | Table 6: Experimental results on the English and Chinese development sets with different types of semi-supervised features added incrementally to the extended parser. |
Experiments | Based on the extended parser, we experimented with different types of semi-supervised features by adding the features incrementally.
Experiments | By comparing the results in Table 5 and the results in Table 6 we can see that the semi-supervised features achieve an overall improvement of 1.0% on the English data and an im- |
Introduction | In addition to the above contributions, we apply a variety of semi-supervised learning techniques to our transition-based parser. |
Introduction | Experimental results show that semi-supervised methods give a further improvement of 0.9% in F-score on the English data and 2.4% on the Chinese data. |
Semi-supervised Parsing with Large Data | 4.4 Semi-supervised Features |
Experiments | Genuinely unlabeled posts for Political and Lotus were used for semi-supervised learning experiments in section 6.3; they were not used in section 6.2 on the effect of lexical prior knowledge. |
Experiments | Robustness to Vocabulary Size High dimensionality and noise can have a profound impact on the comparative performance of clustering and semi-supervised learning algorithms.
Experiments | The natural question is whether the presence of lexical constraints leads to better semi-supervised models. |
Introduction | However, the treatment of such dictionaries as forms of prior knowledge that can be incorporated in machine learning models is a relatively less explored topic; even lesser so in conjunction with semi-supervised models that attempt to utilize un- |
Related Work | In this regard, our model brings two interrelated but distinct themes from machine learning to bear on this problem: semi-supervised learning and learning from labeled features. |
Related Work | (Goldberg and Zhu, 2006) adapt semi-supervised graph-based methods for sentiment analysis but do not incorporate lexical prior knowledge in the form of labeled features. |
Related Work | We also note the very recent work of (Sindhwani and Melville, 2008) which proposes a dual-supervision model for semi-supervised sentiment analysis. |
Semi-Supervised Learning With Lexical Knowledge | Therefore, the semi-supervised learning with lexical knowledge can be described as |
Semi-Supervised Learning With Lexical Knowledge | Thus the algorithm for semi-supervised learning with lexical knowledge based on our matrix factorization framework, referred to as SSMFLK, consists of an iterative procedure using the above three rules until convergence.
Abstract | In this paper, we exploit semi-supervised learning with the co-training algorithm for automatic detection of coarse level representation of prosodic events such as pitch accents, intonational phrase boundaries, and break indices. |
Co-training strategy for prosodic event detection | Co-training (Blum and Mitchell, 1998) is a semi-supervised multi-view algorithm that uses the initial training set to learn a (weak) classifier in each view. |
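Concretely, the co-training loop lets the classifier in each view pseudo-label its most confident unlabeled example for the shared training pool. A toy sketch with invented one-dimensional "views" and a nearest-centroid rule standing in for the real per-view classifiers:

```python
def co_train(labeled, unlabeled, rounds=3):
    """Two-view co-training sketch (after Blum and Mitchell, 1998).

    Each example is (view1, view2); a 1-D nearest-centroid classifier per
    view pseudo-labels its most confident remaining example each round.
    """
    pool = list(labeled)                       # tuples (view1, view2, label)
    remaining = list(unlabeled)                # tuples (view1, view2)
    for _ in range(rounds):
        for view in (0, 1):
            if not remaining:
                break
            # Per-class centroid of this view over the current pool.
            cents = {}
            for lab in (0, 1):
                vals = [ex[view] for ex in pool if ex[2] == lab]
                cents[lab] = sum(vals) / len(vals)
            # Most confident = largest gap between distances to the centroids.
            def conf(ex):
                d0 = abs(ex[view] - cents[0])
                d1 = abs(ex[view] - cents[1])
                return abs(d0 - d1)
            best = max(remaining, key=conf)
            lab = 0 if abs(best[view] - cents[0]) < abs(best[view] - cents[1]) else 1
            remaining.remove(best)
            pool.append((best[0], best[1], lab))   # the other view trains on this too
    return pool

labeled = [(0.0, 0.1, 0), (1.0, 0.9, 1)]
unlabeled = [(0.1, 0.2), (0.9, 0.8), (0.15, 0.1)]
pool = co_train(labeled, unlabeled)
```

The essential move is that an example labeled confidently in one view becomes training data for the classifier in the other view, which is how the two weak view classifiers bootstrap each other.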
Conclusions | In addition, we plan to compare this to other semi-supervised learning techniques such as active learning. |
Experiments and results | Although the test condition is different, our result is significantly better than that of other semi-supervised approaches of previous work and comparable with supervised approaches. |
Introduction | Limited research has been conducted using unsupervised and semi-supervised methods. |
Introduction | In this paper, we exploit semi-supervised learning with the |
Introduction | Our experiments on the Boston Radio News corpus show that the use of unlabeled data can lead to significant improvement of prosodic event detection compared to using the original small training set, and that the semi-supervised learning result is comparable with supervised learning with similar amount of training data. |
Previous work | Limited research has been done in prosodic detection using unsupervised or semi-supervised methods. |
Previous work | She also exploited a semi-supervised approach using Laplacian SVM classification on a small set of examples. |
Previous work | In this paper, we apply the co-training algorithm to automatic prosodic event detection and propose methods to better select samples to improve semi-supervised learning performance for this task.
A ← METRICLEARNER(X, 3,1?) | 3.3 Semi-Supervised Classification
A ← METRICLEARNER(X, 3,1?) | In this section, we trained the GRF classifier (see Equation 3), a graph-based semi-supervised learning (SSL) algorithm (Zhu et al., 2003), using a Gaussian kernel parameterized by A = FTP to set edge weights.
Abstract | We initiate a study comparing effectiveness of the transformed spaces learned by recently proposed supervised, and semi-supervised metric learning algorithms to those generated by previously proposed unsupervised dimensionality reduction methods (e.g., PCA). |
Abstract | Through a variety of experiments on different real-world datasets, we find IDML-IT, a semi-supervised metric learning algorithm, to be the most effective.
Conclusion | In this paper, we compared the effectiveness of the transformed spaces learned by recently proposed supervised, and semi-supervised metric learning algorithms to those generated by previously proposed unsupervised dimensionality reduction methods (e.g., PCA). |
Conclusion | Through a variety of experiments on different real-world NLP datasets, we demonstrated that supervised as well as semi-supervised classifiers trained on the space learned by IDML-IT consistently result in the lowest classification errors.
Introduction | Even though different supervised and semi-supervised metric learning algorithms have recently been proposed, effectiveness of the transformed spaces learned by them in NLP |
Introduction | We find IDML-IT, a semi-supervised metric learning algorithm, to be the most effective.
Metric Learning | 2.3 Inference-Driven Metric Learning (IDML): Semi-Supervised |
Metric Learning | Since we are focusing on the semi-supervised learning (SSL) setting with n_l labeled and n_u unlabeled instances, the idea is to automatically label the unlabeled instances using a graph-based SSL algorithm, and then include instances with low assigned label entropy (i.e., high-confidence label assignments) in the next round of metric learning.
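The low-entropy filter described here is easy to sketch: compute the entropy of each predicted label distribution and keep only instances below a threshold for the next metric-learning round. The instances, distributions, and threshold below are invented for illustration:

```python
import math

def label_entropy(dist):
    """Shannon entropy (nats) of a predicted label distribution."""
    return -sum(p * math.log(p) for p in dist if p > 0)

def select_confident(assignments, beta=0.3):
    """Keep instances whose assigned label distribution has entropy below beta.

    assignments: dict instance -> label distribution (list of probabilities).
    Returns instance -> argmax label, for confident instances only.
    """
    confident = {}
    for inst, dist in assignments.items():
        if label_entropy(dist) < beta:
            confident[inst] = max(range(len(dist)), key=dist.__getitem__)
    return confident

# Hypothetical SSL outputs: x2 is near-uniform (high entropy) and is
# excluded from the next metric-learning round.
assignments = {"x1": [0.98, 0.02], "x2": [0.55, 0.45], "x3": [0.05, 0.95]}
confident = select_confident(assignments)
```

Thresholding on entropy rather than on the top probability alone generalizes cleanly to more than two labels.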
Abstract | The method could be used both in a semi-supervised setting where a training set of labeled words is used, and in an unsupervised setting where a handful of seeds is used to define the two polarity classes. |
Abstract | It outperforms the state of the art methods in the semi-supervised setting. |
Conclusions | The proposed method can be used in a semi-supervised setting where a training set of labeled words is used, and in an unsupervised setting where only a handful of seeds is used to define the two polarity classes. |
Experiments | This method could be used in a semi-supervised setting where a set of labeled words are used and the system learns from these labeled nodes and from other unlabeled nodes. |
Introduction | Previous work on identifying the semantic orientation of words has addressed the problem as both a semi-supervised (Takamura et al., 2005) and an unsupervised (Turney and Littman, 2003) learning problem. |
Introduction | In the semi-supervised setting, a training set of labeled words |
Introduction | The proposed method could be used both in a semi-supervised and in an unsupervised setting. |
Word Polarity | This view is closely related to the partially labeled classification with random walks approach in (Szummer and Jaakkola, 2002) and the semi-supervised learning using harmonic functions approach in (Zhu et al., 2003).
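The harmonic-function approach cited here (Zhu et al., 2003) admits a compact closed form. The sketch below is our own illustration, assuming a symmetric nonnegative affinity matrix with the labeled nodes listed first; the function name and matrix layout are our choices:

```python
import numpy as np

def harmonic_labels(W, f_l, n_l):
    """Harmonic-function SSL in the spirit of Zhu et al. (2003).
    W   -- (n, n) symmetric nonnegative affinity matrix; rows/cols
           0..n_l-1 are labeled nodes, the rest are unlabeled.
    f_l -- (n_l, k) one-hot label matrix for the labeled nodes.
    Returns (n_u, k) harmonic label scores for the unlabeled nodes."""
    P = W / W.sum(axis=1, keepdims=True)   # random-walk transition matrix
    P_uu = P[n_l:, n_l:]
    P_ul = P[n_l:, :n_l]
    n_u = W.shape[0] - n_l
    # Solve f_u = P_uu f_u + P_ul f_l, the fixed point of neighbor averaging
    return np.linalg.solve(np.eye(n_u) - P_uu, P_ul @ f_l)
```

On a three-node graph where the unlabeled node is tied to a positive seed with weight 2 and to a negative seed with weight 1, the harmonic scores come out 2/3 versus 1/3, so the node inherits the positive polarity.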
Abstract | We conduct experiments on data sets from the NEWS 2010 shared task on transliteration mining and achieve an F-measure of up to 92%, outperforming most of the semi-supervised systems that were submitted. |
Conclusion | We evaluated it against the semi-supervised systems of NEWS10, achieved a high F-measure, and performed better than most of the semi-supervised systems.
Experiments | On the WIL data sets, we compare our fully unsupervised system with the semi-supervised systems presented at the NEWS10 (Kumaran et al., 2010).
Experiments | For English/Arabic, English/Hindi and English/Tamil, our system is better than most of the semi-supervised systems presented at the NEWS 2010 shared task for transliteration mining. |
Introduction | We compare our unsupervised transliteration mining method with the semi-supervised systems presented at the NEWS 2010 shared task on transliteration mining (Kumaran et al., 2010) using four language pairs. |
Introduction | These systems used a manually labelled set of data for initial supervised training, which means that they are semi-supervised systems. |
Introduction | We achieve an F-measure of up to 92%, outperforming most of the semi-supervised systems.
Previous Research | Our unsupervised method seems robust as its performance is similar to or better than many of the semi-supervised systems on three language pairs. |
Abstract | This paper proposes a novel (semi-)supervised hashing method named Semi-Supervised SimHash (S3H) for high-dimensional data similarity search.
Background and Related Works | 2.3 Semi-Supervised Hashing |
Background and Related Works | Semi-Supervised Hashing (SSH) (Wang et al., 2010a) is recently proposed to incorporate prior knowledge for better hashing. |
Introduction | Motivated by this, some supervised methods are proposed to derive effective hash functions from prior knowledge, e.g., Spectral Hashing (Weiss et al., 2009) and Semi-Supervised Hashing (SSH) (Wang et al., 2010a).
Introduction | This paper proposes a novel (semi-)supervised hashing method, Semi-Supervised SimHash (S3H), for high-dimensional data similarity search. |
Introduction | In Section 3, we describe our proposed Semi-Supervised SimHash (S3H). |
Semi-Supervised SimHash | In this section, we present our hashing method, named Semi-Supervised SimHash (S3H).
Conclusion | We have proposed a novel (semi-)supervised hashing method named Semi-Supervised SimHash (S3H) for high-dimensional data similarity search.
Abstract | To address this problem, we propose a semi-supervised approach to sentiment classification where we first mine the unambiguous reviews using spectral techniques and then exploit them to classify the ambiguous reviews via a novel combination of active learning, transductive learning, and ensemble learning. |
Conclusions | We have proposed a novel semi-supervised approach to polarity classification. |
Conclusions | Since the semi-supervised learner is discriminative, our approach can adopt a richer representation that makes use of more sophisticated features such as bigrams or manually labeled sentiment-oriented words. |
Evaluation | Semi-supervised spectral clustering. |
Evaluation | We implemented Kamvar et al.’s (2003) semi-supervised spectral clustering algorithm, which incorporates labeled data into the clustering framework in the form of must-link and cannot-link constraints. |
Evaluation | As we can see, accuracy ranges from 57.3% to 68.7% and ARI ranges from 0.02 to 0.14, which are significantly better than those of semi-supervised spectral learning. |
Introduction | In light of the difficulties posed by ambiguous reviews, we differentiate between ambiguous and unambiguous reviews in our classification process by addressing the task of semi-supervised polarity classification via a “mine the easy, classify the hard” approach. |
Our Approach | Recall that the goal of this step is not only to identify the unambiguous reviews, but also to annotate them as POSITIVE or NEGATIVE, so that they can serve as seeds for semi-supervised learning in a later step. |
Introduction | This is necessary for the scenario of semi-supervised learning of weights with partially annotated sentences, as described later. |
Introduction | Semi-supervised learning: In the semi-supervised case, the labels y_i^(k) are known only for some of the tokens in x^(k).
Introduction | The semi-supervised approach enables incorporation of significantly more training data. |
Bootstrapping | Abney (2004) defines useful notation for semi-supervised learning, shown in table 1. |
Existing algorithms 3.1 Yarowsky | Haffari and Sarkar (2007) suggest a bipartite graph framework for semi-supervised learning based on their analysis of Y-1/DL-1-VS and objective (2).
Existing algorithms 3.1 Yarowsky | 3.7 Semi-supervised learning algorithm of Subramanya et al. |
Existing algorithms 3.1 Yarowsky | (2010) give a semi-supervised algorithm for part of speech tagging. |
Graph propagation | Note that (3) is independent of their specific graph structure, distributions, and semi-supervised learning algorithm. |
Introduction | In this paper, we are concerned with a case of semi-supervised learning that is close to unsupervised learning, in that the labelled and unlabelled data points are from the same domain and only a small set of seed rules is used to derive the labelled points. |
Introduction | In contrast, typical semi-supervised learning deals with a large number of labelled points, and a domain adaptation task with unlabelled points from the new domain. |
Abstract | To deal with these issues, we describe an efficient semi-supervised learning (SSL) approach which has two components: (i) Markov Topic Regression is a new probabilistic model to cluster words into semantic tags (concepts). |
Abstract | Our new SSL approach improves semantic tagging performance by 3% absolute over the baseline models, and also compares favorably on semi-supervised syntactic tagging. |
Introduction | To deal with these issues, we present a new semi-supervised learning (SSL) approach, which mainly has two components. |
Related Work and Motivation | (I) Semi-Supervised Tagging. |
Related Work and Motivation | (Wang et al., 2009; Li et al., 2009; Li, 2010; Liu et al., 2011) investigate web query tagging using semi-supervised sequence models.
Semi-Supervised Semantic Labeling | 4.2 Retrospective Semi-Supervised CRF |
Semi-Supervised Semantic Labeling | Algorithm 2 Retrospective Semi-Supervised CRF Input: Labeled L^l and unlabeled L^u data.
Conclusion and Future Work | The features and weights are tuned with an iterative semi-supervised method. |
Experiments and Results | To perform consensus-based re-ranking, we first use the baseline decoder to get the n-best list for each sentence of the development and test data, then we create a graph using the n-best lists and training data as we described in section 5.1, and perform semi-supervised training as mentioned in section 4.3.
Features and Training | Algorithm 1 Semi-Supervised Learning |
Features and Training | Algorithm 1 outlines our semi-supervised method for such alternative training. |
Graph-based Translation Consensus | Before elaborating how the graph model of consensus is constructed for both a decoder and N-best output re-ranking in section 5, we will describe how the consensus features and their feature weights can be trained in a semi-supervised way, in section 4. |
Introduction | Alexandrescu and Kirchhoff (2009) proposed a graph-based semi-supervised model to re-rank n-best translation output. |
Abstract | We present a simple semi-supervised relation extraction system with large-scale word clustering. |
Abstract | When training on different sizes of data, our semi-supervised approach consistently outperformed a state-of-the-art supervised baseline system. |
Cluster Feature Selection | The cluster-based semi-supervised system works by adding an additional layer of lexical features that incorporate word clusters, as shown in column 4 of Table 4.
Conclusion and Future Work | We have described a semi-supervised relation extraction system with large-scale word clustering. |
Experiments | For the semi-supervised system, 70 percent of the rest of the documents were randomly selected as training data and 30 percent as development data. |
Experiments | For the semi-supervised system, each test fold was the same one used in the baseline and the other 4 folds were further split into a training set and a development set in a ratio of 7:3 for selecting clusters. |
Abstract | While a variety of semi-supervised methods exist for training from incomplete data, there are open questions regarding what types of training data should be used and how much is necessary. |
Abstract | Our results show that annotation of word types is the most important, provided a sufficiently capable semi-supervised learning infrastructure is in place to project type information onto a raw corpus. |
Conclusions and Future Work | Most importantly, it is clear that type annotations are the most useful input one can obtain from a linguist—provided a semi-supervised algorithm for projecting that information reliably onto raw tokens is available. |
Data | While we do not explore a rule-writing approach to POS-tagging, we do consider the impact of rule-based morphological analyzers as a component in our semi-supervised POS-tagging system. |
Experiments | In addition to annotations, semi-supervised tagger training requires a corpus of raw text.
Introduction | The overwhelming take away from our results is that type supervision—when backed by an effective semi-supervised learning approach—is the most important source of linguistic information. |
Abstract | We present a simple and effective semi-supervised method for training dependency parsers. |
Conclusions | In this paper, we have presented a simple but effective semi-supervised learning approach and demonstrated that it achieves substantial improvement over a competitive baseline in two broad-coverage dependency parsers.
Introduction | In this paper, we introduce lexical intermediaries via a simple two-stage semi-supervised approach. |
Introduction | In general, semi-supervised learning can be motivated by two concerns: first, given a fixed amount of supervised data, we might wish to leverage additional unlabeled data to facilitate the utilization of the supervised corpus, increasing the performance of the model in absolute terms. |
Introduction | We show that our semi-supervised approach yields improvements for fixed datasets by performing parsing experiments on the Penn Treebank (Marcus et al., 1993) and Prague Dependency Treebank (Hajic, 1998; Hajic et al., 2001) (see Sections 4.1 and 4.3). |
Related Work | Semi-supervised phrase structure parsing has been previously explored by McClosky et al. |
Conclusions | In this paper, a distributional approach for acquiring a semi-supervised model of argument classification (AC) preferences has been proposed. |
Conclusions | Moreover, dimensionality reduction methods alternative to LSA, as currently studied in semi-supervised spectral learning (Johnson and Zhang, 2008), will be experimented with.
Introduction | Finally, the application of semi-supervised learning is attempted to increase the lexical expressiveness of the model, e.g. |
Introduction | A semi-supervised statistical model exploiting useful lexical information from unlabeled corpora is proposed. |
Related Work | Accordingly a semi-supervised approach for reducing the costs of the manual annotation effort is proposed. |
Related Work | It embodies the idea that a multitask learning architecture coupled with semi-supervised learning can be effectively applied even to complex linguistic tasks such as SRL. |
Discussion | Therefore, it is also interesting to combine them in a discriminative way as pursued in POS tagging using CRF+HMM (Suzuki et al., 2007), let alone a simple semi-supervised approach in Section 5.2.
Experiments | Table 4: Semi-supervised and supervised results. |
Experiments | Semi-supervised results used only 10K sentences (1/5) of supervised segmentations. |
Experiments | Furthermore, NPYLM is easily amenable to semi-supervised or even supervised learning. |
Introduction | In Section 5 we describe experiments on the standard datasets in Chinese and Japanese in addition to English phonetic transcripts, and semi-supervised experiments are also explored. |
Abstract | Since no large amount of labeled training data for our new notion of sentiment relevance is available, we investigate two semi-supervised methods for creating sentiment relevance classifiers: a distant supervision approach that leverages structured information about the domain of the reviews; and transfer learning on feature representations based on lexical taxonomies that enables knowledge transfer. |
Conclusion | Since a large labeled sentiment relevance resource does not yet exist, we investigated semi-supervised approaches to S-relevance classification that do not require S-relevance-labeled data. |
Distant Supervision | Since a large labeled resource for sentiment relevance classification is not yet available, we investigate semi-supervised methods for creating sentiment relevance classifiers. |
Introduction | For this reason, we investigate two semi-supervised approaches to S-relevance classification that do not require S-relevance-labeled data. |
Transfer Learning | To address the problem that we do not have enough labeled SR data we now investigate a second semi-supervised method for SR classification, transfer learning (TL). |
Abstract | In this paper, we present a method combining active learning and semi-supervised learning to treat the case where only positive examples, which have an expected word sense in web search results, are given.
Introduction | McCallum and Nigam (1998) combined active learning and semi-supervised learning technique |
Introduction | Figure 1: A combination of active learning and semi-supervised learning starting with positive and unlabeled examples
Introduction | Next we use Nigam’s semi-supervised learning method using EM and a naive Bayes classifier (Nigam et al.
Abstract | We propose a semi-supervised framework for generating a domain-specific sentiment lexicon and inferring sentiments at the segment level. |
Introduction | To address the above shortcomings of lexicon and granularity, we propose a semi-supervised framework named ReNew. |
Related Work | Rao and Ravichandran (2009) formalize the problem of sentiment detection as a semi-supervised label propagation problem in a graph. |
Related Work | Esuli and Sebas-tiani (2006) use a set of classifiers in a semi-supervised fashion to iteratively expand a manu- |
Related Work | (2011) introduce a semi-supervised approach that uses recursive autoencoders to learn the hierarchical structure and sentiment distribution of a sentence. |
Abstract | We present a graph-based semi-supervised learning for the question-answering (QA) task for ranking candidate sentences. |
Abstract | We implement a semi-supervised learning (SSL) approach to demonstrate that utilization of more unlabeled data points can improve the answer-ranking task of QA. |
Conclusions and Discussions | (2) We will use other distance measures to better explain entailment between q/a pairs and compare with other semi-supervised and transductive approaches. |
Graph Based Semi-Supervised Learning for Entailment Ranking | We formulate semi-supervised entailment rank scores as follows. |
Introduction | Recent research indicates that using labeled and unlabeled data in a semi-supervised learning (SSL) environment, with an emphasis on graph-based methods, can improve the performance of information extraction from data for tasks such as question classification (Tri et al., 2006), web classification (Liu et al., 2006), relation extraction (Chen et al., 2006), passage-retrieval (Otterbacher et al., 2009), and various natural language processing tasks such as part-of-speech tagging, named-entity recognition (Suzuki and Isozaki, 2008), and word-sense disambiguation.
Related Work | Several researchers have proposed semi-supervised text classification algorithms with the aim of reducing the time, effort and cost involved in labeling documents. |
Related Work | Semi-supervised text classification algorithms proposed in (Nigam et al., 2000), (Joachims, 1999), (Zhu and Ghahramani, 2002) and (Blum and Mitchell, 1998) are a few examples of this type.
Related Work | The third type of semi-supervised text classification algorithms is based on active learning. |
Abstract | A semi-supervised training approach is proposed to train the parameters, and the phrase pair embedding is explored to model translation confidence directly. |
Conclusion and Future Work | We apply our model to SMT decoding, and propose a three-step semi-supervised training method. |
Introduction | We propose a three-step semi-supervised training approach to optimizing the parameters of RZNN, which includes recursive auto-encoding for unsupervised pre-training, supervised local training based on the derivation trees of forced decoding, and supervised global training using early update strategy. |
Introduction | Our RZNN framework is introduced in detail in Section 3, followed by our three-step semi-supervised training approach in Section 4. |
Experiment | Another baseline is Li and Sun (2009), which also uses punctuation in their semi-supervised framework. |
INTRODUCTION | We build a semi-supervised learning (SSL) framework which can iteratively incorporate newly labeled instances from unlabeled micro-blog data during the training process. |
Related Work | Meanwhile semi-supervised methods have been applied into NLP applications. |
Related Work | Similar semi-supervised applications include Shen et al. |
Autoencoders for Grounded Semantics | Alternatively, a semi-supervised criterion can be used (Ranzato and Szummer, 2008; Socher et al., 2011) through combination of the unsupervised training criterion (global reconstruction) with a supervised criterion (prediction of some target given the latent representation). |
Autoencoders for Grounded Semantics | Stacked Bimodal Autoencoder We finally build a stacked bimodal autoencoder (SAE) with all pre-trained layers and fine-tune them with respect to a semi-supervised criterion. |
Autoencoders for Grounded Semantics | Furthermore, the semi-supervised setting affords flexibility, allowing the architecture to be adapted to specific tasks.
Related Work | Secondly, our problem setting is different from the former studies, which usually deal with classification tasks and fine-tune the deep neural networks using training data with explicit class labels; in contrast we fine-tune our autoencoders using a semi-supervised criterion. |
Experiments and Results | The results for the semi-supervised models are inconclusive.
Finding the Homographs in a Lexicon | We experiment with three model setups: supervised, semi-supervised, and unsupervised.
Finding the Homographs in a Lexicon | In Model II, the semi-supervised setup, the training data is used to initialize the Expectation-Maximization (EM) algorithm (Dempster et al., 1977) and the unlabeled data, described in Section 3.1, updates the initial estimates. |
Finding the Homographs in a Lexicon | The unsupervised setup, Model III, is similar to the semi-supervised setup except that the EM algorithm is initialized using an informed guess by the authors. |
Multitask Learning for Discourse Relation Prediction | ASO has been shown to be useful in a semi-supervised learning configuration for several NLP applications, such as text chunking (Ando and Zhang, 2005b) and text classification (Ando and Zhang, 2005a).
Related Work | 2.1.3 Semi-supervised approaches |
Related Work | Hernault et al. (2010) presented a semi-supervised method based on the analysis of co-occurring features in labeled and unlabeled data.
Related Work | Very recently, Hernault et al. (2011) introduced a semi-supervised approach using a structure learning method for discourse relation classification, which is quite relevant to our work.
Experiments | resorting to complicated features, system combination and other semi-supervised technologies. |
Related Work | Lots of efforts have been devoted to semi-supervised methods in sequence labeling and word segmentation (Xu et al., 2008; Suzuki and Isozaki, 2008; Haffari and Sarkar, 2008; Tomanek and Hahn, 2009; Wang et al., 2011). |
Related Work | A semi-supervised method tries to find an optimal hyperplane over both annotated data and raw data, resulting in a model with better coverage and higher accuracy.
Related Work | It is fundamentally different from semi-supervised and unsupervised methods in that we aim to excavate a totally different kind of knowledge: the natural annotations implied by the structural information in web text.
Experiments | (2006): the semi-supervised Alternating Structural Optimization (ASO) technique and the Structural Correspondence Learning (SCL) technique for domain adaptation. |
Related Work | Ando and Zhang develop a semi-supervised chunker that outperforms purely supervised approaches on the CoNLL 2000 dataset (Ando and Zhang, 2005). |
Related Work | Recent projects in semi-supervised (Toutanova and Johnson, 2007) and unsupervised (Biemann et al., 2007; Smith and Eisner, 2005) tagging also show significant progress. |
Related Work | HMMs have been used many times for POS tagging and chunking, in supervised, semi-supervised, and unsupervised settings (Banko and Moore, 2004; Goldwater and Griffiths, 2007; Johnson, 2007; Zhou, 2004).
Introduction | (2009) proposed a rule-based semi-supervised learning method for lexicon extraction.
Introduction | Semi-Supervised Method (Semi): we implement the double propagation model proposed in (Qiu et al., 2009).
Introduction | The relational bootstrapping method performs better than the unsupervised method, TrAdaBoost and the cross-domain CRF algorithm, and achieves comparable results with the semi-supervised method. |
Experiments | Suzuki2009 (Suzuki et al., 2009) reported the best result to date by combining a Semi-supervised Structured Conditional Model (Suzuki and Isozaki, 2008) with the method of Koo et al. (2008).
Experiments | G denotes the supervised graph-based parsers, S denotes the graph-based parsers with semi-supervised methods, D denotes our new parsers |
Related work | (2009) presented a semi-supervised learning approach. |
Related work | They extended a Semi-supervised Structured Conditional Model (SS-SCM) (Suzuki and Isozaki, 2008) to the dependency parsing problem and combined their method with the approach of Koo et al.
Abstract | In this paper, we propose a novel method for semi-supervised learning of non-projective log-linear dependency parsers using directly expressed linguistic prior knowledge (e.g. |
Experimental Comparison with Unsupervised Learning | In this paper, we developed a novel method for the semi-supervised learning of a non-projective CRF dependency parser that directly uses linguistic prior knowledge as a training signal. |
Related Work | Conventional semi-supervised learning requires parsed sentences. |
Introduction | Traditionally, we would learn the embeddings for the target task jointly with whatever unlabeled data we may have, in an instance of semi-supervised learning, and/or we may leverage labels from multiple other related tasks in a multitask approach. |
Related Work | Embeddings are learned in a semi-supervised fashion, and the components of the embedding are given an explicit probabilistic interpretation. |
Related Work | In machine learning literature, joint semi-supervised embedding takes form in methods such as the LaplacianSVM (LapSVM) (Belkin et al., 2006) and Label Propagation (Zhu and Ghahramani, 2002), to which our approach is related.
Experiments | Note that we ran Brown clustering only on the training documents; running it on a larger collection of (unlabeled) documents relevant to the prediction task (i.e., semi-supervised learning) is worth exploring in future work. |
Structured Regularizers for Text | This contrasts with typical semi-supervised learning methods for text categorization that combine unlabeled and labeled data within a generative model, such as multinomial naive Bayes, via expectation-maximization (Nigam et al., 2000) or semi-supervised frequency estimation (Su et al., 2011).
Structured Regularizers for Text | We leave comparison with other semi-supervised methods for future work. |
Related work | (2006) explored semi-supervised learning for relation extraction using label propagation, which makes use of unlabeled data. |
Task definition | Another existing solution to weakly-supervised learning problems is semi-supervised learning, e.g. |
Task definition | However, because our proposed transfer learning method can be combined with semi-supervised learning, here we do not include semi-supervised learning as a baseline. |
Experiments | Afterwards, word-syntactic pattern co-occurrence statistics are used as features for a semi-supervised TSVM classifier (Joachims, 1999) to further refine the results.
Introduction | At the same time, a semi-supervised convolutional neural model (Collobert et al., 2011) is employed to encode contextual semantic clue. |
Related Work | A recent research (Xu et al., 2013) extracted infrequent product features by a semi-supervised classifier, which used word-syntactic pattern co-occurrence statistics as features for the classifier. |
Abstract | The evaluation set is derived from WordNet in a semi-supervised way. |
Conclusion and Future Work | We proposed a semi-supervised way to extract non-compositional MWEs from WordNet. |
Introduction and related work | Thirdly, we propose a semi-supervised approach for extracting non-compositional MWEs from WordNet, to decrease annotation cost. |
Discussion and Related Work | In earlier work on semi-supervised learning, e.g., Blum and Mitchell (1998), the classifiers learned from unlabeled data were used directly.
Discussion and Related Work | There are two main scenarios that motivate semi-supervised learning. |
Introduction | In this paper, we present a semi-supervised learning algorithm that goes a step further. |
Introduction | Even with semi-supervised approaches, which use a large unlabeled corpus, manual construction of a small set of seeds known to be true instances of the target entity or relation is susceptible to arbitrary human decisions.
Introduction | The combination of these patterns produces a clustering method to achieve high precision for different Information Extraction applications, especially for bootstrapping a high-recall semi-supervised relation extraction system.
Related Work | (Rosenfeld and Feldman, 2006) showed that the clusters discovered by URI are useful for seeding a semi-supervised relation extraction system. |
Abstract | In this work, we present a semi-supervised graph-based approach for generating new translation rules that leverages bilingual and monolingual data. |
Evaluation | In this set of experiments, we examined if the improvements in §3.2 can be explained primarily through the extraction of language model characteristics during the semi-supervised learning phase, or through orthogonal pieces of evidence. |
Introduction | Our work introduces a new take on the problem using graph-based semi-supervised learning to acquire translation rules and probabilities by leveraging both monolingual and parallel data resources. |
Abstract | In the literature, the mainstream research on relation extraction adopts statistical machine learning methods, which can be grouped into supervised learning (Zelenko et al., 2003; Culotta and Soresen, 2004; Zhou et al., 2005; Zhang et al., 2006; Qian et al., 2008; Chan and Roth, 2011), semi-supervised learning (Zhang et al., 2004; Chen et al., 2006; Zhou et al., 2008; Qian et al., 2010) and unsupervised learning (Hasegawa et al., 2004; Zhang et al., 2005) in terms of the amount of labeled training data they need.
Abstract | Therefore, Kim and Lee (2012) further employ a graph-based semi-supervised learning method, namely Label Propagation (LP), to indirectly propagate labels from the source language to the target language in an iterative fashion. |
Abstract | For future work, on one hand, we plan to combine uncertainty sampling with diversity and informativeness measures; on the other hand, we intend to combine BAL with semi-supervised learning to further reduce human annotation efforts. |
Extraction with Lexicons | Then Section 4.2 presents our semi-supervised algorithm for learning semantic lexicons from these lists. |
Extraction with Lexicons | 4.2 Semi-Supervised Learning of Lexicons |
Introduction | When learning an extractor for relation R, LUCHS extracts seed phrases from R’s training data and uses a semi-supervised learning algorithm to create several relation-specific lexicons at different points on a precision-recall spectrum. |
Introduction | Furstenau and Lapata (2009b; 2009a) use semi-supervised techniques to automatically annotate data for previously unseen predicates with semantic role information. |
Introduction | (2008) use deep learning techniques based on semi-supervised embeddings to improve an SRL system, though their tests are on in-domain data.
Introduction | Unsupervised SRL systems (Swier and Stevenson, 2004; Grenager and Manning, 2006; Abend et al., 2009) can naturally be ported to new domains with little trouble, but their accuracy thus far falls short of state-of-the-art supervised and semi-supervised systems. |
Data | We use the standard splits of the data used in semi-supervised tagging experiments (e.g., Banko and Moore (2004)): sections 0-18 for training, 19-21 for development, and 22-24 for test.
Experiments | The HMM when using full supervision obtains 87.6% accuracy (Baldridge, 2008), so the accuracy of 63.8% achieved by EMGI+IPGI nearly halves the gap between the supervised model and the 45.6% obtained by the basic EM semi-supervised model.
Introduction | This provides a much more challenging starting point for the semi-supervised methods typically applied to the task. |
Experimental Setup | Forms of Supervision We consider both semi-supervised and supervised learning. |
Experimental Setup | In the semi-supervised scenario, we assume access to the numerical answers of all problems in the training corpus and to a small number of problems paired with full equation systems. |
Learning | Also, using different types of validation functions on different subsets of the data enables semi-supervised learning. |
Related Work | (2007) propose an objective function for semi-supervised extraction that balances likelihood of labeled instances and constraint violation on unlabeled instances. |
Results | Comparison against Supervised CRF Our final set of experiments compares a semi-supervised version of our model against a conditional random field (CRF) model. |
Results | For finance, it takes at least 10 annotated documents (corresponding to roughly 130 annotated relation instances) for the CRF to match the semi-supervised model’s performance. |
Related Work | There are three main approaches to CoRe: supervised, semi-supervised (or weakly supervised) and unsupervised. |
Related Work | We use the term semi-supervised for approaches that use some amount of human-labeled coreference pairs. |
Related Work | (2002) used co-training, a semi-supervised method, for coreference resolution.
Related Work | Semi-supervised Learning. |
Related Work | Another line of related work is semi-supervised learning, which combines labeled and unlabeled data to improve the performance of the task of interest (Zhu and Goldberg, 2009). |
Related Work | Among the popular semi-supervised methods (e.g., EM on Naïve Bayes (Nigam et al., 2000), co-training (Blum and Mitchell, 1998), transductive SVMs (Joachims, 1999b), and co-regularization (Sindhwani et al., 2005; Amini et al., 2010)), our approach employs the EM algorithm, extending it to the bilingual case based on maximum entropy.
Experiments | Type D, C and S denote discriminative, combined and semi-supervised systems, respectively. |
Experiments | We also compare our method with semi-supervised approaches. These approaches achieve very high accuracies by incorporating large amounts of unlabeled data directly into the systems for joint learning and decoding, whereas our method explores only N-gram features to further improve supervised dependency parsing performance.
Introduction | (2008) proposed a semi-supervised dependency parsing approach that introduces lexical intermediaries at a coarser level than the words themselves via a clustering method.
Introduction | Semi-Supervised Modeling |
Introduction | With seeds, our models are thus semi-supervised and need a different formulation. |
Related Work | In (Lu and Zhai, 2008), a semi-supervised model was proposed. |
Conclusion | A novel technique was also proposed to rank n-gram phrases, where relevance-based ranking was used in conjunction with a semi-supervised generative model.
Introduction | We employ a semi-supervised generative model called JTE-P to jointly model AD-expressions, pair interactions, and discussion topics in a single framework.
Model | JTE-P is a semi-supervised generative model motivated by the joint occurrence of expression types (agreement and disagreement), topics in discussion posts, and user pairwise interactions. |
Introduction | Quite a lot of learning techniques, e.g., semi-supervised learning, self-training, and active learning, have been proposed.
Introduction | proposed a semi-supervised learning approach called the Graph Mincut algorithm which uses a small number of positive and negative examples and assigns values to unlabeled examples in a way that optimizes consistency in a nearest-neighbor sense (Blum et al., 2001). |
Introduction | Like much previous work on semi-supervised ML, we apply SVM to the positive and unlabeled data, and add the classification results to the training data. |
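The positive/unlabeled self-training loop described above can be sketched in a few lines. This is a hypothetical toy illustration, not the paper's implementation: the 1-D nearest-centroid scorer, the `margin` threshold, and all names are assumptions standing in for the SVM used in the paper.

```python
# Toy sketch of self-training on positive + unlabeled data: train treating
# unlabeled points as negatives, then fold confidently-positive unlabeled
# points back into the positive training set and repeat.

def centroid(xs):
    return sum(xs) / len(xs)

def pu_self_train(positive, unlabeled, rounds=3, margin=1.0):
    """Return the unlabeled points adopted as positives after self-training."""
    pos = list(positive)
    neg = list(unlabeled)  # initially treat every unlabeled point as negative
    for _ in range(rounds):
        p_c, n_c = centroid(pos), centroid(neg)
        # adopt unlabeled points that are closer to the positive centroid
        # than to the negative one by at least `margin`
        adopted = [x for x in neg if (abs(x - n_c) - abs(x - p_c)) > margin]
        if not adopted:
            break
        pos += adopted
        neg = [x for x in neg if x not in adopted]
    return sorted(set(pos) - set(positive))

# Toy data: positives cluster near 10; the unlabeled pool mixes both clusters.
new_positives = pu_self_train([9.0, 10.0, 11.0], [9.5, 10.5, 0.0, 1.0])
```

On this toy data the loop adopts the two unlabeled points near the positive cluster and leaves the far-away points as negatives, mirroring the "add the classification results to the training data" step.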