Abstract | To tackle these challenges, we propose a novel semi-supervised graph regularization model to incorporate both local and global evidence from multiple tweets through three fine-grained relations. |
Introduction | Therefore, it is challenging to create sufficient high-quality labeled tweets for supervised models, and it is worth considering semi-supervised learning that exploits unlabeled data.
Introduction | However, when selecting a semi-supervised learning framework, we noticed another unique challenge that tweets pose to wikification due to their informal writing style, brevity, and noisiness.
Introduction | Therefore, a collective inference model over multiple tweets in the semi-supervised setting is desirable. |
Principles and Approach Overview | The label assignment is obtained by our semi-supervised graph regularization framework based on a relational graph, which is constructed from local compatibility, coreference, and semantic relatedness relations. |
Relational Graph Construction | They control the contributions of these three relations in our semi-supervised graph regularization model.
Relational Graph Construction | (ii) It is more appropriate for our graph-based semi-supervised model, since it is difficult to assign labels to a mention-concept pair in the referent graph.
Semi-supervised Graph Regularization | We propose a novel semi-supervised graph regularization framework based on the graph-based semi-supervised learning algorithm (Zhu et al., 2003): |
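As a rough illustration of the graph-based semi-supervised learning algorithm cited here (Zhu et al., 2003), the following is a minimal sketch of harmonic-function label propagation over an affinity matrix; the function name and toy interface are ours, not the paper's code.

```python
import numpy as np

def harmonic_label_propagation(W, y_labeled, labeled_idx):
    """Graph-based semi-supervised learning (Zhu et al., 2003).

    W           : (n, n) symmetric affinity matrix over all nodes
    y_labeled   : (l, c) one-hot labels for the labeled nodes
    labeled_idx : indices of the labeled nodes in W
    Returns soft label scores for the unlabeled nodes.
    """
    n = W.shape[0]
    unlabeled_idx = np.setdiff1d(np.arange(n), labeled_idx)
    D = np.diag(W.sum(axis=1))
    L = D - W                                  # graph Laplacian
    L_uu = L[np.ix_(unlabeled_idx, unlabeled_idx)]
    L_ul = L[np.ix_(unlabeled_idx, labeled_idx)]
    # Harmonic solution: f_u = -L_uu^{-1} L_ul f_l
    f_u = np.linalg.solve(L_uu, -L_ul @ y_labeled)
    return f_u
```

In the wikification setting above, the affinity matrix would encode the local-compatibility, coreference, and semantic-relatedness edges of the relational graph, weighted by the parameters the construction section describes.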
Abstract | This paper proposes a simple yet effective framework for semi-supervised dependency parsing at entire tree level, referred to as ambiguity-aware ensemble training. |
Abstract | Experimental results on benchmark data show that our method significantly outperforms the baseline supervised parser and other entire-tree based semi-supervised methods, such as self-training, co-training and tri-training. |
Ambiguity-aware Ensemble Training | In standard entire-tree based semi-supervised methods such as self/co/tri-training, automatically parsed unlabeled sentences are used as additional training data, and noisy 1-best parse trees are considered as gold-standard.
Ambiguity-aware Ensemble Training | We apply L2-norm regularized SGD training to iteratively learn feature weights w for our CRF-based baseline and semi-supervised parsers. |
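The update below is a schematic of L2-norm regularized SGD for log-linear weights; the gradient callable and hyperparameters are placeholders, not the paper's exact settings.

```python
import numpy as np

def sgd_l2(w, batches, grad_loglik, eta=0.1, lam=1e-4, epochs=10):
    """L2-regularized SGD for CRF-style log-linear weights.

    grad_loglik(w, batch) must return the gradient of the conditional
    log-likelihood on the batch (for a CRF this is observed minus
    expected feature counts; a placeholder here).
    """
    for _ in range(epochs):
        for batch in batches:
            g = grad_loglik(w, batch) - lam * w  # regularized gradient
            w = w + eta * g                      # ascend log-likelihood
    return w
```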
Experiments and Analysis | For the semi-supervised parsers trained with Algorithm 1, we use N1 = 20K and M1 = 50K for English, and N1 = 15K and M1 = 50K for Chinese, based on a few preliminary experiments. |
Experiments and Analysis | For semi-supervised cases, one iteration takes about 2 hours on an IBM server with 2.0 GHz Intel Xeon CPUs and 72 GB of memory.
Introduction | In contrast, semi-supervised approaches, which can make use of large-scale unlabeled data, have attracted more and more interest. |
Introduction | To solve the above issues, this paper proposes a more general and effective framework for semi-supervised dependency parsing, referred to as ambiguity-aware ensemble training.
Introduction | We propose a generalized ambiguity-aware ensemble training framework for semi-supervised dependency parsing, which can make better use of unlabeled data by retaining the ambiguity of automatic parses rather than committing to noisy 1-best trees.
Abstract | Using the unsupervised pre-trained deep belief net (DBN) to initialize DAE’s parameters and using the input original phrase features as a teacher for semi-supervised fine-tuning, we learn new semi-supervised DAE features, which are more effective and stable than the unsupervised DBN features. |
Abstract | On two Chinese-English tasks, our semi-supervised DAE features obtain statistically significant improvements of 1.34/2.45 (IWSLT) and 0.82/1.52 (NIST) BLEU points over the unsupervised DBN features and the baseline features, respectively.
Introduction | al., 2010), and speech spectrograms (Deng et al., 2010), we propose new feature learning using a semi-supervised DAE for the phrase-based translation model.
Introduction | By using the input data as the teacher, the “semi-supervised” fine-tuning process of DAE addresses the problem of “back-propagation without a teacher” (Rumelhart et al., 1986), which makes the DAE learn more powerful and abstract features (Hinton and Salakhutdinov, 2006).
Introduction | For our semi-supervised DAE feature learning task, we use the unsupervised pre-trained DBN to initialize DAE’s parameters and use the input original phrase features as the “teacher” for semi-supervised back-propagation. |
Semi-Supervised Deep Auto-encoder Features Learning for SMT | Each translation rule in the phrase-based translation model has a set of features that are combined in the log-linear model (Och and Ney, 2002), and our semi-supervised DAE features can also be combined in this model.
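To make the combination concrete, here is a minimal sketch of log-linear scoring (Och and Ney, 2002) with hypothetical DAE features appended to the baseline feature vector; all numbers and names are illustrative assumptions.

```python
import numpy as np

def loglinear_score(h, lam):
    """Hypothesis score under the log-linear model:
    score = sum_i lambda_i * h_i (the log of the unnormalized probability)."""
    return float(np.dot(lam, h))

# Hypothetical: baseline features extended with new DAE features.
h_baseline = np.array([-2.3, -1.7, -0.4])  # e.g. TM/LM/penalty scores
h_dae      = np.array([0.8, -0.2])         # learned semi-supervised DAE features
lam        = np.array([1.0, 0.5, 0.3, 0.2, 0.2])  # tuned feature weights
score = loglinear_score(np.concatenate([h_baseline, h_dae]), lam)
```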
Semi-Supervised Deep Auto-encoder Features Learning for SMT | Figure 2: After the unsupervised pre-training, the DBNs are “unrolled” to create a semi-supervised DAE, which is then fine-tuned using back-propagation of error derivatives. |
Semi-Supervised Deep Auto-encoder Features Learning for SMT | To learn a semi-supervised DAE, we first “unroll” the above n-layer DBN by using its weight matrices to create a deep, 2n-1 layer network whose lower layers use the matrices to “encode” the input and whose upper layers use the matrices in reverse order to “decode” the input (Hinton and Salakhutdinov, 2006; Salakhutdinov and Hinton, 2009; Deng et al., 2010), as shown in Figure 2.
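A minimal sketch of the unrolling step, assuming pre-trained weight matrices from the DBN; the decoder reuses the transposed matrices in reverse order, as described above. The interface and bias handling are our own simplifications.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def unrolled_dae_forward(x, Ws, bs_enc, bs_dec):
    """Unroll an n-layer DBN into a 2n-1 layer autoencoder:
    lower layers encode with the pre-trained matrices W1..Wn,
    upper layers decode with the same matrices transposed, in reverse."""
    h = x
    for W, b in zip(Ws, bs_enc):            # encoder: W1, W2, ..., Wn
        h = sigmoid(h @ W + b)
    code = h                                # the learned DAE feature
    for W, b in zip(reversed(Ws), bs_dec):  # decoder: Wn^T, ..., W1^T
        h = sigmoid(h @ W.T + b)
    return code, h                          # h is the reconstruction

# Fine-tuning back-propagates the reconstruction error ||x - h||^2,
# with the input x itself acting as the "teacher".
```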
Abstract | Third, to enhance the power of parsing models, we enlarge the feature set with nonlocal features and semi-supervised word cluster features. |
Experiment | In this subsection, we examined the usefulness of the new nonlocal features and the semi-supervised word cluster features described in Subsection 3.3. |
Experiment | We built three new parsing systems based on the StateAlign system: the Nonlocal system extends the feature set of StateAlign with nonlocal features, the Cluster system extends it with semi-supervised word cluster features, and the Nonlocal & Cluster system extends it with both groups of features.
Experiment | Compared with the StateAlign system, which uses only the baseline features, the nonlocal features improved parsing F1 by 0.8%, while the semi-supervised word cluster features resulted in an improvement of 2.3% in parsing F1 and a 1.1% improvement in POS tagging accuracy.
Introduction | Third, we take into account two groups of complex structural features that have not been previously used in transition-based parsing: nonlocal features (Charniak and Johnson, 2005) and semi-supervised word cluster features (Koo et al., 2008). |
Introduction | After integrating semi-supervised word cluster features, the parsing accuracy is further improved to 86.3% when trained on CTB 5.1 and 87.1% when trained on CTB 6.0, and this is the best reported performance for Chinese. |
Joint POS Tagging and Parsing with Nonlocal Features | To further improve the performance of our transition-based constituent parser, we consider two groups of complex structural features: nonlocal features (Charniak and Johnson, 2005; Collins and Koo, 2005) and semi-supervised word cluster features (Koo et al., 2008).
Joint POS Tagging and Parsing with Nonlocal Features | Semi-supervised word cluster features have been successfully applied to many NLP tasks (Miller et al., 2004; Koo et al., 2008; Zhu et al., 2013). |
Joint POS Tagging and Parsing with Nonlocal Features | Using these two types of clusters, we construct semi-supervised word cluster features by mimicking the template structure of the original baseline features in Table 1. |
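As a hypothetical illustration of mimicking lexical templates with cluster identities, the sketch below emits Brown-cluster features over the same word windows a lexical template would use; the template names and prefix length are assumptions, since Table 1 is not reproduced here.

```python
def cluster_features(words, clusters, i):
    """Build semi-supervised word cluster features by mirroring
    lexical feature templates: wherever a baseline template uses a
    word form, emit the same template with the word's cluster id
    (here both a short 4-bit prefix and the full bit string)."""
    feats = []
    for offset in (-1, 0, 1):                     # same window as the word templates
        j = i + offset
        if 0 <= j < len(words):
            bits = clusters.get(words[j], "UNK")  # Brown-cluster bit string
            feats.append(f"c4[{offset}]={bits[:4]}")
            feats.append(f"c[{offset}]={bits}")
    return feats

# e.g. clusters = {"bank": "110100", "river": "0101"}
```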
Abstract | The context-aware constraints provide additional power to the CRF model and can guide semi-supervised learning when labeled data is limited. |
Abstract | Experiments on standard product review datasets show that our method outperforms the state-of-the-art methods in both the supervised and semi-supervised settings.
Approach | Global Sentiment Previous studies have demonstrated the value of document-level sentiment in guiding the semi-supervised learning of sentence-level sentiment (Tackstrom and McDonald, 2011b; Qu et al., 2012). |
Experiments | We evaluated our method in two settings: supervised and semi-supervised.
Experiments | In the semi-supervised setting, our unlabeled data consists of |
Experiments | Table 3: Accuracy results (%) for semi-supervised sentiment classification (three-way) on the MD dataset |
Introduction | Semi-supervised techniques have been proposed for sentence-level sentiment classification (Täckström and McDonald, 2011a; Qu et al., 2012).
Introduction | Experimental results show that our model outperforms state-of-the-art methods in both the supervised and semi-supervised settings. |
Related Work | Our approach is also semi-supervised.
Related Work | Compared to the existing work on semi-supervised learning for sentence-level sentiment classification (Täckström and McDonald, 2011a; Täckström and McDonald, 2011b; Qu et al., 2012), our work does not rely on a large amount of coarse-grained (document-level) labeled data; instead, distant supervision comes mainly from linguistically-motivated constraints.
Bilingually-constrained Recursive Auto-encoders | The work on semi-supervised phrase embedding (Socher et al., 2011; Socher et al., 2013a; Li et al., 2013) further indicates that phrase embeddings can be tuned with respect to the label.
Bilingually-constrained Recursive Auto-encoders | We will first briefly present the unsupervised phrase embedding, and then describe the semi-supervised framework. |
Bilingually-constrained Recursive Auto-encoders | 3.2 Semi-supervised Phrase Embedding |
Related Work | This kind of semi-supervised phrase embedding is in fact performing phrase clustering with respect to the phrase label. |
Related Work | Obviously, such semi-supervised phrase embedding methods do not fully address the semantic meaning of the phrases.
Introduction | This is necessary for the scenario of semi-supervised learning of weights with partially annotated sentences, as described later. |
Introduction | Semi-supervised learning: In the semi-supervised case, the labels y_i^(k) are known only for some of the tokens in x^(k).
Introduction | The semi-supervised approach enables incorporation of significantly more training data. |
Abstract | We propose a semi-supervised framework for generating a domain-specific sentiment lexicon and inferring sentiments at the segment level. |
Introduction | To address the above shortcomings of lexicon and granularity, we propose a semi-supervised framework named ReNew. |
Related Work | Rao and Ravichandran (2009) formalize the problem of sentiment detection as a semi-supervised label propagation problem in a graph. |
Related Work | Esuli and Sebastiani (2006) use a set of classifiers in a semi-supervised fashion to iteratively expand a manu-
Related Work | Socher et al. (2011) introduce a semi-supervised approach that uses recursive autoencoders to learn the hierarchical structure and sentiment distribution of a sentence.
Related Work | Several researchers have proposed semi-supervised text classification algorithms with the aim of reducing the time, effort and cost involved in labeling documents. |
Related Work | Semi-supervised text classification algorithms proposed in (Nigam et al., 2000; Joachims, 1999; Zhu and Ghahramani, 2002; Blum and Mitchell, 1998) are a few examples of this type.
Related Work | The third type of semi-supervised text classification algorithms is based on active learning. |
Abstract | A semi-supervised training approach is proposed to train the parameters, and the phrase pair embedding is explored to model translation confidence directly. |
Conclusion and Future Work | We apply our model to SMT decoding, and propose a three-step semi-supervised training method. |
Introduction | We propose a three-step semi-supervised training approach to optimizing the parameters of RZNN, which includes recursive auto-encoding for unsupervised pre-training, supervised local training based on the derivation trees of forced decoding, and supervised global training using the early-update strategy.
Introduction | Our RZNN framework is introduced in detail in Section 3, followed by our three-step semi-supervised training approach in Section 4. |
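The three-step schedule described above might be organized roughly as follows; this is only a skeleton under the assumption that each phase is a separate routine, not the authors' implementation.

```python
def train_rznn(model, steps):
    """Skeleton of the three-step semi-supervised schedule.
    `steps` maps each phase name to a callable; all three are
    placeholders for the actual RAE / local / global objectives."""
    steps["rae_pretrain"](model)         # 1: unsupervised recursive auto-encoding
    steps["local"](model)                # 2: forced-decoding derivation trees
    steps["global_early_update"](model)  # 3: update as soon as the gold
                                         #    derivation falls off the beam
    return model
```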
Autoencoders for Grounded Semantics | Alternatively, a semi-supervised criterion can be used (Ranzato and Szummer, 2008; Socher et al., 2011) through combination of the unsupervised training criterion (global reconstruction) with a supervised criterion (prediction of some target given the latent representation). |
Autoencoders for Grounded Semantics | Stacked Bimodal Autoencoder We finally build a stacked bimodal autoencoder (SAE) with all pre-trained layers and fine-tune them with respect to a semi-supervised criterion. |
Autoencoders for Grounded Semantics | Furthermore, the semi-supervised setting affords flexibility, allowing the architecture to be adapted to specific tasks.
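A sketch of the semi-supervised criterion used for fine-tuning the stacked bimodal autoencoder: a weighted sum of global reconstruction and prediction of a target from the latent code. The squared losses and the weight alpha are illustrative assumptions.

```python
import numpy as np

def semi_supervised_loss(x, x_hat, y, y_hat, alpha=0.5):
    """Combined fine-tuning criterion: unsupervised reconstruction
    of the input plus supervised prediction of a target, traded off
    by alpha (both terms shown as squared error for illustration)."""
    reconstruction = np.mean((x - x_hat) ** 2)  # unsupervised term
    prediction = np.mean((y - y_hat) ** 2)      # supervised term
    return alpha * reconstruction + (1.0 - alpha) * prediction
```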
Related Work | Secondly, our problem setting is different from that of the earlier studies, which usually deal with classification tasks and fine-tune the deep neural networks using training data with explicit class labels; in contrast, we fine-tune our autoencoders using a semi-supervised criterion.
Experimental Setup | Forms of Supervision We consider both semi-supervised and supervised learning. |
Experimental Setup | In the semi-supervised scenario, we assume access to the numerical answers of all problems in the training corpus and to a small number of problems paired with full equation systems. |
Learning | Also, using different types of validation functions on different subsets of the data enables semi-supervised learning. |
Abstract | In the literature, the mainstream research on relation extraction adopts statistical machine learning methods, which can be grouped into supervised learning (Zelenko et al., 2003; Culotta and Sorensen, 2004; Zhou et al., 2005; Zhang et al., 2006; Qian et al., 2008; Chan and Roth, 2011), semi-supervised learning (Zhang et al., 2004; Chen et al., 2006; Zhou et al., 2008; Qian et al., 2010) and unsupervised learning (Hasegawa et al., 2004; Zhang et al., 2005) in terms of the amount of labeled training data they need.
Abstract | Therefore, Kim and Lee (2012) further employ a graph-based semi-supervised learning method, namely Label Propagation (LP), to indirectly propagate labels from the source language to the target language in an iterative fashion. |
Abstract | For future work, on one hand, we plan to combine uncertainty sampling with diversity and informativeness measures; on the other hand, we intend to combine BAL with semi-supervised learning to further reduce human annotation efforts. |
Abstract | In this work, we present a semi-supervised graph-based approach for generating new translation rules that leverages bilingual and monolingual data. |
Evaluation | In this set of experiments, we examined whether the improvements in §3.2 can be explained primarily through the extraction of language-model characteristics during the semi-supervised learning phase, or through orthogonal pieces of evidence.
Introduction | Our work introduces a new take on the problem using graph-based semi-supervised learning to acquire translation rules and probabilities by leveraging both monolingual and parallel data resources. |
Experiments | Afterwards, word-syntactic pattern co-occurrence statistics are used as features for a semi-supervised TSVM classifier (Joachims, 1999) to further refine the results.
Introduction | At the same time, a semi-supervised convolutional neural model (Collobert et al., 2011) is employed to encode contextual semantic clues.
Related Work | A recent study (Xu et al., 2013) extracted infrequent product features with a semi-supervised classifier, which used word-syntactic pattern co-occurrence statistics as features.
Experiments | Note that we ran Brown clustering only on the training documents; running it on a larger collection of (unlabeled) documents relevant to the prediction task (i.e., semi-supervised learning) is worth exploring in future work. |
Structured Regularizers for Text | This contrasts with typical semi-supervised learning methods for text categorization that combine unlabeled and labeled data within a generative model, such as multinomial naïve Bayes, via expectation-maximization (Nigam et al., 2000) or semi-supervised frequency estimation (Su et al., 2011).
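For contrast, a compact sketch of EM in the spirit of Nigam et al. (2000): a multinomial naive Bayes model is re-estimated from labeled counts plus soft counts on unlabeled documents. Smoothing and initialization are simplified here.

```python
import numpy as np

def nb_em(X_l, y_l, X_u, n_classes, iters=10, alpha=1.0):
    """Semi-supervised multinomial naive Bayes via EM
    (in the spirit of Nigam et al., 2000).

    X_l, X_u : (n, V) word-count matrices for labeled / unlabeled docs
    y_l      : class labels for the labeled docs
    """
    R_l = np.eye(n_classes)[y_l]  # fixed responsibilities for labeled docs
    R_u = np.full((X_u.shape[0], n_classes), 1.0 / n_classes)
    X = np.vstack([X_l, X_u])
    for _ in range(iters):
        R = np.vstack([R_l, R_u])
        # M-step: class priors and Laplace-smoothed word distributions
        prior = R.sum(axis=0) / R.sum()
        word = (R.T @ X) + alpha
        word /= word.sum(axis=1, keepdims=True)
        # E-step: posteriors for the unlabeled documents only
        log_p = np.log(prior) + X_u @ np.log(word).T
        log_p -= log_p.max(axis=1, keepdims=True)
        R_u = np.exp(log_p)
        R_u /= R_u.sum(axis=1, keepdims=True)
    return prior, word
```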
Structured Regularizers for Text | We leave comparison with other semi-supervised methods for future work. |