Abstract | Instead of using only the 1-best parse trees as in previous work, our core idea is to utilize the parse forest (ambiguous labelings) to combine multiple 1-best parse trees generated by diverse parsers on unlabeled data.
Abstract | With a conditional-random-field-based probabilistic dependency parser, our training objective is to maximize the mixed likelihood of labeled data and of auto-parsed unlabeled data with ambiguous labelings.
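A schematic form of this mixed objective (the notation here is ours, not the paper's): the labeled term is an ordinary CRF log-likelihood, while the unlabeled term marginalizes over every tree y in the parse forest V_j licensed by the ambiguous labeling of sentence x_j:

```latex
\mathcal{L}(\theta) = \sum_{(x_i, y_i) \in \mathcal{D}_L} \log p(y_i \mid x_i; \theta)
  \;+\; \sum_{x_j \in \mathcal{D}_U} \log \sum_{y \in \mathcal{V}_j} p(y \mid x_j; \theta)
```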
Introduction | In contrast, semi-supervised approaches, which can make use of large-scale unlabeled data, have attracted increasing interest.
Introduction | Previously, unlabeled data has been explored to derive useful local-context features such as word clusters (Koo et al., 2008), subtree frequencies (Chen et al., 2009; Chen et al., 2013), and word co-occurrence counts (Zhou et al., 2011; Bansal and Klein, 2011).
Introduction | A few effective learning methods have also been proposed for dependency parsing that implicitly utilize distributions over unlabeled data (Smith and Eisner, 2007; Wang et al., 2008; Suzuki et al., 2009).
Experiments | the dependency annotations off the training portion of each treebank, and use that as the unlabeled data for that target language. |
Experiments | -U: Our approach trained on parallel data only, without unlabeled data for the target language.
Experiments | +U: Our approach trained on both parallel and unlabeled data.
Introduction | We train probabilistic parsing models for resource-poor languages by maximizing a combination of likelihood on parallel data and confidence on unlabeled data.
Our Approach | Another advantage of the learning framework is that it combines the likelihood on parallel data with the confidence on unlabeled data, so that both resources are utilized in our approach.
Our Approach | However, in our scenario we have no labeled training data for target languages but we have some parallel and unlabeled data plus an English dependency parser. |
Our Approach | In our scenario, we have a set of aligned parallel data P = {(x_i^E, x_i, a_i)}, where a_i is the word alignment for the pair of source-target sentences (x_i^E, x_i), and a set of unlabeled sentences of the target language U = {x_j}. We also have a trained English parsing model p_{λ_E}. Then the objective in Equation (7) can be divided into two cases, according to whether x_i belongs to the parallel data set P or the unlabeled data set U.
Experiments | In addition to labelled data, a large amount of unlabelled data from the web domain is also provided.
Experiments | about labelled and unlabelled data are summarized in Table 1 and Table 2, respectively. |
Experiments | unlabelled data.
Introduction | The problem we face here can be considered a special case of domain adaptation, where we have access to labelled data in the source domain (PTB) and unlabelled data in the target domain (web data).
Introduction | The idea of learning representations from unlabelled data and then fine-tuning a model with such representations according to some supervised criterion has been studied before (Turian et al., 2010; Collobert et al., 2011; Glorot et al., 2011). |
Related Work | (2011) propose to learn representations from the mixture of both source and target domain unlabelled data to improve cross-domain sentiment classification. |
Related Work | Such high-dimensional input gives rise to high computational cost, and it is not clear whether those approaches can be applied to large-scale unlabelled data with hundreds of millions of training examples.
Related Work | The new representations are induced based on the auxiliary tasks defined on unlabelled data together with a dimensionality reduction technique. |
Experiments | By integrating unlabeled data, the manifold model under setting (1) made a 15% improvement over the linear regression model in F1 score, and the improvement was significant across all relations.
Experiments | This tells us that estimating the labels of unlabeled examples based upon the sampling result is one way to utilize unlabeled data and may help improve relation extraction results.
Experiments | On the one hand, this result shows that using more unlabeled data can further improve the result.
Identifying Key Medical Relations | Part of the resulting data was manually vetted by our annotators, and the remainder was held out as unlabeled data for further experiments.
Introduction | • From the perspective of relation extraction methodologies, we present a manifold model for relation extraction utilizing both labeled and unlabeled data.
Relation Extraction with Manifold Models | Given a few labeled examples and many unlabeled examples for a relation, we want to build a relation detector leveraging both labeled and unlabeled data . |
Relation Extraction with Manifold Models | Integrating the unlabeled data can help alleviate overfitting when the labeled data is not sufficient.
Relation Extraction with Manifold Models | When μ = 0, the model disregards the unlabeled data, and the data manifold topology is not respected.
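As a concrete illustration of the role of μ, here is a minimal linear sketch of manifold regularization (Laplacian-regularized least squares). The function name, the dense RBF graph construction, and all hyperparameters are our own illustrative choices, not the paper's model:

```python
import numpy as np

def laplacian_rls(X_lab, y_lab, X_unlab, lam=0.1, mu=0.1, sigma=1.0):
    """Linear Laplacian-regularized least squares: a sketch of manifold
    regularization. mu weighs the unlabeled-data manifold term; with
    mu = 0 this reduces to ordinary ridge regression on the labeled data."""
    X_all = np.vstack([X_lab, X_unlab])
    # RBF similarity graph over labeled + unlabeled points
    d2 = ((X_all[:, None, :] - X_all[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(1)) - W  # unnormalized graph Laplacian
    d = X_all.shape[1]
    # Closed-form solution of the three-term objective:
    # ||X_lab w - y||^2 + lam ||w||^2 + mu (X_all w)^T L (X_all w)
    A = X_lab.T @ X_lab + lam * np.eye(d) + mu * X_all.T @ L @ X_all
    return np.linalg.solve(A, X_lab.T @ y_lab)
```

With mu = 0 the manifold term vanishes and the solution coincides with ridge regression on the labeled data alone, which is exactly the "disregards the unlabeled data" case described above.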
Abstract | Our framework leverages both labeled and unlabeled data in the target domain.
Abstract | To overcome the lack of labeled samples for rarer relations, these clusterings operate on both the labeled and unlabeled data in the target domain. |
Experiments | negative transfer from irrelevant sources by relying on similarity of feature vectors between source and target domains based on labeled and unlabeled data . |
Introduction | In the first phase, Supervised Voting is used to determine the relevance of each source domain to each region in the target domain, using both labeled and unlabeled data in the target domain. |
Introduction | By also using unlabeled data, we alleviate the lack of labeled samples for rarer relations caused by the imbalanced distribution of relation types.
Introduction | reasonable predictive performance even when all the source domains are irrelevant, and augments the rarer classes with examples from the unlabeled data.
Problem Statement | {(x_i, y_i)}_{i=1}^{n_l} and plenty of unlabeled data D_u = {x_i}_{i=n_l+1}^{n_l+n_u}, where n_l and n_u are the number of labeled and unlabeled samples respectively, x_i is the feature vector, and y_i is the corresponding label (if available).
Robust Domain Adaptation | With data from both the labeled and unlabeled data sets, we apply transductive inference or semi-supervised learning (Zhou et al., 2003) to achieve both (i) and (ii). |
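The Zhou et al. (2003) method referenced here (learning with local and global consistency, i.e., label spreading) can be sketched as follows. The RBF graph and the α, σ, iteration defaults are illustrative choices of ours, not the paper's settings:

```python
import numpy as np

def label_spreading(X, y, alpha=0.9, sigma=1.0, iters=50):
    """Sketch of learning with local and global consistency (Zhou et al., 2003).
    y holds class ids for labeled points and -1 for unlabeled ones."""
    n = len(X)
    # RBF affinity graph over all (labeled + unlabeled) points
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    Dinv = 1.0 / np.sqrt(W.sum(1))
    S = Dinv[:, None] * W * Dinv[None, :]  # normalized affinity D^{-1/2} W D^{-1/2}
    classes = sorted({c for c in y if c >= 0})
    Y = np.zeros((n, len(classes)))
    for i, c in enumerate(y):
        if c >= 0:
            Y[i, classes.index(c)] = 1.0
    F = Y.copy()
    for _ in range(iters):
        F = alpha * S @ F + (1 - alpha) * Y  # iterate F <- alpha*S*F + (1-alpha)*Y
    return np.array(classes)[F.argmax(1)]
```

The iteration spreads label mass along the graph, so unlabeled points inherit the labels of their manifold neighborhood, which is what lets the unlabeled data augment rarer classes.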
Robust Domain Adaptation | By augmenting with unlabeled data D_u, we aim to alleviate the effect of the imbalanced relation distribution, which causes a lack of labeled samples for rarer classes in a small set of labeled data.
Robust Domain Adaptation | Here, we have multiple objectives: the first term controls the training error; the second regularizes the complexity of the functions f_j in the Reproducing Kernel Hilbert Space (RKHS) H; and the third prefers the predicted labels of the unlabeled data D_u to be close to the reference predictions.
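The three terms can be written schematically as follows (the symbols are generic placeholders assumed by us: ℓ is a loss, γ and μ are trade-off weights, and ỹ_j(x) are the reference predictions for the unlabeled points):

```latex
\min_{f_1, \dots, f_J \in \mathcal{H}} \;\sum_{j}\Big[
  \sum_{(x_i, y_i) \in D_l} \ell\big(f_j(x_i), y_{ij}\big)
  \;+\; \gamma \, \lVert f_j \rVert_{\mathcal{H}}^{2}
  \;+\; \mu \sum_{x \in D_u} \big(f_j(x) - \tilde{y}_j(x)\big)^{2}
\Big]
```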
Experiments | by the character-based alignment (VES-NO-GP) and the graph propagation (VES-GP-PL) are regarded as virtual evidences to bias the CRF model's learning on the unlabeled data.
Methodology | Our learning problem belongs to semi-supervised learning (SSL), as the training is done on treebank labeled data (X_L, Y_L) = {(x_1, y_1), ..., (x_l, y_l)} and bilingual unlabeled data X_U = {x_1, ..., x_u}, where x_i = {x_1, ..., x_m} is an input word sequence and y_i = {y_1, ..., y_m}, y ∈ T, is its corresponding label sequence.
Methodology | In our setting, the CRF model is required to learn from unlabeled data.
Methodology | This work employs the posterior regularization (PR) framework (Ganchev et al., 2010) to bias the CRF model's learning on unlabeled data, under a constraint encoded by the graph propagation expression.
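For orientation, the PR objective of Ganchev et al. (2010) in its generic form: the likelihood is penalized by the KL divergence to the nearest posterior q satisfying a constraint set Q (written here with generic constraint features φ and bound b; in the work above, the graph-propagation expression would define Q):

```latex
J(\theta) = \mathcal{L}(\theta) \;-\; \min_{q \in \mathcal{Q}} \mathrm{KL}\big(q(Y) \,\big\|\, p_\theta(Y \mid X)\big),
\qquad
\mathcal{Q} = \{\, q : \mathbb{E}_q[\phi(X, Y)] \le b \,\}
```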
Related Work | (2014) proposed GP for inferring the label information of unlabeled data, and then leveraged these GP outcomes to learn a scalable semi-supervised model (e.g., CRFs).
Related Work | One of our main objectives is to bias the CRF model's learning on unlabeled data under a nonlinear GP constraint encoding the bilingual knowledge.
Related Work | (2008) described constraint-driven learning (CODL), which augments model learning on unlabeled data by adding a cost for violating expectations of constraint features designed with domain knowledge.
Experiments | In the supervised setting, we treated the test data as unlabeled data and performed transductive learning. |
Experiments | In the semi-supervised setting, our unlabeled data consists of both the available unlabeled data and the test data.
Introduction | In this paper, we propose a sentence-level sentiment classification method that can (1) incorporate rich discourse information at both local and global levels; (2) encode discourse knowledge as soft constraints during learning; (3) make use of unlabeled data to enhance learning. |
Discussion and Future Work | We utilize unlabeled data in both generative and discriminative models for dependency syntax and in generative word clustering. |
Discussion and Future Work | Our discriminative joint models treat latent syntax as a structured feature to be optimized for the end task of SRL, while our other grammar induction techniques optimize for unlabeled-data likelihood, optionally with distant supervision.
Discussion and Future Work | We observe that careful use of these unlabeled data resources can improve performance on the end task. |
Abstract | The results suggest that it is possible to learn domain-specific entity types from unlabeled data.
Conclusion | We evaluated an approach to learning domain-specific interpretable entity types from unlabeled data . |
Introduction | • we empirically evaluate an approach to learning types from unlabeled data
Experiment | Previous work found that performance can be improved by pre-training the character embeddings on large unlabeled data and using the obtained embeddings to initialize the character lookup table instead of random initialization.
Experiment | There are several ways to learn the embeddings on unlabeled data.
Experiment | (2013), the bigram embeddings are pre-trained on unlabeled data with character embeddings, which significantly improves the model performance. |
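One common way to obtain such pre-trained character embeddings is skip-gram with negative sampling run over raw text treated as character sequences. Below is a minimal from-scratch sketch; the function name, hyperparameters, and the plain-SGD training loop are our own illustrative choices, not any cited paper's setup:

```python
import numpy as np

def pretrain_char_embeddings(corpus, dim=16, window=2, epochs=5,
                             lr=0.05, neg=3, seed=0):
    """Minimal skip-gram with negative sampling over characters: a sketch of
    pre-training a character lookup table on unlabeled text."""
    rng = np.random.default_rng(seed)
    chars = sorted(set("".join(corpus)))
    idx = {c: i for i, c in enumerate(chars)}
    V = len(chars)
    emb = rng.normal(scale=0.1, size=(V, dim))  # input (lookup) table
    ctx = np.zeros((V, dim))                    # context table
    for _ in range(epochs):
        for sent in corpus:
            ids = [idx[c] for c in sent]
            for i, c in enumerate(ids):
                lo, hi = max(0, i - window), min(len(ids), i + window + 1)
                for j in range(lo, hi):
                    if j == i:
                        continue
                    # one true context char plus `neg` random negatives
                    targets = [ids[j]] + list(rng.integers(0, V, size=neg))
                    labels = [1.0] + [0.0] * neg
                    for t, lab in zip(targets, labels):
                        score = 1.0 / (1.0 + np.exp(-emb[c] @ ctx[t]))
                        g = lr * (lab - score)
                        e_old = emb[c].copy()
                        emb[c] += g * ctx[t]
                        ctx[t] += g * e_old
    return {c: emb[i] for c, i in idx.items()}
```

The resulting vectors would then initialize the character lookup table in place of random initialization; in practice one would use optimized word2vec-style tooling over much larger unlabeled corpora, and the same scheme extends to character bigrams by adding bigram tokens to the vocabulary.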