Abstract | In addition, it is challenging to generate sufficient high-quality labeled data for supervised models at low cost. |
Abstract | Compared to the state-of-the-art supervised model trained on 100% labeled data, our proposed approach achieves comparable performance with 31% labeled data and obtains a 5% absolute F1 gain with 50% labeled data.
Conclusions | By studying three novel fine-grained relations, detecting semantically related information with semantic meta paths, and exploiting the data manifolds in both unlabeled and labeled data for collective inference, our work can dramatically reduce annotation cost and achieve better performance, thus shedding light on the challenging wikification task for tweets.
Experiments | In comparison with the supervised baseline proposed by Meij et al. (2012), our model SSRega1, relying on local compatibility, already achieves comparable performance with 50% of the labeled data.
Experiments | We can see that our proposed approach using 50% labeled data achieves performance similar to the state-of-the-art supervised model with 100% labeled data.
Experiments | 6.4 Effect of Labeled Data Size |
Introduction | Sufficient labeled data is crucial for supervised models. |
Introduction | In order to address these unique challenges of wikification for short tweets, we employ graph-based semi-supervised learning algorithms (Zhu et al., 2003; Smola and Kondor, 2003; Blum et al., 2004; Zhou et al., 2004; Talukdar and Crammer, 2009) for collective inference by exploiting the manifold (cluster) structure in both unlabeled and labeled data.
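A minimal sketch of the kind of graph-based label propagation these algorithms perform (the similarity graph, the clamping scheme, and the convergence settings here are illustrative assumptions, not the paper's actual construction):

```python
# Iterative label propagation over a similarity graph, in the spirit of
# Zhu et al. (2003): labeled nodes are clamped, unlabeled nodes repeatedly
# take the weighted average of their neighbors' scores.

def label_propagation(W, labels, n_iter=100):
    """W: symmetric similarity matrix (list of lists);
    labels: dict {node_index: score in [0, 1]} for labeled nodes.
    Returns a propagated score per node."""
    n = len(W)
    f = [labels.get(i, 0.5) for i in range(n)]
    for _ in range(n_iter):
        new_f = []
        for i in range(n):
            if i in labels:               # clamp labeled nodes
                new_f.append(labels[i])
                continue
            total = sum(W[i])
            # weighted average of neighbors' current scores
            new_f.append(sum(W[i][j] * f[j] for j in range(n)) / total
                         if total > 0 else f[i])
        f = new_f
    return f
```

On a three-node chain with the endpoints labeled 1.0 and 0.0, the middle node settles at 0.5, illustrating how scores diffuse along the manifold.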
Abstract | We address two challenges: negative transfer when knowledge in source domains is used without considering the differences in relation distributions; and lack of adequate labeled samples for rarer relations in the new domain, due to a small labeled data set and imbalanced relation distributions.
Introduction | However, most supervised learning algorithms require adequate labeled data for every relation type to be extracted. |
Introduction | Instead, it can be more cost-effective to adapt an existing relation extraction system to the new domain using a small set of labeled data.
Introduction | Together with imbalanced relation distributions inherent in the domain, this can cause some rarer relations to constitute only a very small proportion of the labeled data set. |
Problem Statement | The target domain has a small labeled data set D_t = {(x_i, y_i)}.
Problem Statement | For the s-th source domain, we have an adequate labeled data set D_s.
Related Work | However, purely supervised relation extraction methods assume the availability of sufficient labeled data, which may be costly to obtain for new domains.
Related Work | We address this by augmenting a small labeled data set with other information in the domain adaptation setting. |
Related Work | To create labeled data, the texts are dependency-parsed, and the domain-independent patterns on the parses form the basis for extractions.
Robust Domain Adaptation | By augmenting with unlabeled data D_u, we aim to alleviate the effect of the imbalanced relation distribution, which causes a lack of labeled samples for rarer classes in a small set of labeled data.
Datasets | (2006), we use sections 2-21 from the Wall Street Journal (WSJ) as the source domain labeled data.
Distribution Prediction | Our distribution prediction learning method is unsupervised in the sense that it does not require manually labeled data for a particular task from any of the domains. |
Domain Adaptation | The main reason that a model trained only on the source domain labeled data performs poorly in the target domain is the feature mismatch: few features in target domain test instances appear in source domain training instances.
Experiments and Results | For each domain, the accuracy obtained by a classifier trained using labeled data from that
Experiments and Results | This upper baseline represents the classification accuracy we could hope to obtain if we were to have labeled data for the target domain. |
Introduction | Our proposed cross-domain word distribution prediction method is unsupervised in the sense that it does not require any labeled data in either of the two steps. |
O \ | Unlike our distribution prediction method, which is unsupervised, SST requires labeled data for the source domain to learn a feature mapping between a source and a target domain in the form of a thesaurus. |
Related Work | (2006) append the source domain labeled data with predicted pivots (i.e. |
Related Work | The unsupervised DA setting that we consider does not assume the availability of labeled data for the target domain. |
Related Work | However, if a small amount of labeled data is available for the target domain, it can be used to further improve the performance of DA tasks (Xiao et al., 2013; Daumé III, 2007).
Abstract | With a conditional random field based probabilistic dependency parser, our training objective is to maximize mixed likelihood of labeled data and auto-parsed unlabeled data with ambiguous labelings. |
Ambiguity-aware Ensemble Training | not sufficiently covered in manually labeled data.
Ambiguity-aware Ensemble Training | Since D′ contains many more instances than D (1.7M vs. 40K for English, and 4M vs. 16K for Chinese), it is likely that the unlabeled data may overwhelm the labeled data during SGD training.
Ambiguity-aware Ensemble Training | 1: Input: labeled data D = {(x_i, d_i)}_{i=1}^{N} and unlabeled data D′ = {(u_j, V_j)}_{j=1}^{M}; Parameters: I, N1, M1, b; 2: Output: w; 3: Initialization: w^(0) = 0, k = 0; 4: for i = 1 to I do {iterations}; 5: Randomly select N1 instances from D and M1 instances from D′ to compose a new dataset D_i, and shuffle it.
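The mixed-corpus sampling step of this procedure, which keeps the much larger unlabeled set D′ from swamping the labeled set D during SGD, can be sketched as follows (the function name and tuple representation are illustrative, not from the paper):

```python
import random

# Each training iteration draws N1 labeled and M1 unlabeled instances
# and shuffles them into one mini-corpus, so the ratio of labeled to
# unlabeled data per update is fixed regardless of corpus sizes.

def mixed_batch(D, D_prime, N1, M1, rng=random):
    batch = rng.sample(D, N1) + rng.sample(D_prime, M1)
    rng.shuffle(batch)
    return batch
```

With 40K labeled and 1.7M unlabeled sentences, choosing N1 and M1 directly controls how much influence the auto-parsed data has on each update.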
Conclusions | The training objective is to maximize the mixed likelihood of both the labeled data and the auto-parsed unlabeled data with ambiguous labelings. |
Introduction | Such sentences can provide more discriminative instances for training which may be unavailable in labeled data . |
Introduction | Evaluation on labeled data shows the oracle accuracy of parse forest is much higher than that of 1-best outputs of single parsers (see Table 3).
Introduction | Finally, using a conditional random field (CRF) based probabilistic parser, we train a better model by maximizing mixed likelihood of labeled data and auto-parsed unlabeled data with ambiguous labelings. |
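A plausible form of this mixed-likelihood objective, reconstructed from the surrounding description rather than taken from the paper: assuming D = {(x_i, d_i)} is the labeled treebank and each unlabeled sentence u_j carries a set V_j of ambiguous candidate parses, the unlabeled term marginalizes over all parses in the forest:

```latex
L(\mathbf{w}) \;=\; \sum_{(x_i, d_i) \in D} \log p(d_i \mid x_i; \mathbf{w})
\;+\; \sum_{(u_j, V_j) \in D'} \log \sum_{d \in V_j} p(d \mid u_j; \mathbf{w})
```

The second sum rewards the model for placing probability mass anywhere inside the ambiguous labeling V_j, which is what allows training on auto-parsed data without committing to a single possibly wrong parse.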
Abstract | However, in some cases a small amount of human labeled data is available. |
Available at http://nlp.stanford.edu/software/mimlre.shtml. | Thus, our approach outperforms the state-of-the-art model for relation extraction using much less labeled data than was used by Zhang et al. (2012) to outper-
Introduction | In this paper, we present the first effective approach, Guided DS (distant supervision), to incorporate labeled data into distant supervision for extracting relations from sentences. |
Introduction | (2012), we generalize the labeled data through feature selection and model this additional information directly in the latent variable approaches. |
Introduction | While prior work employed tens of thousands of human-labeled examples (Zhang et al., 2012) and gained only 6.5% in F-score over a logistic regression baseline, our approach uses much less labeled data (about 1/8) yet achieves a much larger improvement over stronger baselines.
The Challenge | Simply taking the union of the hand-labeled data and the corpus labeled by distant supervision is not effective since hand-labeled data will be swamped by a larger amount of distantly labeled data.
The Challenge | An effective approach must recognize that the hand-labeled data is more reliable than the automatically labeled data and so must take precedence in cases of conflict. |
The Challenge | Instead, we propose to perform feature selection to generalize the human-labeled data into training guidelines, and to integrate them into the latent variable model.
Training | Upsampling the labeled data did not improve the performance either.
Background | Recently, “distant supervision” has emerged to be a popular choice for training relation extractors without using manually labeled data (Mintz et al., 2009; Jiang, 2009; Chan and Roth, 2010; Wang et al., 2011; Riedel et al., 2010; Ji et al., 2011; Hoffmann et al., 2011; Surdeanu et al., 2012; Takamatsu et al., 2012; Min et al., 2013).
Experiments | (1) Manifold Unlabeled: We combined the labeled data and unlabeled set 1 in training. |
Experiments | (2) Manifold Predicted Labels: We combined labeled data and unlabeled set 2 in training. |
Experiments | beled data and the data from unlabeled set 2 was used as labeled data (With Weights). |
Identifying Key Medical Relations | Our current strategy is to integrate all associated types, and rely on the relation detector trained with the labeled data to decide how to weight different types based upon the context. |
Introduction | When we build a naive model to detect relations, the model tends to overfit the labeled data.
Relation Extraction with Manifold Models | Integrating the unlabeled data can help alleviate overfitting when the labeled data is insufficient.
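One standard way to integrate unlabeled data for this purpose, sketched here as a generic manifold-regularization objective rather than the paper's exact formulation, is to add a graph-smoothness penalty over all l labeled and u unlabeled examples:

```latex
\min_{f}\; \sum_{i=1}^{l} \ell\big(f(x_i),\, y_i\big)
\;+\; \lambda \sum_{i,j=1}^{l+u} W_{ij}\, \big(f(x_i) - f(x_j)\big)^2
```

Here W_{ij} is a similarity weight between examples i and j. The second term forces predictions to vary smoothly along the data manifold, so the unlabeled examples constrain the model even though they contribute nothing to the supervised loss, which is what counteracts overfitting to a small labeled set.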
Abstract | The context-aware constraints provide additional power to the CRF model and can guide semi-supervised learning when labeled data is limited. |
Approach | PR makes the assumption that the labeled data we have is not enough for learning good model parameters, but we have a set of constraints on the posterior distribution of the labels. |
Experiments | For the MD dataset, we also used the dvd domain as additional labeled data for developing the constraints. |
Experiments | We found that the PR model is able to correct many CRF errors caused by the lack of labeled data . |
Experiments | However, with limited labeled data, the CRF learner can only associate very weak sentiment signals to these features.
Related Work | Compared to the existing work on semi-supervised learning for sentence-level sentiment classification (Täckström and McDonald, 2011a; Täckström and McDonald, 2011b; Qu et al., 2012), our work does not rely on a large amount of coarse-grained (document-level) labeled data; instead, distant supervision mainly comes from linguistically motivated constraints.
Experiments | For these reasons, there is a lack of sufficient, high-quality labeled data for emotion research.
Experiments | Since in real world applications people are primarily concerned with how well the algorithm will work for new TV shows or movies that may not be included in the training data, we defined a test fold for each TV show or movie in our labeled data set. |
Experiments | Each test fold corresponded to a training fold containing all the labeled data from all the other TV shows and movies. |
Introduction | An active learner uses a small set of labeled data to iteratively select the most informative instances from a large pool of unlabeled data for human annotators to label (Settles, 2010). |
Related Work | In Active Learning (Settles, 2010) a small set of labeled data is used to find documents that should be annotated from a large pool of unlabeled documents. |
Experiments | The data set consists of labelled data for both the source (Wall Street Journal portion of the Penn Treebank) and target (web) domains. |
Experiments | Participants are not allowed to use web-domain labelled data for training. |
Experiments | In addition to labelled data , a large amount of unlabelled data on the web domain is also provided. |
Introduction | The problem we face here can be considered as a special case of domain adaptation, where we have access to labelled data on the source domain (PTB) and unlabelled data on the target domain (web data). |
Abstract | Usually the extraction performance depends heavily on the quality and quantity of the labeled data; however, the manual annotation of a large-scale corpus is labor-intensive and time-consuming.
Abstract | During iterations, a batch of unlabeled instances is chosen according to their informativeness to the current classifier, labeled by an oracle, and in turn added to the labeled data to retrain the classifier.
Abstract | Input: L, labeled data set; U, unlabeled data set; n, batch size. Output: SVM classifier. Repeat: 1.
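The loop above (truncated in the source) can be sketched end to end as follows. The oracle, the toy 1-D threshold classifier standing in for the SVM, and all function names are illustrative assumptions, not the paper's implementation:

```python
# Pool-based active learning with uncertainty sampling: repeatedly train,
# pick the n pool instances closest to the decision boundary, query the
# oracle for their labels, and add them to the labeled set.

def train(labeled):
    """Fit a toy threshold: midpoint between the two class means."""
    xs0 = [x for x, y in labeled if y == 0]
    xs1 = [x for x, y in labeled if y == 1]
    return (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2.0

def uncertainty(threshold, x):
    """Closer to the decision boundary -> more uncertain."""
    return -abs(x - threshold)

def active_learn(labeled, unlabeled, oracle, n, rounds):
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        t = train(labeled)
        # select the n most uncertain pool instances for annotation
        batch = sorted(pool, key=lambda x: uncertainty(t, x), reverse=True)[:n]
        for x in batch:
            pool.remove(x)
            labeled.append((x, oracle(x)))  # query the oracle for a label
    return train(labeled), labeled
```

The key property is that annotation effort goes only to the instances the current classifier is least sure about, rather than to a random sample of the pool.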
Experiments | Self-training Segmenters (STS): two variant models were defined by the approach reported in (Subramanya et al., 2010), which uses the supervised CRF model’s decodings, incorporating empirical and constraint information, for unlabeled examples as additional labeled data to retrain a CRF model.
Introduction | They leverage such mappings to either constitute a Chinese word dictionary for maximum-matching segmentation (Xu et al., 2004), or form labeled data for training a sequence labeling model (Paul et al., 2011). |
Methodology | Our learning problem belongs to semi-supervised learning (SSL), as the training is done on treebank labeled data (X_L, Y_L) = {(x_1, y_1), ..., (x_l, y_l)} and bilingual unlabeled data X_U = {x_1, ..., x_u}, where x_i = {w_1, ..., w_m} is an input word sequence and y_i = {y_1, ..., y_m}, y ∈ T, is its corresponding label sequence.