Abstract | Distant supervision usually utilizes only unlabeled data and existing knowledge bases to learn relation extraction models. |
Abstract | In this paper, we demonstrate how a state-of-the-art multi-instance multi-label model can be modified to make use of these reliable sentence-level labels in addition to the relation-level distant supervision from a database. |
Available at http://nlp.stanford.edu/software/mimlre.shtml. | We also compare Guided DS with three state-of-the-art models: 1) MultiR and 2) MIML are two distant supervision models that support multi-instance learning and overlapping relations; 3) Mintz++ is a single-instance learning algorithm for distant supervision. |
Guided DS | Our goal is to jointly model human-labeled ground truth and structured data from a knowledge base in distant supervision. |
Introduction | Recently, distant supervision has emerged as an important technique for relation extraction and has attracted increasing attention because of its effective use of readily available databases (Mintz et al., 2009; Bunescu and Mooney, 2007; Snyder and Barzilay, 2007; Wu and Weld, 2007). |
Introduction | One of the most crucial problems in distant supervision is the presence of inherent errors in the automatically generated training data (Roth et al., 2013). |
Introduction | Several models (e.g., Surdeanu et al., 2012) have been proposed to address the issue by loosening the distant supervision assumption. |
The Challenge | Simply taking the union of the hand-labeled data and the corpus labeled by distant supervision is not effective since hand-labeled data will be swamped by a larger amount of distantly labeled data. |
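The swamping effect described in the snippet above can be made concrete with a back-of-the-envelope calculation; the corpus sizes and the weight of 20 are illustrative assumptions, not figures from any of the papers excerpted here:

```python
# With a plain union, 50 hand-labeled examples contribute under 1% of the
# training signal against 5,000 distantly labeled ones; a per-source weight
# restores their influence. All numbers are illustrative.

hand_labeled = 50
distant = 5000

# share of training signal from hand-labeled data under a plain union
plain_share = hand_labeled / (hand_labeled + distant)

# upweight each hand-labeled example (hypothetical weight)
weight = 20
weighted_share = (hand_labeled * weight) / (hand_labeled * weight + distant)
```

Under these assumptions the hand-labeled share rises from roughly 1% to roughly 17%, which is why mixing strategies more sophisticated than a simple union are needed.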
Discussion and Future Work | Our discriminative joint models treat latent syntax as a structured feature to be optimized for the end task of SRL, while our other grammar induction techniques optimize for unlabeled data likelihood, optionally with distant supervision. |
Experiments | Model (WSJ accuracy, distant supervision source): SAJM’10 44.8, none; SAJ’13 64.4, none; SJA’10 50.4, HTML; NB’11 59.4, ACE05; DMV (bc) 24.8, none; DMV+C (bc) 44.8, SRL; Marginalized, IGC 48.8, SRL; Marginalized, IGB 58.9, SRL. |
Experiments | Interestingly, the marginalized grammars best the DMV grammar induction method; however, this difference is less pronounced when the DMV is constrained using SRL labels as distant supervision. |
Experiments | We contrast with methods using distant supervision (Naseem and Barzilay, 2011; Spitkovsky et al., 2010b) and fully unsupervised dependency parsing (Spitkovsky et al., 2013). |
Introduction | In the pipeline models, we develop a novel approach to unsupervised grammar induction and explore performance using SRL as distant supervision. |
Related Work | In our low-resource pipelines, we assume that the syntactic parser is given no labeled parses; however, it may optionally utilize the semantic parses as distant supervision. |
Related Work | Grammar induction work has further demonstrated that distant supervision in the form of ACE-style relations (Naseem and Barzilay, 2011) or HTML markup (Spitkovsky et al., 2010b) can lead to considerable gains. |
Abstract | We present an approach to training a joint syntactic and semantic parser that combines syntactic training information from CCGbank with semantic training information from a knowledge base via distant supervision. |
Parameter Estimation | Distant supervision is provided by the following constraint: every relation instance r(e1, e2) ∈ K must be expressed by at least one sentence in S(e1, e2), the set of sentences that mention both e1 and e2 (Hoffmann et al., 2011). |
Parameter Estimation | features of the best set of parses that satisfy the distant supervision constraint. |
Parameter Estimation | This maximization is intractable due to the coupling between logical forms in E caused by enforcing the distant supervision constraint. |
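The at-least-one constraint from the snippets above can be sketched in a few lines: if no sentence in S(e1, e2) is assigned the KB relation r, hard-constraint inference flips the highest-scoring sentence to r. The function name, labels, and scores are illustrative, not taken from any of the excerpted papers:

```python
# A minimal sketch of enforcing "every KB fact r(e1, e2) is expressed by at
# least one sentence in S(e1, e2)": if no sentence carries the relation,
# force it onto the sentence where it scores highest.

def enforce_at_least_one(assignments, scores, relation):
    """assignments: per-sentence labels for S(e1, e2);
    scores: score of `relation` for each sentence.
    Returns labels satisfying the at-least-one constraint."""
    if relation in assignments:
        return assignments  # constraint already satisfied
    best = max(range(len(scores)), key=lambda i: scores[i])
    fixed = list(assignments)
    fixed[best] = relation
    return fixed

labels = enforce_at_least_one(["NONE", "NONE", "NONE"], [0.2, 0.9, 0.4], "born_in")
# the middle sentence, with the highest score, is forced to "born_in"
```

The intractability noted in the snippet arises because this repair cannot be done independently per fact when sentences and logical forms are shared across facts; the sketch shows only the single-fact case.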
Prior Work | The parser presented in this paper can be viewed as a combination of both a broad coverage syntactic parser and a semantic parser trained using distant supervision. |
Abstract | In addition to traditional linguistic features used in distant supervision for information extraction, our approach also takes into account network information, a unique opportunity offered by social media. |
Conclusion and Future Work | We construct the publicly available dataset based on distant supervision and experiment with our model on three useful user profile attributes, i.e., Education, Job and Spouse. |
Introduction | Inspired by the concept of distant supervision, we collect training tweets by matching attribute ground truth from an outside “knowledge base” such as Facebook or Google Plus. |
Model | The distant supervision assumes that if entity e corresponds to an attribute for user i, at least one posting from user i’s Twitter stream containing a mention of e might express that attribute. |
Related Work | Distant Supervision Distant supervision, also known as weak supervision, is a method for learning to extract relations from text using ground truth from an existing database as a source of supervision. |
Related Work | Rather than relying on mention-level annotations, which are expensive and time consuming to generate, distant supervision leverages readily available structured data sources as a weak source of supervision for relation extraction from related text corpora (Craven et al., 1999). |
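The alignment heuristic described in the snippet above can be sketched directly: every sentence that mentions both arguments of a KB fact is labeled with that fact's relation. The KB triples and sentences below are invented for illustration:

```python
# The basic distant-supervision alignment heuristic: any sentence mentioning
# both entities of a KB fact is labeled with that fact's relation. Note the
# false positive this produces -- the noise later snippets discuss.

kb = {("Barack Obama", "Honolulu"): "born_in",
      ("Barack Obama", "Michelle Obama"): "spouse_of"}

sentences = [
    "Barack Obama was born in Honolulu .",
    "Barack Obama visited Honolulu last week .",   # false positive under the heuristic
    "Michelle Obama met Barack Obama in Chicago .",
]

def distant_label(sentences, kb):
    """Label each sentence with every KB relation whose argument pair it mentions."""
    labeled = []
    for sent in sentences:
        for (e1, e2), rel in kb.items():
            if e1 in sent and e2 in sent:
                labeled.append((sent, e1, e2, rel))
    return labeled

training_data = distant_label(sentences, kb)
# both Honolulu sentences are labeled "born_in", though only the first
# actually expresses the relation
```

The second sentence illustrates why multi-instance models relax the assumption to "at least one sentence expresses the relation" rather than "every sentence does".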
Related Work | In addition to the wide use in text entity relation extraction (Mintz et al., 2009; Ritter et al., 2013; Hoffmann et al., 2011; Surdeanu et al., 2012; Takamatsu et al., 2012), distant supervision has been applied to multiple |
Discussion | We have mentioned that the basic alignment assumption of distant supervision (Mintz et al., 2009) tends to generate noisy (noisy features and |
Introduction | Therefore, the distant supervision paradigm may generate incomplete labeling corpora. |
Introduction | To the best of our knowledge, we are the first to apply this technique on relation extraction with distant supervision. |
Related Work | The idea of distant supervision was first proposed in the field of bioinformatics (Craven and Kumlien, 1999). |
Related Work | It is the abbreviation for Distant supervision for Relation extraction with Matrix Completion. |
Related Work | However, they did not address the data noise introduced by the basic assumption of distant supervision. |
Experiments | We also use two distant supervision approaches for comparison. |
Related Work | Distant supervision (DS) is a semi-supervised RE framework and has attracted much attention (Bunescu, 2007; Mintz et al., 2009; Yao et al., 2010; Surdeanu et al., 2010; Hoffmann et al., 2011; Surdeanu et al., 2012). |
Related Work | (2013) utilize relation cardinality to create negative samples for distant supervision while we use both implicit type clues and relation cardinality expectations to discover possible inconsistencies among local predictions. |
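The cardinality-based negative sampling mentioned in the snippet above can be sketched for a functional relation such as birthplace: since a person has exactly one birthplace, any candidate pair whose value conflicts with the known one can serve as a negative example. The data and function name are illustrative:

```python
# Using relation cardinality to create negative samples: for a functional
# relation (one value per subject), candidate pairs that conflict with the
# known value are taken as negatives.

known_birthplace = {"Barack Obama": "Honolulu"}

candidates = [("Barack Obama", "Honolulu"),
              ("Barack Obama", "Chicago"),
              ("Barack Obama", "Nairobi")]

def cardinality_negatives(candidates, known):
    """Pairs whose object differs from the subject's known value are negatives."""
    return [(e1, e2) for e1, e2 in candidates
            if e1 in known and known[e1] != e2]

negatives = cardinality_negatives(candidates, known_birthplace)
# the Chicago and Nairobi pairs become negative training examples
```

The same expectation can be run in the other direction at prediction time: if a model assigns two birthplaces to one person, at least one local prediction must be wrong, which is the inconsistency-detection use the snippet describes.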
The Framework | Since we focus on open-domain relation extraction, we still follow the distant supervision paradigm to collect our training data guided by a KB, and train the local extractor accordingly. |
Abstract | We test the DCNN in four experiments: small scale binary and multi-class sentiment prediction, six-way question classification and Twitter sentiment prediction by distant supervision . |
Experiments | 5.4 Twitter Sentiment Prediction with Distant Supervision
Introduction | The fourth experiment involves predicting the sentiment of Twitter posts using distant supervision (Go et al., 2009). |
Abstract | Annotation errors can significantly hurt classifier performance, yet datasets are only growing noisier with the increased use of Amazon Mechanical Turk and techniques like distant supervision that automatically generate labels. |
Introduction | Low-quality annotations have become even more common in recent years with the rise of Amazon Mechanical Turk, as well as methods like distant supervision and co-training that involve automatically generating training data. |
Introduction | Although small amounts of noise may not be detrimental, in some applications the level can be high: upon manually inspecting a relation extraction corpus commonly used in distant supervision, Riedel et al.
Background | Recently, “distant supervision” has emerged to be a popular choice for training relation extractors without using manually labeled data (Mintz et al., 2009; Jiang, 2009; Chan and Roth, 2010; Wang et al., 2011; Riedel et al., 2010; Ji et al., 2011; Hoffmann et al., 2011; Surdeanu et al., 2012; Takamatsu et al., 2012; Min et al., 2013). |
Identifying Key Medical Relations | This (distant supervision) approach resulted in a huge amount of sentences that contain the desired relations, but also brought in a lot of noise in the form of false positives. |
Relation Extraction with Manifold Models | This feature is useful when the training data comes from “crowdsourcing” or “distant supervision”. |