Multi-Task Transfer Learning for Weakly-Supervised Relation Extraction
Jiang, Jing

Article Structure

Abstract

Creating labeled training data for relation extraction is expensive.

Introduction

Relation extraction is the task of detecting and characterizing semantic relations between entities from free text.

Related work

Recent work on relation extraction has been dominated by feature-based and kernel-based supervised learning methods.

Task definition

Our study is conducted using data from the Automatic Content Extraction (ACE) program.

A multitask transfer learning solution

We now present a multitask transfer learning solution to the weakly-supervised relation extraction problem, which makes use of the labeled data from the auxiliary relation types.

Experiments

5.1 Data set and experiment setup

Conclusions and future work

In this paper, we applied multitask transfer learning to solve a weakly-supervised relation extraction problem, leveraging both labeled instances of auxiliary relation types and human knowledge including hypotheses on feature generality and entity type constraints.

Topics

relation extraction

Appears in 25 sentences as: Relation extraction (1) relation extraction (21) relation extractor (3)
In Multi-Task Transfer Learning for Weakly-Supervised Relation Extraction
  1. Creating labeled training data for relation extraction is expensive.
    Page 1, “Abstract”
  2. In this paper, we study relation extraction in a special weakly-supervised setting when we have only a few seed instances of the target relation type we want to extract but we also have a large amount of labeled instances of other relation types.
    Page 1, “Abstract”
  3. Observing that different relation types can share certain common structures, we propose to use a multitask learning method coupled with human guidance to address this weakly-supervised relation extraction problem.
    Page 1, “Abstract”
  4. Relation extraction is the task of detecting and characterizing semantic relations between entities from free text.
    Page 1, “Introduction”
  5. Recent work on relation extraction has shown that supervised machine learning coupled with intelligent feature engineering or kernel design provides state-of-the-art solutions to the problem (Culotta and Sorensen, 2004; Zhou et al., 2005; Bunescu and Mooney, 2005; Qian et al., 2008).
    Page 1, “Introduction”
While transfer learning was proposed more than a decade ago (Thrun, 1996; Caruana, 1997), its application in natural language processing is still a relatively new territory (Blitzer et al., 2006; Daume III, 2007; Jiang and Zhai, 2007a; Arnold et al., 2008; Dredze and Crammer, 2008), and its application in relation extraction is still unexplored.
    Page 1, “Introduction”
The framework naturally transfers the knowledge learned from the old relation types to the new relation type and helps improve the recall of the relation extractor.
    Page 1, “Introduction”
Imposing these constraints further improves the precision of the final relation extractor.
    Page 2, “Introduction”
  9. Recent work on relation extraction has been dominated by feature-based and kernel-based supervised learning methods.
    Page 2, “Related work”
Zhou et al. (2005) and Zhao and Grishman (2005) studied various features and feature combinations for relation extraction.
    Page 2, “Related work”
We systematically explored the feature space for relation extraction (Jiang and Zhai, 2007b).
    Page 2, “Related work”

relation instances

Appears in 13 sentences as: relation instance (6) relation instances (7)
In Multi-Task Transfer Learning for Weakly-Supervised Relation Extraction
  1. We focus on extracting binary relation instances between two relation arguments occurring in the same sentence.
    Page 2, “Task definition”
  2. Some example relation instances and their corresponding relation types as defined by ACE can be found in Table 1.
    Page 2, “Task definition”
Each pair of entities within a single sentence is considered a candidate relation instance, and the task becomes predicting whether or not each candidate is a true instance of T. We use feature-based logistic regression classifiers.
    Page 3, “Task definition”
Following our previous work (Jiang and Zhai, 2007b), we extract features from a sequence representation and a parse tree representation of each relation instance.
    Page 3, “Task definition”
Following Qian et al. (2008), we trim the parse tree of a relation instance so that it contains only the most essential components.
    Page 3, “Task definition”
An example of the graphic representation of a relation instance is shown in Figure 1 and some features extracted from this instance are shown in Table 2.
    Page 3, “Task definition”
Figure 1: The combined sequence and parse tree representation of the relation instance “leader of a minority government.” The nonessential nodes for “a” and for “minority” are removed based on the algorithm from Qian et al. (2008).
    Page 3, “Task definition”
As a baseline, we use only the few seed instances of the target relation type together with labeled negative relation instances (i.e.
    Page 3, “Task definition”
Let x represent the feature vector of a candidate relation instance, and y ∈ {+1, −1} represent a class label.
    Page 4, “A multitask transfer learning solution”
  10. After data cleaning, we obtained 4290 positive instances among 48614 candidate relation instances .
    Page 6, “Experiments”
  11. In order to concentrate on the classification accuracy for the target relation type, we removed the positive instances of the auxiliary relation types from the test set, although in practice we need to extract these auxiliary relation instances using learned classifiers for these relation types.
    Page 6, “Experiments”
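
Excerpts 1, 3 and 9 above describe the candidate generation and classification setup: every pair of entity mentions within a sentence becomes a candidate binary relation instance, and a feature-based logistic regression classifier predicts whether it is a true instance of the target type T. The following is a minimal Python sketch of that enumeration, assuming hypothetical Mention and Sentence types (the actual ACE data structures are richer):

    from dataclasses import dataclass
    from itertools import combinations

    # Hypothetical minimal types; ACE additionally provides entity
    # subtypes, mention types and character offsets.
    @dataclass(frozen=True)
    class Mention:
        text: str
        entity_type: str  # e.g. "PER", "ORG", "GPE"

    @dataclass
    class Sentence:
        tokens: tuple
        mentions: tuple  # entity mentions occurring in this sentence

    def candidate_instances(sentence, gold_pairs):
        """Treat every pair of entity mentions in one sentence as a
        candidate binary relation instance (cf. excerpt 3); label it
        +1 if the pair is a known instance of the target type, else -1."""
        for arg1, arg2 in combinations(sentence.mentions, 2):
            label = +1 if (arg1, arg2) in gold_pairs else -1
            yield (arg1, arg2), label

A feature vector x is then extracted from each candidate's sequence and parse tree representations (excerpt 4) and fed to the logistic regression classifier.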

weight vector

Appears in 12 sentences as: weight vector (9) weight vectors (4)
In Multi-Task Transfer Learning for Weakly-Supervised Relation Extraction
The proposed framework models the commonality among different relation types through a shared weight vector, enables knowledge learned from the auxiliary relation types to be transferred to the target relation type, and allows easy control of the tradeoff between precision and recall.
    Page 1, “Abstract”
Let w^k denote the weight vector of the linear classifier that separates positive instances of auxiliary type A_k from negative instances, and let w^T denote a similar weight vector for the target type T.
    Page 4, “A multitask transfer learning solution”
  3. If different relation types are totally unrelated, these weight vectors should also be independent of each other.
    Page 4, “A multitask transfer learning solution”
But because we observe similar syntactic structures across different relation types, we now assume that these weight vectors are related through a common component v:
    Page 4, “A multitask transfer learning solution”
  5. Now we can learn these weight vectors in a multitask learning framework.
    Page 4, “A multitask transfer learning solution”
We learn the optimal weight vectors {û^k}, û^T and v̂ by optimizing the following objective function:
    Page 4, “A multitask transfer learning solution”
Here L(D, w) is the aggregated loss of labeling x with y for all (x, y) in D, using weight vector w.
    Page 5, “A multitask transfer learning solution”
The larger the ratio λ_T/λ_v (or λ_k/λ_v) is, the more we believe that the model for T (or A_k) should conform to the common model, and the smaller the type-specific weight vector u^T (or u^k) will be.
    Page 5, “A multitask transfer learning solution”
We can force the weights of these hypothesized type-specific features to be 0 in the shared weight vector v, i.e.
    Page 5, “A multitask transfer learning solution”
the number of nonzero entries in the shared weight vector v. To see how the performance may vary as H changes, we plot the performance of TL-comb and TL-auto in terms of the average F1 across the seven target relation types, with H ranging from 100 to 50000.
    Page 7, “Experiments”
In the multitask learning framework that we introduced, different relation types are treated as different but related tasks that are learned together, with the common structures among the relation types modeled by a shared weight vector.
    Page 8, “Conclusions and future work”
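
Taken together, excerpts 2-9 describe the model: each classifier's weight vector decomposes into a shared component v plus a type-specific component (u^k for auxiliary type A_k, u^T for the target type T), and all components are learned jointly under regularization. A hedged LaTeX reconstruction of that decomposition and of the kind of objective excerpt 6 refers to (the exact form of the paper's Equation (1), including the regularizers, is an assumption here):

    % Shared-plus-specific decomposition of the per-type classifiers.
    \[
      w^k = v + u^k, \qquad w^T = v + u^T
    \]
    % Regularized joint empirical risk over the target data D_T and the
    % K auxiliary data sets D_k; L(D, w) is the aggregated loss from
    % excerpt 7. The lambda terms control how strongly each type-specific
    % component is shrunk toward the common model (excerpt 8).
    \[
      \min_{v,\,\{u^k\},\,u^T}\;
        L(D_T,\, v + u^T) + \sum_{k=1}^{K} L(D_k,\, v + u^k)
        + \lambda_v \|v\|^2
        + \lambda_T \|u^T\|^2
        + \lambda_A \sum_{k=1}^{K} \|u^k\|^2
    \]

Under this reading, a large ratio λ_T/λ_v shrinks u^T toward zero and forces the target model to rely on the shared component v, which matches the precision/recall control described in excerpt 8.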

entity type

Appears in 11 sentences as: Entity type (1) entity type (10) entity types (1)
In Multi-Task Transfer Learning for Weakly-Supervised Relation Extraction
additional human knowledge about the entity type constraints on the relation arguments, which can usually be derived from the definition of a relation type.
    Page 2, “Introduction”
  2. For example, we may be given the entity type restrictions on the two relation arguments.
    Page 2, “Task definition”
Nodes that represent the arguments are also labeled with the entity type, subtype and mention type as defined by ACE.
    Page 3, “Task definition”
  4. Entity type features: We hypothesize that the entity types and subtypes of the relation arguments are also more likely to be associated with specific relation types.
    Page 5, “A multitask transfer learning solution”
  5. We refer to the set of features that contain the entity type or subtype of an argument as “arg-NE” features.
    Page 5, “A multitask transfer learning solution”
  6. 4.4 Imposing entity type constraints
    Page 5, “A multitask transfer learning solution”
  7. We therefore manually identify the entity type constraints for each target relation type based on the definition of the relation type given in the ACE annotation guidelines, and impose these type constraints as a final refinement step on top of the predicted positive instances.
    Page 6, “A multitask transfer learning solution”
  8. Finally, TL-NE builds on top of TL-comb and uses the entity type constraints to refine the predictions.
    Page 7, “Experiments”
  9. In this paper, we applied multitask transfer learning to solve a weakly-supervised relation extraction problem, leveraging both labeled instances of auxiliary relation types and human knowledge including hypotheses on feature generality and entity type constraints.
    Page 8, “Conclusions and future work”
  10. We also leveraged additional human knowledge about the target relation type in the form of entity type constraints.
    Page 8, “Conclusions and future work”
  11. Experiment results on the ACE 2004 data show that the multitask transfer learning method achieves the best performance when we combine human guidance with automatic general feature selection, followed by imposing the entity type constraints.
    Page 8, “Conclusions and future work”
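
Excerpts 7 and 8 describe how the entity type constraints are used: they are read off the ACE annotation guidelines for each target relation type and applied as a final filter over predicted positives (the TL-NE variant). A minimal Python sketch of such a post-filter, where the ALLOWED_ARGS table is a hypothetical stand-in for the manually identified constraints:

    # Hypothetical argument type constraints for two target relation
    # types; the real constraints come from the ACE guidelines.
    ALLOWED_ARGS = {
        "EMP-ORG": {("PER", "ORG"), ("ORG", "ORG")},
        "PHYS":    {("PER", "FAC"), ("PER", "GPE"), ("PER", "LOC")},
    }

    def apply_type_constraints(predicted_positives, relation_type):
        """Final refinement step (cf. excerpt 7): keep a predicted
        positive instance only if the entity types of its two arguments
        satisfy the constraints for the target relation type."""
        allowed = ALLOWED_ARGS[relation_type]
        return [(a1, a2) for (a1, a2) in predicted_positives
                if (a1.entity_type, a2.entity_type) in allowed]

Because the filter can only remove predictions, it trades recall for precision, consistent with the Introduction's observation that imposing these constraints further improves the precision of the final extractor.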

domain adaptation

Appears in 6 sentences as: Domain adaptation (1) domain adaptation (5)
In Multi-Task Transfer Learning for Weakly-Supervised Relation Extraction
Inspired by recent work on transfer learning and domain adaptation, in this paper, we study how we can leverage labeled data of some old relation types to help the extraction of a new relation type in a weakly-supervised setting, where only a few seed instances of the new relation type are available.
    Page 1, “Introduction”
Domain adaptation is a special case of transfer learning where the learning task remains the same but the distribution of the data changes.
    Page 2, “Related work”
  3. There has been an increasing amount of work on transfer learning and domain adaptation in natural language processing recently.
    Page 2, “Related work”
Blitzer et al. (2006) proposed a structural correspondence learning method for domain adaptation and applied it to part-of-speech tagging.
    Page 2, “Related work”
  5. Daume III (2007) proposed a simple feature augmentation method to achieve domain adaptation .
    Page 2, “Related work”
Arnold et al. (2008) used a hierarchical prior structure to help transfer learning and domain adaptation for named entity recognition.
    Page 2, “Related work”

labeled data

Appears in 4 sentences as: labeled data (4)
In Multi-Task Transfer Learning for Weakly-Supervised Relation Extraction
  1. However, supervised learning heavily relies on a sufficient amount of labeled data for training, which is not always available in practice due to the labor-intensive nature of human annotation.
    Page 1, “Introduction”
  2. Inspired by recent work on transfer learning and domain adaptation, in this paper, we study how we can leverage labeled data of some old relation types to help the extraction of a new relation type in a weakly-supervised setting, where only a few seed instances of the new relation type are available.
    Page 1, “Introduction”
  3. We now present a multitask transfer learning solution to the weakly-supervised relation extraction problem, which makes use of the labeled data from the auxiliary relation types.
    Page 4, “A multitask transfer learning solution”
  4. It is general for any transfer learning problem with auxiliary labeled data from similar tasks.
    Page 5, “A multitask transfer learning solution”

objective function

Appears in 4 sentences as: objective function (3) objective function: (1)
In Multi-Task Transfer Learning for Weakly-Supervised Relation Extraction
We learn the optimal weight vectors {û^k}, û^T and v̂ by optimizing the following objective function:
    Page 4, “A multitask transfer learning solution”
  2. The objective function follows standard empirical risk minimization with regularization.
    Page 5, “A multitask transfer learning solution”
Recall that we impose a constraint Fv = 0 when optimizing the objective function.
    Page 5, “A multitask transfer learning solution”
  4. Fix F, and optimize the objective function as in Equation (1).
    Page 5, “A multitask transfer learning solution”
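
Excerpts 3 and 4 sketch the optimization: the constraint Fv = 0 zeroes out the entries of the shared vector v hypothesized to be type-specific, and the weights are optimized with F held fixed. A minimal Python sketch of one projected gradient step under that reading; the function names, learning rate and L2 form are illustrative assumptions, not the paper's actual procedure:

    import numpy as np

    def project(v, type_specific_mask):
        """Enforce the constraint Fv = 0: zero out the entries of the
        shared weight vector v flagged as type-specific."""
        v = v.copy()
        v[type_specific_mask] = 0.0
        return v

    def projected_step(v, grad_loss_v, type_specific_mask, lr=0.1, lam=1.0):
        """One gradient step on v with F held fixed (cf. excerpt 4):
        descend on the aggregated loss plus an L2 regularizer, then
        re-impose Fv = 0 by projection."""
        v = v - lr * (grad_loss_v + 2.0 * lam * v)
        return project(v, type_specific_mask)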

parse tree

Appears in 4 sentences as: parse tree (4)
In Multi-Task Transfer Learning for Weakly-Supervised Relation Extraction
Following our previous work (Jiang and Zhai, 2007b), we extract features from a sequence representation and a parse tree representation of each relation instance.
    Page 3, “Task definition”
Each node in the sequence or the parse tree is augmented by an argument tag that indicates whether the node subsumes arg-1, arg-2, both or neither.
    Page 3, “Task definition”
Following Qian et al. (2008), we trim the parse tree of a relation instance so that it contains only the most essential components.
    Page 3, “Task definition”
Figure 1: The combined sequence and parse tree representation of the relation instance “leader of a minority government.” The nonessential nodes for “a” and for “minority” are removed based on the algorithm from Qian et al. (2008).
    Page 3, “Task definition”
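
Excerpt 2 explains the node augmentation: every node in the sequence or parse tree carries an argument tag recording whether its span subsumes arg-1, arg-2, both, or neither. A minimal Python sketch of computing such a tag from token spans (the span convention and tag names are assumptions for illustration):

    def argument_tag(node_span, arg1_span, arg2_span):
        """Tag a node by which relation arguments its span subsumes
        (cf. excerpt 2). Spans are (start, end) token offsets,
        end-exclusive; the tag names here are illustrative."""
        def subsumes(outer, inner):
            return outer[0] <= inner[0] and inner[1] <= outer[1]
        has1 = subsumes(node_span, arg1_span)
        has2 = subsumes(node_span, arg2_span)
        if has1 and has2:
            return "BOTH"
        if has1:
            return "ARG1"
        if has2:
            return "ARG2"
        return "NONE"

Features such as those in Table 2 would then pair a node's word or syntactic label with its argument tag.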

semi-supervised

Appears in 3 sentences as: semi-supervised (4)
In Multi-Task Transfer Learning for Weakly-Supervised Relation Extraction
Chen et al. (2006) explored semi-supervised learning for relation extraction using label propagation, which makes use of unlabeled data.
    Page 2, “Related work”
  2. Another existing solution to weakly-supervised learning problems is semi-supervised learning, e.g.
    Page 4, “Task definition”
  3. However, because our proposed transfer learning method can be combined with semi-supervised learning, here we do not include semi-supervised learning as a baseline.
    Page 4, “Task definition”
