Robust Domain Adaptation for Relation Extraction via Clustering Consistency
Minh Luan Nguyen, Ivor W. Tsang, Kian Ming A. Chai, and Hai Leong Chieu

Article Structure

Abstract

We propose a two-phase framework to adapt existing relation extraction classifiers to extract relations for new target domains.

Introduction

The World Wide Web contains information on real-world entities, such as persons, locations and

Related Work

Relation extraction is usually considered a classification problem: determine if two given entities in a sentence have a given relation.

Problem Statement

This section defines the domain adaptation problem and describes our feature extraction scheme.

Robust Domain Adaptation

In this section, we describe our two-phase approach, which comprises a Supervised Voting scheme and a combined classifier learning phase.

Experiments

We evaluate our algorithm on two corpora: Automatic Content Extraction (ACE) 2004 and YAGO.

Conclusion and Future Work

In this paper, we have proposed a robust domain adaptation (RDA) approach for the relation extraction problem where labeled data is scarce.

Topics

relation extraction

Appears in 26 sentences as: Relation Extraction (1) Relation extraction (1) relation extraction (23) relation extractor (1)
In Robust Domain Adaptation for Relation Extraction via Clustering Consistency
  1. We propose a two-phase framework to adapt existing relation extraction classifiers to extract relations for new target domains.
    Page 1, “Abstract”
  2. Our method outperforms numerous baselines and a weakly-supervised relation extraction method on ACE 2004 and YAGO.
    Page 1, “Abstract”
  3. Recent work on relation extraction has demonstrated that supervised machine learning coupled with intelligent feature engineering can provide state-of-the-art performance (Jiang and Zhai, 2007b).
    Page 1, “Introduction”
  4. Instead, it can be more cost-effective to adapt an existing relation extraction system to the new domain using a small set of labeled data.
    Page 1, “Introduction”
  5. This paper considers relation adaptation, where a relation extraction system trained on many source domains is adapted to a new target domain.
    Page 1, “Introduction”
  6. There are at least three challenges when adapting a relation extraction system to a new domain.
    Page 1, “Introduction”
  7. We compare the proposed two-phase framework with state-of-the-art domain adaptation baselines for the relation extraction task, and we find that our method outperforms the baselines.
    Page 2, “Introduction”
  8. Relation extraction is usually considered a classification problem: determine if two given entities in a sentence have a given relation.
    Page 2, “Related Work”
  9. However, purely supervised relation extraction methods assume the availability of sufficient labeled data, which may be costly to obtain for new domains.
    Page 2, “Related Work”
  10. Bootstrapping methods (Zhu et al., 2009; Agichtein and Gravano, 2000; Xu et al., 2010; Pasca et al., 2006; Riloff and Jones, 1999) to relation extraction are attractive because they require fewer training instances than supervised approaches.
    Page 2, “Related Work”
Different from bootstrapping, we not only use labeled target-domain data as seeds, but also leverage existing source-domain predictors to obtain a robust relation extractor for the target domain.
    Page 2, “Related Work”

See all papers in Proc. ACL 2014 that mention relation extraction.


labeled data

Appears in 22 sentences as: labeled data (23)
In Robust Domain Adaptation for Relation Extraction via Clustering Consistency
We address two challenges: negative transfer, when knowledge in source domains is used without considering the differences in relation distributions; and the lack of adequate labeled samples for rarer relations in the new domain, due to a small labeled data set and imbalanced relation distributions.
    Page 1, “Abstract”
  2. However, most supervised learning algorithms require adequate labeled data for every relation type to be extracted.
    Page 1, “Introduction”
Instead, it can be more cost-effective to adapt an existing relation extraction system to the new domain using a small set of labeled data.
    Page 1, “Introduction”
  4. Together with imbalanced relation distributions inherent in the domain, this can cause some rarer relations to constitute only a very small proportion of the labeled data set.
    Page 2, “Introduction”
However, purely supervised relation extraction methods assume the availability of sufficient labeled data, which may be costly to obtain for new domains.
    Page 2, “Related Work”
  6. We address this by augmenting a small labeled data set with other information in the domain adaptation setting.
    Page 2, “Related Work”
To create labeled data, the texts are dependency-parsed, and the domain-independent patterns on the parses form the basis for extractions.
    Page 2, “Related Work”
  8. In the weakly-supervised approach, we have plenty of labeled data for the source domain and a few labeled instances in the target domain; in the unsupervised approach, the data for the target domain are not labeled.
    Page 3, “Related Work”
The target domain has a small labeled data set D_l = {(x_i, y_i)}_{i=1}^{n_l}.
    Page 3, “Problem Statement”
For the s-th source domain, we have an adequate labeled data set D_s.
    Page 3, “Problem Statement”
By augmenting with unlabeled data D_u, we aim to alleviate the effect of imbalanced relation distribution, which causes a lack of labeled samples for rarer classes in a small set of labeled data.
    Page 4, “Robust Domain Adaptation”
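The data setting in items 9–11 can be sketched as follows; the names and toy values are purely illustrative (not from the paper's code), and the label count shows why a rarer relation may be under-represented in the small labeled target set, motivating augmentation with the unlabeled data:

```python
# Illustrative sketch of the adaptation data setting: a small labeled
# target set D_l, a larger unlabeled target set D_u, and one adequately
# labeled set per source domain. All names and values are toy examples.
from collections import Counter

D_l = [([1.0, 0.0], "located-in"), ([0.9, 0.1], "located-in"),
       ([0.0, 1.0], "works-for")]            # rarer class: one sample only
D_u = [[0.8, 0.2], [0.1, 0.9], [0.5, 0.5]]   # unlabeled target-domain data
D_s = [[([1.0, 0.1], "located-in")]]         # one source domain's labeled set

# Count labels per class in D_l: the rarer relation has very few samples.
counts = Counter(label for _, label in D_l)
print(counts["works-for"])  # -> 1
```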


domain adaptation

Appears in 17 sentences as: Domain Adaptation (3) Domain adaptation (1) domain adaptation (12) Domain Adaptive (1)
In Robust Domain Adaptation for Relation Extraction via Clustering Consistency
  1. To tackle these challenges, we propose a two-phase Robust Domain Adaptation (RDA) framework.
    Page 2, “Introduction”
  2. We compare the proposed two-phase framework with state-of-the-art domain adaptation baselines for the relation extraction task, and we find that our method outperforms the baselines.
    Page 2, “Introduction”
  3. We address this by augmenting a small labeled data set with other information in the domain adaptation setting.
    Page 2, “Related Work”
  4. Domain adaptation methods can be classified broadly into weakly-supervised adaptation (Daume and Marcu, 2007; Blitzer et al., 2006; Jiang and Zhai, 2007a; Jiang, 2009), and unsupervised adaptation (Pan et al., 2010; Blitzer et al., 2006; Plank and Moschitti, 2013).
    Page 3, “Related Work”
  5. This section defines the domain adaptation problem and describes our feature extraction scheme.
    Page 3, “Problem Statement”
  6. 3.1 Relation Extraction Domain Adaptation
    Page 3, “Problem Statement”
We define domain adaptation as the problem of learning a classifier p for relation extraction in the target domain using the data sets D_l, D_u and D_s, s = 1, …, k.
    Page 3, “Problem Statement”
…, f_c using the one-versus-rest decoding for multi-class classification. Inspired by the Domain Adaptive Machine (Duan et al., 2009), we combine the reference predictions and the labeled data of the target domain to learn these functions:
    Page 6, “Robust Domain Adaptation”
  9. We compare our framework with several other methods, including state-of-the-art machine learning, relation extraction and common domain adaptation methods.
    Page 7, “Experiments”
  10. Adaptive domain bootstrapping (DAB) This is an instance-based domain adaptation method for relation extraction (Xu et al., 2010).
    Page 7, “Experiments”
Structural correspondence learning (SCL) We use the feature-based domain adaptation approach by Blitzer et al.
    Page 7, “Experiments”
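Item 8's one-versus-rest decoding can be sketched as one scoring function per relation class, with the predicted class being the argmax over scores. The combination with source-domain "reference predictions" is shown below as a simple convex mixture; this mixture form is an illustrative assumption, not the paper's exact formulation:

```python
# Sketch of one-versus-rest decoding with source reference predictions
# folded in as a convex mixture (an illustrative assumption).

def one_vs_rest_predict(x, scorers):
    """scorers: {class_label: f_j}; predict the highest-scoring class."""
    return max(scorers, key=lambda c: scorers[c](x))

def mix(target_f, source_fs, weights):
    """Blend a target-trained scorer with per-source reference scorers."""
    def f(x):
        s = (1 - sum(weights)) * target_f(x)
        return s + sum(w * g(x) for w, g in zip(weights, source_fs))
    return f

# Toy 2-d scorers: each class prefers one coordinate of the feature vector.
scorers = {
    "located-in": mix(lambda x: x[0], [lambda x: x[0]], [0.5]),
    "not-a-relation": mix(lambda x: x[1], [lambda x: x[1]], [0.5]),
}
print(one_vs_rest_predict([0.9, 0.2], scorers))  # -> located-in
```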


unlabeled data

Appears in 10 sentences as: unlabeled data (10)
In Robust Domain Adaptation for Relation Extraction via Clustering Consistency
Our framework leverages both labeled and unlabeled data in the target domain.
    Page 1, “Abstract”
  2. To overcome the lack of labeled samples for rarer relations, these clusterings operate on both the labeled and unlabeled data in the target domain.
    Page 1, “Abstract”
  3. In the first phase, Supervised Voting is used to determine the relevance of each source domain to each region in the target domain, using both labeled and unlabeled data in the target domain.
    Page 2, “Introduction”
By also using unlabeled data, we alleviate the lack of labeled samples for rarer relations due to imbalanced distributions in relation types.
    Page 2, “Introduction”
…reasonable predictive performance even when all the source domains are irrelevant, and augments the rarer classes with examples in the unlabeled data.
    Page 2, “Introduction”
…and plenty of unlabeled data D_u = {x_i}_{i=1}^{n_u}, where n_l and n_u are the number of labeled and unlabeled samples respectively, x_i is the feature vector, and y_i is the corresponding label (if available).
    Page 3, “Problem Statement”
  7. With data from both the labeled and unlabeled data sets, we apply transductive inference or semi-supervised learning (Zhou et al., 2003) to achieve both (i) and (ii).
    Page 4, “Robust Domain Adaptation”
By augmenting with unlabeled data D_u, we aim to alleviate the effect of imbalanced relation distribution, which causes a lack of labeled samples for rarer classes in a small set of labeled data.
    Page 4, “Robust Domain Adaptation”
Here, we have multiple objectives: the first term controls the training error; the second regularizes the complexity of the functions f_j in the Reproducing Kernel Hilbert Space (RKHS) H; and the third prefers the predicted labels of the unlabeled data D_u to be close to the reference predictions.
    Page 6, “Robust Domain Adaptation”
…negative transfer from irrelevant sources by relying on similarity of feature vectors between source and target domains based on labeled and unlabeled data.
    Page 8, “Experiments”
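Item 9 describes a three-term objective. A hedged reconstruction in standard regularization notation follows; the loss ℓ, the trade-off weights λ and γ, and the reference-prediction symbol ỹ_{ij} are assumed notation, not taken verbatim from the paper:

```latex
\min_{f_j \in \mathcal{H}}\;
\underbrace{\sum_{(x_i, y_i) \in D_l} \ell\bigl(f_j(x_i), y_i\bigr)}_{\text{training error}}
\;+\; \lambda \underbrace{\lVert f_j \rVert_{\mathcal{H}}^{2}}_{\text{RKHS regularizer}}
\;+\; \gamma \underbrace{\sum_{x_i \in D_u} \bigl(f_j(x_i) - \tilde{y}_{ij}\bigr)^{2}}_{\text{fit to reference predictions}}
```

Here ỹ_{ij} stands for the reference prediction on unlabeled instance x_i for class j, obtained from the source-domain predictors.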


relation instance

Appears in 9 sentences as: relation instance (7) relation instances (3)
In Robust Domain Adaptation for Relation Extraction via Clustering Consistency
  1. In contrast to Open IE, we tune the relation patterns for a domain of interest, using labeled relation instances in source and target domains and unlabeled instances in the target domain.
    Page 3, “Related Work”
  2. We consider relation extraction as a classification problem, where each pair of entities A and B within a sentence S is a candidate relation instance .
    Page 3, “Problem Statement”
  3. We extract features from a sequence representation and a parse tree representation of each relation instance .
    Page 3, “Problem Statement”
…use a subgraph in the relation instance graph (Jiang and Zhai, 2007b) that contains only the node representing the head word of the entity A, labeled with the entity type or entity mention types, to describe a single entity attribute.
    Page 4, “Problem Statement”
  5. Syntactic Features The syntactic parse tree of the relation instance sentence can be augmented to represent the relation instance .
    Page 4, “Problem Statement”
  6. We trim the parse tree of a relation instance so that it contains only the most essential tree components based on constituent dependencies (Qian et al., 2008).
    Page 4, “Problem Statement”
  7. We also use unigram features and bigram features from a relation instance graph.
    Page 4, “Problem Statement”
  8. The candidate relation instances were generated by considering all pairs of entities that occur in the same sentence.
    Page 6, “Experiments”
  9. No-transfer classifier (NT) We only use the few labeled instances of the target relation type together with the negative relation instances to train a binary classifier.
    Page 7, “Experiments”
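Items 2 and 8 describe candidate generation: every pair of entity mentions within one sentence becomes a candidate relation instance (A, B, S). A minimal sketch (unordered pairs are assumed here; directional relations would use ordered pairs instead):

```python
# Sketch of candidate relation-instance generation: all pairs of entity
# mentions that co-occur in the same sentence form candidates (A, B, S).
from itertools import combinations

def candidates(sentences):
    """sentences: list of (sentence_text, [entity mentions]) pairs."""
    out = []
    for text, entities in sentences:
        for a, b in combinations(entities, 2):  # unordered pairs
            out.append((a, b, text))
    return out

sents = [
    ("Anne works in Boston.", ["Anne", "Boston"]),
    ("Bob met Carol in Delhi.", ["Bob", "Carol", "Delhi"]),
]
cands = candidates(sents)
print(len(cands))  # -> 4 (1 pair + 3 pairs)
```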


in-domain

Appears in 5 sentences as: In-domain (2) in-domain (3)
In Robust Domain Adaptation for Relation Extraction via Clustering Consistency
In-domain multiclass classifier This is a support vector machine (SVM; Fan et al., 2008) using the one-versus-rest decoding, without removing positive labeled data (Jiang and Zhai, 2007b) from the target domain.
    Page 7, “Experiments”
From Table 3 and Table 5, we see that the proposed method has the best F1 among all the other methods, except for the supervised upper bound (In-domain).
    Page 7, “Experiments”
  3. Performance Gap From Tables 2 to 4, we observe that the smallest performance gap between RDA and the in-domain settings is still high (about 12% with k = 5) on ACE 2004.
    Page 9, “Experiments”
  4. Comparing with the in-domain results in Table 5 (which is constant with k), Table 6 also shows a similar trend on YAGO.
    Page 9, “Experiments”
  5. By exploiting the labeled data in ten source domains in YAGO, our RDA algorithm can reduce the gap between the cross-domain and in-domain settings to 9%.
    Page 9, “Experiments”


parse tree

Appears in 5 sentences as: parse tree (4) parse trees (2)
In Robust Domain Adaptation for Relation Extraction via Clustering Consistency
  1. We extract features from a sequence representation and a parse tree representation of each relation instance.
    Page 3, “Problem Statement”
  2. Syntactic Features The syntactic parse tree of the relation instance sentence can be augmented to represent the relation instance.
    Page 4, “Problem Statement”
  3. Each node in the sequence or the parse tree is augmented by an argument tag that indicates whether the node corresponds to entity A, B, both, or neither.
    Page 4, “Problem Statement”
  4. We trim the parse tree of a relation instance so that it contains only the most essential tree components based on constituent dependencies (Qian et al., 2008).
    Page 4, “Problem Statement”
The constituent parse trees were then transformed into dependency parse trees, using the head of each constituent (Jiang and Zhai, 2007b).
    Page 6, “Experiments”
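Item 3's argument-tag augmentation can be sketched as tagging each node with A, B, both, or neither. The token-level matching below is an illustrative simplification (the paper applies the tags to sequence and parse-tree nodes):

```python
# Sketch of argument-tag augmentation: tag each token according to which
# argument entity (A, B, both, or neither) it belongs to.

def argument_tags(tokens, span_a, span_b):
    """span_a / span_b: sets of token indices covered by entities A and B."""
    tags = []
    for i, _ in enumerate(tokens):
        in_a, in_b = i in span_a, i in span_b
        tags.append("BOTH" if in_a and in_b
                    else "A" if in_a
                    else "B" if in_b
                    else "NONE")
    return tags

toks = ["Anne", "works", "in", "Boston", "."]
print(argument_tags(toks, {0}, {3}))  # -> ['A', 'NONE', 'NONE', 'B', 'NONE']
```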


entity type

Appears in 4 sentences as: entity type (2) Entity types (1) entity types (1)
In Robust Domain Adaptation for Relation Extraction via Clustering Consistency
  1. Entity Features Entity types and entity mention types are very useful for relation extraction.
    Page 3, “Problem Statement”
…use a subgraph in the relation instance graph (Jiang and Zhai, 2007b) that contains only the node representing the head word of the entity A, labeled with the entity type or entity mention types, to describe a single entity attribute.
    Page 4, “Problem Statement”
The nodes that represent the argument are also labeled with the entity type, subtype and mention type.
    Page 4, “Problem Statement”
YAGO is different from ACE 2004 in two aspects: there is less overlap of topics, entity types and relation types between domains; and it has more relation mentions, with 11 mentions per pair of entities on average.
    Page 6, “Experiments”


extraction system

Appears in 4 sentences as: extraction system (3) extraction systems (1)
In Robust Domain Adaptation for Relation Extraction via Clustering Consistency
  1. Instead, it can be more cost-effective to adapt an existing relation extraction system to the new domain using a small set of labeled data.
    Page 1, “Introduction”
  2. This paper considers relation adaptation, where a relation extraction system trained on many source domains is adapted to a new target domain.
    Page 1, “Introduction”
  3. There are at least three challenges when adapting a relation extraction system to a new domain.
    Page 1, “Introduction”
  4. Among these studies, Plank and Moschitti’s is the closest to ours because it adapts relation extraction systems to new domains.
    Page 3, “Related Work”


binary classifier

Appears in 3 sentences as: binary classification (1) binary classifier (2)
In Robust Domain Adaptation for Relation Extraction via Clustering Consistency
  1. For each domain in YAGO, we have a binary classification task: whether the instance has the relation corresponding to the domain.
    Page 6, “Experiments”
No-transfer classifier (NT) We only use the few labeled instances of the target relation type together with the negative relation instances to train a binary classifier.
    Page 7, “Experiments”
Alternate no-transfer classifier (NT-U) We use the union of the k source-domain labeled data sets D_s and the small set of target-domain labeled data D_l to train a binary classifier.
    Page 7, “Experiments”


feature vector

Appears in 3 sentences as: feature vector (2) feature vectors (1)
In Robust Domain Adaptation for Relation Extraction via Clustering Consistency
We introduce a feature extraction χ that maps the triple (A, B, S) to its feature vector x.
    Page 3, “Problem Statement”
…and plenty of unlabeled data D_u = {x_i}_{i=1}^{n_u}, where n_l and n_u are the number of labeled and unlabeled samples respectively, x_i is the feature vector, and y_i is the corresponding label (if available).
    Page 3, “Problem Statement”
…negative transfer from irrelevant sources by relying on similarity of feature vectors between source and target domains based on labeled and unlabeled data.
    Page 8, “Experiments”
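Item 1's feature-extraction map χ(A, B, S) → x can be sketched as a bag of simple lexical features hashed into a fixed-length vector. This is a hedged illustration only; the paper's actual χ uses much richer lexical and syntactic features:

```python
# Illustrative sketch of a feature-extraction map chi(A, B, S) -> x using
# the hashing trick over simple lexical features (collisions possible).

def chi(a, b, sentence, dim=16):
    feats = ["A=" + a, "B=" + b] + ["w=" + w for w in sentence.split()]
    x = [0.0] * dim
    for f in feats:
        x[hash(f) % dim] += 1.0  # bucket each feature into a fixed-size vector
    return x

x = chi("Anne", "Boston", "Anne works in Boston .")
print(len(x), sum(x))  # 16-dimensional vector holding 7 feature counts
```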


natural language

Appears in 3 sentences as: Natural Language (1) natural language (2)
In Robust Domain Adaptation for Relation Extraction via Clustering Consistency
Given a pair of entities (A, B) in S, the first step is to express the relation between A and B with some feature representation using a feature extraction scheme χ. Lexical or syntactic patterns have been successfully used in numerous natural language processing tasks, including relation extraction.
    Page 3, “Problem Statement”
Each node is augmented with the relevant part-of-speech (POS) tag using the Python Natural Language Toolkit (NLTK).
    Page 4, “Problem Statement”
Because not-a-relation is a background or default relation type in the relation classification task, and because it has rather high variation when manifested in natural language, we have found it difficult to obtain a distance metric W that allows the not-a-relation samples to form clusters naturally using transductive inference.
    Page 5, “Robust Domain Adaptation”
