Employing Word Representations and Regularization for Domain Adaptation of Relation Extraction
Nguyen, Thien Huu and Grishman, Ralph

Article Structure

Abstract

Relation extraction suffers from a performance loss when a model is applied to out-of-domain data.

Introduction

The goal of Relation Extraction (RE) is to detect and classify relation mentions between entity pairs into predefined relation types such as Employment or Citizenship relationships.

Regularization

Given the more general representations provided by word representations above, how can we learn a relation extractor from the labeled source domain data that generalizes well to new domains?
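
The paper's answer is regularization: prefer an extractor that fits the source data but keeps feature weights small enough not to memorize domain-specific quirks. Below is a minimal sketch of that idea, assuming scikit-learn's L2-regularized logistic regression (equivalent to a MaxEnt classifier) as a stand-in for the paper's MALLET setup; the feature dicts, relation labels, and value of C are illustrative assumptions.

```python
# Minimal sketch: an L2-regularized MaxEnt (multinomial logistic regression)
# relation classifier. scikit-learn stands in for MALLET's MaxEnt; the toy
# feature dicts, labels, and C value are illustrative assumptions.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy source-domain training data: one feature dict per candidate mention pair.
train_features = [
    {"head1": "president", "head2": "country", "between": "of"},
    {"head1": "employee", "head2": "company", "between": "works_for"},
]
train_labels = ["GEN-AFF", "ORG-AFF"]

# Smaller C = stronger L2 penalty = smoother weights, which is the
# regularization effect the paper studies for out-of-domain robustness.
model = make_pipeline(
    DictVectorizer(sparse=True),
    LogisticRegression(penalty="l2", C=0.5, max_iter=1000),
)
model.fit(train_features, train_labels)
print(model.predict([{"head1": "minister", "head2": "nation", "between": "of"}]))
```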

Related Work

Although word embeddings have been successfully employed in many NLP tasks (Collobert and Weston, 2008; Turian et al., 2010; Maas and Ng, 2010), the application of word embeddings in RE is very recent.

Word Representations

We consider two types of word representations and use them as additional features in our DA system, namely Brown word clustering (Brown et al., 1992) and word embeddings (Bengio et al., 2001).
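
A minimal sketch of how such representations could enter a feature-based extractor is shown below; the lookup tables, prefix lengths, and feature names are illustrative assumptions rather than the paper's exact templates.

```python
# Sketch of augmenting a lexical feature dict with Brown-cluster and embedding
# features. brown_clusters and embeddings are assumed lookup tables
# (word -> bit string, word -> vector); names and prefix lengths are illustrative.
import numpy as np

brown_clusters = {"president": "0111010", "company": "110010"}    # word -> bit string
embeddings = {"president": np.array([0.12, -0.40, 0.05])}          # word -> dense vector

def add_word_representation_features(features, word, role):
    """Add cluster-prefix and embedding features for `word` under the name `role`."""
    bits = brown_clusters.get(word)
    if bits:
        for p in (4, 6):                    # cluster prefixes at several granularities
            features[f"{role}_brown_{p}"] = bits[:p]
    vec = embeddings.get(word)
    if vec is not None:
        for i, v in enumerate(vec):         # real-valued embedding dimensions
            features[f"{role}_emb_{i}"] = float(v)
    return features

feats = {"head1": "president"}
print(add_word_representation_features(feats, "president", "head1"))
```

Cluster-prefix features stay discrete and shared across domains, while embedding dimensions add real-valued evidence; the paper argues the two capture different information and work best in combination.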

Feature Set

5.1 Baseline Feature Set

Experiments

6.1 Tools and Data

Topics

word embeddings

Appears in 30 sentences as: Word Embedding (1) word embedding (6) Word Embeddings (1) word embeddings (25)
In Employing Word Representations and Regularization for Domain Adaptation of Relation Extraction
  1. This paper evaluates word embeddings and clustering on adapting feature-based relation extraction systems.
    Page 1, “Abstract”
  2. We systematically explore various ways to apply word embeddings and show the best adaptation improvement by combining word cluster and word embedding information.
    Page 1, “Abstract”
  3. valued features of words (such as word embeddings (Mnih and Hinton, 2007; Collobert and Weston, 2008)) effectively.
    Page 1, “Introduction”
  4. ing word embeddings (Bengio et al., 2001; Bengio et al., 2003; Mnih and Hinton, 2007; Collobert and Weston, 2008; Turian et al., 2010) on feature-based methods to adapt RE systems to new domains.
    Page 2, “Introduction”
  5. We explore the embedding-based features in a principled way and demonstrate that word embedding itself is also an effective representation for domain adaptation of RE.
    Page 2, “Introduction”
  6. More importantly, we show empirically that word embeddings and word clusters capture different information and their combination would further improve the adaptability of relation extractors.
    Page 2, “Introduction”
  7. Although word embeddings have been successfully employed in many NLP tasks (Collobert and Weston, 2008; Turian et al., 2010; Maas and Ng, 2010), the application of word embeddings in RE is very recent.
    Page 2, “Related Work”
(2010) propose an abstraction-augmented string kernel for bio-relation extraction via word embeddings.
    Page 2, “Related Work”
(2012) and Khashabi (2013) use pre-trained word embeddings as input for Matrix-Vector Recursive Neural Networks (MV-RNN) to learn compositional structures for RE.
    Page 2, “Related Work”
  10. However, none of these works evaluate word embeddings for domain adaptation
    Page 2, “Related Work”
  11. In terms of word embeddings for DA, recently, Xiao and Guo (2013) present a log-bilinear language adaptation framework for sequential labeling tasks.
    Page 3, “Related Work”


embeddings

Appears in 27 sentences as: Embeddings (1) embeddings (32)
In Employing Word Representations and Regularization for Domain Adaptation of Relation Extraction
  1. This paper evaluates word embeddings and clustering on adapting feature-based relation extraction systems.
    Page 1, “Abstract”
  2. We systematically explore various ways to apply word embeddings and show the best adaptation improvement by combining word cluster and word embedding information.
    Page 1, “Abstract”
  3. valued features of words (such as word embeddings (Mnih and Hinton, 2007; Collobert and Weston, 2008)) effectively.
    Page 1, “Introduction”
  4. ing word embeddings (Bengio et al., 2001; Bengio et al., 2003; Mnih and Hinton, 2007; Collobert and Weston, 2008; Turian et al., 2010) on feature-based methods to adapt RE systems to new domains.
    Page 2, “Introduction”
  5. More importantly, we show empirically that word embeddings and word clusters capture different information and their combination would further improve the adaptability of relation extractors.
    Page 2, “Introduction”
  6. Although word embeddings have been successfully employed in many NLP tasks (Collobert and Weston, 2008; Turian et al., 2010; Maas and Ng, 2010), the application of word embeddings in RE is very recent.
    Page 2, “Related Work”
(2010) propose an abstraction-augmented string kernel for bio-relation extraction via word embeddings.
    Page 2, “Related Work”
(2012) and Khashabi (2013) use pre-trained word embeddings as input for Matrix-Vector Recursive Neural Networks (MV-RNN) to learn compositional structures for RE.
    Page 2, “Related Work”
  9. However, none of these works evaluate word embeddings for domain adaptation
    Page 2, “Related Work”
  10. In terms of word embeddings for DA, recently, Xiao and Guo (2013) present a log-bilinear language adaptation framework for sequential labeling tasks.
    Page 3, “Related Work”
  11. Above all, we move one step further by evaluating the effectiveness of word embeddings on domain adaptation for RE which is very different from the principal topic of sequence labeling in the previous research.
    Page 3, “Related Work”


domain adaptation

Appears in 17 sentences as: Domain Adaptation (4) domain adaptation (13)
In Employing Word Representations and Regularization for Domain Adaptation of Relation Extraction
  1. This has fostered the development of domain adaptation techniques for relation extraction.
    Page 1, “Abstract”
  2. This is where we need to resort to domain adaptation techniques (DA) to adapt a model trained on one domain (the
    Page 1, “Introduction”
  3. Unfortunately, there is very little work on domain adaptation for RE.
    Page 1, “Introduction”
  4. as word clusters in domain adaptation of RE (Plank and Moschitti, 2013) is motivated by its successes in semi-supervised methods (Chan and Roth, 2010; Sun et al., 2011) where word representations help to reduce data-sparseness of lexical information in the training data.
    Page 2, “Introduction”
  5. We explore the embedding-based features in a principled way and demonstrate that word embedding itself is also an effective representation for domain adaptation of RE.
    Page 2, “Introduction”
  6. Exploiting the shared interest in generalization performance with traditional machine learning, in domain adaptation for RE, we would prefer the relation extractor that fits the source domain data, but also circumvents the overfitting problem
    Page 2, “Regularization”
  7. However, none of these works evaluate word embeddings for domain adaptation
    Page 2, “Related Work”
Regarding domain adaptation, in representation
    Page 2, “Related Work”
  9. Above all, we move one step further by evaluating the effectiveness of word embeddings on domain adaptation for RE which is very different from the principal topic of sequence labeling in the previous research.
    Page 3, “Related Work”
  10. Moreover, in all the cases, regularization methods are still effective for domain adaptation of RE.
    Page 3, “Experiments”
  11. 6.3 Domain Adaptation with Word Embeddings
    Page 5, “Experiments”


relation extractors

Appears in 13 sentences as: Relation Extraction (1) Relation extraction (1) relation extraction (3) relation extractor (2) relation extractors (7)
In Employing Word Representations and Regularization for Domain Adaptation of Relation Extraction
  1. Relation extraction suffers from a performance loss when a model is applied to out-of-domain data.
    Page 1, “Abstract”
This has fostered the development of domain adaptation techniques for relation extraction.
    Page 1, “Abstract”
  3. This paper evaluates word embeddings and clustering on adapting feature-based relation extraction systems.
    Page 1, “Abstract”
Finally, we demonstrate the effectiveness of regularization for the adaptability of relation extractors.
    Page 1, “Abstract”
The goal of Relation Extraction (RE) is to detect and classify relation mentions between entity pairs into predefined relation types such as Employment or Citizenship relationships.
    Page 1, “Introduction”
The only study explicitly targeting this problem so far is by Plank and Moschitti (2013) who find that the out-of-domain performance of kernel-based relation extractors can be improved by embedding semantic similarity information generated from word clustering and latent semantic analysis (LSA) into syntactic tree kernels.
    Page 1, “Introduction”
  7. We will demonstrate later that the adaptability of relation extractors can benefit significantly from the addition of word cluster
    Page 1, “Introduction”
More importantly, we show empirically that word embeddings and word clusters capture different information and their combination would further improve the adaptability of relation extractors.
    Page 2, “Introduction”
  9. Given the more general representations provided by word representations above, how can we learn a relation extractor from the labeled source domain data that generalizes well to new domains?
    Page 2, “Regularization”
  10. Exploiting the shared interest in generalization performance with traditional machine learning, in domain adaptation for RE, we would prefer the relation extractor that fits the source domain data, but also circumvents the overfitting problem
    Page 2, “Regularization”
Our relation extraction system is hierarchical (Bunescu and Mooney, 2005b; Sun et al., 2011) and applies maximum entropy (MaxEnt) in the MALLET toolkit as the machine learning tool (a sketch follows this list).
    Page 3, “Experiments”
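
Item 11 above names a hierarchical MaxEnt system. One plausible reading of such a scheme, sketched here only as an illustration, is a cascade that first detects whether a candidate mention pair expresses any relation and then classifies the relation type; scikit-learn's logistic regression stands in for MALLET's MaxEnt, and the toy features and labels are assumptions.

```python
# Sketch of a two-stage (detect, then classify) relation pipeline, one plausible
# reading of "hierarchical"; toy data and feature names are assumptions.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

X = [{"between": "of"}, {"between": "and"}, {"between": "works_for"}]
y = ["GEN-AFF", "NONE", "ORG-AFF"]            # "NONE" = no relation between the pair

vec = DictVectorizer()
Xv = vec.fit_transform(X)

detector = LogisticRegression(max_iter=1000).fit(Xv, [label != "NONE" for label in y])
classifier = LogisticRegression(max_iter=1000).fit(
    Xv[[i for i, label in enumerate(y) if label != "NONE"]],
    [label for label in y if label != "NONE"],
)

def extract(features):
    fv = vec.transform([features])
    if not detector.predict(fv)[0]:           # stage 1: is there any relation?
        return "NONE"
    return classifier.predict(fv)[0]          # stage 2: which relation type?

print(extract({"between": "of"}))
```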


word representations

Appears in 11 sentences as: Word Representations (1) word representations (10)
In Employing Word Representations and Regularization for Domain Adaptation of Relation Extraction
  1. The application of word representations such
    Page 2, “Introduction”
  2. as word clusters in domain adaptation of RE (Plank and Moschitti, 2013) is motivated by its successes in semi-supervised methods (Chan and Roth, 2010; Sun et al., 2011) where word representations help to reduce data-sparseness of lexical information in the training data.
    Page 2, “Introduction”
  3. In DA terms, since the vocabularies of the source and target domains are usually different, word representations would mitigate the lexical sparsity by providing general features of words that are shared across domains, hence bridge the gap between domains.
    Page 2, “Introduction”
  4. Given the more general representations provided by word representations above, how can we learn a relation extractor from the labeled source domain data that generalizes well to new domains?
    Page 2, “Regularization”
  5. In fact, this setting can benefit considerably from our general approach of applying word representations and regularization.
    Page 2, “Regularization”
  6. We consider two types of word representations and use them as additional features in our DA system, namely Brown word clustering (Brown et al., 1992) and word embeddings (Bengio et al., 2001).
    Page 3, “Word Representations”
We have the same observation as Plank and Moschitti (2013) that when the gold-standard labels are used, the impact of word representations is limited since the gold-standard information seems to dominate.
    Page 3, “Experiments”
  8. However, whenever the gold labels are not available or inaccurate, the word representations would be useful for improving adaptability performance.
    Page 3, “Experiments”
  9. This section examines the effectiveness of word representations for RE across domains.
    Page 5, “Experiments”
Table 3: Domain Adaptation Results with Word Representations.
    Page 5, “Experiments”
improved consistently, demonstrating the robustness of word representations (Plank and Moschitti, 2013).
    Page 5, “Experiments”


in-domain

Appears in 8 sentences as: In-domain (1) in-domain (7)
In Employing Word Representations and Regularization for Domain Adaptation of Relation Extraction
We test our system on two scenarios: In-domain: the system is trained and evaluated on the source domain (bn+nw, 5-fold cross-validation); Out-of-domain: the system is trained on the source domain and evaluated on the target development set of bc (bc dev). (A protocol sketch follows this list.)
    Page 4, “Experiments”
All the in-domain improvements in rows 2, 6, and 7 of Table 2 are significant at confidence levels ≥ 95%.
    Page 4, “Experiments”
  3. HLBL embeddings of 50 and 100 dimensions, the most effective way to introduce word embeddings is to add embeddings to the heads of the two mentions (row 2; both in-domain and out-of-domain) although it is less pronounced for HLBL embedding with 50 dimensions.
    Page 4, “Experiments”
  4. Interestingly, for C&W embedding with 25 dimensions, adding the embedding to both heads and words of the two mentions (row 6) performs the best for both in-domain and out-of-domain scenarios.
    Page 4, “Experiments”
  5. For both in-domain and out-of-domain settings with different numbers of dimensions, C&W embedding outperforms HLBL embedding when only the heads of the mentions are augmented while the degree of negative impact of HLBL embedding on chunk heads as well as context words seems less serious than C&W’s.
    Page 4, “Experiments”
  6. Regarding the incremental addition of features (rows 6, 7, 8), C&W is better for the out-of-domain performance when 50 dimensions are used, whereas HLBL (with both 50 and 100 dimensions) is more effective for the in-domain setting.
    Page 4, “Experiments”
  7. Table 3 presents the system performance (F measures) for both in-domain and out-of-domain settings.
    Page 5, “Experiments”
  8. (v): Finally, the in-domain performance is also
    Page 5, “Experiments”
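
The two scenarios in item 1 above can be pictured with the short sketch below; the placeholder feature dicts and labels, the macro-averaged F1, and the scikit-learn utilities are assumptions standing in for the paper's actual data and scoring.

```python
# Sketch of the evaluation protocol: in-domain 5-fold cross-validation on the
# source domain (bn+nw) vs. out-of-domain training on the source with
# evaluation on the bc development set. Data below is placeholder only.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Placeholder data standing in for the real bn+nw (source) and bc dev (target) sets.
X_src = [{"between": "of"}, {"between": "works_for"}, {"between": "and"}] * 10
y_src = ["GEN-AFF", "ORG-AFF", "NONE"] * 10
X_bc = [{"between": "of"}, {"between": "and"}]
y_bc = ["GEN-AFF", "NONE"]

model = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))

# In-domain: 5-fold cross-validation on the source domain only.
in_domain_f1 = cross_val_score(model, X_src, y_src, cv=5, scoring="f1_macro").mean()

# Out-of-domain: train on all source data, score on the target dev set.
model.fit(X_src, y_src)
out_domain_f1 = f1_score(y_bc, model.predict(X_bc), average="macro")
print(f"in-domain F1: {in_domain_f1:.2f}  out-of-domain F1: {out_domain_f1:.2f}")
```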


feature set

Appears in 7 sentences as: Feature Set (1) feature set (6) feature sets (1)
In Employing Word Representations and Regularization for Domain Adaptation of Relation Extraction
  1. Recent research in this area, whether feature-based (Kambhatla, 2004; Boschee et al., 2005; Zhou et al., 2005; Grishman et al., 2005; Jiang and Zhai, 2007a; Chan and Roth, 2010; Sun et al., 2011) or kernel-based (Zelenko et al., 2003; Bunescu and Mooney, 2005a; Bunescu and Mooney, 2005b; Zhang et al., 2006; Qian et al., 2008; Nguyen et al., 2009), attempts to improve the RE performance by enriching the feature sets from multiple sentence analyses and knowledge resources.
    Page 1, “Introduction”
  2. 5.1 Baseline Feature Set
    Page 3, “Feature Set”
  3. (2011) utilize the full feature set from (Zhou et al., 2005) plus some additional features and achieve the state-of-the-art feature-based RE system.
    Page 3, “Feature Set”
  4. Unfortunately, this feature set includes the human-annotated (gold-standard) information on entity and mention types which is often missing or noisy in reality (Plank and Moschitti, 2013).
    Page 3, “Feature Set”
  5. We apply the same feature set as Sun et al.
    Page 3, “Feature Set”
We evaluate word cluster and embedding (denoted by ED) features by adding them individually as well as simultaneously into the baseline feature set.
    Page 5, “Experiments”
  7. This might be explained by the difference between our baseline feature set and the feature set underlying their kernel-based system.
    Page 5, “Experiments”
