Embedding Semantic Similarity in Tree Kernels for Domain Adaptation of Relation Extraction
Plank, Barbara and Moschitti, Alessandro

Article Structure

Abstract

Relation Extraction (RE) is the task of extracting semantic relationships between entities in text.

Introduction

Relation extraction is the task of extracting semantic relationships between entities in text, e.g.

Semantic Syntactic Tree Kernels

In kernel-based methods, both learning and classification only depend on the inner product between instances.
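
As a hedged illustration of the sentence above (the function names and toy kernel are ours, not the paper's), a kernel-machine decision function touches instances only through inner products K(x_i, x), never through explicit feature vectors:

```python
# Minimal sketch (our illustration, not the paper's code): a kernel-machine
# decision function depends on training instances only through K(x_i, x).
def decision(x, support_vectors, alphas, labels, bias, K):
    """f(x) = sum_i alpha_i * y_i * K(x_i, x) + b"""
    return sum(a * y * K(sv, x)
               for sv, a, y in zip(support_vectors, alphas, labels)) + bias

# Usage with a toy dot-product kernel over tuples:
dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
print(decision((1.0, 2.0), [(1.0, 0.0), (0.0, 1.0)], [0.5, 0.5], [+1, -1], 0.0, dot))
```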

Related Work

Semantic syntactic tree kernels have been previously used for question classification (Bloehdorn and Moschitti, 2007a; Bloehdorn and Moschitti, 2007b; Croce et al., 2011).

Computational Structures for RE

A common way to represent a constituency-based relation instance is the PET (path-enclosed-tree), the smallest subtree including the two target entities (Zhang et al., 2006).
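
A sketch of PET extraction under assumptions of ours (nltk-style trees, known leaf indices of the two entity heads); the full PET of Zhang et al. (2006) additionally prunes material outside the entity span, which is omitted here for brevity:

```python
from nltk import Tree  # assumes nltk is available

def path_enclosed_tree(tree: Tree, e1_leaf: int, e2_leaf: int) -> Tree:
    """Return the smallest subtree covering both entity head leaves.

    Illustrative only: the full PET additionally prunes words outside
    the [e1, e2] span within this subtree.
    """
    p1 = tree.leaf_treeposition(e1_leaf)
    p2 = tree.leaf_treeposition(e2_leaf)
    lca = []
    for a, b in zip(p1, p2):   # longest common prefix = lowest common ancestor
        if a != b:
            break
        lca.append(a)
    return tree[tuple(lca)]

t = Tree.fromstring("(S (NP (NNP John)) (VP (VBD met) (NP (NNP Mary))))")
print(path_enclosed_tree(t, 0, 2))   # spans 'John' ... 'Mary' -> the whole S
```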

Experimental Setup

We treat relation extraction as a multi-class classification problem and use SVM-light-TK to train the binary classifiers.

Results

6.1 Alignment to Prior Work

Conclusions and Future Work

We proposed syntactic tree kernels enriched by lexical semantic similarity to tackle the portability of a relation extractor to different domains.

Topics

domain adaptation

Appears in 22 sentences as: Domain adaptation (1) domain adaptation (21)
In Embedding Semantic Similarity in Tree Kernels for Domain Adaptation of Relation Extraction
  1. This is the problem of domain adaptation.
    Page 1, “Abstract”
  2. The empirical evaluation on ACE 2005 domains shows that a suitable combination of syntax and lexical generalization is very promising for domain adaptation.
    Page 1, “Abstract”
  3. This is the problem of domain adaptation (DA) or transfer learning (TL).
    Page 1, “Introduction”
  4. Technically, domain adaptation addresses the problem of learning when the assumption of independent and identically distributed (i.i.d.)
    Page 1, “Introduction”
  5. Domain adaptation has been studied extensively during the last couple of years for various NLP tasks, e.g.
    Page 1, “Introduction”
  6. two shared tasks have been organized on domain adaptation for dependency parsing (Nivre et al., 2007; Petrov and McDonald, 2012).
    Page 1, “Introduction”
  7. Even a weakly supervised system is expected to perform well when applied to any kind of text (other domain/genre), thus ideally, we believe that combining domain adaptation with weak supervision is the way to go in the future.
    Page 2, “Introduction”
  8. We focus on unsupervised domain adaptation, i.e.
    Page 2, “Introduction”
  9. Moreover, we consider a particular domain adaptation setting: single-system DA, i.e.
    Page 2, “Introduction”
  10. We consider this as a shift in what was considered domain adaptation in the past (adapt from source to a specific target) and what can be considered a somewhat different recent view of DA, which has become widespread since 2011/2012.
    Page 2, “Introduction”
  11. In this setup, the domain adaptation problem boils down to finding a more robust system (Søgaard and Johannsen, 2012), i.e.
    Page 2, “Introduction”

tree kernels

Appears in 22 sentences as: tree kernel (2) Tree Kernels (2) tree kernels (18)
In Embedding Semantic Similarity in Tree Kernels for Domain Adaptation of Relation Extraction
  1. We encode word clusters or similarity in tree kernels, which, in turn, produce spaces of tree fragments.
    Page 2, “Introduction”
  2. Rather than only matching the surface string of words, lexical similarity enables soft matches between similar words in convolution tree kernels.
    Page 2, “Introduction”
  3. In the empirical evaluation on Automatic Content Extraction (ACE) data, we evaluate the impact of convolution tree kernels embedding lexical semantic similarities.
    Page 2, “Introduction”
  4. Commonly used kernels in NLP are string kernels (Lodhi et al., 2002) and tree kernels (Moschitti, 2006; Moschitti, 2008).
    Page 2, “Semantic Syntactic Tree Kernels”
  5. Figure 1: Syntactic tree kernel (STK).
    Page 2, “Semantic Syntactic Tree Kernels”
  6. Syntactic tree kernels (Collins and Duffy, 2001) compute the similarity between two trees T1 and T2 by counting common sub-trees (cf.
    Page 2, “Semantic Syntactic Tree Kernels”
  7. Semantic syntactic tree kernels (Bloehdorn and Moschitti, 2007a; Bloehdorn and Moschitti, 2007b; Croce et al., 2011) provide one way to address this problem by introducing a similarity σ that allows soft matches between words and, consequently, between fragments containing them.
    Page 3, “Semantic Syntactic Tree Kernels”
  8. Semantic syntactic tree kernels have been previously used for question classification (Bloehdorn and Moschitti, 2007a; Bloehdorn and Moschitti, 2007b; Croce et al., 2011).
    Page 3, “Related Work”
  9. Thus, we present a novel application of semantic syntactic tree kernels and Brown clusters for domain adaptation of tree-kernel based relation extraction.
    Page 3, “Related Work”
  10. In fact, lexical information is highly affected by data-sparseness, thus tree kernels combined with semantic information created from additional resources should provide a way to obtain a more robust system.
    Page 4, “Computational Structures for RE”
  11. The idea is to (i) learn semantic similarity between words on the pivot corpus and (ii) use tree kernels embedding such a similarity to learn a RE system on the source, which allows generalization to the new target domain.
    Page 4, “Computational Structures for RE”
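
A minimal sketch (ours, not the authors' code) of the Δ recursion behind sentences 6–7 above: with sigma=None it counts shared fragments as in Collins and Duffy (2001); passing a lexical similarity sigma turns the word-equality test at pre-terminals into a soft match, the core idea of the semantic syntactic tree kernel. Node layout and the λ default are illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str                     # non-terminal symbol, or the word at a leaf
    children: list = field(default_factory=list)

def _preterminal(n):
    return len(n.children) == 1 and not n.children[0].children

def delta(n1, n2, lam=0.4, sigma=None):
    """Count (decayed by lam) of common fragments rooted at this node pair.
    With sigma, the word-equality test becomes a soft similarity score."""
    if not n1.children or not n2.children or n1.label != n2.label:
        return 0.0                               # leaves root no fragment
    if _preterminal(n1) and _preterminal(n2):
        w1, w2 = n1.children[0].label, n2.children[0].label
        if sigma is not None:
            return lam * sigma(w1, w2)           # soft lexical match
        return lam if w1 == w2 else 0.0
    if [c.label for c in n1.children] != [c.label for c in n2.children]:
        return 0.0                               # productions differ
    prod = lam
    for c1, c2 in zip(n1.children, n2.children):
        prod *= 1.0 + delta(c1, c2, lam, sigma)
    return prod

def nodes(t):
    yield t
    for c in t.children:
        yield from nodes(c)

def tree_kernel(t1, t2, lam=0.4, sigma=None):
    """K(T1, T2) = sum of delta over all node pairs of the two trees."""
    return sum(delta(a, b, lam, sigma) for a in nodes(t1) for b in nodes(t2))
```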

relation extraction

Appears in 12 sentences as: Relation Extraction (1) Relation extraction (1) relation extraction (7) relation extractor (2) relation extractors (1)
In Embedding Semantic Similarity in Tree Kernels for Domain Adaptation of Relation Extraction
  1. Relation Extraction (RE) is the task of extracting semantic relationships between entities in text.
    Page 1, “Abstract”
  2. Recent studies on relation extraction are mostly supervised.
    Page 1, “Abstract”
  3. In this paper, we propose to combine (i) term generalization approaches such as word clustering and latent semantic analysis (LSA) and (ii) structured kernels to improve the adaptability of relation extractors to new text genres/domains.
    Page 1, “Abstract”
  4. Relation extraction is the task of extracting semantic relationships between entities in text, e.g.
    Page 1, “Introduction”
  5. Recent studies on relation extraction have shown that supervised approaches based on either feature or kernel methods achieve state-of-the-art accuracy (Zelenko et al., 2002; Culotta and Sorensen, 2004;
    Page 1, “Introduction”
  6. However, to the best of our knowledge, there is almost no work on adapting relation extraction (RE) systems to new domains. There are some prior studies on the related tasks of multi-task transfer learning (Xu et al., 2008; Jiang, 2009) and distant supervision (Mintz et al., 2009), which are clearly related but different: the former is the problem of how to transfer knowledge from old to new relation types, while distant supervision tries to learn new relations from unlabeled text by exploiting weak supervision in the form of a knowledge resource (e.g.
    Page 1, “Introduction”
  7. Weak supervision is a promising approach to improve a relation extraction system, especially to increase its coverage in terms of types of relations covered.
    Page 2, “Introduction”
  8. We propose to combine (i) term generalization approaches and (ii) structured kernels to improve the performance of a relation extractor on new domains.
    Page 2, “Introduction”
  9. Thus, we present a novel application of semantic syntactic tree kernels and Brown clusters for domain adaptation of tree-kernel based relation extraction.
    Page 3, “Related Work”
  10. As shown in Zhang et al. (2006), including gold-standard information on entity and mention type substantially improves relation extraction performance.
    Page 4, “Computational Structures for RE”
  11. We treat relation extraction as a multi-class classification problem and use SVM-light-TK to train the binary classifiers.
    Page 5, “Experimental Setup”
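
Sentence 11 decomposes multi-class RE into binary one-vs-rest classifiers. The paper trains them with SVM-light-TK over tree kernels; the sketch below (ours) substitutes scikit-learn with a precomputed kernel matrix and toy labels, showing the decomposition only:

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))                          # stand-ins for relation instances
y = rng.choice(["PHYS", "ORG-AFF", "NONE"], size=60)  # hypothetical relation labels

K_train = X @ X.T                                     # precomputed kernel matrix
clf = OneVsRestClassifier(SVC(kernel="precomputed")).fit(K_train, y)

X_new = rng.normal(size=(5, 5))
K_test = X_new @ X.T                                  # kernel vs. the training instances
print(clf.predict(K_test))
```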

semantic similarity

Appears in 10 sentences as: semantic similarities (1) semantic similarity (9)
In Embedding Semantic Similarity in Tree Kernels for Domain Adaptation of Relation Extraction
  1. In the empirical evaluation on Automatic Content Extraction (ACE) data, we evaluate the impact of convolution tree kernels embedding lexical semantic similarities.
    Page 2, “Introduction”
  2. After introducing related work, we will discuss computational structures for RE and their extension with semantic similarity.
    Page 3, “Semantic Syntactic Tree Kernels”
  3. Combining syntax with semantics has a clear advantage: it generalizes lexical information encapsulated in syntactic parse trees, while at the same time syntax guides semantics in order to obtain an effective semantic similarity.
    Page 4, “Computational Structures for RE”
  4. We exploit this idea here for domain adaptation (DA): if words are generalized by semantic similarity LS, then in a hypothetical world changing LS such that it reflects the target domain would
    Page 4, “Computational Structures for RE”
  5. The question remains how to establish a link between the semantic similarity in the source and target domain.
    Page 4, “Computational Structures for RE”
  6. The idea is to (i) learn semantic similarity between words on the pivot corpus and (ii) use tree kernels embedding such a similarity to learn a RE system on the source, which allows generalization to the new target domain.
    Page 4, “Computational Structures for RE”
  7. Thus, in contrast to SCL, we do not need to select a set of pivot features but rather rely on the distributional hypothesis to infer a semantic similarity from a large unlabeled corpus.
    Page 4, “Computational Structures for RE”
  8. Then, this similarity is incorporated into the tree kernel that provides the necessary restriction for an effective semantic similarity calculation.
    Page 4, “Computational Structures for RE”
  9. Since we focus on evaluating the impact of semantic similarity in tree kernels, we think our system is very competitive.
    Page 6, “Results”
  10. We proposed syntactic tree kernels enriched by lexical semantic similarity to tackle the portability of a relation extractor to different domains.
    Page 9, “Conclusions and Future Work”
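
Sentences 6–8 above plug a corpus-induced word similarity into the kernel. A hedged sketch of one plausible σ (cosine over word vectors with a cut-off; all choices are ours), usable as the sigma argument of the Δ sketch given earlier:

```python
import numpy as np

def make_sigma(vectors, threshold=0.2):
    """Build a lexical similarity sigma(w1, w2) from word vectors (e.g., the
    LSA vectors learned on a pivot corpus). Illustrative choices: cosine
    similarity, a cut-off threshold, exact match for out-of-vocabulary words."""
    def sigma(w1, w2):
        if w1 == w2:
            return 1.0
        if w1 not in vectors or w2 not in vectors:
            return 0.0
        v1, v2 = vectors[w1], vectors[w2]
        cos = float(v1 @ v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
        return cos if cos >= threshold else 0.0
    return sigma
```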

lexical semantic

Appears in 4 sentences as: lexical semantic (4)
In Embedding Semantic Similarity in Tree Kernels for Domain Adaptation of Relation Extraction
  1. In the empirical evaluation on Automatic Content Extraction (ACE) data, we evaluate the impact of convolution tree kernels embedding lexical semantic similarities.
    Page 2, “Introduction”
  2. The same also holds for the lexical semantic kernel based on LSA (PET_LSA), however only on two out of three domains.
    Page 7, “Results”
  3. As the two semantically enriched kernels, PET_LSA and PET_WC, seem to capture different information, we use composite kernels (rows 10-11): the baseline kernel (PET) summed with the lexical semantic kernels.
    Page 8, “Results”
  4. We proposed syntactic tree kernels enriched by lexical semantic similarity to tackle the portability of a relation extractor to different domains.
    Page 9, “Conclusions and Future Work”

state of the art

Appears in 4 sentences as: state of the art (4)
In Embedding Semantic Similarity in Tree Kernels for Domain Adaptation of Relation Extraction
  1. We first show that our system aligns well with the state of the art on the ACE 2004 benchmark.
    Page 2, “Introduction”
  2. We will use this gold information also in Section 6.1 to show that our system aligns well to the state of the art on the ACE 2004 benchmark.
    Page 4, “Computational Structures for RE”
  3. To compare our model against the state of the art we use the ACE 2004 data.
    Page 5, “Experimental Setup”
  4. Table 3 shows that our system (bottom) aligns well with the state of the art.
    Page 6, “Results”

binary classifier

Appears in 3 sentences as: binary classifier (2) binary classifiers (1)
In Embedding Semantic Similarity in Tree Kernels for Domain Adaptation of Relation Extraction
  1. We will use a binary classifier trained on RE instance representations.
    Page 3, “Related Work”
  2. We treat relation extraction as a multi-class classification problem and use SVM-light-TK to train the binary classifiers.
    Page 5, “Experimental Setup”
  3. To estimate the importance weights, we train a binary classifier that distinguishes between source and target domain instances.
    Page 6, “Experimental Setup”
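
Sentence 3 above estimates importance weights with a source-vs-target domain classifier. A sketch under assumptions of ours (logistic regression, random stand-in features); the predicted odds approximate P_t(x)/P_s(x):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_src = rng.normal(0.0, 1.0, size=(100, 10))   # stand-in source-domain features
X_tgt = rng.normal(0.5, 1.0, size=(100, 10))   # stand-in (unlabeled) target features

X = np.vstack([X_src, X_tgt])
d = np.array([0] * 100 + [1] * 100)            # 0 = source, 1 = target

domain_clf = LogisticRegression(max_iter=1000).fit(X, d)
p_tgt = domain_clf.predict_proba(X_src)[:, 1]
beta = p_tgt / (1.0 - p_tgt)                   # odds ~ P_t(x)/P_s(x): instance weights
```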

labeled data

Appears in 3 sentences as: labeled data (3)
In Embedding Semantic Similarity in Tree Kernels for Domain Adaptation of Relation Extraction
  1. The clear drawback of supervised methods is the need for training data: labeled data is expensive to obtain, and there is often a mismatch between the training data and the data the system will be applied to.
    Page 1, “Abstract”
  2. However, the clear drawback of supervised methods is the need for training data, which can slow down the delivery of commercial applications in new domains: labeled data is expensive to obtain, and there is often a mismatch between the training data and the data the system will be applied to.
    Page 1, “Introduction”
  3. These correspondences are then integrated as new features in the labeled data of the source domain.
    Page 3, “Related Work”

Latent Semantic

Appears in 3 sentences as: Latent Semantic (2) latent semantic (1)
In Embedding Semantic Similarity in Tree Kernels for Domain Adaptation of Relation Extraction
  1. In this paper, we propose to combine (i) term generalization approaches such as word clustering and latent semantic analysis (LSA) and (ii) structured kernels to improve the adaptability of relation extractors to new text genres/domains.
    Page 1, “Abstract”
  2. The latter is derived in two ways: (a) Brown word clustering (Brown et al., 1992); and (b) Latent Semantic Analysis (LSA).
    Page 2, “Introduction”
  3. We study two ways for term generalization in tree kernels: Brown word clusters and Latent Semantic Analysis (LSA), both briefly described next.
    Page 4, “Computational Structures for RE”
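
Sentences 2–3 name the two term-generalization routes. On the LSA side, a toy sketch (the co-occurrence counts and weighting scheme are our stand-ins, not the paper's setup): word vectors from a truncated SVD of a word-by-context matrix, which could then feed the make_sigma construction above.

```python
import numpy as np

words = ["governor", "head", "mother", "texas", "maryland"]   # toy vocabulary
C = np.random.default_rng(0).poisson(2.0, (len(words), 50)).astype(float)

C = np.log1p(C)                           # dampen raw counts (one common choice)
U, S, _ = np.linalg.svd(C, full_matrices=False)
k = 3                                     # latent dimensionality
vectors = {w: U[i, :k] * S[:k] for i, w in enumerate(words)}
# These vectors can be passed to make_sigma(...) from the sketch above.
```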

loss function

Appears in 3 sentences as: loss function (3)
In Embedding Semantic Similarity in Tree Kernels for Domain Adaptation of Relation Extraction
  1. Instance weighting is a method for domain adaptation in which instance-dependent weights are assigned to the loss function that is minimized during the training process.
    Page 3, “Related Work”
  2. Let l(x, y, θ) be some loss function.
    Page 3, “Related Work”
  3. Then, as shown in Jiang and Zhai (2007), the loss function can be weighted as β_i · l(x, y, θ), such that β_i = P_t(x_i, y_i) / P_s(x_i, y_i), where P_s and P_t are the source and target distributions, respectively.
    Page 3, “Related Work”
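
With β_i reconstructed as above, a sketch (ours) of how the weights enter training: scikit-learn's sample_weight plays the role of the per-instance cost factor, scaling each instance's contribution to the minimized loss; the data are toy stand-ins.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X_train = rng.normal(size=(100, 10))            # stand-in source-domain features
y_train = rng.choice([0, 1], size=100)          # stand-in labels
beta = rng.uniform(0.5, 2.0, size=100)          # stand-in weights beta_i ~ P_t/P_s

clf = SVC(kernel="linear")
clf.fit(X_train, y_train, sample_weight=beta)   # each instance's loss scaled by beta_i
```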

semantic relationships

Appears in 3 sentences as: semantic relationships (2) semantically related (1)
In Embedding Semantic Similarity in Tree Kernels for Domain Adaptation of Relation Extraction
  1. Relation Extraction (RE) is the task of extracting semantic relationships between entities in text.
    Page 1, “Abstract”
  2. Relation extraction is the task of extracting semantic relationships between entities in text, e.g.
    Page 1, “Introduction”
  3. For instance, the fragments corresponding to governor from Texas and head of Maryland are intuitively semantically related and should obtain a higher match when compared to mother of them.
    Page 3, “Semantic Syntactic Tree Kernels”
