A Graph-based Cross-lingual Projection Approach for Weakly Supervised Relation Extraction
Kim, Seokhwan and Lee, Gary Geunbae

Article Structure

Abstract

Although researchers have conducted extensive studies on relation extraction in the last decade, supervised approaches are still limited because they require large amounts of training data to achieve high performance.

Introduction

Relation extraction aims to identify semantic relations of entities in a document.

Cross-lingual Annotation Projection for Relation Extraction

Relation extraction can be formulated as a classification problem solved by the following classifier:

Graph Construction

The most crucial factor in the success of graph-based learning approaches is how to construct a graph that is appropriate for the target task.

Label Propagation

To induce labels for all of the unlabeled vertices on the graph constructed in Section 3, we utilize the label propagation algorithm (Zhu and Ghahramani, 2002), which is a graph-based semi-supervised learning algorithm.
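The label propagation step can be illustrated with a minimal pure-Python sketch of the iterative algorithm of Zhu and Ghahramani (2002). It assumes that every vertex carries a pair of scores $(y^+, y^-)$, that seed (labeled) vertices keep their scores fixed, and that each unlabeled vertex repeatedly takes the edge-weighted average of its neighbours' scores. The function name, the uniform (0.5, 0.5) initialization, and the convergence tolerance are illustrative assumptions; the 10-iteration cap mirrors the setting reported for the Junto toolkit, but this is not Junto's implementation.

```python
from typing import Dict, List, Tuple

Label = Tuple[float, float]  # (y_plus, y_minus)


def label_propagation(
    edges: Dict[Tuple[str, str], float],
    seeds: Dict[str, Label],
    max_iter: int = 10,
    tol: float = 1e-6,
) -> Dict[str, Label]:
    """Propagate (y+, y-) scores over an undirected weighted graph.

    Seed vertices are clamped to their initial scores; every other vertex
    takes the weight-averaged scores of its neighbours at each iteration.
    """
    # Build an adjacency list from the undirected edge weights.
    neighbours: Dict[str, List[Tuple[str, float]]] = {}
    for (a, b), w in edges.items():
        neighbours.setdefault(a, []).append((b, w))
        neighbours.setdefault(b, []).append((a, w))

    # Unlabeled vertices start from an uninformative (0.5, 0.5).
    scores = {v: (0.5, 0.5) for v in neighbours}
    scores.update(seeds)

    for _ in range(max_iter):
        new_scores = dict(scores)
        delta = 0.0
        for v, nbrs in neighbours.items():
            if v in seeds:
                continue  # clamp labeled vertices
            total = sum(w for _, w in nbrs)
            if total == 0:
                continue
            yp = sum(w * scores[u][0] for u, w in nbrs) / total
            ym = sum(w * scores[u][1] for u, w in nbrs) / total
            new_scores[v] = (yp, ym)
            delta = max(delta, abs(yp - scores[v][0]), abs(ym - scores[v][1]))
        scores = new_scores  # synchronous (Jacobi-style) update
        if delta < tol:
            break
    return scores


edges = {("src_pos", "ctx"): 1.0, ("ctx", "tgt"): 1.0, ("src_neg", "tgt"): 0.5}
seeds = {"src_pos": (1.0, 0.0), "src_neg": (0.0, 1.0)}
result = label_propagation(edges, seeds)
print(result)
```

In this toy graph the context vertex `ctx`, which sits next to the positive seed, ends with a higher $y^+$ than `tgt`, which also receives evidence from the negative seed. Because seeds start with scores summing to 1 and every update is a weighted average, $y^+ + y^-$ stays 1 for every vertex.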

Implementation

To demonstrate the effectiveness of the graph-based projection approach for relation extraction, we developed a Korean relation extraction system that was trained with projected annotations from English resources.

Evaluation

The experiments were performed on the manually annotated Korean test dataset.

Conclusions

This paper presented a novel graph-based projection approach for relation extraction.

Topics

relation extraction

Appears in 23 sentences as: Relation Extraction (1) Relation extraction (2) relation extraction (17) relation extractor (2) relation extractors (2)
In A Graph-based Cross-lingual Projection Approach for Weakly Supervised Relation Extraction
  1. Although researchers have conducted extensive studies on relation extraction in the last decade, supervised approaches are still limited because they require large amounts of training data to achieve high performance.
    Page 1, “Abstract”
  2. To build a relation extractor without significant annotation effort, we can exploit cross-lingual annotation projection, which leverages parallel corpora as external resources for supervision.
    Page 1, “Abstract”
  3. This paper proposes a novel graph-based projection approach and demonstrates its merits using a Korean relation extraction system trained on a dataset projected from an English-Korean parallel corpus.
    Page 1, “Abstract”
  4. Relation extraction aims to identify semantic relations of entities in a document.
    Page 1, “Introduction”
  5. Although many supervised machine learning approaches have been successfully applied to relation extraction tasks (Zelenko et al., 2003; Kambhatla, 2004; Bunescu and Mooney, 2005; Zhang et al., 2006), applications of these approaches are still limited because they require a sufficient number of training examples to obtain good extraction results.
    Page 1, “Introduction”
  6. Although these datasets encourage the development of relation extractors for these major languages, there are few labeled training samples for learning new systems in
    Page 1, “Introduction”
  7. Relation Extraction
    Page 1, “Introduction”
  8. Because manual annotation of semantic relations for such resource-poor languages is very expensive, we instead consider weakly supervised learning techniques (Riloff and Jones, 1999; Agichtein and Gravano, 2000; Zhang, 2004; Chen et al., 2006) to learn the relation extractors without significant annotation efforts.
    Page 1, “Introduction”
  9. Recently, some researchers have attempted to use external resources that were not specially constructed for relation extraction, such as a treebank (Banko et al., 2007) and Wikipedia (Wu and Weld, 2010), instead of task-specific training or seed examples.
    Page 1, “Introduction”
  10. We previously proposed to leverage parallel corpora as a new kind of external resource for relation extraction (Kim et al., 2010).
    Page 1, “Introduction”
  11. To obtain training examples in the resource-poor target language, this approach exploited a cross-lingual annotation projection by propagating annotations that were generated by a relation extraction system in a resource-rich source language.
    Page 1, “Introduction”

See all papers in Proc. ACL 2012 that mention relation extraction.

graph-based

Appears in 16 sentences as: graph-based (16)
  1. This paper proposes a novel graph-based projection approach and demonstrates its merits using a Korean relation extraction system trained on a dataset projected from an English-Korean parallel corpus.
    Page 1, “Abstract”
  2. In this paper, we propose a graph-based projection approach for weakly supervised relation extraction.
    Page 1, “Introduction”
  3. The goal of our graph-based approach is to improve the robustness of the extractor with respect to errors that are generated and accumulated by preprocessors.
    Page 1, “Introduction”
  4. To solve both of these problems at once, we propose a graph-based projection approach for relation extraction.
    Page 2, “Cross-lingual Annotation Projection for Relation Extraction”
  5. The most crucial factor in the success of graph-based learning approaches is how to construct a graph that is appropriate for the target task.
    Page 2, “Graph Construction”
  6. Das and Petrov (2011) proposed a graph-based bilingual projection for part-of-speech tagging that treats the tagged source-language words as labeled examples and connects them to unlabeled target-language words according to the word alignments.
    Page 2, “Graph Construction”
  7. The graph for our graph-based projection is constructed by connecting related vertex pairs by weighted edges.
    Page 3, “Graph Construction”
  8. To induce labels for all of the unlabeled vertices on the graph constructed in Section 3, we utilize the label propagation algorithm (Zhu and Ghahramani, 2002), which is a graph-based semi-supervised learning algorithm.
    Page 3, “Label Propagation”
  9. To demonstrate the effectiveness of the graph-based projection approach for relation extraction, we developed a Korean relation extraction system that was trained with projected annotations from English resources.
    Page 3, “Implementation”
  10. Table 1: Comparison between direct and graph-based projection approaches to extract semantic relationships for four relation types
    Page 4, “Implementation”
  11. The graph-based projection was performed with the Junto toolkit⁴, with a maximum of 10 iterations per execution.
    Page 4, “Implementation”

semantic relationships

Appears in 11 sentences as: semantic relations (4) semantic relationship (1) semantic relationships (5) semantically related (1)
  1. Relation extraction aims to identify semantic relations of entities in a document.
    Page 1, “Introduction”
  2. Several datasets that provide manual annotations of semantic relationships are available from the MUC (Grishman and Sundheim, 1996) and ACE (Doddington et al., 2004) projects, but these datasets contain labeled training examples in only a few major languages, including English, Chinese, and Arabic.
    Page 1, “Introduction”
  3. Because manual annotation of semantic relations for such resource-poor languages is very expensive, we instead consider weakly supervised learning techniques (Riloff and Jones, 1999; Agichtein and Gravano, 2000; Zhang, 2004; Chen et al., 2006) to learn the relation extractors without significant annotation efforts.
    Page 1, “Introduction”
  4. Although some noise reduction strategies for projecting semantic relations were proposed (Kim et al., 2010), the direct projection approach is still vulnerable to erroneous inputs generated by submodules.
    Page 2, “Cross-lingual Annotation Projection for Relation Extraction”
  5. Graph construction for projecting semantic relationships is more complicated than for part-of-speech tagging because the unit instance of projection is a pair of entities, not a word or morpheme equivalent to the alignment unit.
    Page 2, “Graph Construction”
  6. The larger the $y^+$ value, the more likely the instance is to have a semantic relationship.
    Page 2, “Graph Construction”
  7. The other type of vertex, the context vertex, is used to identify relation descriptors, i.e., contextual subtexts that represent the semantic relationships of the positive instances.
    Page 3, “Graph Construction”
  8. For English, we define a context vertex for each trigram located between a semantically related entity pair.
    Page 3, “Graph Construction”
  9. Each context vertex $u_s \in U_S$ and $u_t \in U_T$ also has $y^+$ and $y^-$, which represent how likely the context is to denote semantic relationships.
    Page 3, “Graph Construction”
  10. Table 1: Comparison between direct and graph-based projection approaches to extract semantic relationships for four relation types
    Page 4, “Implementation”
  11. Experimental results show that our graph-based projection helped to improve the performance of the cross-lingual annotation projection of the semantic relations, and our system outperforms the other systems, which incorporate monolingual external resources.
    Page 4, “Conclusions”
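
The English context vertices described above (trigrams located between a semantically related entity pair) can be sketched as follows. This is an illustrative sketch, not the authors' code: it assumes whitespace-tokenized sentences and entity mentions given as (start, end) token offsets with an exclusive end, and the function name `between_trigrams` is hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def between_trigrams(tokens: List[str], e1: Span, e2: Span) -> List[Tuple[str, ...]]:
    """Return every token trigram lying strictly between two entity mentions.

    The order of e1 and e2 does not matter; if fewer than three tokens
    separate the mentions, no trigram (and hence no context) is produced.
    """
    left, right = sorted([e1, e2])
    between = tokens[left[1]:right[0]]
    return [tuple(between[i:i + 3]) for i in range(len(between) - 2)]


tokens = "Barack Obama was born in Honolulu".split()
print(between_trigrams(tokens, (0, 2), (5, 6)))  # [('was', 'born', 'in')]
```

Under this sketch, entity pairs that are adjacent or separated by only one or two tokens yield no trigram context at all.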

parallel corpus

Appears in 7 sentences as: parallel corpus (7)
  1. This paper proposes a novel graph-based projection approach and demonstrates its merits using a Korean relation extraction system trained on a dataset projected from an English-Korean parallel corpus.
    Page 1, “Abstract”
  2. To accomplish that goal, the method automatically creates a set of annotated text for $f_t$, utilizing a well-made extractor $f_s$ for a resource-rich source language $L_s$ and a parallel corpus of $L_s$ and $L_t$.
    Page 2, “Cross-lingual Annotation Projection for Relation Extraction”
  3. where $\mathrm{count}(u_s, u_t)$ is the number of alignments between $u_s$ and $u_t$ across the whole parallel corpus.
    Page 3, “Graph Construction”
  4. We used an English-Korean parallel corpus¹ that contains 266,892 bi-sentence pairs in English and Korean.
    Page 3, “Implementation”
  5. ¹The collected parallel corpus is available on our website: http://isoft.postech.ac.kr/~megaup/acl/datasets ²http://reverb.cs.washington.edu/
    Page 3, “Implementation”
  6. The English sentence annotations in the parallel corpus were then propagated into the corresponding Korean sentences.
    Page 4, “Implementation”
  7. We used the GIZA++ software³ (Och and Ney, 2003) to obtain the word alignments for each bi-sentence in the parallel corpus.
    Page 4, “Implementation”
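
The alignment count $\mathrm{count}(u_s, u_t)$ used in the graph construction can be accumulated with a simple counter. The sketch below assumes that each bi-sentence has already been reduced to a list of (source context, target context) pairs linked by the word alignments; how those pairs are extracted, and how the counts enter the edge-weight formula, are not reproduced here, and the function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

ContextPair = Tuple[str, str]  # (source context u_s, target context u_t)


def count_context_alignments(corpus_pairs: Iterable[List[ContextPair]]) -> Counter:
    """Count how often each (u_s, u_t) pair is aligned across all bi-sentences."""
    counts: Counter = Counter()
    for sentence_pairs in corpus_pairs:
        counts.update(sentence_pairs)  # each aligned pair adds one occurrence
    return counts


corpus = [
    [("was born in", "태어났다")],
    [("was born in", "태어났다"), ("is located in", "위치한")],
]
counts = count_context_alignments(corpus)
print(counts[("was born in", "태어났다")])  # 2
```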

cross-lingual

Appears in 5 sentences as: Cross-lingual (1) cross-lingual (4)
  1. To build a relation extractor without significant annotation effort, we can exploit cross-lingual annotation projection, which leverages parallel corpora as external resources for supervision.
    Page 1, “Abstract”
  2. To obtain training examples in the resource-poor target language, this approach exploited a cross-lingual annotation projection by propagating annotations that were generated by a relation extraction system in a resource-rich source language.
    Page 1, “Introduction”
  3. Cross-lingual annotation projection aims to learn a well-performing extractor $f_t$ for a resource-poor target language $L_t$ without significant effort toward building resources.
    Page 2, “Cross-lingual Annotation Projection for Relation Extraction”
  4. Early studies of cross-lingual annotation projection addressed various natural language processing tasks (Yarowsky and Ngai, 2001; Yarowsky et al., 2001; Hwa et al., 2005; Zitouni and Florian, 2008; Pado and Lapata, 2009).
    Page 2, “Cross-lingual Annotation Projection for Relation Extraction”
  5. Experimental results show that our graph-based projection helped to improve the performance of the cross-lingual annotation projection of the semantic relations, and our system outperforms the other systems, which incorporate monolingual external resources.
    Page 4, “Conclusions”

extraction system

Appears in 5 sentences as: extraction system (5)
  1. This paper proposes a novel graph-based projection approach and demonstrates its merits using a Korean relation extraction system trained on a dataset projected from an English-Korean parallel corpus.
    Page 1, “Abstract”
  2. To obtain training examples in the resource-poor target language, this approach exploited a cross-lingual annotation projection by propagating annotations that were generated by a relation extraction system in a resource-rich source language.
    Page 1, “Introduction”
  3. To demonstrate the effectiveness of the graph-based projection approach for relation extraction, we developed a Korean relation extraction system that was trained with projected annotations from English resources.
    Page 3, “Implementation”
  4. We obtained 155,409 positive instances from the English sentences using an off-the-shelf relation extraction system, ReVerb² (Fader et al., 2011).
    Page 3, “Implementation”
  5. The feasibility of our approach was demonstrated by our Korean relation extraction system.
    Page 4, “Conclusions”

word alignments

Appears in 4 sentences as: word alignment (1) word alignments (3)
  1. However, these automatic annotations can be unreliable because of source text misclassification and word alignment errors, which can cause a critical drop in annotation projection quality.
    Page 2, “Cross-lingual Annotation Projection for Relation Extraction”
  2. Das and Petrov (2011) proposed a graph-based bilingual projection for part-of-speech tagging that treats the tagged source-language words as labeled examples and connects them to unlabeled target-language words according to the word alignments.
    Page 2, “Graph Construction”
  3. If the context vertices $U_S$ for the source language sentences are defined, then the units of context in the target language can also be created based on the word alignments.
    Page 3, “Graph Construction”
  4. We used the GIZA++ software³ (Och and Ney, 2003) to obtain the word alignments for each bi-sentence in the parallel corpus.
    Page 4, “Implementation”
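
Projecting a source-side span through word alignments, which the text relies on both for propagating annotations and for creating target-language context units, can be sketched as mapping the span to the smallest target span covering all of its aligned tokens. This is an assumption-laden illustration: it takes the alignment of one bi-sentence as a set of (source index, target index) links, the conversion from GIZA++ output is not shown, and the function names are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive
Link = Tuple[int, int]  # (source token index, target token index)


def project_span(span: Span, links: Iterable[Link]) -> Optional[Span]:
    """Map a source span to the smallest target span covering its aligned tokens."""
    start, end = span
    targets = [t for s, t in links if start <= s < end]
    if not targets:
        return None  # no aligned token: the annotation cannot be projected
    return (min(targets), max(targets) + 1)


def project_context(tokens_t: List[str], span: Span, links: Iterable[Link]) -> Optional[str]:
    """Return the target-language text covered by the projected span, if any."""
    projected = project_span(span, links)
    if projected is None:
        return None
    return " ".join(tokens_t[projected[0]:projected[1]])


# Source tokens: Obama(0) was(1) born(2) in(3) Honolulu(4)
tgt = "오바마는 호놀룰루에서 태어났다".split()
links = {(0, 0), (2, 2), (4, 1)}  # Obama-오바마는, born-태어났다, Honolulu-호놀룰루에서
print(project_span((0, 1), links))          # (0, 1): entity "Obama"
print(project_context(tgt, (1, 4), links))  # 태어났다: context "was born in"
```

Because one wrong link widens or shifts the projected span, this direct mapping illustrates how word alignment errors degrade projection quality, the weakness the graph-based approach is meant to mitigate.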

edge weight

Appears in 3 sentences as: edge weight (2) Edge Weights (1)
  1. 3.2 Edge Weights
    Page 3, “Graph Construction”
  2. If $v_{i,j}$ is matched to $u_k$, the edge weight $w(v_{i,j}, u_k)$ is set to 1.
    Page 3, “Graph Construction”
  3. The edge weight $w(u_k, u_l)$ is computed as Jaccard's coefficient between $u_k$ and $u_l$.
    Page 3, “Graph Construction”
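
The two edge-weight rules above can be sketched as follows. The text does not say which sets Jaccard's coefficient is computed over, so this sketch assumes, purely for illustration, that each context vertex is compared as the set of its tokens; the instance-to-context rule returns 1 when the context matches one of the instance's contexts. Function names are hypothetical.

```python
from typing import Collection, Hashable, Set, Tuple

Context = Tuple[str, ...]


def jaccard(a: Set[Hashable], b: Set[Hashable]) -> float:
    """Jaccard's coefficient |A ∩ B| / |A ∪ B| (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def instance_context_weight(instance_contexts: Collection[Context], context: Context) -> float:
    """1.0 if the context matches one of the instance's contexts; 0.0 means no edge."""
    return 1.0 if context in instance_contexts else 0.0


def context_context_weight(u_k: Context, u_l: Context) -> float:
    """Assumption: compare two contexts as sets of their tokens."""
    return jaccard(set(u_k), set(u_l))


print(context_context_weight(("was", "born", "in"), ("was", "born", "at")))  # 0.5
```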

manual annotation

Appears in 3 sentences as: manual annotation (1) manual annotations (1) manually annotated (1)
  1. Several datasets that provide manual annotations of semantic relationships are available from the MUC (Grishman and Sundheim, 1996) and ACE (Doddington et al., 2004) projects, but these datasets contain labeled training examples in only a few major languages, including English, Chinese, and Arabic.
    Page 1, “Introduction”
  2. Because manual annotation of semantic relations for such resource-poor languages is very expensive, we instead consider weakly supervised learning techniques (Riloff and Jones, 1999; Agichtein and Gravano, 2000; Zhang, 2004; Chen et al., 2006) to learn the relation extractors without significant annotation efforts.
    Page 1, “Introduction”
  3. The experiments were performed on the manually annotated Korean test dataset.
    Page 4, “Evaluation”
