SciSurf: Index of 'Unsupervised Relation Extraction by Mining Wikipedia Texts Using Information from the Web'

Unsupervised Relation Extraction by Mining Wikipedia Texts Using Information from the Web

Yan, Yulan and Okazaki, Naoaki and Matsuo, Yutaka and Yang, Zhenglu and Ishizuka, Mitsuru

Published in Proc. ACL, 2009

Article Structure

Abstract

This paper presents an unsupervised relation extraction method for discovering and enhancing relations in which a specified concept in Wikipedia participates.

Introduction

Machine learning approaches for relation extraction tasks require substantial human effort, particularly when applied to the broad range of documents, entities, and relations existing on the Web.

Related Work

(Hasegawa et al., 2004) introduced a method for discovering a relation by clustering pairs of co-occurring entities represented as vectors of context features.

Characteristics of Wikipedia articles

Wikipedia, unlike the whole Web corpus, has several characteristics that markedly facilitate information extraction.

Pattern Combination Method for Relation Extraction

With the scene and challenges stated, we propose a solution in the following way.

Experiments

We apply our algorithm to two categories in Wikipedia: “American chief executives” and “Companies”.

Conclusions

To discover a range of semantic relations from a large corpus, we present an unsupervised relation extraction method using deep linguistic information to alleviate surface and noisy surface patterns generated from a large corpus, and use Web frequency information to ease the sparseness of linguistic information.

Topics

relation extraction

Appears in 16 sentences as: Relation Extraction (1) relation extraction (15)

In Unsupervised Relation Extraction by Mining Wikipedia Texts Using Information from the Web

This paper presents an unsupervised relation extraction method for discovering and enhancing relations in which a specified concept in Wikipedia participates.
Page 1, “Abstract”
Machine learning approaches for relation extraction tasks require substantial human effort, particularly when applied to the broad range of documents, entities, and relations existing on the Web.
Page 1, “Introduction”
Linguistic analysis is another effective technology for semantic relation extraction, as described in many reports such as (Kambhatla, 2004); (Bunescu and Mooney, 2005); (Harabagiu et al., 2005); (Nguyen et al., 2007).
Page 1, “Introduction”
Currently, linguistic approaches for semantic relation extraction are mostly supervised, relying on pre-specification of the desired relation or initial seed words or patterns from hand-coding.
Page 1, “Introduction”
As described herein, we consider integrating linguistic analysis with Web frequency information to improve the performance of unsupervised relation extraction.
Page 1, “Introduction”
In particular, we examine unsupervised relation extraction from existing texts of Wikipedia articles.
Page 1, “Introduction”
• Our experimental results reveal that relations are extractable with good precision using linguistic patterns, whereas surface patterns from Web frequency information contribute greatly to the coverage of relation extraction.
Page 2, “Introduction”
• The combination of these patterns produces a clustering method to achieve high precision for different Information Extraction applications, especially for bootstrapping a high-recall semi-supervised relation extraction system.
Page 2, “Introduction”
(Rosenfeld and Feldman, 2006) showed that the clusters discovered by URI are useful for seeding a semi-supervised relation extraction system.
Page 2, “Related Work”
In this paper, we propose an unsupervised relation extraction method that combines patterns of two types: surface patterns and dependency patterns.
Page 2, “Related Work”
Surface patterns are generated from the Web corpus to provide redundancy information for relation extraction.
Page 2, “Related Work”


semantic relation

Appears in 10 sentences as: semantic relation (6) semantic relations (4)

In Unsupervised Relation Extraction by Mining Wikipedia Texts Using Information from the Web

A salient challenge and research interest for frequent pattern mining is abstraction away from different surface realizations of semantic relations to discover discriminative patterns efficiently.
Page 1, “Introduction”
Linguistic analysis is another effective technology for semantic relation extraction, as described in many reports such as (Kambhatla, 2004); (Bunescu and Mooney, 2005); (Harabagiu et al., 2005); (Nguyen et al., 2007).
Page 1, “Introduction”
Currently, linguistic approaches for semantic relation extraction are mostly supervised, relying on pre-specification of the desired relation or initial seed words or patterns from hand-coding.
Page 1, “Introduction”
(Turney, 2006) presented an unsupervised algorithm for mining the Web for patterns expressing implicit semantic relations.
Page 2, “Related Work”
In addition, to obtain semantic information for concept pairs, we generate dependency patterns to abstract away from different surface realizations of semantic relations.
Page 2, “Related Work”
A common assumption is that, when investigating the semantics in articles such as those in Wikipedia (e.g., semantic Wikipedia (Volkel et al., 2006)), key information related to a concept described on a page p lies within the set of links l(p) on that page; particularly, it is likely that a salient semantic relation r exists between p and a related page p′ ∈ l(p).
Page 3, “Characteristics of Wikipedia articles”
Given a concept described in a Wikipedia article, our idea of preprocessing executes initial consideration of all anchor-text concepts linking to other Wikipedia articles in the article as related concepts that might share a semantic relation with the entitled concept.
Page 4, “Pattern Combination Method for Relation Extraction”
Querying a concept pair using a search engine (Google), we characterize the semantic relation between the pair by leveraging the vast size of the Web.
Page 4, “Pattern Combination Method for Relation Extraction”
A salient difficulty posed by dependency pattern clustering is that concept pairs of the same semantic relation cannot be merged if they are expressed in different dependency structures.
Page 6, “Pattern Combination Method for Relation Extraction”
To discover a range of semantic relations from a large corpus, we present an unsupervised relation extraction method using deep linguistic information to alleviate surface and noisy surface patterns generated from a large corpus, and use Web frequency information to ease the sparseness of linguistic information.
Page 8, “Conclusions”


dependency path

Appears in 4 sentences as: dependency path (4)

In Unsupervised Relation Extraction by Mining Wikipedia Texts Using Information from the Web

We define dependency patterns as sub-paths of the shortest dependency path between a concept pair for two reasons.
Page 5, “Pattern Combination Method for Relation Extraction”
Shortest dependency path inducement.
Page 5, “Pattern Combination Method for Relation Extraction”
From the original dependency tree structure by parsing the selected sentence for each concept pair, we first induce the shortest dependency path with the entitled concept and related concept.
Page 5, “Pattern Combination Method for Relation Extraction”
We use a frequent tree-mining algorithm (Zaki, 2002) to generate sub-paths as dependency patterns from the shortest dependency path for relation clustering.
Page 5, “Pattern Combination Method for Relation Extraction”
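
The sentences above describe inducing the shortest dependency path between a concept pair and then taking sub-paths of it as dependency patterns. A minimal Python sketch of that idea follows. It is an illustration under stated assumptions, not the authors' implementation: the dependency tree is assumed to be given as (head, dependent) word pairs, the example sentence and the function names are hypothetical, and plain enumeration of contiguous sub-paths stands in for the frequent tree-mining algorithm (Zaki, 2002) that the paper actually uses.

```python
from collections import deque


def shortest_dependency_path(edges, source, target):
    """Return the node sequence linking source to target, or None.

    edges is a list of (head, dependent) pairs. The tree is treated as
    undirected, and a breadth-first search finds the shortest path.
    """
    neighbors = {}
    for head, dependent in edges:
        neighbors.setdefault(head, []).append(dependent)
        neighbors.setdefault(dependent, []).append(head)

    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk back through the parent links to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in neighbors.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


def sub_paths(path, min_len=2):
    """Enumerate contiguous sub-paths with at least min_len nodes.

    Simplification: the paper mines frequent sub-paths across many
    concept pairs; this only lists the candidates for a single path.
    """
    return [
        tuple(path[i:j])
        for i in range(len(path))
        for j in range(i + min_len, len(path) + 1)
    ]


# Hypothetical parse of "Jobs founded Apple in 1976" (illustrative only).
example_edges = [
    ("founded", "Jobs"),
    ("founded", "Apple"),
    ("founded", "in"),
    ("in", "1976"),
]
example_path = shortest_dependency_path(example_edges, "Jobs", "Apple")
example_patterns = sub_paths(example_path)
```

In this sketch the entitled concept ("Jobs") and the related concept ("Apple") are linked through their shared head verb, so the candidate patterns are the sub-paths of that three-node path. Words off the path, such as the date phrase, play no part in the patterns.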


semi-supervised

Appears in 3 sentences as: semi-supervised (3)

In Unsupervised Relation Extraction by Mining Wikipedia Texts Using Information from the Web

Even with semi-supervised approaches, which use a large unlabeled corpus, manual construction of a small set of seeds known as true instances of the target entity or relation is susceptible to arbitrary human decisions.
Page 1, “Introduction”
• The combination of these patterns produces a clustering method to achieve high precision for different Information Extraction applications, especially for bootstrapping a high-recall semi-supervised relation extraction system.
Page 2, “Introduction”
(Rosenfeld and Feldman, 2006) showed that the clusters discovered by URI are useful for seeding a semi-supervised relation extraction system.
Page 2, “Related Work”
