Extracting Lexical Reference Rules from Wikipedia
Shnarch, Eyal and Barak, Libby and Dagan, Ido

Article Structure

Abstract

This paper describes the extraction from Wikipedia of lexical reference rules, identifying references to term meanings triggered by other terms.

Introduction

A very common need in applied semantic inference is to infer the meaning of a target term from other terms in a text.

Background

Many works on machine readable dictionaries utilized definitions to identify semantic relations between words (Ide and Jean, 1993).

Extracting Rules from Wikipedia

Our goal is to utilize the broad knowledge of Wikipedia to extract a knowledge base of lexical reference rules.

Extraction Methods Analysis

We applied our rule extraction methods over a version of Wikipedia available in a database constructed by Zesch et al. (2007).

The asterisk denotes an incorrect rule

not bear its primary meaning, while it has a modifier which serves as the semantic head (Fillmore et al., 2002; Grishman et al., 1986).

Application Oriented Evaluations

Our primary application oriented evaluation is within an unsupervised lexical expansion scenario applied to a text categorization data set (Section 6.1).

Conclusions and Future Work

We presented the construction of a large-scale resource of lexical reference rules, which is useful in applied lexical inference.

Topics

WordNet

Appears in 29 sentences as: WordNet (29) WordNet’s (2)
In Extracting Lexical Reference Rules from Wikipedia
  1. Our rule-base yields comparable performance to WordNet while providing largely complementary information.
    Page 1, “Abstract”
  2. A prominent available resource is WordNet (Fellbaum, 1998), from which classical relations such as synonyms, hyponyms and some cases of meronyms may be used as LR rules.
    Page 1, “Introduction”
  3. An extension to WordNet was presented by (Snow et al., 2006).
    Page 1, “Introduction”
  4. The rule base utility was evaluated within two lexical expansion applications, yielding better results than other automatically constructed baselines and comparable results to WordNet.
    Page 2, “Introduction”
  5. A combination with WordNet achieved the best performance, indicating the significant marginal contribution of our rule base.
    Page 2, “Introduction”
  6. Ponzetto and Strube (2007) identified the subsumption (ISA) relation from Wikipedia’s category tags, while in Yago (Suchanek et al., 2007) these tags, redirect links and WordNet were used to identify instances of 14 predefined specific semantic relations.
    Page 2, “Background”
  7. classifiers, whose training examples are derived automatically from WordNet.
    Page 2, “Background”
  8. They use these classifiers to suggest extensions to the WordNet hierarchy, the largest one consisting of 400K new links.
    Page 2, “Background”
  9. This is in contrast to (Snow et al., 2005) which focused only on hyponymy and synonymy relations and could therefore extract positive and negative examples from WordNet.
    Page 5, “The asterisk denotes an incorrect rule”
  10. No Expansion   0.19  0.54  0.28
      WikiBL         0.19  0.53  0.28
      Snow400K       0.19  0.54  0.28
      Lin            0.25  0.39  0.30
      WordNet        0.30  0.47  0.37
      Extraction Methods from Wikipedia:
    Page 7, “Application Oriented Evaluations”
  11. WordNet + Wiki (All Rules + Dice)  0.35  0.47  0.40
    Page 7, “Application Oriented Evaluations”

See all papers in Proc. ACL 2009 that mention WordNet.


semantic relations

Appears in 4 sentences as: semantic relations (3) semantically “related” (1)
In Extracting Lexical Reference Rules from Wikipedia
  1. Many works on machine readable dictionaries utilized definitions to identify semantic relations between words (Ide and Jean, 1993).
    Page 2, “Background”
  2. Ponzetto and Strube (2007) identified the subsumption (ISA) relation from Wikipedia’s category tags, while in Yago (Suchanek et al., 2007) these tags, redirect links and WordNet were used to identify instances of 14 predefined specific semantic relations.
    Page 2, “Background”
  3. However, this is a rather loose notion, which only indicates that terms are semantically “related” and are likely to co-occur with each other.
    Page 2, “Background”
  4. An examination of the paths in All-N reveals, beyond standard hyponymy and synonymy, various semantic relations that satisfy lexical reference, such as Location, Occupation and Creation, as illustrated in Table 3.
    Page 4, “Extraction Methods Analysis”


cosine similarity

Appears in 3 sentences as: cosine similarity (3)
In Extracting Lexical Reference Rules from Wikipedia
  1. We also examined another filtering score, the cosine similarity between the vectors representing the two rule sides in LSA (Latent Semantic Analysis) space (Deerwester et al., 1990).
    Page 6, “The asterisk denotes an incorrect rule”
  2. During classification cosine similarity is measured between the feature vector of the classified document and the expanded vectors of all categories.
    Page 6, “Application Oriented Evaluations”
  3. The first avoids any expansion, classifying documents based on cosine similarity with category names only.
    Page 7, “Application Oriented Evaluations”
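The classification step quoted above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes plain term-count vectors, assumes that a category is expanded with the left-hand-side terms of rules whose right-hand side is the category name, and uses a small hypothetical rule table (`toy_rules`). The function names `cosine`, `expand_category` and `classify` are invented for this sketch.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def expand_category(name, rules):
    """Category vector: the category name plus its expansion terms.

    `rules` maps a category name (rule RHS) to the terms that are
    assumed to refer to it (rule LHSs). An empty mapping reproduces
    the no-expansion baseline.
    """
    return Counter([name] + list(rules.get(name, [])))


def classify(doc_tokens, categories, rules):
    """Return the best-scoring category and all cosine scores."""
    doc_vec = Counter(doc_tokens)
    scores = {c: cosine(doc_vec, expand_category(c, rules)) for c in categories}
    return max(scores, key=scores.get), scores


# Hypothetical rules, for illustration only.
toy_rules = {"baseball": ["pitcher", "inning"], "cryptography": ["cipher"]}
doc = ["the", "pitcher", "threw", "in", "the", "inning"]
label, scores = classify(doc, ["baseball", "cryptography"], toy_rules)
print(label, scores)
```

With an empty rule table the toy document shares no term with either category name, so every score is zero; this is the situation that lexical expansion is meant to address.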


knowledge bases

Appears in 3 sentences as: knowledge base (1) knowledge bases (2)
In Extracting Lexical Reference Rules from Wikipedia
  1. To perform such inferences, systems need large scale knowledge bases of LR rules.
    Page 1, “Introduction”
  2. Our goal is to utilize the broad knowledge of Wikipedia to extract a knowledge base of lexical reference rules.
    Page 2, “Extracting Rules from Wikipedia”
  3. We note that the last three extraction methods should not be considered as Wikipedia specific, since many Web-like knowledge bases contain redirects, hyperlinks and disambiguation means.
    Page 3, “Extracting Rules from Wikipedia”


noun phrases

Appears in 3 sentences as: noun phrases (4)
In Extracting Lexical Reference Rules from Wikipedia
  1. As Wikipedia’s titles are mostly noun phrases, the terms we extract as RHSs are the nouns and noun phrases in the definition.
    Page 2, “Extracting Rules from Wikipedia”
  2. It also enables us to extract additional rules by splitting conjoined noun phrases and by taking both the head noun and the complete base noun phrase as the RHS for separate rules (examples 1-3 in Table 1).
    Page 3, “Extracting Rules from Wikipedia”
  3. Therefore, we further create rules for all head nouns and base noun phrases within the definition (example 4).
    Page 3, “Extracting Rules from Wikipedia”
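The rule generation described in these sentences can be sketched roughly as below. The sketch makes several simplifying assumptions that are not taken from the paper: the definition is already chunked into noun phrases given as token lists, conjunctions are split on the surface tokens "and", "or" and ",", and the head noun is taken to be the last token of each base noun phrase. The names `split_conjoined` and `rules_from_definition`, and the example title, are hypothetical.

```python
def split_conjoined(np_tokens, separators=("and", "or", ",")):
    """Split a conjoined noun phrase into its conjuncts.

    Naive surface split: a phrase with a shared head, such as
    "Italian and French city", would be split incorrectly.
    """
    parts, current = [], []
    for tok in np_tokens:
        if tok.lower() in separators:
            if current:
                parts.append(current)
            current = []
        else:
            current.append(tok)
    if current:
        parts.append(current)
    return parts


def rules_from_definition(title, noun_phrases):
    """Create (LHS, RHS) rules from a title and its definition's noun phrases.

    For every base noun phrase, both the complete phrase and its head
    noun (assumed here to be the last token) become the RHS of a rule.
    """
    rules, seen = [], set()
    for np_tokens in noun_phrases:
        for base in split_conjoined(np_tokens):
            for rhs in (" ".join(base), base[-1]):
                if rhs != title and (title, rhs) not in seen:
                    seen.add((title, rhs))
                    rules.append((title, rhs))
    return rules


# Hypothetical title and pre-chunked definition, for illustration only.
example = rules_from_definition(
    "Example Title",
    [["painter", "and", "sculptor"], ["Italian", "city"]],
)
for lhs, rhs in example:
    print(f"{lhs} -> {rhs}")
```

A real pipeline would obtain the noun phrases and their heads from a parser or chunker rather than from token position.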
