SPred: Large-scale Harvesting of Semantic Predicates
Flati, Tiziano and Navigli, Roberto

Article Structure

Abstract

We present SPred, a novel method for the creation of large repositories of semantic predicates.

Introduction

Acquiring semantic knowledge from text automatically is a longstanding issue in Computational Linguistics and Artificial Intelligence.

Preliminaries

We introduce two preliminary definitions which we use in our approach.

Large-Scale Harvesting of Semantic Predicates

The goal of this paper is to provide a fully automatic approach for the creation of a large repository of semantic predicates in three phases.

Experiment 1: Oxford Lexical Predicates

We evaluate on the two forms of output produced by SPred: (i) the top-ranking semantic classes of a lexical predicate, as obtained with Formula 1, and (ii) the classification of a lexical predicate’s argument with the most suitable semantic class, as produced using Formula 3.

Experiment 2: Comparison with Kozareva & Hovy (2010)

Due to the novelty of the task carried out by SPred, the resulting output may be compared with only a limited number of existing approaches.

Related work

The availability of Web-scale corpora has led to the production of large resources of relations (Etzioni et al., 2005; Yates et al., 2007; Wu and Weld, 2010; Carlson et al., 2010; Fader et al., 2011).

Conclusions

In this paper we present SPred, a novel approach to large-scale harvesting of semantic predicates.

Topics

WordNet

Appears in 25 sentences as: WordNet (28)
In SPred: Large-scale Harvesting of Semantic Predicates
  1. Furthermore, the learned classes are not directly linked to existing resources such as WordNet (Fellbaum, 1998) or Wikipedia.
    Page 1, “Introduction”
  2. We propose SPred, a novel approach which harvests predicates from Wikipedia and generalizes them by leveraging core concepts from WordNet.
    Page 2, “Introduction”
  3. As explained below, we assume the set C to be made up of representative synsets from WordNet.
    Page 3, “Large-Scale Harvesting of Semantic Predicates”
  4. We perform this in two substeps: we first link all our disambiguated arguments to WordNet (Section 3.3.1) and then leverage the WordNet taxonomy to populate the semantic classes in C (Section 3.3.2).
    Page 3, “Large-Scale Harvesting of Semantic Predicates”
  5. 3.3.1 Linking to WordNet
    Page 3, “Large-Scale Harvesting of Semantic Predicates”
  6. However, using Wikipedia as our sense inventory is not desirable; in fact, unlike other commonly used lexical-semantic networks such as WordNet, Wikipedia is not formally organized in a structured, taxonomic hierarchy.
    Page 3, “Large-Scale Harvesting of Semantic Predicates”
  7. Therefore, as was done in previous work (Mihalcea and Moldovan; Ciaramita and Altun, 2006; Izquierdo et al., 2009; Erk and McCarthy, 2009; Huang and Riloff, 2010), we pick out our semantic classes C from WordNet and leverage its manually-curated taxonomy to associate our arguments with the most suitable class.
    Page 4, “Large-Scale Harvesting of Semantic Predicates”
  8. This way we avoid building a new taxonomy and shift the problem to that of projecting the Wikipedia pages (associated with annotated filling arguments) to synsets in WordNet.
    Page 4, “Large-Scale Harvesting of Semantic Predicates”
  9. We exploit an existing mapping implemented in BabelNet (Navigli and Ponzetto, 2012), a wide-coverage multilingual semantic network that integrates Wikipedia and WordNet.3 Based on a disambiguation algorithm, BabelNet establishes a mapping μ : Wikipages → Synsets which links about 50,000 pages to their most suitable WordNet senses.4
    Page 4, “Large-Scale Harvesting of Semantic Predicates”
  10. This is mainly due to the encyclopedic nature of most pages, which do not have a counterpart in the WordNet dictionary.
    Page 4, “Large-Scale Harvesting of Semantic Predicates”
  11. 4: We follow (Navigli, 2009) and denote with w_p^i the i-th sense of w in WordNet with part of speech p.
    Page 4, “Large-Scale Harvesting of Semantic Predicates”

See all papers in Proc. ACL 2013 that mention WordNet.


synsets

Appears in 13 sentences as: synset (5) Synsets (1) synsets (7)
In SPred: Large-scale Harvesting of Semantic Predicates
  1. As explained below, we assume the set C to be made up of representative synsets from WordNet.
    Page 3, “Large-Scale Harvesting of Semantic Predicates”
  2. This way we avoid building a new taxonomy and shift the problem to that of projecting the Wikipedia pages (associated with annotated filling arguments) to synsets in WordNet.
    Page 4, “Large-Scale Harvesting of Semantic Predicates”
  3. We exploit an existing mapping implemented in BabelNet (Navigli and Ponzetto, 2012), a wide-coverage multilingual semantic network that integrates Wikipedia and WordNet.3 Based on a disambiguation algorithm, BabelNet establishes a mapping μ : Wikipages → Synsets which links about 50,000 pages to their most suitable WordNet senses.4
    Page 4, “Large-Scale Harvesting of Semantic Predicates”
  4. Definition 3 (semantic class of a synset).
    Page 4, “Large-Scale Harvesting of Semantic Predicates”
  5. The semantic class for a WordNet synset S is the class c among those in C which is the most specific hypernym of S according to the WordNet taxonomy.
    Page 4, “Large-Scale Harvesting of Semantic Predicates”
  6. For instance, given the synset tap water, its semantic class is water (while the other, more general subsumers in C are not considered, e.g., compound, chemical, liquid, etc.).
    Page 4, “Large-Scale Harvesting of Semantic Predicates”
  7. 6: http://lcl.uniroma1.it/wcl  7: This can rarely happen due to multiple hypernyms available in WordNet for the same synset.
    Page 4, “Large-Scale Harvesting of Semantic Predicates”
  8. For each WordNet synset S we create a distributional vector summing the noun occurrences within all the Wikipedia pages p such that μ(p) = S.
    Page 5, “Large-Scale Harvesting of Semantic Predicates”
  9. where desc(c) is the set of all synsets which are descendants of the semantic class c in WordNet.
    Page 5, “Large-Scale Harvesting of Semantic Predicates”
  10. Let Ca be the set of candidate semantic classes for argument a, i.e., Ca contains the semantic classes for the WordNet synsets of a as well as the semantic classes associated with μ(p) for all Wikipedia pages p whose lemma is a.
    Page 5, “Large-Scale Harvesting of Semantic Predicates”
  11. As our set C of semantic classes we selected the standard set of 3,299 core nominal synsets available in WordNet.8 However, our approach is flexible and can be used with classes of an arbitrary level of granularity.
    Page 6, “Experiment 1: Oxford Lexical Predicates”


hypernym

Appears in 4 sentences as: hypernym (5) hypernyms (1)
In SPred: Large-scale Harvesting of Semantic Predicates
  1. we extract the hypernym from the textual definition of p by applying Word-Class Lattices (Navigli and Velardi, 2010, WCL6), a domain-independent hypernym extraction system successfully applied to taxonomy learning from scratch (Velardi et al., 2013) and freely available online (Faralli and Navigli, 2013).
    Page 4, “Large-Scale Harvesting of Semantic Predicates”
  2. If a hypernym h is successfully extracted and h is linked to a Wikipedia page p′ for which μ(p′) is defined, then we extend the mapping by setting μ(p) := μ(p′). For instance, the mapping provided by BabelNet does not provide any link for the page Peter Spence; thanks to WCL, though, we are able to set the page Journalist as its hypernym, and link it to the WordNet synset journalist_n^1.
    Page 4, “Large-Scale Harvesting of Semantic Predicates”
  3. The semantic class for a WordNet synset S is the class c among those in C which is the most specific hypernym of S according to the WordNet taxonomy.
    Page 4, “Large-Scale Harvesting of Semantic Predicates”
  4. 6: http://lcl.uniroma1.it/wcl  7: This can rarely happen due to multiple hypernyms available in WordNet for the same synset.
    Page 4, “Large-Scale Harvesting of Semantic Predicates”
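The fallback in sentence 2 (extend μ by setting μ(p) := μ(p′) when a hypernym page p′ extracted from p's definition is already mapped) can be sketched with plain dictionaries. The toy entries mirror the Peter Spence/Journalist example from the text; the function name and the synset id are illustrative assumptions:

```python
# Existing BabelNet-style page-to-synset mapping mu (toy excerpt).
mu = {"Journalist": "journalist.n.01"}
# Hypernym pages extracted from page definitions by a WCL-like system (toy).
extracted_hypernym = {"Peter Spence": "Journalist"}

def extend_mapping(page):
    """Return mu(page), inheriting the mapping of an extracted hypernym page."""
    if page in mu:
        return mu[page]
    h = extracted_hypernym.get(page)
    if h is not None and h in mu:
        mu[page] = mu[h]   # mu(p) := mu(p')
        return mu[page]
    return None
```

Here `extend_mapping("Peter Spence")` inherits `"journalist.n.01"` from the mapped hypernym page `Journalist`.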


conditional probability

Appears in 3 sentences as: conditional probability (3)
In SPred: Large-scale Harvesting of Semantic Predicates
  1. Since not all classes are equally relevant to the lexical predicate π, we estimate the conditional probability of each class c ∈ C given π on the basis of the number of sentences which contain an argument in that class.
    Page 5, “Large-Scale Harvesting of Semantic Predicates”
  2. To enable wide coverage we estimate a second conditional probability based on the distributional semantic profile of each class.
    Page 5, “Large-Scale Harvesting of Semantic Predicates”
  3. For each lexical predicate we calculated the conditional probability of each semantic class using Formula 1, resulting in a ranking of semantic classes.
    Page 6, “Experiment 1: Oxford Lexical Predicates”
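The count-based estimate in sentences 1 and 3 (P(c | π) as the fraction of the predicate's extracted sentences whose argument falls in class c, Formula 1 in the paper) can be sketched as follows; the function name and the toy class labels are assumptions:

```python
from collections import Counter

def class_distribution(arg_classes):
    """arg_classes: one semantic class per extracted sentence of predicate pi.

    Returns the count-based estimate of P(c | pi) for each observed class c.
    """
    counts = Counter(arg_classes)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}

# Toy data for a predicate like "a glass of *": three water arguments, one beverage.
ranking = class_distribution(
    ["water.n.01", "water.n.01", "beverage.n.01", "water.n.01"]
)
```

Sorting the resulting dictionary by probability yields a ranking of semantic classes of the kind used in Experiment 1.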


noun phrases

Appears in 3 sentences as: noun phrases (3)
In SPred: Large-scale Harvesting of Semantic Predicates
  1. While in principle * could match any sequence of words, since we aim at generalizing nouns, in what follows we allow * to match only noun phrases (e.g., glass, hot cup, very big bottle, etc.).
    Page 2, “Preliminaries”
  2. We search the English Wikipedia for all the token sequences which match n, resulting in a list of noun phrases filling the * argument.
    Page 2, “Large-Scale Harvesting of Semantic Predicates”
  3. As can be seen, a wide range of noun phrases are extracted, from quantities such as glass and cap to other aspects, such as brand and constituent.
    Page 2, “Large-Scale Harvesting of Semantic Predicates”
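The restriction of * to noun phrases can be sketched as a pattern over POS-tagged text. The tag set, the hard-coded tagged sentence, and the NP definition (optional adjectives followed by nouns) are simplifying assumptions here; a real system would use a proper tagger and chunker:

```python
import re

# A noun phrase here: zero or more adjectives (JJ) followed by nouns (NN, NNS, ...).
NP = r"(?:\w+/JJ )*(?:\w+/NN\w* ?)+"

def fill_argument(tagged_sentence):
    """Return the noun phrase filling * in 'a glass of *', or None."""
    m = re.search(r"a/DT glass/NN of/IN (" + NP + ")", tagged_sentence)
    if m is None:
        return None
    # Strip the POS tags from the matched tokens.
    return " ".join(tok.split("/")[0] for tok in m.group(1).split())

fill_argument("she/PRP drank/VBD a/DT glass/NN of/IN cold/JJ water/NN ./.")
# -> "cold water"
```

Sentences where the pattern occurs without a following noun phrase yield no filler, so only nominal arguments are collected for generalization.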


random sample

Appears in 3 sentences as: random sample (2) randomly sampled (1)
In SPred: Large-scale Harvesting of Semantic Predicates
  1. We show in Table 3 the precision@k calculated over a random sample of 50 lexical predicates.11 As can be seen, while the quality of the classes is quite high for low values of k, performance gradually degrades as we let k increase.
    Page 6, “Experiment 1: Oxford Lexical Predicates”
  2. Starting from the lexical predicate items obtained as described in Section 4.2, we selected those items belonging to a random sample of 20 usage notes among those provided by the Oxford dictionary, totaling 3,245 items.
    Page 7, “Experiment 1: Oxford Lexical Predicates”
  3. For tuning α we used a held-out set of 8 verbs, randomly sampled from the lexical predicates not used in the dataset.
    Page 7, “Experiment 1: Oxford Lexical Predicates”
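The precision@k evaluation in sentence 1 can be sketched as below; the ranked classes and gold judgments are toy data, and the function name is an assumption:

```python
def precision_at_k(ranked, gold, k):
    """Fraction of the top-k ranked semantic classes judged correct."""
    top = ranked[:k]
    return sum(1 for c in top if c in gold) / len(top)

# Toy ranking for one lexical predicate, with two classes judged correct.
ranked = ["water.n.01", "beverage.n.01", "container.n.01", "person.n.01"]
gold = {"water.n.01", "beverage.n.01"}

precision_at_k(ranked, gold, 2)   # -> 1.0
precision_at_k(ranked, gold, 4)   # -> 0.5
```

The drop from 1.0 at k = 2 to 0.5 at k = 4 on this toy data illustrates the degradation with increasing k described in the text.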
