Abstract | We identify whether a candidate word pair has a hypernym-hyponym relation by using word-embedding-based semantic projections between words and their hypernyms.
Background | The pioneering work by Hearst (1992) found that linking two noun phrases (NPs) via certain lexical constructions often implies hypernym relations.
Background | For example, NP1 is a hypernym of NP2 in the lexical pattern “such NP1 as NP2”.
Introduction | Here, “canine” is called a hypernym of “dog.” Conversely, “dog” is a hyponym of “canine.” As key sources of knowledge, semantic thesauri and ontologies can support many natural language processing applications. |
Introduction | (2013) propose a distant supervision method to extract hypernyms for entities from multiple sources. |
Introduction | The output of their model is a list of hypernyms for a given entity (left panel, Figure 1).
Introduction | However, unlike the case with smaller manually-curated resources such as WordNet (Fellbaum, 1998), in many large automatically-created resources the taxonomical information is either missing, mixed across resources, e.g., linking Wikipedia categories to WordNet synsets as in YAGO, or coarse-grained, as in DBpedia whose hypernyms link to a small upper taxonomy. |
Phase 1: Inducing the Page Taxonomy | For each p ∈ P our aim is to identify the most suitable generalization p_h ∈ P so that we can create the edge (p, p_h) and add it to E. For instance, given the page APPLE, which represents the fruit meaning of apple, we want to determine that its hypernym is FRUIT and add the hypernym edge connecting the two pages (i.e., E := E ∪ {(APPLE, FRUIT)}).
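A minimal sketch of how such an edge set could be maintained; the Python representation, page names, and helper below are illustrative, not part of the described system:

```python
# Minimal sketch: the page taxonomy as a set of hypernym edges.
# Page names and add_hypernym_edge are illustrative only.

pages = {"APPLE", "FRUIT", "DOG", "CANINE"}   # P: the set of Wikipedia pages
edges = set()                                  # E: hypernym edges (p, p_h)

def add_hypernym_edge(p, p_h):
    """Add the edge (p, p_h), stating that p_h generalizes p."""
    assert p in pages and p_h in pages
    edges.add((p, p_h))

add_hypernym_edge("APPLE", "FRUIT")   # E := E ∪ {(APPLE, FRUIT)}
add_hypernym_edge("DOG", "CANINE")
```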
Phase 1: Inducing the Page Taxonomy | 3.1 Syntactic step: hypernym extraction |
Phase 1: Inducing the Page Taxonomy | In the syntactic step, for each page p ∈ P, we extract zero, one or more hypernym lemmas, that is, we output potentially ambiguous hypernyms for the page.
WiBi: A Wikipedia Bitaxonomy | Creation of the initial page taxonomy: we first create a taxonomy for the Wikipedia pages by parsing textual definitions, extracting the hypernym(s) and disambiguating them according to the page inventory.
WiBi: A Wikipedia Bitaxonomy | At each iteration, the links in the page taxonomy are used to identify category hypernyms and, conversely, the new category hypernyms are used to identify more page hypernyms.
GlossBoot | (a) Hypernym extraction: for each newly-acquired term/gloss pair (t, g) ∈ G_k, we automatically extract a candidate hypernym h from the textual gloss g.
GlossBoot | To do this we use a simple unsupervised heuristic which just selects the first term in the gloss. We show an example of hypernym extraction for some terms in Table 2 (we report the term in column 1, the gloss in column 2 and the hypernyms extracted by the first-term hypernym extraction heuristic in column 3).
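A rough sketch of a first-term heuristic of this kind; the tokenization and the stoplist below are assumptions, not details given by the paper:

```python
import re

STOPWORDS = {"a", "an", "the", "any", "one"}  # illustrative stoplist, not from the paper

def first_term_hypernym(gloss):
    """Sketch of the first-term heuristic: return the first
    non-stopword token of the gloss as the candidate hypernym."""
    for token in re.findall(r"[A-Za-z-]+", gloss.lower()):
        if token not in STOPWORDS:
            return token
    return None

print(first_term_hypernym("A device that computes."))  # -> 'device'
```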
GlossBoot | (b) (Term, Hypernym)-ranking: we sort all the glosses in G_k by the number of seed terms found in each gloss.
Introduction | Given a domain and a language of interest, we bootstrap the glossary learning process with just a few hypernymy relations (such as computer isa device), with the only condition that the (term, hypernym) pairs must be specific enough to implicitly identify the domain in the target language.
Related Work | To avoid the use of a large domain corpus, terminologies can be obtained from the Web by using Doubly-Anchored Patterns (DAPs) which, given a (term, hypernym) pair, harvest sentences matching manually-defined patterns like “<hypernym> such as <term>, and *” (Kozareva et al., 2008).
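A hedged sketch of how a doubly-anchored pattern of this shape could be matched against harvested sentences; the regex and the example sentence are illustrative only:

```python
import re

def dap_matches(term, hypernym, sentence):
    """Sketch of matching a Doubly-Anchored Pattern such as
    '<hypernym> such as <term>, and *' and returning the wildcard filler.
    The pattern wording follows Kozareva et al. (2008); the regex is a guess."""
    pattern = re.compile(
        re.escape(hypernym) + r"\s+such as\s+" + re.escape(term) + r",?\s+and\s+(\w+)",
        re.IGNORECASE,
    )
    return pattern.findall(sentence)

print(dap_matches("lions", "animals", "We saw animals such as lions, and tigers."))
# -> ['tigers']
```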
Related Work | Similarly to our approach, they drop the requirement of a domain corpus and start from a small number of (term, hypernym) seeds.
Related Work | In contrast, GlossBoot performs the novel task of multilingual glossary learning from the Web by bootstrapping the extraction process with a few (term, hypernym) seeds.
Results and Discussion | Now, an obvious question arises: what if we bootstrapped GlossBoot with fewer hypernym seeds, e.g., just one seed? |
Results and Discussion | To answer this question we replicated our English experiments on each single (term, hypernym) pair in our seed set.
Abstract | Our method is applied to the task of definition and hypernym extraction and compares favorably to other pattern generalization methods proposed in the literature. |
Introduction | A key feature of our approach is its inherent ability to both identify definitions and extract hypernyms.
Introduction | WCLs are shown to generalize over lexico-syntactic patterns, and outperform well-known approaches to definition and hypernym extraction. |
Related Work | Hypernym Extraction. |
Related Work | The literature on hypernym extraction offers a higher variability of methods, from simple lexical patterns (Hearst, 1992; Oakes, 2005) to statistical and machine learning techniques (Agirre et al., 2000; Caraballo, 1999; Dolan et al., 1993; Sanfilippo and Poznanski, 1992; Ritter et al., 2009).
Related Work | Finally, they train a hypernym classifier based on these features.
Word-Class Lattices | • The DEFINIENS field (GF): it includes the genus phrase (usually including the hypernym, e.g., “a first-class function”);
Word-Class Lattices | For each sentence, the definiendum (that is, the word being defined) and its hypernym are marked in bold and italic, respectively. |
Word-Class Lattices | Furthermore, in the final lattice, nodes associated with the hypernym words in the learning sentences are marked as hypernyms in order to be able to determine the hypernym of a test sentence at classification time. |
Experiments | The gold standards used in the evaluation are hypernym taxonomies extracted from WordNet and ODP (Open Directory Project), and meronym taxonomies extracted from WordNet.
Experiments | In total, there are 100 hypernym taxonomies, 50 each extracted from WordNet and ODP, and 50 meronym taxonomies from WordNet.
Experiments | WordNet hypernym taxonomies are from 12 topics: gathering, professional, people, building, place, milk, meal, water, beverage, alcohol, dish, and herb.
Related Work | (2004) extended isa relation acquisition towards terascale, and automatically identified hypernym patterns by minimal edit distance. |
The Features | We have (11) Hypernym Patterns based on patterns proposed by Hearst (1992) and Snow et al. (2005), (12) Sibling Patterns, which are basically conjunctions, and (13) Part-of Patterns based on patterns proposed by Girju et al. (2003) and Cimiano and Wenderoth (2007).
Gazetteer Induction 2.1 Induction by MN Clustering | The last word in the noun phrase is then extracted and becomes the hypernym of the entity described by the article.
Gazetteer Induction 2.1 Induction by MN Clustering | For example, from the following defining sentence, it extracts “guitarist” as the hypernym for “Jimi Hendrix”.
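A rough approximation of this extraction step; the described approach works on the parsed first (defining) sentence of an article, whereas the regex heuristic below is only an illustrative stand-in:

```python
import re

def hypernym_from_definition(sentence):
    """Sketch: take the noun phrase following the copula in a defining
    sentence and return its last word as the hypernym. A real system
    would use a parser; this regex heuristic is illustrative only."""
    m = re.search(
        r"\b(?:is|was)\s+(?:an?|the)\s+([A-Za-z -]+?)(?:[,.]|\s+(?:who|which|that)\b)",
        sentence,
    )
    if not m:
        return None
    return m.group(1).split()[-1]

print(hypernym_from_definition(
    "Jimi Hendrix was an American guitarist, singer and songwriter."))  # -> 'guitarist'
```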
Gazetteer Induction 2.1 Induction by MN Clustering | Extraction statistics (# instances): page titles processed: 550,832; articles found: 547,779 (found by redirection: 189,222); first sentences found: 545,577; hypernyms extracted: 482,599.
Using Gazetteers as Features of NER | They handled “redirections” as well by following redirection links and extracting a hypernym from the article reached.
Experiments | The spin model approach uses word glosses, WordNet synonym, hypernym, and antonym relations, in addition to co-occurrence statistics extracted from a corpus.
Experiments | The proposed method achieves better performance by only using WordNet synonym, hypernym, and similar-to relations.
Experiments | We build a network using only WordNet synonyms and hypernyms.
Word Polarity | For example, we can use other WordNet relations: hypernyms, similar-to, etc.
Assessment of Lexical Resources | This includes the WordNet lexicographer’s file name (e.g., noun.time), synsets, and hypernyms . |
Assessment of Lexical Resources | We make extensive use of the file name, but make less use of the synsets and hypernyms.
Assessment of Lexical Resources | However, in general, we find that the file names are too coarse-grained and the synsets and hypernyms too fine-grained for generalizations on the selectors for the complements and the governors. |
See http://clg.wlv.ac.uk/proiects/DVC | The feature extraction rules are (1) word class (wc), (2) part of speech (pos), (3) lemma (l), (4) word (w), (5) WordNet lexical name (ln), (6) WordNet synonyms (s), (7) WordNet hypernyms (h), (8) whether the word is capitalized (c), and (9) affixes (af).
See http://clg.wlv.ac.uk/proiects/DVC | For features such as the WordNet lexical name, synonyms and hypernyms, the number of values may be much larger.
Comparison to Prior Work | We addressed this problem by simply rerunning ASIA with a more specific class name (i.e., the first one returned); however, the result suggests that future work is needed to support automatic construction of hypernym hierarchy using semistructured web |
Comparison to Prior Work | For the experimental comparison, we focused on leaf semantic classes from the extended WordNet that have many hypernyms, so that a meaningful comparison could be made: specifically, we selected nouns that have at least three hypernyms, such that the hypernyms are the leaf nodes in the hypernym hierarchy of WordNet. |
Comparison to Prior Work | Preliminary experiments showed that (as in the experiments with Pasca’s classes above) ASIA did not always converge to the intended meaning; to avoid this problem, we instituted a second filter, and discarded ASIA’s results if the intersection of hypernyms from ASIA and WordNet constituted less than 50% of those in WordNet. |
Introduction | The opposite of hyponym is hypernym.
Experimental Evaluation | Another local resource was WordNet, where we inserted an edge (u, v) when v was a direct hypernym or synonym of u.
Learning Entailment Graph Edges | For each t ∈ T with two variables and a single predicate word w, we extract from WordNet the set H of direct hypernyms and synonyms of w.
Learning Entailment Graph Edges | Negative examples are generated analogously, by looking at direct co-hyponyms of w instead of hypernyms and synonyms.
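A sketch of generating positive and negative candidates in this way using NLTK's WordNet interface, which is an implementation detail not prescribed by the paper:

```python
from nltk.corpus import wordnet as wn

def positives_and_negatives(word, pos=wn.VERB):
    """Sketch: positive candidates = direct hypernyms and synonyms of `word`,
    negative candidates = direct co-hyponyms (sisters), as described above."""
    positives, negatives = set(), set()
    for synset in wn.synsets(word, pos=pos):
        positives.update(l.name() for l in synset.lemmas())          # synonyms
        for hyper in synset.hypernyms():                             # direct hypernyms
            positives.update(l.name() for l in hyper.lemmas())
            for sister in hyper.hyponyms():                          # direct co-hyponyms
                if sister != synset:
                    negatives.update(l.name() for l in sister.lemmas())
    return positives - {word}, negatives - {word}

pos_cands, neg_cands = positives_and_negatives("buy")
```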
Learning Entailment Graph Edges | Combined with the constraint of transitivity this implies that there must be no path from u to v. This is done in the following two scenarios: (1) When two nodes u and v are identical except for a pair of words w_u and w_v, and w_u is an antonym of w_v, or a hypernym of w_v at distance ≥ 2.
Conclusions | We also intend to link missing concepts in WordNet, by establishing their most likely hypernyms — e.g., a la Snow et al. |
Methodology | • Hypernymy/Hyponymy: all synonyms in the synsets H such that H is either a hypernym (i.e., a generalization) or a hyponym (i.e., a specialization) of S. For example, given {balloon}, we include the words from its hypernym {lighter-than-air craft} and all its hyponyms (e.g.
Methodology | • Sisterhood: words from the sisters of S. A sister synset S′ is such that S and S′ have a common direct hypernym.
Methodology | To do so, we include words from their synsets, hypernyms, hyponyms, sisters, and glosses.
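A sketch of this expansion via NLTK's WordNet API, which the paper does not prescribe; the gloss handling below is deliberately simplified:

```python
from nltk.corpus import wordnet as wn

def expansion_words(synset):
    """Sketch: collect words from the synset itself, its hypernyms,
    hyponyms, sisters (co-hyponyms), and its gloss."""
    words = {l.name() for l in synset.lemmas()}
    hypers = synset.hypernyms()
    related = hypers + synset.hyponyms()
    for h in hypers:                       # sisters share a direct hypernym
        related += [s for s in h.hyponyms() if s != synset]
    for s in related:
        words.update(l.name() for l in s.lemmas())
    words.update(synset.definition().split())   # gloss words (untokenized sketch)
    return words

print(sorted(expansion_words(wn.synset("balloon.n.01")))[:10])
```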
Large-Scale Harvesting of Semantic Predicates | we extract the hypernym from the textual definition of p by applying Word-Class Lattices (Navigli and Velardi, 2010, WCL), a domain-independent hypernym extraction system successfully applied to taxonomy learning from scratch (Velardi et al., 2013) and freely available online (Faralli and Navigli, 2013).
Large-Scale Harvesting of Semantic Predicates | If a hypernym h is successfully extracted and h is linked to a Wikipedia page p′ for which μ(p′) is defined, then we extend the mapping by setting μ(p) := μ(p′). For instance, the mapping provided by BabelNet does not provide any link for the page Peter Spence; thanks to WCL, though, we are able to set the page Journalist as its hypernym, and link it to the WordNet synset journalist.
Large-Scale Harvesting of Semantic Predicates | The semantic class for a WordNet synset S is the class c among those in C which is the most specific hypernym of S according to the WordNet taxonomy.
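One possible reading of this definition, sketched with NLTK's WordNet API; the class inventory used in the example is illustrative, not the paper's set C:

```python
from nltk.corpus import wordnet as wn

def semantic_class(synset, classes):
    """Sketch: pick, among the candidate classes C, the most specific one
    that is a hypernym of (or equal to) the given synset in WordNet."""
    ancestors = set(synset.closure(lambda s: s.hypernyms())) | {synset}
    best, best_depth = None, -1
    for c in classes:
        if c in ancestors and c.min_depth() > best_depth:
            best, best_depth = c, c.min_depth()
    return best

classes = [wn.synset("person.n.01"), wn.synset("artifact.n.01"), wn.synset("entity.n.01")]
print(semantic_class(wn.synset("journalist.n.01"), classes))  # -> Synset('person.n.01')
```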
Abstract | We present a structured learning approach to inducing hypernym taxonomies using a probabilistic graphical model formulation. |
Experiments | Comparison setup: We also compare our method (as closely as possible) with related previous work by testing on the much larger animal subtree made available by Kozareva and Hovy (2010), who created this dataset by selecting a set of ‘harvested’ terms and retrieving all the WordNet hypernyms between each input term and the root (i.e., animal), resulting in ~700 terms and ~4,300 isa ancestor-child links. Our training set for this animal test case was generated from WordNet using the following process: First, we strictly remove the full animal subtree from WordNet in order to avoid any possible overlap with the test data.
Experiments | (2011) and see a small gain in F1, but regardless, we should note that their results are incomparable (denoted by * in Table 2) because they have a different ground-truth data condition: their definition and hypernym extraction phase involves using the Google define keyword, which often returns WordNet glosses itself.
Introduction | Our model takes a log-linear form and is represented using a factor graph that includes both 1st-order scoring factors on directed hypernymy edges (a parent and child in the taxonomy) and 2nd-order scoring factors on sibling edge pairs (pairs of hypernym edges with a shared parent), as well as incorporating a global (directed spanning tree) structural constraint.
Background | (2005) experimented with first-sense and hypernym features from HowNet and CiLin (both WordNets for Chinese) in a generative parse model applied to the Chinese Penn Treebank. |
Background | The combination of word sense and first-level hypernyms produced a significant improvement over their basic model. |
Integrating Semantics into Parsing | In WordNet 2.1, knife and scissors are sister synsets, both of which have TOOL as their 4th hypernym.
Integrating Semantics into Parsing | Only by mapping them onto their 1st hypernym or higher would we be able to capture the semantic generalisation alluded to above.
Acquisition of Hyponymy Relations from Wikipedia | A hyponymy-relation candidate is then extracted from the tree structure by regarding a node as a hypernym candidate and all its subordinate nodes as hyponym candidates of the hypernym candidate (e.g., (TIGER, TAXONOMY) and (TIGER, SIBERIAN TIGER) from Figure 4).
Acquisition of Hyponymy Relations from Wikipedia | For example, “List of artists” is converted into “artists” by the lexical pattern “list of X”. Hyponymy-relation candidates whose hypernym candidate matches such a lexical pattern are likely to be valid (e.g., (List of artists, Leonardo da Vinci)).
Motivation | In their approach, a common substring in a hypernym and a hyponym is assumed to be one strong clue for recognizing that the two words constitute a hyponymy relation. |
Automated Classification | • {Synonyms, Hypernyms} for all NN and VB entries for each word
Automated Classification | • Intersection of the words’ hypernyms
Automated Classification | In fact, by themselves they proved roughly as useful as the hypernym features, and their removal had the single strongest negative impact on accuracy for our dataset. |
System Description | We use the direct hypernym relation of WordNet to retrieve the superordinates of the category word (e.g. |
System Description | Although we can obtain only the direct hypernyms in WordNet, no such mechanism exists in ConceptNet. |
System Description | In addition to the direct hypernyms of the category word, we increase the size of the ingredient list by adding synonyms of the category word, the new words coming from the relations and the properties determined by the user. |
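A sketch of assembling such an ingredient list from WordNet with NLTK (the ConceptNet lookups and the user-provided properties mentioned above are omitted; the helper name is illustrative):

```python
from nltk.corpus import wordnet as wn

def ingredient_list(category_word):
    """Sketch: direct hypernyms of the category word plus its synonyms."""
    ingredients = set()
    for synset in wn.synsets(category_word, pos=wn.NOUN):
        ingredients.update(l.name() for l in synset.lemmas())    # synonyms
        for hyper in synset.hypernyms():                         # direct hypernyms
            ingredients.update(l.name() for l in hyper.lemmas())
    return ingredients - {category_word}

print(sorted(ingredient_list("knife"))[:10])
```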
Introduction | (2005) for exploiting WordNet to extract hypernym (isa) relations between entities, and is similar to the use of weakly labeled data in bioinformatics (Craven and Kumlien, 1999; Morgan et al.,
Previous work | Approaches based on WordNet have often only looked at the hypernym (isa) or meronym (part-of) relation (Girju et al., 2003; Snow et al., 2005), while those based on the ACE program (Doddington et al., 2004) have been restricted in their evaluation to a small number of relation instances and corpora of less than a million words. |
Previous work | Hearst (1992) used a small number of regular expressions over words and part-of-speech tags to find examples of the hypernym relation. |
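A hedged sketch of Hearst-style patterns; the original work matched over words and part-of-speech tags, while the simplified regexes below operate on raw words only:

```python
import re

# Simplified stand-ins for Hearst-style lexico-syntactic patterns.
HEARST_PATTERNS = [
    (r"(\w+) such as (\w+)", "hypernym-first"),     # "NP such as NP"
    (r"(\w+) and other (\w+)", "hyponym-first"),    # "NP and other NP"
    (r"(\w+) including (\w+)", "hypernym-first"),   # "NP including NP"
]

def hearst_pairs(sentence):
    """Return (hyponym, hypernym) pairs found by the simplified patterns."""
    pairs = []
    for pattern, order in HEARST_PATTERNS:
        for a, b in re.findall(pattern, sentence, re.IGNORECASE):
            pairs.append((b, a) if order == "hypernym-first" else (a, b))
    return pairs

print(hearst_pairs("He plays instruments such as guitars."))  # -> [('guitars', 'instruments')]
```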
FrameNet — Wiktionary Alignment | For the joint model, we employed the best single PPR configuration, and a COS configuration that uses the sense gloss extended by Wiktionary hypernyms, synonyms and the FrameNet frame name and frame definition, to achieve the highest score, an F1-score of 0.739.
Intermediate Resource FNWKxx | We also extract other related lemma-POS, for instance 487 antonyms, 126 hyponyms, and 19 hypernyms.
Resource FNWKde | Relation counts (per FrameNet sense / per frame): SYNONYM 17,713 / 13,288; HYPONYM 4,818 / 3,347; HYPERNYM 6,369 / 3,961; ANTONYM 9,626 / 6,737.
Experiments | • WordNet link types (link type list) (e.g., attribute, hypernym, entailment)
Experiments | • WordNet hypernyms
Experiments | Curiously, although hypernyms are commonly used as features in NLP classification tasks, gloss terms, which are rarely used for these tasks, are approximately as useful, at least in this particular context. |
Comparison on applications | The maximum for WordNet is 0.8506, where the mean is 3, or the first hypernym synset. |
Comparison on applications | This suggests that the POS and Head are most important for representing text in Roget’s Thesaurus, while the first hypernym is most important for representing text using WordNet. |
Introduction | These hypernym relations were also put towards solving analogy questions. |