A Robust Approach to Aligning Heterogeneous Lexical Resources
Pilehvar, Mohammad Taher and Navigli, Roberto

Article Structure

Abstract

Lexical resource alignment has been an active field of research over the last decade.

Introduction

Lexical resources are repositories of machine-readable knowledge that can be used in virtually any Natural Language Processing task.

Resource Alignment

Preliminaries.

Lexical Resource Ontologization

In Section 2, we presented our approach for aligning lexical resources.

Experiments

Lexical resources.

Related Work

Resource ontologization.

Conclusions

This paper presents a unified approach for aligning lexical resources.

Topics

WordNet

Appears in 14 sentences as: WordNet (15)
In A Robust Approach to Aligning Heterogeneous Lexical Resources
  1. Our approach leverages a similarity measure that enables the structural comparison of senses across lexical resources, achieving state-of-the-art performance on the task of aligning WordNet to three different collaborative resources: Wikipedia, Wiktionary and OmegaWiki.
    Page 1, “Abstract”
  2. Notable examples are WordNet, Wikipedia and, more recently, collaboratively-curated resources such as OmegaWiki and Wiktionary (Hovy et al., 2013).
    Page 1, “Introduction”
  3. When a lexical resource can be viewed as a semantic graph, as with WordNet or Wikipedia, this limit can be overcome by means of alignment algorithms that exploit the network structure to determine the similarity of concept pairs.
    Page 1, “Introduction”
  4. We report state-of-the-art performance when aligning WordNet to Wikipedia, OmegaWiki and Wiktionary.
    Page 2, “Introduction”
  5. For instance, WordNet can be readily represented as an undirected graph G whose nodes are synsets and edges are modeled after the relations between synsets defined in WordNet (e.g., hypernymy, meronymy, etc.).
    Page 2, “Resource Alignment”
  6. Any semantic network with a dense relational structure, providing good coverage of the words appearing in the definitions, is a suitable candidate for H. For this purpose we used the WordNet (Fellbaum, 1998) graph, which was further enriched by connecting each synset to the other synsets appearing in its disambiguated gloss.
    Page 3, “Resource Alignment”
  7. As an example, assume we are given two semantic signatures computed for two concepts in WordNet and Wiktionary.
    Page 4, “Resource Alignment”
  8. For instance, we calculated that more than 80% of the words in WordNet are monosemous, with over 60% of all the synsets containing at least one of them.
    Page 4, “Resource Alignment”
  9. To enable a comparison with the state of the art, we followed Matuschek and Gurevych (2013) and performed an alignment of WordNet synsets (WN) to three different collaboratively-constructed resources: Wikipedia, Wiktionary (WT) and OmegaWiki (OW).
    Page 5, “Experiments”
  10. As mentioned in Section 2.1.1, we build the WN graph by including all the synsets and semantic relations defined in WordNet (e.g., hypernymy and meronymy) and further populate the relation set by connecting a synset to all the other synsets that appear in its disambiguated gloss.
    Page 5, “Experiments”
  11. A good example is WordNet, which has been exploited as a semantic network in dozens of NLP tasks (Fellbaum, 1998).
    Page 8, “Related Work”
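
The excerpts above (items 5 and 10 in particular) describe building an undirected WordNet graph whose nodes are synsets, with edges drawn from WordNet's relations and further enriched with links to the synsets appearing in each disambiguated gloss. Below is a minimal sketch of such a construction, assuming NLTK's WordNet interface and networkx as tooling; since NLTK does not expose the disambiguated glosses the paper relies on, the gloss enrichment is approximated here by linking only to the senses of monosemous gloss words.

import networkx as nx
from nltk.corpus import wordnet as wn

G = nx.Graph()
for s in wn.all_synsets():
    G.add_node(s.name())
    # edges from WordNet's own relations (e.g., hypernymy, meronymy)
    for related in s.hypernyms() + s.part_meronyms() + s.member_meronyms():
        G.add_edge(s.name(), related.name())
    # approximate gloss enrichment: monosemous gloss words give unambiguous links
    for token in s.definition().lower().split():
        senses = wn.synsets(token.strip(".,;()"))
        if len(senses) == 1:
            G.add_edge(s.name(), senses[0].name())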

similarity measure

Appears in 10 sentences as: Similarity Measure (1) similarity measure (7) similarity measurement (2)
In A Robust Approach to Aligning Heterogeneous Lexical Resources
  1. Our approach leverages a similarity measure that enables the structural comparison of senses across lexical resources, achieving state-of-the-art performance on the task of aligning WordNet to three different collaborative resources: Wikipedia, Wiktionary and OmegaWiki.
    Page 1, “Abstract”
  2. Figure 1 illustrates the procedure underlying our cross-resource concept similarity measurement technique.
    Page 2, “Resource Alignment”
  3. The structural similarity component, instead, is a novel graph-based similarity measurement technique which calculates the similarity between a pair of concepts across the semantic networks of the two resources by leveraging the semantic [...]
    Page 2, “Resource Alignment”
  4. To do this, we apply our definitional similarity measure introduced in Section 2.1.
    Page 5, “Lexical Resource Ontologization”
  5. 4.3 Similarity Measure Analysis
    Page 8, “Experiments”
  6. We explained in Section 2.1 that our concept similarity measure consists of two components: the definitional and the structural similarities.
    Page 8, “Experiments”
  7. [...] structural similarity measure in comparison to the Dijkstra-WSA method, we carried out an experiment where our alignment system used only the structural similarity component, a variant of our system we refer to as SemAlignStr.
    Page 8, “Experiments”
  8. Therefore, the considerable performance improvement over Dijkstra-WSA on this resource pair shows the effectiveness of our novel concept similarity measure independently of the underlying semantic network.
    Page 8, “Experiments”
  9. Our method leverages a novel similarity measure which enables a direct structural comparison of concepts across different lexical resources.
    Page 9, “Conclusions”
  10. In future work, we plan to extend our concept similarity measure across different natural languages.
    Page 9, “Conclusions”
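
Taken together, the excerpts above describe a measure with two components, a definitional score and a structural score, that are combined into an overall concept similarity used to decide alignments. The sketch below shows one way such a pipeline could be wired up; the helper functions, the equal weighting, and keeping only the best-scoring target per source concept are assumptions for illustration, not the authors' implementation.

def align(concepts_a, concepts_b, definitional_similarity, structural_similarity,
          theta=0.5, w=0.5):
    """Return (concept_a, best concept_b, score) triples whose combined score exceeds theta."""
    alignment = []
    for c1 in concepts_a:
        best, best_score = None, theta
        for c2 in concepts_b:
            # combine the two component scores (equal weights assumed)
            score = w * definitional_similarity(c1, c2) + (1 - w) * structural_similarity(c1, c2)
            if score > best_score:
                best, best_score = c2, score
        if best is not None:
            alignment.append((c1, best, best_score))
    return alignment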

semantic relations

Appears in 7 sentences as: semantic relations (7)
In A Robust Approach to Aligning Heterogeneous Lexical Resources
  1. However, not all lexical resources provide explicit semantic relations between concepts and, hence, machine-readable dictionaries like Wiktionary have first to be transformed into semantic graphs before such graph-based approaches can be applied to them.
    Page 1, “Introduction”
  2. Therefore, we assume that a lexical resource L can be represented as an undirected graph G = (V, E) where V is the set of nodes, i.e., the concepts defined in the resource, and E is the set of undirected edges, i.e., semantic relations between concepts.
    Page 2, “Resource Alignment”
  3. However, other resources such as Wiktionary do not provide semantic relations between concepts and, therefore, have first to be transformed into semantic networks before they can be aligned using our alignment algorithm.
    Page 2, “Resource Alignment”
  4. Our ontologization algorithm takes as input a lexicon L and outputs a semantic graph G = (V, E) where, as already defined in Section 2, V is the set of concepts in L and E is the set of semantic relations between these concepts.
    Page 4, “Lexical Resource Ontologization”
  5. As mentioned in Section 2.1.1, we build the WN graph by including all the synsets and semantic relations defined in WordNet (e.g., hypernymy and meronymy) and further populate the relation set by connecting a synset to all the other synsets that appear in its disambiguated gloss.
    Page 5, “Experiments”
  6. The other two resources, i.e., WT and OW, do not provide a reliable network of semantic relations, therefore we used our ontologization approach to construct their corresponding semantic graphs.
    Page 5, “Experiments”
  7. [...] usually the case with machine-readable dictionaries, where structuring the resource involves the arduous task of connecting lexicographic senses by means of semantic relations.
    Page 9, “Related Work”
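
The ontologization step referred to in these excerpts turns a resource that lacks explicit semantic relations into a graph G = (V, E) by linking each concept to the concepts evoked by the content words of its definition. A hedged sketch follows, assuming a lexicon given as a mapping from concepts to definition content words and a senses_of lookup (both placeholders); ambiguous words would go through the similarity-based disambiguation described further below, stubbed here as an optional callback.

import networkx as nx

def ontologize(lexicon, senses_of, disambiguate=None):
    """lexicon: {concept: iterable of definition content words};
    senses_of: word -> candidate concepts of that word in the same lexicon."""
    G = nx.Graph()
    G.add_nodes_from(lexicon)
    for concept, words in lexicon.items():
        for w in words:
            candidates = [c for c in senses_of(w) if c != concept]
            if len(candidates) == 1:            # monosemous word: unambiguous edge
                G.add_edge(concept, candidates[0])
            elif candidates and disambiguate:   # ambiguous: pick the most similar sense
                G.add_edge(concept, disambiguate(concept, candidates))
    return G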

content words

Appears in 6 sentences as: content words (7)
In A Robust Approach to Aligning Heterogeneous Lexical Resources
  1. In this component the personalization vector v_i is set by uniformly distributing the probability mass over the nodes corresponding to the senses of all the content words in the extended definition of d_i according to the sense inventory of a semantic network H. We use the same semantic graph H for computing the semantic signatures of both definitions.
    Page 3, “Resource Alignment”
  2. We first create the empty undirected graph G_L = (V, E) such that V is the set of concepts in L and E = ∅. For each source concept c ∈ V we create a bag of content words W = {w_1, . . .
    Page 5, “Lexical Resource Ontologization”
  3. . . . , w_n} which includes all the content words in its definition d and, if available, additional related words obtained from lexicon relations (e.g., synonyms in Wiktionary).
    Page 5, “Lexical Resource Ontologization”
  4. The definition contains two content words: fruit_n and conifer_n.
    Page 5, “Lexical Resource Ontologization”
  5. For ontologizing WT and OW, the bag of content words W is given by the content words in sense definitions and, if available, additional related words obtained from lexicon relations (see Section 3).
    Page 5, “Experiments”
  6. [...]tively small in number, are already disambiguated and, therefore, the ontologization was just performed on the definition’s content words.
    Page 6, “Experiments”
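
Item 1 above describes how a concept's semantic signature is obtained: the Personalized PageRank personalization vector spreads its mass uniformly over the senses of the content words in the (extended) definition, and the resulting distribution over the semantic network H serves as the signature. The sketch below uses networkx's pagerank as an assumed stand-in for the paper's PPR implementation; senses_of is a placeholder lookup from a word to its sense nodes in H.

import networkx as nx

def semantic_signature(H, content_words, senses_of, alpha=0.85):
    """PPR-style signature: stationary distribution of a random walk on H,
    restarted uniformly at the sense nodes of the definition's content words."""
    seeds = {s for w in content_words for s in senses_of(w) if s in H}
    if not seeds:
        return {}
    # uniform mass over the seeds; networkx (>= 2.0) normalizes the vector
    # and accepts a dict covering only a subset of the nodes
    personalization = {s: 1.0 for s in seeds}
    return nx.pagerank(H, alpha=alpha, personalization=personalization)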

synsets

Appears in 5 sentences as: synset (2) synsets (6)
In A Robust Approach to Aligning Heterogeneous Lexical Resources
  1. For instance, WordNet can be readily represented as an undirected graph G whose nodes are synsets and edges are modeled after the relations between synsets defined in WordNet (e.g., hypernymy, meronymy, etc.).
    Page 2, “Resource Alignment”
  2. [...] and L_G is the mapping between each synset node and the set of synonyms which express the concept.
    Page 2, “Resource Alignment”
  3. For instance, we calculated that more than 80% of the words in WordNet are monosemous, with over 60% of all the synsets containing at least one of them.
    Page 4, “Resource Alignment”
  4. To enable a comparison with the state of the art, we followed Matuschek and Gurevych (2013) and performed an alignment of WordNet synsets (WN) to three different collaboratively-constructed resources: Wikipedia, Wiktionary (WT) and OmegaWiki (OW).
    Page 5, “Experiments”
  5. As mentioned in Section 2.1.1, we build the WN graph by including all the synsets and semantic relations defined in WordNet (e.g., hypernymy and meronymy) and further populate the relation set by connecting a synset to all the other synsets that appear in its disambiguated gloss.
    Page 5, “Experiments”
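
Item 3 above reports that more than 80% of WordNet words are monosemous and that over 60% of synsets contain at least one such word, which is what makes monosemous words reliable anchors. A small sketch for reproducing this kind of statistic with NLTK's WordNet follows (an assumed stand-in for the WordNet version used in the paper; the paper may also count monosemy per part of speech).

from nltk.corpus import wordnet as wn

def is_monosemous(lemma):
    return len(wn.synsets(lemma)) == 1   # exactly one sense across all parts of speech

total = anchored = 0
for synset in wn.all_synsets():
    total += 1
    if any(is_monosemous(l) for l in synset.lemma_names()):
        anchored += 1
print(f"{anchored / total:.1%} of synsets contain at least one monosemous word")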

graph-based

Appears in 4 sentences as: graph-based (4)
In A Robust Approach to Aligning Heterogeneous Lexical Resources
  1. However, not all lexical resources provide explicit semantic relations between concepts and, hence, machine-readable dictionaries like Wiktionary have first to be transformed into semantic graphs before such graph-based approaches can be applied to them.
    Page 1, “Introduction”
  2. The structural similarity component, instead, is a novel graph-based similarity measurement technique which calculates the similarity between a pair of concepts across the semantic networks of the two resources by leveraging the semantic [...]
    Page 2, “Resource Alignment”
  3. DWSA stands for Dijkstra-WSA, the state-of-the-art graph-based alignment approach of Matuschek and Gurevych (2013).
    Page 7, “Experiments”
  4. Last year Matuschek and Gurevych (2013) proposed Dijkstra-WSA, a graph-based approach relying on shortest paths between two concepts when the two corresponding resources' graphs were combined by leveraging monosemous linking.
    Page 9, “Related Work”
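
For context, the Dijkstra-WSA baseline cited above scores a candidate concept pair by the shortest path between the two concepts once the two resource graphs have been joined through trusted monosemous links. This is an illustrative reconstruction of that idea, not Matuschek and Gurevych's implementation; the node-prefix naming and the inverse-path-length scoring are assumptions.

import networkx as nx

def combined_graph(G1, G2, monosemous_links):
    """monosemous_links: iterable of (node_in_G1, node_in_G2) trusted cross-resource edges."""
    G = nx.union(G1, G2, rename=("a:", "b:"))   # prefixes keep the node sets disjoint
    for u, v in monosemous_links:
        G.add_edge(f"a:{u}", f"b:{v}")
    return G

def path_score(G, c1, c2):
    """Higher score for shorter cross-resource paths; 0 if the concepts are disconnected."""
    try:
        return 1.0 / (1 + nx.shortest_path_length(G, f"a:{c1}", f"b:{c2}"))
    except (nx.NetworkXNoPath, nx.NodeNotFound):
        return 0.0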

semantic similarity

Appears in 3 sentences as: semantic similarity (3)
In A Robust Approach to Aligning Heterogeneous Lexical Resources
  1. Deeper approaches leverage semantic similarity to go beyond the surface realization of definitions (Navigli, 2006; Meyer and Gurevych, 2011; Niemann and Gurevych, 2011).
    Page 1, “Introduction”
  2. These two scores are then combined into an overall score (part (e) of Figure 1) which quantifies the semantic similarity of the two input concepts c_1 and c_2.
    Page 2, “Resource Alignment”
  3. PPR has been previously used in a wide variety of tasks such as definition similarity-based resource alignment (Niemann and Gurevych, 2011), textual semantic similarity (Hughes and Ramage, 2007; Pilehvar et al., 2013), Word Sense Disambiguation (Agirre and Soroa, 2009; Faralli and Navigli, 2012) and semantic text categorization (Navigli et al., 2011).
    Page 3, “Resource Alignment”
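
The PPR machinery cited in item 3 yields, for each definition, a probability distribution (a semantic signature) over the nodes of the shared network H; the definitional similarity of two concepts then comes from comparing their signatures. The comparison below uses plain cosine similarity as an illustrative stand-in; the paper's actual comparison function is not given in these excerpts and may well differ.

from math import sqrt

def signature_similarity(sig1, sig2):
    """Cosine similarity between two sparse probability vectors over H's nodes."""
    dot = sum(p * sig2.get(node, 0.0) for node, p in sig1.items())
    norm1 = sqrt(sum(p * p for p in sig1.values()))
    norm2 = sqrt(sum(p * p for p in sig2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0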

Sense Disambiguation

Appears in 3 sentences as: sense disambiguated (1) Sense Disambiguation (2)
In A Robust Approach to Aligning Heterogeneous Lexical Resources
  1. Owing to its ability to bring together features like multilinguality and increasing coverage, over the past few years resource alignment has proven beneficial to a wide spectrum of tasks, such as Semantic Parsing (Shi and Mihalcea, 2005), Semantic Role Labeling (Palmer et al., 2010), and Word Sense Disambiguation (Navigli and Ponzetto, 2012).
    Page 1, “Introduction”
  2. PPR has been previously used in a wide variety of tasks such as definition similarity-based resource alignment (Niemann and Gurevych, 2011), textual semantic similarity (Hughes and Ramage, 2007; Pilehvar et al., 2013), Word Sense Disambiguation (Agirre and Soroa, 2009; Faralli and Navigli, 2012) and semantic text categorization (Navigli et al., 2011).
    Page 3, “Resource Alignment”
  3. The edges obtained from unambiguous entries are essentially sense disambiguated on both sides whereas those obtained from ambiguous terms are a result of our similarity-based disambiguation.
    Page 6, “Experiments”
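
Item 3 above distinguishes edges obtained from unambiguous (monosemous) entries from those obtained by similarity-based disambiguation of ambiguous terms. A minimal sketch of the latter, where similarity is a placeholder for the concept similarity measure discussed earlier:

def disambiguate(concept, candidate_senses, similarity):
    """Link the ambiguous word to whichever candidate sense is most similar to the source concept."""
    return max(candidate_senses, key=lambda sense: similarity(concept, sense))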

similarity score

Appears in 3 sentences as: similarity score (2) similarity scores (1)
In A Robust Approach to Aligning Heterogeneous Lexical Resources
  1. If their similarity score exceeds a certain value denoted by θ, the two concepts are aligned.
    Page 2, “Resource Alignment”
  2. Each of these components gets, as its input, a pair of concepts belonging to two different semantic networks and produces a similarity score.
    Page 2, “Resource Alignment”
  3. In this case, both the definitional and structural similarity scores are treated as equally important and two concepts are aligned if their overall similarity exceeds the middle point of the similarity scale.
    Page 7, “Experiments”
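
Items 1 and 3 above describe the decision rule: a concept pair is aligned when its overall score exceeds a threshold θ, and with equally weighted components θ defaults to the middle point of the similarity scale. A worked one-liner, assuming scores on a [0, 1] scale:

def is_aligned(def_sim, str_sim, theta=0.5):
    return (def_sim + str_sim) / 2.0 > theta   # equal weights; theta = middle of [0, 1]

print(is_aligned(0.7, 0.4))   # average 0.55 > 0.5 -> True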
