Structural Semantic Relatedness: A Knowledge-Based Method to Named Entity Disambiguation
Han, Xianpei and Zhao, Jun

Article Structure

Abstract

The name ambiguity problem has raised an urgent demand for efficient, high-quality named entity disambiguation methods.

Introduction

The name ambiguity problem is common on the Web.

The Structural Semantic Relatedness Measure

In this section, we demonstrate the structural semantic relatedness measure, which can capture the structural semantic knowledge in multiple knowledge sources.

Named Entity Disambiguation by Leveraging Semantic Knowledge

In this section we describe how to leverage the semantic knowledge captured in the structural semantic relatedness measure for named entity disambiguation.

Experiments

To assess the performance of our method and compare it with traditional methods, we conduct a series of experiments.

Related Work

In this section, we briefly review the related work.

Conclusions and Future Work

In this paper we demonstrate how to enhance named entity disambiguation by capturing and exploiting the semantic knowledge embedded in multiple knowledge sources.

Topics

semantic relatedness

Appears in 52 sentences as: semantic related (2) Semantic Relatedness (6) semantic relatedness (32) semantic relation (10) semantic relations (23) semantically related (4)
In Structural Semantic Relatedness: A Knowledge-Based Method to Named Entity Disambiguation
  1. This paper proposes a knowledge-based method, called Structural Semantic Relatedness (SSR), which can enhance the named entity disambiguation by capturing and leveraging the structural semantic knowledge in multiple knowledge sources.
    Page 1, “Abstract”
  2. This model measures similarity based only on the co-occurrence statistics of terms, without considering semantic relations such as social relatedness between named entities, associative relatedness between concepts, and lexical relatedness (e.g., acronyms, synonyms) between key terms.
    Page 2, “Introduction”
  3. For example, as shown in Figure 2, the link structure of Wikipedia contains rich semantic relations between concepts.
    Page 2, “Introduction”
  4. The problem of these knowledge sources is that they are heterogeneous (e.g., they contain different types of semantic relations and different types of concepts) and most of the semantic knowledge within them is embedded in complex structures, such as graphs and networks.
    Page 2, “Introduction”
  5. For example, as shown in Figure 2, the semantic relation between Graphical Model and Computer Science is embedded in the link structure of Wikipedia.
    Page 2, “Introduction”
  6. (2007)), the ontology connection in DBLP (Hassell et al., 2006) and the semantic relations in Wikipedia (Cucerzan (2007), Han and Zhao (2009)).
    Page 2, “Introduction”
  7. To overcome the deficiencies of previous methods, this paper proposes a knowledge-based method, called Structural Semantic Relatedness (SSR), which can enhance the named entity disambiguation by capturing and leveraging the structural semantic knowledge from multiple knowledge sources.
    Page 2, “Introduction”
  8. The key point of our method is a reliable semantic relatedness measure between concepts (including WordNet concepts, NEs and Wikipedia concepts), called Structural Semantic Relatedness, which can capture both the explicit semantic relations between concepts and the implicit semantic knowledge embedded in graphs and networks.
    Page 2, “Introduction”
  9. In particular, we first extract the semantic relations between two concepts from a variety of knowledge sources and
    Page 2, “Introduction”
  10. Then based on the principle that “two concepts are semantic related if they are both semantic related to the neighbor concepts of each other”, we construct our Structural Semantic Relatedness measure.
    Page 3, “Introduction”
  11. In the end, we leverage the structural semantic relatedness measure for named entity disambiguation and evaluate the performance on the standard WePS data sets.
    Page 3, “Introduction”

named entity

Appears in 37 sentences as: named entities (11) named entity (28)
In Structural Semantic Relatedness: A Knowledge-Based Method to Named Entity Disambiguation
  1. The name ambiguity problem has raised an urgent demand for efficient, high-quality named entity disambiguation methods.
    Page 1, “Abstract”
  2. In recent years, the increasing availability of large-scale, rich semantic knowledge sources (such as Wikipedia and WordNet) creates new opportunities to enhance the named entity disambiguation by developing algorithms which can make the best use of these knowledge sources.
    Page 1, “Abstract”
  3. This paper proposes a knowledge-based method, called Structural Semantic Relatedness (SSR), which can enhance the named entity disambiguation by capturing and leveraging the structural semantic knowledge in multiple knowledge sources.
    Page 1, “Abstract”
  4. So there is an urgent demand for efficient, high-quality named entity disambiguation methods.
    Page 1, “Introduction”
  5. Currently, the common methods for named entity disambiguation include name observation clustering (Bagga and Baldwin, 1998) and entity linking with a knowledge base (McNamee and Dang, 2009).
    Page 1, “Introduction”
  6. Given a set of observations O = {o1, o2, ..., on} of the target name to be disambiguated, a named entity disambiguation system should group them into a set of clusters C = {c1, c2, ..., cm}, with each resulting cluster corresponding to one specific entity.
    Page 1, “Introduction”
  7. A named entity disambiguation system should group the 1st and 4th Michael Jordan observations into one cluster, for they both refer to the Berkeley professor.
    Page 1, “Introduction”
  8. To a human, named entity disambiguation is usually not a difficult task, as he or she can make decisions based not only on contextual clues, but also on prior background knowledge.
    Page 2, “Introduction”
  9. Figure 1. The exploitation of knowledge in human named entity disambiguation
    Page 2, “Introduction”
  10. Conventionally, the named entity disambiguation methods measure the similarity between name observations using the bag of words (BOW) model (Bagga and Baldwin (1998); Mann and Yarowsky (2006); Fleischman and Hovy (2004); Pedersen et al.
    Page 2, “Introduction”
  11. This model measures similarity based only on the co-occurrence statistics of terms, without considering semantic relations such as social relatedness between named entities, associative relatedness between concepts, and lexical relatedness (e.g., acronyms, synonyms) between key terms.
    Page 2, “Introduction”
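
Sentence 6 above states the task formally: given observations O = {o1, ..., on} of an ambiguous name, a system must produce clusters C = {c1, ..., cm}, one per real-world entity. The quoted sentences do not fix the clustering algorithm, so the following is only a minimal sketch, assuming a single-link, threshold-based grouping over some pairwise similarity function sim; both the threshold and the toy Jaccard similarity are illustrative, not the paper's choices.

    def cluster_observations(observations, sim, threshold=0.5):
        """Group name observations: two observations whose similarity exceeds the
        threshold (directly or through a chain of such links) share a cluster."""
        n = len(observations)
        parent = list(range(n))

        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]
                i = parent[i]
            return i

        for i in range(n):
            for j in range(i + 1, n):
                if sim(observations[i], observations[j]) >= threshold:
                    parent[find(i)] = find(j)

        clusters = {}
        for i in range(n):
            clusters.setdefault(find(i), []).append(observations[i])
        return list(clusters.values())

    # Toy usage with a shallow word-overlap similarity:
    docs = ["Michael Jordan machine learning Berkeley professor",
            "Michael Jordan NBA basketball Chicago Bulls MVP",
            "Michael Jordan graphical models machine learning"]
    jaccard = lambda a, b: len(set(a.split()) & set(b.split())) / len(set(a.split()) | set(b.split()))
    print(cluster_observations(docs, jaccard, threshold=0.3))

With the toy similarity the two machine-learning documents end up in one cluster and the NBA document in another; the paper's point is to replace such a shallow similarity with one informed by structural semantic relatedness.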

WordNet

Appears in 15 sentences as: WordNet (20)
In Structural Semantic Relatedness: A Knowledge-Based Method to Named Entity Disambiguation
  1. In recent years, the increasing availability of large-scale, rich semantic knowledge sources (such as Wikipedia and WordNet ) creates new opportunities to enhance the named entity disambiguation by developing algorithms which can make the best use of these knowledge sources.
    Page 1, “Abstract”
  2. Fortunately, in recent years, due to the evolution of the Web (e.g., Web 2.0 and the Semantic Web) and many research efforts on the construction of knowledge bases, there has been an increasing availability of large-scale knowledge sources, such as Wikipedia and WordNet .
    Page 2, “Introduction”
  3. The key point of our method is a reliable semantic relatedness measure between concepts (including WordNet concepts, NEs and Wikipedia concepts), called Structural Semantic Relatedness, which can capture both the explicit semantic relations between concepts and the implicit semantic knowledge embedded in graphs and networks.
    Page 2, “Introduction”
  4. We extract three types of semantic relations (semantic relatedness between Wikipedia concepts, lexical relatedness between WordNet concepts and social relatedness between NEs) correspondingly from three knowledge sources: Wikipedia, WordNet and NE Co-occurrence Corpus.
    Page 3, “The Structural Semantic Relatedness Measure”
  5. WordNet 3.0 (Fellbaum et al., 1998), a lexical knowledge source that includes over 110,000 WordNet concepts (word senses of English words).
    Page 3, “The Structural Semantic Relatedness Measure”
  6. Various lexical relations are recorded between WordNet concepts, such as hyponymy, holonymy and synonymy.
    Page 3, “The Structural Semantic Relatedness Measure”
  7. The lexical relatedness lr between two WordNet concepts is measured using Lin's (1998) WordNet semantic similarity measure.
    Page 3, “The Structural Semantic Relatedness Measure”
  8. The lexical relatedness table of four selected WordNet concepts
    Page 3, “The Structural Semantic Relatedness Measure”
  9. We first gather all the N-grams (up to 8 words) and identify whether they correspond to semantically meaningful concepts: if an N-gram is contained in WordNet, we identify it as a WordNet concept, and use its primary word sense as its semantic meaning; to find whether an N-gram is a named entity, we match it to the named entity list extracted using the OpenCalais API, which contains more than 30 types of named entities, such as Person, Organization and Award; to find whether an N-gram is a Wikipedia concept, we match it to the Wikipedia anchor dictionary, then find its corresponding Wikipedia concept using the method described in (Medelyan et al., 2008).
    Page 4, “The Structural Semantic Relatedness Measure”
  10. The retained N-grams are identified as concepts, together with their semantic meanings (a concept may have multiple semantic meanings, e.g., "MVP" has three semantic meanings: as "most valuable player, MVP" in WordNet , as "Most Valuable Player" in Wikipedia, and as a named entity of the Award type).
    Page 4, “The Structural Semantic Relatedness Measure”
  11. That is, for each pair of extracted concepts, we identify whether there are semantic relations between them: 1) If there is only one semantic relation between them, we connect these two concepts with an edge, where the edge weight is the strength of the semantic relation; 2) If there is more than one semantic relation between them, we choose the most reliable one, i.e., we choose the semantic relation from the knowledge sources according to the order WordNet , Wikipedia and NE Co-occurrence corpus (Suchanek et al., 2007).
    Page 4, “The Structural Semantic Relatedness Measure”
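
Sentence 7 above measures the lexical relatedness lr with Lin's (1998) WordNet similarity, and sentence 9 uses each word's primary sense as its meaning. A minimal sketch of that computation, assuming NLTK's WordNet interface and its Brown-corpus information-content file; the paper does not say which toolkit or IC corpus it used.

    from nltk.corpus import wordnet as wn
    from nltk.corpus import wordnet_ic
    # Requires: nltk.download('wordnet'); nltk.download('wordnet_ic')

    brown_ic = wordnet_ic.ic('ic-brown.dat')  # information-content counts from the Brown corpus

    def lexical_relatedness(word1, word2, pos=wn.NOUN):
        """Lin similarity between the primary (first-listed) senses of two words,
        following the 'primary word sense' convention in sentence 9 above."""
        senses1 = wn.synsets(word1, pos=pos)
        senses2 = wn.synsets(word2, pos=pos)
        if not senses1 or not senses2:
            return 0.0
        return senses1[0].lin_similarity(senses2[0], brown_ic)

    print(lexical_relatedness('statistics', 'probability'))
    print(lexical_relatedness('professor', 'basketball'))

Lin's measure needs an information-content corpus because it compares the information content of the two senses with that of their most specific common ancestor in the WordNet hierarchy.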

Co-occurrence

Appears in 8 sentences as: Co-occurrence (4) co-occurrence (4)
In Structural Semantic Relatedness: A Knowledge-Based Method to Named Entity Disambiguation
  1. This model measures similarity based only on the co-occurrence statistics of terms, without considering semantic relations such as social relatedness between named entities, associative relatedness between concepts, and lexical relatedness (e.g., acronyms, synonyms) between key terms.
    Page 2, “Introduction”
  2. We extract three types of semantic relations (semantic relatedness between Wikipedia concepts, lexical relatedness between WordNet concepts and social relatedness between NEs) correspondingly from three knowledge sources: Wikipedia, WordNet and NE Co-occurrence Corpus.
    Page 3, “The Structural Semantic Relatedness Measure”
  3. NE Co-occurrence Corpus, a corpus of documents for capturing the social relatedness between named entities.
    Page 3, “The Structural Semantic Relatedness Measure”
  4. According to fuzzy set theory (Baeza-Yates et al., 1999), the degree to which named entities co-occur in a corpus is a measure of the relatedness between them.
    Page 3, “The Structural Semantic Relatedness Measure”
  5. So the co-occurrence statistics can be used to measure the social relatedness between named entities.
    Page 4, “The Structural Semantic Relatedness Measure”
  6. In this paper, given an NE Co-occurrence Corpus D, the social relatedness scr between two named entities ne1 and ne2 is measured using the Google Similarity Distance (Cilibrasi and Vitanyi, 2007):
    Page 4, “The Structural Semantic Relatedness Measure”
  7. We used three knowledge sources in our experiments: WordNet 3.0; the Sep. 9, 2007 English version of Wikipedia; and the Web pages of each ambiguous name in the WePS datasets as the NE Co-occurrence Corpus.
    Page 7, “Experiments”
  8. (2007) used the co-occurrence statistics between named entities in the Web.
    Page 9, “Related Work”
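
Sentence 6 above measures the social relatedness scr between two named entities with the Google Similarity Distance of Cilibrasi and Vitanyi (2007), computed from how often the entities occur and co-occur in the NE Co-occurrence Corpus. A minimal sketch of that distance from raw document counts; the final conversion of the distance into a [0, 1] relatedness score is an assumption, since the quoted sentences stop before the formula.

    import math

    def google_distance(f_x, f_y, f_xy, n_docs):
        """Normalized Google Distance over document frequencies in a corpus of n_docs documents:
        f_x, f_y = documents mentioning each entity, f_xy = documents mentioning both."""
        if f_xy == 0 or f_x == 0 or f_y == 0:
            return float('inf')  # never co-occur: maximally distant
        log_x, log_y, log_xy = math.log(f_x), math.log(f_y), math.log(f_xy)
        return (max(log_x, log_y) - log_xy) / (math.log(n_docs) - min(log_x, log_y))

    def social_relatedness(f_x, f_y, f_xy, n_docs):
        """One plausible mapping of the distance to a relatedness score in [0, 1] (an assumption)."""
        d = google_distance(f_x, f_y, f_xy, n_docs)
        return 0.0 if math.isinf(d) else max(0.0, 1.0 - d)

    # Two entities mentioned in 40 and 25 of 1,000 web pages, co-occurring in 15 of them:
    print(social_relatedness(40, 25, 15, 1000))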

Graphical Model

Appears in 6 sentences as: Graphical Model (2) Graphical model (1) Graphical Models (2) Graphical models (1)
In Structural Semantic Relatedness: A Knowledge-Based Method to Named Entity Disambiguation
  1. 4) Learning in Graphical Models: Michael Jordan.
    Page 1, “Introduction”
  2. For example, as shown in Figure 1, with the background knowledge that both Learning and Graphical models are topics related to Machine learning, while Machine learning is a subdomain of Computer science, a human can easily determine that the two Michael Jordan mentions in the 1st and 4th observations represent the same person.
    Page 2, “Introduction”
  3. 4) [Learning] in [Graphical Models]: Michael Jordan
    Page 2, “Introduction”
  4. For example, as shown in Figure 2, the semantic relation between Graphical Model and Computer Science is embedded in the link structure of Wikipedia.
    Page 2, “Introduction”
  5. [Figure 3 excerpt: a semantic-graph fragment linking Researcher, Graphical Model, Computer Science and Learning, with edge weights 0.28, 0.48 and 0.41]
    Page 4, “The Structural Semantic Relatedness Measure”
  6. For demonstration, Table 4 shows some structural semantic relatedness values of the Semantic-graph in Figure 3 (CS represents Computer science and GM represents Graphical model).
    Page 6, “The Structural Semantic Relatedness Measure”

edge weight

Appears in 5 sentences as: edge weight (5)
In Structural Semantic Relatedness: A Knowledge-Based Method to Named Entity Disambiguation
  1. Concretely, the semantic-graph is defined as follows: A semantic-graph is a weighted graph G = (V, E), where each node represents a distinct concept; and each edge between a pair of nodes represents the semantic relation between the two concepts corresponding to these nodes, with the edge weight indicating the strength of the semantic relation.
    Page 4, “The Structural Semantic Relatedness Measure”
  2. That is, for each pair of extracted concepts, we identify whether there are semantic relations between them: 1) If there is only one semantic relation between them, we connect these two concepts with an edge, where the edge weight is the strength of the semantic relation; 2) If there is more than one semantic relation between them, we choose the most reliable one, i.e., we choose the semantic relation from the knowledge sources according to the order WordNet, Wikipedia and NE Co-occurrence corpus (Suchanek et al., 2007).
    Page 4, “The Structural Semantic Relatedness Measure”
  3. To simplify the description, we assign each node in the semantic-graph an integer index from 1 to |V| and use this index to represent the node; then we can write the adjacency matrix of the semantic-graph G as A, where A[i,j] (or Aij) is the edge weight between node i and node j.
    Page 5, “The Structural Semantic Relatedness Measure”
  4. However, these similarity measures are not suitable for our task, because all of them assume that the edges are uniform so that they cannot take edge weight into consideration.
    Page 5, “The Structural Semantic Relatedness Measure”
  5. In order to take both the graph structure and the edge weight into account, we design the structural semantic relatedness measure by extending the measure introduced in Leicht et al.
    Page 5, “The Structural Semantic Relatedness Measure”
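
Sentences 1 to 3 above define the semantic-graph: nodes are concepts, edges carry the strength of the semantic relation, and when a concept pair has relations in several knowledge sources the most reliable one is kept, in the order WordNet, then Wikipedia, then the NE co-occurrence corpus. A minimal sketch that builds the adjacency matrix A from such relation tuples; the tuple layout, source labels and example weights are illustrative, not the paper's data format.

    import numpy as np

    # Lower number = more reliable source, per the order quoted in sentence 2 above.
    SOURCE_PRIORITY = {'wordnet': 0, 'wikipedia': 1, 'ne_cooccurrence': 2}

    def build_adjacency(concepts, relations):
        """concepts: list of concept names; relations: (concept_i, concept_j, source, strength) tuples.
        Returns the symmetric weighted adjacency matrix A with A[i, j] = strength of the chosen relation."""
        index = {c: i for i, c in enumerate(concepts)}
        chosen = {}  # unordered node pair -> (priority, strength)
        for c_i, c_j, source, strength in relations:
            pair = tuple(sorted((index[c_i], index[c_j])))
            candidate = (SOURCE_PRIORITY[source], strength)
            if pair not in chosen or candidate[0] < chosen[pair][0]:
                chosen[pair] = candidate
        a = np.zeros((len(concepts), len(concepts)))
        for (i, j), (_, strength) in chosen.items():
            a[i, j] = a[j, i] = strength
        return a

    concepts = ['Researcher', 'Graphical Model', 'Computer Science', 'Learning']
    relations = [('Researcher', 'Computer Science', 'wikipedia', 0.48),
                 ('Graphical Model', 'Computer Science', 'wikipedia', 0.28),
                 ('Computer Science', 'Learning', 'ne_cooccurrence', 0.41),
                 ('Computer Science', 'Learning', 'wikipedia', 0.44)]  # duplicate pair: Wikipedia wins
    print(build_adjacency(concepts, relations))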

Machine learning

Appears in 4 sentences as: Machine learning (5)
In Structural Semantic Relatedness: A Knowledge-Based Method to Named Entity Disambiguation
  1. For example, as shown in Figure 1, with the background knowledge that both Learning and Graphical models are topics related to Machine learning, while Machine learning is a subdomain of Computer science, a human can easily determine that the two Michael Jordan mentions in the 1st and 4th observations represent the same person.
    Page 2, “Introduction”
  2. 1) Michael Jordan is a [Researcher] in [Machine learning]
    Page 2, “Introduction”
  3. [Figure excerpt: concept nodes Machine learning and Probability Theory]
    Page 2, “Introduction”
  4. [Table excerpt: relatedness values; Machine learning: Statistics 0.58, Basketball 0.00; MVP: Statistics 0.00, Basketball 0.45]
    Page 3, “The Structural Semantic Relatedness Measure”

similarity measure

Appears in 4 sentences as: similarity measure (3) similarity measures (1)
In Structural Semantic Relatedness: A Knowledge-Based Method to Named Entity Disambiguation
  1. The lexical relatedness lr between two WordNet concepts is measured using Lin's (1998) WordNet semantic similarity measure.
    Page 3, “The Structural Semantic Relatedness Measure”
  2. The problem of quantifying the relatedness between nodes in a graph is not new, e.g., structural equivalence and structural similarity (the SimRank in Jeh and Widom (2002) and the similarity measure in Leicht et al.
    Page 5, “The Structural Semantic Relatedness Measure”
  3. However, these similarity measures are not suitable for our task, because all of them assume that the edges are uniform so that they cannot take edge weight into consideration.
    Page 5, “The Structural Semantic Relatedness Measure”
  4. Because the key problem of named entity disambiguation is to measure the similarity between name observations, we integrate the structural semantic relatedness into the similarity measure, so that it can better reflect the actual similarity between name observations.
    Page 6, “Named Entity Disambiguation by Leveraging Semantic Knowledge”
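
Sentence 4 above integrates the structural semantic relatedness into the similarity between name observations. The quoted sentences do not give the exact combination, so the sketch below shows only one plausible form under stated assumptions: each observation is represented by the concepts extracted from it, each concept is aligned to its most related concept in the other observation, and the aligned scores are averaged. The helper names and relatedness values are hypothetical.

    def observation_similarity(concepts_a, concepts_b, ssr):
        """concepts_a, concepts_b: concepts extracted from two name observations.
        ssr(x, y): structural semantic relatedness between two concepts (assumed given).
        Align each concept with its best match on the other side and average the scores."""
        if not concepts_a or not concepts_b:
            return 0.0
        best_a = [max(ssr(a, b) for b in concepts_b) for a in concepts_a]
        best_b = [max(ssr(a, b) for a in concepts_a) for b in concepts_b]
        return 0.5 * (sum(best_a) / len(best_a) + sum(best_b) / len(best_b))

    # Toy usage with a hand-written relatedness lookup (illustrative values only):
    toy = {frozenset(p): v for p, v in [
        (('Machine learning', 'Graphical Model'), 0.6),
        (('Machine learning', 'Basketball'), 0.0),
        (('Graphical Model', 'Basketball'), 0.0)]}
    ssr = lambda x, y: 1.0 if x == y else toy.get(frozenset((x, y)), 0.0)
    print(observation_similarity(['Machine learning'], ['Graphical Model', 'Basketball'], ssr))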

F-Measure

Appears in 3 sentences as: F-Measure (2) F-measure (1)
In Structural Semantic Relatedness: A Knowledge-Based Method to Named Entity Disambiguation
  1. F-Measure (F): the harmonic mean of purity and inverse purity.
    Page 7, “Experiments”
  2. We use F-measure as the primary measure, just like WePS1 and WePS2.
    Page 7, “Experiments”
  3. The F-Measure vs. 1 on three data sets
    Page 8, “Experiments”
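
Sentence 1 above defines the F-Measure used on WePS as the harmonic mean of purity and inverse purity. A minimal sketch of that evaluation over a system clustering and a gold partition; representing clusterings as lists of sets of observation ids is an assumption made for the example.

    def purity(system, gold):
        """system, gold: lists of sets of observation ids partitioning the same collection.
        Each system cluster is credited with its best-matching gold class."""
        total = sum(len(c) for c in system)
        return sum(max(len(c & g) for g in gold) for c in system) / total

    def f_measure(system, gold):
        """Harmonic mean of purity and inverse purity (inverse purity swaps the two roles)."""
        p = purity(system, gold)
        ip = purity(gold, system)
        return 2 * p * ip / (p + ip)

    system = [{1, 2, 3}, {4, 5}]          # clusters produced by a disambiguation system
    gold = [{1, 2}, {3, 4, 5}]            # true grouping of the five observations
    print(purity(system, gold), purity(gold, system), f_measure(system, gold))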

recursive

Appears in 3 sentences as: recursive (3)
In Structural Semantic Relatedness: A Knowledge-Based Method to Named Entity Disambiguation
  1. This definition is recursive, and the starting point we choose is the semantic relatedness on the edge.
    Page 5, “The Structural Semantic Relatedness Measure”
  2. Thus our structural semantic relatedness has two components: the neighbor term from the previous recursive phase, which captures the graph structure, and the semantic relatedness, which captures the edge information.
    Page 5, “The Structural Semantic Relatedness Measure”
  3. Thus, the recursive form of the structural semantic relatedness sij between node i and node j can be written as:
    Page 5, “The Structural Semantic Relatedness Measure”
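
Sentences 1 to 3 above describe the recursive form: the score of a node pair combines a neighbor term carried over from the previous recursive phase (the graph structure) with the semantic relatedness on the edges (the starting point). The excerpt cuts off before the formula itself, so the sketch below is only one plausible instantiation of that description, assuming a degree-normalised propagation step and a mixing weight lam; it is not the paper's exact equation. The edge weights are loosely taken from the Figure 3 excerpt quoted under the Graphical Model topic, whose node-to-weight mapping is garbled, so the matrix is illustrative.

    import numpy as np

    def structural_semantic_relatedness(a, lam=0.8, iters=30):
        """a: weighted adjacency matrix of the semantic-graph (A[i, j] = relation strength).
        Iteratively blends relatedness propagated through neighbors with the direct edge term."""
        degrees = a.sum(axis=1)
        degrees[degrees == 0] = 1.0                   # isolated nodes: avoid division by zero
        p = a / degrees[:, None]                      # row-normalised edge weights
        s = a.copy()                                  # starting point: relatedness on the edges
        for _ in range(iters):
            s = lam * (p @ s @ p.T) + (1 - lam) * a   # neighbor term + direct semantic relatedness
        np.fill_diagonal(s, 1.0)                      # a concept is maximally related to itself
        return s

    # Nodes: Researcher, Graphical Model, Computer Science, Learning (illustrative weights).
    a = np.array([[0.00, 0.00, 0.48, 0.00],
                  [0.00, 0.00, 0.28, 0.00],
                  [0.48, 0.28, 0.00, 0.41],
                  [0.00, 0.00, 0.41, 0.00]])
    print(structural_semantic_relatedness(a).round(2))

Note how Researcher and Learning, which share no direct edge, receive a positive score because both are strongly related to Computer Science, which is the neighbor-propagation principle quoted under the semantic relatedness topic.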
