Summarizing Definition from Wikipedia
Ye, Shiren and Chua, Tat-Seng and Lu, Jie

Article Structure

Abstract

Wikipedia provides a wealth of knowledge: the first sentence, the infobox (and related sentences), and even the entire text of a wiki article can be regarded as different versions of a summary (definition) of the target topic.

Introduction

Nowadays, ‘ask Wikipedia’ has become as popular as ‘Google it’ when surfing the Internet, because Wikipedia provides reliable information about the concepts (entities) that users are looking for.

Background

Besides heuristic rules such as sentence position and cue words, typical summarization systems measure the associations (links) between sentences by term repetition (e.g., LexRank (Erkan and Radev, 2004)).
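
As a rough illustration of links based on term repetition, the following Python sketch builds a LexRank-style sentence graph. Two sentences are linked when the idf-weighted cosine similarity of their term vectors exceeds a threshold, and sentence centrality is then computed by power iteration. This is a simplified reading of Erkan and Radev (2004), not the implementation used in this paper. The tokenizer, the threshold of 0.1, the damping factor of 0.85, and the choice to treat each sentence as a document when computing idf are all assumptions.

```python
import math
from collections import Counter

def tokenize(sentence):
    # Assumed tokenizer: lowercase, split on whitespace, strip basic punctuation.
    words = (w.strip(".,;:()").lower() for w in sentence.split())
    return [w for w in words if w]

def idf_table(sentences):
    # Each sentence is treated as a "document" when computing idf (an assumption).
    n = len(sentences)
    df = Counter()
    for s in sentences:
        df.update(set(tokenize(s)))
    return {t: math.log(n / c) for t, c in df.items()}

def idf_cosine(a, b, idf):
    # idf-modified cosine between the term-frequency vectors of two sentences.
    ta, tb = Counter(tokenize(a)), Counter(tokenize(b))
    num = sum(ta[t] * tb[t] * idf[t] ** 2 for t in ta.keys() & tb.keys())
    na = math.sqrt(sum((ta[t] * idf[t]) ** 2 for t in ta))
    nb = math.sqrt(sum((tb[t] * idf[t]) ** 2 for t in tb))
    return num / (na * nb) if na and nb else 0.0

def lexrank(sentences, threshold=0.1, damping=0.85, iters=100):
    n = len(sentences)
    idf = idf_table(sentences)
    # Unweighted graph: a link wherever the similarity exceeds the threshold.
    adj = [[1.0 if i != j and idf_cosine(sentences[i], sentences[j], idf) > threshold
            else 0.0 for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in adj]
    scores = [1.0 / n] * n
    for _ in range(iters):
        # Sentences with no links spread their score uniformly, so scores keep summing to 1.
        dangling = sum(scores[i] for i in range(n) if deg[i] == 0)
        scores = [(1 - damping) / n
                  + damping * (sum(scores[i] * adj[i][j] / deg[i] for i in range(n) if deg[i])
                               + dangling / n)
                  for j in range(n)]
    return scores

sents = [
    "Wikipedia is a free online encyclopedia.",
    "The online encyclopedia Wikipedia is written by volunteers.",
    "Cats sleep most of each day.",
]
scores = lexrank(sents)
```

In this example, the first two sentences share several terms and reinforce each other, while the unrelated third sentence receives only the baseline score. Two sentences that express the same idea in different words would get no link at all. That limitation is what motivates looking beyond surface terms.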

Our Approach 3.1 Wiki Concepts

In this subsection, we address how to find reasonable and reliable links between sentences using wiki concepts.
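
One simple way to picture concept-based links is to map the surface phrases in each sentence to wiki concepts, then link two sentences whenever they share a concept, even if they share no words. The Python sketch below assumes a small hypothetical anchor-text table (`ANCHORS`) that maps phrases to concept titles, and it uses greedy longest-match lookup. It illustrates the idea only and is not the paper's concept-identification procedure.

```python
import re

# Hypothetical table mapping surface forms to wiki concept titles
# (such a table could, for example, be harvested from wiki link anchor texts).
ANCHORS = {
    "united states": "United_States",
    "usa": "United_States",
    "u.s.": "United_States",
    "barack obama": "Barack_Obama",
    "obama": "Barack_Obama",
    "president": "President_of_the_United_States",
}

def concepts(sentence, anchors=ANCHORS):
    # Greedy longest-match lookup: longer surface forms are tried first and
    # removed from the text, so "obama" is not matched again inside "barack obama".
    text = sentence.lower()
    found = set()
    for form in sorted(anchors, key=len, reverse=True):
        pattern = r"(?<!\w)" + re.escape(form) + r"(?!\w)"
        if re.search(pattern, text):
            found.add(anchors[form])
            text = re.sub(pattern, " ", text)
    return found

def concept_links(sentences):
    # Link every sentence pair that shares at least one wiki concept.
    cs = [concepts(s) for s in sentences]
    return {(i, j): cs[i] & cs[j]
            for i in range(len(sentences))
            for j in range(i + 1, len(sentences))
            if cs[i] & cs[j]}

sents = [
    "Barack Obama was elected in 2008.",
    "Obama served two terms as president.",
    "He lived in the USA for most of his life.",
    "The United States has fifty states.",
]
links = concept_links(sents)
```

With this table, "USA" and "United States" resolve to the same concept, so the last two sentences are linked even though they use different surface forms.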

Experiments

The purposes of our experiments are twofold: (i) to evaluate the effect of wiki definitions on the TREC-QA task; and (ii) to examine the characteristics and summarization performance of EDCL.

Conclusion and Future Work

Wikipedia recursively defines an enormous number of concepts in a huge vector space of wiki concepts.

Topics

WordNet

Appears in 10 sentences as: WordNet (9) WordNet’s (1)
In Summarizing Definition from Wikipedia
  1. Its results are better than those of other external resources such as WordNet, Gazetteers and Google’s define operator, especially for definition QA (Lita et al., 2004).
    Page 1, “Introduction”
  2. Most of these concepts are multi-word terms, whereas WordNet has only 50,000-plus multi-word terms.
    Page 1, “Introduction”
  3. Any term can appear in the definition of a concept if necessary, whereas the total vocabulary of WordNet’s glossary definitions is fewer than 2,000 words.
    Page 1, “Introduction”
  4. (2007) compute the semantic similarity using WordNet.
    Page 2, “Background”
  5. Its definition is much larger than that of WordNet.
    Page 2, “Background”
  6. We apply standard DCL in Run 2, where concepts are determined according to their definitions in WordNet (Ye et al., 2007).
    Page 7, “Experiments”
  7. When we merge diverse words with similar semantics according to WordNet concepts, we obtain 873 concepts per article on average in Run 2.
    Page 7, “Experiments”
  8. Even though the reduction in total concepts is limited, these new wiki concepts group terms that cannot be detected by WordNet.
    Page 7, “Experiments”
  9. (3) DCL based on WordNet concepts has fewer derived nodes (Run 3) than DCL based on wiki concepts does, although the former has more concepts.
    Page 7, “Experiments”
  10. Run 2: Basic DCL model (WordNet concepts); Run 3: DCL + wiki concepts
    Page 7, “Experiments”

See all papers in Proc. ACL 2009 that mention WordNet.

structural features

Appears in 6 sentences as: structural features (6)
In Summarizing Definition from Wikipedia
  1. The rest of this paper is organized as follows: In the next section, we discuss the background of summarization using both textual and structural features.
    Page 2, “Introduction”
  2. In order to extract some salient sentences from the article as definition summaries, we will build a summarization model that describes the relations between the sentences, where both textual and structural features are considered.
    Page 3, “Background”
  3. Unlike free text and general web documents, wiki articles contain structural features, such as infoboxes and outlines, which correlate strongly with nuggets in definition TREC-QA.
    Page 5, “Our Approach 3.1 Wiki Concepts”
  4. By integrating these structural features, we will generate better RP measures in derived topics, which facilitates better priority assignment in local topics.
    Page 5, “Our Approach 3.1 Wiki Concepts”
  5. Based on the performance of EDCL for the TREC-QA definition task listed in Table 5, we observe that: (i) When EDCL considers wiki concepts and structural features such as outline and infobox, its F-scores increase significantly (Run 3 and Run 4).
    Page 8, “Experiments”
  6. Wikipedia’s special structural features, such as wiki links, infoboxes and outlines, reflect hidden human knowledge.
    Page 8, “Conclusion and Future Work”
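
The F-scores mentioned above come from the TREC nugget-based evaluation for definition questions. Below is a minimal Python sketch of that measure. It assumes the TREC 2004 to 2006 settings: β = 3, and a length allowance of 100 non-whitespace characters for each returned vital or okay nugget. The exact parameters behind Table 5 are not restated here. The sketch also assumes that nugget matching (by assessors or an automatic matcher) has already produced the counts.

```python
def nugget_f_score(vital_returned, okay_returned, vital_total, length,
                   beta=3.0, allowance_per_nugget=100):
    """TREC-style nugget F-measure for one definition question.

    length: number of non-whitespace characters in the returned answer.
    beta=3.0 and allowance_per_nugget=100 are assumed defaults (TREC 2004-2006).
    """
    # Recall is computed over vital nuggets only.
    recall = vital_returned / vital_total if vital_total else 0.0
    # Length allowance: a fixed character budget per returned vital or okay nugget.
    allowance = allowance_per_nugget * (vital_returned + okay_returned)
    if length < allowance:
        precision = 1.0
    elif length > 0:
        precision = 1.0 - (length - allowance) / length
    else:
        precision = 0.0
    denom = beta ** 2 * precision + recall
    return (beta ** 2 + 1) * precision * recall / denom if denom else 0.0

# A 300-character answer covering 3 of 4 vital nuggets plus 1 okay nugget.
f_short = nugget_f_score(vital_returned=3, okay_returned=1, vital_total=4, length=300)
# A 400-character answer covering 2 of 4 vital nuggets exceeds its 200-character allowance.
f_long = nugget_f_score(vital_returned=2, okay_returned=0, vital_total=4, length=400)
```

Because β = 3 weights recall far more heavily than precision, gains from covering more vital nuggets (as wiki concepts and infobox or outline features aim to do) tend to dominate the score.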

semantic similarity

Appears in 5 sentences as: semantic similarities (1) semantic similarity (4)
In Summarizing Definition from Wikipedia
  1. (2007) compute the semantic similarity using WordNet.
    Page 2, “Background”
  2. The term pairs with semantic similarity higher than a predefined threshold will be grouped together.
    Page 2, “Background”
  3. We measure the semantic similarity between two concepts as the cosine similarity between their wiki articles, which are also represented as vectors of wiki concepts.
    Page 3, “Our Approach 3.1 Wiki Concepts”
  4. For computational efficiency, we precompute the semantic similarities between all promising concept pairs and then retrieve the values directly from a hash table.
    Page 3, “Our Approach 3.1 Wiki Concepts”
  5. Merge concepts whose semantic similarity is larger than a predefined threshold (0.35 in our experiments) into the one with the largest idf.
    Page 4, “Our Approach 3.1 Wiki Concepts”
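
Items 3 to 5 above describe a pipeline that can be sketched as follows. Each concept's wiki article is assumed to be given as a sparse vector of wiki concepts (a Python dict from concept to weight). All pairwise cosine similarities are precomputed into a dict, which plays the role of the hash table. Concepts are then merged greedily at the 0.35 threshold into the member with the largest idf. The weights, the idf values, and the greedy single-pass grouping are illustrative assumptions. The sketch also computes all pairs rather than only the "promising" ones the paper mentions.

```python
import math
from itertools import combinations

def cosine(u, v):
    # u, v: sparse concept vectors {concept: weight}.
    dot = sum(w * v[c] for c, w in u.items() if c in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def precompute_similarities(vectors):
    # Compute each pair once; a dict keyed by the sorted pair serves as the hash table.
    return {tuple(sorted((a, b))): cosine(vectors[a], vectors[b])
            for a, b in combinations(vectors, 2)}

def merge_concepts(sim_table, idf, threshold=0.35):
    # Visit concepts in decreasing idf, so every group's representative is the
    # member with the largest idf (greedy grouping is an assumption of this sketch).
    reps, mapping = [], {}
    for c in sorted(idf, key=idf.get, reverse=True):
        target = c
        for r in reps:
            if sim_table.get(tuple(sorted((c, r))), 0.0) > threshold:
                target = r
                break
        if target == c:
            reps.append(c)
        mapping[c] = target
    return mapping

# Hypothetical concept vectors and idf values, for illustration only.
vectors = {
    "Car": {"Vehicle": 2, "Engine": 1, "Road": 1},
    "Automobile": {"Vehicle": 2, "Engine": 1, "Wheel": 1},
    "Banana": {"Fruit": 2, "Plant": 1},
}
idf = {"Car": 2.0, "Automobile": 3.5, "Banana": 4.0}
table = precompute_similarities(vectors)
mapping = merge_concepts(table, idf)
```

Here Car and Automobile (cosine 5/6) collapse into Automobile, the higher-idf concept, while Banana stays on its own. Visiting concepts in decreasing idf order ensures that each representative has the largest idf in its group. The trade-off is that group membership can depend on the visiting order when a concept is similar to several representatives.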
