Automatic Keyphrase Extraction: A Survey of the State of the Art
Hasan, Kazi Saidul and Ng, Vincent

Article Structure

Abstract

While automatic keyphrase extraction has been examined extensively, state-of-the-art performance on this task is still much lower than that on many core natural language processing tasks.

Introduction

Automatic keyphrase extraction concerns “the automatic selection of important and topical phrases from the body of a document” (Turney, 2000).

Corpora

Automatic keyphrase extraction systems have been evaluated on corpora from a variety of

Keyphrase Extraction Approaches

A keyphrase extraction system typically operates in two steps: (1) extracting a list of words/phrases that serve as candidate keyphrases using some

Evaluation

In this section, we describe metrics for evaluating keyphrase extraction systems as well as state-of-the-art results on commonly-used datasets.

Analysis

With the goal of providing directions for future work, we identify the errors commonly made by state-of-the-art keyphrase extractors below.

Conclusion and Future Directions

We have presented a survey of the state of the art in automatic keyphrase extraction.

Topics

graph-based

Appears in 7 sentences as: Graph-Based (1) Graph-based (1) graph-based (6)
In Automatic Keyphrase Extraction: A Survey of the State of the Art
  1. 3.3.1 Graph-Based Ranking
    Page 4, “Keyphrase Extraction Approaches”
  2. The basic idea behind a graph-based approach is to build a graph from the input document and rank its nodes according to their importance using a graph-based ranking method (e.g., Brin and Page (1998)).
    Page 4, “Keyphrase Extraction Approaches”
  3. TextRank (Mihalcea and Tarau, 2004) is one of the most well-known graph-based approaches to keyphrase extraction.
    Page 4, “Keyphrase Extraction Approaches”
  4. This instantiation of a graph-based approach overlooks an important aspect of keyphrase extraction, however.
    Page 4, “Keyphrase Extraction Approaches”
  5. Despite this weakness, a graph-based representation of text was adopted by many approaches that propose different ways of computing the similarity between two candidates.
    Page 4, “Keyphrase Extraction Approaches”
  6. Zha (2002) proposes the first graph-based approach for simultaneous summarization and keyphrase extraction, motivated by a key observation: a sentence is important if it contains important words, and important words appear in important sentences.
    Page 5, “Keyphrase Extraction Approaches”
  7. [Garbled fragment of the results table in "Analysis": graph-based ranking for extended neighborhood (Grineva et al., 2009) on DUC-2001 news, scoring 28.8 / 35.4 / 31.7; statistical and semantic features (Wan and Xiao, 2008b) on papers.]
    Page 7, “Analysis”
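
The occurrences above describe the graph-based family (TextRank-style ranking with a PageRank-like method). A minimal sketch, assuming a plain word co-occurrence graph and power iteration; the window size, damping factor, and iteration count here are illustrative, not values from the survey:

```python
from collections import defaultdict

def textrank_keywords(words, window=2, damping=0.85, iters=50):
    """Sketch of TextRank-style ranking: build an undirected co-occurrence
    graph over words, then score nodes with PageRank power iteration."""
    # Build adjacency from co-occurrence within a sliding window.
    neighbors = defaultdict(set)
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window + 1, len(words))):
            if words[j] != w:
                neighbors[w].add(words[j])
                neighbors[words[j]].add(w)
    nodes = list(neighbors)
    score = {w: 1.0 / len(nodes) for w in nodes}
    for _ in range(iters):
        new = {}
        for w in nodes:
            # A node inherits rank from each neighbor, split by that
            # neighbor's degree (unweighted-edge simplification).
            rank = sum(score[u] / len(neighbors[u]) for u in neighbors[w])
            new[w] = (1 - damping) / len(nodes) + damping * rank
        score = new
    return sorted(nodes, key=score.get, reverse=True)
```

Real systems restrict nodes to candidate words (e.g., nouns and adjectives) and collapse adjacent top-ranked words into phrases; this sketch only shows the ranking step.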


extraction systems

Appears in 6 sentences as: extraction system (2) extraction systems (4)
  1. Automatic keyphrase extraction systems have been evaluated on corpora from a variety of
    Page 1, “Corpora”
  2. A keyphrase extraction system typically operates in two steps: (1) extracting a list of words/phrases that serve as candidate keyphrases using some
    Page 2, “Keyphrase Extraction Approaches”
  3. In this section, we describe metrics for evaluating keyphrase extraction systems as well as state-of-the-art results on commonly-used datasets.
    Page 6, “Evaluation”
  4. To score the output of a keyphrase extraction system, the typical approach, which is also adopted by the SemEval-2010 shared task on keyphrase extraction, is (1) to create a mapping between the keyphrases in the gold standard and those in the system output using exact match, and then (2) score the output using evaluation metrics such as precision (P), recall (R), and F-score (F).
    Page 6, “Evaluation”
  5. Although a few researchers have presented a sample of their systems’ output and the corresponding gold keyphrases to show the differences between them (Witten et al., 1999; Nguyen and Kan, 2007; Medelyan et al., 2009), a systematic analysis of the major types of errors made by state-of-the-art keyphrase extraction systems is missing.
    Page 7, “Analysis”
  6. To fill this gap, we ran four keyphrase extraction systems on four commonly-used datasets of varying sources, including Inspec abstracts (Hulth, 2003), DUC-2001 news articles (Over, 2001), scientific papers (Kim et al., 2010b), and meeting transcripts (Liu et al., 2009a).
    Page 7, “Analysis”
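
Occurrence 2 above describes step (1) of the typical two-step pipeline: heuristic candidate extraction. A sketch, assuming a simple stopword-run heuristic; the stopword handling, punctuation stripping, and n-gram limit are illustrative (real systems also use POS patterns and other heuristics):

```python
def extract_candidates(text, stopwords, max_len=3):
    """Sketch of the candidate-extraction step: candidate phrases are
    maximal runs of non-stopword tokens, split into n-grams up to max_len."""
    tokens = [t.strip(".,;:!?").lower() for t in text.split()]
    runs, current = [], []
    for t in tokens:
        if t and t not in stopwords:
            current.append(t)
        elif current:           # a stopword ends the current run
            runs.append(current)
            current = []
    if current:
        runs.append(current)
    candidates = set()
    for run in runs:
        for n in range(1, max_len + 1):
            for i in range(len(run) - n + 1):
                candidates.add(" ".join(run[i:i + n]))
    return candidates
```

Step (2), not shown here, ranks these candidates with a supervised or unsupervised scoring method.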


news articles

Appears in 6 sentences as: news article (2) news articles (4)
  1. Consequently, it is harder to extract keyphrases from scientific papers, technical reports, and meeting transcripts than abstracts, emails, and news articles.
    Page 1, “Corpora”
  2. Topic change An observation commonly exploited in keyphrase extraction from scientific articles and news articles is that keyphrases typically appear not only at the beginning (Witten et al., 1999) but also at the end (Medelyan et al., 2009) of a document.
    Page 2, “Corpora”
  3. Topic correlation Another observation commonly exploited in keyphrase extraction from scientific articles and news articles is that the keyphrases in a document are typically related to each other (Turney, 2003; Mihalcea and Tarau, 2004).
    Page 2, “Corpora”
  4. To fill this gap, we ran four keyphrase extraction systems on four commonly-used datasets of varying sources, including Inspec abstracts (Hulth, 2003), DUC-2001 news articles (Over, 2001), scientific papers (Kim et al., 2010b), and meeting transcripts (Liu et al., 2009a).
    Page 7, “Analysis”
  5. To be more concrete, consider the news article on athlete Ben Johnson in Figure 1, where the keyphrases are boldfaced.
    Page 7, “Analysis”
  6. Figure 1: A news article on Ben Johnson from the DUC-2001 dataset.
    Page 8, “Analysis”


state of the art

Appears in 5 sentences as: State of the Art (2) state of the art (3)
  1. We present a survey of the state of the art in automatic keyphrase extraction, examining the major sources of errors made by existing systems and discussing the challenges ahead.
    Page 1, “Abstract”
  2. Our goal in this paper is to survey the state of the art in keyphrase extraction, examining the major sources of errors made by existing systems and discussing the challenges ahead.
    Page 1, “Introduction”
  3. Automatic Keyphrase Extraction: A Survey of the State of the Art
    Page 1, “Corpora”
  4. 4.2 The State of the Art
    Page 7, “Evaluation”
  5. We have presented a survey of the state of the art in automatic keyphrase extraction.
    Page 9, “Conclusion and Future Directions”


co-occurrence

Appears in 4 sentences as: co-occurrence (4)
  1. Researchers have computed relatedness between candidates using co-occurrence counts (Mihalcea and Tarau, 2004; Matsuo and Ishizuka, 2004) and semantic relatedness (Grineva et al., 2009), and represented the relatedness information collected from a document as a graph (Mihalcea and Tarau, 2004; Wan and Xiao, 2008a; Wan and Xiao, 2008b; Bougouin et al., 2013).
    Page 4, “Keyphrase Extraction Approaches”
  2. Finally, an edge weight in a WW graph denotes the co-occurrence or knowledge-based similarity between the two connected words.
    Page 5, “Keyphrase Extraction Approaches”
  3. As discussed before, the relationship between two candidates is traditionally established using co-occurrence information.
    Page 9, “Analysis”
  4. However, using co-occurrence windows has its shortcomings.
    Page 9, “Analysis”
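
The occurrences above note that candidate relatedness is traditionally established from co-occurrence within a window. A sketch of the counting step (the window size is illustrative); it also illustrates the shortcoming mentioned in occurrence 4: two related terms that never share a window receive a zero count:

```python
from collections import Counter

def cooccurrence_counts(tokens, window=5):
    """Count how often two distinct tokens co-occur within a sliding
    window, the traditional way of relating candidates (sketch)."""
    counts = Counter()
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + window, len(tokens))):
            pair = tuple(sorted((tokens[i], tokens[j])))
            if pair[0] != pair[1]:
                counts[pair] += 1
    return counts
```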


evaluation metrics

Appears in 4 sentences as: Evaluation Metrics (1) evaluation metrics (3)
  1. 4.1 Evaluation Metrics
    Page 6, “Evaluation”
  2. Designing evaluation metrics for keyphrase extraction is by no means an easy task.
    Page 6, “Evaluation”
  3. To score the output of a keyphrase extraction system, the typical approach, which is also adopted by the SemEval-2010 shared task on keyphrase extraction, is (1) to create a mapping between the keyphrases in the gold standard and those in the system output using exact match, and then (2) score the output using evaluation metrics such as precision (P), recall (R), and F-score (F).
    Page 6, “Evaluation”
  4. For this reason, researchers have experimented with two types of automatic evaluation metrics.
    Page 6, “Evaluation”
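
Occurrence 3 above spells out the standard exact-match scoring protocol. A sketch, assuming case-insensitive string matching (the lowercasing is an assumption; the protocol itself only requires exact match between gold and system keyphrases):

```python
def score_exact_match(gold, predicted):
    """Exact-match precision, recall, and F-score over keyphrase sets,
    as in the SemEval-2010 setup described above (sketch)."""
    gold_set = {g.lower() for g in gold}
    pred_set = {p.lower() for p in predicted}
    correct = len(gold_set & pred_set)
    p = correct / len(pred_set) if pred_set else 0.0
    r = correct / len(gold_set) if gold_set else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```

The weakness of this protocol, which motivates the automatic metrics in occurrence 4, is that a near-miss such as "neural net" vs. "neural network" scores zero.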


language model

Appears in 4 sentences as: language model (2) Language Modeling (1) language models (2)
  1. 3.3.4 Language Modeling
    Page 6, “Keyphrase Extraction Approaches”
  2. These feature values are estimated using language models (LMs) trained on a foreground corpus and a background corpus.
    Page 6, “Keyphrase Extraction Approaches”
  3. In sum, LMA uses a language model rather than heuristics to identify phrases, and relies on the language model trained on the background corpus to determine how “unique” a candidate keyphrase is to the domain represented by the foreground corpus.
    Page 6, “Keyphrase Extraction Approaches”
  4. While the use of language models to identify phrases cannot be considered a major strength of this approach (because heuristics can identify phrases fairly reliably), the use of a background corpus to identify candidates that are unique to the foreground’s domain is a unique aspect of this approach.
    Page 6, “Keyphrase Extraction Approaches”


semantic relatedness

Appears in 4 sentences as: Semantic relatedness (1) semantic relatedness (2) semantically related (1)
  1. While the aforementioned external resource-based features attempt to encode how salient a candidate keyphrase is, Turney (2003) proposes features that encode the semantic relatedness between two candidate keyphrases.
    Page 4, “Keyphrase Extraction Approaches”
  2. Noting that candidate keyphrases that are not semantically related to the predicted keyphrases are unlikely to be keyphrases in technical reports, Turney employs coherence features to identify such candidate keyphrases.
    Page 4, “Keyphrase Extraction Approaches”
  3. Semantic relatedness is encoded in the coherence features as two candidate keyphrases’ pointwise mutual information, which Turney computes by using the Web as a corpus.
    Page 4, “Keyphrase Extraction Approaches”
  4. Researchers have computed relatedness between candidates using co-occurrence counts (Mihalcea and Tarau, 2004; Matsuo and Ishizuka, 2004) and semantic relatedness (Grineva et al., 2009), and represented the relatedness information collected from a document as a graph (Mihalcea and Tarau, 2004; Wan and Xiao, 2008a; Wan and Xiao, 2008b; Bougouin et al., 2013).
    Page 4, “Keyphrase Extraction Approaches”
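
Occurrence 3 above states that the coherence features encode semantic relatedness as pointwise mutual information between two candidates. A sketch from raw counts; the parameter names are illustrative, and Turney estimated such counts from Web hit counts rather than a local corpus:

```python
import math

def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information between two candidate keyphrases,
    estimated from occurrence and co-occurrence counts (sketch)."""
    p_xy = count_xy / total   # joint probability of co-occurrence
    p_x = count_x / total
    p_y = count_y / total
    # PMI > 0: candidates co-occur more than chance; 0: independent.
    return math.log2(p_xy / (p_x * p_y))
```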


edge weight

Appears in 3 sentences as: edge weight (3)
  1. The edge weight is proportional to the syntactic and/or semantic relevance between the connected candidates.
    Page 4, “Keyphrase Extraction Approaches”
  2. An edge weight in a SW graph denotes the word’s importance in the sentence in which it appears.
    Page 5, “Keyphrase Extraction Approaches”
  3. Finally, an edge weight in a WW graph denotes the co-occurrence or knowledge-based similarity between the two connected words.
    Page 5, “Keyphrase Extraction Approaches”


LM

Appears in 3 sentences as: LM (8)
  1. A unigram LM and an n-gram LM are constructed for each of these two corpora.
    Page 6, “Keyphrase Extraction Approaches”
  2. Phraseness, defined using the foreground LM, is calculated as the loss of information incurred as a result of assuming a unigram LM (i.e., conditional independence among the words of the phrase) instead of an n-gram LM (i.e., the phrase is drawn from an n-gram LM).
    Page 6, “Keyphrase Extraction Approaches”
  3. Informativeness is computed as the loss that results because of the assumption that the candidate is sampled from the background LM rather than the foreground LM.
    Page 6, “Keyphrase Extraction Approaches”
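
The occurrences above define phraseness and informativeness as information losses between language models. A minimal pointwise sketch, assuming each loss is approximated by a log-probability ratio for a single candidate; the actual systems derive these probabilities from LMs trained on the foreground and background corpora:

```python
import math

def phraseness(p_phrase_fg, unigram_probs_fg):
    """Phraseness (sketch): information gained by modeling the phrase as
    a unit under the foreground n-gram LM rather than as independent
    unigrams (the loss incurred by the unigram assumption)."""
    indep = 1.0
    for p in unigram_probs_fg:   # unigram probability of each word
        indep *= p
    return math.log2(p_phrase_fg / indep)

def informativeness(p_phrase_fg, p_phrase_bg):
    """Informativeness (sketch): the loss from assuming the candidate is
    sampled from the background LM rather than the foreground LM; high
    values mark candidates unique to the foreground domain."""
    return math.log2(p_phrase_fg / p_phrase_bg)
```

Under this reading, a good keyphrase scores high on both: its words cohere as a unit (phraseness) and it is much more probable in the domain corpus than in general text (informativeness).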


shared task

Appears in 3 sentences as: shared task (3)
  1. To score the output of a keyphrase extraction system, the typical approach, which is also adopted by the SemEval-2010 shared task on keyphrase extraction, is (1) to create a mapping between the keyphrases in the gold standard and those in the system output using exact match, and then (2) score the output using evaluation metrics such as precision (P), recall (R), and F-score (F).
    Page 6, “Evaluation”
  2. For example, KP-Miner (El-Beltagy and Rafea, 2010), an unsupervised system, ranked third in the SemEval-2010 shared task with an F-score of 25.2, which is comparable to the best supervised system scoring 27.5.
    Page 7, “Evaluation”
  3. A more detailed analysis of the results of the SemEval-2010 shared task and the approaches adopted by the participating systems can be found in Kim et al.
    Page 7, “Analysis”
