Employing Topic Models for Pattern-based Semantic Class Discovery
Zhang, Huibin and Zhu, Mingjie and Shi, Shuming and Wen, Ji-Rong

Article Structure

Abstract

A semantic class is a collection of items (words or phrases) that stand in a semantic peer or sibling relationship.

Introduction

Semantic class construction (Lin and Pantel, 2001; Pantel and Lin, 2002; Pasca, 2004; Shinzato and Torisawa, 2005; Ohshima et al., 2006) tries to discover the peer or sibling relationship among terms or phrases by organizing them into semantic classes.

Topic Models

In this section we briefly introduce the two widely used topic models adopted in this paper.

Our Approach

The source data of our approach is a collection (denoted as CR) of RASCs extracted by applying patterns to a large collection of web pages.

Experiments

4.1 Experimental Setup

Related Work

Several categories of work are related to ours.

Conclusions

We presented an approach that employs topic modeling for semantic class construction.

Topics

topic models

Appears in 62 sentences as: topic model (12) Topic modeling (2) topic modeling (23) Topic Models (1) topic models (27)
In Employing Topic Models for Pattern-based Semantic Class Discovery
  1. This paper studies the employment of topic models to automatically construct semantic classes, taking as the source data a collection of raw semantic classes (RASCs), which were extracted by applying predefined patterns to web pages.
    Page 1, “Abstract”
  2. To adopt topic models , we treat RASCs as “documents”, items as “words”, and the final semantic classes as “topics”.
    Page 1, “Abstract”
  3. Appropriate preprocessing and postprocessing are performed to improve result quality, to reduce computation cost, and to tackle the fixed-k constraint of a typical topic model.
    Page 1, “Abstract”
  4. In this paper, we propose to use topic models to address the problem.
    Page 2, “Introduction”
  5. In some topic models , a document is modeled as a mixture of hidden topics.
    Page 2, “Introduction”
  6. Topic modeling provides a formal and convenient way of dealing with multi-membership, which is our primary motivation for adopting topic models here.
    Page 2, “Introduction”
  7. To employ topic models , we treat RASCs as “documents”, items as “words”, and the final semantic classes as “topics”.
    Page 2, “Introduction”
  8. There are, however, several challenges in applying topic models to our problem.
    Page 2, “Introduction”
  9. Second, typical topic models require the number of topics (k) to be given.
    Page 2, “Introduction”
  10. For the first challenge, we choose to apply topic models to the RASCs containing an item q, rather than the whole RASC collection.
    Page 2, “Introduction”
  11. And then a postprocessing operation is performed to merge the results of topic models to generate the ultimate semantic classes.
    Page 2, “Introduction”

See all papers in Proc. ACL 2009 that mention topic models.

LDA

Appears in 14 sentences as: LDA (16) LDAS (1)
In Employing Topic Models for Pattern-based Semantic Class Discovery
  1. LDA (Blei et al., 2003): In LDA, the topic mixture is drawn from a conjugate Dirichlet prior that remains the same for all documents (Figure 1).
    Page 2, “Topic Models”
  2. Figure 1. Graphical model representation of LDA, from Blei et al.
    Page 2, “Topic Models”
  3. And at the same time, one document could be related to multiple topics in some topic models (e.g., pLSI and LDA ).
    Page 3, “Our Approach”
  4. Here we use LDA as an example to
    Page 3, “Our Approach”
  5. According to the assumption of LDA and our concept mapping in Table 3, a RASC (“document”) is viewed as a mixture of hidden semantic classes (“topics”).
    Page 4, “Our Approach”
  6. The number of topics k is assumed known and fixed in LDA .
    Page 4, “Our Approach”
  7. LDA: Our approach with LDA as the topic model.
    Page 6, “Experiments”
  8. The implementation of LDA is based on Blei’s code of variational EM for LDA.
    Page 6, “Experiments”
  9. Table 4 shows the average query processing time and result quality of the LDA approach, by varying frequency threshold h. Similar results are observed for the pLSI approach.
    Page 7, “Experiments”
  10. Time complexity and quality comparison among LDA approaches of different thresholds
    Page 7, “Experiments”
  11. LDA pLSI
    Page 7, “Experiments”
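
The excerpts above note that the paper's LDA implementation uses Blei's variational EM code. As an illustration only, the same model can be fit with collapsed Gibbs sampling, a different but standard inference method; everything below (hyperparameters alpha and beta, iteration count, toy data) is an assumption for the sketch, not the paper's setup:

```python
import random

def lda_gibbs(docs, k, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA over RASC 'documents'.
    Returns the top-5 'words' (items) for each of the k 'topics'
    (semantic classes), following the paper's concept mapping."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    wid = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * k for _ in docs]       # doc-topic counts
    nkw = [[0] * V for _ in range(k)]   # topic-word counts
    nk = [0] * k                        # tokens assigned to each topic
    z = []                              # topic assignment per token
    for d, doc in enumerate(docs):      # random initialization
        zd = []
        for w in doc:
            t = rng.randrange(k)
            zd.append(t)
            ndk[d][t] += 1
            nkw[t][wid[w]] += 1
            nk[t] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t, wi = z[d][i], wid[w]
                ndk[d][t] -= 1; nkw[t][wi] -= 1; nk[t] -= 1
                # full conditional p(z_i = t | z_-i, w), up to a constant
                weights = [(ndk[d][t2] + alpha) * (nkw[t2][wi] + beta)
                           / (nk[t2] + V * beta) for t2 in range(k)]
                r = rng.random() * sum(weights)
                acc = 0.0
                for t2 in range(k):
                    acc += weights[t2]
                    if r <= acc:
                        break
                z[d][i] = t2
                ndk[d][t2] += 1; nkw[t2][wi] += 1; nk[t2] += 1
    top = lambda t: sorted(range(V), key=lambda wi: -nkw[t][wi])[:5]
    return [[vocab[wi] for wi in top(t)] for t in range(k)]

# Toy per-query RASC subset: two latent classes (metals vs. colors).
docs = [["gold", "silver"], ["silver", "copper"],
        ["red", "blue"], ["blue", "green"]] * 3
topics = lda_gibbs(docs, k=2)
```

Note that k is fixed here, which is exactly the fixed-k constraint the paper's postprocessing works around.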

similarity measure

Appears in 3 sentences as: similarity measure (3)
In Employing Topic Models for Pattern-based Semantic Class Discovery
  1. One simple and straightforward similarity measure is the Jaccard
    Page 5, “Our Approach”
  2. Sim1 and Sim2 respectively mean Formula 3.1 and Formula 3.2 are used in postprocessing as the similarity measure between
    Page 7, “Experiments”
  3. Sim2 yields a larger performance improvement than Sim1, which demonstrates the effectiveness of the similarity measure in Formula 3.2.
    Page 8, “Experiments”
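
Excerpt 1 names the Jaccard coefficient as one candidate measure (the paper's Formulas 3.1 and 3.2 are not reproduced on this page). A minimal sketch of the Jaccard coefficient between two semantic classes, with hypothetical example data:

```python
def jaccard(c1, c2):
    """Jaccard similarity between two semantic classes, each treated
    as a set of items: |intersection| / |union|."""
    a, b = set(c1), set(c2)
    return len(a & b) / len(a | b) if a | b else 0.0

# Two overlapping classes, e.g. as produced by per-query topic modeling:
# intersection {gold, silver} has 2 items, union has 4, so similarity 0.5.
print(jaccard({"gold", "silver", "copper"}, {"gold", "silver", "iron"}))
```

In the paper's postprocessing, such a measure decides which per-query topic-model outputs get merged into one ultimate semantic class.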
