Abstract | This paper studies the use of topic models to automatically construct semantic classes, taking as source data a collection of raw semantic classes (RASCs) extracted by applying predefined patterns to web pages. |
Abstract | To apply topic models, we treat RASCs as “documents”, items as “words”, and the final semantic classes as “topics”. |
Abstract | Appropriate preprocessing and postprocessing are performed to improve result quality, reduce computation cost, and tackle the fixed-k constraint of a typical topic model. |
Introduction | In this paper, we propose to use topic models to address the problem. |
Introduction | In some topic models, a document is modeled as a mixture of hidden topics. |
Introduction | Topic modeling provides a formal and convenient way of dealing with multi-membership, which is our primary motivation for adopting topic models here. |
Hierarchical Topic Models 3.1 Latent Dirichlet Allocation | In the topic modeling literature, attributes are words and attribute sets are documents. |
Introduction | In this paper, we show that both of these goals can be realized jointly using a probabilistic topic model, namely hierarchical Latent Dirichlet Allocation (LDA) (Blei et al., 2003b). |
Introduction | There are three main advantages to using a topic model as the annotation procedure: (1) unlike hierarchical clustering (Duda et al., 2000), the attribute distribution at a concept node is not composed of the distributions of its children, so attributes specific to the concept Painter need not appear in the distribution for Person, making the internal distribution at each concept more meaningful; (2) since LDA is fully Bayesian, its model semantics allow additional prior information to be included, unlike standard models such as Latent Semantic Analysis (Hofmann, 1999), improving annotation precision; (3) attributes with multiple related meanings (i.e., polysemous attributes) are modeled implicitly: if an attribute (e.g., “style”) occurs in two separate input classes (e.g., poets and car models), it may attach at two different concepts in the ontology, which is better than attaching it at their most specific common ancestor (Whole) if that ancestor is too general to be useful. |
Introduction | The remainder of this paper is organized as follows: §2 describes the full ontology annotation framework, §3 introduces the LDA-based topic models, §4 gives the experimental setup, §5 gives results, §6 gives related work, and §7 concludes. |