Are Semantically Coherent Topic Models Useful for Ad Hoc Information Retrieval?
Deveaud, Romain and SanJuan, Eric and Bellot, Patrice

Article Structure

Abstract

Current topic modeling approaches for Information Retrieval do not allow query-oriented latent topics to be modeled explicitly.

Introduction

Representing documents as mixtures of “topics” has always been a challenge and an objective for researchers working in text-related fields.

Topic-Driven Relevance Models

2.1 Relevance Models

Evaluation

3.1 Experimental setup

Conclusions & Future Work

Overall, learning query-oriented topic models and estimating the feedback query model from these topics greatly improves ad hoc Information Retrieval compared to state-of-the-art relevance models.

Topics

topic models

Appears in 20 sentences as: topic model (4), topic modeling (2), topic models (14)
In Are Semantically Coherent Topic Models Useful for Ad Hoc Information Retrieval?
  1. Current topic modeling approaches for Information Retrieval do not allow query-oriented latent topics to be modeled explicitly.
    Page 1, “Abstract”
  2. We propose a model-based feedback approach that learns Latent Dirichlet Allocation topic models on the top-ranked pseudo-relevant feedback, and we measure the semantic coherence of those topics.
    Page 1, “Abstract”
  3. Based on the words used within a document, topic models learn topic-level relations by assuming that the document covers a small set of concepts.
    Page 1, “Introduction”
  4. This is one of the reasons for the intensive use of topic models (and especially LDA) in current research in Natural Language Processing (NLP) related areas.
    Page 1, “Introduction”
  5. From that perspective, topic models seem attractive in the sense that they can provide a descriptive and intuitive representation of concepts.
    Page 1, “Introduction”
  6. Recently, researchers have developed measures that evaluate the semantic coherence of topic models (Newman et al., 2010; Mimno et al., 2011; Stevens et al., 2012); a sketch of one such measure follows this list.
    Page 1, “Introduction”
  7. Several studies concentrated on improving the quality of document ranking using topic models, especially probabilistic ones.
    Page 1, “Introduction”
  8. The reported results suggest that the words and the probability distributions learned by probabilistic topic models are effective for query expansion.
    Page 1, “Introduction”
  9. We introduce Topic-Driven Relevance Models, a model-based feedback approach (Zhai and Lafferty, 2001) for integrating topic models into relevance models by learning topics on pseudo-relevant feedback documents (as opposed to the entire document collection).
    Page 2, “Introduction”
  10. Instead of viewing Θ as a set of document language models that are likely to contain topical information about the query, we take a probabilistic topic modeling approach.
    Page 2, “Topic-Driven Relevance Models”
  11. Figure 1: Semantic coherence of the topic models as a function of the number N of feedback documents.
    Page 3, “Topic-Driven Relevance Models”
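
Sentence 6 above cites measures of semantic coherence. As a rough illustration, here is a minimal sketch of the "UMass" coherence score of Mimno et al. (2011) in Python; the function name and the corpus representation (documents as lists of tokens) are our own illustrative assumptions, not the paper's implementation.

    from math import log
    from itertools import combinations

    def umass_coherence(top_words, documents):
        # UMass coherence (Mimno et al., 2011): for every pair of a topic's
        # top words, add log((co-document frequency + 1) / document frequency
        # of the more probable word). Scores closer to zero are more coherent.
        doc_sets = [set(doc) for doc in documents]
        def df(w):            # number of documents containing w
            return sum(1 for d in doc_sets if w in d)
        def co_df(u, v):      # number of documents containing both u and v
            return sum(1 for d in doc_sets if u in d and v in d)
        score = 0.0
        # top_words is assumed sorted by descending P(w | topic), so each
        # earlier word l occurs in at least one document of the corpus.
        for l, m in combinations(range(len(top_words)), 2):
            score += log((co_df(top_words[m], top_words[l]) + 1) / df(top_words[l]))
        return score

In TDRM, such a score would be computed over the feedback documents on which the topics were learned, which is the quantity the Figure 1 caption above plots against N.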

LDA

Appears in 8 sentences as: LDA (8)
In Are Semantically Coherent Topic Models Useful for Ad Hoc Information Retrieval?
  1. Latent Semantic Indexing (Deerwester et al., 1990) (LSI), probabilistic Latent Semantic Analysis (Hofmann, 2001) (pLSA), and Latent Dirichlet Allocation (Blei et al., 2003) (LDA) are the best-known approaches that have tackled this problem over the years.
    Page 1, “Introduction”
  2. This is one of the reasons for the intensive use of topic models (and especially LDA) in current research in Natural Language Processing (NLP) related areas.
    Page 1, “Introduction”
  3. The approach by Wei and Croft (2006) was the first to leverage LDA topics to improve the estimate of document language models and achieved good empirical results.
    Page 1, “Introduction”
  4. Following this pioneering work, several studies explored the use of pLSA and LDA under different experimental settings (Park and Ramamohanarao, 2009; Yi and Allan, 2009; Andrzejewski and Buttler, 2011; Lu et al., 2011).
    Page 1, “Introduction”
  5. Our approach uses LDA and learns topics directly on a limited set of documents.
    Page 2, “Introduction”
  6. We specifically focus on Latent Dirichlet Allocation (LDA), since it is currently one of the most representative topic models.
    Page 2, “Topic-Driven Relevance Models”
  7. In LDA, each topic multinomial distribution φ_k is generated by a conjugate Dirichlet prior with parameter β, while each document multinomial distribution θ_d is generated by a conjugate Dirichlet prior with parameter α (a toy sampler of this generative process follows this list).
    Page 2, “Topic-Driven Relevance Models”
  8. TDRM relies on two important parameters: the number of topics K that we want to learn, and the number of feedback documents N from which LDA learns the topics.
    Page 3, “Topic-Driven Relevance Models”
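
Sentence 7 above spells out LDA's conjugate priors. As a concrete illustration, the toy sampler below draws a synthetic corpus from exactly that generative story; the function name and the default values of alpha, beta, and seed are illustrative assumptions, not values from the paper.

    import numpy as np

    def lda_generate(n_docs, doc_len, K, V, alpha=0.1, beta=0.01, seed=0):
        # Each topic phi_k ~ Dirichlet(beta) over a V-word vocabulary, and
        # each document theta_d ~ Dirichlet(alpha) over the K topics.
        rng = np.random.default_rng(seed)
        phi = rng.dirichlet(np.full(V, beta), size=K)  # K topic-word distributions
        docs = []
        for _ in range(n_docs):
            theta = rng.dirichlet(np.full(K, alpha))   # document-topic distribution
            z = rng.choice(K, size=doc_len, p=theta)   # topic assignment per token
            docs.append([rng.choice(V, p=phi[k]) for k in z])  # word draws
        return docs, phi

Inference runs this story in reverse, recovering phi and theta from the observed words; in TDRM, the number of topics K and the number of feedback documents N on which they are learned are the two free parameters mentioned in sentence 8.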

language model

Appears in 4 sentences as: language model (2), language models (2)
In Are Semantically Coherent Topic Models Useful for Ad Hoc Information Retrieval?
  1. The approach by Wei and Croft (2006) was the first to leverage LDA topics to improve the estimate of document language models and achieved good empirical results.
    Page 1, “Introduction”
  2. where Θ is a set of pseudo-relevant feedback documents and θ_D is the language model of document D. This notion of estimating a query model is…
    Page 2, “Topic-Driven Relevance Models”
  3. We tackle the null-probability problem by smoothing the document language model using the well-known Dirichlet smoothing (Zhai and Lafferty, 2004); a minimal sketch follows this list.
    Page 2, “Topic-Driven Relevance Models”
  4. Instead of viewing Θ as a set of document language models that are likely to contain topical information about the query, we take a probabilistic topic modeling approach.
    Page 2, “Topic-Driven Relevance Models”
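
Sentence 3 above relies on Dirichlet smoothing. A minimal sketch, assuming the document is available as a bag-of-words count dictionary and that p(w | C), the collection language model probability, is precomputed; the default mu = 2500 is a conventional choice, not a value reported in the paper.

    def dirichlet_smoothed(word, doc_counts, doc_len, p_w_collection, mu=2500):
        # p(w | theta_D) = (c(w, D) + mu * p(w | C)) / (|D| + mu)
        # (Zhai and Lafferty, 2004). Every word gets a nonzero probability,
        # which avoids the null probabilities mentioned in sentence 3.
        return (doc_counts.get(word, 0) + mu * p_w_collection) / (doc_len + mu)

Because p(w | C) is never zero for a vocabulary word, query words that are absent from a document no longer zero out that document's query likelihood.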

news articles

Appears in 4 sentences as: news articles (4)
In Are Semantically Coherent Topic Models Useful for Ad Hoc Information Retrieval?
  1. Robust04 is composed of 528,155 news articles coming from three newspapers and the FBIS.
    Page 3, “Evaluation”
  2. We hypothesize that the heterogeneous nature of the web allows modeling very different topics covering several aspects of the query, while news articles are contributions focused on a single subject.
    Page 4, “Evaluation”
  3. Although topics coming from news articles may be limited, they benefit from the rich vocabulary of professional writers who are trained to avoid repetition.
    Page 4, “Evaluation”
  4. While semantically coherent topic models do not seem to be effective in the context of a news article search task, they are a good indicator of effectiveness in the context of web search.
    Page 5, “Conclusions & Future Work”
