A Generative Blog Post Retrieval Model that Uses Query Expansion based on External Collections
Weerkamp, Wouter and Balog, Krisztian and de Rijke, Maarten

Article Structure

Abstract

User generated content is characterized by short, noisy documents, with many spelling errors and unexpected language usage.

Introduction

One of the grand challenges in information retrieval is to bridge the vocabulary gap between a user and her information need on the one hand and the relevant documents on the other (Baeza-Yates and Ribeiro-Neto, 1999).

Related Work

Related work comes in two main flavors: (i) query modeling in general, and (ii) query expansion using external sources (external expansion).

Retrieval Framework

We work in the setting of generative language models.

Query Modeling Approach

Our goal is to build an expanded query model that combines evidence from multiple external collections.

Estimating Components

The models introduced above offer us several choices in estimating the main components.

Experimental Setup

In this section we detail our experimental setup: the (external) collections we use, the topic sets and relevance judgements available, and the significance testing we perform.

Results

We first discuss the parameter tuning for our four EEM models in Section 7.1.

Discussion

Rather than providing a pairwise comparison of all runs listed in the previous section, we consider two pairwise comparisons, between (an instantiation of) our model and the baseline, and between two instantiations of our model, and highlight phenomena that we also observed in other pairwise comparisons.

Conclusions

We explored the use of external corpora for query expansion in a user generated content setting.

Topics

language modeling

Appears in 6 sentences as: language model (1) language modeling (4) language models (3)
In A Generative Blog Post Retrieval Model that Uses Query Expansion based on External Collections
  1. In the setting of language modeling approaches to query expansion, the local analysis idea has been instantiated by estimating additional query language models (Lafferty and Zhai, 2003; Tao and Zhai, 2006) or relevance models (Lavrenko and Croft, 2001) from a set of feedback documents.
    Page 2, “Related Work”
  2. (2005) also try to uncover multiple aspects of a query, and to that end they provide an iterative “pseudo-query” generation technique, using cluster-based language models.
    Page 2, “Related Work”
  3. Diaz and Metzler (2006) were the first to give a systematic account of query expansion using an external corpus in a language modeling setting, to improve the estimation of relevance models.
    Page 2, “Related Work”
  4. We work in the setting of generative language models.
    Page 3, “Retrieval Framework”
  5. Within the language modeling approach, one builds a language model from each document, and ranks documents based on the probability of the document model generating the query.
    Page 3, “Retrieval Framework”
  6. The particulars of the language modeling approach have been discussed extensively in the literature (see, e.g., Balog et al.
    Page 3, “Retrieval Framework”
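The ranking idea in sentence 5 above (build a language model from each document and rank documents by the probability that the document model generates the query) can be sketched as follows. This is a minimal illustration and not the paper's implementation: the function names `query_log_likelihood` and `rank`, the toy documents, and the choice of Jelinek-Mercer smoothing with weight `lam` are all assumptions made for the example.

```python
import math
from collections import Counter


def query_log_likelihood(query_terms, doc_terms, coll_counts, coll_len, lam=0.5):
    """Return log P(query | document model) under a unigram model.

    Assumption: the document model is smoothed with the collection model
    by linear interpolation (Jelinek-Mercer), weight `lam` on the collection.
    """
    doc_counts = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for term in query_terms:
        p_doc = doc_counts[term] / doc_len if doc_len else 0.0
        p_coll = coll_counts[term] / coll_len
        p = (1.0 - lam) * p_doc + lam * p_coll
        if p == 0.0:
            # The term occurs nowhere in the collection.
            return float("-inf")
        score += math.log(p)
    return score


def rank(query_terms, docs, lam=0.5):
    """Rank documents (dict: id -> list of terms) by query log-likelihood."""
    coll_counts = Counter(t for terms in docs.values() for t in terms)
    coll_len = sum(coll_counts.values())
    if coll_len == 0:
        return []
    scored = [
        (doc_id, query_log_likelihood(query_terms, terms, coll_counts, coll_len, lam))
        for doc_id, terms in docs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy collection, for illustration only.
docs = {
    "d1": ["blog", "post", "retrieval"],
    "d2": ["query", "expansion", "query"],
    "d3": ["cooking", "recipes"],
}
ranking = rank(["query", "expansion"], docs)
```

Here `ranking` lists `d2` first, because it is the only document that contains the query terms; the other two documents receive probability mass for those terms only through the collection model.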

See all papers in Proc. ACL 2009 that mention language modeling.


general model

Appears in 5 sentences as: general model (2) generative model (2) generative models (1)
In A Generative Blog Post Retrieval Model that Uses Query Expansion based on External Collections
  1. We propose a generative model for expanding queries using external collections in which dependencies between queries, documents, and expansion documents are explicitly modeled.
    Page 1, “Abstract”
  2. Our aim in this paper is to define and evaluate generative models for expanding queries using external collections.
    Page 1, “Introduction”
  3. As will become clear in §4, Diaz and Metzler’s approach is an instantiation of our general model for external expansion.
    Page 2, “Related Work”
  4. We are driven by the same motivation, but where they considered rank-based result combinations and simple mixtures of query models, we take a more principled and structured approach, and develop four versions of a generative model for query expansion using external collections.
    Page 3, “Related Work”
  5. Theoretically, the main difference between these two instantiations of our general model is that EEM3 makes much stronger simplifying independence assumptions than EEM1.
    Page 7, “Discussion”
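Sentence 4 above contrasts the paper's generative model with "simple mixtures of query models". As a rough sketch of what such a simple mixture could look like (explicitly not the paper's EEM models, whose estimation is not given in this excerpt), the code below interpolates an original query model with term distributions estimated from several external collections. The function name `mix_query_models`, the collection labels `news` and `wiki`, all probabilities, and the weights are hypothetical.

```python
def mix_query_models(original, expansions, weights, lam=0.5):
    """Combine query models by linear interpolation.

    P(t | expanded q) = lam * P(t | q)
                        + (1 - lam) * sum_c w_c * P(t | q, c)

    original:   dict term -> probability (original query model)
    expansions: dict collection -> dict term -> probability
    weights:    dict collection -> non-negative weight (normalized here)
    """
    total = sum(weights[c] for c in expansions)
    if total <= 0:
        raise ValueError("collection weights must sum to a positive value")
    mixed = {term: lam * p for term, p in original.items()}
    for coll, model in expansions.items():
        w = weights[coll] / total
        for term, p in model.items():
            mixed[term] = mixed.get(term, 0.0) + (1.0 - lam) * w * p
    return mixed


# Hypothetical inputs, for illustration only.
original = {"obama": 0.5, "speech": 0.5}
expansions = {
    "news": {"obama": 0.4, "president": 0.6},
    "wiki": {"speech": 0.2, "address": 0.8},
}
weights = {"news": 0.75, "wiki": 0.25}
expanded = mix_query_models(original, expansions, weights, lam=0.5)
```

If every input distribution sums to one, the mixed model does too, so it can be used directly as a query model in a language modeling ranker.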
