Word or Phrase? Learning Which Unit to Stress for Information Retrieval
Song, Young-In and Lee, Jung-Tae and Rim, Hae-Chang

Article Structure

Abstract

The use of phrases in retrieval models has been proven helpful in the literature, but no particular line of research addresses the problem of discriminating phrases that are likely to degrade retrieval performance from those that do not.

Introduction

Various studies have improved the quality of information retrieval by relaxing the traditional ‘bag-of-words’ assumption with the use of phrases.

Previous Work

To date, there have been numerous studies on utilizing phrases in retrieval models.

Proposed Method

In this section, we present a phrase-based retrieval framework that utilizes both words and phrases effectively in ranking.

Experiments

In this section, we report the retrieval performances of the proposed method with appropriate baselines over a range of training sets.

Conclusion

In this paper, we present a novel method to differentiate the impact of each phrase in retrieval according to its relative contribution over its constituent words.

Topics

phrase-based

Appears in 12 sentences as: Phrase-based (1) phrase-based (11)
In Word or Phrase? Learning Which Unit to Stress for Information Retrieval
  1. We also present useful features that reflect the compositionality and discriminative power of a phrase and its constituent words for optimizing the weights of phrase use in phrase-based retrieval models.
    Page 1, “Abstract”
  2. Our approach to phrase-based retrieval is motivated by the following linguistic intuitions: a) phrases have relatively different degrees of significance, and b) the influence of a phrase should be differentiated based on the phrase’s constituents in retrieval models.
    Page 1, “Introduction”
  3. One of the earliest works on phrase-based retrieval was done by Fagan (1987).
    Page 2, “Previous Work”
  4. In many cases, early research on phrase-based retrieval focused only on extracting phrases, without addressing how to devise a retrieval model that effectively considers both words and phrases in ranking.
    Page 2, “Previous Work”
  5. While a phrase-based approach selectively incorporates potentially useful relations between words, the probabilistic approaches are forced to estimate parameters for all possible combinations of words in text.
    Page 2, “Previous Work”
  6. Our study is clearly distinguished from previous phrase-based approaches; we differentiate the influence of each phrase according to its constituent words, instead of allowing equal influence for all phrases.
    Page 2, “Previous Work”
  7. In this section, we present a phrase-based retrieval framework that utilizes both words and phrases effectively in ranking.
    Page 2, “Proposed Method”
  8. 3.1 Basic Phrase-based Retrieval Model
    Page 3, “Proposed Method”
  9. We start out by presenting a simple phrase-based language modeling retrieval model that assumes uniform contribution of words and phrases (a minimal sketch of such a model follows this list).
    Page 3, “Proposed Method”
  10. Given a phrase-based retrieval model that utilizes both words and phrases, a fundamental question arises: how much weight should be given to the phrase information compared to the word information?
    Page 3, “Proposed Method”
  11. For these reasons, we propose to adapt a learning-to-rank framework that optimizes the multiple parameters of phrase-based retrieval models effectively, with less computational cost and without committing to any specific retrieval metric (a hypothetical pairwise sketch follows the scoring example below).
    Page 3, “Proposed Method”
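
To make the uniform-contribution baseline in sentence 9 concrete, here is a minimal sketch of a score that mixes a word language model and a phrase language model with a single global weight. This is an illustration, not the paper's exact formulation; the Jelinek-Mercer smoothing and all names (`jm_prob`, `lam`, the frequency dictionaries) are our assumptions.

```python
import math

def jm_prob(term, tf, dlen, ctf, clen, beta=0.5):
    """Jelinek-Mercer smoothed P(term | document): a mixture of the
    document model and the collection model (assumes term occurs in C)."""
    p_doc = tf.get(term, 0) / dlen if dlen else 0.0
    return (1.0 - beta) * p_doc + beta * (ctf.get(term, 0) / clen)

def uniform_phrase_score(q_words, q_phrases, d_words, d_phrases,
                         c_words, c_phrases, lam=0.1):
    """(1 - lam) * word log-likelihood + lam * phrase log-likelihood;
    every phrase receives the same global weight `lam`.
    d_*/c_* are term-frequency dicts for the document and collection."""
    word_ll = sum(math.log(jm_prob(w, d_words, sum(d_words.values()),
                                   c_words, sum(c_words.values())))
                  for w in q_words)
    phrase_ll = sum(math.log(jm_prob(p, d_phrases, sum(d_phrases.values()),
                                     c_phrases, sum(c_phrases.values())))
                    for p in q_phrases)
    return (1.0 - lam) * word_ll + lam * phrase_ll
```

The single `lam` is exactly what sentences 10 and 11 call into question: under this baseline, an effective phrase and a harmful one receive the same weight.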
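And a hypothetical sketch of the learning-to-rank idea in sentence 11: give each phrase its own weight as a logistic function of its feature vector (e.g., compositionality cues), and update the feature weights `theta` so that a relevant document scores above a non-relevant one under a pairwise logistic loss. The loss choice and all names are assumptions, not the paper's actual training procedure.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def weighted_score(word_ll, phrase_ll, feats, theta):
    """word_ll: summed word log-likelihood of the document;
    phrase_ll: {phrase: log P(phrase | D)}; feats: {phrase: feature vector};
    each phrase contributes with its own weight sigmoid(theta . f)."""
    return word_ll + sum(sigmoid(dot(theta, feats[p])) * ll
                         for p, ll in phrase_ll.items())

def pairwise_step(theta, pos, neg, feats, lr=0.01):
    """One gradient step on log(1 + exp(-(score(D+) - score(D-)))) for a
    relevant/non-relevant pair; pos and neg are (word_ll, phrase_ll)."""
    delta = (weighted_score(pos[0], pos[1], feats, theta)
             - weighted_score(neg[0], neg[1], feats, theta))
    g = sigmoid(-delta)  # magnitude of the loss gradient w.r.t. delta
    for p, f in feats.items():
        lam = sigmoid(dot(theta, f))
        dll = pos[1].get(p, 0.0) - neg[1].get(p, 0.0)
        for i in range(len(theta)):
            theta[i] += lr * g * lam * (1.0 - lam) * dll * f[i]
```

Because the update needs only score differences over document pairs, it avoids sweeping a grid of weights per query and requires no differentiable retrieval metric, which matches the motivation stated in sentence 11.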

bigram

Appears in 5 sentences as: bigram (4) bigrams (1)
In Word or Phrase? Learning Which Unit to Stress for Information Retrieval
  1. Phrase extraction and indexing: We evaluate our proposed method on two different types of phrases: syntactic head-modifier pairs (syntactic phrases) and simple bigram phrases (statistical phrases).
    Page 5, “Experiments”
  2. Since our method is not limited to a particular type of phrase, we have also conducted experiments on statistical phrases (bigrams) with the reduced set of directly applicable features (RMO, RSO, PD5, DF, and CPP); the features requiring linguistic preprocessing (e.g., PPT) are not used, because it is unrealistic to use them in a bigram-based retrieval setting.
    Page 7, “Experiments”
  3. In most cases, the distance between words in a bigram is 1, but it can sometimes be more than 1 because of stopword removal (illustrated in the sketch after this list).
    Page 7, “Experiments”
  4. The uncertainty of preferred distance does not vary much for bigram phrases.
    Page 8, “Experiments”
  5. In our observation, the bigram phrases also show very similar behavior in retrieval; some of them are very effective, while others can degrade the performance of retrieval models.
    Page 8, “Experiments”
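
A small illustrative sketch of the point in sentence 3 (the stopword list and all names are ours, not the paper's): when bigrams are formed over the stopword-filtered stream but positions are tracked in the original token sequence, the surface distance between a bigram's words can exceed 1.

```python
STOPWORDS = {"the", "of", "a", "an", "in", "and", "to"}  # toy list

def bigram_phrases(tokens):
    """Pair up adjacent non-stopwords, keeping original positions so the
    surface distance is recoverable (1 normally, larger across stopwords)."""
    kept = [(i, t.lower()) for i, t in enumerate(tokens)
            if t.lower() not in STOPWORDS]
    return [((w1, w2), j - i) for (i, w1), (j, w2) in zip(kept, kept[1:])]

# "rate of decline" yields the bigram ('rate', 'decline') at distance 2
print(bigram_phrases(["rate", "of", "decline"]))
```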

language modeling

Appears in 4 sentences as: language modeling (4)
In Word or Phrase? Learning Which Unit to Stress for Information Retrieval
  1. These approaches have focused on modeling statistical or syntactic phrasal relations under the language modeling approach to information retrieval.
    Page 2, “Previous Work”
  2. (Srikanth and Srihari, 2003; Maisonnasse et al., 2005) examined the effectiveness of syntactic relations in a query by using a language modeling framework.
    Page 2, “Previous Work”
  3. (Song and Croft, 1999; Miller et al., 1999; Gao et al., 2004; Metzler and Croft, 2005) investigated the effectiveness of the language modeling approach in modeling statistical phrases such as n-grams or proximity-based phrases.
    Page 2, “Previous Work”
  4. We start out by presenting a simple phrase-based language modeling retrieval model that assumes uniform contribution of words and phrases (the word-based query likelihood underlying such models is sketched after this list).
    Page 3, “Proposed Method”
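
For readers unfamiliar with the language modeling approach these sentences refer to, here is a minimal word-based query-likelihood scorer with Dirichlet smoothing, a standard formulation in this literature; the parameter value and names are illustrative, and this is the unigram component that phrase-based extensions build on.

```python
import math
from collections import Counter

def query_likelihood(query, doc, collection, mu=2000.0):
    """Dirichlet-smoothed unigram query likelihood:
    log P(Q|D) = sum_w log((tf(w, D) + mu * P(w|C)) / (|D| + mu))."""
    doc_tf, coll_tf = Counter(doc), Counter(collection)
    score = 0.0
    for w in query:
        p_c = coll_tf[w] / len(collection)
        if doc_tf[w] == 0 and p_c == 0.0:
            continue  # term unseen everywhere; skip rather than log(0)
        score += math.log((doc_tf[w] + mu * p_c) / (len(doc) + mu))
    return score
```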

n-grams

Appears in 3 sentences as: n-grams (3)
In Word or Phrase? Learning Which Unit to Stress for Information Retrieval
  1. (Miller et al., 1999; Song and Croft, 1999) explore the use of n-grams in retrieval models.
    Page 1, “Introduction”
  2. Subsequently, various types of phrases, such as sequential n-grams (Mitra et al., 1997), head-modifier pairs extracted from syntactic structures (Lewis and Croft, 1990; Zhai, 1997; Dillon and Gray, 1983; Strzalkowski et al., 1994), and proximity-based phrases (Turpin and Moffat, 1999), were examined with conventional retrieval models (e.g., the vector space model).
    Page 2, “Previous Work”
  3. (Song and Croft, 1999; Miller et al., 1999; Gao et al., 2004; Metzler and Croft, 2005) investigated the effectiveness of language modeling approach in modeling statistical phrases such as n-grams or proximity-based phrases.
    Page 2, “Previous Work”

significantly outperforms

Appears in 3 sentences as: significantly outperforms (3)
In Word or Phrase? Learning Which Unit to Stress for Information Retrieval
  1. As shown in the table, the multi-parameter model improves by approximately 18% and 12% on the TREC-6 and 7 partial query sets, and it also significantly outperforms both the word model and the one-parameter model on the TREC-8 query set.
    Page 6, “Experiments”
  2. For both opposite cases, the multi-parameter model significantly outperforms the one-parameter model.
    Page 7, “Experiments”
  3. Note that the multi-parameter model significantly outperforms the one-parameter model and all manually set λs for the queries ‘declining birth rate’ and ‘Amazon rain forest’, which also have one effective phrase, ‘rain forest’, and one non-effective phrase, ‘Amazon forest’.
    Page 7, “Experiments”
