Paraphrasing Adaptation for Web Search Ranking
Wang, Chenguang and Duan, Nan and Zhou, Ming and Zhang, Ming

Article Structure

Abstract

Mismatch between queries and documents is a key issue for the web search task.

Introduction

Paraphrasing is an NLP technique that generates alternative expressions to convey the same meaning as the input text in different ways.

Paraphrasing for Web Search

In this section, we first summarize our paraphrase extraction approaches, and then describe our paraphrasing engine for the web search task from three aspects, including: 1) a search-oriented paraphrasing model; 2) an NDCG-based parameter optimization algorithm; 3) an enhanced ranking model with augmented features that are computed based on the extra knowledge provided by the paraphrase candidates of the original queries.
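To make the third aspect concrete, here is a minimal Python sketch of feature augmentation, assuming a toy `matching_features` extractor (the actual feature set is not specified here); the feature vector of the original query is extended with the same features computed on each paraphrase candidate.

    def matching_features(query, document):
        # Toy matching features (illustrative, not the paper's set):
        # term-overlap count and query-term coverage.
        q_terms = set(query.lower().split())
        d_terms = set(document.lower().split())
        overlap = len(q_terms & d_terms)
        return [float(overlap), overlap / max(len(q_terms), 1)]

    def augmented_features(query, paraphrase_candidates, document):
        # Features of the original query, extended with the same features
        # computed on each paraphrase candidate -- the "extra knowledge"
        # the enhanced ranking model consumes.
        features = matching_features(query, document)
        for q_prime in paraphrase_candidates:
            features.extend(matching_features(q_prime, document))
        return features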

Experiment

3.1 Data and Metric

Conclusion and Future Work

In this paper, we present an in-depth study on using paraphrasing for web search, which pays close attention to various aspects of the application, including the choice of model and the optimization technique.

Topics

human annotators

Appears in 5 sentences as: Human annotated (1) human annotated (1) human annotators (3)
In Paraphrasing Adaptation for Web Search Ranking
  1. Additionally, human annotated data can be used as high-quality paraphrases.
    Page 2, “Paraphrasing for Web Search”
  2. $Q_i$ is the $i$th query and $D_i^{Label} \subseteq D$ is a subset of documents, in which the relevance between $Q_i$ and each document is labeled by human annotators.
    Page 2, “Paraphrasing for Web Search”
  3. The relevance rating labeled by human annotators can be represented by five levels: “Perfect”, “Excellent”, “Good”, “Fair”, and “Bad”.
    Page 3, “Paraphrasing for Web Search”
  4. where $r_{D_Q}$ denotes a numerical relevance rating, labeled by human annotators, denoting the relevance between $Q$ and $D_Q$.
    Page 3, “Paraphrasing for Web Search” (an NDCG sketch using these labels follows this list)
  5. Human annotated data contains 0.3M synonym pairs from the WordNet dictionary.
    Page 3, “Experiment”
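As a rough illustration of how the five-level labels of items 3 and 4 feed the NDCG metric, the sketch below maps the levels to integer gains (the gain values are an assumption, not taken from the paper) and computes NDCG@k.

    import math

    # Assumed gain mapping for the five relevance levels.
    GAIN = {"Bad": 0, "Fair": 1, "Good": 2, "Excellent": 3, "Perfect": 4}

    def ndcg(ranked_labels, k=10):
        # NDCG@k: discounted cumulative gain of the given ranking,
        # normalized by the gain of the ideal (label-sorted) ranking.
        def dcg(labels):
            return sum((2 ** GAIN[lab] - 1) / math.log2(i + 2)
                       for i, lab in enumerate(labels[:k]))
        ideal = dcg(sorted(ranked_labels, key=GAIN.get, reverse=True))
        return dcg(ranked_labels) / ideal if ideal > 0 else 0.0

    # Example: a ranking that places the "Perfect" document third.
    print(ndcg(["Good", "Bad", "Perfect", "Fair"]))  # about 0.63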


feature weight

Appears in 4 sentences as: feature weight (2) feature weights (2)
In Paraphrasing Adaptation for Web Search Ranking
  1. We utilize minimum error rate training (MERT) (Och, 2003) to optimize feature weights of the paraphrasing model according to NDCG.
    Page 2, “Paraphrasing for Web Search”
  2. MERT is used to optimize feature weights of our linear-formed paraphrasing model.
    Page 2, “Paraphrasing for Web Search”
  3. $\hat{\lambda}_1^M = \arg\min_{\lambda_1^M} \left\{ \sum_{i=1}^{S} Err(D_i^{Label}, \hat{Q}_i; \lambda_1^M, R) \right\}$. The objective of MERT is to find the optimal feature weight vector $\hat{\lambda}_1^M$ that minimizes the error criterion $Err$ according to the NDCG scores of top-1 paraphrase candidates.
    Page 3, “Paraphrasing for Web Search” (a tuning sketch follows this list)
  4. $F = \{F_1, \ldots, F_K\}$ denotes a set of matching features that measure the matching degrees between $Q$ and $D_Q$, $F_k(Q, D_Q) \in F$ is the $k$th matching feature, and $\lambda_k$ is its corresponding feature weight.
    Page 3, “Paraphrasing for Web Search”
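The tuning loop of item 3 might look roughly like the sketch below; `candidates`, `rank`, and `ndcg` are assumed helpers, and a coordinate-wise grid search stands in for MERT's exact line search.

    def err(weights, dev_set, candidates, rank, ndcg):
        # Err over the dev set: 1 - NDCG of the documents ranked for the
        # top-1 paraphrase candidate chosen under the current weights.
        total = 0.0
        for query, labeled_docs in dev_set:
            best = max(candidates(query),
                       key=lambda c: sum(w * f for w, f in zip(weights, c["features"])))
            total += 1.0 - ndcg(rank(best["text"], labeled_docs))
        return total

    def tune(weights, dev_set, candidates, rank, ndcg,
             grid=tuple(x / 10.0 for x in range(-10, 11))):
        # Coordinate-wise grid search: optimize one weight at a time with
        # the others fixed, a crude stand-in for MERT's exact line search.
        for k in range(len(weights)):
            weights[k] = min(grid, key=lambda v: err(
                weights[:k] + [v] + weights[k + 1:],
                dev_set, candidates, rank, ndcg))
        return weights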


machine translation

Appears in 4 sentences as: machine translation (4)
In Paraphrasing Adaptation for Web Search Ranking
  1. Compared with these works, our paraphrasing engine alters queries in a similar way to statistical machine translation, with systematic tuning and decoding components.
    Page 1, “Introduction”
  2. Similar to statistical machine translation (SMT), given an input query Q, our paraphrasing engine generates paraphrase candidates based on a linear model.
    Page 2, “Paraphrasing for Web Search” (a decoding sketch follows this list)
  3. The bilingual corpus includes 5.1M sentence pairs from the NIST 2008 constrained track of the Chinese-to-English machine translation task.
    Page 3, “Experiment”
  4. Development data are generated based on the English references of the NIST 2008 constrained track of the Chinese-to-English machine translation task.
    Page 4, “Experiment”
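A toy decoder in this SMT spirit is sketched below; the paraphrase table, features, and weights are illustrative only, not the engine's actual components.

    # Tiny illustrative paraphrase table; real tables are mined from
    # bilingual corpora, monolingual data, and human-annotated synonyms.
    PARAPHRASE_TABLE = {
        "cheap": ["inexpensive", "low-cost"],
        "flights": ["airfares", "plane tickets"],
    }

    def generate_candidates(query, weights):
        # One-substitution decoding: replace a single word with a table
        # entry and score the result with a linear model over toy
        # features (a bias term and the change in query length).
        words = query.split()
        scored = []
        for i, w in enumerate(words):
            for sub in PARAPHRASE_TABLE.get(w, []):
                q_prime = " ".join(words[:i] + [sub] + words[i + 1:])
                feats = [1.0, float(len(q_prime.split()) - len(words))]
                score = sum(wt * f for wt, f in zip(weights, feats))
                scored.append((score, q_prime))
        return [q for _, q in sorted(scored, reverse=True)]

    print(generate_candidates("cheap flights to boston", [0.5, -0.2]))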


weight vector

Appears in 4 sentences as: weight vector (4)
In Paraphrasing Adaptation for Web Search Ranking
  1. $\hat{\lambda}_1^M = \arg\min_{\lambda_1^M} \left\{ \sum_{i=1}^{S} Err(D_i^{Label}, \hat{Q}_i; \lambda_1^M, R) \right\}$. The objective of MERT is to find the optimal feature weight vector $\hat{\lambda}_1^M$ that minimizes the error criterion $Err$ according to the NDCG scores of top-1 paraphrase candidates.
    Page 3, “Paraphrasing for Web Search”
  2. where $\hat{Q}_i$ is the best paraphrase candidate according to the paraphrasing model based on the weight vector $\lambda_1^M$, and $N(D_i^{Label}, \hat{Q}_i, R)$ is the NDCG score of $\hat{Q}_i$, computed on the documents ranked by $R$ for $\hat{Q}_i$ and the labeled document set $D_i^{Label}$ of $Q_i$.
    Page 3, “Paraphrasing for Web Search”
  3. How to learn the weight vector $\{\lambda_k\}_{k=1}^K$ is a standard learning-to-rank task.
    Page 3, “Paraphrasing for Web Search”
  4. The goal of learning is to find an optimal weight vector $\{\lambda_k\}_{k=1}^K$ such that for any two documents $D_Q^i \in D$ and $D_Q^j \in D$, the following condition holds: $R(Q, D_Q^i) > R(Q, D_Q^j) \Leftrightarrow r_{D_Q^i} > r_{D_Q^j}$.
    Page 3, “Paraphrasing for Web Search”
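In code, the condition in item 4 amounts to the check below, where each document is a (feature vector, rating) pair and `rank_score` is the linear model $R(Q, D_Q) = \sum_k \lambda_k F_k(Q, D_Q)$; the pair representation is an assumption for illustration.

    def rank_score(weights, features):
        # Linear ranking model R(Q, D_Q) = sum_k lambda_k * F_k(Q, D_Q).
        return sum(w * f for w, f in zip(weights, features))

    def pairwise_condition_holds(weights, doc_i, doc_j):
        # Each doc is (feature_vector, human_rating). A correctly learned
        # weight vector scores the higher-rated document higher.
        score_i = rank_score(weights, doc_i[0])
        score_j = rank_score(weights, doc_j[0])
        return (score_i > score_j) == (doc_i[1] > doc_j[1])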


development set

Appears in 3 sentences as: development set (3)
In Paraphrasing Adaptation for Web Search Ranking
  1. $\{Q_i, D_i^{Label}\}_{i=1}^S$ is a human-labeled development set.
    Page 2, “Paraphrasing for Web Search”
  2. We use the first 1,419 queries together with their annotated documents as the development set to tune paraphrasing parameters (as we discussed in Section 2.3), and use the rest as the test set.
    Page 3, “Experiment” (a split sketch follows this list)
  3. The ranking model is trained based on the development set.
    Page 3, “Experiment”
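The split described in item 2 reduces to a one-line slice; `labeled_queries` is an assumed list of (query, annotated documents) pairs.

    def split_dev_test(labeled_queries, dev_size=1419):
        # First 1,419 queries (with their annotated documents) tune the
        # paraphrasing parameters; the remainder forms the test set.
        return labeled_queries[:dev_size], labeled_queries[dev_size:]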


Word alignments

Appears in 3 sentences as: Word alignments (2) word alignments (1)
In Paraphrasing Adaptation for Web Search Ranking
  1. Word alignments within each paraphrase pair are generated using GIZA++ (Och and Ney, 2000).
    Page 2, “Paraphrasing for Web Search”
  2. In order to enable our paraphrasing model to learn the preferences on different paraphrasing strategies according to the characteristics of web queries, we design search-oriented features based on word alignments within Q and Q’, which can be described as follows:
    Page 2, “Paraphrasing for Web Search” (a feature sketch follows this list)
  3. Word alignments of each paraphrase pair are trained by GIZA++.
    Page 3, “Experiment”
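A sketch of such alignment-derived features is given below; the (i, j) pair representation follows GIZA++-style output, and the three features shown are hypothetical examples rather than the paper's actual feature set.

    def alignment_features(q_words, q_prime_words, alignment):
        # `alignment` is a list of (i, j) index pairs linking words of Q
        # to words of Q', as a GIZA++-style aligner would produce.
        aligned_q = {i for i, _ in alignment}
        aligned_qp = {j for _, j in alignment}
        return {
            # Hypothetical example features:
            "word_deletions": len(q_words) - len(aligned_q),          # Q words left unaligned
            "word_additions": len(q_prime_words) - len(aligned_qp),   # new words in Q'
            "identical_pairs": sum(q_words[i] == q_prime_words[j]
                                   for i, j in alignment),            # unchanged words
        }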
