Joint Annotation of Search Queries
Bendersky, Michael and Croft, W. Bruce and Smith, David A.

Article Structure

Abstract

Marking up search queries with linguistic annotations such as part-of-speech tags, capitalization, and segmentation is an important part of query processing and understanding in information retrieval systems.

Introduction

Automatic markup of textual documents with linguistic annotations such as part-of-speech tags, sentence constituents, named entities, or semantic roles is a common practice in natural language processing (NLP).

Query Annotation Example

To demonstrate a possible implementation of linguistic annotation for search queries, Figure 1 presents a simple markup scheme, exemplified using three web search queries (as they appear in a search log): (a) who won the 2004 kentucky derby, (b) kindred where would i be, and (c) shih tzu health problems.
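Figure 1 itself is not reproduced in this summary, but a minimal sketch of the per-term markup it implies is given below. The symbol inventories are illustrative assumptions (binary capitalization and segment-break flags; a ternary POS class, consistent with the ternary tagging decisions noted under Topics below), and the gold labels for the example query are guesses, not the paper's annotations.

    from dataclasses import dataclass
    from typing import List

    # Hypothetical per-term markup in the spirit of Figure 1.
    # Assumed symbol inventories:
    #   cap: 'C' (capitalized) / 'L' (lowercase)
    #   pos: 'N' (noun) / 'V' (verb) / 'X' (other) -- a ternary tag set
    #   seg: 'B' (starts a segment) / 'I' (continues the current segment)

    @dataclass
    class AnnotatedQuery:
        terms: List[str]
        cap: List[str]
        pos: List[str]
        seg: List[str]

        def segments(self) -> List[List[str]]:
            """Group terms into segments using the seg indicators."""
            out: List[List[str]] = []
            for term, flag in zip(self.terms, self.seg):
                if flag == 'B' or not out:
                    out.append([term])
                else:
                    out[-1].append(term)
            return out

    # Query (c) from Figure 1; the labels here are assumed for illustration.
    q = AnnotatedQuery(
        terms=['shih', 'tzu', 'health', 'problems'],
        cap=['C', 'C', 'L', 'L'],
        pos=['N', 'N', 'N', 'N'],
        seg=['B', 'I', 'B', 'I'],
    )
    print(q.segments())  # [['shih', 'tzu'], ['health', 'problems']]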

Joint Query Annotation

Given a search query Q, which consists of a sequence of terms (q1, …, qn), the goal is to annotate it with a set of linguistic structures, such as capitalization, POS tags, and segmentation.

Independent Query Annotations

While the joint annotation method proposed in Section 3 is general enough to be applied to any set of independent query annotations, in this work we focus on two previously proposed independent annotation methods based on either the query itself, or the top sentences retrieved in response to the query (Bendersky et al., 2010).
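Neither estimator is spelled out in this summary, but the Topics below (item 5 under "POS tagging") describe the retrieval-based variant: per-symbol evidence from the top retrieved sentences, smoothed with unigram and bigram statistics from a large n-gram corpus, under a per-symbol independence assumption. The sketch below is one way such an estimator could look; the interpolation weight, the vote counting, and the helper structure are all assumptions, not the paper's exact estimator.

    from collections import Counter
    from typing import Dict, Iterable, List

    # Sketch of a PRF-style independent annotator: under the independence
    # assumption, each term's symbol is chosen separately. The linear
    # interpolation with an n-gram-corpus prior and the weight LAMBDA are
    # illustrative assumptions.

    LAMBDA = 0.8  # weight on retrieved-sentence evidence (assumed)

    def smoothed_prob(z: str, votes: Counter, prior: Dict[str, float]) -> float:
        """Interpolate one sentence's evidence with a corpus prior."""
        total = sum(votes.values()) or 1
        return LAMBDA * (votes[z] / total) + (1 - LAMBDA) * prior.get(z, 1e-9)

    def annotate_term(votes_per_sentence: List[Counter],
                      prior: Dict[str, float],
                      alphabet: Iterable[str]) -> str:
        """Pick the symbol with the best average smoothed probability
        over the top retrieved sentences (pseudo-relevance feedback)."""
        def score(z: str) -> float:
            return sum(smoothed_prob(z, v, prior)
                       for v in votes_per_sentence) / len(votes_per_sentence)
        return max(alphabet, key=score)

    # Toy usage: capitalizing "falls", as in the "Hawaiian Falls" example
    # quoted below. Each Counter is one retrieved sentence's observation.
    votes = [Counter({'C': 1}), Counter({'C': 1}), Counter({'L': 1})]
    prior = {'C': 0.3, 'L': 0.7}  # assumed unigram capitalization prior
    print(annotate_term(votes, prior, alphabet=['C', 'L']))  # -> 'C'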

Related Work

In recent years, linguistic annotation of search queries has been receiving increasing attention as an important step toward better query processing and understanding.

Experiments

6.1 Experimental Setup

Conclusions

In this paper, we have investigated a joint approach for annotating search queries with linguistic structures, including capitalization, POS tags and segmentation.

Acknowledgment

This work was supported in part by the Center for Intelligent Information Retrieval and in part by ARRA NSF IIS-9014442.

Topics

POS tagging

Appears in 16 sentences as: POS tagger (2) POS tagging (11) POS tags (3)
In Joint Annotation of Search Queries
  1. In this scheme, each query is marked up using three annotations: capitalization, POS tags, and segmentation indicators.
    Page 2, “Query Annotation Example”
  2. Many query annotations that are useful for IR can be represented using this simple form, including capitalization, POS tagging, phrase chunking, named entity recognition, and stopword indicators, to name just a few.
    Page 3, “Joint Query Annotation”
  3. For instance, imagine that we need to perform two annotations: capitalization and POS tagging.
    Page 3, “Joint Query Annotation”
  4. On the other hand, given a sentence from a corpus that is relevant to the query, such as “Hawaiian Falls is a family-friendly water-park”, the word “falls” is correctly identified by a standard POS tagger as a proper noun.
    Page 4, “Independent Query Annotations”
  5. Following Bendersky et al. (2010), an estimate of p(zi | r) is a smoothed estimator that combines the information from the retrieved sentence r with the information about unigrams (for capitalization and POS tagging) and bigrams (for segmentation) from a large n-gram corpus (Brants and Franz, 2006).
    Page 5, “Independent Query Annotations”
  6. Most of the previous work on query annotation focuses on performing a particular annotation task (e.g., segmentation or POS tagging) in isolation.
    Page 5, “Related Work”
  7. This sample is manually labeled with three annotations: capitalization, POS tags, and segmentation, according to the description of these annotations in Figure 1.
    Page 5, “Experiments”
  8. Table 1: Summary of query annotation performance for capitalization (CAP), POS tagging (TAG) and segmentation.
    Page 6, “Experiments”
  9. In the case of POS tagging, the decisions are ternary, and hence we report the classification accuracy.
    Page 6, “Experiments”
  10. Figure 3: Comparative performance (in terms of F1 for capitalization and segmentation and accuracy for POS tagging) of the j-PRF method on the three query types.
    Page 8, “Experiments”
  11. Figure 3 shows a plot that contrasts the relative performance for these three query types of our best-performing joint annotation method, j-PRF, on capitalization, POS tagging and segmentation annotation tasks.
    Page 8, “Experiments”

CRF

Appears in 5 sentences as: CRF (5) CRF++ (1)
In Joint Annotation of Search Queries
  1. Accordingly, we can directly use a supervised sequential probabilistic model such as CRF (Lafferty et al., 2001).
    Page 3, “Joint Query Annotation”
  2. In this CRF
    Page 3, “Joint Query Annotation”
  3. It then produces a set of independent annotation estimates, which are jointly used, together with the ground truth annotations, to learn a CRF model for each annotation type [see the sketch after this list].
    Page 3, “Joint Query Annotation”
  4. Finally, these CRF models are used to predict annotations on a held-out set of queries, which are the output of the algorithm.
    Page 3, “Joint Query Annotation”
  5. The CRF model training in line (6) of the algorithm is implemented using the CRF++ toolkit.
    Page 6, “Experiments”
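Items 3–5 above outline the joint annotation algorithm: the independent estimates for all annotation types are fed, together with the ground-truth labels, into a per-type CRF, which then predicts on held-out queries. A minimal sketch of that stacking idea follows; it uses sklearn-crfsuite in place of the paper's CRF++, and the feature template and toy data are assumptions, not the paper's implementation.

    import sklearn_crfsuite  # stands in for the paper's CRF++ toolkit

    # Stacked (joint) annotation sketch: the independent estimates for ALL
    # annotation types become per-term features of a CRF trained for ONE
    # target type. Feature template and data are illustrative assumptions.

    def term_features(query, i, indep):
        """Features for term i: the word plus every independent estimate
        (CAP, TAG, SEG) at that position."""
        feats = {'word': query[i].lower(), 'position': str(i)}
        for ann_type, symbols in indep.items():  # e.g. {'CAP': [...], ...}
            feats['indep_' + ann_type] = symbols[i]
        return feats

    def to_crf_input(queries, indep_estimates):
        return [[term_features(q, i, est) for i in range(len(q))]
                for q, est in zip(queries, indep_estimates)]

    # Toy data: one query, its (noisy) independent estimates, gold CAP labels.
    queries = [['shih', 'tzu', 'health', 'problems']]
    indep = [{'CAP': ['L', 'C', 'L', 'L'],
              'TAG': ['N', 'N', 'N', 'N'],
              'SEG': ['B', 'I', 'B', 'I']}]
    gold_cap = [['C', 'C', 'L', 'L']]

    # One CRF per annotation type; shown here for capitalization only.
    crf = sklearn_crfsuite.CRF(algorithm='lbfgs', max_iterations=50)
    crf.fit(to_crf_input(queries, indep), gold_cap)
    print(crf.predict(to_crf_input(queries, indep))[0])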

natural language

Appears in 5 sentences as: natural language (5)
In Joint Annotation of Search Queries
  1. Experimental results verify the effectiveness of our approach for both short keyword queries, and verbose natural language queries.
    Page 1, “Abstract”
  2. Automatic markup of textual documents with linguistic annotations such as part-of-speech tags, sentence constituents, named entities, or semantic roles is a common practice in natural language processing (NLP).
    Page 1, “Introduction”
  3. Instead of just focusing our attention on keyword queries, as is often done in previous work (Barr et al., 2008; Bergsma and Wang, 2007; Tan and Peng, 2008; Guo et al., 2008), we also explore the performance of our annotations with more complex natural language search queries such as verbal phrases and wh-questions, which often pose a challenge for IR applications (Bendersky et al., 2010; Kumaran and Allan, 2007; Kumaran and Carvalho, 2009; Lease, 2007).
    Page 2, “Introduction”
  4. Instead, we are interested in annotation of queries of different types, including verbose natural language queries.
    Page 5, “Related Work”
  5. An additional research area which is relevant to this paper is the work on joint structure modeling (Finkel and Manning, 2009; Toutanova et al., 2008) and stacked classification (Nivre and McDonald, 2008; Martins et al., 2008) in natural language processing.
    Page 5, “Related Work”

n-gram

Appears in 4 sentences as: n-gram (4)
In Joint Annotation of Search Queries
  1. Following Bendersky et al. (2010), an estimate of p(zi | r) is a smoothed estimator that combines the information from the retrieved sentence r with the information about unigrams (for capitalization and POS tagging) and bigrams (for segmentation) from a large n-gram corpus (Brants and Franz, 2006).
    Page 5, “Independent Query Annotations”
  2. The SEG-I method requires access to a large web n-gram corpus (Brants and Franz, 2006).
    Page 9, “Experiments”
  3. where SQ is the set of all possible query segmentations, S is a possible segmentation, s is a segment in S, and count(s) is the frequency of s in the web n-gram corpus [see the sketch after this list].
    Page 9, “Experiments”
  4. (2009), and include, among others, n-gram frequencies in a sample of a query log, web corpus and Wikipedia titles.
    Page 9, “Experiments”
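Item 3 above defines the ingredients of the unsupervised n-gram segmentation: the space of segmentations SQ and the segment frequencies count(s) from the web n-gram corpus. The listing elides the objective itself, so the brute-force sketch below assumes the score of a segmentation is the product of its segments' counts; the toy count table stands in for the Brants and Franz (2006) corpus.

    from functools import reduce
    from itertools import product
    from operator import mul
    from typing import Dict, List, Tuple

    # Brute-force n-gram segmentation sketch. S_Q is enumerated via binary
    # break / no-break choices between adjacent terms. The objective
    # score(S) = prod_{s in S} count(s) is an assumption -- the quoted
    # sentence defines the symbols but not the exact formula.

    def enumerate_segmentations(terms: List[str]):
        """Yield every segmentation in S_Q as a list of segment tuples."""
        for breaks in product([False, True], repeat=len(terms) - 1):
            seg: List[Tuple[str, ...]] = []
            cur = [terms[0]]
            for term, brk in zip(terms[1:], breaks):
                if brk:
                    seg.append(tuple(cur))
                    cur = [term]
                else:
                    cur.append(term)
            seg.append(tuple(cur))
            yield seg

    def best_segmentation(terms: List[str],
                          count: Dict[Tuple[str, ...], int]):
        def score(seg):  # product of counts; 0 if any segment is unseen
            return reduce(mul, (count.get(s, 0) for s in seg), 1)
        return max(enumerate_segmentations(terms), key=score)

    # Toy counts standing in for the web n-gram corpus.
    counts = {('shih', 'tzu'): 5000, ('health', 'problems'): 700000,
              ('shih',): 60, ('tzu',): 50, ('health',): 900,
              ('problems',): 700, ('tzu', 'health'): 1}
    print(best_segmentation(['shih', 'tzu', 'health', 'problems'], counts))
    # -> [('shih', 'tzu'), ('health', 'problems')]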

SEG

Appears in 4 sentences as: SEG (4)
In Joint Annotation of Search Queries
  1. ZQ = {CAP, TAG, SEG}.
    Page 3, “Joint Query Annotation”
  2. SEG [table header fragment]
    Page 6, “Experiments”
  3. [Figure plot residue; only the curve labels SEG and CAP are recoverable.]
    Page 8, “Experiments”
  4. SEG F1 MQA [figure axis-label fragment]
    Page 8, “Experiments”

statistically significant

Appears in 4 sentences as: statistical significance (1) statistically significant (3)
In Joint Annotation of Search Queries
  1. * and † denote statistically significant differences with i-QRY and i-PRF, respectively.
    Page 6, “Experiments”
  2. In order to test the statistical significance of improvements attained by the proposed methods we use a two-sided Fisher’s randomization test with 20,000 permutations [see the sketch after this list].
    Page 6, “Experiments”
  3. Results with p-value < 0.05 are considered statistically significant.
    Page 6, “Experiments”
  4. * denotes statistically significant differences with SEG-I.
    Page 8, “Experiments”
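Items 2–3 above pin down the significance test: a two-sided Fisher's randomization test with 20,000 permutations and a 0.05 threshold. A minimal paired sign-flip sketch is below; the per-query score arrays are assumed inputs, not the paper's data.

    import random
    from typing import Sequence

    def randomization_test(a: Sequence[float], b: Sequence[float],
                           permutations: int = 20000,
                           seed: int = 0) -> float:
        """Two-sided paired Fisher randomization test: randomly flip the
        sign of each paired difference and count how often the absolute
        mean difference reaches the observed one."""
        rng = random.Random(seed)
        diffs = [x - y for x, y in zip(a, b)]
        observed = abs(sum(diffs) / len(diffs))
        hits = 0
        for _ in range(permutations):
            flipped = [d if rng.random() < 0.5 else -d for d in diffs]
            if abs(sum(flipped) / len(flipped)) >= observed:
                hits += 1
        return hits / permutations

    # Toy per-query F1 scores for two methods (assumed data).
    j_prf = [0.81, 0.78, 0.90, 0.66, 0.74, 0.85, 0.79, 0.88]
    i_prf = [0.74, 0.75, 0.84, 0.60, 0.70, 0.80, 0.77, 0.82]
    p = randomization_test(j_prf, i_prf)
    print('p =', p, '| significant at 0.05:', p < 0.05)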

conditional probability

Appears in 3 sentences as: conditional probabilities (1) conditional probability (2)
In Joint Annotation of Search Queries
  1. The most straightforward way to estimate the conditional probabilities in Eq.
    Page 4, “Independent Query Annotations”
  2. Given a short, often ungrammatical query, it is hard to accurately estimate the conditional probability in Eq.
    Page 4, “Independent Query Annotations”
  3. Furthermore, to make the estimation of the conditional probability p(zQ | r) feasible, it is assumed that the symbols zi in the annotation sequence are independent, given a sentence r [see the display below].
    Page 5, “Independent Query Annotations”
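In display form, the independence assumption quoted in item 3 reads (notation reconstructed from the garbled extraction):

    p(z_Q \mid r) = \prod_{i=1}^{n} p(z_i \mid r)

so the sequence-level estimate reduces to a product of per-symbol estimates, one per query term.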

named entity

Appears in 3 sentences as: named entities (1) named entity (2)
In Joint Annotation of Search Queries
  1. Automatic markup of textual documents with linguistic annotations such as part-of-speech tags, sentence constituents, named entities , or semantic roles is a common practice in natural language processing (NLP).
    Page 1, “Introduction”
  2. Many query annotations that are useful for IR can be represented using this simple form, including capitalization, POS tagging, phrase chunking, named entity recognition, and stopword indicators, to name just a few.
    Page 3, “Joint Query Annotation”
  3. These approaches have been shown to be successful for tasks such as parsing and named entity recognition in newswire data (Finkel and Manning, 2009) or semantic role labeling in the Penn Treebank and Brown corpus (Toutanova et al., 2008).
    Page 5, “Related Work”

part-of-speech

Appears in 3 sentences as: part-of-speech (3)
In Joint Annotation of Search Queries
  1. Automatic markup of textual documents with linguistic annotations such as part-of-speech tags, sentence constituents, named entities, or semantic roles is a common practice in natural language processing (NLP).
    Page 1, “Introduction”
  2. Following Bendersky et al. (2010), we use a large n-gram corpus (Brants and Franz, 2006) to estimate p(zi | qi) for annotating the query with capitalization and segmentation markup, and a standard POS tagger for part-of-speech tagging of the query.
    Page 4, “Independent Query Annotations”
  3. The literature on query annotation includes query segmentation (Bergsma and Wang, 2007; Jones et al., 2006; Guo et al., 2008; Hagen et al., 2010; Hagen et al., 2011; Tan and Peng, 2008), part-of-speech and semantic tagging (Barr et al., 2008; Manshadi and Li, 2009; Li, 2010), named-entity recognition (Guo et al., 2009; Lu et al., 2009; Shen et al., 2008; Pasca, 2007), abbreviation disambiguation (Wei et al., 2008) and stopword detection (Lo et al., 2005; Jones and Fain, 2003).
    Page 5, “Related Work”
