Question Answering Using Enhanced Lexical Semantic Models
Wen-tau Yih, Ming-Wei Chang, Christopher Meek and Andrzej Pastusiak

Article Structure

Abstract

In this paper, we study the answer sentence selection problem for question answering.

Introduction

Open-domain question answering (QA), which fulfills a user’s information need by outputting direct answers to natural language queries, is a challenging but important problem (Etzioni, 2011).

Related Work

While the task of question answering has a long history dating back to the dawn of artificial intelligence, early systems like STUDENT (Winograd, 1977) and LUNAR (Woods, 1973) were typically designed to demonstrate natural language understanding for a small and specific domain.

Problem Definition

We consider the answer selection problem in a supervised learning setting.

Lexical Semantic Models

In this section, we introduce the lexical semantic models we adopt for solving the semantic matching problem in answer selection.

Learning QA Matching Models

In this section, we investigate the effectiveness of various learning models for matching questions and sentences, including the bag-of-words setting and the framework of learning latent structures.

Experiments

We present our experimental results in this section by first introducing the data and evaluation metrics, followed by the results of existing systems and some baseline methods.

Conclusions

In this paper, we present an experimental study on solving the answer selection problem using enhanced lexical semantic models.

Topics

lexical semantic

Appears in 27 sentences as: lexical semantic (17) Lexical Semantics (2) lexical semantics (8)
In Question Answering Using Enhanced Lexical Semantic Models
  1. Unlike previous work, which primarily leverages syntactic analysis through dependency tree matching, we focus on improving the performance using models of lexical semantic resources.
    Page 1, “Abstract”
  2. Experiments show that our systems can be consistently and significantly improved with rich lexical semantic information, regardless of the choice of learning algorithms.
    Page 1, “Abstract”
  3. ...component, lexical semantics.
    Page 2, “Introduction”
  4. We formulate answer selection as a semantic matching problem with a latent word-alignment structure as in (Chang et al., 2010) and conduct a series of experimental studies on leveraging recently proposed lexical semantic models.
    Page 2, “Introduction”
  5. First, by incorporating the abundant information from a variety of lexical semantic models, the answer selection system can be enhanced substantially, regardless of the choice of learning algorithms and settings.
    Page 2, “Introduction”
  6. Second, while the latent alignment model performs better than unstructured models, the difference diminishes after adding the enhanced lexical semantics information.
    Page 2, “Introduction”
  7. The enhanced lexical semantic models and the learning frameworks we explore are presented in Sec. 4 and Sec. 5, respectively.
    Page 2, “Introduction”
  8. Although lexical semantic information derived from WordNet has been used in some of these approaches, the research has mainly focused on modeling the mapping between the syntactic structures of questions and sentences, produced from syntactic analysis.
    Page 2, “Related Work”
  9. The potential improvement from enhanced lexical semantic models seems to have been deliberately overlooked.[1]
    Page 2, “Related Work”
  10. [1] For example, Heilman and Smith (2010) emphasized that “The tree edit model, which does not use lexical semantics knowledge, produced the best result reported to date.”
    Page 2, “Problem Definition”
  11. In this work, we focus our study on leveraging the low-level semantic cues from recently proposed lexical semantic models.
    Page 3, “Problem Definition”

WordNet

Appears in 11 sentences as: WordNet (12)
In Question Answering Using Enhanced Lexical Semantic Models
  1. Although lexical semantic information derived from WordNet has been used in some of these approaches, the research has mainly focused on modeling the mapping between the syntactic structures of questions and sentences, produced from syntactic analysis.
    Page 2, “Related Work”
  2. Although sets of synonyms can be easily found in thesauri or WordNet synsets, such resources typically cover only strict synonyms.
    Page 3, “Lexical Semantic Models”
  3. Traditionally, the WordNet taxonomy has been the linguistic resource for identifying hypernyms and hyponyms, applied broadly to many NLP problems.
    Page 4, “Lexical Semantic Models”
  4. However, WordNet has a number of well-known limitations, including its rather limited or skewed concept distribution and its lack of coverage of the IsA relation (Song et al., 2011).
    Page 4, “Lexical Semantic Models”
  5. As a result, relations such as “Apple” IsA “company” and “Jaguar” IsA “car” cannot be found in WordNet.
    Page 4, “Lexical Semantic Models”
  6. Similar to the case in synonymy, the IsA relation defined in WordNet does not provide a native, real-valued degree of the relation, which can only be roughly approximated using the number of links on the taxonomy path connecting two words.
    Page 4, “Lexical Semantic Models”
  7. In order to remedy these issues, we augment WordNet with the IsA relations found in Probase (Wu et al., 2012).
    Page 4, “Lexical Semantic Models”
  8. All these systems incorporated lexical semantics features derived from WordNet and named entity features.
    Page 7, “Experiments”
  9. Features used in the experiments can be categorized into six types: identical word matching (I), lemma matching (L), WordNet (WN), enhanced Lexical Semantics (LS), Named Entity matching (NE) and Answer type checking (Ans).
    Page 7, “Experiments”
  10. Arguably the most common source of word relations, WordNet (WN) provides the primitive features of whether two words could belong to the same synset, could be antonyms, and whether one is a hypernym of the other.
    Page 7, “Experiments”
  11. First, while incorporating more information about the word pairs generally helps, it is clear that mapping words beyond surface-form matching with the help of WordNet (Line #3 vs. #2) is important.
    Page 8, “Experiments”

bag-of-words

Appears in 10 sentences as: Bag-of-Words (1) bag-of-words (9)
In Question Answering Using Enhanced Lexical Semantic Models
  1. Due to the variety of word choices and inherent ambiguities in natural languages, bag-of-words approaches with simple surface-form word matching tend to produce brittle results with poor prediction accuracy (Bilotti et al., 2007).
    Page 1, “Introduction”
  2. Observing the limitations of the bag-of-words models, Wang et al. (2007) proposed a syntax-driven approach.
    Page 2, “Related Work”
  3. For instance, if we assume a naive complete bipartite matching, then effectively it reduces to the simple bag-of-words model.
    Page 3, “Problem Definition”
  4. In this section, we investigate the effectiveness of various learning models for matching questions and sentences, including the bag-of-words setting and the framework of learning latent structures.
    Page 5, “Learning QA Matching Models”
  5. 5.1 Bag-of-Words Model
    Page 5, “Learning QA Matching Models”
  6. The bag-of-words model treats each question and sentence as an unstructured bag of words.
    Page 5, “Learning QA Matching Models”
  7. The bag-of-words model does not require an additional inference stage as in structured learning, which may be computationally expensive.
    Page 5, “Learning QA Matching Models”
  8. One obvious issue of the bag-of-words model is that words in the unrelated part of the sentence may still be paired with words in the question, which introduces noise to the final feature vector.
    Page 5, “Learning QA Matching Models”
  9. For the unstructured, bag-of-words setting, we tested logistic regression (LR) and boosted decision trees (BDT).
    Page 7, “Experiments”
  10. Following the word-alignment paradigm, we find that the rich lexical semantic information improves the models consistently in the unstructured bag-of-words setting and also in the framework of learning latent structures.
    Page 8, “Conclusions”

question answering

Appears in 9 sentences as: Question Answering (1) question answering (8) “question answering” (1)
In Question Answering Using Enhanced Lexical Semantic Models
  1. In this paper, we study the answer sentence selection problem for question answering.
    Page 1, “Abstract”
  2. Open-domain question answering (QA), which fulfills a user’s information need by outputting direct answers to natural language queries, is a challenging but important problem (Etzioni, 2011).
    Page 1, “Introduction”
  3. While the task of question answering has a long history dating back to the dawn of artificial intelligence, early systems like STUDENT (Winograd, 1977) and LUNAR (Woods, 1973) were typically designed to demonstrate natural language understanding for a small and specific domain.
    Page 2, “Related Work”
  4. The Text REtrieval Conference (TREC) Question Answering Track was arguably the first large-scale evaluation of open-domain question answering (Voorhees and Tice, 2000).
    Page 2, “Related Work”
  5. The Jeopardy! quiz show provides another open-domain question answering setting, in which IBM’s Watson system famously beat the two highest-ranked players (Ferrucci, 2012).
    Page 2, “Related Work”
  6. Although we have demonstrated the benefits of leveraging various lexical semantic models to help find the association between words, the problem of question answering is nevertheless far from solved using the word-based approach.
    Page 8, “Experiments”
  7. It is hard to believe that a pure word-matching model would be able to solve this type of “inferential question answering” problem.
    Page 8, “Experiments”
  8. First, although we focus on improving TREC-style open-domain question answering in this work, we would like to apply the proposed technology to other QA scenarios, such as community-based QA (CQA).
    Page 9, “Conclusions”
  9. Finally, we would like to improve our system for the answer sentence selection task and for question answering in general.
    Page 9, “Conclusions”

named entity

Appears in 7 sentences as: named entities (1) Named Entity (1) Named entity (1) named entity (5)
In Question Answering Using Enhanced Lexical Semantic Models
  1. For instance, when a word refers to a named entity, the particular sense and meaning are often not encoded.
    Page 4, “Lexical Semantic Models”
  2. All these systems incorporated lexical semantics features derived from WordNet and named entity features.
    Page 7, “Experiments”
  3. Features used in the experiments can be categorized into six types: identical word matching (I), lemma matching (L), WordNet (WN), enhanced Lexical Semantics (LS), Named Entity matching (NE) and Answer type checking (Ans).
    Page 7, “Experiments”
  4. Named entity matching (NE) checks whether two words are each part of a named entity of the same type.
    Page 7, “Experiments”
  5. Finally, when the question word is a WH-word, we check whether the paired word belongs to a phrase of the correct answer type, using simple rules such as “Who should link to a word that is part of a named entity of type Person.” We created examples in each round of experiments by augmenting these features in the same order, and observed how adding different information helped improve the model performance.
    Page 7, “Experiments”
  6. However, adding more information like named entity matching and answer type verification does not seem to help much (Line #5 vs. #4).
    Page 8, “Experiments”
  7. While the first two can be improved by, say, using a better named entity tagger, incorporating other knowledge bases and building a question classifier, how to solve the third problem is tricky.
    Page 8, “Experiments”

word pairs

Appears in 7 sentences as: word pair (2) word pairs (5)
In Question Answering Using Enhanced Lexical Semantic Models
  1. It then aggregates features extracted from each of these word pairs to represent the whole question–sentence pair.
    Page 5, “Learning QA Matching Models”
  2. Given a word pair $(w_q, w_s)$, where $w_q \in V_q$ and $w_s \in V_s$, feature functions $\phi_1, \dots, \phi_d$ map it to a $d$-dimensional real-valued feature vector.
    Page 5, “Learning QA Matching Models”
  3. In other words, the IDF values help decide the importance of word pairs to the model.
    Page 7, “Experiments”
  4. We apply the lexical semantic models of Sec. 4 to the word pair and use their estimated degree of synonymy, antonymy, hyponymy and semantic relatedness as features.
    Page 7, “Experiments”
  5. As described in Sec. 5, the features for the whole question–sentence pair are the average and max of the features of all the word pairs.
    Page 7, “Experiments”
  6. First, while incorporating more information about the word pairs generally helps, it is clear that mapping words beyond surface-form matching with the help of WordNet (Line #3 vs. #2) is important.
    Page 8, “Experiments”
  7. Second, while the structured-output model usually performs better than both unstructured models (LCLR vs. LR & BDT), the performance gain diminishes once more information about the word pairs is available (e.g., Lines #4 and #5).
    Page 8, “Experiments”

dependency trees

Appears in 6 sentences as: dependency tree (2) dependency trees (4)
In Question Answering Using Enhanced Lexical Semantic Models
  1. Unlike previous work, which primarily leverages syntactic analysis through dependency tree matching, we focus on improving the performance using models of lexical semantic resources.
    Page 1, “Abstract”
  2. Moreover, our best system also outperforms previous work that makes use of the dependency tree structure by a wide margin.
    Page 1, “Abstract”
  3. Wang et al. (2007) proposed a syntax-driven approach, where each pair of question and sentence is matched by their dependency trees.
    Page 2, “Related Work”
  4. Heilman and Smith (2010) proposed a discriminative approach that first computes a tree kernel function between the dependency trees of the question and candidate sentence, and then learns a classifier based on the tree-edit features extracted.
    Page 2, “Related Work”
  5. Typically, the “ideal” alignment structure is not available in the data, and previous work mostly exploited syntactic analysis (e.g., dependency trees) to reveal the latent mapping structure.
    Page 3, “Problem Definition”
  6. Heilman and Smith (2010) proposed a discriminative approach that first computes a tree kernel function between the dependency trees of the question and candidate sentence, and then learns a classifier based on the tree-edit features extracted.
    Page 7, “Experiments”

learning algorithms

Appears in 5 sentences as: learning algorithm (2) learning algorithms (3)
In Question Answering Using Enhanced Lexical Semantic Models
  1. Experiments show that our systems can be consistently and significantly improved with rich lexical semantic information, regardless of the choice of learning algorithms.
    Page 1, “Abstract”
  2. First, by incorporating the abundant information from a variety of lexical semantic models, the answer selection system can be enhanced substantially, regardless of the choice of learning algorithms and settings.
    Page 2, “Introduction”
  3. A binary classifier can be trained easily using any machine learning algorithm in this standard supervised learning setting.
    Page 5, “Learning QA Matching Models”
  4. We tested two learning algorithms in this setting: logistic regression and boosted decision trees (Friedman, 2001).
    Page 5, “Learning QA Matching Models”
  5. The former is the log-linear model widely used in the NLP community and the latter is a robust nonlinear learning algorithm that has shown great empirical performance.
    Page 5, “Learning QA Matching Models”

alignment model

Appears in 4 sentences as: alignment model (4)
In Question Answering Using Enhanced Lexical Semantic Models
  1. Compared to the previous work, our latent alignment model improves the result on a benchmark dataset by a wide margin: the mean average precision (MAP) and mean reciprocal rank (MRR) scores are increased by 25.6% and 18.8%, respectively.
    Page 2, “Introduction”
  2. Second, while the latent alignment model performs better than unstructured models, the difference diminishes after adding the enhanced lexical semantics information.
    Page 2, “Introduction”
  3. This may suggest that compared to introducing complex structured constraints, incorporating shallow semantic information is both more effective and computationally inexpensive in improving the performance, at least for the specific word alignment model tested in this work.
    Page 2, “Introduction”
  4. This may suggest that adding shallow semantic information is more effective than introducing complex structured constraints, at least for the specific word alignment model we experimented with in this work.
    Page 8, “Conclusions”

feature vector

Appears in 4 sentences as: feature vector (3) feature vectors (1)
In Question Answering Using Enhanced Lexical Semantic Models
  1. Given a word pair $(w_q, w_s)$, where $w_q \in V_q$ and $w_s \in V_s$, feature functions $\phi_1, \dots, \phi_d$ map it to a $d$-dimensional real-valued feature vector.
    Page 5, “Learning QA Matching Models”
  2. We consider two aggregate functions for defining the feature vectors of the whole question–answer pair: average and max.
    Page 5, “Learning QA Matching Models”
  3. $\Phi_{\max,j}(q, s) = \max_{w_q \in V_q} \max_{w_s \in V_s} \phi_j(w_q, w_s)$ (2). Together, each question–sentence pair is represented by a $2d$-dimensional feature vector.
    Page 5, “Learning QA Matching Models”
  4. One obvious issue of the bag-of-words model is that words in the unrelated part of the sentence may still be paired with words in the question, which introduces noise to the final feature vector.
    Page 5, “Learning QA Matching Models”

knowledge bases

Appears in 3 sentences as: knowledge base (1) knowledge bases (2)
In Question Answering Using Enhanced Lexical Semantic Models
  1. Probase is a knowledge base that establishes connections between 2.7 million concepts, discovered automatically by applying Hearst patterns (Hearst, 1992) to 1.68 billion Web pages.
    Page 4, “Lexical Semantic Models”
  2. Its abundant concept coverage distinguishes it from other knowledge bases, such as Freebase (Bollacker et al., 2008) and WikiTaxonomy (Ponzetto and Strube, 2007).
    Page 4, “Lexical Semantic Models”
  3. While the first two can be improved by, say, using a better named entity tagger, incorporating other knowledge bases and building a question classifier, how to solve the third problem is tricky.
    Page 8, “Experiments”

natural language

Appears in 3 sentences as: natural language (2) natural languages (1)
In Question Answering Using Enhanced Lexical Semantic Models
  1. Open-domain question answering (QA), which fulfills a user’s information need by outputting direct answers to natural language queries, is a challenging but important problem (Etzioni, 2011).
    Page 1, “Introduction”
  2. Due to the variety of word choices and inherent ambiguities in natural languages, bag-of-words approaches with simple surface-form word matching tend to produce brittle results with poor prediction accuracy (Bilotti et al., 2007).
    Page 1, “Introduction”
  3. While the task of question answering has a long history dating back to the dawn of artificial intelligence, early systems like STUDENT (Winograd, 1977) and LUNAR (Woods, 1973) were typically designed to demonstrate natural language understanding for a small and specific domain.
    Page 2, “Related Work”

vector space

Appears in 2 sentences as: vector space (3)
In Question Answering Using Enhanced Lexical Semantic Models
  1. Among various word similarity models (Agirre et al., 2009; Reisinger and Mooney, 2010; Gabrilovich and Markovitch, 2007; Radinsky et al., 2011), the vector space models (VSMs) based on the idea of distributional similarity (Turney and Pantel, 2010) are often used as the core component.
    Page 4, “Lexical Semantic Models”
  2. Inspired by (Yih and Qazvinian, 2012), which argues for the importance of incorporating heterogeneous vector space models for measuring word similarity, we leverage three different VSMs in this work: Wiki term-vectors, a recurrent neural network language model (RNNLM) and a concept vector space model learned from click-through data.
    Pages 4–5, “Lexical Semantic Models”

word alignment

Appears in 3 sentences as: word alignment (3)
In Question Answering Using Enhanced Lexical Semantic Models
  1. This may suggest that compared to introducing complex structured constraints, incorporating shallow semantic information is both more effective and computationally inexpensive in improving the performance, at least for the specific word alignment model tested in this work.
    Page 2, “Introduction”
  2. As a result, an “ideal” word alignment structure should not link words in this clause to those in the question.
    Page 6, “Learning QA Matching Models”
  3. This may suggest that adding shallow semantic information is more effective than introducing complex structured constraints, at least for the specific word alignment model we experimented with in this work.
    Page 8, “Conclusions”
