A Graph-based Semi-Supervised Learning for Question-Answering
Celikyilmaz, Asli and Thint, Marcus and Huang, Zhiheng

Article Structure

Abstract

We present a graph-based semi-supervised learning approach for ranking candidate sentences in the question-answering (QA) task.

Introduction

Open-domain natural language question answering (QA) is the process of automatically finding answers to questions by searching collections of text files.

Feature Extraction for Entailment

Implementation of different TE models has previously been shown to improve the QA task using supervised learning methods (Harabagiu and Hickl, 2006).

Graph Based Semi-Supervised Learning for Entailment Ranking

We formulate semi-supervised entailment rank scores as follows.

Graph Summarization

Research on graph-based SSL algorithms points to their effectiveness in real applications, e.g., (Zhu et al., 2003), (Zhou and Scholkopf, 2004), (Sindhwani et al., 2007).

Experiments

We demonstrate results from three sets of experiments to explore how our graph representation, which encodes textual entailment information, can be used to improve the performance of QA systems.

Conclusions and Discussions

In this paper, we applied a graph-based SSL algorithm to improve the performance of the QA task by exploiting unlabeled entailment relations between affirmed question and candidate sentence pairs.

Topics

unlabeled data

Appears in 14 sentences as: unlabeled data (14)
In A Graph-based Semi-Supervised Learning for Question-Answering
  1. We implement a semi-supervised learning (SSL) approach to demonstrate that using more unlabeled data points can improve the answer-ranking task of QA (a minimal propagation sketch follows this list).
    Page 1, “Abstract”
  2. We create a graph for labeled and unlabeled data using match-scores of textual entailment features as similarity weights between data points.
    Page 1, “Abstract”
  3. Recent research indicates that using labeled and unlabeled data in a semi-supervised learning (SSL) setting, with an emphasis on graph-based methods, can improve the performance of information extraction from data for tasks such as question classification (Tri et al., 2006), web classification (Liu et al., 2006), relation extraction (Chen et al., 2006), passage-retrieval (Otterbacher et al., 2009), various natural language processing tasks such as part-of-speech tagging and named-entity recognition (Suzuki and Isozaki, 2008), and word-sense disambiguation.
    Page 1, “Introduction”
  4. We consider situations where there is much more unlabeled data, $X_U$, than labeled data, $X_L$, i.e., $n_L \ll n_U$.
    Page 2, “Introduction”
  5. We use the graph-based SSL method on the new representative dataset $X' = \bar{X} \cup X_{Te}$, which comprises the summarized dataset $\bar{X} = \{\bar{x}_i\}_{i=1}^{p}$ as labeled data points and the testing dataset $X_{Te}$ as unlabeled data points.
    Page 6, “Graph Summarization”
  6. Since we do not know the estimated local density constraints of unlabeled data points, we use constants to construct the local density constraint column vector for the $X'$ dataset as follows:
    Page 6, “Graph Summarization”
  7. We show that, as we increase the amount of unlabeled data, our graph summarization makes it feasible to extract information that can improve the performance of QA models.
    Page 6, “Experiments”
  8. In the first part, we randomly selected subsets of the labeled training dataset, $X_{L'} \subset X_L$, with different sample sizes $\{1\%, 5\%, 10\%, 25\%, 50\%, 100\%\} \times n_L$, where $n_L$ represents the sample size of $X_L$. At each random selection, the rest of the labeled dataset is hypothetically used as unlabeled data to verify the performance of our SSL using different sizes of labeled data.
    Page 7, “Experiments”
  9. Table 3: The effect of the number of unlabeled data points on MRR for Hybrid graph-summarization SSL.
    Page 8, “Experiments”
  10. Although SSL methods are capable of exploiting information from unlabeled data, learning becomes infeasible as the number of data points gets very large.
    Page 8, “Experiments”
  11. We gradually increase the number of unlabeled data samples, as shown in Table 3, to demonstrate the effect on the performance on the testing dataset.
    Page 8, “Experiments”
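
Items 1 and 2 above describe building a graph over labeled and unlabeled q/a pairs and letting rank scores spread along entailment-similarity edges. Below is a minimal sketch of this family of methods (iterative label propagation in the style of Zhu et al., 2003). It is not the paper's exact formulation: the entailment similarity function (1) and the local density constraints are omitted, and all names are illustrative.

    import numpy as np

    def propagate_labels(W, y, labeled_idx, iters=100):
        """Graph-based SSL by iterative label propagation.
        W: (n, n) symmetric edge-weight matrix over labeled + unlabeled q/a pairs.
        y: (n,) initial scores; only the entries at labeled_idx are observed.
        Returns soft entailment rank scores for every node."""
        d_inv = 1.0 / np.maximum(W.sum(axis=1), 1e-12)
        P = W * d_inv[:, None]               # row-normalized transition matrix
        f = y.astype(float).copy()
        for _ in range(iters):
            f = P @ f                        # smooth scores over the graph
            f[labeled_idx] = y[labeled_idx]  # clamp the observed labels
        return f

Unlabeled nodes end up with scores interpolated from their neighbors, which is what lets additional unlabeled q/a pairs sharpen the ranking.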


labeled data

Appears in 13 sentences as: labeled data (13)
In A Graph-based Semi-Supervised Learning for Question-Answering
  1. With a new representation of graph-based SSL on QA datasets using only a handful of features, and under limited amounts of labeled data, we show improvement in generalization performance over state-of-the-art QA models.
    Page 1, “Abstract”
  2. One of the challenges we face is that we have a very limited amount of labeled data, i.e., correctly labeled (true/false entailment) sentences.
    Page 1, “Introduction”
  3. We consider situations where there is much more unlabeled data, $X_U$, than labeled data, $X_L$, i.e., $n_L \ll n_U$.
    Page 2, “Introduction”
  4. - application of a graph-summarization method (section 4) to enable learning from a very large unlabeled and a rather small labeled dataset, which would not have been feasible for most sophisticated learning tools.
    Page 2, “Introduction”
  5. The labeled data points, i.e., $X_L$, are appended to each of these selected $X^s$ datasets: $X^s = \{x^s_1, \ldots, x^s_{n_b - n_L}\} \cup X_L$.
    Page 4, “Graph Summarization”
  6. The local density constraints become crucial for inference when the summarized labeled data are used instead of the overall dataset.
    Page 5, “Graph Summarization”
  7. As a result, $q$ summary datasets $X^s$, each with $n_b$ labeled data points, are combined to form a representative sample of $X$, $\bar{X} = \{X^s\}_{s=1}^{q}$, reducing the number of data points from $n$ to a much smaller number, $p = q \times n_b \ll n$. So the new summary of $X$ can be represented as $\bar{X} = \{\bar{x}_i\}_{i=1}^{p}$ (a simplified sketch of this pipeline follows this list).
    Page 6, “Graph Summarization”
  8. We use the graph-based SSL method on the new representative dataset $X' = \bar{X} \cup X_{Te}$, which comprises the summarized dataset $\bar{X} = \{\bar{x}_i\}_{i=1}^{p}$ as labeled data points and the testing dataset $X_{Te}$ as unlabeled data points.
    Page 6, “Graph Summarization”
  9. In the first part, we randomly selected subsets of the labeled training dataset, $X_{L'} \subset X_L$, with different sample sizes $\{1\%, 5\%, 10\%, 25\%, 50\%, 100\%\} \times n_L$, where $n_L$ represents the sample size of $X_L$. At each random selection, the rest of the labeled dataset is hypothetically used as unlabeled data to verify the performance of our SSL using different sizes of labeled data.
    Page 7, “Experiments”
  10. Note from Table 2 that, when the number of labeled data is small ($n_{L'} < 10\% \times n_L$), graph-based SSL, gSum SSL, has better performance compared to SVM.
    Page 7, “Experiments”
  11. Especially for the Hybrid graph-summary SSL, Hybrid gSum SSL, when the number of labeled data is small ($n_{L'} < 25\% \times n_L$), the performance improvement is better than for the rest.
    Page 7, “Experiments”
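
Items 5 through 8 above outline the summarization pipeline: partition the data into $q$ blocks, pseudo-label each block with a learner trained on the labeled set, and keep $n_b$ representative points per block, so only $p = q \times n_b \ll n$ points remain. The following is a simplified sketch of that bookkeeping, with k-means centroids standing in for the paper's representative vertices (the paper derives them from edge weights and local density constraints, which are omitted here):

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.svm import SVC

    def summarize_data(X_U, X_L, y_L, q=10, n_b=50):
        """Reduce n unlabeled points to p = q * n_b pseudo-labeled representatives.
        A learner trained on (X_L, y_L) supplies labels for the representatives;
        the paper instead appends X_L to each block and predicts with SVM."""
        learner = SVC(kernel="rbf").fit(X_L, y_L)
        reps, rep_labels = [], []
        for block in np.array_split(X_U, q):
            centroids = KMeans(n_clusters=n_b, n_init=10).fit(block).cluster_centers_
            reps.append(centroids)
            rep_labels.append(learner.predict(centroids))
        return np.vstack(reps), np.concatenate(rep_labels)  # summary X-bar, Y-bar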


SVM

Appears in 12 sentences as: SVM (16)
In A Graph-based Semi-Supervised Learning for Question-Answering
  1. The QC model is trained via support vector machines (SVM) (Vapnik, 1995) considering different features such as a semantic headword feature based on a variation of Collins' rules, hypernym extraction via Lesk word disambiguation (Lesk, 1988), regular expressions for wh-word indicators, n-grams, word shapes (capitalization), etc.
    Page 2, “Feature Extraction for Entailment”
  2. Using a separate learner, e.g., SVM (Vapnik, 1995), we obtain predicted outputs $\hat{Y}^s = (\hat{y}^s_1, \ldots, \hat{y}^s_{n_b - n_L})$ for $X^s$ and append the observed labels: $Y^s = \hat{Y}^s \cup Y_L$.
    Page 4, “Graph Summarization”
  3. Results by feature set and model:
     Features   Model   MRR     Top1    Top5
     Baseline   -       42.3%   32.7%   54.5%
     QTCF       SVM     51.9%   44.6%   63.4%
     QTCF       SSL     49.5%   43.1%   60.9%
     LexSem     SVM     48.2%   40.6%   61.4%
     LexSem     SSL     47.9%   40.1%   58.4%
     QComp      SVM     54.2%   47.5%   64.3%
     QComp      SSL     51.9%   45.5%   62.4%
    Page 7, “Experiments”
  4. We performed manual iterative parameter optimization during training, based on prediction accuracy, to find the best $k$-nearest-neighbor parameter for SSL, i.e., $k \in \{3, 5, 10, 20, 50\}$, and the best $C \in \{10^{-2}, \ldots, 10^{2}\}$ and $\gamma \in \{2^{-2}, \ldots, 2^{3}\}$ for the RBF kernel SVM (see the grid-search sketch after this list).
    Page 7, “Experiments”
  5. We applied SVM and our graph-based SSL method with no summarization to learn models using the labeled training and testing datasets.
    Page 7, “Experiments”
  6. Table 2 reports the MRR performance of the QA system on the testing dataset using SVM and our graph-summary SSL (gSum SSL) method with the similarity function in (1).
    Page 7, “Experiments”
  7. To build SVM models in the same way, we separated the training dataset into two based on copula and non-copula questions, $X_{cop}$ and $X_{ncop}$, and re-ran the SVM method separately on each.
    Page 7, “Experiments”
  8. The predicted scores are combined to measure the overall performance of the Hybrid SVM models.
    Page 7, “Experiments”
  9. Note from Table 2 that, when the number of labeled data is small ($n_{L'} < 10\% \times n_L$), graph-based SSL, gSum SSL, has better performance compared to SVM.
    Page 7, “Experiments”
  10. As the percentage of labeled points in the training data increases, the SVM performance increases; however, graph-summary SSL remains comparable with SVM.
    Page 7, “Experiments”
  11. As more labeled data is introduced, the Hybrid SVM models' performance increases drastically, even outperforming the state-of-the-art MRR performance on the TREC04 datasets presented in (Shen and Klakow, 2006), i.e., MRR=67.0%, Top1=62.0%, Top5=74.0%.
    Page 8, “Experiments”
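
Item 4 above gives the exact hyperparameter grids. The same sweep can be written as a standard grid search; here is a sketch using scikit-learn (a library choice of ours; the paper reports manual iterative optimization based on prediction accuracy), with the $C$ and $\gamma$ grids taken from the quoted sentence:

    import numpy as np
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    # Grids from the quoted sentence: C in {10^-2, ..., 10^2}, gamma in {2^-2, ..., 2^3}.
    param_grid = {
        "C": [10.0 ** e for e in range(-2, 3)],
        "gamma": [2.0 ** e for e in range(-2, 4)],
    }
    search = GridSearchCV(SVC(kernel="rbf"), param_grid, scoring="accuracy", cv=5)

    # Toy stand-in for the entailment feature vectors and true/false labels:
    X_train = np.random.rand(60, 5)
    y_train = np.random.randint(0, 2, 60)
    search.fit(X_train, y_train)
    print(search.best_params_)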


graph-based

Appears in 11 sentences as: graph-based (11)
In A Graph-based Semi-Supervised Learning for Question-Answering
  1. We present a graph-based semi-supervised learning approach for ranking candidate sentences in the question-answering (QA) task.
    Page 1, “Abstract”
  2. With a new representation of graph-based SSL on QA datasets using only a handful of features, and under limited amounts of labeled data, we show improvement in generalization performance over state-of-the-art QA models.
    Page 1, “Abstract”
  3. Recent research indicates that using labeled and unlabeled data in a semi-supervised learning (SSL) setting, with an emphasis on graph-based methods, can improve the performance of information extraction from data for tasks such as question classification (Tri et al., 2006), web classification (Liu et al., 2006), relation extraction (Chen et al., 2006), passage-retrieval (Otterbacher et al., 2009), various natural language processing tasks such as part-of-speech tagging and named-entity recognition (Suzuki and Isozaki, 2008), and word-sense disambiguation.
    Page 1, “Introduction”
  4. We construct a textual entailment (TE) module by extracting features from each paired question and answer sentence and designing a classifier with a novel yet feasible graph-based SSL method.
    Page 2, “Introduction”
  5. In general graph-based SSL, a function over the graph is estimated such that it satisfies two conditions: 1) it should be close to the observed labels, and 2) it should be smooth on the whole graph (a standard form of this objective is sketched after this list).
    Page 4, “Graph Based Semi-Supervised Learning for Entailment Ranking”
  6. Most graph-based SSL methods are transductive, i.e., not easily expandable to new test points outside $L \cup U$.
    Page 4, “Graph Based Semi-Supervised Learning for Entailment Ranking”
  7. Research on graph-based SSL algorithms points to their effectiveness in real applications, e.g., (Zhu et al., 2003), (Zhou and Scholkopf, 2004), (Sindhwani et al., 2007).
    Page 4, “Graph Summarization”
  8. We use the graph-based SSL method on the new representative dataset $X' = \bar{X} \cup X_{Te}$, which comprises the summarized dataset $\bar{X} = \{\bar{x}_i\}_{i=1}^{p}$ as labeled data points and the testing dataset $X_{Te}$ as unlabeled data points.
    Page 6, “Graph Summarization”
  9. We evaluated the performance of the graph-based QA system using a set of 202 questions from TREC04 as the testing dataset (Voorhees, 2003), (Prager et al., 2000).
    Page 6, “Experiments”
  10. In this paper, we applied a graph-based SSL algorithm to improve the performance of the QA task by exploiting unlabeled entailment relations between affirmed question and candidate sentence pairs.
    Page 8, “Conclusions and Discussions”
  11. We demonstrated that summarization on graph-based SSL can improve the QA task performance when more unlabeled data is used to learn the classifier model.
    Page 8, “Conclusions and Discussions”
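
Item 5 above states the two conditions the estimated function must satisfy: fidelity to the observed labels and smoothness over the graph. A standard way to write such an objective (in the style of Zhu et al., 2003 and Zhou and Scholkopf, 2004; the paper's exact regularizer is not reproduced on this page) is

    f^* = \arg\min_{f} \; \|f_L - y_L\|^2 + \mu \sum_{i,j} w_{ij} (f_i - f_j)^2
        = \arg\min_{f} \; \|f_L - y_L\|^2 + 2\mu \, f^\top (D - W) f

where $W = [w_{ij}]$ holds the edge weights, $D$ is diagonal with $D_{ii} = \sum_j w_{ij}$, and $D - W$ is the graph Laplacian; the first term keeps $f$ close to the observed labels, and the second penalizes differences across heavily weighted edges.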


NER

Appears in 7 sentences as: NER (10)
In A Graph-based Semi-Supervised Learning for Question-Answering
  1. Named-Entity Recognizer (NER): This component identifies and classifies basic entities such as proper names of persons, organizations, products, and locations; time and numerical expressions such as year, day, and month; various measurements such as weight, money, and percentage; and contact information like address, webpage, phone number, etc.
    Page 2, “Feature Extraction for Entailment”
  2. The NER module is based on a combination of user-defined rules based on Lesk word disambiguation (Lesk, 1988), WordNet (Miller, 1995) lookups, and many user-defined dictionary lookups, e.g.
    Page 2, “Feature Extraction for Entailment”
  3. During the NER extraction, we also employ phrase analysis based on our phrase utility extraction method using the Stanford dependency parser (Klein and Manning, 2003).
    Page 2, “Feature Extraction for Entailment”
  4. We can categorize entities into up to 6 coarse and 50 fine categories to match them with the NER types from the QC module.
    Page 2, “Feature Extraction for Entailment”
  5. (1) (QTCF) Question-type/candidate-sentence NER match feature: takes the value '1' when the candidate sentence contains the fine NER type of the question type, '0.5' if it contains the coarse NER type, or '0' if no NER match is found (sketched after this list).
    Page 3, “Feature Extraction for Entailment”
  6. Since each headline represents a converted question, in order to extract the question-type feature, we use a matching NER type between the headline and the candidate sentence to set the question-type NER match feature.
    Page 7, “Experiments”
  7. From section 2.2, QTCF represents the question-type NER match feature, LexSem is the bundle of lexico-semantic features, and QComp comprises the matching features of subject, head, object, and three complements.
    Page 7, “Experiments”
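
Item 5 above fully specifies the QTCF feature values, so it can be sketched directly; the function name and the encoding of NER types as strings are illustrative:

    def qtcf_feature(q_fine, q_coarse, sent_fine_ners, sent_coarse_ners):
        """Question-type/candidate-sentence NER match feature (QTCF):
        1.0 if the sentence contains the fine NER type of the question type,
        0.5 if it contains only the coarse type, 0.0 otherwise."""
        if q_fine in sent_fine_ners:
            return 1.0
        if q_coarse in sent_coarse_ners:
            return 0.5
        return 0.0

    # e.g., a question typed PERSON:ACTOR against a sentence whose NER tags
    # include only the coarse PERSON type scores 0.5.
    assert qtcf_feature("PERSON:ACTOR", "PERSON", set(), {"PERSON"}) == 0.5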


edge weights

Appears in 6 sentences as: edge weights (6)
In A Graph-based Semi-Supervised Learning for Question-Answering
  1. So the similarity between two q/a pairs $x_i$, $x_j$ is represented with $w_{ij} \in \mathbb{R}^{n \times n}$, i.e., edge weights, and is measured as in Eq. (1) (a graph-construction sketch follows this list).
    Page 4, “Graph Based Semi-Supervised Learning for Entailment Ranking”
  2. The closer the total entailment scores of two pairs, the larger their edge weight.
    Page 4, “Graph Based Semi-Supervised Learning for Entailment Ranking”
  3. Thus, we modify edge weights in (1) as follows:
    Page 4, “Graph Based Semi-Supervised Learning for Entailment Ranking”
  4. Our idea of summarization is to create representative vertices of data points that are very close to each other in terms of edge weights.
    Page 4, “Graph Summarization”
  5. We identify the edge weights $w^s_{ij}$ between the nodes in the boundary $B^s$ via (1); thus the boundary is connected.
    Page 5, “Graph Summarization”
  6. If any testing vector has an edge to a labeled vector, then, with the use of the local density constraints, the edge weights will not only be affected by that labeled node, but also by how dense that node is within that part of the graph.
    Page 6, “Graph Summarization”
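
The similarity function (1) itself is not reproduced on this page, so the following is a generic sketch of the graph construction the items above rely on: an RBF kernel over per-pair entailment match-score vectors, sparsified to each node's k nearest neighbors, so that pairs with close total entailment scores receive large edge weights (the kernel and the k-NN sparsification are assumptions):

    import numpy as np

    def knn_rbf_edges(S, k=10, sigma=1.0):
        """Build a sparse symmetric edge-weight matrix over q/a pairs.
        S: (n, d) matrix of entailment match scores, one row per q/a pair."""
        n = S.shape[0]
        d2 = ((S[:, None, :] - S[None, :, :]) ** 2).sum(-1)  # squared distances
        W = np.exp(-d2 / (2.0 * sigma ** 2))
        np.fill_diagonal(W, 0.0)
        keep = np.argsort(-W, axis=1)[:, :k]      # k strongest edges per node
        M = np.zeros_like(W)
        rows = np.repeat(np.arange(n), k)
        M[rows, keep.ravel()] = W[rows, keep.ravel()]
        return np.maximum(M, M.T)                 # symmetrize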


sentence pairs

Appears in 6 sentences as: sentence pair (1) sentence pairs (5)
In A Graph-based Semi-Supervised Learning for Question-Answering
  1. (2) (QComp) Question component match features: The sentence component analysis is applied to both the affirmed question and the candidate sentence to characterize their semantic components, including subject (S), object (O), head (H), and modifiers (M) (a sketch follows this list).
    Page 3, “Feature Extraction for Entailment”
  2. Let each data point $x_i \in \mathbb{R}^d$ in $X = \{x_1, \ldots, x_n\}$ represent information about a question and candidate sentence pair, and let $Y = \{y_1, \ldots, y_n\}$ be their output labels.
    Page 3, “Graph Based Semi-Supervised Learning for Entailment Ranking”
  3. We also used a set of 340 QA-type sentence pairs from RTE02-03 and 195 pairs from RTE04, converting the hypothesis sentences into question form to create an additional set of q/a pairs.
    Page 6, “Experiments”
  4. Each of these headline-candidate sentence pairs is used as an additional unlabeled q/a pair.
    Page 6, “Experiments”
  5. This is because we establish two separate entailment models for copula and non-copula q/a sentence pairs, which enables extracting useful information and a better representation of the specific data.
    Page 8, “Experiments”
  6. In this paper, we applied a graph-based SSL algorithm to improve the performance of the QA task by exploiting unlabeled entailment relations between affirmed question and candidate sentence pairs.
    Page 8, “Conclusions and Discussions”
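
Item 1 above defines QComp as per-component matches between the affirmed question and the candidate sentence. Here is a hypothetical sketch of the feature computation, assuming the components were already extracted with a dependency parser (the paper uses the Stanford parser) and reducing each match to string equality (the paper also backs off to WordNet synonyms):

    # Components per item 7 of the NER topic: subject, head, object, three complements.
    COMPONENTS = ("subject", "head", "object", "comp1", "comp2", "comp3")

    def qcomp_features(q_comps, s_comps):
        """q_comps, s_comps: dicts mapping component name -> word (or None).
        Returns one match score per component: 1.0 on a match, else 0.0."""
        feats = []
        for c in COMPONENTS:
            qw, sw = q_comps.get(c), s_comps.get(c)
            feats.append(1.0 if qw and sw and qw.lower() == sw.lower() else 0.0)
        return feats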


semi-supervised

Appears in 5 sentences as: semi-supervised (5)
In A Graph-based Semi-Supervised Learning for Question-Answering
  1. We present a graph-based semi-supervised learning approach for ranking candidate sentences in the question-answering (QA) task.
    Page 1, “Abstract”
  2. We implement a semi-supervised learning (SSL) approach to demonstrate that using more unlabeled data points can improve the answer-ranking task of QA.
    Page 1, “Abstract”
  3. Recent research indicates that using labeled and unlabeled data in a semi-supervised learning (SSL) setting, with an emphasis on graph-based methods, can improve the performance of information extraction from data for tasks such as question classification (Tri et al., 2006), web classification (Liu et al., 2006), relation extraction (Chen et al., 2006), passage-retrieval (Otterbacher et al., 2009), various natural language processing tasks such as part-of-speech tagging and named-entity recognition (Suzuki and Isozaki, 2008), and word-sense disambiguation.
    Page 1, “Introduction”
  4. We formulate semi-supervised entailment rank scores as follows.
    Page 3, “Graph Based Semi-Supervised Learning for Entailment Ranking”
  5. (2) We will use other distance measures to better explain entailment between q/a pairs and compare with other semi-supervised and transductive approaches.
    Page 8, “Conclusions and Discussions”


natural language

Appears in 3 sentences as: natural language (3)
In A Graph-based Semi-Supervised Learning for Question-Answering
  1. Using textual entailment analysis, we obtain entailment scores between a natural language question posed by the user and the candidate sentences returned from the search engine.
    Page 1, “Abstract”
  2. Open-domain natural language question answering (QA) is the process of automatically finding answers to questions by searching collections of text files.
    Page 1, “Introduction”
  3. Recent research indicates that using labeled and unlabeled data in a semi-supervised learning (SSL) setting, with an emphasis on graph-based methods, can improve the performance of information extraction from data for tasks such as question classification (Tri et al., 2006), web classification (Liu et al., 2006), relation extraction (Chen et al., 2006), passage-retrieval (Otterbacher et al., 2009), various natural language processing tasks such as part-of-speech tagging and named-entity recognition (Suzuki and Isozaki, 2008), and word-sense disambiguation.
    Page 1, “Introduction”


WordNet

Appears in 3 sentences as: WordNet (2) WordNet’s (1)
In A Graph-based Semi-Supervised Learning for Question-Answering
  1. The NER module is based on a combination of user-defined rules based on Lesk word disambiguation (Lesk, 1988), WordNet (Miller, 1995) lookups, and many user-defined dictionary lookups, e.g.
    Page 2, “Feature Extraction for Entailment”
  2. - When words do not match, we attempt to match synonyms in WordNet for the most common senses (see the sketch after this list).
    Page 3, “Feature Extraction for Entailment”
  3. - Verb match statistics using WordNet's cause and entailment relations.
    Page 3, “Feature Extraction for Entailment”
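
Item 2 above describes backing off to WordNet synonyms for the most common senses when surface words do not match. A minimal sketch using NLTK's WordNet interface (the library choice and the two-sense cutoff are assumptions):

    from nltk.corpus import wordnet as wn

    def synonym_match(w1, w2, top_senses=2):
        """True if w1 and w2 share a lemma among their most common senses
        (WordNet orders synsets by sense frequency)."""
        lemmas1 = {l.lower() for s in wn.synsets(w1)[:top_senses] for l in s.lemma_names()}
        lemmas2 = {l.lower() for s in wn.synsets(w2)[:top_senses] for l in s.lemma_names()}
        return bool(lemmas1 & lemmas2)

    # e.g., synonym_match("buy", "purchase") -> True, since they share a lemma
    # in their most common senses.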
