Information Extraction over Structured Data: Question Answering with Freebase
Yao, Xuchen and Van Durme, Benjamin

Article Structure

Abstract

Answering natural language questions using the Freebase knowledge base has recently been explored as a platform for advancing the state of the art in open domain semantic parsing.

Introduction

Question answering (QA) from a knowledge base (KB) has a long history within natural language processing, going back to the 1960s and 1970s, with systems such as Baseball (Green Jr et al., 1961) and Lunar (Woods, 1977).

Approach

We will view a KB as an interlinked collection of “topics”.

Background

QA from a KB faces two prominent challenges: model and data.

Graph Features

Our model is inspired by an intuition on how everyday people search for answers.

Relation Mapping

In this section we describe how a “translation” table between Freebase relations and NL words was built.

Experiments

We evaluate the final F1 in this section.

Conclusion

We proposed an automatic method for Question Answering from a structured data source (Freebase).

Topics

semantic parsing

Appears in 8 sentences as: semantic parsing (9)
In Information Extraction over Structured Data: Question Answering with Freebase
  1. Answering natural language questions using the Freebase knowledge base has recently been explored as a platform for advancing the state of the art in open domain semantic parsing.
    Page 1, “Abstract”
  2. The AI community has tended to approach this problem with a focus on first understanding the intent of the question, via shallow or deep forms of semantic parsing (c.f.
    Page 1, “Introduction”
  3. bounded by the accuracy of the original semantic parsing, and the well-formedness of resultant database queries.
    Page 1, “Introduction”
  4. Researchers in semantic parsing have recently explored QA over Freebase as a way of moving beyond closed domains such as GeoQuery (Tang and Mooney, 2001).
    Page 1, “Introduction”
  5. While making semantic parsing more robust is a laudable goal, here we provide a more rigorous IE baseline against which those efforts should be compared: we show that “traditional” IE methodology can significantly outperform prior state-of-the-art as reported in the semantic parsing literature, with a relative gain of 34% F1 as compared to Berant et al.
    Page 1, “Introduction”
  6. Finally, the KB community has developed other means for QA without semantic parsing (Lopez et al., 2005; Frank et al., 2007; Unger et al., 2012; Yahya et al., 2012; Shekarpour et al., 2013).
    Page 2, “Background”
  7. One question of interest is whether our system, aided by the massive web data, can be fairly compared to the semantic parsing approaches (note that Berant et al.
    Page 9, “Experiments”
  8. We hope that this result establishes a new baseline against which semantic parsing researchers can measure their progress towards deeper language understanding and answering of human questions.
    Page 9, “Conclusion”

natural language

Appears in 6 sentences as: natural language (6)
In Information Extraction over Structured Data: Question Answering with Freebase
  1. Answering natural language questions using the Freebase knowledge base has recently been explored as a platform for advancing the state of the art in open domain semantic parsing.
    Page 1, “Abstract”
  2. Question answering (QA) from a knowledge base (KB) has a long history within natural language processing, going back to the 1960s and 1970s, with systems such as Baseball (Green Jr et al., 1961) and Lunar (Woods, 1977).
    Page 1, “Introduction”
  3. These systems were limited to closed-domains due to a lack of knowledge resources, computing power, and ability to robustly understand natural language .
    Page 1, “Introduction”
  4. One challenge for natural language querying against a KB is the relative informality of queries as compared to the grammar of a KB.
    Page 2, “Approach”
  5. However, most Freebase relations are framed in a way that is not commonly addressed in natural language questions.
    Page 5, “Graph Features”
  6. To compensate for the problem of domain mismatch or overfitting, we exploited ClueWeb, mined mappings between KB relations and natural language text, and showed that it helped both relation prediction and answer extraction.
    Page 9, “Conclusion”

meaning representations

Appears in 5 sentences as: meaning representation (2) meaning representations (3)
In Information Extraction over Structured Data: Question Answering with Freebase
  1. Those efforts map questions to sophisticated meaning representations that are then attempted to be matched against viable answer candidates in the knowledge base.
    Page 1, “Abstract”
  2. Typically questions are converted into some meaning representation (e.g., the lambda calculus), then mapped to database queries.
    Page 1, “Introduction”
  3. The model challenge involves finding the best meaning representation for the question, converting it into a query and executing the query on the KB.
    Page 2, “Background”
  4. More recent research started to minimize this direct supervision by using latent meaning representations (Berant et
    Page 2, “Background”
  5. We instead attack the problem of QA from a KB from an IE perspective: we learn directly the pattern of QA pairs, represented by the dependency parse of questions and the Freebase structure of answer candidates, without the use of intermediate, general purpose meaning representations.
    Page 2, “Background”

named entity

Appears in 4 sentences as: named entities (2) named entity (3)
In Information Extraction over Structured Data: Question Answering with Freebase
  1. We simply apply a named entity recognizer to find the question topic.
    Page 3, “Graph Features”
  2. (special case) if a qtopic node was tagged as a named entity, then replace this node with its named entity form, e.g., bieber → qtopic=person;
    Page 3, “Graph Features”
  3. For this purpose we used the Freebase Search API (Freebase, 2013a). All named entities in a question were sent to this API, which returned a ranked list of relevant topics.
    Page 8, “Experiments”
  4. (footnote 8) When no named entities are detected, we fall back to noun phrases.
    Page 8, “Experiments”
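
To make the topic-spotting step above concrete, below is a minimal sketch of NER-based topic detection with the noun-phrase fallback described in the footnote. spaCy is an assumption here; the paper does not name the recognizer it uses, so the model and helper function are illustrative only.

    # Sketch: find candidate question topics via NER, falling back to noun
    # phrases when no entities are detected. spaCy and the model name are
    # assumptions, not the authors' actual recognizer.
    import spacy

    nlp = spacy.load("en_core_web_sm")  # requires the small English model

    def question_topics(question):
        doc = nlp(question)
        # Prefer named entities as topic candidates ...
        entities = [ent.text for ent in doc.ents]
        if entities:
            return entities
        # ... otherwise fall back to noun phrases (footnote 8).
        return [chunk.text for chunk in doc.noun_chunks]

    # Each candidate string would then be sent to the Freebase Search API
    # to retrieve a ranked list of relevant topics.
    print(question_topics("what is the name of justin bieber brother"))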

alignment model

Appears in 3 sentences as: alignment Model (1) alignment model (2)
In Information Extraction over Structured Data: Question Answering with Freebase
  1. Thus assuming there is an alignment model that is able to tell how likely one relation maps to the original question, we add extra alignment-based features for the incoming and outgoing relation of each node.
    Page 5, “Graph Features”
  2. We describe such an alignment model in § 5.
    Page 5, “Graph Features”
  3. Since the relations on one side of these pairs are not natural sentences, we ran the most simple IBM alignment Model 1 (Brown et al., 1993) to estimate the translation probability with GIZA++ (Och and Ney, 2003).
    Page 6, “Relation Mapping”
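
As a rough illustration of the alignment step quoted above, here is a toy re-implementation of IBM Model 1 expectation-maximization. The authors ran GIZA++ on (relation, question) pairs; the function and toy data below are illustrative assumptions, not their pipeline.

    # Toy IBM Model 1 EM sketch: estimate translation probabilities
    # t(question_word | relation_token). The paper uses GIZA++ for this;
    # the relation "pseudo-sentences" below are made-up examples.
    from collections import defaultdict

    def ibm_model1(pairs, iterations=10):
        # pairs: list of (source_tokens, target_tokens)
        src_vocab = {s for src, _ in pairs for s in src}
        t = defaultdict(lambda: 1.0 / len(src_vocab))  # uniform init
        for _ in range(iterations):
            count = defaultdict(float)  # expected counts c(tgt, src)
            total = defaultdict(float)  # marginal counts c(src)
            for src, tgt in pairs:
                for w in tgt:
                    z = sum(t[(w, s)] for s in src)  # normalize over alignments
                    for s in src:
                        delta = t[(w, s)] / z
                        count[(w, s)] += delta
                        total[s] += delta
            for (w, s), c in count.items():
                t[(w, s)] = c / total[s]
        return t

    pairs = [
        (["people.person.parents"], ["who", "is", "the", "father", "of"]),
        (["people.person.parents"], ["who", "is", "the", "mother", "of"]),
        (["location.location.containedby"], ["where", "is"]),
    ]
    t = ibm_model1(pairs)
    print(t[("father", "people.person.parents")])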

best result

Appears in 3 sentences as: Best result (1) best result (2)
In Information Extraction over Structured Data: Question Answering with Freebase
  1. Best result on
    Page 8, “Experiments”
  2. We experimented with various discriminative learners on DEV, including logistic regression, perceptron and SVM, and found L1 regularized logistic regression to give the best result .
    Page 8, “Experiments”
  3. The final F1 of 42.0% gives a relative improvement of one third over the previous best result of 31.4% (Berant et al., 2013).
    Page 9, “Experiments”
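
For reference, a minimal sketch of that learner, L1-regularized logistic regression, with scikit-learn; the random binary feature matrix is a stand-in for the paper's question/Freebase features and is purely illustrative.

    # Sketch: L1-regularized logistic regression over binary answer-candidate
    # features. The random data is a placeholder for the paper's feature set.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X = rng.integers(0, 2, size=(1000, 500))  # candidate feature vectors
    y = rng.integers(0, 2, size=1000)         # 1 = correct answer, 0 = not

    # liblinear supports the L1 penalty; L1 drives most weights to zero,
    # which suits a large, sparse feature space.
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
    clf.fit(X, y)
    print("non-zero weights:", np.count_nonzero(clf.coef_))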

co-occurrence

Appears in 3 sentences as: co-occurrence (3)
In Information Extraction over Structured Data: Question Answering with Freebase
  1. Treating the aligned pairs as observations, we computed the co-occurrence matrix between aligned relations and words.
    Page 6, “Relation Mapping”
  2. From the co-occurrence matrix we computed P(w | R), P(R), P(w | r) and P(r).
    Page 6, “Relation Mapping”
  3. ReverbMapping does the same, except that we took a uniform distribution on P(w | R) and P(R) since the contributed dataset did not include co-occurrence counts to estimate these probabilities. Note that the median rank from CluewebMapping is only 12, indicating that half of all answer relations are ranked in the top 12.
    Page 7, “Relation Mapping”
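
The sketch below shows how the four distributions mentioned above could be read off a word/relation co-occurrence matrix; the counts and relation names are toy values, and sub-relations r are only indicated in a comment.

    # Sketch: estimating P(w | R) and P(R) from co-occurrence counts.
    # cooc[(R, w)] = number of aligned (relation, word) pairs observed;
    # the numbers below are toy values.
    from collections import defaultdict

    cooc = {
        ("people.person.parents", "father"): 30,
        ("people.person.parents", "mother"): 25,
        ("location.location.containedby", "where"): 40,
    }

    rel_total = defaultdict(float)  # total count per relation R
    grand_total = 0.0
    for (R, w), c in cooc.items():
        rel_total[R] += c
        grand_total += c

    def p_w_given_R(w, R):
        return cooc.get((R, w), 0.0) / rel_total[R]  # P(w | R)

    def p_R(R):
        return rel_total[R] / grand_total            # P(R)

    # P(w | r) and P(r) follow the same pattern after splitting each full
    # relation R into its component sub-relations r (e.g. "parents").
    print(p_w_given_R("father", "people.person.parents"), p_R("people.person.parents"))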

dependency relation

Appears in 3 sentences as: dependency relation (3)
In Information Extraction over Structured Data: Question Answering with Freebase
  1. the dependency relation nsubj(what, name) and prep_of(name, brother) indicates that the question seeks the information of a name;
    Page 3, “Graph Features”
  2. the dependency relation prep_of(name, brother) indicates that the name is about a brother (but we do not know whether it is a person name yet);
    Page 3, “Graph Features”
  3. the dependency relation nn(brother, bieber) and the facts that (i) Bieber is a person and (ii) a person’s brother should also be a person, indicate that the name is about a person.
    Page 3, “Graph Features”
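
A small sketch of extracting such dependency relations from the paper's running example question, assuming spaCy as the parser; note that spaCy's label set (e.g. compound rather than nn) differs slightly from the Stanford-style labels quoted above.

    # Sketch: print relation(head, dependent) triples, analogous to
    # nsubj(what, name) above. spaCy is an assumption; its labels differ
    # slightly from Stanford dependencies.
    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("what is the name of justin bieber brother")

    for token in doc:
        if token.dep_ != "ROOT":
            print(f"{token.dep_}({token.head.text}, {token.text})")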

gold standard

Appears in 3 sentences as: gold standard (3)
In Information Extraction over Structured Data: Question Answering with Freebase
  1. Thus we evaluated the ranking of retrieval with the gold standard annotation on TRAIN-ALL, shown in Table 3.
    Page 8, “Experiments”
  2. The top 2 results of the Search API contain gold standard topics for more than 90% of the questions and the top 10 results contain more than 95%.
    Page 8, “Experiments”
  3. Table 3: Evaluation on the Freebase Search API: how many questions’ top n retrieved results contain the gold standard topic (coverage values: 86.4%, 91.5%, 93.5%, 94.6%, 95.4%).
    Page 8, “Experiments”
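
The coverage numbers in Table 3 are just top-n hit rates; the sketch below shows the metric on toy ranked lists (the lists and gold topics are placeholders, not actual Search API output).

    # Sketch: fraction of questions whose gold topic appears in the top n
    # retrieved results. Ranked lists and gold topics are toy placeholders.
    def top_n_coverage(ranked_lists, gold_topics, n):
        hits = sum(1 for ranked, gold in zip(ranked_lists, gold_topics)
                   if gold in ranked[:n])
        return hits / len(gold_topics)

    ranked_lists = [
        ["justin_bieber", "justin_bieber_discography", "jazmyn_bieber"],
        ["barack_obama", "michelle_obama"],
    ]
    gold_topics = ["justin_bieber", "michelle_obama"]

    for n in (1, 2, 3):
        print(n, top_n_coverage(ranked_lists, gold_topics, n))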

knowledge base

Appears in 3 sentences as: knowledge base (3)
In Information Extraction over Structured Data: Question Answering with Freebase
  1. Answering natural language questions using the Freebase knowledge base has recently been explored as a platform for advancing the state of the art in open domain semantic parsing.
    Page 1, “Abstract”
  2. Those efforts map questions to sophisticated meaning representations that are then attempted to be matched against viable answer candidates in the knowledge base.
    Page 1, “Abstract”
  3. Question answering (QA) from a knowledge base (KB) has a long history within natural language processing, going back to the 1960s and 1970s, with systems such as Baseball (Green Jr et al., 1961) and Lunar (Woods, 1977).
    Page 1, “Introduction”
