Global Learning of Typed Entailment Rules
Berant, Jonathan and Dagan, Ido and Goldberger, Jacob

Article Structure

Abstract

Extensive knowledge bases of entailment rules between predicates are crucial for applied semantic inference.

Introduction

Generic approaches for applied semantic inference from text have gained growing attention in recent years, particularly under the Textual Entailment (TE) framework (Dagan et al., 2009).

Background

Most work on learning entailment rules between predicates considered each rule independently of others, using two sources of information: lexicographic resources and distributional similarity.

Typed Entailment Graphs

Given a set of typed predicates, entailment rules can only exist between predicates that share the same (unordered) pair of types (such as place and country).

Learning Typed Entailment Graphs

Our learning algorithm is composed of two steps: (1) Given a set of typed predicates and their instances extracted from a corpus, we train a (local) entailment classifier that estimates for every pair of predicates whether one entails the other.

Experimental Evaluation

In this section we empirically answer the following questions: (1) Does transitivity improve rule learning over typed predicates?

Conclusions and Future Work

This paper proposes two contributions over two recent works: In the first, Berant et al.

Topics

ILP

Appears in 31 sentences as: ILP (32) ILP: (1) ILP' (2)
In Global Learning of Typed Entailment Rules
  1. (2010) proposed a global graph optimization procedure that uses Integer Linear Programming (ILP) to find the best set of entailment rules under a transitivity constraint.
    Page 1, “Introduction”
  2. The second challenge is scalability: ILP solvers do not scale well since ILP is an NP-complete problem.
    Page 1, “Introduction”
  3. Their method employs a local learning approach, while the number of predicates in their data is too large to be handled directly by an ILP solver.
    Page 2, “Introduction”
  4. We suggest scaling techniques that allow us to optimally learn such graphs over a large set of typed predicates by first decomposing nodes into components and then applying incremental ILP (Riedel and Clarke, 2006).
    Page 2, “Introduction”
  5. Last, we formulate our optimization problem as an Integer Linear Program (ILP).
    Page 3, “Background”
  6. ILP is an optimization problem where a linear objective function over a set of integer variables is maximized under a set of linear constraints.
    Page 3, “Background”
  7. Scaling ILP is challenging since it is an NP-complete problem.
    Page 3, “Background”
  8. ILP has been extensively used in NLP lately (Clarke and Lapata, 2008; Martins et al., 2009; Do and Roth, 2010).
    Page 3, “Background”
  9. Section 4.2 gives an ILP formulation for the optimization problem.
    Page 4, “Learning Typed Entailment Graphs”
  10. 4.2 ILP formulation
    Page 4, “Learning Typed Entailment Graphs”
  11. Thus, employing ILP is an appealing approach for obtaining an optimal solution.
    Page 4, “Learning Typed Entailment Graphs”
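
The sentences above describe the global step: choose a set of entailment edges that maximizes total edge weight while respecting transitivity. Below is a minimal, self-contained sketch of that kind of ILP, assuming the PuLP library and toy edge weights that merely stand in for local classifier scores; it illustrates the formulation, not the paper's implementation.

```python
# Minimal sketch of the global step: pick the set of entailment edges that
# maximizes total weight under transitivity constraints, formulated as an
# Integer Linear Program.  PuLP is used purely for illustration; the weights
# are toy values standing in for local classifier scores (positive = the
# classifier favors the edge).
from itertools import permutations
from pulp import LpMaximize, LpProblem, LpVariable, lpSum

nodes = ["buy", "purchase", "acquire"]            # typed predicates (toy set)
w = {("buy", "purchase"): 1.0,
     ("purchase", "acquire"): 1.5,
     ("buy", "acquire"): -0.3}                    # unlisted pairs default to -1.0

prob = LpProblem("entailment_graph", LpMaximize)
x = {(u, v): LpVariable(f"x_{u}_{v}", cat="Binary")
     for u, v in permutations(nodes, 2)}          # x[u, v] = 1 iff edge u -> v is kept

# Objective: total weight of the selected edges.
prob += lpSum(w.get(e, -1.0) * x[e] for e in x)

# Transitivity: if u -> v and v -> z are kept, u -> z must be kept too.
for u, v, z in permutations(nodes, 3):
    prob += x[(u, v)] + x[(v, z)] - x[(u, z)] <= 1

prob.solve()
print([e for e in x if x[e].value() == 1])
```

On this toy input the optimum keeps buy → purchase and purchase → acquire, and transitivity pulls in buy → acquire as well despite its slightly negative local weight; that kind of global correction is exactly what the transitivity constraint is for.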


distributional similarity

Appears in 7 sentences as: Distributional similarity (2) distributional similarity (5)
In Global Learning of Typed Entailment Rules
  1. Most work on learning entailment rules between predicates considered each rule independently of others, using two sources of information: lexicographic resources and distributional similarity.
    Page 2, “Background”
  2. Distributional similarity algorithms use large corpora to learn broader resources by assuming that semantically similar predicates appear with similar arguments.
    Page 2, “Background”
  3. Distributional similarity algorithms differ in their feature representation: Some use a binary representation: each predicate is represented by one feature vector where each feature is a pair of arguments (Szpektor et al., 2004; Yates and Etzioni, 2009).
    Page 2, “Background”
  4. (2010) recently used distributional similarity to learn rules between typed predicates, where the left-hand-side of the rule may contain more than a single predicate (Horn clauses).
    Page 2, “Background”
  5. We compute 11 distributional similarity scores for each pair of predicates based on the arguments appearing in the extracted arguments.
    Page 4, “Learning Typed Entailment Graphs”
  6. Second, to distributional similarity algorithms: (a) SR: the score used by Schoenmackers et al.
    Page 7, “Experimental Evaluation”
  7. Third, we compared to the entailment classifier with no transitivity constraints (clsf) to see if combining distributional similarity scores improves performance over single measures.
    Page 7, “Experimental Evaluation”
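
As a concrete illustration of the binary argument-pair representation mentioned above, the sketch below represents each predicate by the (arg1, arg2) pairs observed with it and compares two predicates with cosine similarity. This is only a toy stand-in: the paper combines several published similarity measures (e.g., DIRT, BInc), not this particular formula, and the example instances are invented.

```python
# Toy illustration of the binary argument-pair representation: each predicate
# is a bag of (arg1, arg2) features drawn from its extracted instances, and
# two predicates are compared by cosine similarity over those feature counts.
from collections import Counter
from math import sqrt

def cosine(p1_args, p2_args):
    """Cosine similarity between two predicates' argument-pair count vectors."""
    v1, v2 = Counter(p1_args), Counter(p2_args)
    dot = sum(v1[f] * v2[f] for f in v1.keys() & v2.keys())
    norm = sqrt(sum(c * c for c in v1.values())) * sqrt(sum(c * c for c in v2.values()))
    return dot / norm if norm else 0.0

# Invented extracted instances: (arg1, arg2) pairs seen with each typed predicate.
buy_args     = [("google", "youtube"), ("amazon", "kiva"), ("apple", "beats")]
acquire_args = [("google", "youtube"), ("apple", "beats"), ("ibm", "redhat")]

print(cosine(buy_args, acquire_args))   # ~0.67 on this toy data
```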


WordNet

Appears in 6 sentences as: WordNet (6)
In Global Learning of Typed Entailment Rules
  1. A widely-used resource is WordNet (Fellbaum, 1998), where relations such as synonymy and hyponymy can be used to generate rules.
    Page 2, “Background”
  2. Given a lexicographic resource (WordNet) and a set of predicates with their instances, we perform the following three steps (see Table 1):
    Page 4, “Learning Typed Entailment Graphs”
  3. 1) Training set generation We use WordNet to generate positive and negative examples, where each example is a pair of predicates.
    Page 4, “Learning Typed Entailment Graphs”
  4. For every predicate p(t1, t2) ∈ P such that p is a single word, we extract from WordNet the set S of synonyms and direct hypernyms of p. For every p' ∈ S, if p'(t1, t2) ∈ P then p(t1, t2) → p'(t1, t2) is taken as a positive example.
    Page 4, “Learning Typed Entailment Graphs”
  5. Negative examples are generated in a similar manner, with direct co-hyponyms of p (sister nodes in WordNet) and hyponyms at distance 2 instead of synonyms and direct hypernyms.
    Page 4, “Learning Typed Entailment Graphs”
  6. single pair of words that are WordNet antonyms (2) Predicates differing by a single word of negation (3) Predicates p(t1, t2) and p(t2, t1) where p is a transitive verb (e.g., beat) in VerbNet (Kipper-Schuler et al., 2000).
    Page 7, “Experimental Evaluation”
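
The training-set generation described in items 3–5 can be sketched as follows, using NLTK's WordNet interface purely for illustration (the use of NLTK and the helper name are assumptions; the paper only specifies WordNet). Positive candidates come from synonyms and direct hypernyms, negative candidates from co-hyponyms (sister nodes) and hyponyms at distance 2; each candidate p' would then be kept only if p'(t1, t2) also appears in the extracted predicate set P.

```python
# Rough sketch of training-set generation from WordNet: synonyms and direct
# hypernyms of a predicate word yield positive candidates; co-hyponyms
# (sister nodes) and hyponyms at distance 2 yield negative candidates.
# Requires the WordNet corpus, e.g. nltk.download("wordnet").
from nltk.corpus import wordnet as wn

def candidate_examples(pred_word, pos=wn.VERB):
    positives, negatives = set(), set()
    for syn in wn.synsets(pred_word, pos=pos):
        # Synonyms and direct hypernyms -> positive candidates.
        positives.update(l for l in syn.lemma_names() if l != pred_word)
        for hyper in syn.hypernyms():
            positives.update(hyper.lemma_names())
            # Sister nodes (co-hyponyms) -> negative candidates.
            for sister in hyper.hyponyms():
                if sister != syn:
                    negatives.update(sister.lemma_names())
        # Hyponyms at distance 2 -> negative candidates.
        for hypo in syn.hyponyms():
            for hypo2 in hypo.hyponyms():
                negatives.update(hypo2.lemma_names())
    return positives, negatives - positives

pos_ex, neg_ex = candidate_examples("buy")
```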


optimization problem

Appears in 4 sentences as: optimization problem (4)
In Global Learning of Typed Entailment Rules
  1. Last, we formulate our optimization problem as an Integer Linear Program (ILP).
    Page 3, “Background”
  2. ILP is an optimization problem where a linear objective function over a set of integer variables is maximized under a set of linear constraints.
    Page 3, “Background”
  3. Section 4.2 gives an ILP formulation for the optimization problem.
    Page 4, “Learning Typed Entailment Graphs”
  4. The edges returned by the solver provide an optimal (not approximate) solution to the optimization problem.
    Page 6, “Learning Typed Entailment Graphs”
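
Putting the sentences above together, the global optimization has the following general shape: binary indicator variables x_uv mark which entailment edges are kept, each edge carries a weight w_uv derived from the local classifier, and transitivity is enforced as a linear constraint. This is a schematic reconstruction of that shape, not the paper's exact objective (which defines w_uv in a specific way):

```latex
\hat{G} = \arg\max_{x} \sum_{u \neq v} w_{uv}\, x_{uv}
\quad \text{s.t.} \quad
x_{uv} + x_{vz} - x_{uz} \le 1 \;\; \forall\, u, v, z, \qquad
x_{uv} \in \{0, 1\}.
```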


feature vector

Appears in 3 sentences as: feature vector (3)
In Global Learning of Typed Entailment Rules
  1. Distributional similarity algorithms differ in their feature representation: Some use a binary representation: each predicate is represented by one feature vector where each feature is a pair of arguments (Szpektor et al., 2004; Yates and Etzioni, 2009).
    Page 2, “Background”
  2. 2) Feature representation Each example pair of predicates (p1, p2) is represented by a feature vector, where each feature is a specific distributional
    Page 4, “Learning Typed Entailment Graphs”
  3. We want to use P_uv to derive the posterior P(G|F), where F = ∪_{u≠v} F_uv and F_uv is the feature vector for a node pair (u, v).
    Page 5, “Learning Typed Entailment Graphs”
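
To make the feature representation in item 2 concrete, the sketch below builds a feature vector of similarity scores per predicate pair and trains a binary entailment classifier on it. scikit-learn's LogisticRegression and the three-score vectors are stand-ins chosen for brevity: the paper uses 11 distributional similarity scores per pair and does not necessarily use this classifier.

```python
# Sketch of the local entailment classifier step: each candidate rule
# (p1 -> p2) becomes a feature vector of similarity scores, and a binary
# classifier is trained on WordNet-derived positive/negative examples.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Rows: similarity scores for a pair (p1, p2); labels: 1 = p1 entails p2.
X_train = np.array([[0.9, 0.7, 0.8],   # e.g., (buy, acquire)
                    [0.8, 0.6, 0.9],
                    [0.1, 0.3, 0.2],   # e.g., (beat, lose to)
                    [0.2, 0.1, 0.1]])
y_train = np.array([1, 1, 0, 0])

clf = LogisticRegression().fit(X_train, y_train)

# P(entailment) for a new pair; these probabilities (or weights derived from
# them) are what the global graph step consumes.
print(clf.predict_proba([[0.7, 0.5, 0.6]])[:, 1])
```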


learning algorithm

Appears in 3 sentences as: learning algorithm (3)
In Global Learning of Typed Entailment Rules
  1. Our learning algorithm is composed of two steps: (1) Given a set of typed predicates and their instances extracted from a corpus, we train a (local) entailment classifier that estimates for every pair of predicates whether one entails the other.
    Page 3, “Learning Typed Entailment Graphs”
  2. (b) DIRT: (Lin and Pantel, 2001) a widely-used rule learning algorithm.
    Page 7, “Experimental Evaluation”
  3. (c) BInc: (Szpektor and Dagan, 2008) a directional rule learning algorithm.
    Page 7, “Experimental Evaluation”
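
Item 1 above summarizes the two-step algorithm; the fragment below shows, at a high level, how the two steps fit together. The helper names (entailment_probability, solve_ilp) are hypothetical stand-ins, and turning probabilities into log-odds edge weights is just one simple choice, not necessarily the paper's.

```python
# High-level glue for the two-step algorithm: (1) a local classifier scores
# every ordered predicate pair, (2) a global ILP with transitivity constraints
# (as sketched under "ILP" above) selects the final edge set.
# NOTE: classifier.entailment_probability and solve_ilp are hypothetical
# stand-ins, not functions from the paper or any particular library.
import math

def learn_typed_entailment_graph(predicates, instances, classifier, solve_ilp):
    weights = {}
    for p1 in predicates:
        for p2 in predicates:
            if p1 == p2:
                continue
            prob = classifier.entailment_probability(p1, p2, instances)
            # Log-odds: positive weight when the classifier favors the edge.
            weights[(p1, p2)] = math.log(prob / (1.0 - prob))
    return solve_ilp(predicates, weights)  # edges satisfying transitivity
```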


similarity scores

Appears in 3 sentences as: similarity score (1) similarity scores (2)
In Global Learning of Typed Entailment Rules
  1. similarity score estimating whether p1 entails p2.
    Page 4, “Learning Typed Entailment Graphs”
  2. We compute 11 distributional similarity scores for each pair of predicates based on the arguments appearing in the extracted arguments.
    Page 4, “Learning Typed Entailment Graphs”
  3. Third, we compared to the entailment classifier with no transitivity constraints (clsf) to see if combining distributional similarity scores improves performance over single measures.
    Page 7, “Experimental Evaluation”
