Probabilistic Soft Logic for Semantic Textual Similarity
Islam Beltagy, Katrin Erk, and Raymond Mooney

Article Structure

Abstract

Probabilistic Soft Logic (PSL) is a recently developed framework for probabilistic logic.

Introduction

When will people say that two sentences are similar?

Background

2.1 Logical Semantics

PSL for STS

For several reasons, we believe PSL is a more appropriate probabilistic logic for STS than MLNs.

Evaluation

This section evaluates the performance of PSL on the STS task.

Future Work

As mentioned in Section 3.2, it would be good to use a weighted average to compute the truth

Conclusion

This paper has presented an approach that uses Probabilistic Soft Logic (PSL) to determine Semantic Textual Similarity (STS).

Topics

logical forms

Appears in 6 sentences as: logical form (2) logical forms (4)
In Probabilistic Soft Logic for Semantic Textual Similarity
  1. The MLN constructed to determine the probability of a given entailment includes the logical forms for both T and H as well as soft inference rules that are constructed from distributional information.
    Page 4, “Background”
  2. To determine an entailment probability, first, the two sentences are mapped to logical representations using Boxer (Bos, 2008), a tool for wide-coverage semantic analysis that maps a CCG (Combinatory Categorial Grammar) parse into a lexically-based logical form.
    Page 4, “Background”
  3. First, it is explicitly designed to support efficient inference; therefore, it scales better to longer sentences with more complex logical forms.
    Page 4, “PSL for STS”
  4. Given the logical forms for a pair of sentences, a text T and a hypothesis H, and given a set of weighted rules derived from the distributional semantics (as explained in section 2.6) composing the knowledge base KB, we build a PSL model that supports determining the truth value of H in the most probable interpretation (i.e.
    Page 5, “PSL for STS”
  5. Parsing into logical form gives:
    Page 5, “PSL for STS”
  6. This system uses PSL to compute similarity of logical forms but does not use distributional information on lexical or phrasal similarity.
    Page 7, “Evaluation”
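
Item 5 introduces an example logical form that is not reproduced in this index. As an illustration only (the sentence and predicate names are hypothetical, not necessarily the paper's example), Boxer-style parsing maps a sentence such as "A man is playing a guitar" into a neo-Davidsonian first-order form along the lines of:

    exists x y e. man(x) & guitar(y) & play(e) & agent(e, x) & patient(e, y)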

semantic similarity

Appears in 6 sentences as: semantic similarity (5) semantically similar (1)
In Probabilistic Soft Logic for Semantic Textual Similarity
  1. judging the semantic similarity of natural-language sentences), and show that PSL gives improved results compared to a previous approach based on Markov Logic Networks (MLNs) and a purely distributional approach.
    Page 1, “Abstract”
  2. Distributional models (Turney and Pantel, 2010), on the other hand, use statistics on contextual data from large corpora to predict semantic similarity of words and phrases (Landauer and Dumais, 1997; Mitchell and Lapata, 2010).
    Page 2, “Background”
  3. Distributional models are motivated by the observation that semantically similar words occur in similar contexts, so words can be represented as vectors in high dimensional spaces generated from the contexts in which they occur (Landauer and Dumais, 1997; Lund and Burgess, 1996).
    Page 2, “Background”
  4. Beltagy et al. (2013) use MLNs to represent the meaning of natural language sentences and judge textual entailment and semantic similarity, but they were unable to scale the approach beyond short sentences due to the complexity of MLN inference.
    Page 2, “Background”
  5. Lewis and Steedman (2013) use distributional information to determine word senses, but still produce a strictly logical semantic representation that does not address the “graded” nature of linguistic meaning that is important to measuring semantic similarity.
    Page 4, “Background”
  6. More specifically, they strongly indicate that PSL is a more effective probabilistic logic for judging semantic similarity than MLNs.
    Page 8, “Evaluation”
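
Items 2 and 3 explain the distributional view: words are represented as vectors built from the contexts they occur in, and similarity is predicted from those vectors. A minimal sketch of that idea in Python, using cosine similarity over hypothetical co-occurrence counts (the vectors, their dimensionality, and the weighting scheme are illustrative, not the paper's actual setup):

    import numpy as np

    # Hypothetical context-count vectors for two words (one entry per context feature).
    v_guitar = np.array([3.0, 0.0, 5.0, 1.0])
    v_instrument = np.array([2.0, 1.0, 4.0, 0.0])

    def cosine(u, v):
        # Distributional similarity as the cosine of the angle between context vectors.
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    print(cosine(v_guitar, v_instrument))  # close to 1.0 for words with similar contexts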

similarity score

Appears in 6 sentences as: similarity score (5) similarity scores (2)
In Probabilistic Soft Logic for Semantic Textual Similarity
  1. Given a similarity score for all pairs of sentences in the dataset, a regressor is trained on the training set to map the system’s output to the gold standard scores.
    Page 4, “Background”
  2. KB: The knowledge base is a set of lexical and phrasal rules generated from distributional semantics, along with a similarity score for each rule (section 2.6).
    Page 5, “PSL for STS”
  3. where vs_sim is a similarity function that calculates the distributional similarity score between the two lexical predicates.
    Page 5, “PSL for STS”
  4. To produce a final similarity score, we train a regressor to learn the mapping between the two PSL scores and the overall similarity score.
    Page 5, “PSL for STS”
  5. Then, for STS 2012, 1,500 pairs were selected and annotated with similarity scores.
    Page 7, “Evaluation”
  6. • Pearson correlation: The Pearson correlation between the system’s similarity scores and the human gold standards.
    Page 7, “Evaluation”
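
Items 1 and 4 describe training a regressor that maps the system's outputs (for item 4, the two PSL scores for a sentence pair) to the gold-standard similarity ratings. A minimal sketch under those assumptions using scikit-learn; the choice of Ridge regression and all numbers are illustrative, not the paper's configuration:

    import numpy as np
    from sklearn.linear_model import Ridge

    # Hypothetical training data: two PSL scores per sentence pair
    # (one per inference direction) and the human gold similarity rating.
    X_train = np.array([[0.9, 0.8], [0.4, 0.3], [0.7, 0.2]])
    y_train = np.array([4.6, 1.8, 3.1])  # e.g. on the STS 0-5 scale

    regressor = Ridge().fit(X_train, y_train)
    print(regressor.predict(np.array([[0.85, 0.75]])))  # final similarity score for a new pair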

distributional semantics

Appears in 4 sentences as: Distributional semantic (1) Distributional Semantics (1) distributional semantics (2)
In Probabilistic Soft Logic for Semantic Textual Similarity
  1. 2.2 Distributional Semantics
    Page 2, “Background”
  2. Distributional semantic knowledge is then encoded as weighted inference rules in the MLN.
    Page 4, “Background”
  3. Given the logical forms for a pair of sentences, a text T and a hypothesis H, and given a set of weighted rules derived from the distributional semantics (as explained in section 2.6) composing the knowledge base KB, we build a PSL model that supports determining the truth value of H in the most probable interpretation (i.e.
    Page 5, “PSL for STS”
  4. KB: The knowledge base is a set of lexical and phrasal rules generated from distributional semantics, along with a similarity score for each rule (section 2.6).
    Page 5, “PSL for STS”
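
Items 2 and 4 describe how distributional knowledge enters the system: as lexical (and phrasal) inference rules whose weights are distributional similarity scores. A hypothetical lexical rule of that form, with illustrative predicate names and an illustrative weight (in the system, the weight would come from the similarity function applied to the two words' vectors):

    forall x. guitar(x) -> instrument(x)   with weight 0.8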

distributional similarity

Appears in 4 sentences as: Distributional similarity (1) distributional similarity (3)
In Probabilistic Soft Logic for Semantic Textual Similarity
  1. deep representation of sentence meaning, expressed in first-order logic, to capture sentence structure, but combine it with distributional similarity ratings at the word and phrase level.
    Page 1, “Introduction”
  2. This approach is interesting in that it uses a very deep and precise representation of meaning, which can then be relaxed in a controlled fashion using distributional similarity.
    Page 1, “Introduction”
  3. Distributional similarity between pairs of words is converted into weighted inference rules that are added to the logical representation, and Markov Logic Networks are used to perform probabilistic logical inference.
    Page 4, “Background”
  4. where vs_sim is a similarity function that calculates the distributional similarity score between the two lexical predicates.
    Page 5, “PSL for STS”

graphical model

Appears in 4 sentences as: graphical model (2) graphical models (2)
In Probabilistic Soft Logic for Semantic Textual Similarity
  1. Markov Logic Networks (MLN) (Richardson and Domingos, 2006) are a framework for probabilistic logic that employ weighted formulas in first-order logic to compactly encode complex undirected probabilistic graphical models (i.e., Markov networks).
    Page 2, “Background”
  2. It uses logical representations to compactly define large graphical models with continuous variables, and includes methods for performing efficient probabilistic inference for the resulting models.
    Page 2, “Background”
  3. Given a set of weighted logical formulas, PSL builds a graphical model defining a probability distribution over the continuous space of values of the random variables in the model.
    Page 2, “Background”
  4. Grounding is the process of instantiating the variables in the quantified rules with concrete constants in order to construct the nodes and links in the final graphical model.
    Page 5, “PSL for STS”
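
Item 4 defines grounding as instantiating the variables of the quantified rules with concrete constants. A minimal sketch of that enumeration step (naive string substitution over hypothetical predicates and constants; actual PSL grounding is more selective about which groundings it materializes):

    from itertools import product

    # A quantified rule body with variables X and E, and the constants available for each.
    template = "man(X) & play(E) & agent(E, X)"
    variables = ["X", "E"]
    constants = {"X": ["m1", "m2"], "E": ["e1"]}

    def ground(template, variables, constants):
        # Enumerate every substitution of constants for the rule's variables.
        for values in product(*(constants[v] for v in variables)):
            grounding = template
            for var, val in zip(variables, values):
                grounding = grounding.replace(var, val)
            yield grounding

    for g in ground(template, variables, constants):
        print(g)  # man(m1) & play(e1) & agent(e1, m1), then man(m2) & play(e1) & agent(e1, m2)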

knowledge base

Appears in 4 sentences as: knowledge base (4)
In Probabilistic Soft Logic for Semantic Textual Similarity
  1. The set of generated inference rules can be regarded as the knowledge base KB.
    Page 4, “Background”
  2. Given the logical forms for a pair of sentences, a text T and a hypothesis H, and given a set of weighted rules derived from the distributional semantics (as explained in section 2.6) composing the knowledge base KB, we build a PSL model that supports determining the truth value of H in the most probable interpretation (i.e.
    Page 5, “PSL for STS”
  3. KB: The knowledge base is a set of lexical and phrasal rules generated from distributional semantics, along with a similarity score for each rule (section 2.6).
    Page 5, “PSL for STS”
  4. • PSL-no-DIR: Our PSL system without distributional inference rules (empty knowledge base).
    Page 7, “Evaluation”

linear program

Appears in 3 sentences as: linear program (2) linear programming (1)
In Probabilistic Soft Logic for Semantic Textual Similarity
  1. On the other hand, inference in PSL reduces to a linear programming problem, which is theoretically and practically much more efficient.
    Page 1, “Introduction”
  2. Formally, from equation 3, the most probable interpretation is the one that minimizes $\sum_{r \in R} \lambda_r (d_r(I))^p$. In the case of $p = 1$, and given that all $d_r(I)$ are linear equations, minimizing the sum requires solving a linear program, which, compared to inference in other probabilistic logics such as MLNs, can be done relatively efficiently using well-established techniques.
    Page 3, “Background”
  3. PSL’s inference is actually an iterative process where in each iteration a grounding phase is followed by an optimization phase (solving the linear program).
    Page 6, “PSL for STS”
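
Items 1 and 2 state that, for p = 1, MPE inference amounts to minimizing the weighted sum of the rules' distances to satisfaction, which is a linear program. A toy sketch of that reduction with scipy (one unknown atom h, one rule pushing h toward an evidence value of 0.8 with weight 1.0, and a weight-0.5 prior pushing h toward 0; all numbers are illustrative, and real PSL models have one distance variable per ground rule):

    from scipy.optimize import linprog

    # Variables x = [h, d1, d2]:
    #   h  - truth value of the unknown atom, constrained to [0, 1]
    #   d1 - distance to satisfaction of the rule "evidence(0.8) -> h"
    #   d2 - distance to satisfaction of the prior "!h"
    c = [0.0, 1.0, 0.5]                  # minimize 1.0*d1 + 0.5*d2
    A_ub = [[-1.0, -1.0, 0.0],           # encodes d1 >= 0.8 - h
            [ 1.0,  0.0, -1.0]]          # encodes d2 >= h
    b_ub = [-0.8, 0.0]
    bounds = [(0.0, 1.0), (0.0, None), (0.0, None)]

    result = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    print(result.x)  # most probable interpretation: h = 0.8, with d1 = 0.0 and d2 = 0.8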

probability distribution

Appears in 3 sentences as: probability distribution (3)
In Probabilistic Soft Logic for Semantic Textual Similarity
  1. MLNs define a probability distribution over possible worlds, where a world’s probability increases exponentially with the total weight of the logical clauses that it satisfies.
    Page 2, “Background”
  2. Given a set of weighted logical formulas, PSL builds a graphical model defining a probability distribution over the continuous space of values of the random variables in the model.
    Page 2, “Background”
  3. Using distance to satisfaction, PSL defines a probability distribution over all possible interpretations I of all ground atoms.
    Page 3, “Background”
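
Item 3 refers to the distribution that PSL defines from distances to satisfaction. For reference, the standard PSL density over interpretations I, reconstructed here to match the minimization objective quoted under "linear program" above (the paper's own equation numbering is not reproduced):

    f(I) = \frac{1}{Z} \exp\Big[ -\sum_{r \in R} \lambda_r \, (d_r(I))^p \Big]

where, for a rule with body $r_\text{body}$ and head $r_\text{head}$, the distance to satisfaction under Lukasiewicz semantics is

    d_r(I) = \max\{0,\; I(r_\text{body}) - I(r_\text{head})\}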
