Probabilistic Domain Modelling With Contextualized Distributional Semantic Vectors
Cheung, Jackie Chi Kit and Penn, Gerald

Article Structure

Abstract

Generative probabilistic models have been used for content modelling and template induction, and are typically trained on small corpora in the target domain.

Introduction

Detailed domain knowledge is crucial to many NLP tasks, either as an input for language understanding, or as the goal itself, to acquire such knowledge.

Related Work

Probabilistic content models were proposed by Barzilay and Lee (2004), and related models have since become popular for summarization (Fung and Ngai, 2006; Haghighi and Vanderwende,

Distributional Semantic Hidden Markov Models

We now describe the DSHMM model.

Experiments

We trained the distributional semantic models using the Annotated Gigaword corpus (Napoles et al., 2012), which has been automatically preprocessed and is based on Gigaword 5th edition.

Guided Summarization Slot Induction

We first evaluated our models on their ability to produce coherent clusters of entities belonging to the same slot, adopting the experimental procedure of Cheung et al.

Multi-document Summarization: An Extrinsic Evaluation

We next evaluated our models extrinsically in the setting of extractive, multi-document summarization.

Conclusion

We have shown that contextualized distributional semantic vectors can be successfully integrated into a generative probabilistic model for domain modelling, as demonstrated by improvements in slot induction and multi-document summarization.

Topics

distributional semantic

Appears in 19 sentences as: Distributional Semantic (2) Distributional semantic (1) distributional semantic (13) distributional semantics (6)
In Probabilistic Domain Modelling With Contextualized Distributional Semantic Vectors
  1. In contrast, vector space models of distributional semantics are trained on large corpora, but are typically applied to domain-general lexical disambiguation tasks.
    Page 1, “Abstract”
  2. We introduce Distributional Semantic Hidden Markov Models, a novel variant of a hidden Markov model that integrates these two approaches by incorporating contextualized distributional semantic vectors into a generative model as observed emissions.
    Page 1, “Abstract”
  3. By contrast, distributional semantic models are trained on large, domain-general corpora.
    Page 1, “Introduction”
  4. In this paper, we propose to inject contextualized distributional semantic vectors into generative probabilistic models, in order to combine their complementary strengths for domain modelling.
    Page 2, “Introduction”
  5. There are a number of potential advantages that distributional semantic models offer.
    Page 2, “Introduction”
  6. Our model, the Distributional Semantic Hidden Markov Model (DSHMM), incorporates contextualized distributional semantic vectors into a generative probabilistic model as observed emissions.
    Page 2, “Introduction”
  7. We show that our model outperforms a baseline version of our method that does not use distributional semantic vectors, as well as a recent state-of-the-art template induction method.
    Page 2, “Introduction”
  8. From a modelling perspective, these results show that probabilistic models for content modelling and template induction benefit from distributional semantics trained on a much larger corpus.
    Page 2, “Introduction”
  9. From the perspective of distributional semantics, this work broadens the variety of problems to which distributional semantics can be applied, and proposes methods to perform inference in a probabilistic setting beyond geometric measures such as cosine similarity.
    Page 2, “Introduction”
  10. Our work is similar in that we assume much of the same structure within a domain and consequently in the model as well (Section 3), but whereas PROFINDER focuses on finding the “correct” number of frames, events, and slots with a nonparametric method, this work focuses on integrating global knowledge in the form of distributional semantics into a probabilistic model.
    Page 2, “Related Work”
  11. Unlike in most applications of HMMs in text processing, in which the representation of a token is simply its word or lemma identity, tokens in DSHMM are also associated with a vector representation of their meaning in context according to a distributional semantic model (Section 3.1).
    Page 3, “Distributional Semantic Hidden Markov Models”
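
Sentence 11 above describes the observation attached to each token: its lemma identity together with a contextualized semantic vector. As a rough illustration only (the class and field names below are hypothetical, not the authors' code), such an observation could be represented as follows; treating the vector as a Gaussian emission is an assumption, though it is consistent with the per-state covariance estimates mentioned under "hyperparameters" below.

from dataclasses import dataclass
import numpy as np

@dataclass
class TokenObservation:
    # One DSHMM-style emission: the token's lemma identity plus its
    # contextualized distributional semantic vector. Names are illustrative.
    lemma: str              # categorical emission, as in a standard HMM
    sem_vector: np.ndarray  # contextualized vector (assumed Gaussian emission)

# e.g. the head word of a clause, with a 100-dimensional semantic vector
obs = TokenObservation(lemma="strike", sem_vector=np.zeros(100))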


probabilistic model

Appears in 12 sentences as: probabilistic model (6) probabilistic models (6)
In Probabilistic Domain Modelling With Contextualized Distributional Semantic Vectors
  1. Generative probabilistic models have been used for content modelling and template induction, and are typically trained on small corpora in the target domain.
    Page 1, “Abstract”
  2. Generative probabilistic models have been one popular approach to content modelling.
    Page 1, “Introduction”
  3. In this paper, we propose to inject contextualized distributional semantic vectors into generative probabilistic models, in order to combine their complementary strengths for domain modelling.
    Page 2, “Introduction”
  4. First, they provide domain-general representations of word meaning that cannot be reliably estimated from the small target-domain corpora on which probabilistic models are trained.
    Page 2, “Introduction”
  5. Our model, the Distributional Semantic Hidden Markov Model (DSHMM), incorporates contextualized distributional semantic vectors into a generative probabilistic model as observed emissions.
    Page 2, “Introduction”
  6. From a modelling perspective, these results show that probabilistic models for content modelling and template induction benefit from distributional semantics trained on a much larger corpus.
    Page 2, “Introduction”
  7. Cheung et al. (2013) propose PROFINDER, a probabilistic model for frame induction inspired by content models.
    Page 2, “Related Work”
  8. Our work is similar in that we assume much of the same structure within a domain and consequently in the model as well (Section 3), but whereas PROFINDER focuses on finding the “correct” number of frames, events, and slots with a nonparametric method, this work focuses on integrating global knowledge in the form of distributional semantics into a probabilistic model.
    Page 2, “Related Work”
  9. Combining distributional information and probabilistic models has actually been explored in previous work.
    Page 2, “Related Work”
  10. Usually, an ad-hoc clustering step precedes training and is used to bias the initialization of the probabilistic model (Barzilay and Lee, 2004; Louis and Nenkova, 2012), or the clustering is interleaved with iterations of training (Fung et al., 2003).
    Page 2, “Related Work”
  11. We have shown that contextualized distributional semantic vectors can be successfully integrated into a generative probabilistic model for domain modelling, as demonstrated by improvements in slot induction and multi-document summarization.
    Page 8, “Conclusion”


hyperparameters

Appears in 6 sentences as: hyperparameter (2) hyperparameters (4)
In Probabilistic Domain Modelling With Contextualized Distributional Semantic Vectors
  1. Distributions that generate the latent variables and hyperparameters are omitted for clarity.
    Page 3, “Related Work”
  2. We follow the “neutral” setting of hyperparameters given by Ormoneit and Tresp (1995), so that the MAP estimate for the covariance matrix for (event or slot) state i becomes:
    Page 5, “Distributional Semantic Hidden Markov Models”
  3. where j indexes all the relevant semantic vectors x_j in the training set, r_ij is the posterior responsibility of state i for vector x_j, and β is the remaining hyperparameter that we tune to adjust the amount of regularization.
    Page 5, “Distributional Semantic Hidden Markov Models”
  4. We tune the hyperparameters (N_E, N_S, δ, β, k) and the number of EM iterations by twofold cross-validation.¹
    Page 5, “Distributional Semantic Hidden Markov Models”
  5. ¹ The topic cluster splits and the hyperparameter settings are available at http://www.
    Page 5, “Distributional Semantic Hidden Markov Models”
  6. We trained a DSHMM separately for each of the five domains with different semantic models, tuning hyperparameters by twofold cross-validation.
    Page 6, “Guided Summarization Slot Induction”
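
Sentences 2 and 3 above refer to a regularized MAP estimate of the per-state covariance matrix, but the formula itself is not reproduced in this excerpt. A hedged reconstruction, assuming an inverse-Wishart-style penalty toward the identity matrix in the spirit of Ormoneit and Tresp (1995) (the prior scale βI and the constant in the denominator are assumptions; only r_ij, x_j, and β are given above), would look roughly like:

\hat{\Sigma}_i = \frac{\sum_j r_{ij}\,(x_j - \hat{\mu}_i)(x_j - \hat{\mu}_i)^{\top} + \beta I}{\sum_j r_{ij} + 1}

Here \hat{\mu}_i is the posterior-weighted mean of the vectors assigned to state i, and a larger β pulls \hat{\Sigma}_i further toward a scaled identity matrix, which is what makes the estimate usable on small in-domain corpora.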


noun phrases

Appears in 5 sentences as: Noun phrases (1) noun phrases (4)
In Probabilistic Domain Modelling With Contextualized Distributional Semantic Vectors
  1. Given a document consisting of a sequence of T clauses headed by propositional heads H (verbs or event nouns), and argument noun phrases A, a DSHMM models the joint probability of observations H, A, and latent random variables E and S representing domain events and slots respectively; i.e., P(H, A, E, S).
    Page 3, “Distributional Semantic Hidden Markov Models”
  2. We assume that event heads are verbs or event nouns, while arguments are the head words of their syntactically dependent noun phrases.
    Page 3, “Distributional Semantic Hidden Markov Models”
  3. First, the maximal noun phrases are extracted from the contributors and clustered based on the TAC slot of the contributor.
    Page 6, “Guided Summarization Slot Induction”
  4. These clusters of noun phrases then become the gold standard clusters against which automatic systems are compared.
    Page 6, “Guided Summarization Slot Induction”
  5. Noun phrases are considered to be matched if the lemmata of their head words are the same and they are extracted from the same summary.
    Page 6, “Guided Summarization Slot Induction”
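
Sentence 5 above states the matching criterion used when comparing system output against these gold clusters. A minimal sketch of that criterion (the function and field names are hypothetical; head-word extraction and lemmatization are assumed to happen elsewhere):

def phrases_match(np_a, np_b):
    # Two extracted noun phrases count as a match if the lemmata of their
    # head words are identical and both come from the same model summary.
    return (np_a["head_lemma"] == np_b["head_lemma"]
            and np_a["summary_id"] == np_b["summary_id"])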


Vector Space

Appears in 5 sentences as: Vector Space (2) Vector space (1) vector space (2)
In Probabilistic Domain Modelling With Contextualized Distributional Semantic Vectors
  1. In contrast, vector space models of distributional semantics are trained on large corpora, but are typically applied to domain-general lexical disambiguation tasks.
    Page 1, “Abstract”
  2. The most popular approach today is a vector space representation, in which each dimension corresponds to some context word, and the value at that dimension corresponds to the strength of the association between the context word and the target word being modelled.
    Page 1, “Introduction”
  3. Vector space models form the basis of modern information retrieval (Salton et al., 1975), but only recently have distributional models been proposed that are compositional (Mitchell and Lapata, 2008; Clark et al., 2008; Grefenstette and Sadrzadeh, 2011, inter alia), or that contextualize the meaning of a word using other words in the same phrase (co-compositionality) (Erk and Padó, 2008; Dinu and Lapata, 2010; Thater et al., 2011).
    Page 2, “Related Work”
  4. 3.1 Vector Space Models of Semantics
    Page 4, “Distributional Semantic Hidden Markov Models”
  5. Simple Vector Space Model In the basic version of the model (SIMPLE), we train a term-context matrix, where rows correspond to target words, and columns correspond to context words.
    Page 4, “Distributional Semantic Hidden Markov Models”
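
Sentence 5 above describes the SIMPLE term-context matrix. The sketch below builds raw co-occurrence counts for such a matrix from a window around each target word; the window size and any subsequent association weighting (e.g., PMI) are assumptions, since the excerpt does not specify them.

from collections import Counter, defaultdict

def build_term_context_counts(sentences, window=5):
    # Count how often each target word co-occurs with context words inside
    # a fixed window; rows are target words, columns are context words.
    counts = defaultdict(Counter)
    for tokens in sentences:
        for i, target in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[target][tokens[j]] += 1
    return counts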


latent variables

Appears in 4 sentences as: latent variable (1) latent variables (3)
In Probabilistic Domain Modelling With Contextualized Distributional Semantic Vectors
  1. Distributions that generate the latent variables and hyperparameters are omitted for clarity.
    Page 3, “Related Work”
  2. This model can be thought of as an HMM with two layers of latent variables, representing events and slots in the domain.
    Page 3, “Distributional Semantic Hidden Markov Models”
  3. Event Variables At the top level, a categorical latent variable E_t with N_E possible states represents the event that is described by clause t.
    Page 3, “Distributional Semantic Hidden Markov Models”
  4. Slot Variables Categorical latent variables with N_S possible states represent the slot that an argument fills, and are conditioned on the event variable in the clause, E_t (i.e., P_S(S_ta | E_t), for the a-th slot variable).
    Page 3, “Distributional Semantic Hidden Markov Models”
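
Taken together with the joint probability P(H, A, E, S) quoted under "noun phrases" above, sentences 2–4 suggest a factorization of roughly the following shape. This is a hedged reconstruction: the excerpt does not spell out the concrete emission distributions for the lemma and semantic-vector components, which are folded into P(H_t | E_t) and P(A_ta | S_ta) below.

P(H, A, E, S) = \prod_{t=1}^{T} P_E(E_t \mid E_{t-1})\, P(H_t \mid E_t) \prod_{a} P_S(S_{ta} \mid E_t)\, P(A_{ta} \mid S_{ta})

where a ranges over the argument noun phrases of clause t.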


vector representation

Appears in 4 sentences as: vector representation (3) vector representations (1)
In Probabilistic Domain Modelling With Contextualized Distributional Semantic Vectors
  1. Unlike in most applications of HMMs in text processing, in which the representation of a token is simply its word or lemma identity, tokens in DSHMM are also associated with a vector representation of their meaning in context according to a distributional semantic model (Section 3.1).
    Page 3, “Distributional Semantic Hidden Markov Models”
  2. All the methods below start from this basic vector representation.
    Page 4, “Distributional Semantic Hidden Markov Models”
  3. Let event head h be the syntactic head of a number of arguments a_1, a_2, ..., a_m, and v_h, v_a1, v_a2, ..., v_am be their respective vector representations according to the SIMPLE method.
    Page 4, “Distributional Semantic Hidden Markov Models”
  4. The intuition is that a word should be contextualized such that its vector representation becomes more similar to the vectors of other words that its dependency neighbours often take in the same syntactic position.
    Page 4, “Distributional Semantic Hidden Markov Models”
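
Sentences 3 and 4 above describe contextualizing a head word's vector using its dependency neighbours. As a deliberately simplified illustration (not the authors' operator; the paper compares several contextualization schemes, and the mixing weight below is an arbitrary illustrative choice), one could nudge the head vector toward the centroid of its arguments' vectors:

import numpy as np

def contextualize(v_head, arg_vectors, alpha=0.5):
    # Illustrative contextualization: move the head word's vector toward the
    # centroid of its dependent arguments' vectors. alpha is an arbitrary
    # mixing weight, not a value taken from the paper.
    if len(arg_vectors) == 0:
        return v_head
    centroid = np.mean(arg_vectors, axis=0)
    return (1 - alpha) * v_head + alpha * centroid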


generative model

Appears in 3 sentences as: generative model (2) generative models (1)
In Probabilistic Domain Modelling With Contextualized Distributional Semantic Vectors
  1. We introduce Distributional Semantic Hidden Markov Models, a novel variant of a hidden Markov model that integrates these two approaches by incorporating contextualized distributional semantic vectors into a generative model as observed emissions.
    Page 1, “Abstract”
  2. Second, the contextualization process allows the semantic vectors to implicitly encode disambiguated word sense and syntactic information, without further adding to the complexity of the generative model .
    Page 2, “Introduction”
  3. Other related generative models include topic models and structured versions thereof (Blei et al., 2003; Gruber et al., 2007; Wallach, 2008).
    Page 2, “Related Work”


gold standard

Appears in 3 sentences as: gold standard (3)
In Probabilistic Domain Modelling With Contextualized Distributional Semantic Vectors
  1. Each topic cluster contains eight human-written “model” summaries (“model” here meaning a gold standard).
    Page 5, “Experiments”
  2. These clusters of noun phrases then become the gold standard clusters against which automatic systems are compared.
    Page 6, “Guided Summarization Slot Induction”
  3. Because we are evaluating unsupervised systems, the clusters produced by the systems are not labelled, and must be matched to the gold standard clusters.
    Page 6, “Guided Summarization Slot Induction”
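
Sentence 3 above notes that unlabelled system clusters must first be mapped onto the gold standard clusters before scoring. The excerpt does not say which mapping is used; one simple illustrative option (a greedy many-to-one assignment, shown only as a sketch) maps each system cluster to the gold cluster with which it shares the most matched noun phrases:

def map_clusters(system_clusters, gold_clusters, overlap):
    # Greedy many-to-one mapping: assign each system cluster to the gold
    # cluster with maximal overlap, where overlap(sys, gold) counts matched
    # noun phrases between the two clusters. All names are illustrative.
    mapping = {}
    for s_id, s_members in system_clusters.items():
        best = max(gold_clusters, key=lambda g_id: overlap(s_members, gold_clusters[g_id]))
        mapping[s_id] = best
    return mapping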


word sense

Appears in 3 sentences as: word sense (3)
In Probabilistic Domain Modelling With Contextualized Distributional Semantic Vectors
  1. Contextualization has been found to improve performance in tasks like lexical substitution and word sense disambiguation (Thater et al., 2011).
    Page 2, “Introduction”
  2. Second, the contextualization process allows the semantic vectors to implicitly encode disambiguated word sense and syntactic information, without further adding to the complexity of the generative model.
    Page 2, “Introduction”
  3. The effectiveness of our model stems from the use of a large domain-general corpus to train the distributional semantic vectors, and the implicit syntactic and word sense information pro-
    Page 8, “Conclusion”
