Learning Document-Level Semantic Properties from Free-Text Annotations
S.R.K. Branavan, Harr Chen, Jacob Eisenstein, and Regina Barzilay

Article Structure

Abstract

This paper demonstrates a new method for leveraging free-text annotations to infer semantic properties of documents.

Introduction

A central problem in language understanding is transforming raw text into structured representations.

Related Work

Review Analysis: Our approach relates to previous work on property extraction from reviews (Popescu et al., 2005; Hu and Liu, 2004; Kim and Hovy, 2006).

Problem Formulation

We formulate our problem as follows.

Model Description

Our approach leverages both keyphrase clustering and distributional analysis of the text in a joint, hierarchical Bayesian model.

Posterior Sampling

Ultimately, we need to compute the model’s posterior distribution given the training data.

Experimental Setup

Data Sets: We evaluate our system on reviews from two categories, restaurants and cell phones.

Results

Comparative performance: Table 2 presents the results of the evaluation scenarios described above.

Conclusions and Future Work

In this paper, we have shown how free-text annotations provided by novice users can be leveraged as a training set for document-level semantic inference.

Topics

topic model

Appears in 14 sentences as: topic model (9), Topic Modeling (1), topic modeling (1), topic models (3)
  1. Keyphrases are clustered based on their distributional and lexical properties, and a hidden topic model is applied to the document text.
    Page 2, “Introduction”
  2. Bayesian Topic Modeling One aspect of our model views properties as distributions over words in the document.
    Page 2, “Related Work”
  3. This approach is inspired by methods in the topic modeling literature, such as Latent Dirichlet Allocation (LDA) (Blei et al., 2003), where topics are treated as hidden variables that govern the distribution of words in a text.
    Page 2, “Related Work”
  4. Recent work has examined coupling topic models with explicit supervision (Blei and McAuliffe, 2007; Titov and McDonald, 2008).
    Page 2, “Related Work”
  5. During training, we learn a hidden topic model from the text; each topic is also associated with a keyphrase cluster.
    Page 2, “Model Description”
  6. λ — probability of selecting η instead of φ; c — selects between η and φ for word topics; φ — document topic model
    Page 3, “Model Description”
  7. The hidden topic model of the review text is used to determine the properties that a document as a whole supports.
    Page 3, “Model Description”
  8. Our analysis of the document text is based on probabilistic topic models such as LDA (Blei et al., 2003).
    Page 3, “Model Description”
  9. These latent topics are drawn either from the set of clusters represented by the document’s keyphrases, or from the document’s topic model φ_d.
    Page 4, “Model Description”
  10. We deterministically construct a document-specific keyphrase topic model η_d, based on the keyphrase cluster assignments x and the observed keyphrases h_d.
    Page 4, “Model Description”
  11. keyphrase model η_d or the document topic model φ_d.
    Page 4, “Model Description”
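
Taken together, these occurrences describe the generative story: each word topic is drawn either from a deterministic keyphrase topic model η_d or from a free document topic distribution φ_d, gated by an auxiliary variable c. A minimal sketch of that selection step (variable names, sizes, and hyperparameter values are assumptions for illustration, not the paper's settings):

    import numpy as np

    rng = np.random.default_rng(0)

    K = 6     # number of topics / keyphrase clusters (assumed for the demo)
    V = 50    # vocabulary size (assumed)
    lam = 0.7 # assumed probability of drawing a word topic from eta_d rather than phi_d

    # Topic-specific language models theta_k: one distribution over words per topic.
    theta = rng.dirichlet(np.full(V, 0.1), size=K)

    # eta_d: keyphrase topic model, built deterministically from the clusters of the
    # document's keyphrases (faked here as uniform over clusters 0 and 2).
    eta_d = np.zeros(K)
    eta_d[[0, 2]] = 0.5

    # phi_d: the document's free topic distribution.
    phi_d = rng.dirichlet(np.full(K, 0.1))

    def generate_word():
        """Draw one word: pick the topic source via c, then a topic z, then a word w."""
        c = rng.random() < lam                      # auxiliary variable: eta_d vs. phi_d
        z = rng.choice(K, p=eta_d if c else phi_d)  # word topic
        w = rng.choice(V, p=theta[z])               # word from that topic's language model
        return c, z, w

    print([generate_word() for _ in range(5)])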

gold standard

Appears in 6 sentences as: gold standard (5), “gold standard” (1)
  1. For the restaurant data — where the gold standard identified eight semantic properties — we set K to 20, allowing the model to account for keyphrases not included in the eight most common properties.
    Page 6, “Experimental Setup”
  2. To perform a noise-free comparison, we based our second evaluation on the manually constructed gold standard for the restaurant category.
    Page 6, “Experimental Setup”
  3. For instance, evaluation against gold standard annotations shows that the random baseline outperforms all of the other baselines.
    Page 7, “Results”
  4. Comparison of cluster quality is against the gold standard.
    Page 8, “Results”
  5. One way to assess clustering quality is to compare it against a “gold standard” clustering, as constructed in Section 6.
    Page 8, “Results”
  6. Another way of assessing cluster quality is to consider the impact of using the gold standard clustering instead of our model’s clustering.
    Page 8, “Results”
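
Several of these occurrences score induced keyphrase clusters against the manual gold standard. A minimal sketch of one standard pairwise agreement score for such a comparison (the Rand Index; whether this is the paper's exact metric is an assumption):

    from itertools import combinations

    def rand_index(pred, gold):
        """Fraction of item pairs on which two clusterings agree: both put the
        pair in the same cluster, or both put it in different clusters."""
        agree = total = 0
        for i, j in combinations(range(len(pred)), 2):
            agree += (pred[i] == pred[j]) == (gold[i] == gold[j])
            total += 1
        return agree / total

    # Toy example: cluster labels per keyphrase (the label values are arbitrary).
    pred = [0, 0, 1, 1, 2]
    gold = [1, 1, 0, 0, 0]
    print(rand_index(pred, gold))  # 0.8: two of the ten pairs disagree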

language model

Appears in 6 sentences as: language model (3), language models (3)
  1. Each property indexes a language model, thus allowing documents that incorporate the same…
    Page 1, “Introduction”
  2. Keyphrases are drawn from a set of clusters; words in the documents are drawn from language models indexed by a set of topics, where the topics correspond to the keyphrase clusters.
    Page 2, “Model Description”
  3. θ — language models of each topic
    Page 3, “Model Description”
  4. In the LDA framework, each word is generated from a language model that is indexed by the word’s topic assignment.
    Page 3, “Model Description”
  5. Finally, the word w_{d,n} is drawn from the multinomial θ_{z_{d,n}}, where z_{d,n} indexes a topic-specific language model.
    Page 4, “Model Description”
  6. Each of the K language models θ_k is drawn from a symmetric Dirichlet prior θ_0.
    Page 4, “Model Description”
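
A minimal sketch of the two facts quoted above, with made-up sizes: K topic-specific language models θ_k drawn from a symmetric Dirichlet prior θ_0, and a word drawn from the multinomial indexed by its topic assignment:

    import numpy as np

    rng = np.random.default_rng(1)

    K, V = 4, 30     # number of topics and vocabulary size (assumed)
    theta_0 = 0.1    # symmetric Dirichlet concentration (assumed value)

    # K language models, each drawn from Dirichlet(theta_0, ..., theta_0) over V words.
    theta = rng.dirichlet(np.full(V, theta_0), size=K)

    # A word is generated from the language model indexed by its topic assignment z.
    z = 2
    w = rng.choice(V, p=theta[z])
    print(f"topic {z} emitted word id {w}")

    # With theta_0 < 1 each language model is sparse and peaked, so every topic
    # concentrates its mass on a few characteristic words.
    print(np.sort(theta[z])[-5:])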

LDA

Appears in 4 sentences as: LDA (4)
  1. This approach is inspired by methods in the topic modeling literature, such as Latent Dirichlet Allocation (LDA) (Blei et al., 2003), where topics are treated as hidden variables that govern the distribution of words in a text.
    Page 2, “Related Work”
  2. Our analysis of the document text is based on probabilistic topic models such as LDA (Blei et al., 2003).
    Page 3, “Model Description”
  3. In the LDA framework, each word is generated from a language model that is indexed by the word’s topic assignment.
    Page 3, “Model Description”
  4. Thus, rather than identifying a single topic for a document, LDA identifies a distribution over topics.
    Page 3, “Model Description”
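
The distinguishing point in the last occurrence is that LDA gives each document a distribution over topics rather than a single topic. A minimal generative sketch (hyperparameters and sizes are assumed):

    import numpy as np

    rng = np.random.default_rng(2)

    K, V, N = 5, 40, 30       # topics, vocabulary size, words per document (assumed)
    alpha, beta = 0.5, 0.1    # symmetric Dirichlet hyperparameters (assumed)

    theta = rng.dirichlet(np.full(V, beta), size=K)  # K topic-word distributions

    def generate_document():
        """LDA's generative story: a per-document topic mixture, then one topic per word."""
        phi_d = rng.dirichlet(np.full(K, alpha))     # distribution over topics
        z = rng.choice(K, p=phi_d, size=N)           # one topic assignment per word
        w = np.array([rng.choice(V, p=theta[k]) for k in z])
        return phi_d, z, w

    phi_d, z, w = generate_document()
    print("topic mixture:", np.round(phi_d, 2))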

topic distribution

Appears in 4 sentences as: topic distribution (5)
  1. For this reason, we also construct a document-specific topic distribution φ.
    Page 4, “Model Description”
  2. The auxiliary variable c indicates whether a given word’s topic is drawn from the set of keyphrase clusters, or from this topic distribution.
    Page 4, “Model Description”
  3. The third term is the dependence of the word topics z_{d,n} on the topic distribution η_d.
    Page 5, “Posterior Sampling”
  4. The word topics z are sampled according to keyphrase topic distribution η_d, document topic distribution φ_d, words w, and auxiliary variables c:
    Page 5, “Posterior Sampling”
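
The last occurrence names the inputs to the word-topic sampling step. A minimal sketch of one such draw, assuming the conditional factors as a topic prior (η_d or φ_d, gated by c) times the word likelihood under each topic's language model; a collapsed sampler would use counts rather than an explicit θ:

    import numpy as np

    rng = np.random.default_rng(3)

    def sample_word_topic(w, c, eta_d, phi_d, theta):
        """One Gibbs draw for a word topic z_{d,n}: the topic prior comes from
        eta_d or phi_d depending on c, reweighted by how well each topic's
        language model theta_k explains the observed word w."""
        prior = eta_d if c else phi_d
        p = prior * theta[:, w]     # unnormalized conditional over the K topics
        return rng.choice(len(p), p=p / p.sum())

    K, V = 4, 20
    theta = rng.dirichlet(np.full(V, 0.1), size=K)
    eta_d = np.array([0.5, 0.0, 0.5, 0.0])    # toy keyphrase topic model
    phi_d = rng.dirichlet(np.full(K, 0.5))    # toy document topic distribution
    print(sample_word_topic(w=7, c=True, eta_d=eta_d, phi_d=phi_d, theta=theta))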

development set

Appears in 3 sentences as: development set (3)
  1. Properties with proportions above a set threshold (tuned on a development set) are predicted as being supported.
    Page 3, “Model Description”
  2. Training: Our model needs to be provided with the number of clusters K. We set K large enough for the model to learn effectively on the development set.
    Page 6, “Experimental Setup”
  3. A threshold for this proportion is set for each property via the development set.
    Page 6, “Experimental Setup”
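
The quoted sentences say that each property's support threshold on the topic proportion is tuned on the development set. A minimal grid-search sketch (using F1 as the tuning objective is an assumption; the paper does not state the criterion here):

    import numpy as np

    def tune_threshold(proportions, gold, grid=np.linspace(0.01, 0.5, 50)):
        """For one property, pick the support threshold on the per-document topic
        proportion that maximizes F1 against the dev-set annotations."""
        best_t, best_f1 = grid[0], -1.0
        for t in grid:
            pred = proportions >= t
            tp = np.sum(pred & gold)
            p = tp / pred.sum() if pred.sum() else 0.0
            r = tp / gold.sum() if gold.sum() else 0.0
            f1 = 2 * p * r / (p + r) if p + r else 0.0
            if f1 > best_f1:
                best_t, best_f1 = t, f1
        return best_t, best_f1

    # Toy dev set: proportion of each document's words assigned to the property's
    # topic, and whether the annotation marks the property as supported.
    props = np.array([0.02, 0.10, 0.30, 0.05, 0.25])
    gold = np.array([False, True, True, False, True])
    print(tune_threshold(props, gold))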

Gibbs sampling

Appears in 3 sentences as: Gibbs sampler (1), Gibbs sampling (2)
  1. We employ Gibbs sampling, previously used in NLP by Finkel et al.
    Page 4, “Posterior Sampling”
  2. To improve the model’s convergence rate, we perform two initialization steps for the Gibbs sampler .
    Page 6, “Experimental Setup”
  3. Inference: The final point estimate used for testing is an average (for continuous variables) or a mode (for discrete variables) over the last 1,000 Gibbs sampling iterations.
    Page 6, “Experimental Setup”
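
The third occurrence describes how point estimates are formed from the chain. A minimal sketch of that averaging/mode step around a stand-in sampler (a real sweep would resample the model's variables from their conditional distributions):

    import numpy as np
    from collections import Counter

    rng = np.random.default_rng(4)

    def gibbs_point_estimates(sweep, n_iters=2000, keep_last=1000):
        """Run a Gibbs chain; average the continuous variables and take the mode
        of the discrete ones over the last `keep_last` iterations."""
        cont_samples, disc_samples = [], []
        for it in range(n_iters):
            cont, disc = sweep(it)    # one full pass resampling every variable
            if it >= n_iters - keep_last:
                cont_samples.append(cont)
                disc_samples.append(tuple(disc))
        cont_hat = np.mean(cont_samples, axis=0)               # posterior mean
        disc_hat = Counter(disc_samples).most_common(1)[0][0]  # posterior mode
        return cont_hat, disc_hat

    # Stand-in sweep: a real implementation would resample x, z, c, phi, eta, ...
    def toy_sweep(it):
        cont = rng.normal(1.0, 0.1, size=3)   # e.g., a topic distribution's parameters
        disc = rng.choice(2, size=4)          # e.g., cluster assignments
        return cont, disc

    print(gibbs_point_estimates(toy_sweep))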

similarity scores

Appears in 3 sentences as: similarity scores (3)
  1. We represent each distinct keyphrase as a vector of similarity scores computed over the set of observed keyphrases; these scores are represented by s in Figure 2, the plate diagram of our model. Modeling the similarity matrix rather than the surface…
    Page 3, “Model Description”
  2. We assume that similarity scores are conditionally independent given the keyphrase clustering, though the scores are in fact related.
    Page 3, “Model Description”
  3. Our present model makes strong assumptions about the independence of similarity scores.
    Page 8, “Conclusions and Future Work”
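
The first occurrence defines s as, per keyphrase, a vector of similarity scores against all observed keyphrases. A minimal sketch using word-overlap (Jaccard) as a stand-in similarity; the paper's actual scores draw on both distributional and lexical properties:

    from itertools import combinations

    def jaccard(a, b):
        """Stand-in lexical similarity between two keyphrases: word-set overlap."""
        wa, wb = set(a.split()), set(b.split())
        return len(wa & wb) / len(wa | wb)

    keyphrases = ["great service", "friendly service", "cheap eats", "good value"]
    n = len(keyphrases)
    s = [[1.0] * n for _ in range(n)]          # self-similarity on the diagonal
    for i, j in combinations(range(n), 2):
        s[i][j] = s[j][i] = jaccard(keyphrases[i], keyphrases[j])

    # Each keyphrase is now represented by its row of s: a vector of similarity
    # scores over the set of observed keyphrases, like the variable s in Figure 2.
    for kp, row in zip(keyphrases, s):
        print(f"{kp:17s}", [round(x, 2) for x in row])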
