Computational Approaches to Sentence Completion
Geoffrey Zweig, John C. Platt, Christopher Meek, Christopher J.C. Burges, Ainur Yessenalina, and Qiang Liu

Article Structure

Abstract

This paper studies the problem of sentence-level semantic coherence by answering SAT-style sentence completion questions.

Introduction

In recent years, standardized examinations have proved a fertile source of evaluation data for language processing tasks.

Related Work

The past work which is most similar to ours is derived from the lexical substitution track of SemEval-2007 (McCarthy and Navigli, 2007).

Sentence Completion via Language Modeling

Perhaps the most straightforward approach to solving the sentence completion task is to form the complete sentence with each option in turn, and to evaluate its likelihood under a language model.

Sentence Completion via Latent Semantic Analysis

Latent Semantic Analysis (LSA) (Deerwester et al., 1990) is a widely used method for representing words and documents in a low dimensional vector space.

Experimental Results

5.1 Data Resources

We present results with two datasets.

Discussion

To verify that the differences in accuracy between the different algorithms are not statistical flukes, we perform a statistical significance test on the outputs of each algorithm.

Conclusion

In this paper we have investigated methods for answering sentence-completion questions.

Topics

language model

Appears in 27 sentences as: Language Model (4) language model (15) Language Modeling (1) language modeling (8) language models (2)
In Computational Approaches to Sentence Completion
  1. We tackle the problem with two approaches: methods that use local lexical information, such as the n-grams of a classical language model; and methods that evaluate global coherence, such as latent semantic analysis.
    Page 1, “Abstract”
  2. To investigate the usefulness of local information, we evaluated n-gram language model scores, both from a conventional model with Good-Turing smoothing and from a recently proposed maximum-entropy class-based n-gram model (Chen, 2009a; Chen, 2009b).
    Page 2, “Introduction”
  3. Also in the language modeling vein, but with potentially global context, we evaluate the use of a recurrent neural network language model.
    Page 2, “Introduction”
  4. In all the language modeling approaches, a model is used to compute a sentence probability with each of the potential completions.
    Page 2, “Introduction”
  5. Section 3 describes the language modeling methods we have evaluated.
    Page 2, “Introduction”
  6. The KU system uses just an N-gram language model to do this ranking.
    Page 2, “Related Work”
  7. The UNT system uses a large variety of information sources, and a language model score receives the highest weight.
    Page 2, “Related Work”
  8. Perhaps the most straightforward approach to solving the sentence completion task is to form the complete sentence with each option in turn, and to evaluate its likelihood under a language model.
    Page 3, “Sentence Completion via Language Modeling”
  9. In this section, we describe the suite of state-of-the-art language modeling techniques for which we will present results.
    Page 3, “Sentence Completion via Language Modeling”
  10. 3.1 Backoff N-gram Language Model
    Page 3, “Sentence Completion via Language Modeling”
  11. Our baseline model is a Good-Turing smoothed model trained with the CMU language modeling toolkit (Clarkson and Rosenfeld, 1997).
    Page 3, “Sentence Completion via Language Modeling”
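
The fill-and-score loop described in these snippets (complete the sentence with each option, score each candidate under a language model, keep the likeliest) can be sketched in a few lines of Python. This is an illustrative mock-up, not the paper's implementation: the toy corpus, the add-one smoothed bigram model, and the `best_completion` helper are invented stand-ins for the Good-Turing smoothed backoff models the authors actually use.

```python
import math

# Toy corpus standing in for the paper's training data (invented).
CORPUS = "the dog barked at the cat and the cat ran".split()
VOCAB = set(CORPUS)

# Count unigrams and bigrams once, up front.
UNIGRAMS, BIGRAMS = {}, {}
for w in CORPUS:
    UNIGRAMS[w] = UNIGRAMS.get(w, 0) + 1
for a, b in zip(CORPUS, CORPUS[1:]):
    BIGRAMS[(a, b)] = BIGRAMS.get((a, b), 0) + 1

def bigram_logprob(sentence):
    """Log-probability of a sentence under an add-one smoothed bigram model
    (a stand-in for the Good-Turing smoothed backoff model of Section 3.1)."""
    words = sentence.split()
    lp = 0.0
    for a, b in zip(words, words[1:]):
        num = BIGRAMS.get((a, b), 0) + 1          # add-one smoothing
        den = UNIGRAMS.get(a, 0) + len(VOCAB)
        lp += math.log(num / den)
    return lp

def best_completion(template, options):
    """Insert each option into the blank and keep the likeliest sentence."""
    return max(options, key=lambda o: bigram_logprob(template.replace("___", o)))

print(best_completion("the dog barked at the ___", ["cat", "ran"]))  # → cat
```

The snippets above describe the same loop, but with trigram and higher-order models trained on far larger corpora.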


development set

Appears in 6 sentences as: development set (6)
In Computational Approaches to Sentence Completion
  1. In practice, a f of 1.2 was selected on the basis of development set results.
    Page 5, “Sentence Completion via Latent Semantic Analysis”
  2. This book contains eleven practice tests, and we used all the sentence completion questions in the first five tests as a development set, and all the questions in the last six tests as the test set.
    Page 6, “Experimental Results 5.1 Data Resources”
  3. To provide human benchmark performance, we asked six native-speaking high school students and five graduate students to answer the questions on the development set.
    Page 6, “Experimental Results 5.1 Data Resources”
  4. For the LSA-LM, an interpolation weight of 0.1 was used for the LSA score, determined through optimization on the development set.
    Page 7, “Experimental Results 5.1 Data Resources”
  5. approach of using reconstruction error performed very well on the development set, but unremarkably on the test set.
    Page 7, “Experimental Results 5.1 Data Resources”
  6. We train the parameters of the linear combination on the SAT development set.
    Page 7, “Experimental Results 5.1 Data Resources”


cosine similarity

Appears in 5 sentences as: Cosine similarity (1) cosine similarity (5)
In Computational Approaches to Sentence Completion
  1. An important property of SVD is that the rows of US (which represent the words) behave similarly to the original rows of W, in the sense that the cosine similarity between two rows in US approximates the cosine similarity between the corresponding rows in W.
    Pages 4-5, “Sentence Completion via Latent Semantic Analysis”
  2. Cosine similarity is defined as sim(a, b) = (a · b) / (‖a‖ ‖b‖).
    Page 5, “Sentence Completion via Latent Semantic Analysis”
  3. Let m be the smallest cosine similarity between h and any word in the vocabulary V: m = min_{w ∈ V} sim(h, w).
    Page 5, “Sentence Completion via Latent Semantic Analysis”
  4. Of all the methods in isolation, the simple approach of Section 4.1 (using the total cosine similarity between a potential answer and the other words in the sentence) has performed best.
    Page 7, “Experimental Results 5.1 Data Resources”
  5. For the LSA model, the linear combination has three inputs: the total word similarity, the cosine similarity between the sum of the answer word vectors and the sum of the rest of the sentence’s word vectors, and the number of out-of-vocabulary terms in the answer.
    Page 7, “Experimental Results 5.1 Data Resources”
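
The total-similarity score quoted in these snippets is easy to sketch. The cosine definition is the standard one; the 2-d word vectors below are invented purely for illustration and are not LSA vectors from the paper.

```python
import math

def cosine(a, b):
    """Cosine similarity: sim(a, b) = (a . b) / (|a| |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def total_similarity(answer_vec, context_vecs):
    """Sum of cosine similarities between a candidate answer's vector and
    the vectors of the other words in the sentence (the Section 4.1 score)."""
    return sum(cosine(answer_vec, v) for v in context_vecs)

# Invented 2-d "word vectors" for illustration only.
vecs = {"ocean": [0.9, 0.1], "wave": [0.8, 0.2], "ballot": [0.1, 0.9]}
context = [vecs["ocean"]]

# "wave" fits the oceanic context better than "ballot" does.
print(total_similarity(vecs["wave"], context) >
      total_similarity(vecs["ballot"], context))  # → True
```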


latent semantic

Appears in 5 sentences as: Latent Semantic (2) latent semantic (3)
In Computational Approaches to Sentence Completion
  1. We tackle the problem with two approaches: methods that use local lexical information, such as the n-grams of a classical language model; and methods that evaluate global coherence, such as latent semantic analysis.
    Page 1, “Abstract”
  2. As a first step, we have approached the problem from two points of view: first by exploiting local sentence structure, and secondly by measuring a novel form of global sentence coherence based on latent semantic analysis.
    Page 2, “Introduction”
  3. a novel method based on latent semantic analysis (LSA).
    Page 2, “Introduction”
  4. That paper also explores the use of Latent Semantic Analysis to measure the degree of similarity between a potential replacement and its context, but the results are poorer than those of the other methods.
    Page 2, “Related Work”
  5. Latent Semantic Analysis (LSA) (Deerwester et al., 1990) is a widely used method for representing words and documents in a low dimensional vector space.
    Page 4, “Sentence Completion via Latent Semantic Analysis”


SVD

Appears in 5 sentences as: SVD (5)
In Computational Approaches to Sentence Completion
  1. The method is based on applying singular value decomposition (SVD) to a matrix W representing the occurrence of words in documents.
    Page 4, “Sentence Completion via Latent Semantic Analysis”
  2. SVD results in an approximation of W by the product of three matrices, one in which each word is represented as a low-dimensional vector, one in which each document is represented as a low-dimensional vector, and a diagonal scaling matrix.
    Page 4, “Sentence Completion via Latent Semantic Analysis”
  3. An important property of SVD is that the rows of US (which represent the words) behave similarly to the original rows of W, in the sense that the cosine similarity between two rows in US approximates the cosine similarity between the corresponding rows in W.
    Page 4, “Sentence Completion via Latent Semantic Analysis”
  4. In terms of efficiency, recall that it is necessary to perform SVD on a term-document matrix.
    Page 6, “Sentence Completion via Latent Semantic Analysis”
  5. While the resulting matrix is highly sparse, it is nevertheless impractical to perform SVD.
    Page 6, “Sentence Completion via Latent Semantic Analysis”
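
The word representations these snippets describe (rows of US from a truncated SVD of the term-document matrix W) can be sketched with NumPy. The 3×3 count matrix here is invented; real term-document matrices are enormous and sparse, which is exactly the practicality concern the last snippet raises.

```python
import numpy as np

# Toy term-document count matrix W: rows = words, columns = documents (invented).
words = ["sea", "wave", "vote"]
W = np.array([[3.0, 0.0, 1.0],
              [2.0, 0.0, 1.0],
              [0.0, 4.0, 0.0]])

# SVD factors W = U S V^T; keeping only the top-k singular values gives
# the low-rank approximation used by LSA.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
k = 2
word_vecs = U[:, :k] * s[:k]   # rows of US: the low-dimensional word vectors

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# "sea" and "wave" occur in the same documents, so their US rows stay close,
# while "vote" ends up far from both.
print(cos(word_vecs[0], word_vecs[1]) > cos(word_vecs[0], word_vecs[2]))  # → True
```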


neural network

Appears in 4 sentences as: neural network (3) neural networks (1)
In Computational Approaches to Sentence Completion
  1. Also in the language modeling vein, but with potentially global context, we evaluate the use of a recurrent neural network language model.
    Page 2, “Introduction”
  2. In this case, the best combination is to blend LSA, the Good-Turing language model, and the recurrent neural network.
    Page 8, “Experimental Results 5.1 Data Resources”
  3. Secondly, for the Holmes data, we can state that LSA total similarity beats the recurrent neural network, which in turn is better than the baseline n-gram model.
    Page 8, “Discussion”
  4. It is an interesting research question whether this could be done implicitly with a machine learning technique, for example recurrent or recursive neural networks.
    Page 8, “Discussion”


LM

Appears in 3 sentences as: LM (4)
In Computational Approaches to Sentence Completion
  1. These data sources were evaluated using the baseline n-gram LM approach of Section 3.1.
    Page 6, “Experimental Results 5.1 Data Resources”
  2. Method                      Test
     3-input LSA                 46%
     LSA + Good-Turing LM        53%
     LSA + Good-Turing LM + RNN  52%
    Page 7, “Experimental Results 5.1 Data Resources”
  3. Method        SAT   Holmes
     Chance        20%   20%
     GT N-gram LM  42%   39%
     RNN           42%   45%
    Page 8, “Discussion”


maximum entropy

Appears in 3 sentences as: Maximum Entropy (1) maximum entropy (2)
In Computational Approaches to Sentence Completion
  1. 3.2 Maximum Entropy Class-Based N-gram Language Model
    Page 3, “Sentence Completion via Language Modeling”
  2. The key ideas are the modeling of word n-gram probabilities with a maximum entropy model, and the use of word-class information in the definition of the features.
    Page 3, “Sentence Completion via Language Modeling”
  3. Both components are themselves maximum entropy n-gram models in which the probability of a word or class label l given history h is determined by (1/Z(h)) exp(Σ_k λ_k f_k(h, l)). The features f_k(h, l) used are the presence of various patterns in the concatenation of hl, for example whether a particular suffix is present in hl.
    Page 4, “Sentence Completion via Language Modeling”
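
The conditional maximum entropy model these snippets describe, P(l | h) ∝ exp(Σ_k λ_k f_k(h, l)), can be sketched directly. The pattern features and weights below are invented toy detectors loosely mirroring the suffix-style features the snippet mentions; they are not the features of Chen's actual model.

```python
import math

def maxent_prob(label, history, labels, features, weights):
    """P(l | h) = exp(sum_k w_k * f_k(h, l)) / Z(h), where Z(h)
    normalizes over all candidate labels."""
    def score(l):
        return math.exp(sum(w * f(history, l) for f, w in zip(features, weights)))
    return score(label) / sum(score(l) for l in labels)

# Invented pattern features over the history/label pair (illustration only).
features = [
    lambda h, l: 1.0 if l.endswith("ing") else 0.0,   # suffix pattern on the label
    lambda h, l: 1.0 if h.endswith("was") else 0.0,   # pattern on the history
]
weights = [1.5, 0.5]

labels = ["running", "ran"]
p = maxent_prob("running", "the dog was", labels, features, weights)
print(round(p, 3))  # → 0.818
```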


model trained

Appears in 3 sentences as: model trained (2) models trained (2)
In Computational Approaches to Sentence Completion
  1. Our baseline model is a Good-Turing smoothed model trained with the CMU language modeling toolkit (Clarkson and Rosenfeld, 1997).
    Page 3, “Sentence Completion via Language Modeling”
  2. For the SAT task, we used a trigram language model trained on 1.1B words of newspaper data, described in Section 5.1.
    Page 3, “Sentence Completion via Language Modeling”
  3. Note that the latter are derived from models trained with the Los Angeles Times data, while the Holmes results are derived from models trained with 19th-century novels.
    Page 8, “Experimental Results 5.1 Data Resources”


N-gram

Appears in 3 sentences as: N-gram (3)
In Computational Approaches to Sentence Completion
  1. 3.1 Backoff N-gram Language Model
    Page 3, “Sentence Completion via Language Modeling”
  2. 3.2 Maximum Entropy Class-Based N-gram Language Model
    Page 3, “Sentence Completion via Language Modeling”
  3. 4.3 An LSA N-gram Language Model
    Page 5, “Sentence Completion via Latent Semantic Analysis”
