Syntactic and Semantic Factors in Processing Difficulty: An Integrated Measure
Jeff Mitchell, Mirella Lapata, Vera Demberg, and Frank Keller

Article Structure

Abstract

The analysis of reading times can provide insights into the processes that underlie language comprehension, with longer reading times indicating greater cognitive load.

Introduction

Psycholinguists have long realized that language comprehension is highly incremental, with readers and listeners continuously extracting the meaning of utterances on a word-by-word basis.

Models of Processing Difficulty

As described in Section 1, reading times provide an insight into the various cognitive activities that contribute to the overall processing difficulty involved in comprehending a written text.

Integrating Semantic Constraint into Surprisal

The treatment of semantic and syntactic constraint in models of processing difficulty has been somewhat inconsistent.

Method

Data: The models discussed in the previous section were evaluated against an eye-tracking corpus.

Results

We computed separate mixed effects models for three dependent variables, namely first fixation duration, first pass duration, and total reading time.

Discussion

In this paper we investigated the contributions of syntactic and semantic constraint in modeling processing difficulty.

Topics

LDA

Appears in 17 sentences as: LDA (17)
In Syntactic and Semantic Factors in Processing Difficulty: An Integrated Measure
  1. LDA is a probabilistic topic model offering an alternative to spatial semantic representations.
    Page 5, “Models of Processing Difficulty”
  2. Whereas in LSA words are represented as points in a multidimensional space, LDA represents words using topics.
    Page 5, “Models of Processing Difficulty”
  3. The factor A(w_n, h) is essentially based on a comparison between the vector representing the current word w_n and the vector representing the prior history h. Varying the method for constructing word vectors (e.g., using LDA or a simpler semantic space model) and for combining them into a representation of the prior context h (e.g., using additive or multiplicative functions) produces distinct models of semantic composition.
    Page 5, “Integrating Semantic Constraint into Surprisal”
  4. We also trained the LDA model on BLLIP, using the Gibbs sampling procedure discussed in Griffiths et al.
    Page 6, “Method”
  5. SSS Additive — .03820***; Multiplicative — .00895***; LDA Additive — .02500
    Page 7, “Results”
  6. Table 2: Coefficients of LME models including simple semantic space (SSS) or Latent Dirichlet Allocation (LDA) as factors; ***p < .001
    Page 7, “Results”
  7. Besides replicating Pynte et al.'s (2008) finding, we were also interested in assessing whether the underlying semantic representation (simple semantic space or LDA) and composition function (additive versus multiplicative) modulate reading times differentially.
    Page 7, “Results”
  8. The addition of the semantic factor significantly improves model fit for both the simple semantic space and LDA .
    Page 7, “Results”
  9. Factor SSS Coef LDA Coef
    Page 8, “Results”
  10. Table 3: Coefficients of nested LME models with the components of SSS or LDA surprisal as factors; only the coefficient of the additional factor at each step is shown
    Page 8, “Results”
  11. As far as LDA is concerned, the additive model significantly improves model fit, whereas the multiplicative one does not.
    Page 8, “Results”

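The sentences above describe words being represented as distributions over LDA topics and then combined, additively or multiplicatively, into a representation of the prior context. Below is a minimal Python sketch of that idea; the topic-word matrix, vocabulary size, and word ids are illustrative placeholders, not the model trained on BLLIP in the paper.

    import numpy as np

    # Hypothetical topic-word matrix phi[k, v] = P(word v | topic k), standing in
    # for the distributions an LDA model estimates (e.g. via Gibbs sampling).
    n_topics, vocab_size = 100, 5000
    rng = np.random.default_rng(0)
    phi = rng.dirichlet(np.ones(vocab_size), size=n_topics)

    def word_vector(word_id):
        # Represent a word as a normalised distribution over topics.
        col = phi[:, word_id]
        return col / col.sum()

    def compose(history_ids, how="additive"):
        # Combine the topic vectors of the prior context words into one vector.
        vecs = [word_vector(w) for w in history_ids]
        combined = np.sum(vecs, axis=0) if how == "additive" else np.prod(vecs, axis=0)
        return combined / combined.sum()

    history = [12, 345, 1780]   # ids of prior content words (illustrative)
    upcoming = 42
    h = compose(history, "additive")
    w = word_vector(upcoming)
    cosine = float(np.dot(w, h) / (np.linalg.norm(w) * np.linalg.norm(h)))
    print(f"cosine(upcoming word, composed history) = {cosine:.3f}")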

semantic space

Appears in 16 sentences as: semantic space (15) semantic spaces (1)
In Syntactic and Semantic Factors in Processing Difficulty: An Integrated Measure
  1. Expectations are represented by a vector of probabilities which reflects the likely location in semantic space of the upcoming word.
    Page 1, “Introduction”
  2. The model essentially integrates the predictions of an incremental parser (Roark 2001) together with those of a semantic space model (Mitchell and Lapata 2009).
    Page 2, “Introduction”
  3. As LSA is one of the best known semantic space models, it comes as no surprise that it has been used to analyze semantic constraint.
    Page 3, “Models of Processing Difficulty”
  4. Context is represented by a vector of probabilities which reflects the likely location in semantic space of the upcoming word.
    Page 4, “Models of Processing Difficulty”
  5. Importantly, composition models are not defined with a specific semantic space in mind; they could easily be adapted to LSA, or simple co-occurrence vectors, or more sophisticated semantic representations (e.g., Griffiths et al.
    Page 4, “Models of Processing Difficulty”
  6. 2007), although admittedly some composition functions may be better suited for particular semantic spaces .
    Page 4, “Models of Processing Difficulty”
  7. We also examine the influence of the underlying meaning representations by comparing a simple semantic space similar to McDonald (2000) against Latent Dirichlet Allocation (Blei et al.
    Page 4, “Models of Processing Difficulty”
  8. Despite its simplicity, the above semantic space (and variants thereof) has been used to successfully simulate lexical priming (e.g., McDonald 2000), human judgments of semantic similarity (Bullinaria and Levy 2007), and synonymy tests (Padó and Lapata 2007) such as those included in the Test of English as Foreign Language (TOEFL).
    Page 5, “Models of Processing Difficulty”
  9. In contrast to more standard semantic space models where word senses are conflated into a single representation, topics have an intuitive correspondence to coarse-grained sense distinctions.
    Page 5, “Models of Processing Difficulty”
  10. The factor A(w_n, h) is essentially based on a comparison between the vector representing the current word w_n and the vector representing the prior history h. Varying the method for constructing word vectors (e.g., using LDA or a simpler semantic space model) and for combining them into a representation of the prior context h (e.g., using additive or multiplicative functions) produces distinct models of semantic composition.
    Page 5, “Integrating Semantic Constraint into Surprisal”
  11. Following Mitchell and Lapata (2009), we constructed a simple semantic space based on co-occurrence statistics from the BLLIP training set.
    Page 6, “Method”

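Sentence 11 above mentions constructing a simple semantic space from co-occurrence statistics, and the "content words" entry further down quotes the definition of its vector components as the ratio of the probability of a context word given the target to its overall probability. A rough sketch of that kind of space follows, with a toy corpus and an illustrative set of context words rather than the BLLIP counts used in the paper.

    from collections import Counter, defaultdict

    def build_semantic_space(sentences, context_words, window=5):
        # Vector component for (target t, context c): p(c | t) / p(c),
        # counted over a symmetric window around each token of t.
        cooc = defaultdict(Counter)   # target word -> context-word counts
        totals = Counter()            # corpus-wide context-word counts
        n_tokens = 0
        for tokens in sentences:
            n_tokens += len(tokens)
            totals.update(tok for tok in tokens if tok in context_words)
            for i, t in enumerate(tokens):
                for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                    if j != i and tokens[j] in context_words:
                        cooc[t][tokens[j]] += 1
        space = {}
        for t, counts in cooc.items():
            total_t = sum(counts.values())
            space[t] = {c: (n / total_t) / (totals[c] / n_tokens) for c, n in counts.items()}
        return space

    corpus = [["the", "dog", "chased", "the", "cat"],
              ["the", "cat", "sat", "on", "the", "mat"]]
    print(build_semantic_space(corpus, context_words={"dog", "cat", "chased", "sat", "mat"}))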

language model

Appears in 9 sentences as: language model (8) language models (2)
In Syntactic and Semantic Factors in Processing Difficulty: An Integrated Measure
  1. In this paper we analyze reading times in terms of a single predictive measure which integrates a model of semantic composition with an incremental parser and a language model .
    Page 1, “Abstract”
  2. The basic idea is that the processing costs relating to the expectations of the language processor can be expressed in terms of the probabilities assigned by some form of language model to the input.
    Page 3, “Models of Processing Difficulty”
  3. Surprisal could also be defined using a vanilla language model that does not take any structural or grammatical information into account (Frank 2009).
    Page 3, “Models of Processing Difficulty”
  4. While surprisal is a theoretically well-motivated measure, formalizing the idea of linguistic processing being highly predictive in terms of probabilistic language models , the measurement of semantic constraint in terms of vector similarities lacks a clear motivation.
    Page 5, “Integrating Semantic Constraint into Surprisal”
  5. This can be achieved by turning a vector model of semantic similarity into a probabilistic language model .
    Page 5, “Integrating Semantic Constraint into Surprisal”
  6. There are in fact a number of approaches to deriving language models from distributional models of semantics (e.g., Bellegarda 2000; Coccaro and Jurafsky 1998; Gildea and Hofmann 1999).
    Page 5, “Integrating Semantic Constraint into Surprisal”
  7. Mitchell and Lapata (2009) show that a combined semantic-trigram language model derived from this approach and trained on the Wall Street Journal outperforms a baseline trigram model in terms of perplexity on a held out set.
    Page 5, “Integrating Semantic Constraint into Surprisal”
  8. They also linearly interpolate this semantic language model with the output of an incremental parser, which computes the following probability:
    Page 5, “Integrating Semantic Constraint into Surprisal”
  9. Equation (9) essentially defines a language model which combines semantic, syntactic and n-gram structure, and Mitchell and Lapata (2009) demonstrate that it improves further upon a semantic language model in terms of perplexity.
    Page 6, “Integrating Semantic Constraint into Surprisal”

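Several of the sentences above treat processing cost as the probability a language model assigns to the input, and describe linearly interpolating a semantic language model with the output of an incremental parser. The sketch below shows interpolation and surprisal as negative log probability; the component models and weights are uniform placeholders, not the trained trigram, semantic, or parser models from the paper.

    import math

    def interpolated_prob(word, history, components, weights):
        # Linear interpolation of the conditional probabilities assigned by
        # several component models (e.g. n-gram, semantic, incremental parser).
        assert abs(sum(weights) - 1.0) < 1e-9
        return sum(lam * model(word, history) for lam, model in zip(weights, components))

    def surprisal(word, history, components, weights):
        # Surprisal in bits: -log2 P(word | history).
        return -math.log2(interpolated_prob(word, history, components, weights))

    # Toy component models: each returns P(word | history); here simply uniform.
    trigram = lambda w, h: 1 / 10_000
    semantic = lambda w, h: 1 / 10_000
    parser = lambda w, h: 1 / 10_000

    print(surprisal("market", ["the", "stock"], [trigram, semantic, parser], [0.4, 0.3, 0.3]))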

co-occurrence

Appears in 5 sentences as: co-occurrence (6)
In Syntactic and Semantic Factors in Processing Difficulty: An Integrated Measure
  1. To give a concrete example, Latent Semantic Analysis (LSA, Landauer and Dumais 1997) creates a meaning representation for words by constructing a word-document co-occurrence matrix from a large collection of documents.
    Page 3, “Models of Processing Difficulty”
  2. Like LSA, ICD is based on word co-occurrence vectors; however, it does not employ singular value decomposition, and constructs a word-word rather than a word-document co-occurrence matrix.
    Page 4, “Models of Processing Difficulty”
  3. Importantly, composition models are not defined with a specific semantic space in mind; they could easily be adapted to LSA, or simple co-occurrence vectors, or more sophisticated semantic representations (e.g., Griffiths et al.
    Page 4, “Models of Processing Difficulty”
  4. Specifically, the simpler space is based on word co-occurrence counts; it constructs the vector representing a given target word, t, by identifying all the tokens of t in a corpus and recording the counts of context words, c_i (within a specific window).
    Page 4, “Models of Processing Difficulty”
  5. It is similar in spirit to LSA: it also operates on a word-document co-occurrence matrix and derives a reduced dimensionality description of words and documents.
    Page 5, “Models of Processing Difficulty”

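Sentences 1, 2 and 5 above contrast a word-document co-occurrence matrix reduced by singular value decomposition (as in LSA) with a word-word matrix that skips the reduction step. A toy sketch of the LSA-style reduction with NumPy; the counts are made up for illustration.

    import numpy as np

    # Toy word-document count matrix: rows are words, columns are documents.
    counts = np.array([
        [2.0, 0.0, 1.0, 0.0],   # "stock"
        [1.0, 0.0, 2.0, 0.0],   # "market"
        [0.0, 3.0, 0.0, 1.0],   # "dog"
        [0.0, 1.0, 0.0, 2.0],   # "cat"
    ])

    # Truncated SVD: keep k latent dimensions as reduced word representations.
    k = 2
    U, S, Vt = np.linalg.svd(counts, full_matrices=False)
    word_vectors = U[:, :k] * S[:k]

    def cosine(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    # Words occurring in similar documents end up close in the reduced space.
    print(cosine(word_vectors[0], word_vectors[1]))   # "stock" vs "market": high
    print(cosine(word_vectors[0], word_vectors[2]))   # "stock" vs "dog": low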

semantic representations

Appears in 5 sentences as: semantic representation (2) semantic representations (3)
In Syntactic and Semantic Factors in Processing Difficulty: An Integrated Measure
  1. 2009); however, the semantic component of these models is limited to semantic role information, rather than attempting to build a full semantic representation for a sentence.
    Page 2, “Introduction”
  2. Importantly, composition models are not defined with a specific semantic space in mind; they could easily be adapted to LSA, or simple co-occurrence vectors, or more sophisticated semantic representations (e.g., Griffiths et al.
    Page 4, “Models of Processing Difficulty”
  3. LDA is a probabilistic topic model offering an alternative to spatial semantic representations .
    Page 5, “Models of Processing Difficulty”
  4. Besides replicating Pynte et al.'s (2008) finding, we were also interested in assessing whether the underlying semantic representation (simple semantic space or LDA) and composition function (additive versus multiplicative) modulate reading times differentially.
    Page 7, “Results”
  5. For example, we could envisage a parser that uses semantic representations to guide its search, e.g., by pruning syntactic analyses that have a low semantic probability.
    Page 9, “Discussion”


significantly improves

Appears in 5 sentences as: significant improvement (2) significantly improves (3)
In Syntactic and Semantic Factors in Processing Difficulty: An Integrated Measure
  1. likelihood ratio is significant, then this indicates that the new factor significantly improves model fit.
    Page 7, “Method”
  2. The addition of the semantic factor significantly improves model fit for both the simple semantic space and LDA.
    Page 7, “Results”
  3. Considering the trigram model first, we find that adding this factor to the model gives a significant improvement in fit.
    Page 8, “Results”
  4. As far as LDA is concerned, the additive model significantly improves model fit, whereas the multiplicative one does not.
    Page 8, “Results”
  5. This collinearity issue may explain the absence of a significant improvement in model fit when these two terms are added to the baseline (see Table 3).
    Page 8, “Results”

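The sentences above report whether adding a factor to a baseline model "significantly improves model fit", assessed by a likelihood ratio test over nested LME models. A minimal sketch of that test, assuming the log-likelihoods and parameter counts of two already-fitted nested models are available (the mixed-model fitting itself is not shown here).

    from scipy.stats import chi2

    def likelihood_ratio_test(loglik_reduced, loglik_full, df_reduced, df_full):
        # Chi-square likelihood ratio test between two nested models.
        # A significant result indicates that the extra factor improves model fit.
        lr_stat = 2.0 * (loglik_full - loglik_reduced)
        dof = df_full - df_reduced
        p_value = chi2.sf(lr_stat, dof)
        return lr_stat, p_value

    # Illustrative numbers only, not the paper's fitted models.
    lr, p = likelihood_ratio_test(loglik_reduced=-12450.3, loglik_full=-12441.8,
                                  df_reduced=8, df_full=9)
    print(f"LR = {lr:.2f}, p = {p:.4f}")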

conditional probability

Appears in 4 sentences as: conditional probabilities (1) conditional probability (3)
In Syntactic and Semantic Factors in Processing Difficulty: An Integrated Measure
  1. Backward transitional probability is essentially the conditional probability of a word given its immediately preceding word, P(w_k | w_{k-1}).
    Page 3, “Models of Processing Difficulty”
  2. Analogously, forward probability is the conditional probability of the current word given the next word, P(w_k | w_{k+1}).
    Page 3, “Models of Processing Difficulty”
  3. This measure of processing cost for an input word, w_{k+1}, given the previous context, w_1...w_k, can be expressed straightforwardly in terms of its conditional probability as:
    Page 3, “Models of Processing Difficulty”
  4. The parser produces prefix probabilities for each word of a sentence which we converted to conditional probabilities by dividing each current probability by the previous one.
    Page 6, “Method”

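Sentence 4 above describes converting the parser's prefix probabilities into conditional probabilities by dividing each prefix probability by the previous one, and sentence 3 expresses processing cost in terms of that conditional probability. A short sketch of both steps; the prefix probabilities below are invented, not Roark-parser output.

    import math

    def conditionals_from_prefix(prefix_probs):
        # P(w_k | w_1..w_{k-1}) = P(w_1..w_k) / P(w_1..w_{k-1});
        # the first word's conditional is just its prefix probability.
        conds = [prefix_probs[0]]
        conds += [cur / prev for prev, cur in zip(prefix_probs, prefix_probs[1:])]
        return conds

    prefix = [0.02, 0.004, 0.0012, 0.00018]          # illustrative prefix probabilities
    surprisal = [-math.log2(p) for p in conditionals_from_prefix(prefix)]
    print([round(s, 2) for s in surprisal])          # per-word processing cost in bits

Dividing successive prefix probabilities in this way yields exactly the conditional probabilities that enter the surprisal calculation.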

meaning representations

Appears in 4 sentences as: meaning representation (1) meaning representations (3)
In Syntactic and Semantic Factors in Processing Difficulty: An Integrated Measure
  1. The latter creates meaning representations compositionally, and therefore builds semantic expectations for word sequences (e.g., phrases, sentences, even documents) rather than isolated words.
    Page 2, “Introduction”
  2. To give a concrete example, Latent Semantic Analysis (LSA, Landauer and Dumais 1997) creates a meaning representation for words by constructing a word-document co-occurrence matrix from a large collection of documents.
    Page 3, “Models of Processing Difficulty”
  3. Their aim is not so much to model processing difficulty, but to construct vector-based meaning representations that go beyond individual words.
    Page 4, “Models of Processing Difficulty”
  4. We also examine the influence of the underlying meaning representations by comparing a simple semantic space similar to McDonald (2000) against Latent Dirichlet Allocation (Blei et al.
    Page 4, “Models of Processing Difficulty”


vector representing

Appears in 4 sentences as: vector representing (5) vectors representing (1)
In Syntactic and Semantic Factors in Processing Difficulty: An Integrated Measure
  1. In this framework, the similarity between two words can be easily quantified, e.g., by measuring the cosine of the angle between the vectors representing them.
    Page 3, “Models of Processing Difficulty”
  2. Specifically, the simpler space is based on word co-occurrence counts; it constructs the vector representing a given target word, t, by identifying all the tokens of t in a corpus and recording the counts of context words, c_i (within a specific window).
    Page 4, “Models of Processing Difficulty”
  3. The factor A(w_n, h) is essentially based on a comparison between the vector representing the current word w_n and the vector representing the prior history h. Varying the method for constructing word vectors (e.g., using LDA or a simpler semantic space model) and for combining them into a representation of the prior context h (e.g., using additive or multiplicative functions) produces distinct models of semantic composition.
    Page 5, “Integrating Semantic Constraint into Surprisal”
  4. The calculation of A is then based on a weighted dot product of the vector representing the upcoming word w, with the vector representing the prior context h:
    Page 5, “Integrating Semantic Constraint into Surprisal”

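The sentences above compare the vector of the current or upcoming word with the vector representing the prior history, either via the cosine of the angle between them or via a weighted dot product. A small sketch of both comparisons; the vectors and weights are arbitrary examples, and the weighting is a simple stand-in rather than the paper's exact definition of A(w_n, h).

    import numpy as np

    def cosine(u, v):
        # Cosine of the angle between two word vectors.
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    def weighted_dot(w_vec, h_vec, weights):
        # Weighted dot product of the upcoming word's vector with the
        # vector representing the prior context h.
        return float(np.sum(weights * w_vec * h_vec))

    w_vec = np.array([0.2, 0.0, 0.7, 0.1])      # upcoming word
    h_vec = np.array([0.3, 0.1, 0.5, 0.1])      # composed prior history h
    weights = np.array([1.0, 0.5, 1.0, 0.5])    # illustrative component weights

    print(cosine(w_vec, h_vec))
    print(weighted_dot(w_vec, h_vec, weights))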

content words

Appears in 3 sentences as: content words (4)
In Syntactic and Semantic Factors in Processing Difficulty: An Integrated Measure
  1. The model takes into account only content words; function words are of little interest here as they can be found in any context.
    Page 4, “Models of Processing Difficulty”
  2. common content words and each vector component is given by the ratio of the probability of a c_i given t to the overall probability of c_i.
    Page 5, “Models of Processing Difficulty”
  3. Finally, because our focus is the influence of semantic context, we selected only content words whose prior sentential context contained at least two further content words .
    Page 6, “Method”

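Sentence 3 above restricts the analysis to content words whose prior sentential context contains at least two further content words. A small sketch of that selection step; the function-word list is an illustrative stand-in, not the one used in the paper.

    FUNCTION_WORDS = {"the", "a", "an", "of", "to", "in", "on", "and", "is", "was"}

    def is_content(word):
        return word.lower() not in FUNCTION_WORDS

    def select_targets(sentence, min_prior_content=2):
        # Keep content words preceded (within the sentence) by at least
        # min_prior_content further content words.
        targets, prior = [], 0
        for word in sentence:
            if is_content(word):
                if prior >= min_prior_content:
                    targets.append(word)
                prior += 1
        return targets

    print(select_targets("the analyst said the stock market fell sharply".split()))
    # -> ['stock', 'market', 'fell', 'sharply']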

semantic relation

Appears in 3 sentences as: semantic relation (1) semantic relationship (1) semantically related (1)
In Syntactic and Semantic Factors in Processing Difficulty: An Integrated Measure
  1. The first type is semantic prediction, as evidenced in semantic priming: a word that is preceded by a semantically related prime or a semantically congruous sentence fragment is processed faster (Stanovich and West 1981; van Berkum et al.
    Page 1, “Introduction”
  2. Comprehenders are faster at naming words that are syntactically compatible with prior context, even when they bear no semantic relationship to the context (Wright and Garrett 1984).
    Page 1, “Introduction”
  3. Distributional models of meaning have been commonly used to quantify the semantic relation between a word and its context in computational studies of lexical processing.
    Page 3, “Models of Processing Difficulty”


semantic similarity

Appears in 3 sentences as: Semantic similarities (1) semantic similarity (2)
In Syntactic and Semantic Factors in Processing Difficulty: An Integrated Measure
  1. Semantic similarities are then modeled in terms of geometric similarities within the space.
    Page 3, “Models of Processing Difficulty”
  2. Despite its simplicity, the above semantic space (and variants thereof) has been used to successfully simulate lexical priming (e.g., McDonald 2000), human judgments of semantic similarity (Bullinaria and Levy 2007), and synonymy tests (Padó and Lapata 2007) such as those included in the Test of English as Foreign Language (TOEFL).
    Page 5, “Models of Processing Difficulty”
  3. This can be achieved by turning a vector model of semantic similarity into a probabilistic language model.
    Page 5, “Integrating Semantic Constraint into Surprisal”

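Sentence 3 above mentions turning a vector model of semantic similarity into a probabilistic language model. One simple way to make similarity scores probabilistic is to normalise them over the vocabulary; the sketch below shows that step only and is not claimed to be the paper's exact construction.

    import numpy as np

    def similarity_to_distribution(sim_scores):
        # Clip negative similarities and renormalise so the scores over the
        # vocabulary sum to one, yielding a probability distribution.
        scores = np.clip(np.asarray(sim_scores, dtype=float), 0.0, None)
        total = scores.sum()
        if total == 0.0:
            return np.full(len(scores), 1.0 / len(scores))   # back off to uniform
        return scores / total

    # Similarity of each vocabulary word to the composed context (illustrative).
    sims = [0.10, 0.02, 0.35, 0.00, 0.05]
    probs = similarity_to_distribution(sims)
    print(probs, probs.sum())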