Word Association Profiles and their Use for Automated Scoring of Essays
Beigman Klebanov, Beata and Flor, Michael

Article Structure

Abstract

We describe a new representation of the content vocabulary of a text, the word association profile, which captures the proportions of highly associated, mildly associated, unassociated, and disassociated pairs of words that coexist in the given text.

Introduction

The vast majority of contemporary research that investigates statistical properties of language deals with characterizing words by extracting information about their behavior from large corpora.

Methodology

In order to describe the word association profile of a text, three decisions need to be made.

Illustration: The shape of the distribution

For a first illustration, we use a corpus of 5,904 essays written as part of a standardized graduate …

Application to Essay Scoring

Texts written for a test and scored by relevant professionals constitute a setting where variation in text quality is expected.

Related Work

Most of the attention in the computational linguistics research that deals with analysis of the lexis of texts has so far been paid to what in our terms would be the very high end of the word association profile.

Conclusion

In this paper, we described a new representation of the content vocabulary of a text, the word association profile, which captures the proportions of highly associated, mildly associated, unassociated, and disassociated pairs of words selected to coexist in the given text by its author.

Topics

word pairs

Appears in 10 sentences as: word pairs (10)
In Word Association Profiles and their Use for Automated Scoring of Essays
  1. The fact that a text segmentation algorithm that uses information about patterns of word co-occurrences can detect subtopic shifts in a text (Riedl and Biemann, 2012; Misra et al., 2009; Eisenstein and Barzilay, 2008) tells us that texts contain some proportion of more highly associated word pairs (those in subsequent sentences within the same topical unit) and of less highly associated pairs (those in sentences from different topical units). Yet, does each text have a different distribution of highly associated, mildly associated, unassociated, and disassociated pairs of words, or do texts tend to strike a similar balance of these?
    Page 1, “Introduction”
  2. The third decision is how to represent the co-occurrence profiles; we use a histogram where each bin represents the proportion of word pairs in the given interval of PMI values.
    Page 2, “Methodology”
  3. The lowest bin (shown in Figures 1 and 2 as PMI = −5) contains pairs with PMI ≤ −5; the topmost bin (shown in Figures 1 and 2 as PMI = 4.83) contains pairs with PMI > 4.67, while the rest of the bins contain word pairs (x, y) with −5 < PMI(x, y) ≤ 4.67.
    Page 2, “Methodology” (see the binning sketch after this list)
  4. Thus, the text “The dog barked and wagged its tail” is much tighter than the text “Green ideas sleep furiously”, with all six content word pairs scoring above PMI=5.5 in the first and below PMI=2.2 in the second.
    Page 2, “Methodology”
  5. Yet, the picture at the right tail is remarkably similar to that of the essays, with 9% of word pairs, on average, having PMI>2.17.
    Page 3, “Illustration: The shape of the distribution”
  6. The right tail, with PMI>2.17, holds 19% of all word pairs in these texts — more than twice the proportion in essays written by college graduates or in texts from the WSJ.
    Page 3, “Illustration: The shape of the distribution”
  7. We calculated correlations between essay score and the proportion of word pairs in each of the 60 bins of the WAP histogram, separately for each of the prompts p1-p6 in setA.
    Page 5, “Application to Essay Scoring”
  8. Next, observe the consistent negative correlations between essay score and the proportion of word pairs in bins PMI=0.833 through PMI=1.5.
    Page 5, “Application to Essay Scoring”
  9. Our results suggest that this direction is promising, as merely the proportion of highly associated word pairs is already contributing a clear signal regarding essay quality; it is possible that additional information can be derived from richer representations common in the lexical cohesion literature.
    Page 8, “Related Work”
  10. We hypothesize that this pattern is consistent with the better essays demonstrating both a better topic development (hence the higher percentage of highly related pairs) and a more creative use of language resources, as manifested in a higher percentage of word pairs that generally do not tend to appear together.
    Page 9, “Conclusion”
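
The binning in items 2 and 3 can be made concrete with a short sketch. This is not the authors' code: it assumes the PMI values for all content-word pairs of a text are already available as a sequence pair_pmis (a hypothetical name; the "co-occurrence" topic below sketches how such values might be estimated), and the 1/6 bin width is inferred from the bin labels quoted in the paper (0.833, 2.17, 4.83), not stated by it.

```python
# A minimal sketch, not the paper's implementation: bin precomputed PMI
# values for all content-word pairs of a text into the 60-bin word
# association profile (WAP) histogram described above.
import numpy as np

def wap_histogram(pair_pmis):
    """Return the proportion of word pairs in each of 60 PMI bins.

    Bin 0 holds pairs with PMI <= -5, bin 59 holds pairs with PMI > 4.67,
    and bins 1-58 tile (-5, 4.67] in assumed steps of 1/6.
    """
    boundaries = -5.0 + np.arange(59) / 6.0        # -5, -4.833, ..., 4.667
    idx = np.digitize(pair_pmis, boundaries, right=True)
    counts = np.bincount(idx, minlength=60)
    return counts / counts.sum()
```

Under this layout, bin 0 collects the disassociated pairs and bin 59 the highly associated right tail, and the 60 resulting proportions are the bins whose correlation with essay score is examined in item 7.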

co-occurrence

Appears in 8 sentences as: co-occurrence (9)
In Word Association Profiles and their Use for Automated Scoring of Essays
  1. Thus, co-occurrence of words in n-word windows, syntactic structures, sentences, paragraphs, and even whole documents is captured in vector-space models built from text corpora (Turney and Pantel, 2010; Basili and Pennacchiotti, 2010; Erk and Padó, 2008; Mitchell and Lapata, 2008; Bullinaria and Levy, 2007; Jones and Mewhort, 2007; Padó and Lapata, 2007; Lin, 1998; Landauer and Dumais, 1997; Lund and Burgess, 1996; Salton et al., 1975).
    Page 1, “Introduction”
  2. However, little is known about typical profiles of texts in terms of co-occurrence behavior of their words.
    Page 1, “Introduction”
  3. The cited approaches use topic models that are in turn estimated using word co-occurrence .
    Page 1, “Introduction”
  4. The first decision is how to quantify the extent of co-occurrence between two words; we will use point-wise mutual information (PMI) estimated from a large and diverse corpus of texts.
    Page 2, “Methodology”
  5. The third decision is how to represent the co-occurrence profiles; we use a histogram where each bin represents the proportion of word pairs in the given interval of PMI values.
    Page 2, “Methodology”
  6. To obtain comprehensive information about typical co-occurrence behavior of words of English, we build a first-order co-occurrence word-space model (Turney and Pantel, 2010; Baroni and Lenci, 2010).
    Page 2, “Methodology”
  7. The model was generated from a corpus of texts of about 2.5 billion words, counting co-occurrence in a paragraph, using no distance coefficients (Bullinaria and Levy, 2007).
    Page 2, “Methodology” (see the PMI estimation sketch after this list)
  8. It is also possible that some of the instances with very high PMI are pairs that contain low frequency words for which the database predicts a spuriously high PMI based on a single (and atypical) co-occurrence that happens to repeat in an essay — similar to the Schwartz eschews example in (Manning and Schütze, 1999, Table 5.16, p. 181).
    Page 5, “Application to Essay Scoring”
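
A rough sketch of the paragraph-level, first-order counting described in items 4, 6, and 7, under stated assumptions rather than the authors' 2.5-billion-word pipeline: corpus_paragraphs is a hypothetical iterable of paragraphs already reduced to content-word tokens, each word type is counted once per paragraph, no distance coefficients are applied, and PMI(x, y) = log2[P(x, y) / (P(x) P(y))] with probabilities estimated as paragraph frequencies.

```python
# A minimal sketch (assumed preprocessing, not the authors' system):
# paragraph-level first-order co-occurrence counts and a PMI lookup
# built from them, with no distance weighting.
import math
from collections import Counter
from itertools import combinations

def build_pmi_lookup(corpus_paragraphs):
    """corpus_paragraphs: iterable of lists of content-word tokens, one list per paragraph."""
    word_counts, pair_counts, n_paragraphs = Counter(), Counter(), 0
    for paragraph in corpus_paragraphs:
        n_paragraphs += 1
        types = set(paragraph)                       # each type counted once per paragraph
        word_counts.update(types)
        pair_counts.update(frozenset(p) for p in combinations(sorted(types), 2))

    def pmi(x, y):
        pair = frozenset((x, y))
        if pair_counts[pair] == 0:
            return None                              # unseen pair; how the paper handles these is not shown here
        p_xy = pair_counts[pair] / n_paragraphs
        p_x = word_counts[x] / n_paragraphs
        p_y = word_counts[y] / n_paragraphs
        return math.log2(p_xy / (p_x * p_y))

    return pmi
```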

regression model

Appears in 5 sentences as: regression model (4) regression models (1)
In Word Association Profiles and their Use for Automated Scoring of Essays
  1. From this set, p1-p6 were used for feature selection, data visualization, and estimation of the regression models (training), while sets p7-p9 were reserved for a blind test.
    Page 4, “Application to Essay Scoring”
  2. To evaluate the usefulness of WAP in improving automated scoring of essays, we estimate a linear regression model using the human score as a dependent variable (label) and e-rater score and the HAT as the two independent variables (features).
    Page 7, “Application to Essay Scoring”
  3. We estimate a regression model on each of setA-pi, i ∈ {1, ..., 6}, and evaluate them on each of setA-pj, j ∈ {7, ..., 9}, and compare the performance with that of e-rater alone on setA-pj.
    Page 7, “Application to Essay Scoring” (see the regression sketch after this list)
  4. For setB, we estimate the regression model on setB-pl and test on setB-p2, and vice versa.
    Page 7, “Application to Essay Scoring”
  5. We also performed a cross-validation test on setA p1-p6, where we estimated a regression model on setA-pi and evaluated it on setA-pj, for all i, j ∈ {1, ..., 6}, i ≠ j, and compared the performance with that of e-rater alone on setA-pj, yielding 30 different train-test combinations.
    Page 8, “Related Work”
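
As a rough illustration of the set-up in items 1-5, the sketch below fits an ordinary least-squares model with the e-rater score and the HAT feature as the two predictors of the human score on one prompt and applies it to a held-out prompt. The variable names, the plain least-squares fit, and the use of correlation with human scores as the performance measure are assumptions for illustration, not details taken from the paper.

```python
# A rough sketch (assumed data layout, not the operational pipeline):
# fit human_score ~ intercept + e-rater + HAT on one prompt's essays,
# then score essays from a held-out prompt.
import numpy as np

def fit_score_model(erater, hat, human):
    """Ordinary least squares with the e-rater score and HAT as predictors."""
    erater, hat, human = (np.asarray(v, dtype=float) for v in (erater, hat, human))
    X = np.column_stack([np.ones(len(erater)), erater, hat])
    coefs, *_ = np.linalg.lstsq(X, human, rcond=None)
    return coefs

def predict(coefs, erater, hat):
    erater, hat = np.asarray(erater, dtype=float), np.asarray(hat, dtype=float)
    return np.column_stack([np.ones(len(erater)), erater, hat]) @ coefs

# Hypothetical usage mirroring the blind test: train on setA-p1, test on setA-p7.
# coefs = fit_score_model(erater_p1, hat_p1, human_p1)
# pred_p7 = predict(coefs, erater_p7, hat_p7)
# r_with_hat = np.corrcoef(pred_p7, human_p7)[0, 1]
# r_baseline = np.corrcoef(erater_p7, human_p7)[0, 1]
```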

fine-grained

Appears in 4 sentences as: fine-grained (4)
In Word Association Profiles and their Use for Automated Scoring of Essays
  1. We chose a relatively fine-grained binning and performed no optimization for grid selection; for more sophisticated gridding approaches to study nonlinear relationships in the data, see Reshef et al.
    Page 2, “Methodology”
  2. This fine-grained scale resulted in higher mean pairwise inter-rater correlations than the traditional integer-only scale (r=0.79 vs around r=0.70 for the operational scoring).
    Page 5, “Application to Essay Scoring” (see the correlation sketch after this list)
  3. This dataset provides a very fine-grained ranking of the essays, with almost no two essays getting exactly the same score.
    Page 5, “Application to Essay Scoring”
  4. This is a very competitive baseline, as e-rater features explain more than 70% of the variation in essay scores on a relatively coarse scale (setA) and more than 80% of the variation in scores on a fine-grained scale (setB).
    Page 6, “Application to Essay Scoring”
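
The figures quoted in items 2 and 4 (mean pairwise inter-rater correlation, share of score variation explained) are standard statistics; the sketch below shows one common way to compute them, assuming the ratings are stored as a raters-by-essays matrix. It is illustrative only and not taken from the paper.

```python
# Illustrative only (not from the paper): mean pairwise inter-rater Pearson
# correlation, and the share of score variation explained, taken here as the
# squared Pearson correlation between system and human scores.
import numpy as np
from itertools import combinations

def mean_pairwise_interrater_r(ratings):
    """ratings: 2-D array-like, one row per rater, one column per essay."""
    ratings = np.asarray(ratings, dtype=float)
    pairs = combinations(range(ratings.shape[0]), 2)
    return float(np.mean([np.corrcoef(ratings[i], ratings[j])[0, 1] for i, j in pairs]))

def variance_explained(system_scores, human_scores):
    return float(np.corrcoef(system_scores, human_scores)[0, 1] ** 2)
```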

Significant improvements

Appears in 4 sentences as: significant improvement (1) Significant improvements (1) significantly improve (1) significantly improving (1)
In Word Association Profiles and their Use for Automated Scoring of Essays
  1. Finally, we report on an experiment where we significantly improve the performance of a very competitive, state-of-the-art system for automated scoring of essays, using a feature derived from WAP.
    Page 4, “Application to Essay Scoring”
  2. Significant improvements are underlined.
    Page 7, “Application to Essay Scoring”
  3. The results were similar to those of the blind test presented here, with e-rater+HAT significantly improving upon e-rater alone, using a Wilcoxon test (W=374, n=29, p<0.05).
    Page 8, “Related Work” (see the Wilcoxon sketch after this list)
  4. Finally, we demonstrated that the information provided by word association profiles leads to a significant improvement in a highly competitive, state-of-the-art essay scoring system that already measures various aspects of writing quality.
    Page 9, “Conclusion”
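
Item 3 compares e-rater alone with e-rater+HAT across train-test combinations using a Wilcoxon test. The sketch below shows how such a paired comparison is typically run with scipy; the inputs are hypothetical, and the per-split performance measure is assumed rather than specified here.

```python
# A minimal sketch (hypothetical inputs, not the original analysis script):
# paired Wilcoxon signed-rank test comparing two systems across the same
# train-test splits.
from scipy.stats import wilcoxon

def compare_systems(baseline_perf, combined_perf):
    """baseline_perf, combined_perf: per-split performance values
    (e.g., correlations with human scores) for e-rater alone and
    e-rater+HAT on the same splits."""
    statistic, p_value = wilcoxon(combined_perf, baseline_perf)
    return statistic, p_value
```

Note that scipy's default handling of zero differences drops tied pairs, so the effective n can be smaller than the number of splits.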

content word

Appears in 3 sentences as: content word (3)
In Word Association Profiles and their Use for Automated Scoring of Essays
  1. The second is which pairs of words in a text to consider when building a profile for the text; we opted for all pairs of content word types occurring in a text, irrespective of the distance between them.
    Page 2, “Methodology”
  2. Thus, the text “The dog barked and wagged its tail” is much tighter than the text “Green ideas sleep furiously”, with all six content word pairs scoring above PMI=5.5 in the first and below PMI=2.2 in the second.
    Page 2, “Methodology”
  3. Likewise, a feature that calculates the average PMI for all pairs of content word types in the text failed to produce an improvement over the baseline for setA p1-p6.
    Page 7, “Application to Essay Scoring” (see the average-PMI sketch below)
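
The "all pairs of content word types" construction in items 1 and 3 can be sketched as follows. The stopword-based approximation of content words, the pmi lookup (of the kind sketched under the "co-occurrence" topic above), and the variable names are assumptions for illustration; the paper's own content-word criterion is not reproduced here.

```python
# A minimal sketch with assumed helpers: average PMI over all pairs of
# content word types in a text, irrespective of the distance between them.
from itertools import combinations

def average_pmi(tokens, pmi, stopwords):
    """tokens: the text's tokens; pmi: a lookup returning a PMI score or None
    for unseen pairs; stopwords: a crude stand-in for content-word selection."""
    content_types = sorted({t.lower() for t in tokens if t.lower() not in stopwords})
    scores = [pmi(x, y) for x, y in combinations(content_types, 2)]
    scores = [s for s in scores if s is not None]    # skip pairs unseen in the corpus
    return sum(scores) / len(scores) if scores else float("nan")

# e.g. average_pmi("The dog barked and wagged its tail".split(), pmi, STOPWORDS)
```

For the example in item 2, the four content words (dog, barked, wagged, tail) yield exactly the six pairs mentioned there.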
