A Cognitive Cost Model of Annotations Based on Eye-Tracking Data
Katrin Tomanek, Udo Hahn, Steffen Lohmann, and Jürgen Ziegler

Article Structure

Introduction

Today’s NLP systems, in particular those relying on supervised ML approaches, are meta-data greedy.

Experimental Design

In our study, we applied, for the first time ever to the best of our knowledge, eye-tracking to study the cognitive processes underlying the annotation of linguistic meta-data, named entities in particular.

Results

We used a mixed-design analysis of variance (ANOVA) model to test the hypotheses, with the context condition as between-subjects factor and the two complexity classes as within-subject factors.
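
To make the analysis design concrete, the following is a minimal sketch of a mixed-design ANOVA in Python, assuming a long-format table of per-trial annotation times. The column names (subject, context, syn_complexity, time), the file name, and the choice of the pingouin library are illustrative assumptions, not the authors' actual analysis code; the study crossed two within-subject complexity factors, whereas this sketch is simplified to a single within-subject factor.

    # Sketch only: 'context' as between-subjects factor, one complexity
    # factor as within-subject factor. Column and file names are hypothetical.
    import pandas as pd
    import pingouin as pg

    df = pd.read_csv("annotation_times.csv")  # hypothetical long-format trial data

    # One mean value per subject and design cell.
    cell_means = (df.groupby(["subject", "context", "syn_complexity"],
                             as_index=False)["time"].mean())

    aov = pg.mixed_anova(data=cell_means, dv="time",
                         within="syn_complexity", subject="subject",
                         between="context")
    print(aov.round(3))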

Cognitively Grounded Cost Modeling

We now discuss whether the findings on dependent variables from our eye-tracking study are fruitful for actually modeling annotation costs.

Summary and Conclusions

In this paper, we explored the use of eye-tracking technology to investigate the behavior of human annotators during the assignment of three types of named entities — persons, organizations and locations — based on the eye-mind assumption.

Topics

named entity

Appears in 11 sentences as: named entities (5) named entity (6)
  1. We have seen large-scale efforts in syntactic and semantic annotations in the past related to POS tagging and parsing, on the one hand, and named entities and relations (propositions), on the other hand.
    Page 1, “Introduction”
  2. In our study, we applied, for the first time ever to the best of our knowledge, eye-tracking to study the cognitive processes underlying the annotation of linguistic meta-data, named entities in particular.
    Page 2, “Experimental Design”
  3. These articles come already annotated with three types of named entities considered important in the newspaper domain, viz. persons, organizations, and locations.
    Page 2, “Experimental Design”
  4. An example consists of a text document with a single annotation phrase highlighted, which then had to be semantically annotated with respect to named entity mentions.
    Page 2, “Experimental Design”
  5. We also discarded all CNPs that did not contain at least one entity-critical word, i.e., one which might be a named entity according to its orthographic appearance (e.g., starting with an uppercase letter); a minimal filter of this kind is sketched after this list.
    Page 3, “Experimental Design”
  6. It should be noted that such orthographic signals are by no means a sufficient condition for the presence of a named entity mention within a CNP.
    Page 3, “Experimental Design”
  7. The choice of CNPs as stimulus phrases is motivated by the fact that named entities are usually fully encoded by this kind of linguistic structure.
    Page 3, “Experimental Design”
  8. In this case, words were searched for in the upper context which, according to their orthographic signals, might refer to a named entity but could not be completely resolved relying only on the information given by the annotation phrase itself and its embedding sentence.
    Page 6, “Results”
  9. These time tags indicate the time it took to annotate the respective phrase for named entity mentions of the types person, location, and organization.
    Page 8, “Cognitively Grounded Cost Modeling”
  10. In this paper, we explored the use of eye-tracking technology to investigate the behavior of human annotators during the assignment of three types of named entities — persons, organizations and locations — based on the eye-mind assumption.
    Page 8, “Summary and Conclusions”
  11. of complex cognitive tasks (such as gaze duration and gaze movements for named entity annotation) to explanatory models (in our case, a time-based cost model for annotation) follows a much warranted avenue in research in NLP where feature farming becomes a theory-driven, explanatory process rather than a much deplored theory-blind engineering activity (cf.
    Page 9, “Summary and Conclusions”
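
Item 5 above refers to the orthographic pre-filter on CNPs. Below is a minimal sketch of such a filter, assuming CNPs are given as plain token lists; the helper name and example phrases are illustrative, and, as item 6 stresses, an uppercase initial is at best a cue, not sufficient evidence of an actual entity mention.

    # Sketch of the orthographic CNP filter (hypothetical helper, not from the paper).
    from typing import List

    def has_entity_critical_word(cnp_tokens: List[str]) -> bool:
        # A token counts as entity-critical if it starts with an uppercase letter.
        return any(tok[:1].isupper() for tok in cnp_tokens)

    cnps = [["the", "press", "conference"], ["the", "European", "Commission"]]
    kept = [cnp for cnp in cnps if has_entity_critical_word(cnp)]
    print(kept)  # only the second CNP contains an entity-critical word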

entity types

Appears in 7 sentences as: entity type (1) entity types (6)
  1. In this task, a human annotator has to decide for each word whether or not it belongs to one of the entity types of interest.
    Page 2, “Experimental Design”
  2. Annotation of these entity types in newspaper articles is admittedly fairly easy.
    Page 2, “Experimental Design”
  3. Moreover, the limited number of entity types reduced the amount of participants’ training prior to the actual experiment, and positively affected the design and handling of the experimental apparatus (see below).
    Page 2, “Experimental Design”
  4. The annotation task was defined such that the correct entity type had to be assigned to each word in the annotation phrase.
    Page 2, “Experimental Design”
  5. If a word belonged to none of the three entity types, a fourth class called “no entity” had to be assigned.
    Page 2, “Experimental Design”
  6. After manual fine-tuning of the example set, ensuring an even distribution of entity types and syntactic correctness of the automatically derived annotation phrases, we finally selected 80 annotation examples for the experiment.
    Page 4, “Experimental Design”
  7. The participants used a standard keyboard to assign the entity types for each word of the annotation example.
    Page 4, “Experimental Design”

entity mentions

Appears in 3 sentences as: entity mention (1) entity mentions (2)
  1. An example consists of a text document with a single annotation phrase highlighted, which then had to be semantically annotated with respect to named entity mentions.
    Page 2, “Experimental Design”
  2. It should be noted that such orthographic signals are by no means a sufficient condition for the presence of a named entity mention within a CNP.
    Page 3, “Experimental Design”
  3. These time tags indicate the time it took to annotate the respective phrase for named entity mentions of the types person, location, and organization.
    Page 8, “Cognitively Grounded Cost Modeling”

parse tree

Appears in 3 sentences as: parse tree (3) parse trees (1)
  1. Structural complexity emerges, e.g., from the static topology of phrase structure trees and procedural graph traversals exploiting the topology of parse trees (see Szmrecsanyi (2004) or Cheung and Kemper (1992) for a survey of metrics of this type).
    Page 2, “Introduction”
  2. We defined two measures for the complexity of the annotation examples: The syntactic complexity was given by the number of nodes in the constituent parse tree which are dominated by the annotation phrase (Szmrecsanyi, 2004). According to a threshold on the number of nodes in such a parse tree, we classified CNPs as having either high or low syntactic complexity (a node-counting sketch follows this list).
    Page 3, “Experimental Design”
  3. As for syntactic complexity, we use two measures based on structural complexity including (a) the number of nodes of a constituency parse tree which are dominated by the annotation phrase (cf.
    Page 7, “Cognitively Grounded Cost Modeling”
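
As announced in item 2, here is a minimal sketch of the node-count measure using an NLTK constituency tree; the bracketed example parse, the decision to count both internal nodes and leaves, and the threshold value are illustrative assumptions rather than the paper's exact implementation.

    # Sketch: count the nodes dominated by an annotation phrase and bin the
    # result into low/high syntactic complexity. The threshold is hypothetical.
    from nltk import Tree

    def dominated_nodes(phrase: Tree) -> int:
        # All subtrees below the phrase's root node, plus the leaves.
        internal = sum(1 for _ in phrase.subtrees()) - 1  # exclude the root itself
        return internal + len(phrase.leaves())

    phrase = Tree.fromstring(
        "(NP (NP (DT the) (NN chairman)) (PP (IN of) (NP (NNP Acme) (NNP Corp))))")

    NODE_THRESHOLD = 8  # hypothetical cut-off between low and high complexity
    n = dominated_nodes(phrase)
    print(n, "high" if n > NODE_THRESHOLD else "low")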

regression model

Appears in 3 sentences as: regression model (2) regression models (1)
  1. Therefore, we learn a linear regression model with time (an operationalization of annotation costs) as the dependent variable.
    Page 7, “Cognitively Grounded Cost Modeling”
  2. We learned a simple linear regression model with the annotation time as dependent variable and the features described above as independent variables (see the sketch after this list).
    Page 8, “Cognitively Grounded Cost Modeling”
  3. This optimization may include both the exploration of additional features (such as domain-specific ones) and experimentation with other, presumably nonlinear, regression models.
    Page 9, “Summary and Conclusions”
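
A minimal sketch of the linear regression cost model described in items 1 and 2 above, assuming the phrase-level features have already been extracted into a table; the file name, feature names, and use of scikit-learn are illustrative assumptions, not the authors' setup.

    # Sketch: annotation time regressed on phrase-level features.
    # Column and file names are hypothetical.
    import pandas as pd
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_absolute_error

    df = pd.read_csv("annotation_examples.csv")
    features = ["num_tokens", "num_parse_nodes", "num_entity_critical_words"]

    X, y = df[features], df["annotation_time"]  # time in seconds
    model = LinearRegression().fit(X, y)

    print("coefficients:", dict(zip(features, model.coef_.round(3))))
    print("in-sample MAE:", round(float(mean_absolute_error(y, model.predict(X))), 3))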
