Collecting a Why-Question Corpus for Development and Evaluation of an Automatic QA-System
Mrozinski, Joanna and Whittaker, Edward and Furui, Sadaoki

Article Structure


Question answering research has only recently started to spread from short factoid questions to more complex ones.


Automatic question answering (QA) is an alternative to traditional word-based search engines.


Question-answer pairs have previously been generated, for example, by asking workers to both ask a question and then answer it based on a given text (Verberne et al., 2006; Verberne et al., 2007).

Corpus description

The final corpus consists of questions with their rephrased versions and answers.

Discussion and future work

During the initial trials of data collection we encountered some unexpected phenomena.


Conclusions

In this paper we have described a dynamic and inexpensive method of collecting a corpus of questions and answers using the Amazon Mechanical Turk framework.


Mechanical Turk

Appears in 7 sentences as: Mechanical Turk (7)
In Collecting a Why-Question Corpus for Development and Evaluation of an Automatic QA-System
  1. We collected our corpus through the Amazon Mechanical Turk service (MTurk).
    Page 2, “Introduction”
  2. 2.1 Mechanical Turk
    Page 3, “Setup”
  3. Mechanical Turk is a Web-based service offered by Amazon.com, Inc.
    Page 3, “Setup”
  4. The total cost of producing the corpus was about $350, consisting of $310 paid in worker rewards and $40 in Mechanical Turk fees, including all the trials conducted during the development of the final system.
    Page 6, “Corpus description”
  5. However, the Mechanical Turk framework provided additional information for each assignment, for example the time workers spent on the task.
    Page 6, “Corpus description”
  6. We believe that this unexpected result is due to the distributed nature of the worker pool in Mechanical Turk.
    Page 8, “Discussion and future work”
  7. In this paper we have described a dynamic and inexpensive method of collecting a corpus of questions and answers using the Amazon Mechanical Turk framework.
    Page 8, “Conclusions”

See all papers in Proc. ACL 2008 that mention Mechanical Turk.
