Comparing Objective and Subjective Measures of Usability in a Human-Robot Dialogue System
Foster, Mary Ellen and Giuliani, Manuel and Knoll, Alois

Article Structure

Abstract

We present a human-robot dialogue system that enables a robot to work together with a human user to build wooden construction toys.

Introduction

Evaluating the usability of a spoken language dialogue system generally requires a large-scale user study, which can be a time-consuming process both for the experimenters and for the experimental subjects.

Task-Based Human-Robot Dialogue

This study makes use of the JAST human-robot dialogue system (Rickert et al., 2007) which supports multimodal human-robot collaboration on a joint construction task.

Experiment Design

The human-robot system was evaluated via a user study in which subjects interacted with the complete system; all interactions were in German.

Results

In this section, we present the results of each of the individual dependent measures; in the following section, we examine the relationship among the different types of measures.

Building Performance Functions

In the preceding section, we presented results on a number of objective and subjective measures, and also examined the correlation among measures of the same type.
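
Correlating measures of the same type comes down to computing a correlation coefficient for every pair of measures over the same set of subjects. The sketch below is a minimal illustration that assumes Pearson's r. The text here does not say which coefficient the study used, and the function names and measure names are hypothetical.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant measure")
    return sxy / math.sqrt(sxx * syy)


def pairwise_correlations(measures):
    """Correlate every pair of measures.

    `measures` maps a (hypothetical) measure name, e.g. "duration_s",
    to its list of per-subject values; all lists must be aligned by subject.
    Keys of the result are (name_a, name_b) with name_a < name_b.
    """
    return {
        (a, b): pearson(measures[a], measures[b])
        for a, b in combinations(sorted(measures), 2)
    }
```

Sorting the names before pairing keeps each pair in a single, predictable key order, so a lookup such as `result[("duration_s", "num_turns")]` never has to try both orderings.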

Discussion

That the factors included in Table 6 were the most significant contributors to user satisfaction is not surprising.

Conclusions and Future Work

We have presented the JAST human-robot dialogue system and described a user study in which the system instructed users to build a series of target objects out of wooden construction toys.

Topics

linear regression

Appears in 5 sentences as: linear regression (5)
In Comparing Objective and Subjective Measures of Usability in a Human-Robot Dialogue System
  1. PARADISE uses stepwise multiple linear regression to model user satisfaction based on measures representing the performance dimensions of task success, dialogue quality, and dialogue efficiency, and has been applied to a wide range of systems (e.g., Walker et al., 2000; Litman and Pan, 2002; Möller et al., 2008).
    Page 1, “Introduction”
  2. The PARADISE model uses stepwise multiple linear regression to predict subjective user satisfaction based on measures representing the performance dimensions of task success, dialogue quality, and dialogue efficiency, resulting in a predictor function of the following form:
    Page 6, “Building Performance Functions”
  3. Stepwise linear regression produces coefficients (w_i) describing the relative contribution of each predictor to user satisfaction.
    Page 6, “Building Performance Functions”
  4. Using stepwise linear regression, we computed a predictor function for each of the subjective measures that we gathered during our study: the mean score for each of the individual user-satisfaction categories (Table 4), the mean score across the whole questionnaire (the last line of Table 4), as well as the difference between the users’ emotional states before and after the study (the last line of Table 5).
    Page 6, “Building Performance Functions”
  5. (2008) for linear regression models similar to those presented here were between 0.22 and 0.57.
    Page 7, “Discussion”
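
The regression procedure quoted above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the usual PARADISE shape for the predictor function, a weighted sum of z-score-normalised measures (satisfaction ≈ Σ w_i · N(m_i)). It also substitutes a simple greedy forward selection, driven by a hypothetical R² gain threshold `min_gain`, for the F-test entry and removal criteria that statistics packages normally apply in stepwise regression. The function names are illustrative.

```python
import math


def zscore(values):
    """N(x): z-score normalisation, (x - mean) / sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def solve(a, b):
    """Solve the square linear system a x = b (Gauss-Jordan, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (collinear predictors?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(columns, y):
    """Least-squares fit of y on `columns` plus an intercept; returns (beta, R^2)."""
    n = len(y)
    xs = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(xs[0])
    xtx = [[sum(row[p] * row[q] for row in xs) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(xs, y)) for p in range(k)]
    beta = solve(xtx, xty)  # normal equations: (X^T X) beta = X^T y
    mean_y = sum(y) / n
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    ss_res = sum(
        (yi - sum(b * x for b, x in zip(beta, row))) ** 2 for row, yi in zip(xs, y)
    )
    return beta, 1.0 - ss_res / ss_tot


def forward_stepwise(measures, satisfaction, min_gain=0.05):
    """Greedy forward selection over z-normalised measures.

    Returns (selected measure names, weights w_i aligned with those names, R^2).
    """
    normed = {name: zscore(vals) for name, vals in measures.items()}
    target = zscore(satisfaction)
    selected, weights, best_r2 = [], [], 0.0
    while True:
        best = None
        for name in sorted(set(normed) - set(selected)):
            try:
                beta, r2 = fit_ols([normed[s] for s in selected + [name]], target)
            except ValueError:
                continue  # skip a candidate that makes the system singular
            if best is None or r2 > best[2]:
                best = (name, beta, r2)
        # Stop when no candidate improves R^2 by at least min_gain.
        if best is None or best[2] - best_r2 < min_gain:
            break
        selected.append(best[0])
        # Drop the intercept, which is ~0 because both sides are z-scored.
        weights, best_r2 = best[1][1:], best[2]
    return selected, weights, best_r2
```

Normalising every measure and the satisfaction score to z-scores puts all the weights on a common scale. That is what allows each w_i to be read as a relative contribution, as the third sentence above describes. The returned R² is the standard goodness-of-fit figure for a model of this kind.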

See all papers in Proc. ACL 2009 that mention linear regression.

human judgements

Appears in 3 sentences as: human judgements (2) human judges (1)
In Comparing Objective and Subjective Measures of Usability in a Human-Robot Dialogue System
  1. When employing any such metric, it is crucial to verify that the predictions of the automated evaluation process agree with human judgements of the important aspects of the system output.
    Page 1, “Introduction”
  2. counter-examples to the claim that BLEU agrees with human judgements.
    Page 2, “Introduction”
  3. Also, Foster (2008) examined a range of automated metrics for evaluating generated multimodal output and found that few agreed with the preferences expressed by human judges.
    Page 2, “Introduction”
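
A common way to perform the check described in the first sentence is to correlate an automated metric's scores with human ratings of the same outputs. The sketch below uses Spearman's rank correlation, with tied scores given their average rank. The choice of Spearman (rather than, say, Pearson or Kendall's tau) and the function names are assumptions for illustration only. They do not represent the procedure used in any of the studies cited above.

```python
import math


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(metric_scores, human_scores):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(metric_scores) != len(human_scores) or len(metric_scores) < 2:
        raise ValueError("need paired scores for at least two outputs")
    rx, ry = average_ranks(metric_scores), average_ranks(human_scores)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("rho is undefined when all scores are tied")
    return sxy / math.sqrt(sxx * syy)
```

A rank-based coefficient asks only whether the metric orders outputs the way the judges do. This is often the property that matters when a metric stands in for human preferences, and it avoids assuming a linear relationship between the two scales.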

See all papers in Proc. ACL 2009 that mention human judgements.