You Talking to Me? A Corpus and Algorithm for Conversation Disentanglement
Elsner, Micha and Charniak, Eugene

Article Structure

Abstract

When multiple conversations occur simultaneously, a listener must decide which conversation each utterance is part of in order to interpret and respond to it appropriately.

Motivation

Simultaneous conversations seem to arise naturally in both informal social interactions and multiparty typed chat.

Related Work

Two threads of research are direct attempts to solve the disentanglement problem: Aoki et al.

Dataset

Our dataset is recorded from the IRC (Internet Relay Chat) channel ##LINUX at freenode.net, using the freely-available gaim client.

Model

Our model for disentanglement fits into the general class of graph partitioning algorithms (Roth and Yih, 2004) which have been used for a variety of tasks in NLP, including the related task of meeting segmentation (Malioutov and Barzilay, 2006).
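
As an illustration of the graph-partitioning framing, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical pairwise function `score(j, i)` that is positive when utterances `j` and `i` seem to belong to the same conversation and negative otherwise. The greedy in-order assignment is likewise an assumption chosen for brevity.

```python
from typing import Callable, Dict, List


def greedy_partition(n: int, score: Callable[[int, int], float]) -> List[int]:
    """Assign each of n utterances, in order, to a conversation id.

    score(j, i) is a hypothetical pairwise affinity for j < i:
    positive means "same conversation", negative means "different".
    """
    labels: List[int] = []
    next_label = 0
    for i in range(n):
        # Sum the affinity of utterance i to every existing conversation.
        votes: Dict[int, float] = {}
        for j in range(i):
            votes[labels[j]] = votes.get(labels[j], 0.0) + score(j, i)
        best = max(votes, key=lambda c: votes[c]) if votes else None
        if best is not None and votes[best] > 0:
            labels.append(best)        # join the best-scoring conversation
        else:
            labels.append(next_label)  # no positive evidence: start a new one
            next_label += 1
    return labels
```

Each utterance joins the conversation with the highest total affinity, or starts a new one when no total is positive. The result is a partition of the transcript expressed as one conversation id per utterance.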

We discard the 50 most frequent words entirely.
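
The filtering step named above can be sketched as follows. This is a minimal illustration that assumes utterances are already tokenized into lists of words; the function names and the tokenization are assumptions, not details taken from the paper.

```python
from collections import Counter
from typing import Iterable, List, Set


def most_frequent_words(utterances: Iterable[List[str]], k: int = 50) -> Set[str]:
    """Return the k most frequent words across all utterances."""
    counts = Counter(word for utt in utterances for word in utt)
    return {word for word, _ in counts.most_common(k)}


def drop_frequent(utterances: Iterable[List[str]], k: int = 50) -> List[List[str]]:
    """Remove the k most frequent words from every utterance."""
    utterances = list(utterances)  # materialize so we can iterate twice
    stop = most_frequent_words(utterances, k)
    return [[w for w in utt if w not in stop] for utt in utterances]
```

With `k=50` this mirrors the stated cutoff. A smaller `k` is convenient for trying the function on a toy transcript.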

6 “Introduction to Linux: A Hands-on Guide”.

Future Work

Although our annotators are reasonably reliable, it seems clear that they think of conversations as a hierarchy, with digressions and schisms.

Conclusion

This work provides a corpus of annotated data for chat disentanglement, which, along with our proposed metrics, should allow future researchers to evaluate and compare their results quantitatively.

Topics

feature set

Appears in 4 sentences as: feature set (4)
In You Talking to Me? A Corpus and Algorithm for Conversation Disentanglement
  1. They motivate a richer feature set, which, however, does not yet appear to be implemented.
    Page 2, “Related Work”
  2. (2005) adds word repetition to their feature set.
    Page 2, “Related Work”
  3. Our feature set incorporates information which has proven useful in meeting segmentation (Galley et al., 2003) and the task of detecting addressees of a specific utterance in a meeting (Jovanovic et al., 2006).
    Page 3, “Related Work”
  4. We are also interested to see how well this feature set performs on speech data, as in (Aoki et al., 2003).
    Page 8, “Future Work”

See all papers in Proc. ACL 2008 that mention feature set.

human annotations

Appears in 4 sentences as: human annotations (3) human annotator (1)
In You Talking to Me? A Corpus and Algorithm for Conversation Disentanglement
  1. It partitions a chat transcript into distinct conversations, and its output is highly correlated with human annotations.
    Page 2, “Motivation”
  2. This places it within the bounds of our human annotations (see table 1), toward the more general end of the spectrum.
    Page 7, “We discard the 50 most frequent words entirely.”
  3. The range of human variation is quite wide, and there are annotators who are closer to baselines than to any other human annotator.
    Page 7, “We discard the 50 most frequent words entirely.”
  4. As explained earlier, this is because some human annotations are much more specific than others.
    Page 7, “We discard the 50 most frequent words entirely.”
