Extracting Social Networks from Literary Fiction
Elson, David and Dames, Nicholas and McKeown, Kathleen

Article Structure

Abstract

We present a method for extracting social networks from literature, namely, nineteenth-century British novels and serials.

Introduction

Literary studies about the nineteenth-century British novel are often concerned with the nature of the community that surrounds the protagonist.

Related Work

Computer-assisted literary analysis has typically occurred at the word level.

Hypotheses

It is commonly held that the novel is a literary form which tries to produce an accurate representation of the social world.

Extracting Conversational Networks from Literature

In order to test these hypotheses, we developed a novel approach to extracting social networks from literary texts themselves, building on existing analysis tools.

Data Analysis

5.1 Feature extraction

Literary Interpretation of Results

Our data, therefore, markedly do not confirm hypothesis #1.

Conclusion

In this paper, we presented a method for characterizing a text of literary fiction by extracting the network of social conversations that occur between its characters.

Acknowledgments

This material is based on research supported in part by the US.

Topics

named entity

Appears in 7 sentences as: named entities (3) named entity (6)
In Extracting Social Networks from Literary Fiction
  1. In a conversational network, vertices represent characters (assumed to be named entities) and edges indicate at least one instance of dialogue interaction between two characters over the course of the novel.
    Page 4, “Extracting Conversational Networks from Literature”
  2. For each named entity, we generate variations on the name that we would expect to see in a coreferent.
    Page 5, “Extracting Conversational Networks from Literature”
  3. For each named entity, we compile a list of other named entities that may be coreferents, either because they are identical or because one is an expected variation on the other.
    Page 5, “Extracting Conversational Networks from Literature”
  4. We then match each named entity to the most recent of its possible coreferents.
    Page 5, “Extracting Conversational Networks from Literature”
  5. We found that a network that included incidental or single-mention named entities became too noisy to function effectively, so we filtered out the entities that are mentioned fewer than three times in the novel or are responsible for less than 1% of the named entity mentions in the novel.
    Pages 5-6, “Extracting Conversational Networks from Literature”
  6. An example network, automatically constructed in this manner from Jane Austen’s Mansfield Park, is shown in Figure 1. The width of each vertex is drawn to be proportional to the character’s share of all the named entity mentions in the book (so that protagonists, who are mentioned frequently, appear in larger ovals).
    Page 6, “Extracting Conversational Networks from Literature”
  7. Not surprisingly, the most oft-repeated named entity in the text is I, referring to the narrator.
    Page 8, “Data Analysis”
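
The filtering rule quoted above (dropping entities mentioned fewer than three times, or responsible for less than 1% of all named entity mentions) can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `filter_entities`, the input format (one resolved character name per mention), and the reading of "or" as "drop if either condition holds" are assumptions, and the mention counts are toy values.

```python
from collections import Counter


def filter_entities(mentions, min_count=3, min_share=0.01):
    """Return {name: count} for entities that pass both thresholds.

    Hypothetical helper: an entity is dropped if it has fewer than
    `min_count` mentions OR accounts for less than `min_share` of all
    mentions, following one reading of the rule quoted in the text.
    """
    counts = Counter(mentions)
    total = sum(counts.values())
    return {
        name: n
        for name, n in counts.items()
        if n >= min_count and n / total >= min_share
    }


# Toy data (not taken from any novel): A and B are frequent,
# C is mentioned only twice and is therefore filtered out.
mentions = ["A"] * 120 + ["B"] * 80 + ["C"] * 2
kept = filter_entities(mentions)
```

Both thresholds are exposed as parameters so that the effect of a stricter or looser filter on network noise can be explored.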

See all papers in Proc. ACL 2010 that mention named entity.

edge weights

Appears in 7 sentences as: edge weight (2) edge weights (4) edge’s weight (1)
In Extracting Social Networks from Literary Fiction
  1. We then construct a network where characters are vertices and edges signify an amount of bilateral conversation between those characters, with edge weights corresponding to the frequency and length of their exchanges.
    Page 1, “Introduction”
  2. When such an adjacency is found, the length of the quote is added to the edge weight, under the hypothesis that the significance of the relationship between two individuals is proportional to the length of the dialogue that they exchange.
    Page 6, “Extracting Conversational Networks from Literature”
  3. Finally, we normalized each edge’s weight by the length of the novel.
    Page 6, “Extracting Conversational Networks from Literature”
  4. These coefficients are used for the edge weights.
    Page 6, “Extracting Conversational Networks from Literature”
  5. Characters that tend to appear together in the same areas of the novel are taken to be more socially connected, and have a higher edge weight.
    Page 6, “Extracting Conversational Networks from Literature”
  6. These counts, normalized by the length of the text, are used as edge weights.
    Page 6, “Extracting Conversational Networks from Literature”
  7. To calculate precision and recall for the two baseline social networks, we set a threshold t to derive a binary prediction from the continuous edge weights.
    Page 7, “Extracting Conversational Networks from Literature”
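
The quoted-speech weighting described above (add the quote length to the edge when an adjacency is found, then normalize by the length of the novel) might be sketched as follows. This is an illustrative sketch only: the excerpts do not define adjacency precisely, so it assumes that two consecutive quotes by different speakers count as adjacent and that the length of the later quote is the one added. The function name and input format are hypothetical.

```python
from collections import defaultdict


def conversation_edge_weights(quotes, novel_length):
    """Accumulate normalized edge weights from attributed quotes.

    quotes: list of (speaker, quote_length) tuples in text order.
    novel_length: length of the novel in the same unit as quote_length.
    Assumption: consecutive quotes by different speakers are "adjacent",
    and the later quote's length is added to the pair's undirected edge.
    """
    weights = defaultdict(float)
    for (prev_speaker, _), (speaker, length) in zip(quotes, quotes[1:]):
        if prev_speaker != speaker:
            edge = frozenset((prev_speaker, speaker))  # undirected edge
            weights[edge] += length
    # Normalize each edge's weight by the length of the novel.
    return {edge: w / novel_length for edge, w in weights.items()}


# Toy input: A and B exchange two quotes, then A speaks with C.
quotes = [("A", 10), ("B", 20), ("A", 5), ("C", 8)]
weights = conversation_edge_weights(quotes, novel_length=1000)
```

Using a `frozenset` as the key keeps the edge undirected, which matches the description of the conversations as bilateral.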

coreferents

Appears in 6 sentences as: coreferent (1) Coreferents (1) coreferents (4)
In Extracting Social Networks from Literary Fiction
  1. We then clustered the noun phrases into coreferents for the same entity (person or organization).
    Page 5, “Extracting Conversational Networks from Literature”
  2. For each named entity, we generate variations on the name that we would expect to see in a coreferent.
    Page 5, “Extracting Conversational Networks from Literature”
  3. For each named entity, we compile a list of other named entities that may be coreferents, either because they are identical or because one is an expected variation on the other.
    Page 5, “Extracting Conversational Networks from Literature”
  4. We then match each named entity to the most recent of its possible coreferents.
    Page 5, “Extracting Conversational Networks from Literature”
  5. Coreferents for the same name (such as Mr. Darcy and Darcy) were grouped into the same vertex.
    Page 5, “Extracting Conversational Networks from Literature”
  6. There were several reasons that we did not detect the missing links, including indirect speech, quotes attributed to anaphoras or coreferents, and “diffuse” conversations in which the characters do not speak in turn with one another.
    Page 7, “Extracting Conversational Networks from Literature”
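
The matching steps quoted above (generate name variations, collect possible coreferents, match each named entity to the most recent one) could look roughly like the Python sketch below. The excerpts do not list the variation rules the authors used, so the title list and the rules here (drop the title, keep the first or last remaining token) are assumptions, as are all function names.

```python
TITLES = {"Mr.", "Mrs.", "Miss", "Dr.", "Sir", "Lady"}  # assumed title list


def name_variations(name):
    """Return assumed variations of a name, e.g. "Mr. Darcy" -> "Darcy"."""
    core = [tok for tok in name.split() if tok not in TITLES]
    variations = {name}
    if core:
        variations.add(" ".join(core))  # name without its title
        variations.add(core[0])         # first remaining token
        variations.add(core[-1])        # last remaining token
    return variations


def may_corefer(a, b):
    """True if the names are identical or one is a variation of the other."""
    return b in name_variations(a) or a in name_variations(b)


def cluster_mentions(mentions):
    """Assign each mention (in text order) a cluster id.

    Each mention joins the cluster of the most recent earlier mention
    that may corefer with it; otherwise it starts a new cluster.
    """
    cluster_of = []
    next_id = 0
    for i, name in enumerate(mentions):
        for j in range(i - 1, -1, -1):  # scan backwards: most recent first
            if may_corefer(name, mentions[j]):
                cluster_of.append(cluster_of[j])
                break
        else:
            cluster_of.append(next_id)
            next_id += 1
    return cluster_of


mentions = ["Mr. Darcy", "Elizabeth Bennet", "Darcy", "Elizabeth", "Mr. Darcy"]
clusters = cluster_mentions(mentions)
```

Each resulting cluster would correspond to one vertex, in the way the text describes Mr. Darcy and Darcy being grouped together.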

F-measure

Appears in 3 sentences as: F-measure (3)
In Extracting Social Networks from Literary Fiction
  1. Table 2: Precision, recall, and F-measure of three methods for detecting bilateral conversations in literary texts.
    Page 7, “Extracting Conversational Networks from Literature”
  2. The precision and recall values shown for the baselines in Table 2 represent the highest performance we achieved by varying t between 0 and 1 (maximizing F-measure over t).
    Page 7, “Extracting Conversational Networks from Literature”
  3. Both baselines performed significantly worse in precision and F-measure than our quoted speech adjacency method for detecting conversations.
    Page 7, “Extracting Conversational Networks from Literature”

precision and recall

Appears in 3 sentences as: precision and recall (3)
In Extracting Social Networks from Literary Fiction
  1. The precision and recall of our method for detecting conversations are shown in Table 2.
    Page 7, “Extracting Conversational Networks from Literature”
  2. To calculate precision and recall for the two baseline social networks, we set a threshold t to derive a binary prediction from the continuous edge weights.
    Page 7, “Extracting Conversational Networks from Literature”
  3. The precision and recall values shown for the baselines in Table 2 represent the highest performance we achieved by varying t between 0 and 1 (maximizing F-measure over t).
    Page 7, “Extracting Conversational Networks from Literature”
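
The threshold sweep described for the baselines can be sketched in Python as below. This is a hedged illustration rather than the authors' evaluation code: it assumes the edge weights are already scaled to [0, 1], that an edge is predicted when its weight is at least t, and that t is varied on an evenly spaced grid. The function names, the grid size, and the toy weights are all hypothetical.

```python
def precision_recall_f1(predicted, gold):
    """Compute precision, recall, and F-measure for two sets of edges."""
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def best_threshold(edge_weights, gold_edges, steps=100):
    """Sweep t over [0, 1] and return (f1, t, precision, recall) at the best F-measure.

    Assumption: an edge is predicted when its weight is >= t.
    Ties keep the smallest t that reaches the best F-measure.
    """
    best = (0.0, 0.0, 0.0, 0.0)
    for i in range(steps + 1):
        t = i / steps
        predicted = {e for e, w in edge_weights.items() if w >= t}
        p, r, f = precision_recall_f1(predicted, gold_edges)
        if f > best[0]:
            best = (f, t, p, r)
    return best


# Toy weights and gold annotations (illustrative values only).
edge_weights = {("A", "B"): 0.9, ("A", "C"): 0.4, ("B", "C"): 0.1}
gold_edges = {("A", "B"), ("A", "C")}
f1, t, precision, recall = best_threshold(edge_weights, gold_edges)
```

Reporting the best point of the sweep gives each baseline its most favorable operating point, which is how the excerpt describes the baseline values in Table 2.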
