Evaluating Text Segmentation using Boundary Edit Distance
Fournier, Chris

Article Structure

Abstract

This work proposes a new segmentation evaluation metric, named boundary similarity (B), an inter-coder agreement coefficient adaptation, and a confusion matrix for segmentation that are all based upon an adaptation of the boundary edit distance in Fournier and Inkpen (2012).

Introduction

Text segmentation is the task of splitting text into segments by placing boundaries within it.
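
Concretely, a linear segmentation of a text of |D| atomic units can be encoded either as the positions of its internal boundaries or as the masses (lengths) of its segments. Below is a minimal sketch of that equivalence; the helper names are illustrative, not from the paper:

```python
# A linear segmentation can be encoded as internal boundary positions
# or, equivalently, as segment masses (lengths); hypothetical helpers.

def masses_to_boundaries(masses):
    """[3, 2, 5] -> {3, 5}: a boundary falls after each cumulative mass."""
    positions, total = set(), 0
    for mass in masses[:-1]:  # no boundary after the final segment
        total += mass
        positions.add(total)
    return positions

def boundaries_to_masses(positions, length):
    """{3, 5} in a 10-unit text -> [3, 2, 5]."""
    edges = [0] + sorted(positions) + [length]
    return [b - a for a, b in zip(edges, edges[1:])]

assert masses_to_boundaries([3, 2, 5]) == {3, 5}
assert boundaries_to_masses({3, 5}, 10) == [3, 2, 5]
```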

Related Work

2.1 Segmentation Evaluation

A New Proposal for Edit-Based Text Segmentation Evaluation

In this section, a new boundary-edit-distance-based segmentation metric and confusion matrix are proposed to address the deficiencies of S for both segmentation comparison and inter-coder agreement.
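
As a rough illustration of the underlying idea (a hypothetical sketch, not the paper's exact algorithm): boundaries present in both segmentations count as matches, unmatched boundaries that can be paired within n_t units count as near misses (transpositions), and the remainder count as full misses (additions/deletions); a B-like score then penalizes edits relative to matches plus edits. The greedy pairing, the threshold convention for n_t, and the flat 0.5 near-miss penalty below are all assumptions; the paper weights transpositions differently.

```python
def boundary_edit_counts(a, b, n_t=2):
    """Classify the boundaries of two segmentations (sets of boundary
    positions) into matches, near misses, and full misses.
    Simplified greedy pairing; not the paper's exact edit distance."""
    matches = a & b
    rest_a, rest_b = sorted(a - b), sorted(b - a)
    near = 0
    for pos in list(rest_a):
        # Greedily pair with the closest unmatched boundary within n_t
        # (the exact distance convention for n_t is an assumption).
        close = [q for q in rest_b if abs(q - pos) < n_t]
        if close:
            rest_b.remove(min(close, key=lambda q: abs(q - pos)))
            rest_a.remove(pos)
            near += 1
    return len(matches), near, len(rest_a) + len(rest_b)

def boundary_similarity_sketch(a, b, n_t=2, near_penalty=0.5):
    """B-like score: 1 minus edit cost normalized by matches plus edits;
    the flat 0.5 near-miss penalty is an assumption."""
    matches, near, full = boundary_edit_counts(a, b, n_t)
    total = matches + near + full
    return 1.0 if total == 0 else 1.0 - (full + near_penalty * near) / total

print(boundary_similarity_sketch({3, 7}, {4, 7}))  # 0.75: 1 match, 1 near miss
```

Unlike a window-based penalty, the near miss at positions 3 versus 4 here costs only half an edit, and the score is expressed per boundary pair rather than per window.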

Discussion of Segmentation Metrics

Before analysing how the metrics compare to one another on a large data set, it is useful to investigate how they behave at a smaller scale.

Segmentation Agreement

Having gained some confidence in B over S and WD at a small scale in the previous section, this section evaluates S-based and B-based inter-coder agreement on a larger data set.

Evaluation of Automatic Segmenters

Having looked at how S, WD, and B perform at a small scale in §4 and on a larger data set in §5, this section demonstrates the use of these metrics to evaluate some automatic segmenters.

Conclusions

In this work, a new segmentation evaluation metric, referred to as boundary similarity (B), is proposed as an unbiased metric, along with a boundary-edit-distance-based (BED-based) confusion matrix to compute predictably biased IR metrics such as precision and recall.
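
One way such a confusion matrix might translate into precision and recall (a hedged sketch: crediting near misses at 0.5 and attributing full misses to the hypothesis or reference side are assumptions, not the paper's exact weighting):

```python
def bed_precision_recall(reference, hypothesis, n_t=2, near_credit=0.5):
    """Hypothetical BED-style precision/recall over boundary-position
    sets: matches are true positives, paired near misses earn partial
    credit, and remaining full misses are false positives (hypothesis
    side) or false negatives (reference side)."""
    matches = reference & hypothesis
    rest_r = sorted(reference - hypothesis)
    rest_h = sorted(hypothesis - reference)
    near = 0
    for pos in list(rest_h):
        # Greedy near-miss pairing within n_t units (an assumption).
        close = [q for q in rest_r if abs(q - pos) < n_t]
        if close:
            rest_r.remove(min(close, key=lambda q: abs(q - pos)))
            rest_h.remove(pos)
            near += 1
    tp = len(matches) + near_credit * near
    fp = len(rest_h) + (1 - near_credit) * near  # spurious boundaries
    fn = len(rest_r) + (1 - near_credit) * near  # missed boundaries
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall

print(bed_precision_recall({3, 7}, {4, 7}))  # (0.75, 0.75)
```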

Future Work

Future work includes adapting this work to analyse hierarchical segmentations and using it to attempt to explain the low inter-coder agreement coefficients reported in topical segmentation tasks.

Topics

segmentations

Appears in 47 sentences as: Segmentations (1) segmentations (33) segmenters (19) segmenter’s (2)
In Evaluating Text Segmentation using Boundary Edit Distance
  1. Existing segmentation metrics such as Pk, WindowDiff, and Segmentation Similarity (S) are all able to award partial credit for near misses between boundaries, but are biased towards segmentations containing few or tightly clustered boundaries.
    Page 1, “Abstract”
  2. A variety of segmentation granularities, or atomic units, exist, including segmentations at the morpheme (e.g., Sirts and Alumäe 2012), word (e.g., Chang et al.
    Page 1, “Introduction”
  3. Segmentations can also represent the structure of text as being organized linearly (e.g., Hearst 1997), hierarchically (e.g., Eisenstein 2009), etc.
    Page 1, “Introduction”
  4. Theoretically, segmentations could also contain varying boundary types.
    Page 1, “Introduction”
  5. Because of its value to natural language processing, various text segmentation tasks have been automated such as topical segmentation—for which a variety of automatic segmenters exist (e.g., Hearst 1997, Malioutov and Barzilay 2006, Eisenstein and Barzilay 2008, and Kazantseva and Szpakowicz 2011).
    Page 1, “Introduction”
  6. Each of these metrics has a variety of flaws: Pk and WindowDiff both under-penalize errors at the beginning of segmentations (Lamprier et al., 2007) and have a bias towards favouring segmentations with few or tightly-clustered boundaries (Niekrasz and Moore, 2010), while S produces overly optimistic values due to its normalization (shown later).
    Page 1, “Introduction”
  7. …an inter-coder agreement coefficient adaptation; §4 compares existing segmentation metrics to those proposed herein; §5 evaluates S-based and B-based inter-coder agreement; and §6 compares B, S, and WD while evaluating automatic segmenters.
    Page 2, “Introduction”
  8. Many early studies evaluated automatic segmenters using information retrieval (IR) metrics such as precision, recall, etc.
    Page 2, “Related Work”
  9. To attempt to overcome this issue, both Passonneau and Litman (1993) and Hearst (1993) conflated multiple manual segmentations into one that contained only those boundaries which the majority of coders agreed upon (see the sketch after this list).
    Page 2, “Related Work”
  10. IR metrics were then used to compare automatic segmenters to this majority solution.
    Page 2, “Related Work”
  11. To address the issue of awarding partial credit for an automatic segmenter nearly missing a boundary—without conflating segmentations, Beeferman and Berger (1999, pp.
    Page 2, “Related Work”
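
A minimal sketch of the majority conflation described in excerpt 9 above; the strict-majority threshold and the lack of tie handling are assumptions, not details from the cited papers:

```python
from collections import Counter

def majority_boundaries(codings):
    """Conflate several manual segmentations (sets of boundary
    positions) into one containing only boundaries placed by a strict
    majority of coders; illustrative sketch of the procedure
    attributed to Passonneau and Litman (1993) and Hearst (1993)."""
    counts = Counter(pos for coding in codings for pos in coding)
    threshold = len(codings) / 2
    return {pos for pos, n in counts.items() if n > threshold}

# Three coders; only position 5 is placed by a strict majority.
print(majority_boundaries([{3, 5}, {5}, {5, 9}]))  # {5}
```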

edit distance

Appears in 18 sentences as: Edit Distance (5) edit distance (17)
In Evaluating Text Segmentation using Boundary Edit Distance
  1. This work proposes a new segmentation evaluation metric, named boundary similarity (B), an inter-coder agreement coefficient adaptation, and a confusion matrix for segmentation that are all based upon an adaptation of the boundary edit distance in Fournier and Inkpen (2012).
    Page 1, “Abstract”
  2. using Boundary Edit Distance
    Page 1, “Introduction”
  3. To overcome the flaws of existing text segmentation metrics, this work proposes a new series of metrics derived from an adaptation of boundary edit distance (Fournier and Inkpen, 2012, pp. 154–156).
    Page 1, “Introduction”
  4. In this work: §2 reviews existing segmentation metrics; §3 proposes an adaptation of boundary edit distance, a new normalization of it, a new confusion matrix for segmentation, and an inter-coder agreement coefficient adaptation.
    Page 1, “Introduction”
  5. An implementation of boundary edit distance, boundary similarity, B-precision, and B-recall, etc.
    Page 1, “Introduction”
  6. Instead of using windows, the work proposes a new restricted edit distance called boundary edit distance, which differentiates between full and near misses.
    Page 2, “Related Work”
  7. S normalizes the counts of full and near misses identified by boundary edit distance, as shown in Equation 2 (reconstructed after this list), where s_a and s_b are the segmentations, n_t is the maximum distance that boundaries may span to be considered a near miss, edits(s_a, s_b, n_t) is the edit distance, and pb(D) is the number of potential boundaries in a document D (pb(D) = |D| − 1).
    Page 3, “Related Work”
  8. Boundary edit distance models full misses as the addition/deletion of a boundary, and near misses as n-wise transpositions.
    Page 3, “Related Work”
  9. The usage of an edit distance that supported transpositions to compare segmentations was an advancement over window-based methods, but boundary edit distance and its normalization S are not without problems, specifically: i) This edit distance uses string reversals (ABCD => DCBA) to perform transpositions, making it cumbersome to analyse individual pairs of boundaries between segmentations; ii) S is sensitive to variations in the total size of a segmentation, leading it to favour very sparse segmentations with few boundaries; iii) S produces cosmetically high values, making it difficult to interpret and causing overestimation of inter-coder agreement.
    Page 3, “Related Work”
  10. In this section, a new boundary-edit-distance-based segmentation metric and confusion matrix are proposed to address the deficiencies of S for both segmentation comparison and inter-coder agreement.
    Page 3, “A New Proposal for Edit-Based Text Segmentation Evaluation”
  11. 3.1 Boundary Edit Distance
    Page 4, “A New Proposal for Edit-Based Text Segmentation Evaluation”
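
From the definitions quoted in excerpt 7 above, Equation 2, the normalization S, can be reconstructed as follows:

```latex
% S normalizes the boundary edit distance between segmentations s_a and
% s_b by the number of potential boundaries in document D.
\[
  \mathrm{S}(s_a, s_b) = 1 - \frac{\lvert \mathrm{edits}(s_a, s_b, n_t) \rvert}{\mathrm{pb}(D)},
  \qquad \mathrm{pb}(D) = \lvert D \rvert - 1
\]
```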

evaluation metric

Appears in 5 sentences as: evaluation metric (4) evaluation metrics (1)
In Evaluating Text Segmentation using Boundary Edit Distance
  1. This work proposes a new segmentation evaluation metric, named boundary similarity (B), an inter-coder agreement coefficient adaptation, and a confusion matrix for segmentation that are all based upon an adaptation of the boundary edit distance in Fournier and Inkpen (2012).
    Page 1, “Abstract”
  2. To select an automatic segmenter for a particular task, a variety of segmentation evaluation metrics have been proposed, including Pk (Beeferman and Berger, 1999, pp.
    Page 1, “Introduction”
  3. An ideal segmentation evaluation metric should, in theory, place the three automatic segmenters between the upper and lower bounds in terms of performance if the metrics, and the segmenters, function properly.
    Page 8, “Evaluation of Automatic Segmenters”
  4. In this work, a new segmentation evaluation metric, referred to as boundary similarity (B), is proposed as an unbiased metric, along with a boundary-edit-distance-based (BED-based) confusion matrix to compute predictably biased IR metrics such as precision and recall.
    Page 9, “Conclusions”
  5. B also allows for an intuitive comparison of boundary pairs between segmentations, as opposed to the window counts of WD or the simplistic edit count normalization of S. When an unbiased segmentation evaluation metric is desired, this work recommends the usage of B and the use of an upper and lower bound to provide context.
    Page 9, “Conclusions”
