A New Proposal for Edit-Based Text Segmentation Evaluation | In this section, a new boundary-edit-distance-based segmentation metric and confusion matrix are proposed to address the deficiencies of S for both segmentation comparison and inter-coder agreement. |
A New Proposal for Edit-Based Text Segmentation Evaluation | 3.1 Boundary Edit Distance |
Abstract | This work proposes a new segmentation evaluation metric, named boundary similarity (B), an inter-coder agreement coefficient adaptation, and a confusion matrix for segmentation, all based upon an adaptation of the boundary edit distance of Fournier and Inkpen (2012).
Introduction | using Boundary Edit Distance |
Introduction | To overcome the flaws of existing text segmentation metrics, this work proposes a new series of metrics derived from an adaptation of boundary edit distance (Fournier and Inkpen, 2012, pp. 154–156).
Introduction | In this work: §2 reviews existing segmentation metrics; §3 proposes an adaptation of boundary edit distance, a new normalization of it, a new confusion matrix for segmentation, and an inter-coder agreement coefficient.
Related Work | Instead of using windows, the work proposes a new restricted edit distance called boundary edit distance which differentiates between full and near misses. |
Related Work | normalizes the counts of full and near misses identified by boundary edit distance, as shown in Equation 2, where s_a and s_b are the segmentations, n_t is the maximum distance that boundaries may span to be considered a near miss, edits(s_a, s_b, n_t) is the edit distance, and pb(D) is the number of potential boundaries in a document D (pb(D) = |D| − 1).
Related Work | Boundary edit distance models full misses as the addition/deletion of a boundary, and near misses as n-wise transpositions.
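As a rough illustration of these definitions, the sketch below greedily pairs boundaries that lie within n_t positions of each other as near misses, counts the remaining unmatched boundaries as full misses, and applies the normalization of Equation 2. The greedy matching and all function names are assumptions for illustration, not Fournier and Inkpen's exact algorithm:

```python
def boundary_edit_distance(bounds_a, bounds_b, n_t=2):
    """Count (near_misses, full_misses) between two boundary sets.

    A near miss is a pair of unmatched boundaries at most n_t
    positions apart (modeled as an n-wise transposition); every
    remaining unmatched boundary is a full miss (addition/deletion).
    """
    a = sorted(set(bounds_a) - set(bounds_b))
    b = sorted(set(bounds_b) - set(bounds_a))
    near = 0
    for pos in list(a):
        match = next((q for q in b if abs(q - pos) <= n_t), None)
        if match is not None:
            near += 1
            a.remove(pos)
            b.remove(match)
    full = len(a) + len(b)
    return near, full

def normalized_similarity(bounds_a, bounds_b, doc_len, n_t=2):
    """1 - edits(s_a, s_b, n_t) / pb(D), with pb(D) = |D| - 1."""
    near, full = boundary_edit_distance(bounds_a, bounds_b, n_t)
    return 1 - (near + full) / (doc_len - 1)
```

For example, segmentations with boundaries {3, 7} and {3, 8} over a 10-unit document differ by one near miss, giving 1 − 1/9 ≈ 0.889.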
Experiments | It can be seen that: 1) using only the Prior Probability feature already yields a reasonable F1; and 2) the Context Similarity and Edit Distance Similarity features contribute little to the F1, while the Mention and Entity Title Similarity feature greatly boosts it.
Experiments | denote Prior Probability, Context Similarity, Edit Distance Similarity, and Mention and Entity Title Similarity, respectively. |
Introduction | More specifically, we define local features, including context similarity and edit distance, to model the similarity between a mention and an entity.
Introduction | Finally, we introduce a set of features to compute the similarity between mentions, including how similar the tweets containing the mentions are, whether they come from the tweets of the same account, and their edit distance.
Our Method | • Edit Distance Similarity: if Length(m_i) + ED(m_i, e_i) = Length(e_i), then f_3(m_i, e_i) = 1; otherwise 0.
Our Method | ED(·, ·) computes the character-level edit distance.
Our Method | the length of “ms” is 2, and the edit distance between them is 7.
Methodology | We then rank templates according to the Levenshtein edit distance (Levenshtein, 1966) from the template corresponding to the current sentence in the training document (using only the top 10 ranked templates during training to reduce processing effort).
Methodology | We obtained better results with edit distance.
Methodology | • Similarity between the most likely template in CuId and current template: edit distance between the current template and the most likely template for the current CuId.
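The template-ranking step described above can be sketched as follows; the `levenshtein` helper, the template strings, and the function names are illustrative assumptions, not the paper's implementation:

```python
def levenshtein(s, t):
    """Edit distance between two strings (Levenshtein, 1966)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (cs != ct)))
        prev = cur
    return prev[-1]

def top_templates(current, templates, k=10):
    """Rank candidate templates by edit distance to the template of the
    current sentence and keep only the k closest for training."""
    return sorted(templates, key=lambda t: levenshtein(t, current))[:k]
```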
Experiments | Both these metrics are based on edit distance . |
Experiments | CER is the edit distance between the predicted and gold transcriptions of the document, divided by the number of characters in the gold transcription. |
Experiments | WER is the word-level edit distance (words, instead of characters, are treated as tokens) between predicted and gold transcriptions, divided by the number of words in the gold transcription. |
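Both rates reduce to one sequence-level edit distance applied to different token units, which the generic sketch below makes explicit; this is an illustration of the definitions, not the papers' evaluation code:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance over arbitrary token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (r != h)))
        prev = cur
    return prev[-1]

def cer(gold, pred):
    """Character Error Rate: char-level edits / gold character count."""
    return edit_distance(list(gold), list(pred)) / len(gold)

def wer(gold, pred):
    """Word Error Rate: word-level edits / gold word count."""
    return edit_distance(gold.split(), pred.split()) / len(gold.split())
```

For instance, a one-character substitution in a four-character transcription gives CER 0.25, while one wrong word out of three gives WER ≈ 0.33.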
Instantiation | Generator examples (From → To): leave intact (good → good); edit distance (bac → back); lowercase (NEED → need); capitalize (it → It); Google spell (disspaear → disappear); contraction (wouldn’t → would not); slang language (ima → I am going to); insert punctuation (ε → .).
Model | obtain a truth assignment a^gold from each y^gold by selecting an assignment a that minimizes the edit distance between y^gold and the normalized text y(a).
Model | Here, y(a) denotes the normalized text implied by a, and DIST is a token-level edit distance.
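This selection step can be sketched as an argmin over candidate assignments under a token-level DIST; the `normalize` callback standing in for y(·) and the candidate representation are assumptions for illustration:

```python
def dist(a, b):
    """Token-level edit distance between two token lists (DIST)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]

def best_assignment(y_gold, candidates, normalize):
    """Select the assignment a whose normalized text y(a) is closest
    to the gold-standard normalization y_gold under DIST."""
    return min(candidates, key=lambda a: dist(y_gold, normalize(a)))
```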