Index of papers in Proc. ACL that mention

edit distance

Seen in text as:

edit distance (138)
Edit Distance (13)
edit distances (11)
Edit distance (3)

Seen in 151 sentences in 21 papers.

1. A chance-corrected measure of inter-annotator agreement for syntax

Skjaerholt, Arne

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Conclusion	In future work, we would like to investigate the use of other distance functions, in particular the use of approximate tree edit distance functions such as the pq-gram algorithm (Augsten et al., 2005).
Conclusion	For large data sets such as the PCEDT set used in this work, computing 04 with tree edit distance as the distance measure can take a very long time.8 This is due to the fact that 04 requires 0(n2) comparisons to be made, each of which is 0(n2) using our current approach.
Introduction	In this article we propose a family of chance-corrected measures of agreement, applicable to both dependency- and constituency-based syntactic annotation, based on Krippendorff’s 04 and tree edit distance .
Introduction	The idea of using edit distance as the basis for an inter-annotator agreement metric has previously been explored by Fournier (2013).
Introduction	However that work used a boundary edit distance as the basis of a metric for the task of text segmentation.
The metric	Instead, we propose to use an agreement measure based on Krippendorff’s a (Krippendorff, 1970; Krippendorff, 2004) and tree edit distance .
The metric	Instead, we base our work on tree edit distance .
The metric	The tree edit distance (TED) problem is defined analogously to the more familiar problem of string edit distance : what is the minimum number of edit operations required to transform one tree into the other?

edit distance is mentioned in 12 sentences in this paper.

Topics mentioned in this paper:

2. Finding Cognate Groups Using Phylogenies

Hall, David and Klein, Dan

In Proc. ACL 2010, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Experiments	Besides the heuristic baseline, we tried our model-based approach using Unigrams, Bigrams and Anchored Unigrams, with and without learning the parametric edit distances .
Experiments	When we did not use learning, we set the parameters of the edit distance to (0, -3, -4) for matches, substitutions, and deletions/insertions, respectively.
Experiments	Levenshtein refers to fixed parameter edit distance transducer.
Learning	practice, it is important to estimate better parametric edit distances cpg and survival variables 85.
Learning	However, a naively constructed edit distance , which for example might penalize vowel substitutions lightly, would fail to learn that Latin words that are borrowed into English would not undergo the sound change /I/—>/eI/.
Learning	For the transducers go, we learn parameterized edit distances that model the probabilities of different sound changes.
Model	In this paper, each (pg takes the form of a parameterized edit distance similar to the standard Levenshtein distance.
Transducers and Automata	In our model, it is not just the edit distances that are finite state machines.

edit distance is mentioned in 14 sentences in this paper.

Topics mentioned in this paper:

3. Evaluating Text Segmentation using Boundary Edit Distance

Fournier, Chris

In Proc. ACL 2013, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

A New Proposal for Edit-Based Text Segmentation Evaluation	In this section, a new boundary edit distance based segmentation metric and confusion matrix is proposed to solve the deficiencies of S for both segmentation comparison and inter-coder agreement.
A New Proposal for Edit-Based Text Segmentation Evaluation	3.1 Boundary Edit Distance
Abstract	This work proposes a new segmentation evaluation metric, named boundary similarity (B), an inter-coder agreement coefficient adaptation, and a confusion-matrix for segmentation that are all based upon an adaptation of the boundary edit distance in Fournier and Inkpen (2012).
Introduction	using Boundary Edit Distance
Introduction	To overcome the flaws of existing text segmentation metrics, this work proposes a new series of metrics derived from an adaptation of boundary edit distance (Fournier and Inkpen, 2012, p. 154—156).
Introduction	In this work: §2 reviews existing segmentation metrics; §3 proposes an adaptation of boundary edit distance , a new normalization of it, a new confusion matrix for segmentation, and an inter-
Related Work	Instead of using windows, the work proposes a new restricted edit distance called boundary edit distance which differentiates between full and near misses.
Related Work	normalizes the counts of full and near misses identified by boundary edit distance, as shown in Equation 2, where 3a and 35 are the segmentations, nt is the maximum distance that boundaries may span to be considered a near miss, edits(sa, 35, nt) is the edit distance , and pb(D) is the number of potential boundaries in a document D (pb(D) = \|D\| — l).
Related Work	Boundary edit distance models full misses as the additiorfldeletion of a boundary, and near misses as n-wise transpositions.

edit distance is mentioned in 18 sentences in this paper.

Topics mentioned in this paper:

4. Automated Essay Scoring Based on Finite State Transducer: towards ASR Transcription of Oral English Speech

Peng, Xingyuan and Ke, Dengfeng and Xu, Bo

In Proc. ACL 2012, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Approach	Here, we apply the edit distance to measure how best the path is.
Approach	This means the best path is the word sequence path in the FST which has the smallest edit distance compared with the to-be-scored transcription’s word sequences .
Approach	EDcost is the edit distance from the transcription to the paths which start at state 0 and end at the end

edit distance is mentioned in 14 sentences in this paper.

Topics mentioned in this paper:

edit distance (14)
Latent Semantic (3)

5. How to Speak a Language without Knowing It

Shi, Xing and Knight, Kevin and Ji, Heng

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Evaluation	0 We compute the normalized edit distance between the system’s output and a human-generated Chinglish reference.
Evaluation	We measure the normalized edit distance against an English reference.
Experiments	the test portion of our phrasebook, using edit distance .
Experiments	The average edit distance of phoneme-phrase model and that of hybrid training/decoding model are close, indicating that long phoneme-phrase pairs can emulate word-pinyin mappings.
Experiments	MOdeI Edit Distance Reference English 0.477 Phoneme based 0.696 Hybrid training and decoding 0.496

edit distance is mentioned in 9 sentences in this paper.

Topics mentioned in this paper:

6. Chinese-English Backward Transliteration Assisted with Mining Monolingual Web Pages

Yang, Fan and Zhao, Jun and Zou, Bo and Liu, Kang and Liu, Feifan

In Proc. ACL 2008, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Statistical Transliteration Model	We use the algorithm like calculating the edit distance between two words.
Statistical Transliteration Model	The edit distance calculation finds the best matching with the least operation cost to change one word to another word by using deletion/addition/insertion operations on syllables.
Statistical Transliteration Model	But the complexity will be too high to afford if we calculate the edit distance between a query and each word in the list.

edit distance is mentioned in 8 sentences in this paper.

Topics mentioned in this paper:

7. Refinements to Interactive Translation Prediction Based on Search Graphs

Koehn, Philipp and Tsoukala, Chara and Saint-Amand, Herve

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Introduction	If the user prefix cannot be found in the search graph, approximate string matching is used by finding a path with minimal string edit distance , i.e., a path in the graph with the minimal number of insertions, deletions and substitutions to match the user prefix.
Properties of Core Algorithm	Cost is measured primarily in terms of string edit distance (number of deletions, insertions and substitutions), and secondary in terms of translation model score for the matched path in the graph.
Properties of Core Algorithm	ms 5 1015 20 25 30 35 40pX Figure 1: Average response time of baseline method based on length of the prefix and number of edits: The main bottleneck is the string edit distance between prefix and path.
Properties of Core Algorithm	the length of the user prefix and the string edit distance between the user prefix and the search graph.
Refinements	We attempt to find the last word in the predicted path either before or after the optimal matching position according to string edit distance .
Refinements	tion output is a better fallback than computing optimal string edit distance .
Refinements	Dissimilarity is measured as letter edit distance

edit distance is mentioned in 8 sentences in this paper.

Topics mentioned in this paper:

8. Entity Linking for Tweets

Liu, Xiaohua and Li, Yitong and Wu, Haocheng and Zhou, Ming and Wei, Furu and Lu, Yi

In Proc. ACL 2013, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Experiments	It can be seen that: 1) using only Prior Probability feature already yields a reasonable F1; and 2) Context Similarity and Edit Distance Similarity feature have little contribution to the F1, while Mention and Entity Title Similarity feature greatly boosts the F1.
Experiments	denote Prior Probability, Context Similarity, Edit Distance Similarity, and Mention and Entity Title Similarity, respectively.
Introduction	More specifically, we define local features, including context similarity and edit distance , to model the similarity between a mention and an entity.
Introduction	Finally, we introduce a set of features to compute the similarity between mentions, including how similar the tweets containing the mentions are, whether they come from the tweets of the same account, and their edit distance .
Our Method	0 Edit Distance Similarity: If Length(mi)+ED(mi, 61-) = Length(ei), f3(mi,ei) = 1, otherwise 0.
Our Method	ED(-,-) computes the character level edit distance .
Our Method	“ms” is 2, and the edit distance between them is 7.

edit distance is mentioned in 8 sentences in this paper.

Topics mentioned in this paper:

9. Extracting Narrative Timelines as Temporal Dependency Structures

Kolomiyets, Oleksandr and Bethard, Steven and Moens, Marie-Francine

In Proc. ACL 2012, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	We compare two parsing models for temporal dependency structures, and show that a deterministic non-projective dependency parser outperforms a graph-based maximum spanning tree parser, achieving labeled attachment accuracy of 0.647 and labeled tree edit distance of 0.596.
Evaluations	Tree Edit Distance In addition to the UAS and LAS the tree edit distance score has been recently introduced for evaluating dependency structures (Tsarfaty et al., 2011).
Evaluations	The tree edit distance score for a tree 7r is based on the following operations A E A : A = {DELETE, INSERT, RELABEL}:
Evaluations	Taking the shortest such sequence, the tree edit distance is calculated as the sum of the edit operation costs divided by the size of the tree (i.e.

edit distance is mentioned in 7 sentences in this paper.

Topics mentioned in this paper:

10. Lexical Normalisation of Short Text Messages: Makn Sens a #twitter

Han, Bo and Baldwin, Timothy

In Proc. ACL 2011, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Lexical normalisation	Second, IV words within a threshold TC character edit distance of the given OOV word are calculated, as is widely used in spell checkers.
Lexical normalisation	Third, the double metaphone algorithm (Philips, 2000) is used to decode the pronunciation of all IV words, and IV words within a threshold Tp edit distance of the given OOV word under phonemic transcription, are included in the confusion set; this allows us to capture OOV words such as earthquick “earthquake”.
Lexical normalisation	The recall for lexical edit distance with TC g 2 is moderately high, but it is unable to detect the correct candidate for about one quarter of words.

edit distance is mentioned in 7 sentences in this paper.

Topics mentioned in this paper:

11. Learning Phrase-Based Spelling Error Models from Clickthrough Data

Sun, Xu and Gao, Jianfeng and Micol, Daniel and Quirk, Chris

In Proc. ACL 2010, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

A Phrase-Based Error Model	The cost of assigning k to ai is equal to the Levenshtein edit distance (Levenshtein, 1966) between the ith word in Q and the kth word in C, and the cost of assigning O to ai is equal to the length of the ith word in Q.
Clickthrough Data and Spelling Correction	We then scored each query pair (Q1, Q2) using the edit distance between Q1 and Q2, and retained those with an edit distance score lower than a preset threshold as query correction pairs.
Related Work	Most traditional systems use a manually tuned similarity function (e.g., edit distance function) to rank the candidates, as reviewed by Kukich (1992).
The Baseline Speller System	Then we scan the query from left to right, and each query term q is looked up in lexicon to generate a list of spelling suggestions 0 whose edit distance from q is lower than a preset threshold.
The Baseline Speller System	The lexicon is stored using a trie-based data structure that allows efficient search for all terms within a maximum edit distance .
The Baseline Speller System	The error model (the first factor) is approximated by the edit distance function as

edit distance is mentioned in 7 sentences in this paper.

Topics mentioned in this paper:

12. A Fast and Accurate Method for Approximate String Search

Wang, Ziqi and Xu, Gu and Li, Hang and Zhang, Ming

In Proc. ACL 2011, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Related Work	The use of edit distance is a typical example, which exploits operations of character deletion, insertion and substitution.
Related Work	Some methods generate candidates within a fixed range of edit distance or different ranges for strings with different lengths (Li et al., 2006; Whitelaw et al., 2009).
Related Work	Other methods make use of weighted edit distance to enhance the representation power of edit distance (Ristad and Yianilos, 1998; Oncina and Sebban, 2005; McCal-lum et al., 2005; Ahmad and Kondrak, 2005).

edit distance is mentioned in 5 sentences in this paper.

Topics mentioned in this paper:

13. Deciphering Foreign Language

Ravi, Sujith and Knight, Kevin

In Proc. ACL 2011, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Machine Translation as a Decipherment Task	Evaluation: All the MT systems are run on the Spanish test data and the quality of the resulting English translations are evaluated using two different measures—(1) Normalized edit distance score (Navarro, 2001),6 and (2) BLEU (Papineni et
Machine Translation as a Decipherment Task	6When computing edit distance , we account for substitutions, insertions, deletions as well as local-swap edit operations required to convert a given English string into the (gold) reference translation.
Machine Translation as a Decipherment Task	Results: Figure 3 compares the results of various MT systems (using parallel versus decipherment training) on the two test corpora in terms of edit distance scores (a lower score indicates closer match to the gold translation).

edit distance is mentioned in 5 sentences in this paper.

Topics mentioned in this paper:

14. Automatic Detection of Cognates Using Orthographic Alignment

Ciobanu, Alina Maria and Dinu, Liviu P.

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Experiments	We employ several orthographic metrics widely used in this research area: the edit distance (Levenshtein, 1965), the longest common subsequence ratio (Melamed, 1995) and the XDice metric (Brew and McKelvie, l996)4.
Experiments	In addition, we use SpSim (Gomes and Lopes, 2011), which outperformed the longest common subsequence ratio and a similarity measure based on the edit distance in previous experiments.
Experiments	For the edit distance , we subtract the normalized value from 1 in order to obtain similarity.
Our Approach	Therefore, because the edit distance was widely used in this research area and produced good results, we are encouraged to employ orthographic alignment for identifying pairs of cognates, not only to compute similarity scores, as was previously done, but to use aligned subsequences as features for machine learning algorithms.
Related Work	(2013) proposed a method for cognate production relying on statistical character-based machine translation, learning orthographic production patterns, and Mulloni (2007) introduced an algorithm for cognate production based on edit distance alignment and the identification of orthographic cues when words enter a new language.

edit distance is mentioned in 5 sentences in this paper.

Topics mentioned in this paper:

15. A Statistical NLG Framework for Aggregated Planning and Realization

Kondadadi, Ravi and Howald, Blake and Schilder, Frank

In Proc. ACL 2013, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Methodology	We then rank templates according to the Levenshtein edit distance (Levenshtein, 1966) from the template corresponding to the current sentence in the training document (using the top 10 ranked templates in training for ease of processing effort).
Methodology	We obtained better results with edit distance .
Methodology	o Similarity between the most likely template in CuId and current template: Edit distance between the current template and the most likely template for the current CuId.

edit distance is mentioned in 4 sentences in this paper.

Topics mentioned in this paper:

16. Detecting Retries of Voice Search Queries

Levitan, Rivka and Elson, David

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Features	We calculate the edit distance between the two transcripts at the character and word level, as well as the two most similar phonetic rewrites.
Prediction task	Of the similarity features, the ones that contributed significantly in the final model were character edit distance (normalized) and phoneme edit distance (raw and normalized); as expected, retries are associated with more similar query pairs.
Prediction task	T-tests between the two categories showed that all edit distance features—character, word, reduced, and phonetic; raw and normalized—are significantly more similar between retry query pairs.1 Similarly, the number of unigrams the two queries have in common is significantly higher for retries.
Prediction task	Most notably, all edit distance features are significantly greater for rephrases.

edit distance is mentioned in 4 sentences in this paper.

Topics mentioned in this paper:

17. Learning Bilingual Lexicons from Monolingual Corpora

Haghighi, Aria and Liang, Percy and Berg-Kirkpatrick, Taylor and Klein, Dan

In Proc. ACL 2008, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Experimental Setup	The second method is to heuristically induce, where applicable, a seed lexicon using edit distance , as is done in Koehn and Knight (2002).
Experimental Setup	Where applicable, we compare against the EDITDIST baseline, which solves a maximum bipartite matching problem where edge weights are normalized edit distances .
Features	One direct way to capture orthographic similarity between word pairs is edit distance .
Features	Note that MCCA can learn regular orthographic correspondences between source and target words, which is something edit distance cannot capture (see table 5).

edit distance is mentioned in 4 sentences in this paper.

Topics mentioned in this paper:

18. Unsupervised Transcription of Historical Documents

Berg-Kirkpatrick, Taylor and Durrett, Greg and Klein, Dan

In Proc. ACL 2013, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Experiments	Both these metrics are based on edit distance .
Experiments	CER is the edit distance between the predicted and gold transcriptions of the document, divided by the number of characters in the gold transcription.
Experiments	WER is the word-level edit distance (words, instead of characters, are treated as tokens) between predicted and gold transcriptions, divided by the number of words in the gold transcription.

edit distance is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

19. Adaptive Parser-Centric Text Normalization

Zhang, Congle and Baldwin, Tyler and Ho, Howard and Kimelfeld, Benny and Li, Yunyao

In Proc. ACL 2013, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Instantiation	Generator From To leave intact good good edit distance bac back lowercase NEED need capitalize it It Google spell disspaear disappear contraction wouldn’t would not slang language ima I am going to insert punctuation e .
Model	obtain a truth assignment 05°” from each yzgo'd by selecting an assignment 04 that minimizes the edit distance between ngId and the normalized
Model	Here, y(a) denotes the normalized text implied by a, and DIST is a token-level edit distance .

edit distance is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

20. Hybrid Simplification using Deep Semantics and Machine Translation

Narayan, Shashi and Gardent, Claire

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Experiments	Number of Edits Table 4 indicates the edit distance of the output sentences w.r.t.
Experiments	In sum, the automatic metrics indicate that our system produces simplification that are consistently closest to the reference in terms of edit distance , number of splits and BLEU score.
Simplification Framework	The governing criteria for the construction of the training graph is that, at each step, it tries to minimize the Leven-shtein edit distance between the complex and the simple sentences.

edit distance is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

21. Reinforcement Learning for Mapping Instructions to Actions

Branavan, S.R.K. and Chen, Harr and Zettlemoyer, Luke and Barzilay, Regina

In Proc. ACL 2009, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Applying the Model	edit distance between 7.0 E W and object labels in s
Applying the Model	All features are binary, except for the normalized edit distance which is real-valued.
Applying the Model	6We assume that a word maps to an environment object if the edit distance between the word and the object’s name is below a threshold value.

edit distance is mentioned in 3 sentences in this paper.

Topics mentioned in this paper: