A Comparison of Techniques to Automatically Identify Complex Words.
Shardlow, Matthew

Article Structure

Abstract

Identifying complex words (CWs) is an important, yet often overlooked, task within lexical simplification (the process of automatically replacing CWs with simpler alternatives).

Introduction

Complex Word (CW) identification is an important task at the first stage of lexical simplification, and errors introduced or avoided here will affect the final results.

Experimental Design

Several systems for detecting CWs were implemented and evaluated using the CW corpus.

Results

The results of the experiments in identifying CWs are shown in Figure 1 and the values are given in Table 3.

Discussion

It is clear from these results that all of the methods achieve fairly high accuracy.

Related Work

This research will be used for lexical simplification.

Future Work

This work is intended as an initial study of methods for identifying CWs for simplification.

Conclusion

This paper has provided an insight into the challenges associated with evaluating the identification of CWs.

Topics

SVM

Appears in 19 sentences as: SVM (18), ‘SVM’ (1)
In A Comparison of Techniques to Automatically Identify Complex Words.
  1. Support vector machines (SVM) are statistical classifiers which use labelled training data to predict the class of unseen inputs.
    Page 3, “Experimental Design”
  2. The training data consist of several features which the SVM uses to distinguish between classes.
    Page 3, “Experimental Design”
  3. The SVM was chosen as it has been used elsewhere for similar tasks (Gasperin et al., 2009; Hancke et al., 2012; Jauhar and Specia, 2012).
    Page 3, “Experimental Design”
  4. One further advantage is that the features of an SVM can be analysed to determine their effect on the classification.
    Page 4, “Experimental Design”
  5. The SVM was trained using the LIBSVM package (Chang and Lin, 2011) in Matlab.
    Page 4, “Experimental Design” (an illustrative sketch of this setup follows this list)
  6. To implement the SVM a set of features was determined for the classification scheme.
    Page 4, “Experimental Design”
  7. Everything | Thresholding | SVM
    Page 4, “Results”
  8. To analyse the features of the SVM, the correlation coefficient between each feature vector and the vector of feature labels was calculated.
    Page 4, “Results”
  9. Whilst the thresholding and simplify everything methods were not significantly different from each other, the SVM method was significantly different from the other two (p < 0.001).
    Page 5, “Discussion”
  10. This can be seen in the slightly lower recall, yet higher precision attained by the SVM.
    Page 5, “Discussion”
  11. This indicates that the SVM was better at distinguishing between complex and simple words, but also wrongly identified many CWs.
    Page 5, “Discussion”
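The excerpts above (items 5, 6 and 8 in particular) describe training an SVM on word-level features with LIBSVM and then checking how each feature correlates with the labels. The sketch below is a minimal Python illustration of that workflow, assuming scikit-learn's SVC (which is built on LIBSVM) as a stand-in for the paper's Matlab setup; the feature values, labels, kernel and C parameter are invented for illustration and are not taken from the paper.

import numpy as np
from sklearn.svm import SVC

# Toy feature matrix: one row per word, with illustrative columns such as
# corpus frequency, word length, syllable count and synonym count.
# All values and labels below are invented for this sketch.
X = np.array([
    [512.0,  5, 1, 12],
    [  0.4, 10, 4,  1],
    [890.0,  3, 1,  9],
    [  6.3,  8, 2,  3],
], dtype=float)
y = np.array([0, 1, 0, 1])  # 0 = simple word, 1 = complex word (CW)

# Train the classifier; scikit-learn's SVC wraps LIBSVM. The RBF kernel and
# C value here are assumptions, not the paper's settings.
clf = SVC(kernel="rbf", C=1.0)
clf.fit(X, y)

# Feature analysis as in item 8: correlation coefficient between each
# feature vector and the vector of labels.
for i in range(X.shape[1]):
    r = np.corrcoef(X[:, i], y)[0, 1]
    print(f"feature {i}: correlation with labels = {r:.2f}")

# Classify an unseen word from its feature vector.
print(clf.predict([[25.0, 7, 2, 4]]))

In the paper the classifier is trained on features extracted from the CW corpus rather than toy values; only the general shape of the procedure is shown here.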


support vector

Appears in 9 sentences as: Support Vector (2), Support vector (2), support vector (5)
In A Comparison of Techniques to Automatically Identify Complex Words.
  1. Experiments are carried out into the CW identification techniques of: simplifying everything, frequency thresholding and training a support vector machine.
    Page 1, “Abstract” (illustrative sketches of the two baseline methods follow this list)
  2. The support vector machine achieves a slight increase in precision over the other two methods, but at the cost of a dramatic trade off in recall.
    Page 1, “Abstract”
  3. The implementation of a support vector machine for the classification of CWs.
    Page 2, “Introduction”
  4. An analysis of the features used in the support vector machine.
    Page 2, “Introduction”
  5. These were implemented as well as a support vector machine classifier.
    Page 2, “Experimental Design”
  6. 2.6 Support Vector Machine
    Page 3, “Experimental Design”
  7. Support vector machines (SVM) are statistical classifiers which use labelled training data to predict the class of unseen inputs.
    Page 3, “Experimental Design”
  8. Support vector machines are powerful statistical classifiers, as employed in the ‘SVM’ method of this paper.
    Page 6, “Related Work”
  9. A Support Vector Machine is used to predict the familiarity of CWs in Zeng et al.
    Page 6, “Related Work”
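Item 1 above names the two baselines that the SVM is compared against: simplifying everything and frequency thresholding. The sketch below illustrates both in Python under stated assumptions: a toy frequency table stands in for the corpus frequencies, and the threshold value is invented rather than taken from the paper.

# Toy frequency table (occurrences per million, invented values) standing in
# for the corpus frequencies used for thresholding.
frequency = {
    "house": 512.0,
    "dwelling": 6.3,
    "big": 890.0,
    "gargantuan": 0.4,
}

THRESHOLD = 10.0  # assumed cut-off, not taken from the paper

def is_complex_thresholding(word: str) -> bool:
    """Frequency thresholding: a word rarer than the threshold is a CW."""
    return frequency.get(word, 0.0) < THRESHOLD

def is_complex_everything(word: str) -> bool:
    """'Simplify everything' baseline: every word is treated as a CW."""
    return True

for w in frequency:
    print(w, is_complex_thresholding(w), is_complex_everything(w))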


WordNet

Appears in 3 sentences as: WordNet (3)
In A Comparison of Techniques to Automatically Identify Complex Words.
  1. This algorithm generated a set of synonyms from WordNet and then used the SUBTLEX frequencies to find the most frequent synonym.
    Page 3, “Experimental Design” (an illustrative sketch of this step follows this list)
  2. This measure is taken from WordNet (Fellbaum, 1998).
    Page 4, “Experimental Design”
  3. Synonym Count: Also taken from WordNet, this is the number of potential synonyms with which a word could be replaced.
    Page 4, “Experimental Design”
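The excerpts above describe three WordNet-related steps: generating a word's synonyms from WordNet, choosing the most frequent synonym using SUBTLEX frequencies, and counting the synonyms as a feature. The sketch below illustrates those steps in Python, assuming NLTK as the WordNet interface and an invented frequency table in place of SUBTLEX; none of the specific values come from the paper.

from nltk.corpus import wordnet  # assumes NLTK with the WordNet data installed

# Toy frequency table standing in for SUBTLEX (values invented).
subtlex_freq = {"house": 512.0, "home": 730.0, "abode": 1.2, "domicile": 0.6}

def synonyms(word):
    """Collect lemma names from all WordNet synsets of the word."""
    result = set()
    for synset in wordnet.synsets(word):
        for lemma in synset.lemmas():
            name = lemma.name().replace("_", " ")
            if name != word:
                result.add(name)
    return result

def most_frequent_synonym(word):
    """Pick the candidate synonym with the highest (stand-in) frequency."""
    candidates = synonyms(word)
    if not candidates:
        return None
    return max(candidates, key=lambda s: subtlex_freq.get(s, 0.0))

word = "dwelling"
print("synonym count feature:", len(synonyms(word)))        # Synonym Count
print("most frequent synonym:", most_frequent_synonym(word))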
