Text-Driven Toponym Resolution using Indirect Supervision
Speriosu, Michael and Baldridge, Jason

Article Structure

Abstract

Toponym resolvers identify the specific locations referred to by ambiguous placenames in text.

Introduction

It has been estimated that at least half of the world’s stored knowledge, both printed and digital, has geographic relevance, and that geographic information pervades many more aspects of humanity than previously thought (Petras, 2004; Skupin and Esperbe, 2011).

Data

2.1 Gazetteer

Toponym Resolvers

Given a set of toponyms provided via annotations or identified using NER, a resolver must select a candidate location for each toponym (or, in some cases, a resolver may abstain).
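As a concrete illustration of this contract, here is a minimal sketch in Python; the Location, Toponym, and Resolver names are hypothetical stand-ins, not code from the paper.

    from dataclasses import dataclass
    from typing import List, Optional

    @dataclass(frozen=True)
    class Location:
        """A gazetteer candidate: a place name plus coordinates in degrees."""
        name: str
        lat: float
        lon: float

    @dataclass
    class Toponym:
        """A placename mention with its gazetteer candidates (e.g. from GeoNames)."""
        surface_form: str
        context: str                # text surrounding the mention
        candidates: List[Location]

    class Resolver:
        """Selects one candidate per toponym, or abstains by returning None."""
        def resolve(self, toponym: Toponym) -> Optional[Location]:
            raise NotImplementedError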

Evaluation

Many prior efforts use a simple accuracy metric: the fraction of toponyms whose predicted location matches the gold location.
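To make these measures concrete, here is a hedged sketch, assuming predicted and gold locations are (lat, lon) pairs in degrees: accuracy counts exact matches, and error distance is the great-circle (haversine) distance between predicted and gold coordinates, which supports mean and median error figures like those reported later.

    import math
    from statistics import mean, median

    def haversine_km(p, g):
        """Great-circle distance in km between two (lat, lon) points in degrees."""
        lat1, lon1, lat2, lon2 = map(math.radians, (*p, *g))
        a = (math.sin((lat2 - lat1) / 2) ** 2
             + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
        return 2 * 6371.0 * math.asin(math.sqrt(a))  # mean Earth radius ~6371 km

    def evaluate(predicted, gold):
        """predicted, gold: parallel lists of (lat, lon) pairs, one per toponym."""
        errors = [haversine_km(p, g) for p, g in zip(predicted, gold)]
        accuracy = sum(p == g for p, g in zip(predicted, gold)) / len(gold)
        return accuracy, mean(errors), median(errors)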

Results

Table 2 gives the performance of the resolvers on the TR-CONLL and CWAR test sets when gold toponyms are used.

Error Analysis

Table 4 shows the ten toponyms that caused the greatest total error distances from TRC-DEV with gold toponyms when resolved by TRAWL, the resolver that achieves the lowest mean error on that dataset.

Conclusion

Our text-driven resolvers prove highly effective for both modern-day newswire texts and 19th-century texts pertaining to the Civil War.

Topics

named entity

Appears in 9 sentences as: named entities (1) named entity (8)
In Text-Driven Toponym Resolution using Indirect Supervision
  1. … (2010) use relationships learned between people, organizations, and locations from Wikipedia to aid in toponym resolution when such named entities are present, but do not exploit any other textual context.
    Page 1, “Introduction”
  2. However, it is important to consider the utility of an end-to-end toponym identification and resolution system, so we also demonstrate that performance is still strong when toponyms are detected with a standard named entity recognizer.
    Page 2, “Introduction”
  3. Toponyms were annotated by a semiautomated process: a named entity recognizer identified toponyms, and then coordinates were assigned using simple rules and corrected by hand.
    Page 3, “Data”
  4. To create the indirectly supervised training data for WISTR, the OpenNLP named entity recognizer detects toponyms in GEOWIKI, and candidate locations for each toponym are retrieved from GEONAMES.
    Page 5, “Toponym Resolvers”
  5. …when a named entity recognizer is used to identify toponyms.
    Page 6, “Evaluation”
  6. We primarily present results from experiments with gold toponyms but include an accuracy measure for comparability with results from experiments run on plain text with a named entity recognizer.
    Page 6, “Evaluation”
  7. The named entity recognizer is likely better at detecting common toponyms than rare toponyms due to the nature of its training data.
    Page 7, “Results”
  8. We also measured the mean and median error distance for toponyms correctly identified by the named entity recognizer, and found that they tended to be 50-200km worse than for gold toponyms.
    Page 8, “Results”
  9. This also makes sense given the named entity recognizer’s tendency to detect common toponyms: common toponyms tend to be more ambiguous than others.
    Page 8, “Results”
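Item 4 above summarizes how WISTR's training data is built. Below is a minimal sketch of that pipeline, reusing haversine_km from the evaluation sketch earlier; detect_toponyms (standing in for the OpenNLP NER), geonames_candidates, and the max_km threshold are hypothetical, and labeling each instance with the candidate nearest the article's geotag is an assumption about the selection rule.

    def build_training_instances(geowiki_articles, detect_toponyms,
                                 geonames_candidates, max_km=500.0):
        """Indirect supervision: label each detected toponym's context with the
        gazetteer candidate nearest the article's document-level geotag."""
        instances = []
        for article in geowiki_articles:  # each article carries a (lat, lon) geotag
            for mention in detect_toponyms(article.text):
                candidates = geonames_candidates(mention.surface_form)
                if not candidates:
                    continue
                nearest = min(candidates,
                              key=lambda c: haversine_km((c.lat, c.lon), article.geotag))
                # Skip mentions whose best candidate is implausibly far from the geotag.
                if haversine_km((nearest.lat, nearest.lon), article.geotag) <= max_km:
                    instances.append((mention.surface_form, mention.context, nearest.name))
        return instances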

NER

Appears in 8 sentences as: NER (8) NER’s (1)
In Text-Driven Toponym Resolution using Indirect Supervision
  1. Table 1 gives statistics for both corpora, including the number and ambiguity of gold standard toponyms for both as well as NER-identified toponyms for TR-CONLL.
    Page 3, “Data”
  2. We use the pre-trained English NER from the OpenNLP project.
    Page 3, “Data”
  3. Given a set of toponyms provided via annotations or identified using NER, a resolver must select a candidate location for each toponym (or, in some cases, a resolver may abstain).
    Page 3, “Toponym Resolvers”
  4. States and countries are not annotated in CWAR, so we do not evaluate end-to-end using NER plus toponym resolution for it, as there are many (falsely) false positives.
    Page 3, “Toponym Resolvers”
  5. False positives occur when the NER incorrectly predicts a toponym, and false negatives occur when it fails to predict a toponym identified by the annotator.
    Page 6, “Evaluation”
  6. In this case, the ORACLE results are less than 100% due to the limitations of the NER, and represent the best possible results given the NER we used.
    Page 7, “Results”
  7. Results on TR-CONLL indicate much higher performance than the resolvers presented by Leidner (2008), whose F-scores do not exceed 36.5% with either gold or NER toponyms. TRC-TEST is a subset of the documents Leidner uses (he did not split development and test data), but the results still come from overlapping data.
    Page 8, “Results”
  8. However, our evaluation is more penalized since SPIDER loses precision for NER’s false positives (Jack London as a location) while Leidner only evaluated on actual locations.
    Page 8, “Results”

text classifiers

Appears in 4 sentences as: text classifier (1) text classifiers (3)
In Text-Driven Toponym Resolution using Indirect Supervision
  1. We exploit document-level geotags to indirectly generate training instances for text classifiers for toponym resolution, and show that textual cues can be straightforwardly integrated with other commonly used ones.
    Page 1, “Abstract”
  2. Essentially, we learn a text classifier per toponym.
    Page 2, “Introduction”
  3. Our results show these text classifiers are far more accurate than algorithms based on spatial proximity or metadata.
    Page 2, “Introduction”
  4. It learns text classifiers based on local context window features trained on instances automatically extracted from GEOWIKI.
    Page 5, “Toponym Resolvers”
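A minimal sketch of the per-toponym classifier idea from the snippets above, assuming scikit-learn is available and using bag-of-words features over the mention's local context; it consumes (toponym, context, label) triples like those produced by the data-generation sketch earlier, and abstains when no model exists for a toponym.

    from collections import defaultdict
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    def train_per_toponym(instances):
        """instances: (toponym, context_text, candidate_label) triples.
        Trains one bag-of-words logistic-regression classifier per toponym."""
        grouped = defaultdict(lambda: ([], []))
        for toponym, context, label in instances:
            X, y = grouped[toponym]
            X.append(context)
            y.append(label)
        models = {}
        for toponym, (X, y) in grouped.items():
            if len(set(y)) > 1:  # a classifier needs at least two candidate classes
                models[toponym] = make_pipeline(CountVectorizer(),
                                                LogisticRegression()).fit(X, y)
        return models

    def resolve(models, toponym, context):
        """Predict a candidate label for a mention; abstain (None) if unseen."""
        model = models.get(toponym)
        return model.predict([context])[0] if model else None

Toponyms whose training instances all share a single candidate get no model in this sketch; a fuller variant might back off to that lone observed label or to a sensible default such as the most frequently observed candidate.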

end-to-end

Appears in 3 sentences as: end-to-end (3)
In Text-Driven Toponym Resolution using Indirect Supervision
  1. However, it is important to consider the utility of an end-to-end toponym identification and resolution system, so we also demonstrate that performance is still strong when toponyms are detected with a standard named entity recognizer.
    Page 2, “Introduction”
  2. States and countries are not annotated in CWAR, so we do not evaluate end-to-end using NER plus toponym resolution for it, as there are many (falsely) false positives.
    Page 3, “Toponym Resolvers”
  3. Performance remains good when resolving toponyms identified automatically, indicating that end-to-end systems based on our models may improve the experience of digital humanities scholars interested in finding and visualizing toponyms in large corpora.
    Page 9, “Conclusion”

news articles

Appears in 3 sentences as: news article (1) news articles (2)
In Text-Driven Toponym Resolution using Indirect Supervision
  1. …historically recorded travel costs on the shaping of empires (Scheidel et al., 2012), and systems that convey the geographic content in news articles (Teitler et al., 2008; Sankaranarayanan et al., 2009) and microblogs (Gelernter and Mushegian, 2011).
    Page 1, “Introduction”
  2. The TR-CONLL corpus (Leidner, 2008) contains 946 REUTERS news articles published in August 1996.
    Page 3, “Data”
  3. An instance of California in a baseball-related news article is incorrectly predicted to be the town California, Pennsylvania.
    Page 8, “Error Analysis”
