Simple supervised document geolocation with geodesic grids
Wing, Benjamin and Baldridge, Jason

Article Structure

Abstract

We investigate automatic geolocation (i.e.

Introduction

There are a variety of applications that arise from connecting linguistic content—be it a word, phrase, document, or entire corpus—to geography.

Data

Wikipedia: As of April 15, 2011, Wikipedia has some 18.4 million content-bearing articles in 281 language-specific encyclopedias.

Grid representation for connecting texts to locations

Geolocation involves identifying some spatial region with a unit of text—be it a word, phrase, or document.
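
As a rough illustration of associating a location with a discrete region, the sketch below maps a latitude/longitude pair to a cell in a uniform grid measured in degrees. The function names (cell_for, cell_center) and the degrees_per_cell parameter are hypothetical, chosen for this example only; this is a minimal sketch under those assumptions, not the authors' implementation.

```python
import math


def cell_for(lat, lon, degrees_per_cell=1.0):
    """Return the (row, col) index of the grid cell containing (lat, lon).

    Hypothetical helper: rows count up from the south pole and columns
    count east from the antimeridian.
    """
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("coordinates out of range")
    rows = math.ceil(180.0 / degrees_per_cell)
    cols = math.ceil(360.0 / degrees_per_cell)
    # Clamp so that lat == 90 and lon == 180 fall into the last cell.
    row = min(int((lat + 90.0) // degrees_per_cell), rows - 1)
    col = min(int((lon + 180.0) // degrees_per_cell), cols - 1)
    return row, col


def cell_center(row, col, degrees_per_cell=1.0):
    """Return the (lat, lon) of the center of a grid cell."""
    lat = -90.0 + (row + 0.5) * degrees_per_cell
    lon = -180.0 + (col + 0.5) * degrees_per_cell
    return lat, lon
```

With this layout, a document tagged with a coordinate can be assigned to a cell, and a predicted cell can be turned back into a point by taking the cell center. A coarser grid (a larger degrees_per_cell) gives fewer, larger cells.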

Supervised models for document geolocation

Our methods use only the text in the documents; predictions are made based on the distributions 6, H, and p introduced in the previous section.

Experiments

The approaches described in the previous section are evaluated on both the geotagged Wikipedia and Twitter datasets.

Related work

Lieberman and Lin (2009) also work with geotagged Wikipedia articles, but they do so in order to ana-

Conclusion

We have shown that automatic identification of the location of a document based only on its text can be performed with high accuracy using simple supervised methods and a discrete grid representation of the earth’s surface.

Topics

development set

Appears in 3 sentences as: development set (3)
In Simple supervised document geolocation with geodesic grids
  1. Running on the full data set is time-consuming, so development was done on a subset of about 80,000 articles (19.9 million tokens) as a training set and 500 articles as a development set.
    Page 3, “Data”
  2. For both Wikipedia and Twitter, preliminary experiments on the development set were run to plot the prediction error for each method for each level of resolution, and the optimal resolution for each method was chosen for obtaining test results.
    Page 6, “Experiments”
  3. We recomputed the distributions using several values for both parameters and evaluated on the development set.
    Page 7, “Experiments”

See all papers in Proc. ACL 2011 that mention development set.


topic model

Appears in 3 sentences as: topic model (3)
In Simple supervised document geolocation with geodesic grids
  1. Hao et al. (2010) use a location-based topic model to summarize travelogues, enrich them with automatically chosen images, and provide travel recommendations.
    Page 1, “Introduction”
  2. (2010) evaluate their geographic topic model by geolocating USA-based Twitter users based on their tweet content.
    Page 2, “Introduction”
  3. Their geographic topic model receives supervision from many documents/users and predicts locations for unseen documents/users.
    Page 2, “Introduction”
