Linking and Extending an Open Multilingual Wordnet
Bond, Francis and Foster, Ryan

Article Structure

Abstract

We create an open multilingual wordnet with large wordnets for over 26 languages and smaller ones for 57 languages.

Introduction

We wish to create a lexicon covering as many languages as possible, with as much useful information as possible.

Linking Multiple Wordnets

In order to make the data from existing wordnet projects more accessible, we have built a simple database with information from those wordnets with licenses that allow redistribution of the data.

Extending with non-wordnet data

We looked at two sources for automatically adding new entries.

Results and Evaluation

We give the data for the 26 wordnets with more than 10,000 synsets in Table 2.

Discussion and Future Work

We have created a large open wordnet of high quality (85% to 99% measured on senses).

Conclusions

We have created an open multilingual wordnet with over 26 languages.

Topics

wordnets

Appears in 87 sentences as: (16) WordNet (18) Wordnet (7) wordnet (38) Wordnets (2) wordnets (41)
In Linking and Extending an Open Multilingual Wordnet
  1. We create an open multilingual wordnet with large wordnets for over 26 languages and smaller ones for 57 languages.
    Page 1, “Abstract”
  2. It is made by combining wordnets with open licences, data from Wiktionary and the Unicode Common Locale Data Repository.
    Page 1, “Abstract”
  3. One of the many attractions of the semantic network WordNet (Fellbaum, 1998) is that there are numerous wordnets being built for different languages.
    Page 1, “Introduction”
  4. There are, in addition, many projects for groups of languages: EuroWordNet (Vossen, 1998), BalkaNet (Tufis et al., 2004), Asian Wordnet (Charoenporn et al., 2008) and more.
    Page 1, “Introduction”
  5. Although there are over 60 languages for which wordnets exist in some state of development (Fellbaum and Vossen, 2012, 316), fewer than half of these have released any data, and for those that have, the data is often not freely accessible (Bond and Paik, 2012).
    Page 1, “Introduction”
  6. Those wordnets that are available vary widely in size and quality, both in terms of accuracy and richness.
    Page 1, “Introduction”
  7. Open Multilingual Wordnet
    Page 1, “Introduction”
  8. languages to be able to access wordnets for those languages with a minimum of legal and technical barriers.
    Page 1, “Introduction”
  9. In practice this means making it possible to access multiple wordnets with a common interface.
    Page 1, “Introduction”
  10. We also use sources of semistructured data that have minimal legal restrictions to automatically extend existing freely available wordnets and to create additional wordnets which can be added to our open wordnet grid.
    Page 1, “Introduction”
  11. Previous studies have leveraged multiple wordnets and Wiktionary (Wikimedia, 2013) to extend existing wordnets or create new ones (de Melo and Weikum, 2009; Hanoka and Sagot, 2012).
    Page 1, “Introduction”

See all papers in Proc. ACL 2013 that mention wordnets.


synsets

Appears in 22 sentences as: synset (6) Synsets (1) synsets (17)
In Linking and Extending an Open Multilingual Wordnet
  1. Open class words (nouns, verbs, adjectives and adverbs) are grouped into concepts represented by sets of synonyms (synsets).
    Page 2, “Linking Multiple Wordnets”
  2. Synsets are linked by semantic relations such as hyponymy and meronymy.
    Page 2, “Linking Multiple Wordnets”
  3. The majority of freely available wordnets take the basic structure of the PWN and add new lemmas (words) to the existing synsets: the extend model (Vossen, 2005).
    Page 2, “Linking Multiple Wordnets”
  4. In theory, such wordnets can easily be combined into a single resource by using the PWN synsets as pivots.
    Page 2, “Linking Multiple Wordnets”
  5. Because they are linked at the synset level, the problem of ambiguity one gets when linking bilingual dictionaries through a common language is resolved: we are linking senses to senses.
    Page 2, “Linking Multiple Wordnets”
  6. Mapping introduces some distortions. In particular, when a synset is split, we chose to map the translations only to the most probable mapping, so some new synsets will have no translations.
    Page 2, “Linking Multiple Wordnets”
  7. The Scandinavian and Polish wordnets are based on the merge approach, where independent language specific structures are built and then some synsets linked to PWN.
    Page 3, “Linking Multiple Wordnets”
  8. As a very rough measure of useful coverage, we report the percentage of synsets covered from this core list.
    Page 3, “Linking Multiple Wordnets”
  9. edu/standoff-files/core-wordnet.txt; we converted it to wn30 synsets.
    Page 3, “Linking Multiple Wordnets”
  10. Most had around 550 senses (synsets and their lemmas): for example, for Portuguese: English n:1 inglês.
    Page 4, “Extending with non-wordnet data”
  11. de Melo and Weikum (2009) also use this data (and data from a variety of other sources) to build an enhanced wordnet, in addition adding new synsets for concepts that are not in wordnet.
    Page 4, “Extending with non-wordnet data”
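
The pivot idea in the excerpts above (wordnets built on the extend model share PWN synsets, so they can be joined sense to sense) and the core-list coverage measure can be sketched roughly as follows. This is a minimal illustration, not the authors' database: the toy data, the synset IDs, and the function names link_by_synset, translations and core_coverage are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical toy data: each wordnet maps a PWN-style synset ID to its lemmas.
# IDs and lemmas are illustrative only, not taken from the released wordnets.
WORDNETS = {
    "eng": {"02084071-n": ["dog", "domestic dog"], "02121620-n": ["cat"]},
    "jpn": {"02084071-n": ["犬"], "02121620-n": ["猫"]},
    "por": {"02084071-n": ["cão", "cachorro"]},
}


def link_by_synset(wordnets):
    """Merge per-language wordnets into {synset_id: {lang: [lemmas]}}."""
    merged = defaultdict(dict)
    for lang, synsets in wordnets.items():
        for synset_id, lemmas in synsets.items():
            merged[synset_id][lang] = list(lemmas)
    return dict(merged)


def translations(merged, synset_id, source, target):
    """Target-language lemmas for one sense that also exists in the source.

    The join key is the synset, not a pivot word, so senses are linked
    to senses and word-level ambiguity does not leak through.
    """
    entry = merged.get(synset_id, {})
    if source not in entry:
        return []
    return entry.get(target, [])


def core_coverage(synsets, core):
    """Percentage of a core synset list covered by one wordnet."""
    core = set(core)
    if not core:
        return 0.0
    return 100.0 * len(core & set(synsets)) / len(core)


merged = link_by_synset(WORDNETS)
CORE = ["02084071-n", "02121620-n"]  # stand-in for the real core list

print(translations(merged, "02084071-n", "eng", "por"))
print(core_coverage(WORDNETS["por"], CORE))
```

Keying everything on the synset ID is what lets a wordnet with only one of the two senses (the toy Portuguese data here) still be merged cleanly: the missing sense simply has no entry for that language.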


similarity score

Appears in 6 sentences as: similarity score (3) similarity scores (3)
In Linking and Extending an Open Multilingual Wordnet
  1. Meyer and Gurevych (2011) showed that automatic alignments between Wiktionary senses and PWN can be established with reasonable accuracy and recall by combining multiple text similarity scores to compare a bag of words based on several pieces of information linked to a WordNet sense with another bag of words obtained from a Wiktionary entry.
    Page 5, “Extending with non-wordnet data”
  2. We calculated a number of similarity scores, the first two based on similarity in the number of lemmas, calculated using the Jaccard index:
    Page 5, “Extending with non-wordnet data”
  3. This development dataset was used to tune refined similarity scores.
    Page 6, “Extending with non-wordnet data”
  4. simd measures the similarity of the definitions in the two resources, using a cosine similarity score.
    Page 6, “Extending with non-wordnet data”
  5. If that failed to produce a single possible alignment, we aligned the short gloss with the full definition that produced the greatest cosine similarity score.
    Page 6, “Extending with non-wordnet data”
  6. Our combined ranking score (simc), based on both overlapping translations and a monolingual lexical similarity score, was able to outperform ranking based on either component in isolation.
    Page 6, “Extending with non-wordnet data”
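
The two building blocks named in the excerpts above, the Jaccard index over sets of lemmas and a cosine similarity over definitions, can be sketched as below. This is a generic illustration under stated assumptions: whitespace tokenization, raw term counts, and the function names jaccard and cosine are choices made for the example, and the excerpts do not say how the paper preprocesses text or how the component scores are combined into simc, so that step is not shown.

```python
import math
from collections import Counter


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def cosine(text_a, text_b):
    """Cosine similarity of two texts as bag-of-words count vectors.

    Assumption: lowercased whitespace tokens with raw counts; a real
    system would likely normalize and weight terms differently.
    """
    va = Counter(text_a.lower().split())
    vb = Counter(text_b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Overlap between two hypothetical sets of translation lemmas.
print(jaccard(["dog", "hound"], ["dog", "puppy"]))

# Similarity between two hypothetical definitions.
print(cosine("a domesticated canine animal", "a domesticated animal kept as a pet"))
```

Both functions return 0.0 for empty input rather than raising, which keeps a ranking loop over candidate alignments simple when an entry has no lemmas or no definition.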
