FrameNet on the Way to Babel: Creating a Bilingual FrameNet Using Wiktionary as Interlingual Connection
Hartmann, Silvana and Gurevych, Iryna

Article Structure

Abstract

We present a new bilingual FrameNet lexicon for English and German.

Introduction

FrameNet is a valuable resource for natural language processing (NLP): semantic role labeling (SRL) systems based on FrameNet provide semantic analysis for NLP applications, such as question answering (Narayanan and Harabagiu, 2004; Shi and Mihalcea, 2005) and information extraction (Mohit and Narayanan, 2003).

Resource Overview

FrameNet (Baker et al., 1998) is an expert-built lexical-semantic resource incorporating the theory of frame semantics (Fillmore, 1976).

Method Overview

Our method consists of two steps visualized in Fig.

Related Work

4.1 Creating FrameNets in New Languages

FrameNet — Wiktionary Alignment

5.1 Alignment Technique

Intermediate Resource FNWKxx

6.1 Statistics

Translation Disambiguation

7.1 Disambiguation Method

Resource FNWKde

8.1 Statistics

Discussion: a Multilingual FrameNet

FNWKxx provides an excellent starting point for creating FrameNet lexicons in various languages: the translation counts, for instance 6,871 for German, compare favorably to FrameNet 1.5, which contains 9,700 English lemma-POS.

Conclusion

The resource-coverage bottleneck for frame-semantic resources is particularly severe for less well-resourced languages.

Topics

gold standard

Appears in 20 sentences as: Gold Standard (2) gold standard (17) gold standards (1)
  1. In Tonelli and Pighin (2009), they use these features to train an SVM classifier to identify valid alignments and report an F1-score of 0.66 on a manually annotated gold standard.
    Page 3, “Related Work”
  2. They rely on a knowledge-based word sense disambiguation algorithm to establish the alignment and report F1=0.75 on a gold standard based on Tonelli and Pighin (2009).
    Page 3, “Related Work”
  3. t_cos) independently on a manually annotated gold standard.
    Page 4, “FrameNet — Wiktionary Alignment”
  4. 5.3 Gold Standard Creation
    Page 4, “FrameNet — Wiktionary Alignment”
  5. For the gold standard, we sampled 2,900 candidate pairs from C_all.
    Page 4, “FrameNet — Wiktionary Alignment”
  6. The properties of the gold standard mirror the properties of C_all: the sampling preserved the distribution of POS in C_all (around 40% verbs and nouns, and 12% adjectives) and the average numbers of candidates per FrameNet sense.
    Page 4, “FrameNet — Wiktionary Alignment”
  7. For comparison: Meyer and Gurevych (2011) report κ=0.74 for their WordNet — Wiktionary gold standard, and Niemann and Gurevych (2011)
    Page 4, “FrameNet — Wiktionary Alignment”
  8. κ=0.87 for their WordNet — Wikipedia gold standard.
    Page 5, “FrameNet — Wiktionary Alignment”
  9. These gold standards only consist of nouns, which appear to be an easier annotation task than verb senses.
    Page 5, “FrameNet — Wiktionary Alignment”
  10. Therefore, we had an expert annotator correct the verbal part of the gold standard set.
    Page 5, “FrameNet — Wiktionary Alignment”
  11. After removing the training set for the raters, the final gold standard contains 2,789 sense pairs.
    Page 5, “FrameNet — Wiktionary Alignment”

WordNet

Appears in 14 sentences as: WordNet (15) wordnet (1) wordnets (1)
  1. (2008) map FrameNet frames to WordNet synsets based on the embedding of FrameNet lemmas in WordNet.
    Page 3, “Related Work”
  2. They use Multi-WordNet, an English-Italian wordnet, to induce an Italian FrameNet lexicon with 15,000 entries.
    Page 3, “Related Work”
  3. To create MapNet, Tonelli and Pianta (2009) align FrameNet senses with WordNet synsets by exploiting the textual similarity of their glosses.
    Page 3, “Related Work”
  4. The similarity measure is based on stem overlap of the candidates’ glosses expanded by WordNet domains, the WordNet synset, and the set of senses for a FrameNet frame.
    Page 3, “Related Work”
  5. The goal is the multilingual coverage extension of FrameNet, which is achieved by linking WordNet to wordnets in other languages (Spanish, Italian, Basque, and Catalan) in the Multilingual Central Repository.
    Page 3, “Related Work”
  6. Collaboratively created resources have become popular for sense alignments for NLP, starting with the alignment between WordNet and Wikipedia (Ruiz-Casado et al., 2005; Ponzetto and Navigli, 2009).
    Page 3, “Related Work”
  7. Wiktionary has been subject to few alignment efforts: de Melo and Weikum (2009) integrate information from Wiktionary into Universal WordNet.
    Page 3, “Related Work”
  8. Meyer and Gurevych (2011) map WordNet synsets to Wiktionary senses and show their complementary domain coverage.
    Page 3, “Related Work”
  9. They align senses in WordNet to Wikipedia entries in a supervised setting using semantic similarity measures.
    Page 4, “FrameNet — Wiktionary Alignment”
  10. The PPR measure (Agirre and Soroa, 2009) maps the glosses of the two senses to a semantic vector space spanned by WordNet synsets and then compares them using the chi-square measure.
    Page 4, “FrameNet — Wiktionary Alignment”
  11. The semantic vectors ppr are computed using the personalized PageRank algorithm on the WordNet graph.
    Page 4, “FrameNet — Wiktionary Alignment”
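Excerpts 10 and 11 describe the PPR similarity only at a high level. As a minimal illustration (not the authors' implementation: the use of networkx, the graph construction, and the exact chi-square variant are assumptions), the idea of representing each sense gloss as a personalized-PageRank vector over WordNet synsets and comparing two such vectors can be sketched in Python:

    import numpy as np
    import networkx as nx

    def ppr_vector(wordnet_graph, gloss_synsets, damping=0.85):
        # Personalized PageRank over the WordNet graph: the restart mass is
        # spread equally over the synsets occurring in the sense gloss
        # (cf. Agirre and Soroa, 2009); gloss_synsets must be graph nodes.
        personalization = {s: 1.0 / len(gloss_synsets) for s in gloss_synsets}
        scores = nx.pagerank(wordnet_graph, alpha=damping,
                             personalization=personalization)
        vec = np.array([scores[n] for n in sorted(wordnet_graph.nodes)])
        return vec / vec.sum()

    def chi_square_similarity(v1, v2, eps=1e-12):
        # Chi-square distance between the two normalized vectors, negated so
        # that larger values mean more similar glosses; the exact variant used
        # in the paper is not shown in these excerpts.
        return -0.5 * np.sum((v1 - v2) ** 2 / (v1 + v2 + eps))

A FrameNet sense gloss and a Wiktionary sense gloss would each be mapped to such a vector, and the resulting score then feeds into the threshold-based alignment decision summarized under "similarity measure" below.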

word senses

Appears in 10 sentences as: word sense (4) word senses (6)
  1. It groups word senses in frames that represent particular situations.
    Page 2, “Resource Overview”
  2. FrameNet release 1.5 contains 1,015 frames and 11,942 word senses.
    Page 2, “Resource Overview”
  3. Wiktionary is organized like a traditional dictionary in lexical entries and word senses.
    Page 2, “Resource Overview”
  4. For the word senses, definitions and example sentences may be available, as well as other lexical information such as register (e.g., colloquial), phonetic transcription, and inflection, including language-specific types of information.
    Page 2, “Resource Overview”
  5. The second step is the disambiguation of the translated lemmas with respect to the target language Wiktionary in order to retrieve the linguistic information of the corresponding word sense in the target language Wiktionary (Meyer and Gurevych, 2012a).
    Page 2, “Method Overview”
  6. For the example sense of complete, we extract lexical information for the word sense of its German translation fertigmachen, for instance a German gloss, an example sentence, register information (colloquial), and synonyms, e.g., beenden.
    Page 2, “Method Overview”
  7. The first, corpus-based approach is to automatically extract word senses in the target language based on parallel corpora and frame annotations in the source language.
    Page 3, “Related Work”
  8. They rely on a knowledge-based word sense disambiguation algorithm to establish the alignment and report F1=0.75 on a gold standard based on Tonelli and Pighin (2009).
    Page 3, “Related Work”
  9. Tonelli and Giuliano (2009) align FrameNet senses to Wikipedia entries with the goal to extract word senses and example sentences in Italian.
    Page 3, “Related Work”
  10. This is in line with the reported higher complexity of lexical resources with respect to verbs and greater difficulty in alignments and word sense disambiguation (Laparra and Rigau, 2010).
    Page 6, “FrameNet — Wiktionary Alignment”

fine-grained

Appears in 8 sentences as: fine-grained (8)
  1. The verb senses are very fine-grained and thus present a difficult alignment task.
    Page 5, “FrameNet — Wiktionary Alignment”
  2. A number of false positives occur because the gold standard was developed in a very fine-grained manner: distinctions such as causative vs. inchoative (enlarge: become large vs. enlarge: make large) were explicitly stressed in the definitions and thus annotated as different senses by the annotators.
    Page 6, “FrameNet — Wiktionary Alignment”
  3. Because sense granularity was an issue in the error analysis, we considered two alignment decisions: (a) fine-grained alignment: the two glosses describe the same sense; (b) coarse-grained alignment.
    Page 6, “Intermediate Resource FNWKxx”
  4. The precision for the fine-grained (a) is lower than the overall precision on the gold standard.
    Page 6, “Intermediate Resource FNWKxx”
  5. Also, fine-grained sense and frame distinctions may be more relevant in one language than in another language.
    Page 9, “Discussion: a Multilingual FrameNet”
  6. We however find lower performance for verbs in a fine-grained setting.
    Page 9, “Discussion: a Multilingual FrameNet”
  7. We argue that an improved alignment algorithm, for instance taking subcategorization information into account, can identify the fine-grained distinctions.
    Page 9, “Discussion: a Multilingual FrameNet”
  8. For the coarse-grained frames, fine-grained decisions can be merged in a second classification step.
    Page 9, “Discussion: a Multilingual FrameNet”

synsets

Appears in 8 sentences as: synset (1) synsets (7)
  1. (2008) map FrameNet frames to WordNet synsets based on the embedding of FrameNet lemmas in WordNet.
    Page 3, “Related Work”
  2. To create MapNet, Tonelli and Pianta (2009) align FrameNet senses with WordNet synsets by exploiting the textual similarity of their glosses.
    Page 3, “Related Work”
  3. The similarity measure is based on stem overlap of the candidates’ glosses expanded by WordNet domains, the WordNet synset, and the set of senses for a FrameNet frame.
    Page 3, “Related Work”
  4. Net synsets.
    Page 3, “Related Work”
  5. Meyer and Gurevych (2011) map WordNet synsets to Wiktionary senses and show their complementary domain coverage.
    Page 3, “Related Work”
  6. The PPR measure (Agirre and Soroa, 2009) maps the glosses of the two senses to a semantic vector space spanned by WordNet synsets and then compares them using the chi-square measure.
    Page 4, “FrameNet — Wiktionary Alignment”
  7. where M is a transition probability matrix between the n WordNet synsets, c is a damping factor, and v_ppr is a vector of size n representing the probability of jumping to the node i associated with each component v_i.
    Page 4, “FrameNet — Wiktionary Alignment”
  8. For personalized PageRank, v_ppr is initialized in a particular way: the initial weight is distributed equally over the m vector components (i.e., synsets) associated with a word in the sense gloss, other components receive a 0 value.
    Page 4, “FrameNet — Wiktionary Alignment”
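The equation these excerpts refer to is not reproduced on this page. The wording matches the standard personalized PageRank formulation of Agirre and Soroa (2009); as a reconstruction, not a quotation from the paper, it reads in LaTeX:

    \mathbf{Pr} = c \, M \, \mathbf{Pr} + (1 - c) \, \mathbf{v}_{ppr}

where Pr is the rank vector over the n WordNet synsets and v_ppr is initialized as described in excerpt 8 above.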

cross-lingual

Appears in 6 sentences as: cross-lingual (6)
  1. Previous cross-lingual transfer of FrameNet used corpus-based approaches, or resource alignment with multilingual expert-built resources, such as EuroWordNet.
    Page 1, “Introduction”
  2. To our knowledge, Wiktionary has not been evaluated as an interlingual index for the cross-lingual extension of lexical-semantic resources.
    Page 1, “Introduction”
  3. In this vein, Pado and Lapata (2005) propose a cross-lingual FrameNet extension to German and French; Johansson and Nugues (2005) and Johansson and Nugues (2006) do this for Spanish and Swedish, and Basili et al.
    Page 3, “Related Work”
  4. Another issue that applies to all automatic (and also manual) approaches of cross-lingual FrameNet extension is the restricted cross-language applicability of frames.
    Page 8, “Discussion: a Multilingual FrameNet”
  5. Unlike corpus-based approaches for cross-lingual FrameNet extension, our approach does not provide frame-semantic annotations for the example
    Page 9, “Discussion: a Multilingual FrameNet”
  6. Example annotations can be additionally obtained via cross-lingual annotation projection (Pado and Lapata, 2009), and the lexical information in FNWKde can be used to guide this process.
    Page 9, “Discussion: a Multilingual FrameNet”

similarity measure

Appears in 4 sentences as: similarity measure (3) similarity measures (1)
  1. The similarity measure is based on stem overlap of the candidates’ glosses expanded by WordNet domains, the WordNet synset, and the set of senses for a FrameNet frame.
    Page 3, “Related Work”
  2. They align senses in WordNet to Wikipedia entries in a supervised setting using semantic similarity measures.
    Page 4, “FrameNet — Wiktionary Alignment”
  3. Niemann and Gurevych (2011) combine two different types of similarity: (i) cosine similarity on bag-of-words vectors (COS) and (ii) a personalized PageRank-based similarity measure (PPR).
    Page 4, “FrameNet — Wiktionary Alignment”
  4. For each similarity measure, Niemann and Gurevych (2011) determine a threshold (t_ppr and t_cos) independently on a manually annotated gold standard.
    Page 4, “FrameNet — Wiktionary Alignment”
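Excerpts 3 and 4 outline the two similarity scores and their thresholds; the thresholding step can be made concrete with a short sketch. It is my own illustration, not the paper's code: the F1-maximizing tuning loop and the assumption that the joint model accepts a candidate pair only when both similarities reach their thresholds are not taken from these excerpts.

    def tune_threshold(scores, labels):
        # Choose the threshold that maximizes F1 on labeled candidate pairs,
        # one simple way to set t_cos and t_ppr on a gold standard.
        best_t, best_f1 = 0.0, -1.0
        for t in sorted(set(scores)):
            pred = [s >= t for s in scores]
            tp = sum(p and g for p, g in zip(pred, labels))
            fp = sum(p and not g for p, g in zip(pred, labels))
            fn = sum(not p and g for p, g in zip(pred, labels))
            prec = tp / (tp + fp) if tp + fp else 0.0
            rec = tp / (tp + fn) if tp + fn else 0.0
            f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
            if f1 > best_f1:
                best_t, best_f1 = t, f1
        return best_t

    def joint_alignment_decision(cos_sim, ppr_sim, t_cos, t_ppr):
        # Accept the candidate sense pair only if both similarity scores
        # reach their tuned thresholds (assumed conjunction of the criteria).
        return cos_sim >= t_cos and ppr_sim >= t_ppr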

hypernyms

Appears in 3 sentences as: HYPERNYM (1) hypernyms (2)
  1. For the joint model, we employed the best single PPR configuration, and a COS configuration that uses sense gloss extended by Wiktionary hypernyms, synonyms and FrameNet frame name and frame definition, to achieve the highest score, an F1-score of 0.739.
    Page 5, “FrameNet — Wiktionary Alignment”
  2. We also extract other related lemma-POS, for instance 487 antonyms, 126 hyponyms, and 19 hypernyms.
    Page 7, “Intermediate Resource FNWKxx”
  3. Relation   per FrameNet sense   per frame
     SYNONYM        17,713             13,288
     HYPONYM         4,818              3,347
     HYPERNYM        6,369              3,961
     ANTONYM         9,626              6,737
    Page 8, “Resource FNWKde”

joint model

Appears in 3 sentences as: JOINT model (1) joint model (2)
  1. In Table 2, we report on the results of the best single models and the best joint model.
    Page 5, “FrameNet — Wiktionary Alignment”
  2. For the joint model, we employed the best single PPR configuration, and a COS configuration that uses sense gloss extended by Wiktionary hypernyms, synonyms and FrameNet frame name and frame definition, to achieve the highest score, an F1-score of 0.739.
    Page 5, “FrameNet — Wiktionary Alignment”
  3. The BEST JOINT model performs well on nouns, slightly better on adjectives, and worse on verbs, see Table 2.
    Page 5, “FrameNet — Wiktionary Alignment”

manually annotated

Appears in 3 sentences as: manually annotated (3)
  1. In Tonelli and Pighin (2009), they use these features to train an SVM classifier to identify valid alignments and report an F1-score of 0.66 on a manually annotated gold standard.
    Page 3, “Related Work”
  2. t_cos) independently on a manually annotated gold standard.
    Page 4, “FrameNet — Wiktionary Alignment”
  3. We compare FNWKde to two German frame-semantic resources, the manually annotated SALSA corpus (Burchardt et al., 2006) and a resource from Pado and Lapata (2005), henceforth P&L05.
    Page 8, “Resource FNWKde”
