Automatic Detection of Multilingual Dictionaries on the Web
Grigonyte, Gintare and Baldwin, Timothy

Article Structure

Abstract

This paper presents a query construction approach for detecting multilingual dictionaries on the web for predetermined language combinations. The approach is based on identifying terms that are likely to occur in bilingual dictionaries but not in general web documents.
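The abstract describes selecting terms that are likely to occur in bilingual dictionaries but not in general web documents. Below is a minimal Python sketch of that idea. It rests on several assumptions. A term's score is taken to be a smoothed log ratio of its document frequency in a sample of dictionary pages versus a sample of general pages (e.g. Wikipedia articles). Queries are formed by pairing top-scoring terms. The functions `doc_freq`, `score_terms` and `build_queries`, the smoothing parameter `alpha`, and the toy documents are all hypothetical and are not the authors' implementation.

```python
import math
from collections import Counter
from itertools import combinations


def doc_freq(docs):
    """Count how many documents each (lowercased) term appears in."""
    df = Counter()
    for doc in docs:
        df.update(set(doc.lower().split()))
    return df


def score_terms(dict_docs, general_docs, alpha=1.0):
    """Hypothetical scoring: smoothed log ratio of a term's relative document
    frequency in dictionary pages vs. general pages. High scores mark terms
    typical of dictionaries but rare elsewhere."""
    df_dict = doc_freq(dict_docs)
    df_gen = doc_freq(general_docs)
    n_dict, n_gen = len(dict_docs), len(general_docs)
    scores = {}
    for term in df_dict:
        p_dict = (df_dict[term] + alpha) / (n_dict + 2 * alpha)
        p_gen = (df_gen[term] + alpha) / (n_gen + 2 * alpha)
        scores[term] = math.log(p_dict / p_gen)
    return scores


def build_queries(scores, k=4, terms_per_query=2, max_queries=5):
    """Hypothetical scheme: pair up the top-k terms (ties broken
    alphabetically) into conjunctive query strings."""
    top = sorted(scores, key=lambda t: (-scores[t], t))[:k]
    queries = [" ".join(combo) for combo in combinations(top, terms_per_query)]
    return queries[:max_queries]


# Toy data (invented for illustration): English-Spanish dictionary snippets
# versus ordinary English sentences.
dict_docs = [
    "english spanish dictionary house casa",
    "english spanish translation dog perro",
    "dictionary translation cat gato",
]
general_docs = ["the house is big", "my dog likes the park", "the cat sleeps"]

scores = score_terms(dict_docs, general_docs)
queries = build_queries(scores)
print(queries)
```

In this toy run, "house", "dog" and "cat" occur in both samples and score 0. Dictionary-only vocabulary such as "dictionary" and "translation" ranks highest. Each query string would then be submitted to a search engine or a document index. Pairing terms is only one possible design. It is used here because a conjunction of dictionary-typical terms is less likely to match ordinary prose than a single term.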

Motivation

Translation dictionaries and other multilingual lexical resources are valuable in a myriad of contexts, from language preservation (Thieberger and Berez, 2012) to language learning (Laufer and Hadar, 1997), cross-language information retrieval (Nie, 2010) and machine translation (Munteanu and Marcu, 2005; Soderland et al., 2009).

Related Work

This research seeks to identify documents of a particular type on the web, namely multilingual dictionaries.

Methodology

Our method is based on query formulation, with the queries run against a preexisting index of a document collection (e.g.

Experimental methodology

We evaluate our proposed methodology in two ways:

Results

First, we present results over the synthetic dataset in Table 3.

Conclusions

We have described initial results for a method designed to automatically detect multilingual dictionaries on the web, and attained highly credible results both over a synthetic dataset and in an experiment on the open web using a web search engine.

Topics

language pairs

Appears in 6 sentences as: language pair (2), language pairs (4).
In Automatic Detection of Multilingual Dictionaries on the Web
  1. org aspire to aggregate these dictionaries into a single lexical database, but are hampered by the need to identify individual multilingual dictionaries, especially for language pairs where there is a sparsity of data from existing dictionaries (Baldwin et al., 2010; Kamholz and Pool, to appear).
    Page 1, “Motivation”
  2. This paper is an attempt to automate the detection of multilingual dictionaries on the web, through query construction for an arbitrary language pair.
    Page 1, “Motivation”
  3. Most queries returned no results; indeed, for the en-ar language pair, only 49/1000 queries returned documents.
    Page 5, “Results”
  4. Among the 7 language pairs, en-es, en-de, en-fr and en-it achieved the highest MAP scores.
    Page 5, “Results”
  5. In terms of unique lexical resources found with 50 queries, the most successful language pairs were en-fr, en-de and en-it.
    Page 5, “Results”
  6. training over domain-specific dictionaries from other language pairs), and low-density languages where there are few dictionaries and Wikipedia articles to train the method on.
    Page 5, “Conclusions”
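Several of the sentences above report MAP (mean average precision) scores. As a reference for the standard metric, the sketch below computes average precision (AP) for one ranked result list and then averages it over queries. It assumes AP is normalised by the total number of relevant documents, and that a query returning no documents scores 0. How the paper treats empty result lists is not stated here, so both choices are assumptions. `average_precision` and `mean_average_precision` are hypothetical helper names.

```python
def average_precision(ranked, relevant):
    """AP for one query: mean of precision@i over the ranks i at which a
    relevant document appears, normalised by the number of relevant docs
    (assumed variant; unretrieved relevant docs contribute 0)."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / i
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """MAP over queries; `runs` is a list of (ranked_list, relevant_set) pairs."""
    if not runs:
        return 0.0
    return sum(average_precision(ranked, rel) for ranked, rel in runs) / len(runs)


# Invented example: three queries, the last returning no documents.
runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),  # AP = (1/1 + 2/3) / 2 = 5/6
    (["d5", "d6"], {"d6"}),                    # AP = (1/2) / 1 = 1/2
    ([], {"d7"}),                              # no results: AP = 0
]
print(mean_average_precision(runs))
```

Here the MAP is (5/6 + 1/2 + 0) / 3 = 4/9. Under this convention, a language pair for which most queries return nothing will have a low MAP even if the documents it does retrieve are relevant.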

See all papers in Proc. ACL 2014 that mention language pairs.