Bootstrapping Entity Translation on Weakly Comparable Corpora
Lee, Taesung and Hwang, Seung-won

Article Structure

Abstract

This paper studies the problem of mining named entity translations from comparable corpora with some “asymmetry”.

Introduction

Identifying and understanding entities is a crucial step in understanding text.

Related Work

This work is related to two research streams: NE translation and semantically equivalent relation mining.

Framework Overview

In this section, we provide an overview of our framework for translating NEs, using news corpora in English and Chinese as a running example.

Methods

In this section, we describe our method in detail.

Experiments

In this section, we present experimental settings and results of translating entity names using our methods compared with several baselines.

Conclusions

This paper proposed a bootstrapping approach for entity translation using multilingual relational clustering.

Topics

semantically similar

Appears in 9 sentences as: semantic similarity (2) Semantically similar (1) semantically similar (6)
In Bootstrapping Entity Translation on Weakly Comparable Corpora
  1. In particular, our approach leverages semantically similar document pairs to exclude incomparable parts that appear in one language only.
    Page 2, “Introduction”
  2. Semantically similar relation mining
    Page 3, “Related Work”
  3. In automatically constructed knowledge bases, finding semantically similar relations can improve understanding of the Web describing content with many different expressions.
    Page 3, “Related Work”
  4. NELL (Mohamed et al., 2011) finds related relations using seed pairs of one given relation; then, using K-means clustering, it finds relations that are semantically similar to the given relation.
    Page 3, “Related Work”
  5. 1) Using $T_N^t$, we obtain a set of relation translations with a semantic similarity score, $T_R^t(r_E, r_C)$, for an English relation $r_E$ and a Chinese relation $r_C$ (Figure 2 (b), Section 4.3) (e.g., $r_E$ = visit and $r_C$ = …).
    Page 4, “Framework Overview”
  6. 2) Using $T_R^t$ and $T_N^t$, we identify a set of semantically similar document pairs that describe the same event with a similarity score $T_D^t(d_E, d_C)$, where $d_E$ is an English document and $d_C$ is a Chinese document (Figure 2 (c), Section 4.4).
    Page 4, “Framework Overview”
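  7. $H(r_i, r_j) = \begin{cases} H_b(r_i, r_j) & \text{if } r_i \in R_E \text{ and } r_j \in R_C \\ H_b(r_j, r_i) & \text{if } r_j \in R_E \text{ and } r_i \in R_C \\ H_m(r_i, r_j) & \text{otherwise} \end{cases}$ Intuitively, $|H(r_i, r_j)|$ indicates the strength of the semantic similarity of two relations $r_i$ and $r_j$ of any languages.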
    Page 5, “Methods”
  8. However, as shown in Table 2, we cannot use this value directly to measure the similarity because the support intersection of semantically similar bilingual relations (e.g., |H (head to, i)?
    Page 5, “Methods”
  9. We consider that a pair of an English entity $e_E$ and a Chinese entity $e_C$ are likely to indicate the same real-world entity if they have 1) semantically similar relations to the same entity 2) under the same context.
    Page 7, “Methods”
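
The piecewise definition of H quoted in item 7 dispatches on the languages of its two arguments. The following is a minimal sketch of that dispatch only, assuming relations are plain strings and that the bilingual and monolingual support functions (H_b and H_m in the excerpt) are supplied by the caller. The helper names, the toy stubs, and the placeholder relation name `zh_visit` are hypothetical and are not taken from the paper.

```python
from typing import Callable, Set, Tuple

Support = Set[Tuple[str, ...]]
SupportFn = Callable[[str, str], Support]


def make_support_intersection(
    english_relations: Set[str],
    chinese_relations: Set[str],
    h_bilingual: SupportFn,   # stands in for H_b (assumed to take English first)
    h_monolingual: SupportFn, # stands in for H_m
) -> SupportFn:
    """Build H(r_i, r_j) as the three-case dispatch quoted in the excerpt."""

    def h(r_i: str, r_j: str) -> Support:
        if r_i in english_relations and r_j in chinese_relations:
            return h_bilingual(r_i, r_j)
        if r_j in english_relations and r_i in chinese_relations:
            # Swap the arguments so the English relation always comes first.
            return h_bilingual(r_j, r_i)
        return h_monolingual(r_i, r_j)

    return h


# Toy stubs that only record which branch was taken (illustration only).
def toy_bilingual(r_e: str, r_c: str) -> Support:
    return {("bilingual", r_e, r_c)}


def toy_monolingual(r_i: str, r_j: str) -> Support:
    return {("monolingual", r_i, r_j)}


h = make_support_intersection(
    {"visit", "head to"}, {"zh_visit"}, toy_bilingual, toy_monolingual
)
print(h("zh_visit", "visit"))   # bilingual branch, arguments swapped
print(h("visit", "head to"))    # monolingual branch
```

The point of the swap in the second branch is that H is symmetric in its arguments even though the bilingual function is assumed to expect the English relation first.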

See all papers in Proc. ACL 2013 that mention semantically similar.

phrase pairs

Appears in 4 sentences as: phrase pair (1) phrase pairs (4)
In Bootstrapping Entity Translation on Weakly Comparable Corpora
  1. $f_3(r, c) = |H_{within}(r, c)| / |H_{within}(c)|$: This is a variation of $f_2$ that considers only noun phrase pairs shared at least once by relations in $c$.
    Page 6, “Methods”
  2. $f_4(r, c) = |H_{within}(r, c)| / |H_{shared}(c)|$: This is a variation of $f_2$ that considers only noun phrase pairs shared at least once by any pair of relations.
    Page 6, “Methods”
  3. where $H_{within}(r, c) = \bigcup_{r^* \in c} H(r, c) \cap H(r, r^*)$, the intersection, considering translation, of $H(r)$ and noun phrase pairs shared at once by relations in $c$, $H_{within}(c) = \bigcup_{r^* \in c} H(r^*, c - \{r^*\})$, and $H_{shared}(c) = \bigcup_{r^* \in R_E \cup R_C} H(r^*, c)$, the noun phrase pairs shared at once by any relations.
    Page 6, “Methods”
  4. The use of $H_{within}$ and $H_{shared}$ is based on the observation that a noun phrase pair that appears in only one relation tends to be an incorrectly chunked entity such as ‘World Trade’ from the ‘World Trade Organization’.
    Page 6, “Methods”
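
The ratios f3 and f4 quoted above can be illustrated with plain set operations. This is a rough sketch under simplifying assumptions that are not from the paper: every relation is in one language, so "considering translation" reduces to exact equality of noun phrase pairs; H(r, S) is read as the pairs of r that are also supported by at least one other relation in S; and the relation names and noun phrase pairs are made-up toy data.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]
SupportMap = Dict[str, Set[Pair]]


def shared_with(r: str, others: Set[str], support: SupportMap) -> Set[Pair]:
    """Assumed reading of H(r, S): pairs of r shared with some other relation in S."""
    result: Set[Pair] = set()
    for other in others:
        if other != r:
            result |= support[r] & support[other]
    return result


def h_within(c: Set[str], support: SupportMap) -> Set[Pair]:
    """Pairs shared at least once between relations inside cluster c."""
    result: Set[Pair] = set()
    for r_star in c:
        result |= shared_with(r_star, c - {r_star}, support)
    return result


def h_shared(c: Set[str], support: SupportMap) -> Set[Pair]:
    """Pairs shared at least once between any relation and a relation in c."""
    result: Set[Pair] = set()
    for r_star in support:
        result |= shared_with(r_star, c, support)
    return result


def f3(r: str, c: Set[str], support: SupportMap) -> float:
    denom = len(h_within(c, support))
    return len(shared_with(r, c, support)) / denom if denom else 0.0


def f4(r: str, c: Set[str], support: SupportMap) -> float:
    denom = len(h_shared(c, support))
    return len(shared_with(r, c, support)) / denom if denom else 0.0


# Toy data (hypothetical). ("World", "Trade") occurs in one relation only,
# mimicking an incorrectly chunked entity, so it never enters either ratio.
support: SupportMap = {
    "visit": {("Obama", "Beijing"), ("Hu", "Tokyo"), ("World", "Trade")},
    "head to": {("Obama", "Beijing"), ("Hu", "Tokyo"), ("Bush", "Paris")},
    "arrive in": {("Bush", "Paris"), ("Hu", "Tokyo"), ("Merkel", "Berlin")},
    "born in": {("Obama", "Honolulu"), ("Merkel", "Berlin")},
}
cluster = {"visit", "head to", "arrive in"}

print(f3("visit", cluster, support))
print(f4("visit", cluster, support))
```

In this toy setup the f4 denominator is larger than the f3 denominator because ("Merkel", "Berlin") is shared only with a relation outside the cluster.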

edge weights

Appears in 3 sentences as: edge weights (3)
In Bootstrapping Entity Translation on Weakly Comparable Corpora
  1. (2010; 2012) leverage two graphs of entities in each language that are generated from a pair of corpora, with edge weights quantified as the strength of the relatedness of entities.
    Page 3, “Related Work”
  2. where $B \subset E_E \times E_C$ is a greedy approximate solution of maximum bipartite matching (West, 1999) on a bipartite graph $G_B = (V_B = (E_E, E_C), E_B)$ with edge weights that are defined by $T_N^t$.
    Page 7, “Methods”
  3. that maximize the sum of the selected edge weights and that do not share a node as their anchor point.
    Page 7, “Methods”
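
One common way to obtain a greedy approximate solution of maximum weighted bipartite matching is to scan edges in descending weight order and keep an edge only if neither endpoint is already matched. The sketch below shows that generic procedure; it is not claimed to be the authors' exact implementation, and the entity names and weights are invented for illustration.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def greedy_bipartite_matching(weights: Dict[Edge, float]) -> List[Edge]:
    """Pick edges by descending weight, never reusing a node on either side."""
    used_left: Set[str] = set()
    used_right: Set[str] = set()
    matching: List[Edge] = []
    # Sort by weight (highest first); break ties by edge name for determinism.
    for (left, right), weight in sorted(
        weights.items(), key=lambda item: (-item[1], item[0])
    ):
        if weight <= 0:
            continue  # skip edges that carry no positive evidence
        if left in used_left or right in used_right:
            continue  # one endpoint is already matched
        matching.append((left, right))
        used_left.add(left)
        used_right.add(right)
    return matching


# Hypothetical edge weights between English and Chinese entity placeholders.
toy_weights = {
    ("WTO", "zh_1"): 0.9,
    ("WTO", "zh_2"): 0.8,
    ("UN", "zh_1"): 0.7,
    ("UN", "zh_2"): 0.2,
}
print(greedy_bipartite_matching(toy_weights))
```

On these toy weights the greedy result has total weight 0.9 + 0.2 = 1.1, while the optimal matching (WTO with zh_2, UN with zh_1) has 1.5, which shows why the solution is described as approximate.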

named entity

Appears in 3 sentences as: named entities (1) named entity (2)
In Bootstrapping Entity Translation on Weakly Comparable Corpora
  1. This paper studies the problem of mining named entity translations from comparable corpora with some “asymmetry”.
    Page 1, “Abstract”
  2. Our experimental results on English-Chinese corpora show that our selective propagation approach outperforms the previous approaches in named entity translation in terms of the mean reciprocal rank by up to 0.16 for organization names, and 0.14 in a low comparability case.
    Page 1, “Abstract”
  3. This task is more challenging in the presence of multilingual text, because translating named entities (NEs), such as persons, locations, or organizations, is a nontrivial task.
    Page 1, “Introduction”
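
The abstract reports results in terms of mean reciprocal rank (MRR). As a reminder of how that metric is usually computed, here is a small sketch: each source entity has a ranked list of candidate translations, the reciprocal rank is 1/rank of the correct translation (0 if it is absent), and MRR is the average over entities. The function name and the placeholder data are hypothetical and do not come from the paper's experiments.

```python
from typing import Dict, List


def mean_reciprocal_rank(
    ranked_candidates: Dict[str, List[str]], gold: Dict[str, str]
) -> float:
    """Average of 1/rank of the gold translation; 0 when it is not in the list."""
    if not ranked_candidates:
        return 0.0
    total = 0.0
    for entity, candidates in ranked_candidates.items():
        for rank, candidate in enumerate(candidates, start=1):
            if candidate == gold.get(entity):
                total += 1.0 / rank
                break
    return total / len(ranked_candidates)


# Placeholder entities and candidate translations (illustration only).
toy_ranked = {
    "entity_a": ["zh_a", "zh_x"],   # gold at rank 1 -> 1.0
    "entity_b": ["zh_y", "zh_b"],   # gold at rank 2 -> 0.5
    "entity_c": ["zh_y", "zh_x"],   # gold missing   -> 0.0
}
toy_gold = {"entity_a": "zh_a", "entity_b": "zh_b", "entity_c": "zh_c"}

print(mean_reciprocal_rank(toy_ranked, toy_gold))
```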

news articles

Appears in 3 sentences as: news articles (4)
In Bootstrapping Entity Translation on Weakly Comparable Corpora
  1. We processed a full year (2008) of news articles from Xinhua News, which publishes in both English and Chinese; the same articles were also used by Kim et al.
    Page 7, “Experiments”
  2. The English corpus consists of 100,746 news articles, and the Chinese corpus consists of 88,031 news articles.
    Page 7, “Experiments”
  3. D0: All news articles are used.
    Page 9, “Experiments”
