Building Comparable Corpora Based on Bilingual LDA Model
Zhu, Zede and Li, Miao and Chen, Lei and Yang, Zhenxin

Article Structure

Introduction

Comparable corpora can be mined for fine-grained translation equivalents, such as bilingual terminology, named entities and parallel sentences, to support bilingual lexicography, statistical machine translation and cross-language information retrieval (Abdul-Rauf et al., 2009).

Bilingual LDA Model

2.1 Standard LDA

Building comparable corpora

Based on the bilingual LDA model, building comparable corpora includes several steps to

Experiments and analysis

4.1 Datasets and Evaluation

Topics

LDA

Appears in 9 sentences as: LDA (10)
In Building Comparable Corpora Based on Bilingual LDA Model
  1. Based on Bilingual LDA Model
    Page 1, “Introduction”
  2. The paper concretely includes: 1) Introduce the Bilingual LDA (Latent Dirichlet Allocation) model which builds comparable corpora and improves the efficiency of matching similar documents; 2) Design a novel method of TFIDF (Topic Frequency-Inverse Document Frequency) to enhance the distinguishing ability of topics from different documents; 3) Propose a tailored
    Page 1, “Introduction”
  3. 2.1 Standard LDA
    Page 2, “Bilingual LDA Model”
  4. The LDA model (Blei et al., 2003) represents the latent topic distribution of a document by a Dirichlet distribution with a K-dimensional implicit random variable, and becomes a complete generative model when a Dirichlet prior β is placed on the topic-word distributions (Griffiths et al., 2004) (shown in Fig.
    Page 2, “Bilingual LDA Model”
  5. Figure 1: Standard LDA model
    Page 2, “Bilingual LDA Model”
  6. 2.2 Bilingual LDA
    Page 2, “Bilingual LDA Model”
  7. Bilingual LDA is a bilingual extension of a standard LDA model.
    Page 2, “Bilingual LDA Model”
  8. Figure 2: Bilingual LDA model
    Page 2, “Bilingual LDA Model”
  9. Based on the bilingual LDA model, building comparable corpora includes several steps to
    Page 2, “Building comparable corpora”
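
The generative story behind the excerpts above can be illustrated with a minimal sketch. It assumes, as the introduction describes for bilingual topic models, that an aligned document pair shares one topic distribution while each language keeps its own topic-word distributions. The function name, vocabulary sizes and hyperparameter values are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import numpy as np

def generate_bilingual_pair(alpha, phi_src, phi_tgt, len_src, len_tgt, rng):
    """Sample one aligned document pair that shares a single topic mixture.

    alpha   : Dirichlet hyperparameter over K topics (length-K array)
    phi_src : K x V_src topic-word distributions for the source language
    phi_tgt : K x V_tgt topic-word distributions for the target language
    """
    num_topics = len(alpha)
    theta = rng.dirichlet(alpha)  # topic mixture shared by both languages

    def sample_words(phi, length):
        # Draw a topic for every token, then a word id from that topic.
        topics = rng.choice(num_topics, size=length, p=theta)
        return np.array([rng.choice(phi.shape[1], p=phi[z]) for z in topics])

    return theta, sample_words(phi_src, len_src), sample_words(phi_tgt, len_tgt)

# Illustrative sizes and hyperparameters (assumptions, not from the paper).
rng = np.random.default_rng(0)
K, V_SRC, V_TGT = 3, 8, 10
beta = 0.1
phi_src = rng.dirichlet(np.full(V_SRC, beta), size=K)
phi_tgt = rng.dirichlet(np.full(V_TGT, beta), size=K)
theta, doc_src, doc_tgt = generate_bilingual_pair(
    np.full(K, 0.5), phi_src, phi_tgt, 20, 25, rng
)
```

Setting both vocabularies and both topic-word matrices to be the same reduces the sketch to the standard monolingual LDA generative process.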

See all papers in Proc. ACL 2013 that mention LDA.

topic model

Appears in 5 sentences as: topic model (5) topical model (1)
In Building Comparable Corpora Based on Bilingual LDA Model
  1. Preiss (2012) transformed the source language topical model to the target language and classified probability distribution of topics in the same language, whose shortcoming is that the effect of model translation seriously hampers the comparable corpora quality.
    Page 1, “Introduction”
  2. (2009) adapted monolingual topic model to bilingual topic model in which the documents of a concept unit in different languages were assumed to share identical topic distribution.
    Page 1, “Introduction”
  3. Bilingual topic model is widely adopted to mine translation equivalents from multi-language documents (Mimno et al., 2009; Ivan et al., 2011).
    Page 1, “Introduction”
  4. Based on the bilingual topic model, this paper predicts the topical structure of documents in different languages and calculates the similarity of topics over documents to build comparable corpora.
    Page 1, “Introduction”
  5. generate the bilingual topic model φ from the
    Page 2, “Building comparable corpora”

Conditional Probability

Appears in 3 sentences as: Conditional Probability (2) conditional probability (1)
In Building Comparable Corpora Based on Bilingual LDA Model
  1. method of conditional probability to calculate document similarity; 4) Address a language-independent study which isn’t limited to a particular data source in any language.
    Page 2, “Introduction”
  2. 3.2 Conditional Probability
    Page 3, “Building comparable corpora”
  3. The similarity between m^S and m^T is defined as the Conditional Probability (CP) of documents
    Page 3, “Building comparable corpora”
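
One way to score a document pair by conditional probability over topics is sketched below. The excerpts do not show the paper's formula, so the decomposition used here is an assumption: P(m^T | m^S) is taken as the sum over topics of P(m^T | z) P(z | m^S), with P(m^T | z) rewritten by Bayes' rule and the document prior dropped as a constant. The function name and the uniform default for the topic prior are likewise hypothetical.

```python
import numpy as np

def cp_similarity(theta_src, theta_tgt, topic_prior=None):
    """Score proportional to P(m_T | m_S) under the assumed decomposition.

    theta_src   : P(z | m_S), topic distribution of the source document
    theta_tgt   : P(z | m_T), topic distribution of the target document
    topic_prior : P(z); defaults to uniform (an assumption of this sketch)
    """
    theta_src = np.asarray(theta_src, dtype=float)
    theta_tgt = np.asarray(theta_tgt, dtype=float)
    if topic_prior is None:
        topic_prior = np.full(theta_src.shape, 1.0 / theta_src.size)
    # sum_k P(z_k | m_T) * P(z_k | m_S) / P(z_k), up to the constant P(m_T)
    return float(np.sum(theta_tgt * theta_src / topic_prior))

source_doc = [0.8, 0.1, 0.1]
close_match = cp_similarity(source_doc, [0.7, 0.2, 0.1])
poor_match = cp_similarity(source_doc, [0.1, 0.2, 0.7])
```

Under these assumptions, a target document whose topic mass sits on the same topics as the source document receives the higher score.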

probability distribution

Appears in 3 sentences as: probability distribution (3)
In Building Comparable Corpora Based on Bilingual LDA Model
  1. Preiss (2012) transformed the source language topical model to the target language and classified probability distribution of topics in the same language, whose shortcoming is that the effect of model translation seriously hampers the comparable corpora quality.
    Page 1, “Introduction”
  2. denotes the vocabulary probability distribution in the topic k; M denotes the document number; θm
    Page 2, “Bilingual LDA Model”
  3. denotes the topic probability distribution in the document m; Nm denotes the length of m; me
    Page 2, “Bilingual LDA Model”

topic distribution

Appears in 3 sentences as: topic distribution (3)
In Building Comparable Corpora Based on Bilingual LDA Model
  1. (2009) adapted monolingual topic model to bilingual topic model in which the documents of a concept unit in different languages were assumed to share identical topic distribution.
    Page 1, “Introduction”
  2. given bilingual corpora, predict the topic distribution θm,k of the new documents, calculate the
    Page 2, “Building comparable corpora”
  3. P(Z) as prior topic distribution is assumed a
    Page 3, “Building comparable corpora”
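
Predicting the topic distribution of a new document with a trained model held fixed can be sketched as follows. The excerpts do not say which inference procedure the paper uses, so this example substitutes a simple EM "fold-in" that re-estimates only the document's topic weights; the function name, the toy topic-word matrix and the iteration count are assumptions.

```python
import numpy as np

def fold_in_theta(word_ids, phi, num_iters=50):
    """Estimate a new document's topic weights with phi (K x V) held fixed."""
    word_ids = np.asarray(word_ids)
    num_topics = phi.shape[0]
    theta = np.full(num_topics, 1.0 / num_topics)  # start from uniform
    word_probs = phi[:, word_ids].T                # (N, K): P(w_n | z = k)
    for _ in range(num_iters):
        # E-step: posterior over topics for each token.
        resp = word_probs * theta
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: new topic weights are the average responsibilities.
        theta = resp.mean(axis=0)
    return theta

# Toy model with two topics over a two-word vocabulary (illustrative only).
phi = np.array([[0.9, 0.1],
                [0.1, 0.9]])
theta = fold_in_theta([0, 0, 0, 1], phi)
```

Because the toy document is dominated by word 0, the estimate puts most of its weight on the topic that favours that word.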
