Learning Entity Representation for Entity Disambiguation
He, Zhengyan and Liu, Shujie and Li, Mu and Zhou, Ming and Zhang, Longkai and Wang, Houfeng

Article Structure

Abstract

We propose a novel entity disambiguation model, based on Deep Neural Network (DNN).

Introduction

Entity linking or disambiguation has recently received much attention in the natural language processing community (Bunescu and Pasca, 2006; Han et al., 2011; Kataria et al., 2011; Sen, 2012).

Learning Representation for Contextual Document

Given a mention string m with its context document d, a list of candidate entities C(m) is generated for m. For each candidate entity ei ∈ C(m), we compute a ranking score sim(dm, ei) indicating how likely m refers to ei.
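
A minimal sketch of this ranking step, assuming the context document and every candidate entity have already been mapped to fixed-length vectors; the function and variable names below are illustrative, and a plain dot product stands in for the learned score sim(dm, ei):

```python
import numpy as np

def rank_candidates(doc_vec, candidate_vecs):
    """Score every candidate entity for one mention and sort by score.

    doc_vec: representation of the mention's context document d_m
    candidate_vecs: dict mapping a candidate entity id to its representation
    A dot product stands in for the learned similarity sim(d_m, e_i).
    """
    scores = {eid: float(np.dot(doc_vec, evec))
              for eid, evec in candidate_vecs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Usage: rank_candidates(f_d, {"e1": f_e1, "e2": f_e2})[0] gives the top-ranked entity.
```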

Experiments and Analysis

Training settings: In the pre-training stage, the input layer has 100,000 units and all hidden layers have 1,000 units with the rectifier function max(0, x). Following (Glorot et al., 2011), for the first reconstruction layer we use the sigmoid activation function and cross-entropy error function.
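
As a rough illustration of these settings, the sketch below builds a single denoising auto-encoder layer with a rectifier encoder and a sigmoid reconstruction layer trained with cross-entropy. The corruption rate, the tied decoder weights, and the small layer sizes are assumptions made only to keep the example compact, not details taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes; the paper's network uses a 100,000-unit input
# layer and 1,000-unit hidden layers.
n_in, n_hidden = 2000, 100
W = rng.normal(scale=0.01, size=(n_in, n_hidden))
b_enc, b_dec = np.zeros(n_hidden), np.zeros(n_in)

def relu(x):
    return np.maximum(0.0, x)                  # rectifier max(0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dae_reconstruction_loss(x, corruption=0.3):
    """One denoising auto-encoder layer: corrupt, encode, reconstruct.

    Cross-entropy between the clean input x (values in [0, 1]) and the
    sigmoid reconstruction, as in the first reconstruction layer above.
    Tied decoder weights (W.T) are an assumption of this sketch.
    """
    x_tilde = x * (rng.random(x.shape) > corruption)   # masking noise
    h = relu(x_tilde @ W + b_enc)                      # hidden code
    x_hat = sigmoid(h @ W.T + b_dec)                   # reconstruction
    eps = 1e-9
    return -np.mean(x * np.log(x_hat + eps) + (1 - x) * np.log(1 - x_hat + eps))

loss = dae_reconstruction_loss(rng.random(n_in))       # example call on a random input
```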

Conclusion

We propose a deep learning approach that automatically learns context-entity similarity measure for entity disambiguation.

Topics

similarity measure

Appears in 7 sentences as: similarity measure (8) similarity measures (1)
In Learning Entity Representation for Entity Disambiguation
  1. Instead of utilizing simple similarity measures and their disjoint combinations, our method directly optimizes document and entity representations for a given similarity measure.
    Page 1, “Abstract”
  2. A supervised fine-tuning stage follows to optimize the representation towards the similarity measure.
    Page 1, “Abstract”
  3. ... document and entity representations for a fixed similarity measure.
    Page 2, “Introduction”
  4. In fact, the underlying representations for computing the similarity measure add internal structure to the given similarity measure.
    Page 2, “Introduction”
  5. The learned similarity measure can be readily incorporated into any existing collective approaches, which further boosts performance.
    Page 2, “Introduction”
  6. When embedding our similarity measure sim(d, e) into (Han et al., 2011), we achieve the best results on AIDA.
    Page 4, “Experiments and Analysis”
  7. We propose a deep learning approach that automatically learns context-entity similarity measure for entity disambiguation.
    Page 4, “Conclusion”

hidden layer

Appears in 4 sentences as: hidden layer (3) hidden layers (1)
In Learning Entity Representation for Entity Disambiguation
  1. In this stage, we optimize the learned representation (“hidden layer n” in Fig.
    Page 2, “Learning Representation for Contextual Document”
  2. The network weights below “hidden layer n” are initialized with the pre-training stage.
    Page 2, “Learning Representation for Contextual Document”
  3. hidden layer n
    Page 3, “Learning Representation for Contextual Document”
  4. Training settings: In the pre-training stage, the input layer has 100,000 units and all hidden layers have 1,000 units with the rectifier function max(0, x). Following (Glorot et al., 2011), for the first reconstruction layer we use the sigmoid activation function and cross-entropy error function.
    Page 4, “Experiments and Analysis”
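
A small sketch of the hand-off these sentences describe: the weights learned layer-wise in pre-training initialise everything below “hidden layer n”, and fine-tuning then updates a copy of them for the similarity objective. The layer sizes, names, and rectifier encoder are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

# Weights learned layer-wise during pre-training (illustrative sizes).
pretrained = [(rng.normal(scale=0.01, size=(2000, 1000)), np.zeros(1000)),
              (rng.normal(scale=0.01, size=(1000, 1000)), np.zeros(1000))]

def encode(x, layers):
    """Map an input up to "hidden layer n" through the stacked encoders."""
    h = x
    for W, b in layers:
        h = relu(h @ W + b)
    return h

# Fine-tuning starts from a copy of the pre-trained weights and then
# updates the whole network for the similarity objective.
finetune_layers = [(W.copy(), b.copy()) for W, b in pretrained]
doc_repr = encode(rng.random(2000), finetune_layers)   # f(d) before fine-tuning
```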

knowledge base

Appears in 3 sentences as: knowledge base (3)
In Learning Entity Representation for Entity Disambiguation
  1. It is an essential first step for succeeding subtasks in knowledge base construction (Ji and Grishman, 2011) like populating attribute to entities.
    Page 1, “Introduction”
  2. However, the one-topic-per-entity assumption makes it impossible to scale to a large knowledge base, as every entity has a separate word distribution P(w|e); besides, the training objective does not directly correspond with disambiguation performance.
    Page 1, “Introduction”
  3. The learned representation of an entity is compact and can scale to a very large knowledge base.
    Page 4, “Conclusion”

loss function

Appears in 3 sentences as: loss function (4)
In Learning Entity Representation for Entity Disambiguation
  1. The loss function is defined as the negative log of the softmax function:
    Page 3, “Learning Representation for Contextual Document”
  2. The loss function is closely related to contrastive estimation (Smith and Eisner, 2005), which defines where the positive example takes probability mass from.
    Page 3, “Learning Representation for Contextual Document”
  3. In our experiments, the softmax loss function, which is taken as our default setting, consistently outperforms the pairwise ranking loss function.
    Page 3, “Learning Representation for Contextual Document”
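
A compact sketch of this softmax loss, i.e. the negative log of the softmax probability assigned to the true (d, e) pair over the mention's candidate set; the function name and inputs are illustrative, and the true entity's score is assumed to appear among the candidate scores:

```python
import numpy as np

def softmax_loss(sim_true, sim_all_candidates):
    """Negative log of the softmax probability of the true pair.

    sim_true: sim(d, e) for the correct entity
    sim_all_candidates: sim(d, e_i) for every candidate, the correct
        entity included; minimising this loss raises sim_true and takes
        probability mass away from the competing candidates.
    """
    sims = np.asarray(sim_all_candidates, dtype=float)
    shift = sims.max()                                   # numerical stability
    log_z = shift + np.log(np.exp(sims - shift).sum())   # log partition function
    return -(sim_true - log_z)

# Example: softmax_loss(2.0, [2.0, 0.5, -1.0]) is small because the
# true pair already dominates the candidate list.
```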

Neural Network

Appears in 3 sentences as: Neural Network (1) neural network (1) neural networks (1)
In Learning Entity Representation for Entity Disambiguation
  1. We propose a novel entity disambiguation model, based on Deep Neural Network (DNN).
    Page 1, “Abstract”
  2. Deep neural networks (Hinton et al., 2006; Bengio et al., 2007) are built in a hierarchical manner, and allow us to compare context and entity at some higher level abstraction; while at lower levels, general concepts are shared across entities, resulting in compact models.
    Page 1, “Introduction”
  3. BTS is a variant of the general backpropagation algorithm for structured neural network .
    Page 3, “Learning Representation for Contextual Document”

similarity score

Appears in 3 sentences as: similarity score (3)
In Learning Entity Representation for Entity Disambiguation
  1. In the pre-training stage, Stacked Denoising Auto-encoders are built in an unsupervised layer-wise fashion to discover general concepts encoding d and e. In the supervised fine-tuning stage, the entire network weights are fine-tuned to optimize the similarity score sim(d, e).
    Page 2, “Learning Representation for Contextual Document”
  2. The similarity score of the (d, e) pair is defined as the dot product of f(d) and f(e) (Fig.
    Page 3, “Learning Representation for Contextual Document”
  3. That is, we raise the similarity score of the true pair sim(d, e) and penalize all the rest sim(d, ei).
    Page 3, “Learning Representation for Contextual Document”
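
Putting these pieces together, a toy end-to-end sketch: random stand-in encoders map the document and entity into a shared space, sim(d, e) is the dot product of the two codes, and the softmax over candidate scores is the distribution whose true-pair probability training pushes up. All sizes and encoders here are illustrative, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-ins for the learned encoders; in the paper both the context
# document and the entity are mapped through the deep network.
W_d = rng.normal(scale=0.1, size=(50, 8))
W_e = rng.normal(scale=0.1, size=(50, 8))

def f(x, W):
    return np.maximum(0.0, x @ W)            # rectifier hidden layer (sketch)

def sim(d_vec, e_vec):
    return float(np.dot(f(d_vec, W_d), f(e_vec, W_e)))   # sim(d, e) = f(d) . f(e)

d = rng.random(50)                            # context document features
true_e = rng.random(50)                       # correct entity features
other_es = [rng.random(50) for _ in range(3)] # competing candidates

scores = np.array([sim(d, true_e)] + [sim(d, e) for e in other_es])
probs = np.exp(scores - scores.max())
probs /= probs.sum()
# Training raises probs[0] (the true pair) and penalizes the rest.
print(probs)
```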
