CoSimRank: A Flexible & Efficient Graph-Theoretic Similarity Measure
Rothe, Sascha and Schütze, Hinrich

Article Structure

Abstract

We present CoSimRank, a graph-theoretic similarity measure that is efficient because it can compute a single node similarity without having to compute the similarities of the entire graph.

Introduction

Graph-theoretic algorithms have been successfully applied to many problems in NLP (Mihalcea and Radev, 2011).

Related Work

Our work is unsupervised.

CoSimRank

We first give an intuitive introduction to CoSimRank as a Personalized PageRank (PPR) derivative.

Comparison to SimRank

The original SimRank equation can be written as follows (Jeh and Widom, 2002):

Extensions

We will show now that the basic CoSimRank algorithm can be extended in a number of ways and is thus a flexible tool for different NLP applications.

Topics

similarity measure

Appears in 20 sentences as: Similarity Measure (1) similarity measure (9) similarity measurement (2) similarity measures (8)
In CoSimRank: A Flexible & Efficient Graph-Theoretic Similarity Measure
  1. We present CoSimRank, a graph-theoretic similarity measure that is efficient because it can compute a single node similarity without having to compute the similarities of the entire graph.
    Page 1, “Abstract”
  2. Another advantage of CoSimRank is that it can be flexibly extended from basic node-node similarity to several other graph-theoretic similarity measures.
    Page 1, “Abstract”
  3. Graph-Theoretic Similarity Measure
    Page 1, “Introduction”
  4. Apart from SimRank, many other similarity measures have been proposed.
    Page 2, “Related Work”
  5. (2006) introduce a similarity measure that is also based on the idea that nodes are similar when their neighbors are, but that is designed for bipartite graphs.
    Page 2, “Related Work”
  6. Another important similarity measure is cosine similarity of Personalized PageRank (PPR) vectors.
    Page 2, “Related Work”
  7. (2008) compared PPR+cos to other graph based similarity measures like shortest-path and bounded-length random walks.
    Page 2, “Related Work”
  8. PPR+cos performed best except for a new similarity measure based on commute time.
    Page 2, “Related Work”
  9. They applied different similarity measures, e.g., cosine of dependency vectors or a new algorithm called path-constrained graph walk, on synonym extraction (Minkov and Cohen, 2012).
    Page 2, “Related Work”
  10. Some other applications of SimRank or other graph based similarity measures in NLP include work on document similarity (Li et al., 2009), the transfer of sentiment information between languages (Scheible et al., 2010) and named entity disambiguation (Han and Zhao, 2010).
    Page 2, “Related Work”
  11. Rank or as a version of Personalized PageRank for similarity measurement.
    Page 3, “Related Work”

See all papers in Proc. ACL 2014 that mention similarity measure.


PageRank

Appears in 13 sentences as: PageRank (15)
In CoSimRank: A Flexible & Efficient Graph-Theoretic Similarity Measure
  1. We present equivalent formalizations that show CoSimRank’s close relationship to Personalized PageRank and SimRank and also show how we can take advantage of fast matrix multiplication algorithms to compute CoSimRank.
    Page 1, “Abstract”
  2. These algorithms are often based on PageRank (Brin and Page, 1998) and other centrality measures (e.g., Erkan and Radev, 2004).
    Page 1, “Introduction”
  3. This paper introduces CoSimRank, a new graph-theoretic algorithm for computing node similarity that combines features of SimRank and PageRank.
    Page 1, “Introduction”
  4. Another important similarity measure is cosine similarity of Personalized PageRank (PPR) vectors.
    Page 2, “Related Work”
  5. LexRank (Erkan and Radev, 2004) is similar to PPR+cos in that it combines PageRank and cosine; it initializes the sentence similarity matrix of a document using cosine and then applies PageRank to compute lexical centrality.
    Page 2, “Related Work”
  6. These approaches use at least one of cosine similarity, PageRank and SimRank.
    Page 2, “Related Work”
  7. Rank or as a version of Personalized PageRank for similarity measurement.
    Page 3, “Related Work”
  8. We first give an intuitive introduction to CoSimRank as a Personalized PageRank (PPR) derivative.
    Page 3, “CoSimRank”
  9. 3.1 Personalized PageRank
    Page 3, “CoSimRank”
  10. Haveliwala (2002) introduced Personalized PageRank (or topic-sensitive PageRank) based on the idea that the uniform damping vector $p^{(0)}$ can be replaced by a personalized vector, which depends on node i.
    Page 3, “CoSimRank”
  11. The use of weighted edges was first proposed in the PageRank patent.
    Page 5, “Extensions”
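
The idea of a personalized vector quoted above can be illustrated with a small power-iteration sketch. This is a minimal illustration, not the paper's implementation: the function name `personalized_pagerank`, the adjacency-list input, and the exact update rule (restart mass returns to the source node, the rest is spread evenly over out-edges) are assumptions based on the standard formulation of Personalized PageRank. The defaults of 20 iterations and a decay factor of 0.8 mirror the settings quoted elsewhere on this page for PPR+cos.

```python
def personalized_pagerank(adj, source, decay=0.8, iterations=20):
    """Approximate the PPR vector of `source` by power iteration.

    adj[u] lists the out-neighbours of node u. The graph is assumed to be
    unweighted and every node is assumed to have at least one out-neighbour.
    """
    n = len(adj)
    start = [0.0] * n
    start[source] = 1.0  # personalized vector: all mass on the source node
    p = start[:]
    for _ in range(iterations):
        # A share (1 - decay) of the mass always restarts at the source.
        nxt = [(1.0 - decay) * s for s in start]
        for u, neighbours in enumerate(adj):
            # The remaining mass of u is spread evenly over its out-edges.
            share = decay * p[u] / len(neighbours)
            for v in neighbours:
                nxt[v] += share
        p = nxt
    return p


# Hypothetical three-node graph: node 0 links to 1 and 2, both link back to 0.
example_graph = [[1, 2], [0], [0]]
ppr_of_node_0 = personalized_pagerank(example_graph, source=0)
```

Because each node gets its own vector, two nodes can be compared by comparing their vectors, which is the starting point for the PPR+cos baseline mentioned in the quotes.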


cosine similarity

Appears in 6 sentences as: cosine similarity (6)
In CoSimRank: A Flexible & Efficient Graph-Theoretic Similarity Measure
  1. Another important similarity measure is cosine similarity of Personalized PageRank (PPR) vectors.
    Page 2, “Related Work”
  2. These approaches use at least one of cosine similarity, PageRank and SimRank.
    Page 2, “Related Work”
  3. This is similar to cosine similarity except that the 1-norm is used instead of the 2-norm.
    Page 3, “CoSimRank”
  4. We are not including this method in our experiments, but we will give the equation here, as traditional document similarity measures (e.g., cosine similarity) perform poorly on this task although there also are known alternatives with good results (Sahami and Heilman, 2006).
    Page 6, “Extensions”
  5. To calculate PPR+cos, we computed 20 iterations with a decay factor of 0.8 and used the cosine similarity with the 2-norm in the denominator to compare two vectors.
    Page 7, “Extensions”
  6. We compute 20 iterations of PPR+cos to reach convergence and then calculate a single cosine similarity.
    Page 8, “Extensions”
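
The contrast between the 2-norm and the 1-norm in the denominator, mentioned in the quotes above, can be made concrete with a short sketch. The function names `cosine_similarity` and `one_norm_similarity` are hypothetical and chosen for illustration only; the second function is one plausible reading of "the 1-norm is used instead of the 2-norm", not a formula taken from the paper.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine_similarity(u, v):
    """Inner product divided by the product of the 2-norms."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def one_norm_similarity(u, v):
    """Same inner product, but divided by the product of the 1-norms."""
    norm_u = sum(abs(a) for a in u)
    norm_v = sum(abs(b) for b in v)
    return dot(u, v) / (norm_u * norm_v)
```

For probability vectors, whose entries are non-negative and sum to 1, both 1-norms equal 1, so the second variant reduces to the plain inner product of the two vectors.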


time complexity

Appears in 5 sentences as: time complexities (1) time complexity (7)
In CoSimRank: A Flexible & Efficient Graph-Theoretic Similarity Measure
  1. Unfortunately, SimRank has time complexity $O(n^3)$ (where n is the number of nodes in the graph) and therefore does not scale to the large graphs that are typical of NLP.
    Page 1, “Introduction”
  2. 8) have time complexity $O(n^3)$ or, if we want to take the higher efficiency of computation for sparse graphs into account, $O(dn^2)$ where n is the number of nodes and d the
    Page 4, “Comparison to SimRank”
  3. If d < k, then the time complexity of CoSimRank is $O(k^2 n)$.
    Page 5, “Comparison to SimRank”
  4. Thus, we have reduced SimRank’s cubic time complexity to a quadratic time complexity for CoSimRank or — assuming that the average degree d does not depend on n — SimRank’s quadratic time complexity to linear time complexity for the case of computing few similarities.
    Page 5, “Comparison to SimRank”
  5. In summary, CoSimRank and SimRank have similar space and time complexities for computing all $n^2$ similarities.
    Page 5, “Comparison to SimRank”


word pairs

Appears in 5 sentences as: word pairs (6)
In CoSimRank: A Flexible & Efficient Graph-Theoretic Similarity Measure
  1. We use a seed dictionary of 12,630 word pairs to establish node-node correspondences between the two graphs.
    Page 7, “Extensions”
  2. As the seed dictionary contains 12,630 word pairs, this means that only every fourth entry of the PPR vector (the German graph has 47,439 nodes) is used for similarity calculation.
    Page 8, “Extensions”
  3. synonym extraction (68 word pairs) | lexicon extraction (1000 word pairs)
    Page 8, “Extensions”
  4. We evaluate on a subset we call TS774 that consists of the 774 test word pairs that are in the intersection of words covered by the
    Page 8, “Extensions”
  5. Most of the 226 missing word pairs are adverbs, prepositions and plural forms that are not covered by our graphs due to the construction algorithm we use: lemmatization, restriction to adjectives, nouns and verbs etc.
    Page 9, “Extensions”


Best result

Appears in 4 sentences as: Best result (4)
In CoSimRank: A Flexible & Efficient Graph-Theoretic Similarity Measure
  1. Best result in each column in bold.
    Page 7, “Extensions”
  2. Best result in each column in bold.
    Page 8, “Extensions”
  3. Best result in each column in bold.
    Page 8, “Extensions”
  4. Best result in each column in bold.
    Page 9, “Extensions”


edge weight

Appears in 3 sentences as: edge weight (1) edge weighted (1) edge weights (1)
In CoSimRank: A Flexible & Efficient Graph-Theoretic Similarity Measure
  1. (2010) extend SimRank to edge weights, edge labels and multiple graphs.
    Page 2, “Related Work”
  2. It is straightforward and easy to implement by replacing the row normalized adjacency matrix A with an arbitrary stochastic matrix P. We can use this edge weighted PageRank for CoSimRank.
    Page 5, “Extensions”
  3. We tried a number of different ways of modifying it for weighted graphs: (i) running the random walks with the weighted adjacency matrix as Markov matrix, (ii) storing the weight (product of each edge weight) of a random walk and using it as a factor if two walks meet and (iii) a combination of both.
    Page 8, “Extensions”
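
The second quote above speaks of replacing the row-normalized adjacency matrix A with an arbitrary stochastic matrix P. One simple way to obtain such a matrix from edge weights is to divide each row of the weight matrix by its sum. The sketch below assumes non-negative weights and a dense list-of-lists representation; the function name `row_stochastic` is hypothetical, and the paper may construct P differently.

```python
def row_stochastic(weights):
    """Scale each row of a non-negative weight matrix so that it sums to 1.

    weights[u][v] is the weight of the edge from u to v (0 if there is none).
    """
    matrix = []
    for row in weights:
        total = sum(row)
        if total == 0:
            # A node without outgoing weight has no valid transition row;
            # how to treat it is left open in this sketch.
            raise ValueError("row sums to zero; handle this node separately")
        matrix.append([w / total for w in row])
    return matrix


# Hypothetical weighted graph with three nodes.
P = row_stochastic([[0, 3, 1], [2, 0, 2], [1, 0, 0]])
```

Each row of `P` can then be read as the transition probabilities of a random walk leaving the corresponding node, with heavier edges followed more often.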
