Experiment | To measure the latent similarity between documents, we construct topic vectors from the documents' topic probability distributions and measure their divergence with the Jensen-Shannon divergence; when documents are represented as document vectors instead, we adopt cosine similarity.
Experiment | Table 1: Extracting important sentences.
            Method    Measure            Accuracy  F-value
            PageRank  Jensen-Shannon     0.567     0.485
            PageRank  Cosine similarity  0.287     0.291
            tf.idf    Jensen-Shannon     0.550     0.435
            tf.idf    Cosine similarity  0.275     0.270
Related Work | We computed the similarity between co-occurrence vectors using several metrics: cosine similarity, the Dice coefficient (Curran, 2004), the Kullback-Leibler (KL) divergence, also known as relative entropy (Kullback and Leibler, 1951), and the Jensen-Shannon divergence (Lee, 1999).
Related Work | One year of data (1991) was used to extract the "noun wo verb" tuples needed to compute word similarity (using the cosine similarity metric) and collocation scores.
Related Work | These data are needed to compute word similarity (using the cosine similarity metric) and collocation scores.
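For concreteness, the sketch below implements the four metrics named above over toy co-occurrence vectors; the set-based Dice variant, the smoothing constant, and the counts are our own illustrative choices, not necessarily those of the cited systems.

```python
# Four similarity/divergence metrics over word co-occurrence vectors
# defined on a shared list of context words.
import numpy as np

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def dice(u, v):
    # Set-based variant over the contexts each word co-occurs with.
    a, b = u > 0, v > 0
    return 2 * np.sum(a & b) / (np.sum(a) + np.sum(b))

def kl(p, q, eps=1e-12):
    # KL divergence; counts are normalized to probabilities and q is
    # smoothed so the log-ratio is defined everywhere.
    p = p / p.sum()
    q = (q + eps) / (q + eps).sum()
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

def js(p, q):
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Co-occurrence counts of two words with four shared context words.
w1 = np.array([3.0, 0.0, 5.0, 1.0])
w2 = np.array([2.0, 1.0, 4.0, 0.0])
print(cosine(w1, w2), dice(w1, w2), kl(w1, w2), js(w1, w2))
```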
Evaluation | The data was generated by clustering similar news stories from Gigaword using TF-IDF cosine similarity of their headlines. |
Evaluation | [Figure: doc-pair cosine similarity]
Evaluation | The x-axis represents the cosine similarity between the document pairs. |
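A minimal sketch of how such headline-based grouping could work, assuming scikit-learn; the headlines, the 0.5 threshold, and the greedy pairing are hypothetical stand-ins, not the dataset's exact recipe.

```python
# Group news stories whose headline TF-IDF vectors have high cosine
# similarity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

headlines = [
    "Storm batters coastal towns overnight",
    "Coastal towns recover after overnight storm",
    "Central bank holds interest rates steady",
]

tfidf = TfidfVectorizer().fit_transform(headlines)
sim = cosine_similarity(tfidf)  # pairwise similarity matrix

THRESHOLD = 0.5  # hypothetical cut-off
pairs = [(i, j)
         for i in range(len(headlines))
         for j in range(i + 1, len(headlines))
         if sim[i, j] >= THRESHOLD]
print(pairs)  # the two storm headlines should pair up
```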
Results | Additionally, the MTC dataset contains more data with low cosine similarity than RF does.
Selection of Reliable Training Instances | We can then estimate the topical similarity of two article sets by calculating the cosine similarity of their category frequency vectors $C_1 = A$ and $C_2 = B$ as $\cos(A, B) = \frac{A \cdot B}{\|A\| \, \|B\|}$.
Selection of Reliable Training Instances | Table 3: Cosine similarity scores between the category frequency vectors of the flawed article sets and the respective random or reliable negatives |
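As a concrete illustration of the estimate above, the sketch below compares two hypothetical category frequency vectors; the category inventory and the counts are made up.

```python
# Cosine similarity between the category frequency vectors of two
# article sets, each counted over a shared category inventory.
import numpy as np

categories = ["Politics", "Sports", "Science", "Business"]
A = np.array([12.0, 3.0, 7.0, 5.0])   # category counts of article set 1
B = np.array([10.0, 1.0, 9.0, 4.0])   # category counts of article set 2

cos_sim = A @ B / (np.linalg.norm(A) * np.linalg.norm(B))
print(round(cos_sim, 3))
```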
Experiments and Evaluations | Borrowing this idea, for each sub-summary in a human-generated summary, we find its best-matching sub-summary (judged by the cosine similarity measure) in the corresponding system-generated summary and then define the correlation according to the concordance between the two.
Experiments and Evaluations | For the semantic-based approach, we compare three different approaches to defining the subtopic number K: (1) Semantic-based 1: Following the approach proposed in (Li et al., 2007), we first derive the tweet cosine similarity matrix.
Sequential Summarization | For a tweet in a peak area, a linear combination of two measures is used to evaluate its significance as a sub-summary: (1) subtopic representativeness, measured by the cosine similarity between the tweet and the centroid of all tweets in the same peak area; and (2) crowding endorsement, measured by the number of times the tweet is re-tweeted, normalized by the total number of re-tweets.
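A minimal sketch of this scoring scheme, assuming each tweet in a peak area arrives as a vector (e.g., TF-IDF) together with its re-tweet count; the combination weight `alpha` is a hypothetical parameter not specified in the excerpt.

```python
# Significance of each tweet in one peak area as a sub-summary candidate.
import numpy as np

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def significance(tweet_vecs, retweets, alpha=0.5):
    centroid = tweet_vecs.mean(axis=0)
    # (1) subtopic representativeness: cosine to the peak-area centroid.
    representativeness = np.array([cosine(t, centroid) for t in tweet_vecs])
    # (2) crowding endorsement: re-tweets normalized by the area total.
    endorsement = np.asarray(retweets, dtype=float) / sum(retweets)
    return alpha * representativeness + (1 - alpha) * endorsement

vecs = np.array([[0.9, 0.1, 0.0],
                 [0.7, 0.2, 0.1],
                 [0.1, 0.1, 0.8]])
print(significance(vecs, retweets=[40, 10, 5]))
```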
Experiments | Evaluation: The similarity between a tweet and a news article is measured by cosine similarity.
Experiments | The cosine similarity
WTMF on Graphs | In the WTMF model, we would like the latent vectors of two text nodes, $Q_{\cdot,j_1}$ and $Q_{\cdot,j_2}$, to be as similar as possible, namely for their cosine similarity to be close to 1.
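To illustrate, the sketch below computes the cosine similarity of two latent columns Q[:, j1] and Q[:, j2] and a penalty that vanishes when it reaches 1; the squared form is our illustrative stand-in, not necessarily the model's exact regularizer.

```python
# Penalty for a pair of linked text nodes whose latent vectors should
# point in the same direction (cosine similarity near 1).
import numpy as np

rng = np.random.default_rng(0)
Q = rng.normal(size=(100, 500))  # latent dimension x number of text nodes

def link_penalty(Q, j1, j2):
    q1, q2 = Q[:, j1], Q[:, j2]
    cos = q1 @ q2 / (np.linalg.norm(q1) * np.linalg.norm(q2))
    return (1.0 - cos) ** 2  # zero exactly when cosine similarity is 1

print(link_penalty(Q, 3, 7))
```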
Our Method | o 31(mi, mj): The cosine similarity of 75071;) and t(mj); and tweets are represented as TF-IDF vectors; |
Our Method | 0 32(mi, mj): The cosine similarity of 75071;) and t(mj); and tweets are represented as topic distribution vectors; |
Related Work | SemTag uses the TAP knowledge base and employs cosine similarity with a TF-IDF weighting scheme to compute the degree of match between a mention and an entity, achieving an accuracy of around 82%.