Method | As a similarity function, we use cosine similarity weighted with TF*IDF. |
Method | We define $sim(e_{jl}, e_{j'l'})$ as the cosine similarity between excerpts $e_{jl}$ from topic $t_j$ and $e_{j'l'}$ from topic $t_{j'}$. |
Method | If excerpts $e_{jl}$ and $e_{j'l'}$ have cosine similarity |
$\mathrm{Rank}(e_{ij1} \ldots e_{ijr}, w_j)$ | For each topic present in the human-authored article, the Oracle selects the excerpt from our full model’s candidate excerpts with the highest cosine similarity to the human-authored text. |
Application Oriented Evaluations | During classification, cosine similarity is measured between the feature vector of the classified document and the expanded vectors of all categories. |
Application Oriented Evaluations | The first avoids any expansion, classifying documents based on cosine similarity with category names only. |
The asterisk denotes an incorrect rule | We also examined another filtering score, the cosine similarity between the vectors representing the two rule sides in LSA (Latent Semantic Analysis) space (Deerwester et al., 1990). |
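
Several of the excerpts above rely on cosine similarity over TF*IDF-weighted vectors. The following is a minimal sketch of that computation, not the implementation used in any of the cited papers. The function names `tfidf_vectors` and `cosine` are hypothetical, documents are assumed to be pre-tokenized lists of strings, and the smoothed IDF variant is an assumption, since the excerpts do not say which weighting formula was used.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF*IDF vectors (term -> weight dicts) for tokenized docs."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (an assumed variant; the excerpts do not specify one).
    idf = {term: math.log((1 + n) / (1 + count)) + 1.0
           for term, count in df.items()}
    # Weight each raw term count by the term's IDF.
    return [{term: tf * idf[term] for term, tf in Counter(doc).items()}
            for doc in docs]


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy excerpts, invented purely for illustration.
excerpts = [
    "the disease causes fever and fatigue".split(),
    "fever and fatigue are common symptoms".split(),
    "treatment includes rest and fluids".split(),
]
vectors = tfidf_vectors(excerpts)
print(cosine(vectors[0], vectors[1]))  # shares several terms
print(cosine(vectors[0], vectors[2]))  # shares only "and"
```

Because all TF*IDF weights here are non-negative, the score lies between 0 (no shared terms) and 1 (identical direction), which is what makes it usable as a ranking or filtering score in the settings the excerpts describe.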