Abstract | Sentence Similarity is the process of computing a similarity score between two sentences. |
Evaluation for SS | A subset of 30 pairs is further selected by LI06 to render the similarity scores evenly distributed. |
Experiments and Results | The performance of WTMF on CDR is compared with (a) an Information Retrieval model (IR) that is based on surface word matching, (b) an n-gram model (N-gram) that captures phrase overlaps by returning the number of overlapping n-grams as the similarity score of two sentences, (c) LSA that uses the svds() function in Matlab, and (d) LDA that uses Gibbs sampling for inference (Griffiths and Steyvers, 2004). |
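The N-gram baseline above can be sketched as follows. This is a hedged illustration, not the paper's implementation: the function names, the tokenization by whitespace, and the choice of unigrams through trigrams are assumptions; the paper only states that the score is the number of overlapping n-grams.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_overlap_score(sent_a, sent_b, max_n=3):
    """Similarity = total count of n-grams shared by both sentences
    (multiset intersection), summed over n = 1 .. max_n."""
    a, b = sent_a.lower().split(), sent_b.lower().split()
    score = 0
    for n in range(1, max_n + 1):
        ca, cb = Counter(ngrams(a, n)), Counter(ngrams(b, n))
        score += sum((ca & cb).values())
    return score
```

For example, "the cat sat" and "the cat ran" share two unigrams and one bigram, giving a score of 3 under these assumptions.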
Experiments and Results | Using a smaller w_m means the similarity score is computed mainly from the semantics of the observed words. |
Experiments and Results | This benefits CDR, since it yields more accurate similarity scores for similar pairs, though less accurate scores for dissimilar pairs. |
Experiments | However, common to all datasets is that similarity scores are given to pairs of words in isolation. |
Experiments | Single-prototype models would give the maximum similarity score for those pairs, which can be problematic depending on the words’ contexts. |
Experiments | For evaluation, we also compute the Spearman correlation between a model’s computed similarity scores and human judgments. |