Abstract | Existing word similarity measures are not robust to data sparseness since they rely only on the point estimation of words’ context profiles obtained from a limited amount of data. |
Abstract | The method uses a distribution of context profiles obtained by Bayesian estimation and takes the expectation of a base similarity measure under that distribution. |
Abstract | For the task of word similarity estimation using a large amount of Japanese Web data, we show that the proposed measure gives better accuracy than other well-known similarity measures. |
Background | The BC is also a similarity measure on probability distributions and is suitable for our purposes as we describe in the next section. |
Background | Although BC has not been explored well in the literature on distributional word similarities, it is also a good similarity measure, as our experiments show. |
Introduction | A number of semantic similarity measures have been proposed based on this hypothesis (Hindle, 1990; Grefenstette, 1994; Dagan et al., 1994; Dagan et al., 1995; Lin, 1998; Dagan et al., 1999). |
Introduction | In general, most semantic similarity measures have the following form: |
Introduction | Our technical contribution in this paper is to show that in the case where the context profiles are multinomial distributions, the priors are Dirichlet, and the base similarity measure is the Bhattacharyya coefficient (Bhattacharyya, 1943), we can derive an analytical form for Eq. |
Method | In this section, we show that if our base similarity measure is BC and the distributions under which we take the expectation are Dirichlet distributions, then Eq. |
Method | Putting it all together, we obtain a new Bayesian similarity measure on words, which can be calculated solely from the hyperparameters of the Dirichlet prior, α and β, and the observed counts C(wi, fk). |
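The closed form alluded to above follows from a standard Dirichlet moment identity: for p ~ Dirichlet(α), E[√p_k] = (Γ(α0)/Γ(α0 + 1/2)) · (Γ(α_k + 1/2)/Γ(α_k)), so the expected Bhattacharyya coefficient under two independent Dirichlet posteriors factorizes as Σ_k E[√p_k]·E[√q_k]. The sketch below illustrates this with a symmetric pseudo-count prior; the function names and the `prior=0.5` default are illustrative assumptions, not the paper's own notation.

```python
from math import lgamma, exp

def expected_sqrt(alpha):
    """E[sqrt(p_k)] for p ~ Dirichlet(alpha), computed elementwise
    via the Dirichlet moment identity, in log space for stability."""
    a0 = sum(alpha)
    norm = lgamma(a0) - lgamma(a0 + 0.5)
    return [exp(norm + lgamma(a + 0.5) - lgamma(a)) for a in alpha]

def bayes_bc(counts_u, counts_v, prior=0.5):
    """Expected Bhattacharyya coefficient between the context
    profiles of two words, under independent Dirichlet posteriors
    (a symmetric pseudo-count `prior` added to each observed count)."""
    eu = expected_sqrt([c + prior for c in counts_u])
    ev = expected_sqrt([c + prior for c in counts_v])
    return sum(x * y for x, y in zip(eu, ev))
```

With sparse counts the posterior spreads mass over unseen contexts, so two words with identical but tiny count vectors score noticeably below 1.0, which is exactly the robustness-to-sparseness behavior the abstract claims; as counts grow, the value approaches the point-estimate BC.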
Experimental Evaluation | Local algorithms We described 12 distributional similarity measures computed over our corpus (Section 5.1). |
Experimental Evaluation | In addition, we obtained similarity lists learned by Lin and Pantel (2001), and replicated 3 similarity measures learned by Szpektor and Dagan (2008), over the RCV1 corpus. |
Experimental Evaluation | For each distributional similarity measure (altogether 16 measures), we learned a graph by inserting an edge (u, v) whenever u is among the top K templates most similar to v. |
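The top-K graph construction described here is straightforward to sketch. The data-structure choices below (a dict from each template to its scored similarity list, and a set of directed edge pairs) are illustrative assumptions, not the paper's implementation:

```python
def knn_graph(sim_lists, K):
    """Build a directed graph with an edge (u, v) whenever u is
    among the top-K templates most similar to v.

    sim_lists: dict mapping each template v to a list of
               (candidate_template, similarity_score) pairs.
    """
    edges = set()
    for v, scored in sim_lists.items():
        top = sorted(scored, key=lambda p: p[1], reverse=True)[:K]
        for u, _score in top:
            edges.add((u, v))
    return edges
```

Note the edge direction: membership in v's top-K list yields an edge pointing *into* v, so the resulting graph is generally asymmetric even when the underlying similarity measure is symmetric.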
Learning Entailment Graph Edges | Similarity function We consider two similarity functions: The Lin (2001) similarity measure, and the Balanced Inclusion (BInc) similarity measure (Szpektor and Dagan, 2008). |
Named Entity Disambiguation by Leveraging Semantic Knowledge | Because the key problem in named entity disambiguation is measuring the similarity between name observations, we integrate structural semantic relatedness into the similarity measure so that it better reflects the actual similarity between name observations. |
The Structural Semantic Relatedness Measure | The lexical relatedness lr between two WordNet concepts is measured using Lin's (1998) WordNet semantic similarity measure. |
The Structural Semantic Relatedness Measure | The problem of quantifying the relatedness between nodes in a graph is not new; examples include structural equivalence and structural similarity (SimRank in Jeh and Widom (2002), and the similarity measure of Leicht et al.). |
The Structural Semantic Relatedness Measure | However, these similarity measures are not suitable for our task, because all of them assume uniform edges and thus cannot take edge weights into account. |
Algorithm | sim(h, h’) is a similarity measure between argument heads. |
Algorithm | The similarity measure we use is based on the slot distributions of the arguments. |
Algorithm | The similarity measure between two head words is then defined as the cosine measure of their vectors. |
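The cosine over slot distributions described above can be sketched as follows. Representing each argument head by a sparse count vector over the slots it fills is the stated idea; the `Counter`-based representation and function names here are illustrative assumptions:

```python
from math import sqrt
from collections import Counter

def slot_vector(slot_occurrences):
    """Represent an argument head word by counts over the
    predicate slots it has been observed to fill."""
    return Counter(slot_occurrences)

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

Heads that fill the same slots in similar proportions score near 1, while heads with disjoint slot distributions score 0.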
Evaluation | The Levenshtein similarity measure, on the other hand, is too restrictive and thus results in comparatively high precision, but very low recall. |
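For reference, a minimal sketch of a Levenshtein-based similarity of the kind evaluated here: edit distance normalized by the longer string's length. The normalization choice is a common convention and an assumption on my part, not necessarily the exact variant used in the evaluation:

```python
def levenshtein(a, b):
    """Edit distance via the standard dynamic program,
    keeping only the previous row of the table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[-1] + 1,          # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def lev_sim(a, b):
    """Similarity in [0, 1]: 1 minus distance over max length."""
    m = max(len(a), len(b))
    return 1.0 - levenshtein(a, b) / m if m else 1.0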
Temporal Script Graphs | On the basis of this pseudo-parse, we compute the similarity measure sim: |
Temporal Script Graphs | The semantic constraints check whether the event descriptions of the merged node would be sufficiently consistent according to the similarity measure from Section 5.2. |
Genre Distance Measures | We can derive such distance measures from the genre hierarchy in a way similar to the word similarity measures developed for lexical hierarchies such as WordNet (see Pedersen et al. (2007) for an overview). |
Genre Distance Measures | The Leacock & Chodorow similarity measure (Leacock and Chodorow, 1998) normalizes the path length measure (6) by the maximum number of nodes D when traversing down from the root. |
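The Leacock & Chodorow measure can be written as sim(c1, c2) = -log(len(c1, c2) / (2D)), where len is the shortest path between the two nodes and D is the maximum depth of the hierarchy. A minimal sketch (parameter names are mine; conventions for counting the path in nodes vs. edges vary across implementations):

```python
from math import log

def leacock_chodorow(path_len, max_depth):
    """Leacock & Chodorow (1998): negative log of the shortest
    path length normalized by twice the hierarchy's maximum depth.

    path_len:  shortest path between the two concepts (>= 1)
    max_depth: maximum depth D of the hierarchy
    """
    return -log(path_len / (2.0 * max_depth))
```

The measure is largest for adjacent or identical concepts (short paths) and decreases monotonically as the path grows, reaching 0 only when the path spans the full diameter 2D.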
Genre Distance Measures | Several other similarity measures have been proposed based on the Resnik similarity such as the one by (Lin, 1998): |
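Lin's (1998) information-theoretic variant scores two concepts by how much of their combined information content is shared through their most specific common subsumer c0: sim(c1, c2) = 2·log P(c0) / (log P(c1) + log P(c2)), with IC(c) = -log P(c). A minimal sketch, assuming corpus-estimated concept probabilities are supplied by the caller:

```python
from math import log

def lin_similarity(p_lcs, p_c1, p_c2):
    """Lin (1998): twice the information content (IC) of the least
    common subsumer, divided by the summed IC of both concepts.

    p_lcs: probability of the most specific common subsumer
    p_c1, p_c2: probabilities of the two concepts (all in (0, 1))
    """
    ic = lambda p: -log(p)
    return 2.0 * ic(p_lcs) / (ic(p_c1) + ic(p_c2))
```

The score is 1 when both concepts coincide with their subsumer, and decays toward 0 as the subsumer becomes more generic (P(c0) → 1), since a near-certain subsumer carries almost no information.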