Experiments and Evaluations | We first describe our experimental settings and then define the metrics used to evaluate induced soft clusterings of verb classes. |
Experiments and Evaluations | This kind of normalization for soft clusterings was performed for other evaluation metrics as in Springorum et al. |
Experiments and Evaluations | Korhonen et al. (2003) evaluated hard clusterings based on a gold standard with multiple classes per verb.
Introduction | Moreover, to the best of our knowledge, none of the following approaches attempt to quantitatively evaluate soft clusterings of verb classes induced by polysemy-aware unsupervised approaches (Korhonen et al., 2003; Lapata and Brew, 2004; Li and Brew, 2007; Schulte im Walde et al., 2008). |
Consensus Clustering | Our model gives a distribution over phylogenies p (given observations x and learned parameters Φ), and thus gives a posterior distribution over clusterings e, which can be used to answer various queries.
Consensus Clustering | More similar clusterings achieve larger R, with R(e', e) = 1 iff e' = e. In all cases, 0 ≤ R(e', e) = R(e, e') ≤ 1.
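Consensus Clustering | The similarity R above is symmetric, bounded in [0, 1], and equals 1 exactly for identical clusterings; the Rand index is one standard measure with these properties. As an illustration (not necessarily the exact R used here), a minimal Rand index sketch, assuming each clustering is given as a list of cluster labels over the same items:

```python
from itertools import combinations

def rand_index(e1, e2):
    """Rand index between two clusterings over the same n items.

    Counts item pairs on which the clusterings agree (both place the
    pair together, or both place it apart), normalized by the total
    number of pairs. Label identities do not matter, only co-membership.
    """
    n = len(e1)
    agree = 0
    for i, j in combinations(range(n), 2):
        same1 = e1[i] == e1[j]
        same2 = e2[i] == e2[j]
        if same1 == same2:
            agree += 1
    return agree / (n * (n - 1) / 2)
```

Because only co-membership is compared, relabeling a clustering leaves its Rand index with any other clustering unchanged, matching the symmetry and boundedness properties stated above.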
Consensus Clustering | As explained above, the s_ij are coreference probabilities that can be estimated from a sample of clusterings.
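Consensus Clustering | A minimal sketch of this estimate, assuming each sampled clustering is a list of cluster labels over the same items (the function name is illustrative, not from the paper):

```python
def coref_probabilities(samples):
    """Estimate pairwise coreference probabilities s_ij from a sample
    of clusterings drawn from the posterior.

    s_ij is the fraction of sampled clusterings that place items
    i and j in the same cluster, i.e. a Monte Carlo estimate of the
    posterior probability that i and j corefer.
    """
    n = len(samples[0])
    m = len(samples)
    s = [[0.0] * n for _ in range(n)]
    for e in samples:
        for i in range(n):
            for j in range(n):
                if e[i] == e[j]:
                    s[i][j] += 1.0 / m
    return s
```

Each diagonal entry s_ii is 1 by construction, and the matrix is symmetric, since co-membership is a symmetric relation within every sampled clustering.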
Experiments | For PHYLO, the entity clustering is the result of (1) training the model using EM, (2) sampling from the posterior to obtain a distribution over clusterings, and (3) finding a consensus clustering. |