Consensus Decoding Algorithms | For MBR decoding, we instead leverage a similarity measure S(e; e') to choose a translation using the model’s probability distribution P(e|f), which has support over a set of possible translations E. The Viterbi derivation e* is the mode of this distribution.
Consensus Decoding Algorithms | Given any similarity measure S and a k-best list E, the minimum Bayes risk translation can be found by computing the similarity between all pairs of sentences in E, as in Algorithm 1.
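A minimal sketch of the quadratic k-best MBR procedure described above (Algorithm 1 itself is not reproduced in these excerpts); `kbest`, `probs`, and `sim` are placeholder names:

```python
def mbr_decode(kbest, probs, sim):
    """Return the candidate with maximum expected similarity
    (equivalently, minimum Bayes risk) under the k-best posterior.
    Requires O(k^2) calls to the pairwise similarity measure."""
    best, best_score = None, float("-inf")
    for e in kbest:
        # Expected similarity of candidate e against the posterior.
        expected = sum(p * sim(e, e2) for e2, p in zip(kbest, probs))
        if expected > best_score:
            best, best_score = e, expected
    return best
```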
Consensus Decoding Algorithms | An example of a linear similarity measure is bag-of-words precision, which can be written as:
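The formula is dropped in this excerpt. A standard way to write bag-of-words precision (not necessarily the paper's exact notation) is

$$\mathrm{Prec}(e; e') = \frac{1}{|e|} \sum_{t \in e} \mathbf{1}[t \in e'],$$

i.e., the fraction of e's tokens that also occur in e', which is linear in indicator features of e'.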
Introduction | The Bayes optimal decoding objective is to minimize risk based on the similarity measure used for evaluation. |
Introduction | Unfortunately, with a nonlinear similarity measure like BLEU, we must resort to approximating the expected loss using a k-best list, which accounts for only a tiny fraction of a model’s full posterior distribution. |
Introduction | We show that if the similarity measure is linear in features of a sentence, then computing expected similarity for all k sentences requires only k similarity evaluations. |
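Under this linearity condition, the expectation distributes over the reference-side features: one pass over the k-best list yields the expected feature vector, and each candidate is then scored against it once. A sketch for the bag-of-words precision case, with hypothetical `kbest` (token strings) and `probs` (posterior weights):

```python
from collections import defaultdict

def expected_bow_precision(kbest, probs):
    """Expected bag-of-words precision of each candidate under the
    k-best posterior, using k evaluations instead of k^2 pairwise ones."""
    # Expected indicator E[1{t in e'}] for every word type t.
    word_prob = defaultdict(float)
    for e, p in zip(kbest, probs):
        for t in set(e.split()):
            word_prob[t] += p
    scores = []
    for e in kbest:
        toks = e.split()
        # Linearity: E[Prec(e; e')] = (1/|e|) * sum over tokens of P(t in e').
        scores.append(sum(word_prob[t] for t in toks) / len(toks))
    return scores
```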
Problem Formulation | For each term t in the dictionary and each SMS token s_i, we define a similarity measure α(t, s_i) that measures how closely the term t matches the SMS token s_i.
Problem Formulation | Combining the similarity measure and the inverse document frequency (idf) of t in the corpus, we define a weight function ω(t, s_i).
Problem Formulation | The similarity measure and the weight function are discussed in detail in Section 5.1. |
System Implementation | The weight function is a combination of the similarity measure between t and s_i and the Inverse Document Frequency (idf) of t. The next two subsections explain the calculation of the similarity measure and the idf in detail.
System Implementation | 5.1.1 Similarity Measure |
System Implementation | For a term t ∈ D and a token s_i of the SMS, the similarity measure α(t, s_i) between them is
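The definition itself is cut off in this excerpt. Purely as an illustrative stand-in (not the paper's formula), one plausible string similarity is the longest-common-subsequence ratio, combined multiplicatively with idf as the surrounding text describes; every name below is hypothetical:

```python
import math

def lcs_len(a, b):
    """Longest-common-subsequence length via dynamic programming."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, ca in enumerate(a):
        for j, cb in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if ca == cb else max(dp[i][j + 1], dp[i + 1][j])
    return dp[-1][-1]

def alpha(t, s_i):
    # Illustrative only: fraction of t recoverable as a subsequence of s_i.
    return lcs_len(t, s_i) / len(t)

def weight(t, s_i, doc_freq, n_docs):
    # Illustrative combination of similarity and idf; the paper's exact
    # definition is not reproduced in these excerpts.
    return alpha(t, s_i) * math.log(n_docs / doc_freq)
```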
A Statistical Inclusion Measure | Our research goal was to develop a directional similarity measure suitable for learning asymmetric relations, focusing empirically on lexical expansion. |
Abstract | This paper investigates the nature of directional (asymmetric) similarity measures, which aim to quantify distributional feature inclusion.
Background | Then, word vectors are compared by some vector similarity measure.
Conclusions and Future work | This paper advocates the use of directional similarity measures for lexical expansion, and potentially for other tasks, based on distributional inclusion of feature vectors. |
Evaluation and Results | We tested our similarity measure by evaluating its utility for lexical expansion, compared with baselines of the LIN, WeedsPrec and balPrec measures.
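Of the baselines named above, WeedsPrec (Weeds and Weir, 2003) has a standard definition: the weighted proportion of the narrower term's features that are included among the broader term's features, which makes it asymmetric by construction. A sketch, assuming sparse feature vectors stored as dicts of feature weights (e.g., PMI scores):

```python
def weeds_prec(u_feats, v_feats):
    """WeedsPrec(u -> v): how much of u's weighted feature mass is
    covered by v's feature set. Swapping arguments changes the score."""
    total = sum(u_feats.values())
    if total == 0:
        return 0.0
    shared = sum(w for f, w in u_feats.items() if f in v_feats)
    return shared / total
```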
Evaluation and Results | Next, for each similarity measure, the terms found similar to any of the event’s seeds (‘u → seed’) were taken as expansion terms.
Introduction | Often, distributional similarity measures are used to identify expanding terms (e.g. |
Introduction | More generally, directional relations are abundant in NLP settings, making symmetric similarity measures less suitable for their identification. |
Introduction | Despite the need for directional similarity measures, their investigation counts, to the best of our knowledge, only a few works (Weeds and Weir, 2003; Geffet and Dagan, 2005; Bhagat et al., 2007; Szpektor and Dagan, 2008; Michelbacher et al., 2007), and a systematic investigation is still lacking.
Abstract | The best results are obtained with a novel second-order distributional similarity measure, and the positive effect is especially relevant for out-of-domain data.
Conclusions and Future Work | We have empirically shown how automatically generated selectional preferences, using WordNet and distributional similarity measures, are able to effectively generalize lexical features and, thus, improve classification performance in a large-scale argument classification task on the CoNLL-2005 dataset.
Related Work | Pantel and Lin (2000) obtained very good results using the distributional similarity measure defined by Lin (1998). |
Results and Discussion | The second-order distributional similarity measures perform best overall, both in precision and recall. |
Selectional Preference Models | We will refer to this similarity measure as sim_lin.
Selectional Preference Models | We will refer to these similarity measures as sim_lin_2 and sim_jac_2 hereinafter.
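These excerpts do not reproduce the paper's definitions, but a common construction of a second-order measure represents each word by its vector of first-order similarities to other words and then compares those vectors; a sketch under that assumption, with hypothetical `first_order_sim` and `vocab`:

```python
import math

def cosine(u, v):
    """Cosine similarity between sparse vectors stored as dicts."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def second_order_sim(a, b, first_order_sim, vocab):
    """Compare two words via their profiles of first-order
    similarities over the vocabulary, not their direct contexts."""
    pa = {w: first_order_sim(a, w) for w in vocab}
    pb = {w: first_order_sim(b, w) for w in vocab}
    return cosine(pa, pb)
```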
Evaluation setting and results | Our method was evaluated for each (P1, P2, P3) combination and similarity measures J_c and J_n, separately.
Introduction and related work | In this paper, we propose a novel unsupervised approach that compares the major senses of an MWE and its semantic head using distributional similarity measures to test the compositionality of the MWE.
Proposed approach | Lee (1999) shows that J performs better than other symmetric similarity measures such as cosine, Jensen-Shannon divergence, etc. |
Proposed approach | Given the major uses of an MWE and its semantic head, the MWE is considered compositional when the corresponding distributional similarity measure (J_c or J_n) value is above a parameter threshold, sim.
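A sketch of this decision rule; `sim_fn`, the vector arguments, and `threshold` are placeholder names for the chosen measure and the tuned parameter sim:

```python
def is_compositional(mwe_vec, head_vec, sim_fn, threshold):
    """An MWE is judged compositional when the distributional
    similarity between its major use and that of its semantic head
    exceeds a tuned threshold."""
    return sim_fn(mwe_vec, head_vec) >= threshold
```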
Unsupervised parameter tuning | The best performing distributional similarity measure is J_n.
Experiments | Sim1 and Sim2 mean that Formula 3.1 and Formula 3.2, respectively, are used in postprocessing as the similarity measure between
Experiments | Sim2 achieves a larger performance improvement than Sim1, which demonstrates the effectiveness of the similarity measure in Formula 3.2.
Our Approach | One simple and straightforward similarity measure is the Jaccard
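The Jaccard coefficient itself is standard: the ratio of shared to total elements between two sets. A minimal sketch over token sets:

```python
def jaccard(a, b):
    """Jaccard coefficient |A intersect B| / |A union B| between two
    sets; returns 0.0 for two empty sets by convention."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Example: jaccard("strong tea".split(), "strong coffee".split()) == 1/3
```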