Adaptive Online Algorithms | We specify the loss function for MT in Section 3.1.
Adaptive Online Algorithms | The relationship to SGD can be seen by linearizing the loss function, $\ell_t(w) \approx \ell_t(w_{t-1}) + (w - w_{t-1})^\top \nabla \ell_t(w_{t-1})$, and taking the derivative of (6).
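Adaptive Online Algorithms | To make this step explicit (a sketch, assuming the objective in (6) pairs the linearized loss with a standard proximal term, which is not spelled out here): minimizing
\[
\ell_t(w_{t-1}) + (w - w_{t-1})^\top \nabla \ell_t(w_{t-1}) + \frac{1}{2\eta}\|w - w_{t-1}\|_2^2
\]
over $w$ by setting the gradient to zero gives $\nabla \ell_t(w_{t-1}) + \frac{1}{\eta}(w - w_{t-1}) = 0$, i.e. the familiar SGD update $w_t = w_{t-1} - \eta \nabla \ell_t(w_{t-1})$.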
Adaptive Online Algorithms | MIRA/AROW requires selecting the loss function $\ell$ so that $w_t$ can be solved in closed form, by a quadratic program (QP), or in some other way that is better than linearizing.
Adaptive Online MT | AdaGrad (lines 9–10) is a crucial piece, but the loss function, regularization technique, and parallelization strategy described in this section are equally important in the MT setting.
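Adaptive Online MT | A minimal sketch of the diagonal AdaGrad update referred to here (the learning rate eta and smoothing constant eps are illustrative defaults, not the paper's settings):

    import numpy as np

    def adagrad_step(w, g, G, eta=0.1, eps=1e-8):
        """One diagonal AdaGrad update.
        w: parameters; g: gradient of the current loss at w;
        G: running sum of squared gradients, updated in place."""
        G += g * g                          # accumulate squared gradients per coordinate
        w -= eta * g / (np.sqrt(G) + eps)   # per-coordinate adaptive step size
        return w, G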
Adaptive Online MT | 3.1 Pairwise Logistic Loss Function |
Adaptive Online MT | The pairwise approach results in simple, convex loss functions suitable for online learning. |
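Adaptive Online MT | As a sketch of one such convex objective, a logistic loss over a single hypothesis pair ordered by translation quality (the function and variable names are illustrative, not the paper's implementation):

    import numpy as np

    def pairwise_logistic_loss(w, feats_better, feats_worse):
        """Logistic loss on one hypothesis pair: penalized when the
        lower-quality translation outscores the higher-quality one."""
        d = feats_better - feats_worse       # difference of feature vectors
        margin = np.dot(w, d)                # positive when the pair is ranked correctly
        return np.logaddexp(0.0, -margin)    # log(1 + exp(-margin)), convex in w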
Experiments | However, we found no simple way to change the relative performance characteristics of our various systems; notably, modifying the parameters of the loss function mentioned in Section 4 or changing it entirely did not trade off these three metrics but merely increased or decreased them in lockstep. |
Learning | This optimizes for the 0-1 loss; however, we are much more interested in optimizing with respect to a coreference-specific loss function.
Learning | We modify Equation 1 to use a new probability distribution $P'$ instead of $P$, where $P'(a \mid x_i) \propto P(a \mid x_i)\exp(l(a, C))$ and $l(a, C)$ is a loss function.
Learning | In order to perform inference efficiently, $l(a, C)$ must decompose linearly across mentions: $l(a, C) = \sum_{i=1}^{n} l(a_i, C)$. Commonly used coreference metrics such as MUC (Vilain et al., 1995) and $B^3$ (Bagga and Baldwin, 1998) do not have this property, so we instead make use of a parameterized loss function that does and fit the parameters to give good performance.
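Learning | A sketch of how the decomposition permits efficient inference: each mention's loss-augmented distribution can be computed and normalized independently (the array names here are illustrative; log_p holds log P(a|x_i) over the candidate antecedents of one mention, and loss holds l(a, C) for the same candidates):

    import numpy as np

    def loss_augmented_distribution(log_p, loss):
        """P'(a|x_i), proportional to P(a|x_i) exp(l(a, C)), for one mention i."""
        scores = log_p + loss     # log P(a|x_i) + l(a, C)
        scores -= scores.max()    # stabilize the exponentiation
        p = np.exp(scores)
        return p / p.sum()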
Related Work | Our BASIC model is a mention-ranking approach resembling models used by Denis and Baldridge (2008) and Rahman and Ng (2009), though it is trained using a novel parameterized loss function.
Learning Representation for Contextual Document | The loss function is defined as the negative log of the softmax function:
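Learning Representation for Contextual Document | In a standard form of this objective (the notation is assumed here rather than taken from the paper), with $s(d, c)$ the model's score for candidate $c$ in document context $d$, the loss on a positive candidate $c^+$ against a candidate set $\mathcal{C}$ is
\[
\mathcal{L} = -\log \frac{\exp(s(d, c^+))}{\sum_{c' \in \mathcal{C}} \exp(s(d, c'))}.
\]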
Learning Representation for Contextual Document | The loss function is closely related to contrastive estimation (Smith and Eisner, 2005), which defines where the positive example takes probability mass from. |
Learning Representation for Contextual Document | In our experiments, the softmax loss function consistently outperforms the pairwise ranking loss function, so we take the softmax loss as our default setting.
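Learning Representation for Contextual Document | A minimal sketch contrasting the two objectives over one positive score and an array of negative scores (a simplified setting; the paper's exact formulation may differ):

    import numpy as np

    def softmax_loss(pos_score, neg_scores):
        """Negative log softmax: the positive competes with all negatives jointly."""
        scores = np.concatenate(([pos_score], neg_scores))
        scores -= scores.max()    # numerical stability
        return -(scores[0] - np.log(np.exp(scores).sum()))

    def pairwise_ranking_loss(pos_score, neg_scores, margin=1.0):
        """Margin-based pairwise loss: the positive competes with each negative separately."""
        return np.maximum(0.0, margin - (pos_score - neg_scores)).sum()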
Experiments | The loss function in our evaluation is the standard Mean Square Error (MSE), but to allow a better interpretation of the results, we display its root (RMSE) in tables and figures.
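Experiments | For reference, the relationship between the two quantities (a two-line sketch):

    import numpy as np

    def rmse(y_true, y_pred):
        """Root Mean Square Error: the square root of the MSE loss."""
        return np.sqrt(np.mean((y_true - y_pred) ** 2))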
Methods | Equation 2 is the standard regularised loss function, namely the sum squared error over the training instances.
Methods | Biconvex functions and possible applications have been well studied in the optimisation literature (Quesada and Grossmann, 1995). Note that other loss functions could be used here, such as logistic loss for classification, or, more generally, bilinear losses; see the sketch below.
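Methods | A sketch of why biconvexity is useful in practice: for a bilinear scorer $u^\top X_i v$ with sum-squared-error loss, the objective is convex in $u$ for fixed $v$ and vice versa, so alternating minimization solves an ordinary (here ridge-regularized) least-squares problem at each half-step. The bilinear form and names below are illustrative, not the paper's model:

    import numpy as np

    def alternating_least_squares(X, y, iters=20, reg=1e-3):
        """Fit a bilinear scorer s(X_i) = u^T X_i v by alternating minimization.
        X: array of shape (n, d1, d2); y: real-valued targets of shape (n,)."""
        n, d1, d2 = X.shape
        u, v = np.ones(d1), np.ones(d2)
        for _ in range(iters):
            A = X @ v                          # (n, d1): design matrix for the u-step
            u = np.linalg.solve(A.T @ A + reg * np.eye(d1), A.T @ y)
            B = np.einsum('i,nij->nj', u, X)   # (n, d2): design matrix for the v-step
            v = np.linalg.solve(B.T @ B + reg * np.eye(d2), B.T @ y)
        return u, v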
Deceptive Answer Prediction with User Preference Graph | where $L(w^\top x_i, y_i)$ is a loss function that measures the discrepancy between the predicted label $w^\top x_i$ and the true label $y_i$, where $y_i \in \{+1, -1\}$.
Deceptive Answer Prediction with User Preference Graph | Commonly used loss functions include $L(p, y) = (p - y)^2$ (least squares) and $L(p, y) = \ln(1 + \exp(-py))$ (logistic regression).
Deceptive Answer Prediction with User Preference Graph | For simplicity, we use the least squares loss function here.
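Deceptive Answer Prediction with User Preference Graph | The two candidate losses above as executable one-liners, over a prediction p = w·x and a label y in {+1, -1} (a sketch for reference):

    import numpy as np

    def least_square_loss(p, y):
        """L(p, y) = (p - y)^2"""
        return (p - y) ** 2

    def logistic_loss(p, y):
        """L(p, y) = ln(1 + exp(-p*y)), computed stably."""
        return np.logaddexp(0.0, -p * y)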
Related Work | Instance weighting is a method for domain adaptation in which instance-dependent weights are assigned to the loss function that is minimized during the training process. |
Related Work | Let $l(x, y, \theta)$ be some loss function.
Related Work | Then, as shown in Jiang and Zhai (2007), the loss function can be weighted by $\beta_i l(x, y, \theta)$, such that $\beta_i = \frac{P_t(x_i, y_i)}{P_s(x_i, y_i)}$, where $P_s$ and $P_t$ are the source and target distributions, respectively.
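Related Work | A sketch of the resulting weighted objective, with per-example weights beta approximating P_t(x_i, y_i) / P_s(x_i, y_i); how those weights are estimated is left abstract, and the names are illustrative:

    import numpy as np

    def weighted_training_loss(losses, beta):
        """Instance-weighted objective: sum_i beta_i * l(x_i, y_i, theta).
        losses: per-example losses on source-domain data;
        beta:   importance weights approximating P_t / P_s per example."""
        return float(np.dot(beta, losses))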