Index of papers in Proc. ACL 2013 that mention
  • loss function
Green, Spence and Wang, Sida and Cer, Daniel and Manning, Christopher D.
Adaptive Online Algorithms
We specify the loss function for MT in section 3.1.
Adaptive Online Algorithms
The relationship to SGD can be seen by linearizing the loss function $\ell_t(w) \approx \ell_t(w_{t-1}) + (w - w_{t-1})^\top \nabla \ell_t(w_{t-1})$ and taking the derivative of (6).
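As a brief sketch of why this recovers the SGD update (assuming, for illustration, that (6) is the usual proximal objective with step size $\eta$; the exact form of (6) is not reproduced in this excerpt):

\begin{aligned}
w_t &= \arg\min_{w}\; \ell_t(w_{t-1}) + (w - w_{t-1})^\top \nabla\ell_t(w_{t-1}) + \tfrac{1}{2\eta}\lVert w - w_{t-1}\rVert_2^2, \\
0 &= \nabla\ell_t(w_{t-1}) + \tfrac{1}{\eta}(w_t - w_{t-1}) \;\Longrightarrow\; w_t = w_{t-1} - \eta\,\nabla\ell_t(w_{t-1}),
\end{aligned}

so setting the derivative of the linearized objective to zero yields exactly the SGD step.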
Adaptive Online Algorithms
MIRA/AROW requires selecting the loss function $\ell_t$ so that $w_t$ can be solved in closed form, by a quadratic program (QP), or in some other way that is better than linearizing.
Adaptive Online MT
AdaGrad (lines 9–10) is a crucial piece, but the loss function, regularization technique, and parallelization strategy described in this section are equally important in the MT setting.
Adaptive Online MT
3.1 Pairwise Logistic Loss Function
Adaptive Online MT
The pairwise approach results in simple, convex loss functions suitable for online learning.
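A minimal sketch of a pairwise logistic loss of this kind (hypothetical names and feature vectors; the paper's sampling of hypothesis pairs is not shown in this excerpt), where x_pos is the hypothesis preferred by the evaluation metric:

    import numpy as np

    def pairwise_logistic_loss(w, x_pos, x_neg):
        # Score difference between the metric-preferred hypothesis (x_pos)
        # and the dispreferred one (x_neg) under the current weights w.
        margin = w.dot(x_pos) - w.dot(x_neg)
        # Convex logistic loss: small when x_pos scores well above x_neg.
        return np.log1p(np.exp(-margin))

    def pairwise_logistic_grad(w, x_pos, x_neg):
        margin = w.dot(x_pos) - w.dot(x_neg)
        # d/dw log(1 + exp(-margin)) = -(x_pos - x_neg) / (1 + exp(margin))
        return -(x_pos - x_neg) / (1.0 + np.exp(margin))

Because the loss is convex in w, it plugs directly into online (sub)gradient methods such as AdaGrad.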
loss function is mentioned in 7 sentences in this paper.
Durrett, Greg and Hall, David and Klein, Dan
Experiments
However, we found no simple way to change the relative performance characteristics of our various systems; notably, modifying the parameters of the loss function mentioned in Section 4 or changing it entirely did not trade off these three metrics but merely increased or decreased them in lockstep.
Learning
This optimizes for the 0-1 loss; however, we are much more interested in optimizing with respect to a coreference-specific loss function.
Learning
We modify Equation 1 to use a new probability distribution $P'$ instead of $P$, where $P'(a|x_i) \propto P(a|x_i)\exp(\ell(a, C))$ and $\ell(a, C)$ is a loss function.
Learning
In order to perform inference efficiently, $\ell(a, C)$ must decompose linearly across mentions: $\ell(a, C) = \sum_{i=1}^{n} \ell(a_i, C)$. Commonly used coreference metrics such as MUC (Vilain et al., 1995) and $B^3$ (Bagga and Baldwin, 1998) do not have this property, so we instead make use of a parameterized loss function that does and fit the parameters to give good performance.
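A hedged sketch of such a loss-augmented distribution for one mention (the names scores and losses are illustrative, not the paper's code): since $P(a|x_i)$ is a softmax over antecedent scores, multiplying by $\exp(\ell(a, C))$ simply adds the per-antecedent loss to the logits, upweighting error-prone antecedent choices during training.

    import numpy as np

    def loss_augmented_distribution(scores, losses):
        # P'(a|x_i) ∝ P(a|x_i) * exp(l(a, C)); with P a softmax over
        # antecedent scores, this is a softmax over (scores + losses).
        logits = scores + losses
        logits = logits - logits.max()   # numerical stability
        p = np.exp(logits)
        return p / p.sum()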
Related Work
Our BASIC model is a mention-ranking approach resembling models used by Denis and Baldridge (2008) and Rahman and Ng (2009), though it is trained using a novel parameterized loss function.
loss function is mentioned in 5 sentences in this paper.
He, Zhengyan and Liu, Shujie and Li, Mu and Zhou, Ming and Zhang, Longkai and Wang, Houfeng
Learning Representation for Contextual Document
The loss function is defined as the negative log of the softmax function:
Learning Representation for Contextual Document
The loss function is closely related to contrastive estimation (Smith and Eisner, 2005), which defines where the positive example takes probability mass from.
Learning Representation for Contextual Document
In our experiments, the softmax loss function consistently outperforms the pairwise ranking loss function, and is therefore taken as our default setting.
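As a purely illustrative sketch (generic forms of the two losses; the paper's exact formulas are not reproduced in this excerpt), the comparison is between a softmax negative log-likelihood over all candidates and a margin-based pairwise ranking loss:

    import numpy as np

    def softmax_nll(scores, pos_index):
        # Negative log of the softmax probability of the positive example.
        logits = scores - scores.max()   # numerical stability
        log_z = np.log(np.exp(logits).sum())
        return log_z - logits[pos_index]

    def pairwise_ranking_loss(scores, pos_index, margin=1.0):
        # Hinge-style ranking loss: the positive example must outscore
        # every negative example by at least `margin`.
        pos = scores[pos_index]
        neg = np.delete(scores, pos_index)
        return np.maximum(0.0, margin - pos + neg).sum()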
loss function is mentioned in 3 sentences in this paper.
Lampos, Vasileios and Preoţiuc-Pietro, Daniel and Cohn, Trevor
Experiments
The loss function in our evaluation is the standard Mean Square Error (MSE), but to allow a better interpretation of the results, we display its root (RMSE) in tables and figures.
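For reference, RMSE is simply the square root of MSE, reported because it is on the same scale as the target variable:

    import numpy as np

    def mse(y_true, y_pred):
        return np.mean((y_true - y_pred) ** 2)

    def rmse(y_true, y_pred):
        # Same quantity, in the units of the original targets.
        return np.sqrt(mse(y_true, y_pred))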
Methods
The squared-norm term is the standard regularisation loss function, namely the sum squared error over the training instances.
Methods
Biconvex functions and possible applications have been well studied in the optimisation literature (Quesada and Grossmann, 1995). Note that other loss functions could be used here, such as logistic loss for classification, or more generally bilinear
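A minimal sketch of the alternating strategy commonly applied to biconvex objectives of this kind (a generic illustration assuming a bilinear model $y_i \approx a^\top X_i b$, not the paper's exact solver): fixing either factor leaves a convex regularised least-squares problem in the other.

    import numpy as np

    def ridge(F, y, lam):
        # Closed-form regularised least squares: (F'F + lam*I)^{-1} F'y.
        d = F.shape[1]
        return np.linalg.solve(F.T @ F + lam * np.eye(d), F.T @ y)

    def alternating_bilinear(Xs, y, iters=20, lam=1.0):
        # Fit y_i ≈ a' X_i b by alternating convex ridge solves; the
        # objective is convex in a for fixed b, and vice versa.
        p, q = Xs[0].shape
        a, b = np.ones(p), np.ones(q)
        for _ in range(iters):
            Fa = np.stack([X @ b for X in Xs])    # design matrix for a
            a = ridge(Fa, y, lam)
            Fb = np.stack([X.T @ a for X in Xs])  # design matrix for b
            b = ridge(Fb, y, lam)
        return a, b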
loss function is mentioned in 3 sentences in this paper.
Li, Fangtao and Gao, Yang and Zhou, Shuchang and Si, Xiance and Dai, Decheng
Deceptive Answer Prediction with User Preference Graph
where $L(w^\top x_i, y_i)$ is a loss function that measures discrepancy between the predicted label $w^\top x_i$ and the true label $y_i$, where $y_i \in \{+1, -1\}$.
Deceptive Answer Prediction with User Preference Graph
Commonly used loss functions include $L(p, y) = (p - y)^2$ (least squares) and $L(p, y) = \ln(1 + \exp(-py))$ (logistic regression).
Deceptive Answer Prediction with User Preference Graph
For simplicity, here we use the least squares loss function.
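A small sketch of the two losses as written above (a generic implementation, not the authors' code), with $p = w^\top x$ and $y \in \{+1, -1\}$:

    import numpy as np

    def least_squares_loss(p, y):
        # L(p, y) = (p - y)^2
        return (p - y) ** 2

    def logistic_loss(p, y):
        # L(p, y) = ln(1 + exp(-p*y)); log1p improves numerical stability.
        return np.log1p(np.exp(-p * y))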
loss function is mentioned in 3 sentences in this paper.
Plank, Barbara and Moschitti, Alessandro
Related Work
Instance weighting is a method for domain adaptation in which instance-dependent weights are assigned to the loss function that is minimized during the training process.
Related Work
Let $\ell(x, y, \theta)$ be some loss function.
Related Work
Then, as shown in Jiang and Zhai (2007), the loss function can be weighted by $\beta_i\,\ell(x, y, \theta)$, such that $\beta_i = \frac{P_t(x_i, y_i)}{P_s(x_i, y_i)}$, where $P_s$ and $P_t$ are the source and target distributions, respectively.
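A hedged sketch of this instance-weighting scheme (the density values are assumed to be given here; estimating $P_t/P_s$ is the hard part in practice):

    import numpy as np

    def weighted_training_loss(losses, p_target, p_source):
        # Scale each source instance's loss l(x_i, y_i, theta) by
        # beta_i = P_t(x_i, y_i) / P_s(x_i, y_i), following the
        # instance-weighting view of Jiang and Zhai (2007).
        beta = p_target / p_source
        return np.sum(beta * losses)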
loss function is mentioned in 3 sentences in this paper.