Experiments | As shown in Table 1, optimizing this objective function gives a ROUGE-1 F-measure score of 32.44%. |
Experiments | Figure 1: ROUGE-1 F-measure scores on DUC-03 when α and K vary in objective function L1(S) + λR1(S), |
Introduction | Of course, none of this is useful if the objective function F is inappropriate for the summarization task. |
Monotone Submodular Objectives | Objective functions for extractive summarization usually measure these two qualities separately and then combine them, trading off rewarding relevance against penalizing redundancy. |
Monotone Submodular Objectives | The redundancy penalty usually violates the monotonicity of the objective functions (Carbonell and Goldstein, 1998; Lin and Bilmes, 2010). |
Submodularity in Summarization | (1999) on the budgeted maximum cover problem to the general submodular framework, and show a practical greedy algorithm with a (1 − 1/√e)-approximation factor, where each greedy step adds the unit with the largest ratio of objective function gain to scaled cost, while not violating the budget constraint (see (Lin and Bilmes, 2010) for details). |
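A minimal sketch of this greedy step, assuming a `gain` function, per-unit `cost` function, and cost-scaling exponent `r` (all names illustrative, not the paper's notation):

```python
def budgeted_greedy(units, gain, cost, budget, r=1.0):
    """Greedily build a summary S under a budget constraint.

    At each step, add the unit with the largest ratio of
    objective-function gain to scaled cost, skipping any unit
    that would exceed the budget.
    """
    S, spent = [], 0.0
    remaining = set(units)
    while remaining:
        best, best_ratio = None, float("-inf")
        for u in remaining:
            if spent + cost(u) > budget:
                continue  # adding u would violate the budget
            ratio = gain(S, u) / (cost(u) ** r)
            if ratio > best_ratio:
                best, best_ratio = u, ratio
        if best is None:
            break  # nothing affordable remains
        S.append(best)
        spent += cost(best)
        remaining.remove(best)
    return S
```

For the stated guarantee, the paper's full procedure additionally compares this greedy solution against the best single unit that fits within the budget.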
Submodularity in Summarization | In particular, Carbonell and Goldstein (1998) define an objective function gain of adding element k to set S (k ∉ S) as: |
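The MMR-style gain of Carbonell and Goldstein (1998) trades query relevance off against similarity to already-selected elements; a sketch, where `sim1`, `sim2`, and `lam` are illustrative names for the two similarity measures and the trade-off weight:

```python
def mmr_gain(k, S, query, sim1, sim2, lam):
    """Maximal-marginal-relevance gain of adding element k to set S:
    relevance to the query minus the maximum similarity to any
    already-selected element (zero redundancy when S is empty)."""
    redundancy = max((sim2(k, s) for s in S), default=0.0)
    return lam * sim1(k, query) - (1.0 - lam) * redundancy
```

The subtracted redundancy term is the kind of penalty that can break monotonicity, as noted above.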
Submodularity in Summarization | Although the authors may not have noticed, their objective functions are also submodular, providing further evidence that submodularity is natural for summarization tasks. |
Introduction | The full objective function of the model thus learns semantic vectors that are imbued with nuanced sentiment information. |
Our Model | We can efficiently learn parameters for the joint objective function using alternating maximization. |
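Alternating maximization in general improves a joint objective over two parameter blocks by repeatedly optimizing one block with the other held fixed; a generic sketch (the argmax oracles below are illustrative, not the paper's model):

```python
def alternating_maximization(u0, v0, argmax_u, argmax_v, iters=50):
    """Maximize a joint objective f(u, v) by alternating
    block-coordinate updates; each exact argmax step cannot
    decrease the objective, so the iterates monotonically improve."""
    u, v = u0, v0
    for _ in range(iters):
        u = argmax_u(v)  # best u with v held fixed
        v = argmax_v(u)  # best v with u held fixed
    return u, v
```

For instance, with f(u, v) = −(u − v)² − (v − 3)², the block updates are u = v and v = (u + 3)/2, and the iterates converge to the maximizer (3, 3).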
Our Model | This produces a final objective function of, |
Related work | We adopt this insight, but we are able to incorporate it directly into our model’s objective function. |
Experimental Setup | The reason is that the objective function maximizes mutual information. |
Experimental Setup | Highly differentiated classes for frequent words contribute substantially to this objective function whereas putting all rare words in a few large clusters does not hurt the objective much. |
Experimental Setup | However, our focus is on using clustering for improving prediction for rare events; this means that the objective function is counterproductive when contexts are frequency-weighted as they occur in the corpus. |
PCS Induction | We trained this model by optimizing the following objective function: |
PCS Projection | The first term in the objective function is the graph smoothness regularizer which encourages the distributions of similar vertices (large wij) to be similar. |
PCS Projection | While it is possible to derive a closed-form solution for this convex objective function, it would require the inversion of a matrix of order |Vf|. |
Constraints on Inter-Domain Variability | We augment the multi-conditional log-likelihood L(θ, α) with the weighted regularization term G(θ) to get the composite objective function: |
Empirical Evaluation | The initial learning rate and the weight decay (the inverse squared variance of the Gaussian prior) were set to 0.01, and both parameters were reduced by a factor of 2 after every iteration in which the objective function estimate went down. |
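A schedule of this shape (halve both hyperparameters whenever the objective estimate decreases) can be sketched as follows; the function name and arguments are illustrative:

```python
def anneal(objective_history, lr=0.01, weight_decay=0.01):
    """Halve the learning rate and weight decay after every
    iteration in which the objective estimate went down."""
    for prev, curr in zip(objective_history, objective_history[1:]):
        if curr < prev:  # objective estimate decreased this iteration
            lr /= 2.0
            weight_decay /= 2.0
    return lr, weight_decay
```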
Learning and Inference | The stochastic gradient descent algorithm iterates over examples and updates the weight vector based on the contribution of every considered example to the objective function L_R(θ, α, β). |
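A per-example stochastic gradient step of this form can be sketched as follows; the gradient oracle, parameter vector, and hyperparameters are illustrative, not the paper's:

```python
def sgd(theta, examples, grad, lr=0.01, epochs=1):
    """Iterate over examples, updating the weight vector by each
    example's contribution to the objective's gradient (ascent,
    for a log-likelihood-style objective to be maximized)."""
    for _ in range(epochs):
        for x in examples:
            g = grad(theta, x)  # this example's gradient contribution
            theta = [t + lr * gi for t, gi in zip(theta, g)]
    return theta
```

For example, maximizing −(θ − x)² over a single example x = 1.0 gives the gradient 2(x − θ), and repeated updates drive θ toward 1.0.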