Budgeted Submodular Maximization with Cost Function | The algorithm iteratively adds to the current summary the element s_i that has the largest ratio of objective function gain to additional cost, unless adding it violates the budget constraint.
Budgeted Submodular Maximization with Cost Function | After the loop, the algorithm compares G_i with the singleton {s*} that has the largest value of the objective function among all singletons that are under the budget, and it outputs the summary candidate with the larger value.
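The two-stage procedure described above (cost-scaled greedy selection, then comparison against the best affordable singleton) can be sketched as follows; `budgeted_greedy`, the toy coverage function `f`, and the cost-scaling exponent `r` are illustrative names, not the paper's implementation.

```python
def budgeted_greedy(elements, cost, budget, f, r=1.0):
    """Cost-scaled greedy for budgeted monotone submodular maximization.

    Sketch under stated assumptions: f maps a set of elements to a score,
    cost maps each element to its cost, r scales the cost in the ratio.
    """
    G, spent = set(), 0.0
    remaining = set(elements)
    while remaining:
        # Pick the affordable element with the best gain-to-cost ratio.
        best, best_ratio = None, float("-inf")
        for e in remaining:
            if spent + cost[e] > budget:
                continue
            gain = f(G | {e}) - f(G)
            ratio = gain / (cost[e] ** r)
            if ratio > best_ratio:
                best, best_ratio = e, ratio
        if best is None:
            break  # nothing affordable remains
        G.add(best)
        spent += cost[best]
        remaining.remove(best)
    # Compare against the best singleton that fits in the budget.
    singletons = [{e} for e in elements if cost[e] <= budget]
    best_singleton = max(singletons, key=f, default=set())
    return G if f(G) >= f(best_singleton) else best_singleton

# Toy coverage instance (hypothetical): f counts distinct covered words.
cover = {'a': {1, 2}, 'b': {2, 3}, 'c': {4}}
f = lambda S: len(set().union(*(cover[e] for e in S))) if S else 0
summary = budgeted_greedy(['a', 'b', 'c'], {'a': 1, 'b': 2, 'c': 1}, budget=2, f=f)
```

Here the greedy pass picks 'a' (ratio 2) and then 'c' ('b' no longer fits the budget), covering three words, which beats every affordable singleton.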
Joint Model of Extraction and Compression | 4.1 Objective Function |
Joint Model of Extraction and Compression | We designed our objective function by combining this relevance score with a penalty for redundancy and for overly compressed sentences.
Joint Model of Extraction and Compression | The behavior can be represented by a submodular objective function that reduces word scores depending on those already included in the summary. |
Experiments | of our system that approximates the submodular objective function proposed by Lin and Bilmes (2011). As shown in the results, our best system, which uses the h_s dispersion function, achieves a better ROUGE-1 F-score than all other systems.
Experiments | (3) We also analyze the contributions of individual components of the new objective function towards summarization performance by selectively setting certain parameters to 0. |
Experiments | However, since the individual components within our objective function are parametrized, it is easy to tune them for a specific task or genre.
Framework | We start by describing a generic objective function that can be widely applied to several summarization scenarios. |
Framework | This objective function is the sum of a monotone submodular coverage function and a non-submodular dispersion function. |
Framework | We then describe a simple greedy algorithm for optimizing this objective function with provable approximation guarantees for three natural dispersion functions. |
Using the Framework | generate a graph and instantiate our summarization objective function with specific components that capture the desiderata of a given summarization task. |
Using the Framework | We model this property in our objective function as follows. |
Using the Framework | We then add this component to our objective function as w(S) = Σ_{u∈S} w(u).
Experiments | As shown in Table 1, optimizing this objective function gives a ROUGE-1 F-measure score of 32.44%.
Experiments | Figure 1: ROUGE-1 F-measure scores on DUC-03 when α and K vary in objective function L1(S) + λR1(S),
Introduction | Of course, none of this is useful if the objective function F is inappropriate for the summarization task.
Monotone Submodular Objectives | Objective functions for extractive summarization usually measure these two separately and then mix them together, trading off encouraging relevance and penalizing redundancy.
Monotone Submodular Objectives | The redundancy penalty usually violates the monotonicity of the objective functions (Carbonell and Goldstein, 1998; Lin and Bilmes, 2010). |
Submodularity in Summarization | (1999) on the budgeted maximum cover problem to the general submodular framework, and show a practical greedy algorithm with a (1 − 1/√e)-approximation factor, where each greedy step adds the unit with the largest ratio of objective function gain to scaled cost, while not violating the budget constraint (see Lin and Bilmes (2010) for details).
Submodularity in Summarization | In particular, Carbonell and Goldstein (1998) define an objective function gain of adding element k to set S (k ∉ S) as:
Submodularity in Summarization | Although the authors may not have noticed, their objective functions are also submodular, adding more evidence suggesting that submodularity is natural for summarization tasks. |
Inference with First Order Variables | Express the inference objective as a linear objective function; and
Inference with First Order Variables | For the grammatical error correction task, the variables in ILP are indicators of the corrections that a word needs, the objective function measures how grammatical the whole sentence is if some corrections are accepted, and the constraints guarantee that the corrections do not conflict with each other. |
Inference with First Order Variables | 3.2 The Objective Function |
Inference with Second Order Variables | (17) A new objective function combines the weights from both first and second order variables: |
Introduction | Variables of ILP are indicators of possible grammatical error corrections, the objective function aims to select the best set of corrections, and the constraints help to enforce a valid and grammatical output. |
Image-level Content Planning | 4.1 Variables and Objective Function
Image-level Content Planning | The following set of indicator variables encodes the selection of objects and ordering:
Image-level Content Planning | The objective function , F, that we will maximize is a weighted linear combination of these indicator variables and can be optimized using integer linear programming: |
Image-level Content Planning | We use IBM CPLEX to optimize this objective function subject to the constraints introduced next in §4.2. |
Surface Realization | 5.1 Variables and Objective Function
Surface Realization | The following set of variables encodes the selection of phrases and their ordering in constructing the output sentences.
Surface Realization | Finally, we define the objective function F as: |
Surface Realization | the objective function (Eq. |
Our Approach | When ignoring the coupling between V_p, it can be solved by minimizing the objective function as follows:
Our Approach | Combining equations (1) and (2), we get the following objective function: |
Our Approach | If we set a small value for λ_p, the objective function behaves like traditional NMF and the importance of data sparseness is emphasized; a large value of λ_p indicates that V_p should be very close to V_l, and equation (3) aims to remove the noise introduced by statistical machine translation.
Discussion and Related Work | Ando and Zhang (2005) defined an objective function that combines the original problem on the labeled data with a set of auxiliary problems on unlabeled data. |
Discussion and Related Work | The combined objective function is then alternatingly optimized with the labeled and unlabeled data. |
Introduction | The learning algorithm then optimizes a regularized, convex objective function that is expressed in terms of these features. |
Introduction | distributed clustering algorithm with a similar objective function as the Brown algorithm. |
Query Classification | We made a small modification to the objective function for logistic regression to take into account the prior distribution and to use 50% as a uniform decision boundary for all the classes. |
Query Classification | When training the classifier for a class with p positive examples out of a total of n examples, we change the objective function to:
Query Classification | We suspect that such features make the optimization of the objective function much more difficult. |
Conclusion | We are also interested in ways to modify the objective function to place more emphasis on learning a good joint model, instead of equally weighting the learning of the joint and single-task models. |
Hierarchical Joint Learning | L-BFGS and gradient descent, two frequently used numerical optimization algorithms, require computing the value and partial derivatives of the objective function using the entire training set. |
Hierarchical Joint Learning | It requires a stochastic objective function, which is meant to be a low computational cost estimate of the real objective function.
Hierarchical Joint Learning | In most NLP models, such as logistic regression with a Gaussian prior, computing the stochastic objective function is fairly straightforward: you compute the model likelihood and partial derivatives for a randomly sampled subset of the training data. |
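The procedure described here can be sketched for L2-regularized logistic regression: compute the likelihood and partial derivatives on a random subset, rescaled so the estimate is unbiased. All names are hypothetical; the data is assumed to be a list of `(features, label)` pairs with labels in {0, 1}.

```python
import math
import random

def stochastic_objective(w, data, batch_size, sigma2=1.0, rng=random):
    """Estimate the regularized negative log-likelihood and its gradient
    from a random subset of the training data (illustrative sketch)."""
    batch = rng.sample(data, batch_size)
    scale = len(data) / batch_size  # rescale so the estimate is unbiased
    # Gaussian prior term (always computed exactly).
    nll = sum(w_i * w_i for w_i in w) / (2 * sigma2)
    grad = [w_i / sigma2 for w_i in w]
    for x, y in batch:  # y in {0, 1}
        z = sum(w_i * x_i for w_i, x_i in zip(w, x))
        p = 1.0 / (1.0 + math.exp(-z))
        nll -= scale * (y * math.log(p) + (1 - y) * math.log(1 - p))
        for i, x_i in enumerate(x):
            grad[i] += scale * (p - y) * x_i
    return nll, grad
```

With `batch_size = len(data)` this reduces to the exact objective, which makes the gradient easy to sanity-check against finite differences.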
Background | This objective function can be optimized by the stochastic gradient method or other numerical optimization methods. |
Method | The squared-loss criterion is used to formulate the objective function.
Method | Thus, the objective function can be optimized by L-BFGS-B (Zhu et al., 1997), a generic quasi-Newton gradient-based optimizer. |
Method | The first term in Equation (5) is the same as Equation (2), which is the traditional CRF learning objective function on the labeled data.
Related Work | And third, the derived label information from the graph is smoothed into the model by optimizing a modified objective function.
Introduction | Gillick and Favre (Gillick and Favre, 2009) used bigrams as concepts, which are selected from a subset of the sentences, and their document frequency as the weight in the objective function.
Proposed Method 2.1 Bigram Gain Maximization by ILP | where o_b is an auxiliary variable we introduce that is equal to |n_{b,ref} − Σ_s z(s) · n_{b,s}|, and n_{b,ref} is a constant that can be dropped from the objective function.
Proposed Method 2.1 Bigram Gain Maximization by ILP | To train this regression model using the given reference abstractive summaries, rather than trying to minimize the squared error as typically done, we propose a new objective function . |
Proposed Method 2.1 Bigram Gain Maximization by ILP | The objective function for training is thus to minimize the KL distance: |
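As an illustration of using KL distance as a training objective (a generic sketch over discrete distributions, not the paper's exact estimator over bigram distributions):

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists of
    probabilities; minimizing this drives q toward p. The eps guard
    against zero probabilities is an implementation choice, not from
    the paper."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)
```

KL distance is zero exactly when the two distributions match, and positive otherwise, which is what makes it usable as a minimization objective.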
Related Work | They used a modified objective function in order to consider whether the selected sentence is globally optimal. |
Introduction | Also, SGD is very easy to implement because it does not need to use the Hessian information of the objective function.
Log-Linear Models | SGD uses a small randomly-selected subset of the training samples to approximate the gradient of the objective function given by Equation 2. |
Log-Linear Models | The learning rate parameters for SGD were then tuned in such a way that they maximized the value of the objective function in 30 passes. |
Log-Linear Models | Figure 3 shows how the value of the objective function changed as the training proceeded. |
Approach | Further, these approaches typically depend on specific semantic signals such as sentiment- or topic-labels for their objective functions.
Approach | This results in the following objective function: |
Approach | The objective function in Equation 2 could be coupled with any two given vector composition functions f, g from the literature. |
Conclusion | To summarize, we have presented a novel method for learning multilingual word embeddings using parallel data in conjunction with a multilingual objective function for compositional vector models. |
Overview | We describe a multilingual objective function that uses a noise-contrastive update between semantic representations of different languages to learn these word embeddings. |
Discussion and Future Work | Accordingly, our objective function is replaced by: |
The Algorithm | 3.4 The Objective Function |
The Algorithm | We now define our objective function in terms of the variables. |
The Algorithm | We are also constrained by the linear programming framework, hence we set the objective function as |
Abstract | Objective function: We denote by θ the set of all the parameters to be optimized, including forward phrase and lexicon translation probabilities and their backward counterparts.
Abstract | Therefore, we design the objective function to be maximized as: |
Abstract | First, we propose a new objective function (Eq. |
Background | The objective function is the sum of weights over the edges of G, and the constraint x_ij + x_jk − x_ik ≤ 1 on the binary variables enforces that whenever x_ij = x_jk = 1, then also x_ik = 1 (transitivity).
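The transitivity constraint can be checked exhaustively on a single triple of binary variables; this brute-force sketch confirms that the linear inequality excludes exactly the one non-transitive assignment:

```python
from itertools import product

def satisfies_triangle(x_ij, x_jk, x_ik):
    # The linear constraint from the text: x_ij + x_jk - x_ik <= 1.
    return x_ij + x_jk - x_ik <= 1

# Over all 0/1 assignments, the constraint rules out exactly the
# non-transitive case x_ij = x_jk = 1 with x_ik = 0.
violations = [(a, b, c) for a, b, c in product((0, 1), repeat=3)
              if not satisfies_triangle(a, b, c)]
```

Adding this inequality for every ordered triple of items is what lets an ILP solver treat "same cluster" as a consistent equivalence relation.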
Sequential Approximation Algorithms | Then, at each iteration a single node v is reattached (see below) to the FRG in a way that improves the objective function.
Sequential Approximation Algorithms | This is repeated until the value of the objective function cannot be improved anymore by reattaching a node. |
Sequential Approximation Algorithms | Clearly, at each reattachment the value of the objective function cannot decrease, since the optimization algorithm considers the previous graph as one of its candidate solutions. |
Stochastic Optimization Methods | Stochastic optimization methods have proven to be extremely efficient for the training of models involving computationally expensive objective functions like those encountered with our task (Vishwanathan et al., 2006) and, in fact, the online backpropagation learning used in the neural network parser of Henderson (2004) is a form of stochastic gradient descent. |
Stochastic Optimization Methods | In our experiments SGD converged to a lower objective function value than L-BFGS; however, it required far
Stochastic Optimization Methods | Utilization of stochastic optimization routines requires the implementation of a stochastic objective function.
The Model | 2.2 Computing the Objective Function |
Abstract | It is a bootstrapping learning method which uses a graph propagation algorithm with a well defined objective function . |
Existing algorithms 3.1 Yarowsky | (2007) provide an objective function for this algorithm using a generalized definition of cross-entropy in terms of Bregman distance, which motivates our objective in section 4. |
Graph propagation | 6.5 Objective function |
Introduction | Variants of this algorithm have been formalized as optimizing an objective function in previous work by Abney (2004) and Haffari and Sarkar (2007), but it is not clear that any perform as well as the Yarowsky algorithm itself. |
Introduction | well-understood as minimizing an objective function at each iteration, and it obtains state of the art performance on several different NLP data sets. |
Connotation Induction Algorithms | Objective function: We aim to maximize: F = Φ_prosody + Φ_coord + Φ_neu
Connotation Induction Algorithms | Φ_neu is defined as a weighted sum over graph edges. Soft constraints (edge weights): The weights in the objective function are set as follows:
Precision, Coverage, and Efficiency | Objective function: We aim to maximize:
Precision, Coverage, and Efficiency | Hard constraints: We add penalties to the objective function if the polarity of a pair of words is not consistent with its corresponding semantic relations.
Precision, Coverage, and Efficiency | Notice that ds_ij^+ and ds_ij^- satisfying the above inequalities will always be negative; hence, in order to maximize the objective function, the LP solver will try to minimize the absolute values of ds_ij^+ and ds_ij^-, effectively pushing i and j toward the same polarity.
Introduction | SUMMA hierarchically clusters the sentences by time, and then summarizes the clusters using an objective function that optimizes salience and coherence. |
Summarizing Within the Hierarchy | 4.4 Objective Function |
Summarizing Within the Hierarchy | Having estimated salience, redundancy, and two forms of coherence, we can now put this information together into a single objective function that measures the quality of a candidate hierarchical summary. |
Summarizing Within the Hierarchy | Intuitively, the objective function should balance salience and coherence. |
Conditional Random Fields | The objective function is then a smooth convex function to be minimized over an unconstrained |
Conditional Random Fields | In the following, we will jointly use both penalty terms, yielding the so-called elastic net penalty (Zou and Hastie, 2005) which corresponds to the objective function
Conditional Random Fields | However, the introduction of an ℓ1 penalty term makes the optimization of (6) more problematic, as the objective function is no longer differentiable in θ.
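One standard way to handle the non-differentiable ℓ1 term (not necessarily the optimizer used in this work) is a proximal step with soft-thresholding; a sketch for the elastic net penalty, with hypothetical names:

```python
def elastic_net_penalty(theta, rho1, rho2):
    """Elastic net penalty: rho1 * ||theta||_1 + (rho2 / 2) * ||theta||_2^2."""
    return (rho1 * sum(abs(t) for t in theta)
            + 0.5 * rho2 * sum(t * t for t in theta))

def prox_elastic_net(theta, rho1, rho2, lr):
    """Proximal step for the elastic net with step size lr: the
    non-differentiable l1 part is handled by soft-thresholding,
    the smooth l2 part by multiplicative shrinkage."""
    out = []
    for t in theta:
        s = max(abs(t) - lr * rho1, 0.0)  # soft-threshold (l1 part)
        u = s / (1.0 + lr * rho2)         # shrinkage (l2 part)
        out.append(u if t > 0 else -u)
    return out
```

Coordinates whose magnitude falls below `lr * rho1` are zeroed out exactly, which is how ℓ1 regularization produces sparse parameter vectors despite the kink at zero.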
Experimental Setup | The reason is that the objective function maximizes mutual information. |
Experimental Setup | Highly differentiated classes for frequent words contribute substantially to this objective function whereas putting all rare words in a few large clusters does not hurt the objective much. |
Experimental Setup | However, our focus is on using clustering for improving prediction for rare events; this means that the objective function is counterproductive when contexts are frequency-weighted as they occur in the corpus. |
Model | The objective function is defined as a linear combination of the potentials from different predictors with a parameter λ to balance the contribution of the two components: opinion entity identification and opinion relation extraction.
Results | The objective function of ILP-W/O-ENTITY can be represented as |
Results | For ILP-W-SINGLE-RE, we simply remove the variables associated with one opinion relation in the objective function (1) and constraints. |
Results | The formulation of ILP-W/O-IMPLICIT-RE removes the variables associated with potential r_i in the objective function and the corresponding constraints.
Abstract | Model parameters are estimated using a generalized expectation (GE) objective function that penalizes the mismatch between model predictions and linguistic expectation constraints. |
Generalized Expectation Criteria | Generalized expectation criteria (Mann and McCallum, 2008; Druck et al., 2008) are terms in a parameter estimation objective function that express a preference on the value of a model expectation. |
Generalized Expectation Criteria | 2In general, the objective function could also include the likelihood of available labeled data, but throughout this paper we assume we have no parsed sentences. |
Introduction | With GE we may add a term to the objective function that encourages a feature-rich CRF to match this expectation on unlabeled data, and in the process learn about related features. |
Adding Regularization | In this section, we briefly review regularizers and then add two regularizers, inspired by Gaussian (L2, Section 3.1) and Dirichlet priors (Beta, Section 3.2), to the anchor objective function (Equation 3). |
Adding Regularization | Instead of optimizing a function just of the data x and parameters θ, f(x, θ), one optimizes an objective function that includes a regularizer that is only a function of the parameters: f(x, θ) + r(θ).
Adding Regularization | This requires including the topic matrix as part of the objective function.
Anchor Words: Scalable Topic Models | Once we have established the anchor objective function, in the next section we regularize the objective function . |
A multitask transfer learning solution | We learn the optimal weight vectors {β_k}_{k=1}^{V}, β_T and s by optimizing the following objective function:
A multitask transfer learning solution | The objective function follows standard empirical risk minimization with regularization. |
A multitask transfer learning solution | Recall that we impose a constraint FV = 0 when optimizing the objective function.
Introduction | Finally, the evolution of the objective function of the adapted K-means is modeled to automatically define the "best" number of clusters.
Polythetic Post-Retrieval Clustering | To assure convergence, an objective function Q is defined which decreases at each processing step. |
Polythetic Post-Retrieval Clustering | The classical objective function is defined in Equation (1), where ω_k is a cluster labeled k, x_i ∈ ω_k is an object in the cluster, m_k is the centroid of the cluster ω_k, and E(·,·) is the Euclidean distance.
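A minimal sketch of this classical objective, assuming clusters are given as non-empty lists of points (the function name and representation are illustrative):

```python
def kmeans_objective(clusters):
    """Classical K-means objective: sum over clusters of squared
    Euclidean distances from each point to its cluster centroid.
    Assumes each cluster is a non-empty list of equal-length tuples."""
    total = 0.0
    for points in clusters:
        d = len(points[0])
        centroid = [sum(p[i] for p in points) / len(points) for i in range(d)]
        total += sum(sum((p[i] - centroid[i]) ** 2 for i in range(d))
                     for p in points)
    return total
```

Each assignment and centroid-update step of K-means can only lower this quantity, which is what guarantees convergence.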
Polythetic Post-Retrieval Clustering | A direct consequence of the change in similarity measure is the definition of a new objective function Q53 to ensure convergence. |
Bilingually-Guided Dependency Grammar Induction | In that case, we can use a single parameter α to control both weights for the different objective functions.
Bilingually-Guided Dependency Grammar Induction | When α = 1 it is the unsupervised objective function in Formula (6).
Bilingually-Guided Dependency Grammar Induction | On the contrary, if α = 0, it is the projection objective function (Formula (7)) for projected instances.
Unsupervised Dependency Grammar Induction | We select a simple classifier objective function as the unsupervised objective function, which is intuitively in accordance with the parsing objective:
Discussion | In MDL, there is a single objective function to (1) maximize the likelihood of observing the data, and at the same time (2) minimize the length of the model description (which depends on the model size). |
Discussion | However, the search procedure for MDL is usually nontrivial, and for our task of unsupervised tagging, we have not found a direct objective function which we can optimize and produce good tagging results. |
Small Models | Finally, we add an objective function that minimizes the number of grammar variables that are assigned a value of 1. |
Small Models | objective function value of 459.3 |
Experiments | For projective parsing, several algorithms (McDonald and Pereira, 2006; Carreras, 2007; Koo and Collins, 2010; Ma and Zhao, 2012) have been proposed to solve the model training problems (calculation of objective function and gradient) for different factorizations. |
Our Approach | We introduce a multiplier γ as a tradeoff between the two contributions (parallel and unsupervised) of the objective function K, and the final objective function K′ has the following form:
Our Approach | To train our parsing model, we need to find the parameters λ that minimize the objective function K′ in equation (11).
Our Approach | objective function and the gradient of the objective function.
Additional Details of the Algorithm | Next, we modify the objective function in Eq.
Additional Details of the Algorithm | Thus the new objective function consists of a sum of L × M² terms, each corresponding to a different combination of inside and outside features.
Introduction | 2) Optimization of a convex objective function using EM. |
The Learning Algorithm for L-PCFGS | Step 2: Use the EM algorithm to find parameter values that maximize the objective function in Eq.
Experimental Setup | The weights are estimated by minimizing the objective function |
Results | (2013), however, our objective function yielded consistently better results in all experimental settings. |
Results | For this post-hoc analysis, we include a sparsity parameter in the objective function of Equation 5 in order to get more interpretable results; hidden units are therefore maximally activated by only a few concepts.
Results | The adaptation of NN is straightforward; the new objective function is derived as |
Experimental Results | For a given value of δ we solve the NNSE(Text) and JNNSE(Brain+Text) objective functions as detailed in Equations 1 and 4 respectively.
Joint NonNegative Sparse Embedding | new objective function is: |
Joint NonNegative Sparse Embedding | With A or D fixed, the objective function for NNSE(Text) and JNNSE(Brain+Text) is convex. |
NonNegative Sparse Embedding | NNSE solves the following objective function: |
Related Work | SGD uses a small randomly-selected subset of the training samples to approximate the gradient of an objective function.
System Architecture | parameter estimation is performed by maximizing the objective function,
System Architecture | The final objective function is as follows: |
System Architecture | E(w_t) = w* + ∏_{m=1}^{t} (I − γ_m H(w*)) (w_0 − w*), where w* is the optimal weight vector, and H is the Hessian matrix of the objective function.
Experiments | Thus, the better starting point provided by EMGI has more impact than the integer program that includes GI in its objective function.
Minimized models for supertagging | There are two complementary ways to use grammar-informed initialization with the IP-minimization approach: (1) using EMGI output as the starting grammar/lexicon and (2) using the tag transitions directly in the IP objective function.
Minimized models for supertagging | For the second, we modify the objective function used in the two IP-minimization steps to be: |
Minimized models for supertagging | In this way, we combine the minimization and GI strategies into a single objective function that finds a minimal grammar set while keeping the more likely tag bigrams in the chosen solution. |
Probabilistic Cross-Lingual Latent Semantic Analysis | Putting L(C) and R(C) together, we would like to maximize the following objective function which is a regularized log-likelihood: |
Probabilistic Cross-Lingual Latent Semantic Analysis | Specifically, we will search for a set of values for all our parameters that can maximize the objective function defined above. |
Probabilistic Cross-Lingual Latent Semantic Analysis | However, there is no closed-form solution in the M-step for the whole objective function.
Introduction | The full objective function of the model thus learns semantic vectors that are imbued with nuanced sentiment information. |
Our Model | We can efficiently learn parameters for the joint objective function using alternating maximization. |
Our Model | This produces a final objective function of, |
Related work | We adopt this insight, but we are able to incorporate it directly into our model's objective function.
Abstract | However, to go beyond tuning weights in the loglinear SMT model, a cross-lingual objective function that can deeply integrate semantic frame criteria into the MT training pipeline is needed. |
Conclusion | While monolingual MEANT alone accurately reflects adequacy via semantic frames and optimizing SMT against MEANT improves translation, the new cross-lingual XMEANT semantic objective function moves closer toward deep integration of semantics into the MT training pipeline. |
Introduction | In order to continue driving MT towards better translation adequacy by deeply integrating semantic frame criteria into the MT training pipeline, it is necessary to have a cross-lingual semantic objective function that assesses the semantic frame similarities of input and output sentences. |
Model Variations | For MT feature weight optimization, we use iterative k-best optimization with an Expected-BLEU objective function (Rosti et al., 2010). |
Neural Network Joint Model (NNJM) | While we cannot train a neural network with this guarantee, we can explicitly encourage the log-softmax normalizer to be as close to 0 as possible by augmenting our training objective function:
Neural Network Joint Model (NNJM) | Note that α = 0 is equivalent to the standard neural network objective function.
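The augmented objective can be sketched per training example as follows; `scores` and `target` are hypothetical names for the unnormalized output-layer scores and the index of the correct word:

```python
import math

def augmented_loss(scores, target, alpha):
    """Per-example NNJM-style training loss: standard softmax negative
    log-likelihood plus alpha * (log Z)^2, where log Z is the
    log-softmax normalizer being pushed toward 0 (sketch)."""
    log_z = math.log(sum(math.exp(s) for s in scores))
    nll = -(scores[target] - log_z)
    return nll + alpha * log_z ** 2
```

When training drives log Z toward 0 for all contexts, the unnormalized score of the target word can be used directly at decoding time, skipping the expensive normalization over the vocabulary.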
Relation Identification | The score of graph G (encoded as z) can be written as the objective function φᵀz, where φ_e = θᵀg(e).
Relation Identification | To handle the constraint Az ≤ b, we introduce multipliers μ ≥ 0 to get the Lagrangian relaxation of the objective function:
Relation Identification | L(z) is an upper bound on the unrelaxed objective function φᵀz, and is equal to it if and only if the constraints Az ≤ b are satisfied.
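A toy instance illustrating this bound, with a single constraint sum(z) ≤ 1 standing in for Az ≤ b (all names are illustrative):

```python
from itertools import product

def relaxed_max(theta, mu):
    """max over z in {0,1}^n of theta^T z - mu * (sum(z) - 1):
    the Lagrangian relaxation of the constraint sum(z) <= 1."""
    return max(sum(t * zi for t, zi in zip(theta, z)) - mu * (sum(z) - 1)
               for z in product((0, 1), repeat=len(theta)))

def constrained_max(theta):
    """Exact constrained maximum, by brute force over {0,1}^n."""
    return max(sum(t * zi for t, zi in zip(theta, z))
               for z in product((0, 1), repeat=len(theta))
               if sum(z) <= 1)
```

For any μ ≥ 0 the relaxed maximum upper-bounds the constrained one, and for a well-chosen μ the two coincide, which is what subgradient updates on μ search for.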
Intervention Prediction Models | Similar to the traditional maximum margin based Support Vector Machine (SVM) formulation, our model’s objective function is defined as: |
Intervention Prediction Models | Replacing the term f_w(x_i, y_i) with the contents of Equation 1 in the minimization objective above reveals the key difference from the traditional SVM formulation: the objective function has a maximum term inside the global minimization problem, making it non-convex.
Intervention Prediction Models | The algorithm then performs two steps iteratively: first it determines the structural assignments for the negative examples, and then it optimizes the fixed objective function using a cutting-plane algorithm.
Background | Given a sentence e of length |e| = I and a sentence f of length |f| = J, our goal is to find the best bidirectional alignment between the two sentences under a given objective function . |
Background | The HMM objective function f : X → ℝ can be written as a linear function of x
Background | Similarly define the objective function |
Pairwise Markov Random Fields and Loopy Belief Propagation | We next define our objective function . |
Pairwise Markov Random Fields and Loopy Belief Propagation | and x to observed ones X (variables with known labels, if any), our objective function is associated with the following joint probability distribution |
Pairwise Markov Random Fields and Loopy Belief Propagation | Finding the best assignments to unobserved variables in our objective function is the inference problem. |
Bilingually-constrained Recursive Auto-encoders | After that, we introduce the BRAE on the network structure, objective function and parameter inference. |
Bilingually-constrained Recursive Auto-encoders | In the semi-supervised RAE for phrase embedding, the objective function over a (phrase, label) pair (x, t) includes the reconstruction error and the prediction error, as illustrated in Fig.
Bilingually-constrained Recursive Auto-encoders | 3.3.1 The Objective Function |
Datasets | Due to this discrepancy, the objective function in Eq. |
Experiments | For this model, we also introduce a hyperparameter δ that weights the error at annotated nodes (1 − δ) higher than the error at unannotated nodes (δ); since we have more confidence in the annotated labels, we want them to contribute more towards the objective function.
Recursive Neural Networks | This induces a supervised objective function over all sentences: a regularized sum over all node losses normalized by the number of nodes N in the training set, |
Method | With the addition of the ℓ0 prior, the MAP (maximum a posteriori) objective function is
Method | Let F(θ) be the objective function in
Method | (Note that we don't allow m = 0 because this can cause θ + δm to land on the boundary of the probability simplex, where the objective function is undefined.)
Conclusion | By both changing the objective function to include a bias toward sparser models and improving the pruning techniques and efficiency, we achieve significant gains on test data with practical speed. |
Experiments | Given an unlimited amount of time, we would tune the prior to maximize end-to-end performance, using an objective function such as BLEU. |
Phrasal Inversion Transduction Grammar | First we change the objective function by incorporating a prior over the phrasal parameters. |
AL-SMT: Multilingual Setting | This goal is formalized by the following objective function: |
AL-SMT: Multilingual Setting | The nonnegative weights α_d reflect the importance of the different translation tasks, and Σ_d α_d = 1. The AL-SMT formulation for a single language pair is a special case of this formulation where only one of the α_d's in the objective function (1) is one and the rest are zero.
Sentence Selection: Multiple Language Pairs | The goal is to optimize the objective function (1) with minimum human effort in providing the translations. |
Image Clustering with Annotated Auxiliary Data | Based on the graphical model representation in Figure 3, we derive the log-likelihood objective function, in a similar way as in (Cohn and Hofmann, 2000), as follows
Image Clustering with Annotated Auxiliary Data | objective function ignores all the biases from the |
Image Clustering with Annotated Auxiliary Data | points based on the objective function L in Equation (5).
Conclusion | SMT comparably or better than the state-of-the-art beam-search strategy, converging on solutions with a higher objective function value in a shorter time.
Experiments | Neither algorithm shows any clear score improvement with increasing running time, which suggests that the decoder's objective function is not very well correlated with the BLEU score on this corpus.
The Traveling Salesman Problem and its variants | LK works by generating an initial random feasible solution for the TSP problem, and then repeatedly identifying an ordered subset of k edges in the current tour and an ordered subset of k edges not included in the tour such that when they are swapped the objective function is improved. |
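The k = 2 case of this move (plain 2-opt) can be sketched as follows, on a toy four-city instance where uncrossing the tour shortens it; all names are illustrative:

```python
import math

def tour_length(tour, dist):
    """Total length of a closed tour over a distance matrix."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]]
               for i in range(len(tour)))

def two_opt_once(tour, dist):
    """One pass of the simplest LK-style move (k = 2): reverse a
    segment whenever the swap improves the objective (sketch)."""
    n = len(tour)
    for i in range(n - 1):
        for j in range(i + 2, n):
            if i == 0 and j == n - 1:
                continue  # would reverse the whole tour
            new = tour[:i + 1] + tour[i + 1:j + 1][::-1] + tour[j + 1:]
            if tour_length(new, dist) < tour_length(tour, dist):
                return new
    return tour  # local optimum for 2-opt

# Unit square: tour [0, 2, 1, 3] crosses itself; 2-opt uncrosses it.
pts = [(0, 0), (1, 0), (1, 1), (0, 1)]
dist = [[math.dist(p, q) for q in pts] for p in pts]
```

Full LK generalizes this by searching over larger ordered edge subsets (k > 2), but each accepted swap improves the objective in exactly the same way.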
Statistical Paraphrase Generation | In SMT, however, the optimization objective function in MERT is an MT evaluation criterion, such as BLEU.
Statistical Paraphrase Generation | We therefore introduce a new optimization objective function in this paper. |
Statistical Paraphrase Generation | Replacement f-measure (rf): We use rf as the optimization objective function in MERT; it is similar to the conventional f-measure and leverages rp and rr: rf = (2 × rp × rr) / (rp + rr)
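The rf formula above is a harmonic mean of the two replacement scores rp and rr. A minimal sketch, with the zero-denominator guard as an added assumption:

```python
def replacement_f_measure(rp, rr):
    """Harmonic mean of the replacement scores rp and rr,
    used as the MERT tuning objective in the text (rf)."""
    if rp + rr == 0:
        return 0.0  # guard: both scores zero (an assumption)
    return 2 * rp * rr / (rp + rr)

print(replacement_f_measure(0.5, 0.5))  # → 0.5
```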
PCS Induction | We trained this model by optimizing the following objective function: |
PCS Projection | The first term in the objective function is the graph smoothness regularizer which encourages the distributions of similar vertices (large wij) to be similar. |
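A common form of such a graph smoothness regularizer is a weighted sum of squared differences between the distributions at the two endpoints of each edge. The sketch below assumes that form; the data structures and names are illustrative, not from the paper.

```python
def smoothness_penalty(q, edges):
    """Graph smoothness regularizer sketch: sum over edges (i, j, w_ij)
    of w_ij * ||q_i - q_j||^2. The penalty is small when similar
    vertices (large w_ij) carry similar distributions q_i, q_j."""
    return sum(w * sum((a - b) ** 2 for a, b in zip(q[i], q[j]))
               for i, j, w in edges)

# Two maximally different distributions joined by a heavy edge.
print(smoothness_penalty([[1.0, 0.0], [0.0, 1.0]], [(0, 1, 2.0)]))  # → 4.0
```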
PCS Projection | While it is possible to derive a closed-form solution for this convex objective function, it would require the inversion of a matrix of order |Vf|.
Constraints on Inter-Domain Variability | We augment the multi-conditional log-likelihood L(θ, α) with the weighted regularization term G(θ) to get the composite objective function:
Empirical Evaluation | The initial learning rate and the weight decay (the inverse squared variance of the Gaussian prior) were set to 0.01, and both parameters were halved on every iteration in which the objective function estimate went down.
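That schedule can be stated in a couple of lines. A sketch of the rule as described; the function name and return convention are assumptions.

```python
def update_hyperparams(lr, weight_decay, objective_went_down):
    """Schedule from the text: halve both the learning rate and the
    weight decay on any iteration where the objective estimate
    decreased; otherwise leave them unchanged."""
    if objective_went_down:
        lr /= 2.0
        weight_decay /= 2.0
    return lr, weight_decay

print(update_hyperparams(0.01, 0.01, True))  # → (0.005, 0.005)
```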
Learning and Inference | The stochastic gradient descent algorithm iterates over examples and updates the weight vector based on the contribution of every considered example to the objective function L_R(θ, α, β).
Limitations of Topic Models and LSA for Modeling Sentences | In effect, LSA allows missing and observed words to equally impact the objective function.
Limitations of Topic Models and LSA for Modeling Sentences | Moreover, the true semantics of the concept definitions is actually related to some missing words, but such true semantics will not be favored by the objective function, since Equation 2 allows for too strong an impact by X_ij = 0 for any missing word.
The Proposed Approach | The model parameters (vectors in P and Q) are optimized by minimizing the objective function: |
Surface Realization | Input: relation instances R = {(ind_i, arg_i)}, generated abstracts A = {abs_i}, objective function f, cost function G
Surface Realization | We employ the following objective function: |
Surface Realization | Algorithm 1 sequentially finds the abstract with the greatest ratio of objective function gain to length, and adds it to the summary if the gain is nonnegative.
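This is the standard cost-benefit greedy step for budgeted selection: score each remaining candidate by marginal gain over cost, take the best, and stop when the gain turns negative or the budget is exhausted. A sketch under those assumptions; the objective f and cost function are caller-supplied stand-ins, not the paper's definitions.

```python
def greedy_ratio_summary(abstracts, f, cost, budget):
    """Greedily add the abstract with the largest ratio of objective
    gain f(S + [a]) - f(S) to its cost, while the gain is nonnegative
    and the total cost stays within the budget."""
    summary, spent = [], 0
    remaining = list(abstracts)
    while remaining:
        base = f(summary)
        # Pick the candidate with the best gain-to-cost ratio.
        best = max(remaining, key=lambda a: (f(summary + [a]) - base) / cost(a))
        gain = f(summary + [best]) - base
        if gain < 0 or spent + cost(best) > budget:
            break
        summary.append(best)
        spent += cost(best)
        remaining.remove(best)
    return summary
```

With word coverage as f and word count as cost, the selection favors abstracts that add new words cheaply.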
Abstract | Our analysis shows that its objective function can be efficiently approximated using the negative empirical pointwise mutual information. |
Concluding Remarks | In this paper, we derive a new lower-bound approximation to the objective function used in the regularized compression algorithm.
Proposed Method | The new objective function is written out as Equation (4). |
WTMF on Graphs | To implement this, we add a regularization term to the objective function of WTMF (Equation 2) for each linked pair ⟨j, l⟩:
WTMF on Graphs | Therefore we approximate the objective function by treating the vector lengths |Q·,j| as fixed values during the ALS iterations:
Weighted Textual Matrix Factorization | P and Q are optimized by minimizing the objective function:
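The usual weighted matrix-factorization objective is a weighted squared reconstruction error plus L2 regularization on the latent vectors. The sketch below assumes that standard form (the exact weighting scheme and regularizer in the paper may differ):

```python
import numpy as np

def wmf_objective(X, W, P, Q, lam):
    """Weighted matrix-factorization objective sketch:
    sum_ij W_ij * (X_ij - P_i . Q_j)^2  +  lam * (||P||^2 + ||Q||^2),
    where rows of P and Q are the latent vectors being optimized."""
    residual = X - P @ Q.T
    return float(np.sum(W * residual ** 2)
                 + lam * (np.sum(P ** 2) + np.sum(Q ** 2)))
```

With zero-initialized factors and no regularization, the objective is simply the weighted squared mass of X, which makes the formula easy to sanity-check.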
Learning | The gradient of the regularised objective function then becomes: |
Learning | We learn the gradient using backpropagation through structure (Goller and Kuchler, 1996), and minimize the objective function using L-BFGS. |
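Minimizing an objective with L-BFGS given an analytic gradient is a one-call pattern in SciPy. A minimal sketch with a stand-in quadratic objective (not the paper's model) whose minimum is at θ = -1 in every coordinate:

```python
import numpy as np
from scipy.optimize import minimize

def objective_and_grad(theta):
    """Stand-in regularised objective 0.5||theta||^2 + sum(theta),
    returned together with its analytic gradient theta + 1."""
    value = 0.5 * np.sum(theta ** 2) + np.sum(theta)
    grad = theta + 1.0
    return value, grad

# jac=True tells SciPy the callable returns (value, gradient).
result = minimize(objective_and_grad, x0=np.zeros(3),
                  jac=True, method="L-BFGS-B")
# result.x converges to the minimizer [-1, -1, -1]
```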
Learning | pred(l = 1 | v, θ) = sigmoid(W_label v + b_label) (9) Given our corpus of CCG parses with label pairs (N, l), the new objective function becomes:
Deceptive Answer Prediction with User Preference Graph | The best parameters w* can be found by minimizing the following objective function: |
Deceptive Answer Prediction with User Preference Graph | The new objective function becomes:
Deceptive Answer Prediction with User Preference Graph | In the above objective function, we impose a user graph regularization term
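A graph regularization term of this kind typically adds, to the base loss, a weighted penalty on the disagreement between linked users. The squared-difference form and the weight lam below are assumptions for illustration, not the paper's exact term.

```python
def graph_regularized_objective(base_loss, w, user_edges, lam):
    """Sketch: base loss plus lam times a user-graph penalty that
    pushes the values w[u], w[v] of linked users (u, v) together."""
    penalty = sum((w[u] - w[v]) ** 2 for u, v in user_edges)
    return base_loss + lam * penalty

# Linked users 0 and 1 disagree by 2.0, so the penalty is 4.0.
print(graph_regularized_objective(1.0, {0: 2.0, 1: 0.0}, [(0, 1)], 0.5))  # → 3.0
```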
Our Proposed Approach | The objective function can be transformed |
Our Proposed Approach | Given the low-dimensional semantic representation of the test data, an objective function can be defined as follows: |
Our Proposed Approach | The story boundaries which minimize the objective function in Eq. (8)
HMM alignment | Let us rewrite the objective function as follows: |
HMM alignment | Note how this recovers the original objective function when matching variables are found. |
Introduction | This captures the positional information of the IBM models in a framework that admits exact inference for parameter estimation, though the objective function is not concave: local maxima are a concern.
Introduction | We will first briefly introduce single word vector representations and then describe the CVG objective function , tree scoring and inference. |
Introduction | The main objective function in Eq. |
Introduction | The objective function is not differentiable due to the hinge loss. |
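The hinge loss max(0, margin - score) has a kink at score = margin, so optimization uses a subgradient rather than a gradient. A common convention, sketched below (the choice of subgradient at the kink is an assumption):

```python
def hinge_loss(score, margin=1.0):
    """Hinge loss: zero once the score clears the margin,
    linear in the violation otherwise."""
    return max(0.0, margin - score)

def hinge_subgradient(score, margin=1.0):
    """Subgradient of the hinge loss w.r.t. the score: -1 when the
    margin is violated, 0 otherwise (0 is chosen at the kink)."""
    return -1.0 if score < margin else 0.0

print(hinge_loss(0.5), hinge_subgradient(0.5))  # → 0.5 -1.0
```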