Conclusion | We are also interested in ways to modify the objective function to place more emphasis on learning a good joint model, instead of equally weighting the learning of the joint and single-task models.
Hierarchical Joint Learning | L-BFGS and gradient descent, two frequently used numerical optimization algorithms, require computing the value and partial derivatives of the objective function using the entire training set. |
Hierarchical Joint Learning | It requires a stochastic objective function, which is meant to be a computationally cheap estimate of the true objective function.
Hierarchical Joint Learning | In most NLP models, such as logistic regression with a Gaussian prior, computing the stochastic objective function is fairly straightforward: you compute the model likelihood and partial derivatives for a randomly sampled subset of the training data. |
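Hierarchical Joint Learning | A minimal sketch of such a stochastic estimate for logistic regression with a Gaussian prior, assuming a feature matrix X, binary labels y, weights w, and prior variance sigma2 (all names are illustrative, not taken from the paper): the likelihood and gradient are computed on a random minibatch and rescaled to the size of the full training set, while the prior term is computed exactly.

    import numpy as np

    def stochastic_objective(w, X, y, sigma2, batch_size, rng):
        """Estimate the regularized negative log-likelihood and its gradient
        from a randomly sampled subset of the training data (illustrative sketch)."""
        n = X.shape[0]
        idx = rng.choice(n, size=batch_size, replace=False)
        Xb, yb = X[idx], y[idx]
        p = 1.0 / (1.0 + np.exp(-Xb @ w))    # predicted probabilities on the minibatch
        scale = n / batch_size               # rescale the minibatch estimate to the full data
        nll = -scale * np.sum(yb * np.log(p + 1e-12) + (1 - yb) * np.log(1 - p + 1e-12))
        grad = scale * (Xb.T @ (p - yb))
        nll += w @ w / (2 * sigma2)          # Gaussian prior term, computed exactly
        grad += w / sigma2
        return nll, grad

    rng = np.random.default_rng(0)
    X, y = rng.normal(size=(1000, 5)), (rng.random(1000) < 0.5).astype(float)
    value, gradient = stochastic_objective(np.zeros(5), X, y, sigma2=1.0, batch_size=64, rng=rng)

Because the minibatch is drawn at random and rescaled, the estimate is unbiased for the data term; only the prior is evaluated in full at each step.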
Conditional Random Fields | The objective function is then a smooth convex function to be minimized over an unconstrained parameter space.
Conditional Random Fields | In the following, we will jointly use both penalty terms, yielding the so-called elastic net penalty (Zou and Hastie, 2005), which corresponds to the objective function
Conditional Random Fields | However, the introduction of an ℓ1 penalty term makes the optimization of (6) more problematic, as the objective function is no longer differentiable at 0.
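Conditional Random Fields | For concreteness, a sketch of an elastic-net-penalized objective of this kind, writing ℓ(θ) for the negative conditional log-likelihood and ρ1, ρ2 for the two penalty weights (notation assumed for illustration, not copied from the paper):

    \min_{\theta}\; \ell(\theta) \;+\; \rho_1\,\|\theta\|_1 \;+\; \frac{\rho_2}{2}\,\|\theta\|_2^2

The ℓ1 term is the source of the non-differentiability at 0 noted above, while the ℓ2 term keeps the smooth part of the objective strictly convex.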
Experiments | Thus, the better starting point provided by EMGI has more impact than the integer program that includes GI in its objective function.
Minimized models for supertagging | There are two complementary ways to use grammar-informed initialization with the IP-minimization approach: (1) using EMGI output as the starting grammar/lexicon and (2) using the tag transitions directly in the IP objective function.
Minimized models for supertagging | For the second, we modify the objective function used in the two IP-minimization steps to be: |
Minimized models for supertagging | In this way, we combine the minimization and GI strategies into a single objective function that finds a minimal grammar set while keeping the more likely tag bigrams in the chosen solution. |
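Minimized models for supertagging | As an illustration only, such a combined objective could take the following shape, with binary variables g(t, t') indicating whether tag bigram (t, t') is kept in the grammar, P_GI(t'|t) the grammar-informed transition probability, and λ a tradeoff weight; this is a hedged reconstruction of the idea, not the paper's exact formulation:

    \min_{g}\;\; \sum_{t,t'} g_{t,t'} \;-\; \lambda \sum_{t,t'} g_{t,t'}\,\log P_{\mathrm{GI}}(t' \mid t)

The first term keeps the grammar set small; the second makes it cheaper to retain bigrams that GI rates as likely, so the solver prefers them whenever some bigram must be included.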
Probabilistic Cross-Lingual Latent Semantic Analysis | Putting L(C) and R(C) together, we would like to maximize the following objective function, which is a regularized log-likelihood:
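Probabilistic Cross-Lingual Latent Semantic Analysis | One plausible shape for such a combined objective, with λ an assumed weight balancing the log-likelihood L(C) against the regularizer R(C); the sign and weighting with which R(C) enters depend on how the regularizer is defined, so this is only an illustration:

    \max_{C}\;\; O(C) \;=\; L(C) \;+\; \lambda\, R(C)
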
Probabilistic Cross-Lingual Latent Semantic Analysis | Specifically, we will search for the set of parameter values that maximizes the objective function defined above.
Probabilistic Cross-Lingual Latent Semantic Analysis | However, there is no closed-form solution in the M-step for the whole objective function.