Algorithm | Finding the weight vector 0 that minimizes the £2-regularized average of this loss function is the structured support vector machine (SVM) problem (Taskar et al., 2003; Tsochantaridis et al., 2005): |
Algorithm | Denote by 0t_1 the value of the weight vector before the t-th round. |
Algorithm | Let Agbi = qb(}_9i, wi) — qbQ‘oi, Then the algorithm updates the weight vector 0’5 as follows: |
Experiments | The single-threaded running time for PNDP+ and Pegasos/DP+ is about 40 minutes per epoch, measured on a dual-core AMD 2.4GHz CPU with 8GB of memory; for CRF, it takes about 100 minutes for each epoch, which is almost entirely because the weight vector 0 is less sparse with CRF learning. |
Experiments | The initial weight vector was 0. |
Experiments | If not indicated otherwise, the perceptron was run for 10 epochs with learning rate 77 = 0.0001, started at zero weight vector , using deduplicated 100-best lists. |
Joint Feature Selection in Distributed Stochastic Learning | The mixed weight vector is resent to each shard to start another epoch of training in parallel on each shard. |
Joint Feature Selection in Distributed Stochastic Learning | Reduced weight vectors are mixed and the result is resent to each shard to start another epoch of parallel training on each shard. |
Multi-objective Algorithms | For each sentence pair (f, e) in the devset, we first generate an N-best list L E {h} using the current weight vector w (line 5). |
Multi-objective Algorithms | Input: Devset, max number of iterations I Output: A set of (pareto-optimal) weight vectors 1: Initialize 111. |
Theory of Pareto Optimality 2.1 Definitions and Concepts | Here, the MT system’s Decode function, parameterized by weight vector w, takes in a foreign sentence f and returns a translated hypothesis h. The argmax operates in vector space and our goal is to find to leading to hypotheses on the Pareto Frontier. |