Index of papers in Proc. ACL that mention
  • weight vector
Jiang, Jing
A multitask transfer learning solution
Let wk denote the weight vector of the linear classifier that separates positive instances of auxiliary type Ak from negative instances, and let wT denote a similar weight vector for the target type T.
A multitask transfer learning solution
If different relation types are totally unrelated, these weight vectors should also be independent of each other.
A multitask transfer learning solution
But because we observe similar syntactic structures across different relation types, we now assume that these weight vectors are related through a common shared component.
Abstract
The proposed framework models the commonality among different relation types through a shared weight vector, enables knowledge learned from the auxiliary relation types to be transferred to the target relation type, and allows easy control of the tradeoff between precision and recall.
Conclusions and future work
In the multitask learning framework that we introduced, different relation types are treated as different but related tasks that are learned together, with the common structures among the relation types modeled by a shared weight vector.
Experiments
the number of nonzero entries in the shared weight vector V. To see how the performance may vary as H changes, we plot the performance of TL-comb and TL-auto in terms of the average F1 across the seven target relation types, with H ranging from 100 to 50000.
weight vector is mentioned in 12 sentences in this paper.
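The excerpts above describe modeling relation types as related tasks whose classifiers share a common component. A minimal Python sketch of that idea (the decomposition w_k = v + u_k, the dimensions, and all values are illustrative assumptions, not the paper's code):

```python
import numpy as np

# Hypothetical multitask decomposition: each type-specific classifier
# w_k = v + u_k shares the common component v across relation types.
rng = np.random.default_rng(0)
dim, n_types = 50, 4

v = rng.normal(size=dim)                   # shared component
u = rng.normal(size=(n_types, dim)) * 0.1  # small type-specific parts
W = v + u                                  # one weight vector per relation type

x = rng.normal(size=dim)                   # a candidate relation instance
scores = W @ x                             # score under each type's classifier
print(scores)
```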
Björkelund, Anders and Kuhn, Jonas
Introducing Nonlocal Features
We now outline three different ways of learning the weight vector w with nonlocal features.
Introducing Nonlocal Features
In other words, it is unlikely that we can devise a feature set that is informative enough to allow the weight vector to converge towards a solution that lets the learning algorithm see the entire documents during training, at least in the situation when no external knowledge sources are used.
Representation and Learning
The score of an arc (ai, mi) is defined as the scalar product between a weight vector w and a feature vector Φ((ai, mi)), where Φ is a feature extraction function over an arc (thus extracting features from the antecedent and the anaphor).
Representation and Learning
We find the weight vector w by online learning using a variant of the structured perceptron (Collins, 2002).
Representation and Learning
For each instance it uses the current weight vector w to make a prediction ŷi given the input xi.
weight vector is mentioned in 14 sentences in this paper.
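The excerpts above define an arc's score as w · Φ(arc) and learn w with a structured-perceptron variant. A toy sketch of that scoring and update rule (the feature function, mentions, and dimensions are invented stand-ins, not the paper's feature set):

```python
import numpy as np

def phi(antecedent, anaphor, dim=16):
    """Toy feature extraction over an arc (antecedent, anaphor)."""
    seed = hash((antecedent, anaphor)) % (2**32)
    return np.random.default_rng(seed).normal(size=dim)

def predict(w, anaphor, candidates):
    """Choose the antecedent whose arc scores highest under w . phi(arc)."""
    return max(candidates, key=lambda a: w @ phi(a, anaphor))

# One structured-perceptron-style update: if the predicted antecedent is
# wrong, add the gold arc's features and subtract the predicted arc's.
w = np.zeros(16)
anaphor, candidates, gold = "she", ["John", "Mary"], "Mary"
pred = predict(w, anaphor, candidates)
if pred != gold:
    w += phi(gold, anaphor) - phi(pred, anaphor)
print(pred, "->", predict(w, anaphor, candidates))
```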
Sennrich, Rico and Schwenk, Holger and Aransa, Walid
Introduction
With this framework, adaptation to a new domain simply consists of updating a weight vector, and multiple domains can be supported by the same system.
Introduction
For each sentence that is being decoded, we choose the weight vector that is optimized on the closest cluster, allowing for adaptation even with unlabelled and heterogeneous test data.
Translation Model Architecture
To combine statistics from a vector of n component corpora, we can use a weighted version of equation 1, which adds a weight vector λ of length n (Sennrich, 2012b):
Translation Model Architecture
Table 1: Illustration of instance weighting with weight vectors for two corpora.
Translation Model Architecture
In our implementation, the weight vector is set globally, but can be overridden on a per-sentence basis.
weight vector is mentioned in 12 sentences in this paper.
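The excerpts above describe combining statistics from n component corpora with a weight vector λ that can be overridden per sentence. A small sketch of such a weighted combination (the counts, totals, and λ values are made up; the paper's equation 1 is only approximated here as a weighted relative frequency):

```python
# Hypothetical per-corpus counts for one phrase pair, combined with a
# weight vector lambda of length n (one weight per component corpus).
counts_per_corpus = [120.0, 8.0, 35.0]         # c_i(phrase pair)
totals_per_corpus = [50000.0, 900.0, 20000.0]  # c_i(source phrase)

def weighted_prob(counts, totals, lam):
    """Weighted relative frequency: sum_i lam_i*c_i / sum_i lam_i*t_i."""
    num = sum(l * c for l, c in zip(lam, counts))
    den = sum(l * t for l, t in zip(lam, totals))
    return num / den

global_lambda = [1.0, 1.0, 1.0]     # set once for the whole system
in_domain_lambda = [0.2, 5.0, 1.0]  # per-sentence override toward corpus 2
print(weighted_prob(counts_per_corpus, totals_per_corpus, global_lambda))
print(weighted_prob(counts_per_corpus, totals_per_corpus, in_domain_lambda))
```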
Kruengkrai, Canasai and Uchimoto, Kiyotaka and Kazama, Jun'ichi and Wang, Yiou and Torisawa, Kentaro and Isahara, Hitoshi
Training method
Input: Training set S = {(xt, yt)}, t = 1, ..., T. Output: Model weight vector w
Training method
where w is a weight vector and f is a feature representation of an input x and an output y.
Training method
Learning a mapping between an input-output pair corresponds to finding a weight vector w such that the best scoring path of a given sentence is the same as (or close to) the correct path.
weight vector is mentioned in 7 sentences in this paper.
Cao, Yuan and Khudanpur, Sanjeev
Online Learning Algorithm
, D, surrogate weight vector
Online Learning Algorithm
(a) Mapping a surrogate weight vector to a tensor X1
Online Learning Algorithm
Figure 2: Algorithm for mapping a surrogate weight vector X to a tensor.
Tensor Model Construction
As a way out, we first run a simple vector-model based learning algorithm (say the Perceptron) on the training data and estimate a weight vector, which serves as a “surrogate” weight vector.
Tensor Space Representation
Most of the learning algorithms for NLP problems are based on vector space models, which represent data as vectors φ ∈ R^n, and try to learn feature weight vectors w ∈ R^n such that a linear model y = w · φ is able to discriminate between, say, good and bad hypotheses.
weight vector is mentioned in 7 sentences in this paper.
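The excerpts above describe estimating an ordinary ("surrogate") weight vector first and then mapping it into a tensor. A rough sketch of one such mapping, done here as a plain reshape into a matrix plus a rank-1 view; this is only an illustration, not the mapping algorithm of the paper's Figure 2:

```python
import numpy as np

# Suppose a vector-space learner (e.g., a perceptron) produced this
# "surrogate" weight vector over 12 features.
surrogate_w = np.arange(12, dtype=float)

# Illustrative mapping: arrange the 12 weights into a 3 x 4 tensor
# (here just a matrix), which a tensor model could then factorize.
X = surrogate_w.reshape(3, 4)

# A rank-1 approximation of X via SVD, as one possible low-dimensional view.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
X_rank1 = S[0] * np.outer(U[:, 0], Vt[0])
print(np.linalg.norm(X - X_rank1))
```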
Martineau, Justin and Chen, Lu and Cheng, Doreen and Sheth, Amit
Experiments
• Delta-IDF: Takes the dot product of the Delta IDF weight vector (Formula 1) with the document’s term frequency vector.
Experiments
• Spread: Takes the dot product of the distribution spread weight vector (Formula 3) with the document’s term frequency vector.
Feature Weighting Methods
We calculate the Delta IDF score of every term in V, and get the Delta IDF weight vector Δ = (Δidf_1, ..., Δidf_|V|) for all terms.
Feature Weighting Methods
When the dataset is imbalanced, to avoid building a biased model, we down-sample the majority class before calculating the Delta IDF score and then use a bias balancing procedure to balance the Delta IDF weight vector.
Feature Weighting Methods
This procedure first divides the Delta IDF weight vector into two vectors, one of which contains all the features with positive scores, and the other of which contains all the features with negative scores.
weight vector is mentioned in 6 sentences in this paper.
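The excerpts above score a document by the dot product of a Delta IDF weight vector with its term-frequency vector. A compact sketch under one common formulation of Delta IDF (the toy corpus, add-one smoothing, and sign convention are assumptions, not necessarily the paper's Formula 1):

```python
import math
from collections import Counter

pos_docs = [["good", "plot"], ["good", "acting", "good"]]
neg_docs = [["bad", "plot"], ["bad", "acting"]]
vocab = sorted({t for d in pos_docs + neg_docs for t in d})

def delta_idf(term):
    """One common Delta IDF form: log(|N|/df_N) - log(|P|/df_P), add-one smoothed."""
    df_p = sum(term in d for d in pos_docs) + 1
    df_n = sum(term in d for d in neg_docs) + 1
    return math.log(len(neg_docs) / df_n) - math.log(len(pos_docs) / df_p)

weights = {t: delta_idf(t) for t in vocab}   # the Delta IDF weight vector

def score(doc):
    """Dot product of the weight vector with the document's term frequencies."""
    tf = Counter(doc)
    return sum(weights[t] * c for t, c in tf.items() if t in weights)

print(score(["good", "plot", "good"]), score(["bad", "plot"]))
```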
Wu, Zhili and Markert, Katja and Sharoff, Serge
Structural SVMs
Let x be a document and wm a weight vector associated with the genre class m in a corpus with k genres at the most fine-grained level.
Structural SVMs
The predicted class is the class achieving the maximum inner product between x and the weight vector for the class, denoted as,
Structural SVMs
Accurate prediction requires that when a document vector is multiplied with the weight vector associated with its own class, the resulting inner product should be larger than its inner products with a weight vector for any other genre class m. This helps us to define criteria for weight vectors.
weight vector is mentioned in 5 sentences in this paper.
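The excerpts above predict the genre class whose weight vector has the largest inner product with the document vector x. A minimal sketch of that decision rule (class count, dimensions, and weights are random toy values):

```python
import numpy as np

rng = np.random.default_rng(1)
n_classes, dim = 5, 20

W = rng.normal(size=(n_classes, dim))  # one weight vector w_m per genre class
x = rng.normal(size=dim)               # document vector

scores = W @ x                         # inner product with every class weight vector
predicted_class = int(np.argmax(scores))

# The criterion in the excerpt: the own-class score should exceed the score
# under any other class's weight vector.
print(predicted_class, scores)
```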
Doyle, Gabriel and Bicknell, Klinton and Levy, Roger
Experiment
Table 1: Data, markedness matrix, weight vector, and joint log-probabilities for the IBPOT and the phonological standard constraints.
The IBPOT Model
The IBPOT model defines a generative process for mappings between input and output forms based on three latent variables: the constraint violation matrices F (faithfulness) and M (markedness), and the weight vector w. The cells of the violation matrices correspond to the number of violations of a constraint by a given input-output mapping.
The IBPOT Model
The weight vector w provides weight for both F and M. Probabilities of output forms are given by a log-linear function:
The IBPOT Model
We initialize the model with a randomly-drawn markedness violation matrix M and weight vector w.
weight vector is mentioned in 5 sentences in this paper.
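The excerpts above give output-form probabilities by a log-linear function of constraint violations weighted by w. A small sketch of that computation (the violation counts and weights are invented):

```python
import numpy as np

# Rows: candidate output forms; columns: constraints (faithfulness + markedness).
violations = np.array([[0, 2, 1],
                       [1, 0, 0],
                       [2, 1, 0]], dtype=float)
w = np.array([-1.5, -0.8, -0.3])   # constraint weights (negative = penalties)

scores = violations @ w                 # log-linear scores
probs = np.exp(scores - scores.max())   # numerically stable softmax
probs /= probs.sum()
print(probs)                            # probability of each output form
```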
Tang, Hao and Keshet, Joseph and Livescu, Karen
Algorithm
Finding the weight vector θ that minimizes the ℓ2-regularized average of this loss function is the structured support vector machine (SVM) problem (Taskar et al., 2003; Tsochantaridis et al., 2005):
Algorithm
Denote by θ_{t-1} the value of the weight vector before the t-th round.
Algorithm
Let Δφ_i = φ(p̄_i, w_i) − φ(p̂_i, w_i). Then the algorithm updates the weight vector θ_t as follows:
Experiments
The single-threaded running time for PNDP+ and Pegasos/DP+ is about 40 minutes per epoch, measured on a dual-core AMD 2.4GHz CPU with 8GB of memory; for CRF, it takes about 100 minutes for each epoch, which is almost entirely because the weight vector θ is less sparse with CRF learning.
weight vector is mentioned in 5 sentences in this paper.
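The excerpts above describe updating the weight vector θ from the feature difference between the correct and predicted structures under an ℓ2-regularized structured hinge loss. A sketch of one Pegasos-style subgradient step (the feature vectors, step-size schedule, and regularization constant are placeholders, not the paper's exact update):

```python
import numpy as np

def pegasos_step(theta, phi_gold, phi_pred, t, lam=0.01):
    """One subgradient step on the l2-regularized structured hinge loss.

    theta    : current weight vector (theta_{t-1})
    phi_gold : features of the correct structure
    phi_pred : features of the (loss-augmented) prediction
    t        : round number (controls the step size 1/(lam*t))
    """
    eta = 1.0 / (lam * t)
    delta_phi = phi_gold - phi_pred
    # Regularization shrinks theta; the feature difference pushes it toward the gold structure.
    return (1 - eta * lam) * theta + eta * delta_phi

theta = np.zeros(8)
rng = np.random.default_rng(2)
for t in range(1, 6):
    theta = pegasos_step(theta, rng.normal(size=8), rng.normal(size=8), t)
print(theta)
```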
Zhang, Congle and Baldwin, Tyler and Ho, Howard and Kimelfeld, Benny and Li, Yunyao
Model
The conditional probability of an assignment α, given an input sequence x and the weight vector θ = (θ1, …).
Model
When performing inference, we wish to select the output sequence with the highest probability, given the input sequence x and the weight vector θ (i.e., MAP inference).
Model
A weight vector θ = (θ1, …).
weight vector is mentioned in 5 sentences in this paper.
Huang, Liang
Forest Reranking
As usual, we define the score of a parse y to be the dot product between a high dimensional feature representation and a weight vector w:
Forest Reranking
Using a machine learning algorithm, the weight vector w can be estimated from the training data where each sentence si is labelled with its correct (“gold-standard”) parse yi. As for the learner, Collins (2000) uses the boosting algorithm and Charniak and Johnson (2005) use the maximum entropy estimator.
Forest Reranking
Now we train the reranker to pick the oracle parses as often as possible, and in case an error is made (line 6), perform an update on the weight vector (line 7), by adding the difference between two feature representations.
weight vector is mentioned in 4 sentences in this paper.
Simianer, Patrick and Riezler, Stefan and Dyer, Chris
Experiments
The initial weight vector was 0.
Experiments
If not indicated otherwise, the perceptron was run for 10 epochs with learning rate η = 0.0001, started at the zero weight vector, using deduplicated 100-best lists.
Joint Feature Selection in Distributed Stochastic Learning
The mixed weight vector is re-sent to each shard to start another epoch of training in parallel on each shard.
Joint Feature Selection in Distributed Stochastic Learning
Reduced weight vectors are mixed and the result is re-sent to each shard to start another epoch of parallel training on each shard.
weight vector is mentioned in 4 sentences in this paper.
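The excerpts above describe parallel training on shards whose weight vectors are mixed and re-sent each epoch. A toy sketch of that mixing loop (the per-shard perceptron step, data, and hyperparameters are stand-ins, and the joint feature selection step the paper describes is omitted):

```python
import numpy as np

def train_on_shard(w, shard_data, eta=0.0001):
    """Stub for one epoch of perceptron-style training on a single shard."""
    for x, y in shard_data:          # y in {-1, +1}
        if y * (w @ x) <= 0:
            w = w + eta * y * x
    return w

rng = np.random.default_rng(3)
dim, n_shards = 10, 4
shards = [[(rng.normal(size=dim), rng.choice([-1, 1])) for _ in range(20)]
          for _ in range(n_shards)]

w_mixed = np.zeros(dim)              # initial weight vector is 0
for epoch in range(10):
    shard_weights = [train_on_shard(w_mixed.copy(), s) for s in shards]
    w_mixed = np.mean(shard_weights, axis=0)   # mix, then resend to each shard
print(w_mixed)
```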
Tsuruoka, Yoshimasa and Tsujii, Jun'ichi and Ananiadou, Sophia
Introduction
L1 regularization penalizes the weight vector for its L1-norm (i.e.
Log-Linear Models
In effect, it forces the weight to receive the total L1 penalty that would have been applied if the weight had been updated by the true gradients, assuming that the current weight vector resides in the same orthant as the true weight vector.
Log-Linear Models
problem as an L1-constrained problem (Lee et al., 2006), where the conditional log-likelihood of the training data is maximized under a fixed constraint of the L1-norm of the weight vector.
Log-Linear Models
(2008) describe efficient algorithms for projecting a weight vector onto the L1-ball.
weight vector is mentioned in 4 sentences in this paper.
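The excerpts above describe forcing each weight to receive the total L1 penalty it would have accumulated under the true gradients, clipped so the weight stays in the same orthant. A sketch of that cumulative-penalty-with-clipping idea (the gradients, learning rate, and regularization constant are placeholders):

```python
import numpy as np

def apply_l1_penalty(w, q, u):
    """Clipped cumulative L1 penalty, in the spirit of the excerpts above.

    w : weight vector after the plain gradient step
    q : total L1 penalty actually applied to each weight so far
    u : total L1 penalty each weight should have received so far
    """
    z = w.copy()
    pos = w > 0
    w[pos] = np.maximum(0.0, w[pos] - (u + q[pos]))
    neg = z < 0
    w[neg] = np.minimum(0.0, w[neg] + (u - q[neg]))
    q += w - z          # record the penalty that was actually applied
    return w, q

rng = np.random.default_rng(4)
w, q, eta, C, u = np.zeros(6), np.zeros(6), 0.1, 0.05, 0.0
for t in range(50):
    w -= eta * rng.normal(size=6)   # placeholder gradient step
    u += eta * C                    # cumulative penalty that should apply
    w, q = apply_l1_penalty(w, q, u)
print(w)
```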
Wang, Chenguang and Duan, Nan and Zhou, Ming and Zhang, Ming
Paraphrasing for Web Search
λ̂_1^M = arg min_λ { Σ_{i=1}^S Err(D_i^Label, Q̂_i; λ_1^M, R) }. The objective of MERT is to find the optimal feature weight vector λ̂_1^M that minimizes the error criterion Err according to the NDCG scores of top-1 paraphrase candidates.
Paraphrasing for Web Search
where Q̂_i is the best paraphrase candidate according to the paraphrasing model based on the weight vector λ_1^M, and N(D_i^Label, Q̂_i, R) is the NDCG score of Q̂_i computed on the documents ranked by R of Q_i and the labeled document set D_i^Label of Q_i.
Paraphrasing for Web Search
How to learn the weight vector λ_1^M is a standard learning-to-rank task.
weight vector is mentioned in 4 sentences in this paper.
Green, Spence and Wang, Sida and Cer, Daniel and Manning, Christopher D.
Adaptive Online MT
A fixed threadpool of workers computes gradients in parallel and sends them to a master thread, which updates a central weight vector.
Adaptive Online MT
During a tuning run, the online method decodes the tuning set under many more weight vectors than a MERT-style batch method.
Experiments
Our algorithm decodes each example with a new weight vector, thus exploring more of the search space for the same tuning set.
weight vector is mentioned in 3 sentences in this paper.
Chaturvedi, Snigdha and Goldwasser, Dan and Daumé III, Hal
Intervention Prediction Models
The model uses the pseudocode shown in Algorithm 1 to iteratively refine the weight vectors.
Intervention Prediction Models
Assuming that p represents the posts of thread t, h represents the latent category assignments, and r represents the intervention decision, a feature vector φ(p, r, h, t) is extracted for each thread, and using the weight vector w, this model defines a decision function similar to what is shown in Equation 1.
Intervention Prediction Models
w is the weight vector, ℓ is the squared hinge loss function, and fw(tj, pj) is defined in Equation 1.
weight vector is mentioned in 3 sentences in this paper.
Li, Mu and Duan, Nan and Zhang, Dongdong and Li, Chi-Ho and Zhou, Ming
Collaborative Decoding
Let λm be the feature weight vector for member decoder dm; the training procedure proceeds as follows:
Collaborative Decoding
For each decoder dm, find a new feature weight vector λ′m which optimizes the specified evaluation criterion L on D using the MERT algorithm, based on the n-best list generated by dm:
Collaborative Decoding
where T denotes the translations selected by re-ranking the translations in the n-best list using a new feature weight vector λ
weight vector is mentioned in 3 sentences in this paper.
Prettenhofer, Peter and Stein, Benno
Cross-Language Structural Correspondence Learning
to constrain the hypothesis space, i.e., the space of possible weight vectors, of the target task by considering multiple different but related prediction tasks.
Cross-Language Structural Correspondence Learning
The subspace is used to constrain the learning of the target task by restricting the weight vector w to lie in the subspace defined by θ^T.
Cross-Language Text Classification
w is a weight vector that parameterizes the classifier, and the superscript T denotes the matrix transpose.
weight vector is mentioned in 3 sentences in this paper.
Tomasoni, Mattia and Huang, Minlie
The summarization framework
We trained a Linear Regression classifier to learn the weight vector W = (w1, w2, w3, w4) that would combine the above features.
The summarization framework
It was calculated as the dot product between the learned weight vector W and the feature vector Ψ for the answer.
The summarization framework
In order to learn the weight vector V that would combine the above scores, we asked three human annotators to generate question-biased extractive summaries based on all answers available for a certain question.
weight vector is mentioned in 3 sentences in this paper.
Eidelman, Vladimir and Marton, Yuval and Resnik, Philip
Introduction
distance between hypotheses when projected onto the line defined by the weight vector w.
Learning in SMT
Given an input sentence in the source language x ∈ X, we want to produce a translation y ∈ Y(x) using a linear model parameterized by a weight vector w:
The Relative Margin Machine in SMT
More formally, the spread is the distance between y+ and the worst candidate (y^w, d^w) ← arg min_{(y,d) ∈ Y(x_i), D(x_i)} s(x_i, y, d), after projecting both onto the line defined by the weight vector w. For each y′, this projection is conveniently given by s(x_i, y′, d); thus the spread is calculated as δs(x_i, y+, y^w).
weight vector is mentioned in 3 sentences in this paper.
Duh, Kevin and Sudoh, Katsuhito and Wu, Xianchao and Tsukada, Hajime and Nagata, Masaaki
Multi-objective Algorithms
For each sentence pair (f, e) in the devset, we first generate an N-best list L = {h} using the current weight vector w (line 5).
Multi-objective Algorithms
Input: Devset, max number of iterations I. Output: A set of (Pareto-optimal) weight vectors. 1: Initialize w.
Theory of Pareto Optimality 2.1 Definitions and Concepts
Here, the MT system’s Decode function, parameterized by weight vector w, takes in a foreign sentence f and returns a translated hypothesis h. The argmax operates in vector space and our goal is to find w leading to hypotheses on the Pareto Frontier.
weight vector is mentioned in 3 sentences in this paper.
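The excerpts above seek a set of Pareto-optimal weight vectors, i.e., vectors whose hypothesis scores are not dominated on any objective. A minimal sketch of the dominance test and frontier filter (the candidate weight vectors and their two objective scores are made up):

```python
def dominates(a, b):
    """a dominates b if a is >= b on every objective and > on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_frontier(candidates):
    """Return the weight vectors whose objective scores are not dominated."""
    return [(w, s) for w, s in candidates
            if not any(dominates(s2, s) for _, s2 in candidates if s2 != s)]

# (weight vector, (objective 1, objective 2)) pairs, e.g. two MT metrics.
candidates = [((0.2, 0.8), (30.1, 70.2)),
              ((0.5, 0.5), (31.0, 69.5)),
              ((0.8, 0.2), (29.0, 68.0))]   # dominated, so excluded from the frontier

print(pareto_frontier(candidates))
```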
Branavan, S.R.K and Silver, David and Barzilay, Regina
Adding Linguistic Knowledge to the Monte-Carlo Framework
where yi is the ith hidden unit of ȳ, and ūi is the weight vector corresponding to yi.
Adding Linguistic Knowledge to the Monte-Carlo Framework
Q(st, at) ≈ w̄ · f̄(st, at), where w̄ is the weight vector.
Monte-Carlo Framework for Computer Games
Here f(s, a) ∈ R^n is a real-valued feature function, and w̄ is a weight vector.
weight vector is mentioned in 3 sentences in this paper.
Wei, Wei and Gulla, Jon Atle
Empirical Analysis
In the training process of HL-flat, the algorithm relaxes the restriction in the HL-SOT algorithm that requires the weight vector wi of classifier i to be updated only on the examples that are positive for its parent node.
The HL-SOT Approach
Defining the f function: Let w1, ..., wN be weight vectors that define linear-threshold classifiers of each node in SOT.
The HL-SOT Approach
Formula 1 restricts the weight vector wi of classifier i to be updated only on the examples that are positive for its parent node.
weight vector is mentioned in 3 sentences in this paper.
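The excerpts above restrict the weight vector wi of classifier i to be updated only on examples that are positive for its parent node. A toy sketch of that top-down restriction (the taxonomy, data, and perceptron-style update are illustrative choices, not the HL-SOT training rule itself):

```python
import numpy as np

rng = np.random.default_rng(5)
dim = 8
parent = {0: None, 1: 0, 2: 0}          # a tiny taxonomy: root 0 with children 1 and 2
W = {i: np.zeros(dim) for i in parent}  # one weight vector per node

def train(examples, eta=0.1):
    """examples: list of (x, labels) where labels[i] is True if node i applies."""
    for x, labels in examples:
        for i in parent:
            p = parent[i]
            if p is not None and not labels[p]:
                continue                 # only update on examples positive for the parent
            y = 1.0 if labels[i] else -1.0
            if y * (W[i] @ x) <= 0:      # perceptron-style linear-threshold update
                W[i] += eta * y * x

examples = [(rng.normal(size=dim), {0: True, 1: True, 2: False}),
            (rng.normal(size=dim), {0: False, 1: False, 2: False})]
train(examples)
print({i: np.round(w, 2) for i, w in W.items()})
```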