A multitask transfer learning solution | Let w_k denote the weight vector of the linear classifier that separates positive instances of auxiliary type A_k from negative instances, and let w_T denote a similar weight vector for the target type T.
A multitask transfer learning solution | If different relation types are totally unrelated, these weight vectors should also be independent of each other. |
A multitask transfer learning solution | But because we observe similar syntactic structures across different relation types, we now assume that these weight vectors are related through a common component v.
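One standard way to encode this relatedness (a sketch of the usual shared-component decomposition, not necessarily the paper's exact parameterization) is to write each type-specific weight vector as a shared part plus a type-specific part:

    w_k = v + u_k   for each auxiliary type A_k,        w_T = v + u_T   for the target type,

so that the common component v captures the syntactic structure shared across relation types, while the u vectors capture what is specific to each type.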
Abstract | The proposed framework models the commonality among different relation types through a shared weight vector, enables knowledge learned from the auxiliary relation types to be transferred to the target relation type, and allows easy control of the tradeoff between precision and recall.
Conclusions and future work | In the multitask learning framework that we introduced, different relation types are treated as different but related tasks that are learned together, with the common structures among the relation types modeled by a shared weight vector.
Experiments | the number of nonzero entries in the shared weight vector V. To see how the performance may vary as H changes, we plot the performance of TL-comb and TL-auto in terms of the average F1 across the seven target relation types, with H ranging from 100 to 50000.
Introducing Nonlocal Features | We now outline three different ways of learning the weight vector w with nonlocal features.
Introducing Nonlocal Features | In other words, it is unlikely that we can devise a feature set informative enough to allow the weight vector to converge towards a solution that would let the learning algorithm see the entire documents during training, at least when no external knowledge sources are used.
Representation and Learning | The score of an arc (a_i, m_i) is defined as the scalar product between a weight vector w and a feature vector Φ(a_i, m_i), where Φ is a feature extraction function over an arc (thus extracting features from the antecedent and the anaphor).
Representation and Learning | We find the weight vector w by online learning using a variant of the structured perceptron (Collins, 2002).
Representation and Learning | For each instance it uses the current weight vector w to make a prediction ŷ_i given the input x_i.
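For illustration, a minimal structured-perceptron-style sketch of this arc-scoring setup (the feature vectors and candidate sets here are hypothetical toy data, not the authors' implementation):

```python
import numpy as np

def score(w, phi):
    """Score of an arc: scalar product of the weight vector and the arc's features."""
    return np.dot(w, phi)

def perceptron_epoch(w, instances, lr=1.0):
    """One online pass: pick the best-scoring candidate arc, update on mistakes.

    `instances` is a list of (candidate_features, gold_index) pairs, where
    candidate_features holds one feature vector per possible antecedent arc.
    """
    for candidates, gold in instances:
        pred = max(range(len(candidates)), key=lambda i: score(w, candidates[i]))
        if pred != gold:
            # Move the weights toward the gold arc and away from the prediction.
            w = w + lr * (candidates[gold] - candidates[pred])
    return w

# Toy usage: 20 anaphors, 3 candidate arcs each, 4 features per arc.
rng = np.random.default_rng(0)
data = [(rng.normal(size=(3, 4)), int(rng.integers(3))) for _ in range(20)]
w = np.zeros(4)
for _ in range(5):
    w = perceptron_epoch(w, data)
```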
Introduction | With this framework, adaptation to a new domain simply consists of updating a weight vector, and multiple domains can be supported by the same system.
Introduction | For each sentence that is being decoded, we choose the weight vector that is optimized on the closest cluster, allowing for adaptation even with unlabelled and heterogeneous test data. |
Translation Model Architecture | To combine statistics from a vector of n component corpora, we can use a weighted version of equation 1, which adds a weight vector λ of length n (Sennrich, 2012b):
Translation Model Architecture | Table 1: Illustration of instance weighting with weight vectors for two corpora. |
Translation Model Architecture | In our implementation, the weight vector is set globally, but can be overridden on a per-sentence basis. |
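As a concrete, hypothetical sketch of this kind of instance weighting (assuming the per-corpus counts are simply scaled by their entry in λ and summed before normalization, as described above):

```python
def weighted_phrase_prob(source, target, corpus_counts, lambdas):
    """p(target | source) from n component corpora, with per-corpus weights.

    corpus_counts[i] maps (source, target) phrase pairs to raw counts in corpus i;
    lambdas[i] is the weight assigned to corpus i.
    """
    num = sum(lam * counts.get((source, target), 0)
              for counts, lam in zip(corpus_counts, lambdas))
    den = sum(lam * c
              for counts, lam in zip(corpus_counts, lambdas)
              for (s, _t), c in counts.items() if s == source)
    return num / den if den > 0 else 0.0

# Two toy corpora; the second (e.g. in-domain) gets a higher weight.
counts_a = {("Haus", "house"): 8, ("Haus", "home"): 2}
counts_b = {("Haus", "home"): 5}
print(weighted_phrase_prob("Haus", "home", [counts_a, counts_b], [0.5, 2.0]))
```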
Training method | Input: Training set S = {(x_t, y_t)}_{t=1}^T; Output: Model weight vector w
Training method | where w is a weight vector and f is a feature representation of an input x and an output y. |
Training method | Learning a mapping between an input-output pair corresponds to finding a weight vector w such that the best scoring path of a given sentence is the same as (or close to) the correct path. |
Online Learning Algorithm | , D, surrogate weight vector v
Online Learning Algorithm | (a) Mapping a surrogate weight vector to a tensor X1 |
Online Learning Algorithm | Figure 2: Algorithm for mapping a surrogate weight vector X to a tensor. |
Tensor Model Construction | As a way out, we first run a simple vector-model based learning algorithm (say the Perceptron) on the training data and estimate a weight vector, which serves as a “surrogate” weight vector.
Tensor Space Representation | Most of the learning algorithms for NLP problems are based on vector space models, which represent data as vectors φ ∈ R^n, and try to learn feature weight vectors w ∈ R^n such that a linear model y = w · φ is able to discriminate between, say, good and bad hypotheses.
Experiments | • Delta-IDF: Takes the dot product of the Delta IDF weight vector (Formula 1) with the document’s term frequency vector.
Experiments | • Spread: Takes the dot product of the distribution spread weight vector (Formula 3) with the document’s term frequency vector.
Feature Weighting Methods | We calculate the Delta IDF score of every term in V, and get the Delta IDF weight vector Δ = (Δidf_1, ..., Δidf_|V|) for all terms.
Feature Weighting Methods | When the dataset is imbalanced, to avoid building a biased model, we down-sample the majority class before calculating the Delta IDF scores and then use a bias balancing procedure to balance the Delta IDF weight vector.
Feature Weighting Methods | This procedure first divides the Delta IDF weight vector into two vectors, one of which contains all the features with positive scores, and the other of which contains all the features with negative scores.
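A small illustrative sketch of Delta IDF weighting and the dot-product scoring mentioned above (the add-0.5 smoothing, log base, and sign orientation are assumptions, not necessarily the paper's Formula 1):

```python
import math
from collections import Counter

def delta_idf(pos_docs, neg_docs, vocab):
    """Per-term Delta IDF: terms frequent in positive documents and rare in
    negative ones get positive weight, and vice versa (add-0.5 smoothing)."""
    P, N = len(pos_docs), len(neg_docs)
    df_pos = Counter(t for d in pos_docs for t in set(d))
    df_neg = Counter(t for d in neg_docs for t in set(d))
    return {t: math.log2((df_pos[t] + 0.5) / (P + 0.5))
             - math.log2((df_neg[t] + 0.5) / (N + 0.5))
            for t in vocab}

def delta_idf_score(doc, weights):
    """Dot product of the Delta IDF weight vector with the document's
    term-frequency vector."""
    tf = Counter(doc)
    return sum(count * weights.get(t, 0.0) for t, count in tf.items())

pos = [["great", "plot"], ["great", "acting"]]
neg = [["boring", "plot"], ["boring", "slow"]]
vocab = {t for d in pos + neg for t in d}
weights = delta_idf(pos, neg, vocab)
print(delta_idf_score(["great", "boring", "plot"], weights))
```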
Structural SVMs | Let x be a document and wm a weight vector associated with the genre class m in a corpus with k genres at the most fine-grained level. |
Structural SVMs | The predicted class is the class achieving the maximum inner product between x and the weight vector for the class, denoted as, |
Structural SVMs | Accurate prediction requires that when a document vector is multiplied with the weight vector associated with its own class, the resulting inner product should be larger than its inner products with a weight vector for any other genre class m. This helps us to define criteria for weight vectors.
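A minimal sketch of that decision rule with hypothetical data (the predicted genre is simply the class whose weight vector yields the largest inner product with the document vector):

```python
import numpy as np

def predict_genre(x, W):
    """x: document vector of shape (d,); W: stacked class weight vectors of
    shape (k, d). Returns the class index m maximizing the inner product."""
    return int(np.argmax(W @ x))

rng = np.random.default_rng(1)
W = rng.normal(size=(4, 6))   # k = 4 genre classes, d = 6 features
x = rng.normal(size=6)
print(predict_genre(x, W))
```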
Experiment | Table 1: Data, markedness matrix, weight vector, and joint log-probabilities for the IBPOT and the phonological standard constraints.
The IBPOT Model | The IBPOT model defines a generative process for mappings between input and output forms based on three latent variables: the constraint violation matrices F (faithfulness) and M (markedness), and the weight vector w. The cells of the violation matrices correspond to the number of violations of a constraint by a given input-output mapping. |
The IBPOT Model | The weight vector w provides weights for both F and M. Probabilities of output forms are given by a log-linear function:
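The log-linear function itself is not reproduced in this excerpt; a standard MaxEnt-grammar form consistent with the description (shown here only as a generic sketch, with c(x, y) stacking the violation counts from F and M) would be:

    P(y | x) = exp(w · c(x, y)) / Σ_{y'} exp(w · c(x, y')),

where the sign of the learned weights determines whether a violated constraint makes an output form more or less probable.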
The IBPOT Model | We initialize the model with a randomly-drawn markedness violation matrix M and weight vector w.
Algorithm | Finding the weight vector θ that minimizes the ℓ2-regularized average of this loss function is the structured support vector machine (SVM) problem (Taskar et al., 2003; Tsochantaridis et al., 2005):
Algorithm | Denote by θ^{t-1} the value of the weight vector before the t-th round.
Algorithm | Let Δφ_i = φ(ȳ_i, w_i) − φ(y_i, w_i). Then the algorithm updates the weight vector θ^t as follows:
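The update itself is not reproduced in this excerpt; a generic Pegasos-style subgradient step consistent with the surrounding description (regularization strength λ and step size 1/(λt) are assumptions) would be:

    θ^t = (1 − 1/t) θ^{t-1} + (1 / (λ t)) Δφ_i,

with the sign convention chosen so that the step shrinks the previous weight vector and then moves it toward the features of the correct structure and away from those of the prediction.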
Experiments | The single-threaded running time for PNDP+ and Pegasos/DP+ is about 40 minutes per epoch, measured on a dual-core AMD 2.4GHz CPU with 8GB of memory; for CRF, it takes about 100 minutes per epoch, which is almost entirely because the weight vector θ is less sparse with CRF learning.
Model | The conditional probability of an assignment α, given an input sequence x and the weight vector θ = (θ_1, …
Model | When performing inference, we wish to select the output sequence with the highest probability, given the input sequence x and the weight vector θ (i.e., MAP inference).
Model | A weight vector θ = (θ_1, …
Forest Reranking | As usual, we define the score of a parse y to be the dot product between a high dimensional feature representation and a weight vector w: |
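Written out, this is just the standard linear form (the feature function name f is generic notation, not necessarily the paper's):

    score(y) = w · f(y).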
Forest Reranking | Using a machine learning algorithm, the weight vector w can be estimated from the training data where each sentence s_i is labelled with its correct (“gold-standard”) parse. As for the learner, Collins (2000) uses the boosting algorithm and Charniak and Johnson (2005) use the maximum entropy estimator.
Forest Reranking | Now we train the reranker to pick the oracle parses as often as possible, and in case an error is made (line 6), perform an update on the weight vector (line 7) by adding the difference between the two feature representations.
Experiments | The initial weight vector was 0. |
Experiments | If not indicated otherwise, the perceptron was run for 10 epochs with learning rate η = 0.0001, started at the zero weight vector, using deduplicated 100-best lists.
Joint Feature Selection in Distributed Stochastic Learning | The mixed weight vector is re-sent to each shard to start another epoch of training in parallel on each shard.
Joint Feature Selection in Distributed Stochastic Learning | Reduced weight vectors are mixed and the result is re-sent to each shard to start another epoch of parallel training on each shard.
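A minimal sketch of this mix-and-resend loop (the shard-local training step is a placeholder, and uniform mixing weights are an assumption):

```python
import numpy as np

def train_on_shard(w, shard):
    """Placeholder for one epoch of shard-local training (e.g. SGD or
    perceptron updates) starting from the mixed weight vector w."""
    # ... shard-local updates on a copy of w would go here ...
    return w.copy()

def parameter_mixing(shards, dim, epochs=5):
    w = np.zeros(dim)
    for _ in range(epochs):
        # Each shard trains in parallel from the current mixed weights.
        local = [train_on_shard(w, shard) for shard in shards]
        # Mix (here: a uniform average) and resend the result to every shard.
        w = np.mean(local, axis=0)
    return w

print(parameter_mixing(shards=[None, None, None], dim=4))
```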
Introduction | L1 regularization penalizes the weight vector for its L1-norm (i.e.
Log-Linear Models | In effect, it forces the weight to receive the total L1 penalty that would have been applied if the weight had been updated by the true gradients, assuming that the current weight vector resides in the same orthant as the true weight vector.
Log-Linear Models | problem as an L1-constrained problem (Lee et al., 2006), where the conditional log-likelihood of the training data is maximized under a fixed constraint on the L1-norm of the weight vector.
Log-Linear Models | (2008) describe efficient algorithms for projecting a weight vector onto the L1-ball.
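A simplified sketch of applying an L1 penalty during SGD by clipping weights at zero (this is the basic clipped penalty, not the full cumulative-penalty bookkeeping or the exact L1-ball projection cited above):

```python
import numpy as np

def sgd_step_with_l1(w, grad, lr=0.1, c=0.01):
    """One gradient-ascent step on the log-likelihood, followed by a clipped
    L1 penalty: each weight moves toward zero by at most lr * c, and is
    clipped so the penalty never flips its sign."""
    w = w + lr * grad
    penalty = lr * c
    return np.sign(w) * np.maximum(np.abs(w) - penalty, 0.0)

w = np.array([0.0005, -0.3, 0.8])
print(sgd_step_with_l1(w, np.zeros(3)))   # the near-zero weight is clipped to 0
```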
Paraphrasing for Web Search | λ̂_1^M = arg min_{λ_1^M} Σ_{i=1}^S Err(D_i^label, Q_i; λ_1^M). The objective of MERT is to find the optimal feature weight vector λ̂_1^M that minimizes the error criterion Err according to the NDCG scores of top-1 paraphrase candidates.
Paraphrasing for Web Search | where the best paraphrase candidate is selected according to the paraphrasing model based on the weight vector λ_1^M, and N(D_i^label, Q_i, R) is the NDCG score computed on the documents of Q_i ranked by R and the labeled document set D_i^label of Q_i.
Paraphrasing for Web Search | How to learn the weight vector λ is a standard learning-to-rank task.
Adaptive Online MT | A fixed threadpool of workers computes gradients in parallel and sends them to a master thread, which updates a central weight vector.
Adaptive Online MT | During a tuning run, the online method decodes the tuning set under many more weight vectors than a MERT-style batch method.
Experiments | Our algorithm decodes each example with a new weight vector, thus exploring more of the search space for the same tuning set.
Intervention Prediction Models | The model uses the pseudocode shown in Algorithm 1 to iteratively refine the weight vectors.
Intervention Prediction Models | Assuming that p represents the posts of thread t, h represents the latent category assignments, and r represents the intervention decision, a feature vector φ(p, r, h, t) is extracted for each thread, and using the weight vector w, this model defines a decision function similar to what is shown in Equation 1.
Intervention Prediction Models | w is the weight vector, ℓ is the squared hinge loss function, and f_w(t_j, p_j) is defined in Equation 1.
Collaborative Decoding | Let λ_m be the feature weight vector for member decoder d_m; the training procedure proceeds as follows:
Collaborative Decoding | For each decoder d_m, find a new feature weight vector λ'_m which optimizes the specified evaluation criterion L on D using the MERT algorithm, based on the n-best list J_m generated by d_m:
Collaborative Decoding | where T denotes the translations selected by re-ranking the translations in J_m using the new feature weight vector λ'_m.
Cross-Language Structural Correspondence Learning | to constrain the hypothesis space, i.e., the space of possible weight vectors, of the target task by considering multiple different but related prediction tasks.
Cross-Language Structural Correspondence Learning | The subspace is used to constrain the learning of the target task by restricting the weight vector w to lie in the subspace defined by θᵀ.
Cross-Language Text Classification | w is a weight vector that parameterizes the classifier, and ᵀ denotes the matrix transpose.
The summarization framework | We trained a Linear Regression classifier to learn the weight vector W = (w_1, w_2, w_3, w_4) that would combine the above features.
The summarization framework | It was calculated as the dot product between the learned weight vector W and the feature vector for the answer.
The summarization framework | In order to learn the weight vector V that would combine the above scores, we asked three human annotators to generate question-biased extractive summaries based on all answers available for a certain question. |
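A small sketch of learning such a combination vector by least squares (the feature scores and target relevance values below are hypothetical; the paper's actual training setup may differ):

```python
import numpy as np

# Each row: the four feature scores for one candidate answer;
# y: annotator-derived target relevance for that answer.
X = np.array([[0.2, 0.7, 0.1, 0.5],
              [0.9, 0.3, 0.4, 0.1],
              [0.5, 0.5, 0.6, 0.7]])
y = np.array([0.4, 0.8, 0.9])

# Learn W = (w1, w2, w3, w4) by ordinary least squares.
W, *_ = np.linalg.lstsq(X, y, rcond=None)

# Score a new answer as the dot product of W with its feature vector.
print(float(X[0] @ W))
```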
Introduction | distance between hypotheses when projected onto the line defined by the weight vector w. |
Learning in SMT | Given an input sentence in the source language x ∈ X, we want to produce a translation y ∈ Y(x) using a linear model parameterized by a weight vector w:
The Relative Margin Machine in SMT | More formally, the spread is the distance between y⁺ and the worst candidate (y^w, d^w) ← arg min_{(y,d) ∈ Y(x_i), D(x_i)} s(x_i, y, d), after projecting both onto the line defined by the weight vector w. For each y′, this projection is conveniently given by s(x_i, y′, d); thus the spread is calculated as δs(x_i, y⁺, y^w).
Multi-objective Algorithms | For each sentence pair (f, e) in the devset, we first generate an N-best list L = {h} using the current weight vector w (line 5).
Multi-objective Algorithms | Input: Devset, max number of iterations I. Output: A set of (Pareto-optimal) weight vectors. 1: Initialize w.
Theory of Pareto Optimality 2.1 Definitions and Concepts | Here, the MT system’s Decode function, parameterized by weight vector w, takes in a foreign sentence f and returns a translated hypothesis h. The argmax operates in vector space and our goal is to find w leading to hypotheses on the Pareto Frontier.
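For illustration, a minimal sketch of extracting the Pareto Frontier from a set of hypotheses scored on two metrics (higher is better for both; the scores are toy values, not from the paper):

```python
def pareto_frontier(points):
    """Return the points not dominated by any other point.
    A point q dominates p if q is >= p in every metric and differs from p."""
    frontier = []
    for p in points:
        dominated = any(all(qi >= pi for qi, pi in zip(q, p)) and q != p
                        for q in points)
        if not dominated:
            frontier.append(p)
    return frontier

# Toy (metric-1, metric-2) score pairs for five hypotheses.
hyps = [(0.30, 0.70), (0.35, 0.65), (0.28, 0.72), (0.33, 0.71), (0.25, 0.60)]
print(pareto_frontier(hyps))
```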
Adding Linguistic Knowledge to the Monte-Carlo Framework | where y_i is the i-th hidden unit of ȳ, and ū_i is the weight vector corresponding to y_i.
Adding Linguistic Knowledge to the Monte-Carlo Framework | Q(s_t, a_t) = w̄ · f(s_t, a_t), where w̄ is the weight vector.
Monte-Carlo Framework for Computer Games | Here f(s, a) ∈ R^n is a real-valued feature function, and w̄ is a weight vector.
Empirical Analysis | In the training process of HL-flat, the algorithm relaxes the restriction of the HL-SOT algorithm that requires that the weight vector w_i of classifier i is only updated on examples that are positive for its parent node.
The HL-SOT Approach | Defining the f function. Let w_1, ..., w_N be weight vectors that define the linear-threshold classifiers of each node in the SOT.
The HL-SOT Approach | Formula 1 restricts the weight vector w_i of classifier i to be updated only on examples that are positive for its parent node.