Index of papers in Proc. ACL that mention
  • feature weights
Liu, Yang and Mi, Haitao and Feng, Yang and Liu, Qun
Background
where h_m is a feature function, λ_m is the associated feature weight, and Z(f) is a constant for normalization:
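The formula this excerpt introduces is not reproduced in the index; a plausible reconstruction of the standard log-linear model it describes (using the symbols named above; the paper's exact layout may differ) is:

```latex
P(e \mid f) = \frac{\exp\big(\sum_{m=1}^{M} \lambda_m h_m(f, e)\big)}{Z(f)},
\qquad
Z(f) = \sum_{e'} \exp\Big(\sum_{m=1}^{M} \lambda_m h_m(f, e')\Big)
```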
Conclusion
As our decoder accounts for multiple derivations, we extend the MERT algorithm to tune feature weights with respect to BLEU score for max-translation decoding.
Experiments
We found that accounting for all possible derivations in max-translation decoding resulted in a small negative effect on BLEU score (from 30.11 to 29.82), even though the feature weights were tuned with respect to BLEU score.
Experiments
We concatenate and normalize their feature weights for the joint decoder.
Extended Minimum Error Rate Training
Minimum error rate training (Och, 2003) is widely used to optimize feature weights for a linear model (Och and Ney, 2002).
Extended Minimum Error Rate Training
The key idea of MERT is to tune one feature weight at a time to minimize the error rate while keeping the others fixed.
Extended Minimum Error Rate Training
where a is the feature value of the current dimension, λ is the feature weight being tuned, and b is the dot product of the other dimensions.
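A minimal sketch (hypothetical names, not the authors' code) of the one-dimensional decomposition this sentence describes: with all other weights held fixed, each candidate's model score is a linear function a·λ + b of the single weight being tuned.

```python
def score_as_line(features, weights, dim):
    """Return (a, b) such that score(x) = a * x + b when weights[dim] is set to x."""
    a = features[dim]  # feature value of the dimension being tuned
    b = sum(w * h for i, (w, h) in enumerate(zip(weights, features)) if i != dim)
    return a, b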
Introduction
• As multiple derivations are used for finding optimal translations, we extend the minimum error rate training (MERT) algorithm (Och, 2003) to tune feature weights with respect to BLEU score for max-translation decoding (Section 4).
feature weights is mentioned in 12 sentences in this paper.
Topics mentioned in this paper:
Cao, Yuan and Khudanpur, Sanjeev
Introduction
This also makes training the model parameters a challenging problem, since the amount of labeled training data is usually small compared to the size of feature sets: the feature weights cannot be estimated reliably.
Introduction
Such models require learning individual feature weights directly, so that the number of parameters to be estimated is identical to the size of the feature set.
Introduction
When millions of features are used but the amount of labeled data is limited, it can be difficult to precisely estimate each feature weight.
Tensor Space Representation
Most of the learning algorithms for NLP problems are based on vector space models, which represent data as vectors φ ∈ ℝⁿ, and try to learn feature weight vectors w ∈ ℝⁿ such that a linear model y = w · φ is able to discriminate between, say, good and bad hypotheses.
Tensor Space Representation
The feature weight tensor W can be decomposed as the sum of a sequence of rank-1 component tensors.
Tensor Space Representation
Specifically, a vector space model assumes each feature weight to be a “free” parameter, and estimating them reliably could therefore be hard when training data are not sufficient or the feature set is huge.
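A minimal sketch of the idea these excerpts describe, under assumed shapes and names: a weight tensor represented as a sum of rank-1 components scores a feature tensor with far fewer free parameters than a full weight matrix.

```python
import numpy as np

def low_rank_score(Phi, U, V):
    """Phi: (n, m) feature tensor; U: (R, n) and V: (R, m) hold R rank-1 factors,
    so only (n + m) * R parameters are learned instead of n * m."""
    return sum(u @ Phi @ v for u, v in zip(U, V))

rng = np.random.default_rng(0)
Phi = rng.normal(size=(6, 4))                             # toy feature tensor
U, V = rng.normal(size=(2, 6)), rng.normal(size=(2, 4))   # R = 2 rank-1 factors
print(low_rank_score(Phi, U, V))
```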
feature weights is mentioned in 18 sentences in this paper.
Topics mentioned in this paper:
Finkel, Jenny Rose and Manning, Christopher D.
Abstract
Our model utilizes a hierarchical prior to link the feature weights for shared features in several single-task models and the joint model.
Base Models
Let θ be the feature weights, and f(s, y_i, y_{i−1}) the feature function over adjacent segments y_i and y_{i−1} in sentence s. The log likelihood of a semi-CRF for a single sentence s is given by:
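The log-likelihood formula the sentence introduces is cut off in the excerpt; a standard semi-CRF form consistent with the notation above (the paper's exact expression may differ) is:

```latex
\log p(\mathbf{y} \mid s; \theta)
  = \sum_{i} \theta \cdot f(s, y_i, y_{i-1})
  - \log \sum_{\mathbf{y}'} \exp\Big(\sum_{i} \theta \cdot f(s, y'_i, y'_{i-1})\Big)
```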
Base Models
Let θ be the vector of feature weights.
Hierarchical Joint Learning
Each model has its own set of parameters (feature weights).
Hierarchical Joint Learning
These have corresponding log-likelihood functions L_p(D_p; θ_p), L_n(D_n; θ_n), and L_j(D_j; θ_j), where the Ds are the training data for each model, and the θs are the model-specific parameter (feature weight) vectors.
Hierarchical Joint Learning
These three models are linked by a hierarchical prior, and their feature weight vectors are all drawn from this prior.
Introduction
Then, the singly-annotated data can be used to influence the feature weights for the shared features in the joint model.
feature weights is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Bao, Junwei and Duan, Nan and Zhou, Ming and Zhao, Tiejun
Abstract
A linear model is defined over derivations, and minimum error rate training is used to tune feature weights based on a set of question-answer pairs.
Introduction
Derivations generated during such a translation procedure are modeled by a linear model, and minimum error rate training (MERT) (Och, 2003) is used to tune feature weights based on a set of question-answer pairs.
Introduction
• λ_i denotes the feature weight of
Introduction
According to the above description, our KB-QA method can be decomposed into four tasks: (1) search space generation for H(Q); (2) question translation for transforming question spans into their corresponding formal triples; (3) feature design; and (4) feature weight tuning for {λ_i}.
feature weights is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Martineau, Justin and Chen, Lu and Cheng, Doreen and Sheth, Amit
Abstract
We describe computationally cheap feature weighting techniques and a novel nonlinear distribution spreading algorithm that can be used to iteratively and interactively correct mislabeled instances, significantly improving annotation quality at low cost.
Experiments
Spreading the feature weights reduces the number of data points that must be examined in order to correct the mislabeled instances.
Feature Weighting Methods
Following this idea, we develop computationally cheap feature weighting techniques to counteract this effect by boosting the weight of discriminative features, so that they are not subdued and instances with such features have a higher chance of being correctly classified.
Feature Weighting Methods
Specifically, we propose a nonlinear distribution spreading algorithm for feature weighting .
Feature Weighting Methods
The model’s ability to discriminate at the feature level can be further enhanced by leveraging the distribution of feature weights across multiple classes, e.g., multiple emotion categories (funny, happy, sad, exciting, boring, etc.).
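An illustrative sketch of the spreading idea these excerpts describe (this is not the paper's exact algorithm): per-class feature weights are pushed apart nonlinearly so that features concentrated in one class dominate.

```python
import numpy as np

def spread_weights(class_feature_weights, p=2.0):
    """class_feature_weights: (num_classes, num_features) nonnegative matrix."""
    W = np.asarray(class_feature_weights, dtype=float) ** p   # exaggerate skew toward the dominant class
    return W / (W.sum(axis=0, keepdims=True) + 1e-12)         # renormalize per feature
```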
feature weights is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Deng, Yonggang and Xu, Jia and Gao, Yuqing
A Generic Phrase Training Procedure
15: Discriminatively train feature weights λ_k and threshold τ
Discussions
The parameter controlling the degree of attenuation in BLT is also optimized together with other feature weights.
Experimental Results
These feature weights are tuned on the dev set to achieve optimal translation performance using the downhill simplex method.
Experimental Results
Once we have computed all feature values for all phrase pairs in the training corpus, we discriminatively train the feature weights λ_k and the threshold τ using the downhill simplex method to maximize the BLEU score on the 06dev set.
Experimental Results
Since the translation engine implements a log-linear model, the discriminative training of feature weights in the decoder should be embedded in the whole end-to-end system jointly with the discriminative phrase table training process.
feature weights is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Li, Zhenghua and Zhang, Min and Chen, Wenliang
Ambiguity-aware Ensemble Training
We apply L2-norm regularized SGD training to iteratively learn feature weights w for our CRF-based baseline and semi-supervised parsers.
Ambiguity-aware Ensemble Training
We follow the implementation in CRFsuite. At each step, the algorithm approximates the gradient with a small subset of the training examples, and then updates the feature weights.
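A minimal sketch of the L2-regularized SGD step these sentences describe (assumed interfaces, not the CRFsuite implementation):

```python
import numpy as np

def sgd_step(w, batch_grad, lr=0.1, l2=1e-4):
    """Mini-batch gradient step with an L2 penalty shrinking w toward zero."""
    return w - lr * (batch_grad + l2 * w)

w = np.zeros(5)
w = sgd_step(w, batch_grad=np.ones(5))   # toy update with a dummy mini-batch gradient
```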
Ambiguity-aware Ensemble Training
Once the feature weights w are learnt, we can
Supervised Dependency Parsing
w_dep/sib are feature weight vectors; the dot product gives scores contributed by the corresponding subtrees.
Supervised Dependency Parsing
Assuming the feature weights w are known, the probability of a dependency tree d given an input sentence x is defined as:
Supervised Dependency Parsing
The partial derivative with respect to the feature weights w is:
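Both formulas referred to here are cut off in the excerpts; the standard log-linear forms they describe (assuming f(x, d) is the global feature vector of tree d) are:

```latex
p(d \mid x; \mathbf{w}) = \frac{\exp\big(\mathbf{w} \cdot \mathbf{f}(x, d)\big)}{\sum_{d'} \exp\big(\mathbf{w} \cdot \mathbf{f}(x, d')\big)},
\qquad
\frac{\partial \log p(d \mid x; \mathbf{w})}{\partial \mathbf{w}}
  = \mathbf{f}(x, d) - \mathbb{E}_{d' \sim p(\cdot \mid x;\, \mathbf{w})}\big[\mathbf{f}(x, d')\big]
```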
feature weights is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Li, Sujian and Wang, Liang and Cao, Ziqiang and Li, Wenjie
Add arc <eC,ej> to GC with
In fact, the score of each arc is calculated as a linear combination of feature weights.
Add arc <eC,ej> to GC with
(2005a; 2005b), we use the Margin Infused Relaxed Algorithm (MIRA) to learn the feature weights based on a training set of documents annotated with dependency structures y_i, where y_i denotes the correct dependency tree for the text T_i.
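An illustrative 1-best MIRA update for this kind of structured learning (a common simplification; the paper may use a k-best variant): move the weights just enough that the gold tree outscores the current prediction by a margin equal to its loss.

```python
import numpy as np

def mira_update(w, f_gold, f_pred, loss, C=1.0):
    """f_gold, f_pred: feature vectors of the gold and predicted trees."""
    delta = f_gold - f_pred
    denom = float(np.dot(delta, delta))
    if denom == 0.0:
        return w
    tau = min(C, max(0.0, loss - float(np.dot(w, delta))) / denom)
    return w + tau * delta
```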
Add arc <eC,ej> to GC with
Table 5: Top 10 Feature Weights for Coarse-grained Relation Labeling (Eisner Algorithm)
feature weights is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
He, Xiaodong and Deng, Li
Abstract
Och (2003) proposed using a log-linear model to incorporate multiple features for translation, and proposed a minimum error rate training (MERT) method to train the feature weights to optimize a desirable translation metric.
Abstract
(2009) improved a syntactic SMT system by adding as many as ten thousand syntactic features, and used the Margin Infused Relaxed Algorithm (MIRA) to train the feature weights.
Abstract
The feature weights are trained on a tuning set with 2010 sentences using MIRA.
feature weights is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Liu, Shujie and Li, Chi-Ho and Li, Mu and Zhou, Ming
Experiments and Results
The development data utilized to tune the feature weights of our decoder is NIST’03 evaluation set, and test sets are NIST’05 and NIST’08 evaluation sets.
Features and Training
Therefore, we can alternatively update graph-based consensus features and feature weights in the log-linear model.
Features and Training
The decoder then adds the new features and retrains all the feature weights by Minimum Error Rate Training (MERT) (Och, 2003).
Features and Training
The decoder with new feature weights then provides new n-best candidates and their posteriors for constructing another consensus graph, which in turn gives rise to the next round of
Graph-based Translation Consensus
where ψ is the feature vector, λ is the vector of feature weights, and H(f) is the set of translation hypotheses in the search space.
Graph-based Translation Consensus
Before elaborating how the graph model of consensus is constructed for both a decoder and N-best output re-ranking in section 5, we will describe how the consensus features and their feature weights can be trained in a semi-supervised way, in section 4.
feature weights is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Xiao, Tong and Zhu, Jingbo and Zhu, Muhua and Wang, Huizhen
Background
where {h_m(f, e) | m = 1, ..., M} is a set of features, and λ_m is the feature weight corresponding to the m-th feature.
Background
In this work, Minimum Error Rate Training (MERT) proposed by Och (2003) is used to estimate feature weights λ over a series of training samples.
Background
where φ_t(e) is the log-scaled model score of e in the t-th member system, and β_t is the corresponding feature weight.
feature weights is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Jiang, Qixia and Sun, Maosong
Abstract
The basic idea of S3H is to learn the optimal feature weights from prior knowledge to relocate the data such that similar data have similar hash codes.
Introduction
Unlike SSH that tries to find a sequence of hash functions, S3H fixes the random projection directions and seeks the optimal feature weights from prior knowledge to relocate the objects such that similar objects have similar fingerprints.
Semi-Supervised SimHash
where w ∈ ℝ^M is the feature weight to be determined and d_l is the l-th column of the matrix D.
The direction is determined by concatenating w L times.
Instead, S3H seeks the optimal feature weights via L-BFGS, which is still efficient even for very high-dimensional data.
The direction is determined by concatenating w L times.
S3H learns the optimal feature weights from prior knowledge to relocate the data such that similar objects have similar fingerprints.
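A rough sketch of the weighted-hashing idea these excerpts describe (not the exact S3H formulation): the random projection directions D stay fixed while a learned weight vector w rescales the features before hashing.

```python
import numpy as np

def weighted_fingerprint(x, D, w):
    """x: (M,) features; D: (M, L) fixed random projections; w: (M,) learned weights."""
    return ((np.sign((w * x) @ D) + 1) // 2).astype(int)   # L-bit 0/1 fingerprint
```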
feature weights is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Xiao, Tong and Zhu, Jingbo and Zhang, Chunliang
A Skeleton-based Approach to MT 2.1 Skeleton Identification
Given a translation model m, a language model lm and a vector of feature weights w, the model score of a derivation d is computed by
A Skeleton-based Approach to MT 2.1 Skeleton Identification
On the other hand, it has different feature weight vectors for individual models (i.e., w and w_T).
A Skeleton-based Approach to MT 2.1 Skeleton Identification
Given a baseline phrase-based system, all we need is to learn the feature weights w and w_T on the development set (with source-language skeleton annotation) and the skeletal language model lm_T on the target-language side of the bilingual corpus.
Evaluation
All feature weights were learned using minimum error rate training (Och, 2003).
feature weights is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Wang, Chenguang and Duan, Nan and Zhou, Ming and Zhang, Ming
Paraphrasing for Web Search
We utilize minimum error rate training (MERT) (Och, 2003) to optimize feature weights of the paraphrasing model according to NDCG.
Paraphrasing for Web Search
MERT is used to optimize feature weights of our linear-formed paraphrasing model.
Paraphrasing for Web Search
λ̂ = arg min_λ Σ_{i=1}^{M} Err(D_i^Label, Ĉ_i; λ). The objective of MERT is to find the optimal feature weight vector λ̂ that minimizes the error criterion Err according to the NDCG scores of top-1 paraphrase candidates.
feature weights is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Perez-Rosas, Veronica and Mihalcea, Rada and Morency, Louis-Philippe
Discussion
Feature Weights
Discussion
Figure 2: Visual and acoustic feature weights.
Discussion
To determine the role played by each of the visual and acoustic features, we compare the feature weights assigned by the learning algorithm, as shown in Figure 2.
feature weights is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Hatori, Jun and Matsuzaki, Takuya and Miyao, Yusuke and Tsujii, Jun'ichi
Model
the training epoch (x-axis) and parsing feature weights (in legend).
Model
We have some parameters to tune: parsing feature weight θ_p, beam size, and training epoch.
Model
Figure 2 shows the F1 scores of the proposed model (SegTagDep) on CTB-Sc-l with respect to the training epoch and different parsing feature weights, where “Seg”, “Tag”, and “Dep” respectively denote the F1 scores of word segmentation, POS tagging, and dependency parsing.
feature weights is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Xiang, Bing and Luo, Xiaoqiang and Zhou, Bowen
Experimental Results
Next we extract phrase pairs, Hiero rules and tree-to-string rules from the original word alignment and the improved word alignment, and tune all the feature weights on the tuning set.
Integrating Empty Categories in Machine Translation
The feature weights can be tuned on a tuning set in a log-linear model along with other usual features/costs, including language model scores, bi-direction translation probabilities, etc.
Related Work
First, in addition to the preprocessing of training data and inserting recovered empty categories, we implement sparse features to further boost the performance, and tune the feature weights directly towards maximizing the machine translation metric.
feature weights is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Yannakoudakis, Helen and Briscoe, Ted and Medlock, Ben
Evaluation
One of the main approaches adopted by previous systems involves the identification of features that measure writing skill, and then the application of linear or stepwise regression to find optimal feature weights so that the correlation with manually assigned scores is maximised.
Previous work
Linear regression is used to assign optimal feature weights that maximise the correlation with the examiner’s scores.
Previous work
Feature weights and/or scores can be fitted to a marking scheme by stepwise or linear regression.
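A minimal sketch of the regression-based weighting these excerpts describe (hypothetical array names; ordinary least squares stands in for the stepwise variant):

```python
import numpy as np

def fit_feature_weights(X, scores):
    """X: (num_essays, num_features); scores: examiner-assigned scores."""
    w, *_ = np.linalg.lstsq(X, scores, rcond=None)   # weights that best reproduce the scores
    return w
```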
feature weights is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Lu, Bin and Tan, Chenhao and Cardie, Claire and K. Tsou, Benjamin
A Joint Model with Unlabeled Parallel Text
where θ is a real-valued vector of feature weights and ŷ
A Joint Model with Unlabeled Parallel Text
where θ_{L1} and θ_{L2} are the vectors of feature weights for L1 and L2, respectively (for brevity we denote them as θ_1 and θ_2 in the remaining sections).
A Joint Model with Unlabeled Parallel Text
where the first term on the right-hand side is the log likelihood of the labeled data from both D_1 and D_2; the second is the log likelihood of the unlabeled parallel data U, multiplied by λ_1 ≥ 0, a constant that controls the contribution of the unlabeled data; and λ_2 ≥ 0 is a regularization constant that penalizes model complexity or large feature weights.
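One plausible form of the objective this sentence describes (the paper's exact likelihood and regularization terms may differ):

```latex
\mathcal{L}(\theta_1, \theta_2)
  = \ell(D_1; \theta_1) + \ell(D_2; \theta_2)
  + \lambda_1\, \ell_u(U; \theta_1, \theta_2)
  - \lambda_2 \big( \lVert \theta_1 \rVert^2 + \lVert \theta_2 \rVert^2 \big)
```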
feature weights is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Woodsend, Kristian and Lapata, Mirella
Experimental Setup
We learned the feature weights with a linear SVM, using the software SVM-OOPS (Woodsend and Gondzio, 2009).
Experimental Setup
This tool gave us the feature weights directly, as well as the support vector values, and it allowed different penalties to be applied to positive and negative misclassifications, enabling us to compensate for the unbalanced data set.
Experimental Setup
For each phrase, features were extracted and salience scores calculated from the feature weights determined through SVM training.
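SVM-OOPS is the authors' solver; as an illustrative stand-in, a generic linear SVM with asymmetric class penalties captures the same setup of learning feature weights from unbalanced data.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))                               # toy feature matrix
y = (X[:, 0] + 0.1 * X[:, 1] > 0).astype(int)              # toy labels
clf = LinearSVC(C=1.0, class_weight={0: 1.0, 1: 5.0})      # heavier penalty on the rarer class
clf.fit(X, y)
feature_weights = clf.coef_[0]                             # per-feature weights for salience scoring
```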
feature weights is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Li, Junhui and Marton, Yuval and Resnik, Philip and Daumé III, Hal
Discussion
Table 8: Reordering feature weights.
Discussion
Feature weight analysis: Table 8 shows the syntactic and semantic reordering feature weights.
Discussion
It shows that the semantic feature weights decrease in the presence of the syntactic features, indicating that the decoder learns to trust semantic features less in the presence of the more accurate syntactic features.
feature weights is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Liu, Zhanyi and Wang, Haifeng and Wu, Hua and Li, Sheng
Experiments on Parsing-Based SMT
The feature weights are tuned on the development set using the minimum error
Experiments on Phrase-Based SMT
And Koehn's implementation of minimum error rate training (Och, 2003) is used to tune the feature weights on the development set.
Improving Statistical Bilingual Word Alignment
feature weights, respectively.
feature weights is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Pado, Sebastian and Galley, Michel and Jurafsky, Dan and Manning, Christopher D.
Expt. 2: Predicting Pairwise Preferences
Feature Weights.
Expt. 2: Predicting Pairwise Preferences
Finally, we make two observations about feature weights in the RTER model.
Expt. 2: Predicting Pairwise Preferences
Second, good MT evaluation feature weights are not good weights for RTE.
feature weights is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Sun, Le and Han, Xianpei
Introduction
Feature weighting.
Introduction
Currently, we assign all features a uniform weight w ∈ (0, 1), which is used to control the relative importance of the feature in the final tree similarity: the larger the feature weight, the more important the feature in the final tree similarity.
Introduction
feature weighting algorithm which can accurately
feature weights is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Li, Mu and Duan, Nan and Zhang, Dongdong and Li, Chi-Ho and Zhou, Ming
Collaborative Decoding
Let λ_m be the feature weight vector for member decoder d_m; the training procedure proceeds as follows:
Collaborative Decoding
For each decoder d_m, find a new feature weight vector λ′_m that optimizes the specified evaluation criterion L on D using the MERT algorithm, based on the n-best list H_m generated by d_m:
Collaborative Decoding
where T denotes the translations selected by re-ranking the translations in H_m using a new feature weight vector λ′_m
feature weights is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: