Introduction | This also makes training the model parameters a challenging problem, since the amount of labeled training data is usually small compared to the size of feature sets: the feature weights cannot be estimated reliably. |
Introduction | Such models require learning individual feature weights directly, so that the number of parameters to be estimated is identical to the size of the feature set. |
Introduction | When millions of features are used but the amount of labeled data is limited, it can be difficult to precisely estimate each feature weight. |
Tensor Space Representation | Most of the learning algorithms for NLP problems are based on vector space models, which represent data as vectors φ ∈ R^n, and try to learn feature weight vectors w ∈ R^n such that a linear model y = w · φ is able to discriminate between, say, good and bad hypotheses. |
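The linear scoring scheme this sentence describes can be sketched as follows (a toy illustration with made-up weights and feature vectors, not any particular paper's actual model):

```python
import numpy as np

def score(w, phi):
    """Linear model: score a hypothesis by the dot product w . phi."""
    return float(np.dot(w, phi))

# Toy example: 3 features with hand-set, purely illustrative weights.
w = np.array([0.5, -1.0, 2.0])
good = np.array([1.0, 0.0, 1.0])   # feature vector of a "good" hypothesis
bad = np.array([0.0, 1.0, 0.0])    # feature vector of a "bad" hypothesis

# The linear model discriminates: the good hypothesis scores higher.
assert score(w, good) > score(w, bad)
```

Learning then amounts to choosing w so that this ordering holds on the training data, which is exactly where the one-free-parameter-per-feature problem noted above arises.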
Tensor Space Representation | The feature weight tensor W can be decomposed as the sum of a sequence of rank-1 component tensors. |
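One way to picture this decomposition: for an order-2 tensor (a matrix), each rank-1 component is an outer product of two vectors, and the weight tensor is their sum. The shapes and values below are illustrative only; the actual order and rank of the tensor come from the model in question.

```python
import numpy as np

def low_rank_weight_tensor(us, vs):
    """Build an order-2 weight tensor as a sum of rank-1 components
    u_k (outer) v_k, so only O(r*(m+n)) parameters are estimated
    instead of m*n free weights."""
    return sum(np.outer(u, v) for u, v in zip(us, vs))

# Two rank-1 components for a 3x4 weight tensor (illustrative values).
us = [np.array([1.0, 0.0, 2.0]), np.array([0.0, 1.0, 1.0])]
vs = [np.array([1.0, 1.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0, 1.0])]
W = low_rank_weight_tensor(us, vs)

assert W.shape == (3, 4)
assert np.linalg.matrix_rank(W) <= 2   # rank bounded by number of components
```

This is the mechanism behind the next sentence's point: the parameters to estimate are the component vectors, not the individual entries of W.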
Tensor Space Representation | Specifically, a vector space model assumes each feature weight to be a “free” parameter, and estimating them reliably could therefore be hard when training data are not sufficient or the feature set is huge. |
Abstract | A linear model is defined over derivations, and minimum error rate training is used to tune feature weights based on a set of question-answer pairs. |
Introduction | Derivations generated during such a translation procedure are modeled by a linear model, and minimum error rate training (MERT) (Och, 2003) is used to tune feature weights based on a set of question-answer pairs. |
Introduction | • λ_i denotes the feature weight of |
Introduction | According to the above description, our KB-QA method can be decomposed into four tasks: (1) search space generation for H(Q); (2) question translation for transforming question spans into their corresponding formal triples; (3) feature design; and (4) feature weight tuning for {λ_i}. |
Abstract | We describe computationally cheap feature weighting techniques and a novel nonlinear distribution spreading algorithm that can be used to iteratively and interactively correct mislabeled instances, significantly improving annotation quality at low cost. |
Experiments | Spreading the feature weights reduces the number of data points that must be examined in order to correct the mislabeled instances. |
Feature Weighting Methods | Following this idea, we develop computationally cheap feature weighting techniques to counteract this effect by boosting the weights of discriminative features, so that they are not subdued and instances with such features have a higher chance of being correctly classified. |
Feature Weighting Methods | Specifically, we propose a nonlinear distribution spreading algorithm for feature weighting. |
Feature Weighting Methods | The model’s ability to discriminate at the feature level can be further enhanced by leveraging the distribution of feature weights across multiple classes, e.g., the emotion categories funny, happy, sad, exciting, boring, etc. |
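The idea in the last two sentences might be sketched as follows. This is a hypothetical reconstruction: the function name, the power parameter, and the exact normalization are assumptions, not the paper's actual spreading formula.

```python
import numpy as np

def spread(weights, power=2.0):
    """Hypothetical sketch of nonlinear distribution spreading: take a
    feature's weight distribution across classes, raise it to a power
    greater than 1, and renormalize. Peaked distributions get sharper
    (boosting features discriminative for one class), while flat,
    non-discriminative distributions stay nearly uniform."""
    p = np.asarray(weights, dtype=float)
    p = p / p.sum()          # normalize to a distribution over classes
    p = p ** power           # nonlinear sharpening step (assumed form)
    return p / p.sum()

# A feature concentrated on one emotion class gets sharper;
# a feature spread evenly across classes is left unchanged.
sharp = spread([0.7, 0.1, 0.1, 0.1])
flat = spread([0.25, 0.25, 0.25, 0.25])
assert sharp[0] > 0.7
assert np.allclose(flat, 0.25)
```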
Add arc <eC,ej> to GC with | In fact, the score of each arc is calculated as a linear combination of feature weights. |
Add arc <eC,ej> to GC with | Following McDonald et al. (2005a; 2005b), we use the Margin Infused Relaxed Algorithm (MIRA) to learn the feature weights based on a training set of documents annotated with dependency structures, where yi denotes the correct dependency tree for the text Ti. |
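A single-best MIRA step of the kind referenced above can be sketched as follows (a simplified illustration with toy feature vectors; the cited work may use k-best constraints and structured feature extraction):

```python
import numpy as np

def mira_update(w, f_gold, f_pred, loss, C=1.0):
    """One single-best MIRA step (sketch): move w minimally so that the
    gold structure outscores the current prediction by at least `loss`.
    tau is the closed-form step size, clipped at aggressiveness C."""
    diff = f_gold - f_pred
    margin = np.dot(w, diff)        # current score gap (gold minus pred)
    norm_sq = np.dot(diff, diff)
    if norm_sq == 0.0:
        return w                    # identical feature vectors: no update
    tau = min(C, max(0.0, (loss - margin) / norm_sq))
    return w + tau * diff

# Toy example: two features, zero initial weights.
w = np.zeros(2)
f_gold = np.array([1.0, 0.0])   # features of the correct dependency tree
f_pred = np.array([0.0, 1.0])   # features of the current best prediction
w = mira_update(w, f_gold, f_pred, loss=1.0)
# After the update, the gold tree outscores the prediction by the loss.
```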
Add arc <eC,ej> to GC with | Table 5: Top 10 Feature Weights for Coarse-grained Relation Labeling (Eisner Algorithm) |
Ambiguity-aware Ensemble Training | We apply L2-norm regularized SGD training to iteratively learn feature weights w for our CRF-based baseline and semi-supervised parsers. |
Ambiguity-aware Ensemble Training | We follow the implementation in CRFsuite.2 At each step, the algorithm approximates a gradient with a small subset of the training examples, and then updates the feature weights . |
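The update described here, one L2-regularized SGD step on a minibatch-approximated gradient, can be sketched as follows (toy gradient values; the learning rate and regularization strength are illustrative, not CRFsuite's defaults):

```python
import numpy as np

def sgd_step(w, grad, lr=0.1, l2=1e-3):
    """One step of L2-regularized SGD: `grad` approximates the gradient
    of the negative log-likelihood on a small subset of the training
    examples; l2 * w is the gradient of the (l2/2)*||w||^2 penalty."""
    return w - lr * (grad + l2 * w)

# Toy example: two feature weights, one minibatch gradient.
w = np.array([1.0, -2.0])
g = np.array([0.5, 0.5])          # minibatch gradient (made-up values)
w = sgd_step(w, g)
```

Iterating this step over shuffled minibatches, with a decaying learning rate, is the standard recipe; the L2 term keeps unreliable feature weights from growing large, which ties back to the sparse-data concern raised in the Introduction.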
Ambiguity-aware Ensemble Training | Once the feature weights w are learnt, we can |
Supervised Dependency Parsing | w_dep and w_sib are feature weight vectors; the dot products give the scores contributed by the corresponding subtrees. |
Supervised Dependency Parsing | Assuming the feature weights w are known, the probability of a dependency tree d given an input sentence x is defined as: |
Supervised Dependency Parsing | The partial derivative with respect to the feature weights w is: |
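In the standard log-linear (CRF) form, the two quantities in the last two sentences read as follows; this is a reconstruction of the textbook form (f(x, d) denotes the feature vector of tree d for sentence x, and D(x) the set of candidate trees), and the paper's exact feature decomposition over dependency and sibling subtrees may differ:

$$p(d \mid x; \mathbf{w}) = \frac{\exp\left(\mathbf{w} \cdot \mathbf{f}(x, d)\right)}{\sum_{d' \in \mathcal{D}(x)} \exp\left(\mathbf{w} \cdot \mathbf{f}(x, d')\right)}$$

$$\frac{\partial \log p(d \mid x; \mathbf{w})}{\partial \mathbf{w}} = \mathbf{f}(x, d) - \sum_{d' \in \mathcal{D}(x)} p(d' \mid x; \mathbf{w})\, \mathbf{f}(x, d')$$

That is, the gradient is the gold tree's feature vector minus the model's expected feature vector, which is exactly what the SGD updates above follow.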
A Skeleton-based Approach to MT 2.1 Skeleton Identification | Given a translation model m, a language model lm and a vector of feature weights w, the model score of a derivation d is computed by |
A Skeleton-based Approach to MT 2.1 Skeleton Identification | On the other hand, it has different feature weight vectors for individual models (i.e., w and wT). |
A Skeleton-based Approach to MT 2.1 Skeleton Identification | Given a baseline phrase-based system, all we need to do is learn the feature weights w and wT on the development set (with source-language skeleton annotation) and the skeletal language model on the target-language side of the bilingual corpus. |
Evaluation | All feature weights were learned using minimum error rate training (Och, 2003). |
Discussion | Table 8: Reordering feature weights. |
Discussion | Feature weight analysis: Table 8 shows the syntactic and semantic reordering feature weights. |
Discussion | It shows that the semantic feature weights decrease in the presence of the syntactic features, indicating that the decoder learns to trust semantic features less in the presence of the more accurate syntactic features. |
Introduction | Feature weighting. |
Introduction | Currently, we set all features to a uniform weight w ∈ (0, 1), which is used to control the relative importance of a feature in the final tree similarity: the larger the feature weight, the more important the feature in the final tree similarity. |
Introduction | feature weighting algorithm which can accurately |