Introducing Nonlocal Features | We now outline three different ways of learning the weight vector w with nonlocal features.
Introducing Nonlocal Features | In other words, it is unlikely that we can devise a feature set informative enough to allow the weight vector to converge towards a good solution unless the learning algorithm sees the entire documents during training, at least when no external knowledge sources are used.
Representation and Learning | The score of an arc (ai, mi) is defined as the scalar product between a weight vector w and a feature vector Φ((ai, mi)), where Φ is a feature extraction function over an arc (thus extracting features from the antecedent and the anaphor).
Representation and Learning | We find the weight vector w by online learning, using a variant of the structured perceptron (Collins, 2002).
Representation and Learning | For each instance, it uses the current weight vector w to make a prediction ŷi given the input xi.
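The arc scoring and perceptron update described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the feature names, the sparse-dict representation, and the simple +1/−1 update are all illustrative assumptions.

```python
def score(w, features):
    """Scalar product w . phi for one (antecedent, anaphor) arc,
    with w and phi in sparse dict / feature-list form."""
    return sum(w.get(f, 0.0) for f in features)

def perceptron_update(w, candidates, gold):
    """One perceptron-style step: predict the best-scoring antecedent,
    then promote gold features and demote predicted ones on a mistake.
    candidates: {antecedent_id: feature list}; gold: correct antecedent."""
    pred = max(candidates, key=lambda a: score(w, candidates[a]))
    if pred != gold:
        for f in candidates[gold]:
            w[f] = w.get(f, 0.0) + 1.0
        for f in candidates[pred]:
            w[f] = w.get(f, 0.0) - 1.0
    return pred
```

After a mistake, the gold antecedent's features outscore the wrongly predicted one's on the same instance.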
Online Learning Algorithm | , D, surrogate weight vector v
Online Learning Algorithm | (a) Mapping a surrogate weight vector v to a tensor X
Online Learning Algorithm | Figure 2: Algorithm for mapping a surrogate weight vector v to a tensor X.
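The core of such a mapping is reshaping a flat weight vector into a tensor whose entries multiply out to the vector's dimensions. The sketch below is a minimal two-mode (matrix) version in row-major order; the paper's Figure 2 may use higher-order tensors and a different index ordering.

```python
def vector_to_tensor(v, shape):
    """Map a flat weight vector into a (d1 x d2) tensor, row-major.
    A minimal stand-in for the mapping in Figure 2 (assumed layout)."""
    d1, d2 = shape
    assert len(v) == d1 * d2, "vector length must equal d1 * d2"
    return [[v[i * d2 + j] for j in range(d2)] for i in range(d1)]
```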
Tensor Model Construction | As a way out, we first run a simple vector-model based learning algorithm (say the perceptron) on the training data and estimate a weight vector, which serves as a "surrogate" weight vector.
Tensor Space Representation | Most learning algorithms for NLP problems are based on vector space models, which represent data as vectors φ ∈ ℝⁿ and try to learn feature weight vectors w ∈ ℝⁿ such that a linear model y = w · φ is able to discriminate between, say, good and bad hypotheses.
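The linear model y = w · φ can be sketched in a few lines; thresholding at zero for the good/bad decision is an illustrative convention, not specified in the text.

```python
def linear_score(w, phi):
    """y = w . phi: the linear model's score for feature vector phi."""
    return sum(wi * pi for wi, pi in zip(w, phi))

def is_good(w, phi):
    """Discriminate good (True) from bad (False) hypotheses by sign
    (the zero threshold is an assumed convention)."""
    return linear_score(w, phi) > 0.0
```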
Experiments | • Delta-IDF: Takes the dot product of the Delta IDF weight vector (Formula 1) with the document's term frequency vector.
Experiments | • Spread: Takes the dot product of the distribution spread weight vector (Formula 3) with the document's term frequency vector.
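Both scorers share the same dot-product form; a sparse-dict sketch (the dict representation is an assumption, and the actual weight formulas are given in the paper's Formulas 1 and 3):

```python
def doc_score(term_weights, term_freqs):
    """Dot product of a term-weight vector (e.g. Delta-IDF or Spread)
    with a document's term-frequency vector, both in sparse dict form.
    Terms absent from the document contribute zero."""
    return sum(w * term_freqs.get(t, 0) for t, w in term_weights.items())
```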
Feature Weighting Methods | We calculate the Delta IDF score of every term in V, obtaining the Delta IDF weight vector Δ = (Δidf1, ..., Δidf|V|) over all terms.
Feature Weighting Methods | When the dataset is imbalanced, to avoid building a biased model, we downsample the majority class before calculating the Delta IDF scores and then use a bias balancing procedure to balance the Delta IDF weight vector.
Feature Weighting Methods | This procedure first divides the Delta IDF weight vector into two vectors, one containing all the features with positive scores and the other containing all the features with negative scores.
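The first step of that procedure, the sign-based split, can be sketched directly; the subsequent balancing of the two halves is not specified in this excerpt, so it is not shown.

```python
def balance_split(delta):
    """Divide a Delta IDF weight vector (sparse dict) into its
    positive-scored and negative-scored parts, as in the first step
    of the bias balancing procedure."""
    pos = {t: s for t, s in delta.items() if s > 0}
    neg = {t: s for t, s in delta.items() if s < 0}
    return pos, neg
```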
Experiment | Table 1: Data, markedness matrix, weight vector, and joint log-probabilities for the IBPOT and the phonological standard constraints.
The IBPOT Model | The IBPOT model defines a generative process for mappings between input and output forms based on three latent variables: the constraint violation matrices F (faithfulness) and M (markedness), and the weight vector w. The cells of the violation matrices correspond to the number of violations of a constraint by a given input-output mapping. |
The IBPOT Model | The weight vector w provides weights for both F and M. Probabilities of output forms are given by a log-linear function:
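A log-linear distribution over candidate outputs, given per-constraint violation counts and the weight vector w, can be sketched as follows. This is a generic log-linear sketch under assumed conventions (weights applied directly to violation counts, normalization over all candidates); the paper's exact parameterization may differ.

```python
import math

def output_probabilities(violations, w):
    """Log-linear distribution over candidate output forms.
    violations[y] is the violation-count vector for candidate y,
    one entry per constraint; w is the constraint weight vector."""
    unnorm = {y: math.exp(sum(wi * vi for wi, vi in zip(w, v)))
              for y, v in violations.items()}
    z = sum(unnorm.values())  # partition function
    return {y: s / z for y, s in unnorm.items()}
```

With negative weights (penalties), candidates with fewer violations receive higher probability.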
The IBPOT Model | We initialize the model with a randomly-drawn markedness violation matrix M and weight vector w.
Intervention Prediction Models | The model uses the pseudocode shown in Algorithm 1 to iteratively refine the weight vectors.
Intervention Prediction Models | Assuming that p represents the posts of thread t, h represents the latent category assignments, and r represents the intervention decision, a feature vector φ(p, r, h, t) is extracted for each thread; using the weight vector w, this model defines a decision function similar to the one shown in Equation 1.
Intervention Prediction Models | Here w is the weight vector, ℓ is the squared hinge loss function, and fw(tj, pj) is defined in Equation 1.
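The two ingredients named here, a linear decision function and the squared hinge loss, can be sketched as follows. The ±1 label convention is an assumption; Equation 1 itself is not reproduced in this excerpt.

```python
def decision_score(w, phi):
    """A linear decision function f_w = w . phi (Equation 1, sketched)."""
    return sum(wi * pi for wi, pi in zip(w, phi))

def squared_hinge(y, score):
    """Squared hinge loss max(0, 1 - y * score)^2 for label y in {-1, +1}:
    zero once the prediction clears the margin, quadratic otherwise."""
    return max(0.0, 1.0 - y * score) ** 2
```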