Online Learning in Tensor Space
Cao, Yuan and Khudanpur, Sanjeev

Article Structure

Abstract

We propose an online learning algorithm based on tensor-space models.

Introduction

Many NLP applications use models that try to incorporate a large number of linguistic features so that as much human knowledge of language as possible can be brought to bear on the (prediction) task.

Tensor Space Representation

Most of the learning algorithms for NLP problems are based on vector space models, which represent data as vectors $\phi \in \mathbb{R}^n$, and try to learn feature weight vectors $w \in \mathbb{R}^n$ such that a linear model $y = w \cdot \phi$ is able to discriminate between, say, good and bad hypotheses.
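
For concreteness, here is a minimal sketch of such a vector-space linear model; the dimensions and feature vectors below are illustrative stand-ins, not the paper's setup.

```python
import numpy as np

def score(w: np.ndarray, phi: np.ndarray) -> float:
    """Linear vector-space model: y = w . phi."""
    return float(np.dot(w, phi))

# Illustrative usage: rank candidate hypotheses by their linear score.
rng = np.random.default_rng(0)
w = rng.standard_normal(8)                               # learned feature weights
hypotheses = [rng.standard_normal(8) for _ in range(3)]  # feature vectors phi
best = max(hypotheses, key=lambda phi: score(w, phi))
```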

Tensor Model Construction

To apply a tensor model, we first need to convert the feature vector into a tensor $\Phi$.
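
A minimal sketch of the simplest such conversion, assuming the feature vector is reshaped into a $D$-mode array; the mode sizes here are made up for illustration (the paper chooses the arrangement via a surrogate weight vector, discussed below).

```python
import numpy as np

def vector_to_tensor(phi: np.ndarray, mode_sizes: tuple) -> np.ndarray:
    """Rearrange an m-dimensional feature vector into a D-mode tensor
    Phi whose mode sizes satisfy n_1 * n_2 * ... * n_D = m."""
    assert int(np.prod(mode_sizes)) == phi.size, "mode sizes must multiply to m"
    return phi.reshape(mode_sizes)

phi = np.arange(24, dtype=float)        # toy 24-dimensional feature vector
Phi = vector_to_tensor(phi, (2, 3, 4))  # a 3-mode, 2 x 3 x 4 tensor
```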

Online Learning Algorithm

We now turn to the problem of learning the feature weight tensor.
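
For orientation, here is a minimal sketch of the standard MIRA-style (passive-aggressive) update over a plain weight vector; the paper's Tensor-MIRA adapts this kind of update to the mode vectors of a weight tensor, which this sketch does not attempt to reproduce.

```python
import numpy as np

def mira_update(w, phi_ref, phi_hyp, loss, C=1.0):
    """One MIRA-style update: move w just enough that the reference
    hypothesis outscores the competing one by the loss, capping the
    step size at the aggressiveness parameter C."""
    delta = phi_ref - phi_hyp          # feature difference
    margin = float(np.dot(w, delta))   # current score gap
    norm_sq = float(np.dot(delta, delta))
    if norm_sq == 0.0 or margin >= loss:
        return w                       # constraint already satisfied
    tau = min(C, (loss - margin) / norm_sq)
    return w + tau * delta
```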

Experiments

In this section we show empirical results of the training algorithm on a parsing task.

Conclusion and Future Work

In this paper, we reformulated the traditional linear vector-space models as tensor-space models, and proposed an online learning algorithm named Tensor-MIRA.

Topics

feature weights

Appears in 18 sentences as: feature weight (6) feature weights (11) features weights (1)
In Online Learning in Tensor Space
  1. This also makes training the model parameters a challenging problem, since the amount of labeled training data is usually small compared to the size of feature sets: the feature weights cannot be estimated reliably.
    Page 1, “Introduction”
  2. Such models require learning individual feature weights directly, so that the number of parameters to be estimated is identical to the size of the feature set.
    Page 1, “Introduction”
  3. When millions of features are used but the amount of labeled data is limited, it can be difficult to precisely estimate each feature weight.
    Page 1, “Introduction”
  4. This low-rank approximation imposes structural constraints on the feature weights and can be regarded as a form of regularization.
    Page 1, “Introduction”
  5. With this representation, we no longer need to estimate individual feature weights directly but only a small number of “bases” instead.
    Page 1, “Introduction”
  6. This property makes the tensor model very effective when training a large number of feature weights in a low-resource environment.
    Page 1, “Introduction”
  7. Most of the learning algorithms for NLP problems are based on vector space models, which represent data as vectors $\phi \in \mathbb{R}^n$, and try to learn feature weight vectors $w \in \mathbb{R}^n$ such that a linear model $y = w \cdot \phi$ is able to discriminate between, say, good and bad hypotheses.
    Page 2, “Tensor Space Representation”
  8. The feature weight tensor W can be decomposed as the sum of a sequence of rank-1 component tensors.
    Page 3, “Tensor Space Representation”
  9. Specifically, a vector space model assumes each feature weight to be a “free” parameter, and estimating them reliably could therefore be hard when training data are not sufficient or the feature set is huge.
    Page 3, “Tensor Space Representation”
  10. By contrast, a linear tensor model only needs to learn $H \sum_{d=1}^{D} n_d$ “bases” of the $m$ feature weights instead of individual weights directly (see the worked form after this list).
    Page 3, “Tensor Space Representation”
  11. In other words, a true feature weight is now approximated by a set of bases.
    Page 3, “Tensor Space Representation”
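
To make items 8 and 10 concrete, here is the decomposition in standard CP (rank-1) notation; the mode-vector symbols $w_h^{(d)}$ are our own and may differ from the paper's.

```latex
% Rank-H approximation of the weight tensor by rank-1 component tensors:
\[
  W \;\approx\; \sum_{h=1}^{H} w_h^{(1)} \circ w_h^{(2)} \circ \cdots \circ w_h^{(D)},
  \qquad w_h^{(d)} \in \mathbb{R}^{n_d},
\]
% where \circ denotes the vector outer product. The "bases" are the
% mode vectors w_h^{(d)}: H \sum_{d=1}^{D} n_d parameters in total,
% instead of m = \prod_{d=1}^{D} n_d individual feature weights.
```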

learning algorithm

Appears in 9 sentences as: learning algorithm (7) learning algorithms (3)
In Online Learning in Tensor Space
  1. We propose an online learning algorithm based on tensor-space models.
    Page 1, “Abstract”
  2. We apply the proposed algorithm to a parsing task, and show that even with very little training data the learning algorithm based on a tensor model performs well, and gives significantly better results than standard learning algorithms based on traditional vector-space models.
    Page 1, “Abstract”
  3. Many learning algorithms applied to NLP problems, such as the Perceptron (Collins,
    Page 1, “Introduction”
  4. A tensor weight learning algorithm is then proposed in Section 4.
    Page 2, “Introduction”
  5. Most of the learning algorithms for NLP problems are based on vector space models, which represent data as vectors $\phi \in \mathbb{R}^n$, and try to learn feature weight vectors $w \in \mathbb{R}^n$ such that a linear model $y = w \cdot \phi$ is able to discriminate between, say, good and bad hypotheses.
    Page 2, “Tensor Space Representation”
  6. As a way out, we first run a simple vector-model based learning algorithm (say the Perceptron) on the training data and estimate a weight vector, which serves as a “surrogate” weight vector.
    Page 4, “Tensor Model Construction”
  7. Here we propose an online learning algorithm similar to MIRA but modified to accommodate tensor models.
    Page 5, “Online Learning Algorithm”
  8. Training samples $(x_i, y_i)$, $i = 1, \ldots, m$, where $x_i$ is the input and $y_i$ is the reference or oracle hypothesis, are fed to the weight learning algorithm in sequential order.
    Page 5, “Online Learning Algorithm”
  9. In this paper, we reformulated the traditional linear vector-space models as tensor-space models, and proposed an online learning algorithm named Tensor-MIRA.
    Page 8, “Conclusion and Future Work”

Perceptron

Appears in 9 sentences as: Perceptron (9)
In Online Learning in Tensor Space
  1. Many learning algorithms applied to NLP problems, such as the Perceptron (Collins,
    Page 1, “Introduction”
  2. As a way out, we first run a simple vector-model based learning algorithm (say the Perceptron) on the training data and estimate a weight vector, which serves as a “surrogate” weight vector.
    Page 4, “Tensor Model Construction”
  3. For comparison, we also investigated training the reranker with Perceptron and MIRA.
    Page 7, “Experiments”
  4. The f-scores of the held-out and evaluation set given by T-MIRA as well as the Perceptron and
    Page 7, “Experiments”
  5. When very few labeled data are available for training (compared with the number of features), T-MIRA performs much better than the vector-based models MIRA and Perceptron.
    Page 7, “Experiments”
  6. To further contrast the behavior of T-MIRA, MIRA and Perceptron, we plot the f-scores on both the training and held-out sets given by these algorithms after each training epoch in Figure 3.
    Page 7, “Experiments”
  7. It is clearly seen that both MIRA and Perceptron do much better than T-MIRA on the training set.
    Page 7, “Experiments”
  8. As an aside, observe that MIRA consistently outperformed Perceptron, as expected.
    Page 7, “Experiments”
  9. To make further comparison of the two strategies, in Figure 4 we plot the 20 largest singular values of the matrices which the surrogate weights (given by the Perceptron after running for 1 epoch) are mapped to by both strategies (from the experiment with training data sections 2-5); see the sketch after this list.
    Page 7, “Experiments”
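
Item 9 diagnoses the two mapping strategies by how fast the singular values of the resulting matrix decay; here is a minimal sketch of that inspection, using a random matrix purely as a stand-in for the mapped surrogate weights.

```python
import numpy as np

# Stand-in for the matrix a surrogate weight vector is mapped to;
# in the paper this comes from a Perceptron run for 1 epoch.
rng = np.random.default_rng(0)
W = rng.standard_normal((200, 150))

# Singular values in decreasing order: a fast decay means the matrix
# is well approximated by few rank-1 components (a small H suffices).
sigma = np.linalg.svd(W, compute_uv=False)
print(sigma[:20])  # the 20 largest, as plotted in Figure 4
```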

feature set

Appears in 7 sentences as: feature set (5) feature sets (2)
In Online Learning in Tensor Space
  1. This regularizes the model complexity and makes the tensor model highly effective in situations where a large feature set is defined but very limited resources are available for training.
    Page 1, “Abstract”
  2. This also makes training the model parameters a challenging problem, since the amount of labeled training data is usually small compared to the size of feature sets: the feature weights cannot be estimated reliably.
    Page 1, “Introduction”
  3. Such models require learning individual feature weights directly, so that the number of parameters to be estimated is identical to the size of the feature set.
    Page 1, “Introduction”
  4. In general, if V features are defined for a learning problem, and we (i) organize the feature set as a tensor $\Phi \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_D}$ and (ii) use H component rank-1 tensors to approximate the corresponding target weight tensor.
    Page 3, “Tensor Space Representation”
  5. Specifically, a vector space model assumes each feature weight to be a “free” parameter, and estimating them reliably could therefore be hard when training data are not sufficient or the feature set is huge.
    Page 3, “Tensor Space Representation”
  6. ways of mapping, which is an intractable number of possibilities even for modest-sized feature sets, making it impractical to carry out a brute force search.
    Page 4, “Tensor Model Construction”
  7. This can be regarded as a form of model regularization. Therefore, compared with the traditional vector-space models, learning in the tensor space is very effective when a large feature set is defined but only a small amount of training data is available.
    Page 8, “Conclusion and Future Work”

weight vector

Appears in 7 sentences as: weight vector (6) weight vectors (1)
In Online Learning in Tensor Space
  1. Most of the learning algorithms for NLP problems are based on vector space models, which represent data as vectors $\phi \in \mathbb{R}^n$, and try to learn feature weight vectors $w \in \mathbb{R}^n$ such that a linear model $y = w \cdot \phi$ is able to discriminate between, say, good and bad hypotheses.
    Page 2, “Tensor Space Representation”
  2. As a way out, we first run a simple vector-model based learning algorithm (say the Perceptron) on the training data and estimate a weight vector, which serves as a “surrogate” weight vector.
    Page 4, “Tensor Model Construction”
  3. “surrogate” weight vector.
    Page 5, “Tensor Model Construction”
  4. $\ldots, D$, surrogate weight vector $X$
    Page 5, “Online Learning Algorithm”
  5. (a) Mapping a surrogate weight vector to a tensor
    Page 5, “Online Learning Algorithm”
  6. Figure 2: Algorithm for mapping a surrogate weight vector X to a tensor.
    Page 5, “Online Learning Algorithm”
  7. Denote the weight vector of the $d$th mode of the $h$th tensor at time $t$ as $w_{h,d}^t$.
    Page 6, “Online Learning Algorithm”

vector space

Appears in 6 sentences as: vector space (6)
In Online Learning in Tensor Space
  1. Most traditional models are linear models, in the sense that both the features of the data and model parameters are represented as vectors in a vector space.
    Page 1, “Introduction”
  2. On the other hand, tensor models have many more degrees of “design freedom” than vector space models.
    Page 1, “Introduction”
  3. Most of the learning algorithms for NLP problems are based on vector space models, which represent data as vectors $\phi \in \mathbb{R}^n$, and try to learn feature weight vectors $w \in \mathbb{R}^n$ such that a linear model $y = w \cdot \phi$ is able to discriminate between, say, good and bad hypotheses.
    Page 2, “Tensor Space Representation”
  4. A vector space linear model requires estimating 1,000,000 free parameters.
    Page 2, “Tensor Space Representation”
  5. Then the total number of parameters to be learned for this tensor model is $H \sum_{d=1}^{D} n_d$, which is usually much smaller than $V = \prod_{d=1}^{D} n_d$ for a traditional vector space model (see the worked count after this list).
    Page 3, “Tensor Space Representation”
  6. Specifically, a vector space model assumes each feature weight to be a “free” parameter, and estimating them reliably could therefore be hard when training data are not sufficient or the feature set is huge.
    Page 3, “Tensor Space Representation”
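
A worked count for the example in items 4 and 5, under the illustrative assumption (ours, not the paper's) that the 1,000,000 features are organized as a 1000 x 1000 matrix, i.e. $D = 2$:

```latex
% Vector-space model: V = \prod_{d=1}^{D} n_d = 1000 \times 1000 = 10^6 weights.
% Tensor model with H rank-1 components:
%   H \sum_{d=1}^{D} n_d = H (1000 + 1000) = 2000 H,
% so even H = 10 needs only 20,000 parameters, a 50-fold reduction.
```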

reranking

Appears in 3 sentences as: reranker (1) reranking (3)
In Online Learning in Tensor Space
  1. We used the Charniak parser (Charniak et al., 2005) for our experiment, and we used the proposed algorithm to train the reranking feature weights.
    Page 7, “Experiments”
  2. For comparison, we also investigated training the reranker with Perceptron and MIRA.
    Page 7, “Experiments”
  3. There are around V = 1.33 million features in all defined for reranking, and the n-best size for reranking is set to 50.
    Page 7, “Experiments”
