Index of papers in Proc. ACL 2014 that mention
  • learning algorithm
Björkelund, Anders and Kuhn, Jonas
Introducing Nonlocal Features
In other words, it is unlikely that we can devise a feature set that is informative enough to allow the weight vector to converge towards a solution that lets the learning algorithm see entire documents during training, at least when no external knowledge sources are used.
Introducing Nonlocal Features
Thus the learning algorithm always reaches the end of a document, avoiding the problem that early updates discard parts of the training data.
Introducing Nonlocal Features
When we applied LaSO, we noticed that it performed worse than the baseline learning algorithm when only using local features.
Introduction
The main reason why early updates underperform in our setting is that the task is too difficult and that the learning algorithm is not able to profit from all training data.
Introduction
Put another way, early updates happen too early, and the learning algorithm rarely reaches the end of the instances as it halts, updates, and moves on to the next instance.
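To make the excerpts above concrete, here is a minimal sketch of early-update training with a beam, in the style of Collins and Roark (2004); `beam_step`, `features`, and the data layout are hypothetical stand-ins, not the authors' implementation:

```python
# Early-update perceptron training sketch: decode with a beam and, the moment
# the gold prefix falls off the beam, update and abandon the rest of the
# instance. On hard tasks this means training rarely sees instance endings.
# `beam_step(x, prefixes, score)` and `features(x, prefix)` are hypothetical.

def train_early_update(data, features, beam_step, epochs=5, beam_size=8):
    weights = {}

    def score(prefix, x):
        return sum(weights.get(f, 0.0) * v for f, v in features(x, prefix).items())

    def update(x, good, bad):
        for f, v in features(x, good).items():
            weights[f] = weights.get(f, 0.0) + v
        for f, v in features(x, bad).items():
            weights[f] = weights.get(f, 0.0) - v

    for _ in range(epochs):
        for x, gold in data:                     # gold[t] = gold prefix of length t+1
            beam = [()]
            for t in range(len(gold)):
                beam = beam_step(x, beam, score)[:beam_size]
                if gold[t] not in beam:
                    update(x, gold[t], beam[0])  # early update ...
                    break                        # ... and discard the rest of x
    return weights
```

LaSO-style training instead reseeds the beam with the gold item after each update and continues decoding, so training always reaches the end of a document, as the second excerpt above notes.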
Representation and Learning
Algorithm 1 shows pseudocode for the learning algorithm, which we will refer to as the baseline learning algorithm.
Results
Since early updates do not always make use of the complete documents during training, it can be expected that it will require either a very wide beam or more iterations to get up to par with the baseline learning algorithm.
Results
Recall that with only local features, delayed LaSO is equivalent to the baseline learning algorithm.
Results
From these results we conclude that we are better off when the learning algorithm handles one document at a time, instead of getting feedback within documents.
learning algorithm is mentioned in 12 sentences in this paper.
Cao, Yuan and Khudanpur, Sanjeev
Abstract
We propose an online learning algorithm based on tensor-space models.
Abstract
We apply the proposed algorithm to a parsing task and show that, even with very little training data, the learning algorithm based on a tensor model performs well and gives significantly better results than standard learning algorithms based on traditional vector-space models.
Conclusion and Future Work
In this paper, we reformulated the traditional linear vector-space models as tensor-space models, and proposed an online learning algorithm named Tensor-MIRA.
Introduction
Many learning algorithms applied to NLP problems, such as the Perceptron (Collins,
Introduction
A tensor weight learning algorithm is then proposed in Section 4.
Online Learning Algorithm
Here we propose an online learning algorithm similar to MIRA but modified to accommodate tensor models.
Online Learning Algorithm
Training samples (x_i, y_i), i = 1, . . . , m, where x_i is the input and y_i is the reference or oracle hypothesis, are fed to the weight learning algorithm in sequential order.
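As a point of reference, the core of a standard (vector-space) MIRA update can be sketched compactly; `decode` and the feature map `phi` below are hypothetical stand-ins, and the tensor-shaped weights that distinguish the authors' Tensor-MIRA are not reproduced:

```python
import numpy as np

# Single-constraint MIRA sketch: after decoding, move w just enough that the
# oracle y_i outscores the best wrong hypothesis by the loss, capped at C.
# `decode(x, w)` and `phi(x, y)` (returning numpy vectors) are hypothetical.

def mira_update(w, phi, decode, x_i, y_i, C=1.0):
    y_hat = decode(x_i, w)                      # current best hypothesis
    if y_hat == y_i:
        return w
    delta = phi(x_i, y_i) - phi(x_i, y_hat)     # feature difference vector
    loss = 1.0                                  # placeholder task loss
    step = (loss - w @ delta) / (delta @ delta + 1e-12)
    tau = min(C, max(0.0, step))                # clipped closed-form step size
    return w + tau * delta

# Samples are fed in sequential order, as the excerpt describes:
#   for x_i, y_i in stream:
#       w = mira_update(w, phi, decode, x_i, y_i)
```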
Tensor Model Construction
As a way out, we first run a simple vector-model based learning algorithm (say the Perceptron) on the training data and estimate a weight vector, which serves as a “surrogate” …
Tensor Space Representation
Most of the learning algorithms for NLP problems are based on vector space models, which represent data as vectors φ ∈ R^n, and try to learn feature weight vectors w ∈ R^n such that a linear model y = w · φ is able to discriminate between, say, good and bad hypotheses.
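A self-contained toy version of this setup, with made-up dimensions and synthetic separable data:

```python
import numpy as np

# Linear vector-space model y = w · φ with φ ∈ R^n, trained by the perceptron
# rule to separate good (+1) from bad (-1) hypotheses on toy separable data.
rng = np.random.default_rng(0)
n = 8
w_true = rng.normal(size=n)                     # hidden "true" scorer
X = rng.normal(size=(200, n))
labels = np.where(X @ w_true > 0, 1, -1)

w = np.zeros(n)
for _ in range(20):
    for phi, label in zip(X, labels):
        if label * (w @ phi) <= 0:              # misclassified or on the boundary
            w += label * phi                    # perceptron update
print((np.where(X @ w > 0, 1, -1) == labels).mean())  # training accuracy
```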
learning algorithm is mentioned in 9 sentences in this paper.
Cohen, Shay B. and Collins, Michael
Abstract
We introduce a provably correct learning algorithm for latent-variable PCFGs.
Additional Details of the Algorithm
The learning algorithm for L-PCFGs can be used as an initializer for the EM algorithm for L-PCFGs.
Experiments on Parsing
This section describes parsing experiments using the learning algorithm for L-PCFGs.
Experiments on Parsing
Table 1: Results on the development data (section 22) and test data (section 23) for various learning algorithms for L-PCFGs.
Experiments on Parsing
In this special case, the L-PCFG learning algorithm is equivalent to a simple algorithm, with the following steps: 1) define the matrix Q with entries Q_{w1,w2} = count(w1, w2)/N, where count(w1, w2) is the number of times that the bigram (w1, w2) is seen in the data, and N = Σ_{w1,w2} count(w1, w2).
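Step 1 of this special case is concrete enough to sketch directly; the toy corpus below is made up:

```python
from collections import Counter

# Build Q with entries Q[w1][w2] = count(w1, w2) / N, where N is the total
# number of bigram tokens, as in the excerpt's step 1.
corpus = "the dog saw the cat the dog barked".split()
counts = Counter(zip(corpus, corpus[1:]))
N = sum(counts.values())
Q = {}
for (w1, w2), c in counts.items():
    Q.setdefault(w1, {})[w2] = c / N
print(Q["the"]["dog"])    # 2 of 7 bigrams -> 0.2857...
```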
The Learning Algorithm for L-PCFGs
Our goal is to design a learning algorithm for L-PCFGs.
The Learning Algorithm for L-PCFGs
4.2 The Learning Algorithm
The Learning Algorithm for L-PCFGs
Figure 1 shows the learning algorithm for L-PCFGs.
The Matrix Decomposition Algorithm
This section describes the matrix decomposition algorithm used in Step 1 of the learning algorithm.
learning algorithm is mentioned in 9 sentences in this paper.
Parikh, Ankur P. and Cohen, Shay B. and Xing, Eric P.
Abstract
Additive tree metrics can be leveraged by “meta-algorithms” such as neighbor-joining (Saitou and Nei, 1987) and recursive grouping (Choi et al., 2011) to provide consistent learning algorithms for latent trees.
Abstract
In our learning algorithm, we assume that examples of the form (w^(i), x^(i)) for i ∈ [N] = {1, . . . , N} …
Abstract
The word embeddings are used during the learning process, but the final decoder that the learning algorithm outputs maps a POS tag sequence x to a parse tree.
learning algorithm is mentioned in 9 sentences in this paper.
Tang, Duyu and Wei, Furu and Yang, Nan and Zhou, Ming and Liu, Ting and Qin, Bing
Introduction
(2002) and employ machine learning algorithms to build classifiers from tweets with manually annotated sentiment polarity.
Introduction
To this end, we extend the existing word embedding learning algorithm (Collobert et al., 2011) and develop three neural networks to effectively incorporate the supervision from sentiment polarity of text (e.g.
Introduction
In terms of the accuracy of polarity consistency between each sentiment word and its top N closest words, SSWE outperforms existing word embedding learning algorithms.
Related Work
Under this assumption, many feature learning algorithms are proposed to obtain better classification performance (Pang and Lee, 2008; Liu, 2012; Feldman, 2013).
Related Work
We extend the existing word embedding learning algorithm (Collobert et al., 2011) and develop three neural networks to learn SSWE.
Related Work
In the following sections, we introduce the traditional method before presenting the details of SSWE learning algorithms .
learning algorithm is mentioned in 9 sentences in this paper.
Ciobanu, Alina Maria and Dinu, Liviu P.
Abstract
We use aligned subsequences as features for machine learning algorithms in order to infer rules for linguistic changes undergone by words when entering new languages and to discriminate between cognates and non-cognates.
Conclusions and Future Work
and Waterman, 1981), and other learning algorithms for discriminating between cognates and non-cognates.
Our Approach
Therefore, because the edit distance was widely used in this research area and produced good results, we are encouraged to employ orthographic alignment for identifying pairs of cognates, not only to compute similarity scores, as was previously done, but to use aligned subsequences as features for machine learning algorithms.
Our Approach
3.3 Learning Algorithms
learning algorithm is mentioned in 4 sentences in this paper.
Ji, Yangfeng and Eisenstein, Jacob
Implementation
The learning algorithm is applied in a shift-reduce parser, where the training data consists of the (unique) list of shift and reduce operations required to produce the gold RST parses.
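As an illustration of that kind of training data, here is a sketch of reading the (unique) shift/reduce action sequence off a gold binary tree; the nested-pair tree encoding is made up, and the authors' RST trees and action inventory are richer than this:

```python
# Extract the oracle shift/reduce sequence from a gold binary tree by
# post-order traversal: shift each leaf, reduce after both children are built.
# Internal nodes are (left, right) pairs; leaves are strings.

def oracle_actions(tree):
    actions = []
    def walk(node):
        if isinstance(node, str):        # leaf EDU/token
            actions.append(("SHIFT", node))
        else:
            left, right = node
            walk(left)
            walk(right)
            actions.append(("REDUCE",))  # combine the top two stack items
    walk(tree)
    return actions

gold = (("e1", "e2"), "e3")
print(oracle_actions(gold))
# [('SHIFT', 'e1'), ('SHIFT', 'e2'), ('REDUCE',), ('SHIFT', 'e3'), ('REDUCE',)]
```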
Introduction
Alternatively, our approach can be seen as a nonlinear learning algorithm for incremental structure prediction, which overcomes feature sparsity through effective parameter tying.
Large-Margin Learning Framework
of our learning algorithm are different.
Large-Margin Learning Framework
Algorithm 1 Mini-batch learning algorithm. Input: training set D, regularization parameters λ and τ, number of iterations T, initialization matrix A0, and threshold δ. while t = 1, . . . , T do
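Only the algorithm header survives in the excerpt, so the following mini-batch loop is a generic sketch of what such a procedure looks like; the squared loss, learning rate, and the exact roles of λ, τ, A0, and δ here are assumptions, not the paper's large-margin objective:

```python
import numpy as np

def minibatch_train(D, A0, T=50, batch_size=16, lr=0.1, lam=1e-3, delta=1e-6):
    """Generic mini-batch loop matching the reconstructed header: T iterations
    over mini-batches of D, L2 regularization weight lam, and early stopping
    when the update norm drops below delta. A is a plain weight vector here,
    though the paper initializes a matrix A0."""
    A = A0.copy()
    for _ in range(T):
        np.random.shuffle(D)                     # D: list of (x, y) pairs
        for start in range(0, len(D), batch_size):
            xs, ys = zip(*D[start:start + batch_size])
            X, y = np.array(xs), np.array(ys)
            grad = X.T @ (X @ A - y) / len(y) + lam * A   # placeholder loss
            A -= lr * grad
            if np.linalg.norm(lr * grad) < delta:
                return A
    return A
```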
learning algorithm is mentioned in 4 sentences in this paper.
Martineau, Justin and Chen, Lu and Cheng, Doreen and Sheth, Amit
Feature Weighting Methods
ing, better classification and regression models can be built by using the feature weights generated by these models as a pre-weight on the data points for other machine learning algorithms.
Related Work
Noise tolerance techniques aim to improve the learning algorithm itself to avoid over-fitting caused by mislabeled instances in the training phase, so that the constructed classifier becomes more noise-tolerant.
Related Work
Decision tree (Mingers, 1989; Vannoorenberghe and Denoeux, 2002) and boosting (Jiang, 2001; Kalai and Servedio, 2005; Karmaker and Kwek, 2006) are two learning algorithms that have been investigated in many studies.
Related Work
For example, useful information can be removed with noise elimination, since annotation errors are likely to occur on ambiguous instances that are potentially valuable for learning algorithms.
learning algorithm is mentioned in 4 sentences in this paper.
Qian, Longhua and Hui, Haotian and Hu, Ya'nan and Zhou, Guodong and Zhu, Qiaoming
Abstract
This section first introduces the fundamental supervised learning method, and then describes a baseline active learning algorithm .
Abstract
3.2 Active Learning Algorithm
Abstract
4.4 Bilingual Active Learning Algorithm
learning algorithm is mentioned in 4 sentences in this paper.
Huang, Hongzhao and Cao, Yunbo and Huang, Xiaojiang and Ji, Heng and Lin, Chin-Yew
Introduction
In order to address these unique challenges for wikification for the short tweets, we employ graph-based semi-supervised learning algorithms (Zhu et al., 2003; Smola and Kondor, 2003; Blum et al., 2004; Zhou et al., 2004; Talukdar and Crammer, 2009) for collective inference by exploiting the manifold (cluster) structure in both unlabeled and labeled data.
Introduction
effort to explore graph-based semi-supervised learning algorithms for the wikification task.
Semi-supervised Graph Regularization
We propose a novel semi-supervised graph regularization framework based on the graph-based semi-supervised learning algorithm (Zhu et al., 2003):
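The Zhu et al. (2003) harmonic-function algorithm referenced here has a compact closed form; a toy sketch on a made-up four-node similarity graph:

```python
import numpy as np

# Harmonic-function label propagation (Zhu et al., 2003): labeled scores are
# clamped, and unlabeled scores solve L_uu f_u = W_ul f_l, i.e. each unlabeled
# score is the weighted average of its neighbors' scores.
W = np.array([[0., 1., 1., 0.],
              [1., 0., 0., 1.],
              [1., 0., 0., 1.],
              [0., 1., 1., 0.]])                 # symmetric edge weights
f_l = np.array([1.0, 0.0])                       # nodes 0 and 1 are labeled
L = np.diag(W.sum(axis=1)) - W                   # graph Laplacian
f_u = np.linalg.solve(L[2:, 2:], W[2:, :2] @ f_l)
print(f_u)                                       # [0.667, 0.333] for nodes 2, 3
```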
learning algorithm is mentioned in 3 sentences in this paper.
Li, Sujian and Wang, Liang and Cao, Ziqiang and Li, Wenjie
Add arc <e_C, e_j> to G_C with
As we employed the MIRA learning algorithm, it is possible to identify which specific features are useful, by looking at the weights learned for each feature using the training data.
Add arc <e_C, e_j> to G_C with
Other text-level discourse parsing methods include: (1) Percep-coarse: we replace MIRA with the averaged perceptron learning algorithm and the other settings are the same as Our-coarse; (2) HILDA-manual and HILDA-seg are from Hernault (2010b)'s work, and their input EDUs are from RST-DT and their own EDU segmenter respectively; (3) LeThanh indicates the results given by LeThanh et al.
Add arc <e_C, e_j> to G_C with
We can also see that the averaged perceptron learning algorithm, though simple, can achieve comparable performance, better than HILDA-manual.
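For reference, the averaged perceptron compared against here fits in a few lines; `decode` and the sparse feature map `phi` are hypothetical stand-ins for a structured task:

```python
# Averaged perceptron sketch: keep a running sum of the weight vector and
# return its average, which typically generalizes better than the final
# weights. `decode(x, w)` and `phi(x, y)` (returning sparse feature dicts)
# are hypothetical.

def averaged_perceptron(data, phi, decode, epochs=10):
    w, w_sum, steps = {}, {}, 0

    def add(dst, feats, scale):
        for f, v in feats.items():
            dst[f] = dst.get(f, 0.0) + scale * v

    for _ in range(epochs):
        for x, y in data:
            y_hat = decode(x, w)
            if y_hat != y:
                add(w, phi(x, y), +1.0)          # promote gold features
                add(w, phi(x, y_hat), -1.0)      # demote predicted features
            add(w_sum, w, 1.0)                   # accumulate after every instance
            steps += 1
    return {f: v / steps for f, v in w_sum.items()}
```

Efficient implementations average lazily instead of summing the full weight vector after each instance; the naive accumulation keeps the sketch short.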
learning algorithm is mentioned in 3 sentences in this paper.
Srivastava, Shashank and Hovy, Eduard
Introduction
Implicitly, the weight learning algorithm can be seen as a gradient descent procedure minimizing the difference between the scores of highest scoring (Viterbi) state sequences, and the label state sequences.
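Written as one stochastic step, the gradient-descent view in this excerpt is the following update; `viterbi` and `phi` (returning numpy feature vectors) are hypothetical stand-ins:

```python
import numpy as np

# Per-instance objective: J(w) = score(w, viterbi(x; w)) - score(w, y_gold),
# whose subgradient in w is phi(x, viterbi) - phi(x, y_gold).

def sgd_step(w, phi, viterbi, x, y_gold, lr=1.0):
    y_hat = viterbi(x, w)
    grad = phi(x, y_hat) - phi(x, y_gold)       # subgradient of the score gap
    return w - lr * grad                        # lr = 1 recovers the perceptron
```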
Introduction
Pseudocode of the learning algorithm for the partially labeled case is given in Algorithm 1.
Introduction
We see that while all three learning algorithms perform better than the baseline, the performance of the purely unsupervised system is inferior to supervised approaches.
learning algorithm is mentioned in 3 sentences in this paper.
Zhang, Yuan and Lei, Tao and Barzilay, Regina and Jaakkola, Tommi and Globerson, Amir
Related Work
Our work builds on one such approach — SampleRank (Wick et al., 2011), a sampling-based learning algorithm .
Sampling-Based Dependency Parsing with Global Features
We begin with the notation before addressing the decoding and learning algorithms .
Sampling-Based Dependency Parsing with Global Features
Figure 4 summarizes the learning algorithm.
learning algorithm is mentioned in 3 sentences in this paper.