Index of papers in Proc. ACL that mention
  • learning algorithm
Vogel, Adam and Jurafsky, Daniel
Abstract
We learn this correspondence with a reinforcement learning algorithm, using the deviation of the route we follow from the intended path as a reward signal.
Approximate Dynamic Programming
To learn these weights θ we use SARSA (Sutton and Barto, 1998), an online learning algorithm similar to Q-learning (Watkins and Dayan, 1992).
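As a concrete illustration of the SARSA update mentioned in this snippet, a minimal sketch with linear function approximation follows; the feature function phi and the step-size and discount values are hypothetical placeholders, not taken from the paper.

import numpy as np

def sarsa_update(theta, phi, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    # One SARSA step: theta <- theta + alpha * td_error * phi(s, a),
    # where the TD error uses the action actually taken in the next state.
    q_sa = theta @ phi(s, a)
    q_next = theta @ phi(s_next, a_next)
    td_error = r + gamma * q_next - q_sa
    return theta + alpha * td_error * phi(s, a)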
Approximate Dynamic Programming
Algorithm 1 details the learning algorithm, which we follow here.
Approximate Dynamic Programming
We also compare against the policy gradient learning algorithm of Branavan et al.
Introduction
Solved using a reinforcement learning algorithm, our system acquires the meaning of spatial words through
Introduction
We frame direction following as an apprenticeship learning problem and solve it with a reinforcement learning algorithm, extending previous work on interpreting instructions by Branavan et al.
Reinforcement Learning Formulation
Our learning algorithm is not dependent on a determin-
Reinforcement Learning Formulation
Learning exactly which words influence decision making is difficult; reinforcement learning algorithms have problems with the large, sparse feature vectors common in natural language processing.
learning algorithm is mentioned in 12 sentences in this paper.
Topics mentioned in this paper:
Fader, Anthony and Zettlemoyer, Luke and Etzioni, Oren
Introduction
We describe scalable learning algorithms that induce general question templates and lexical variants of entities and relations.
Learning
PARALEX uses a two-part learning algorithm; it first induces an overly general lexicon (Section 5.1) and then learns to score derivations to increase accuracy (Section 5.2).
Learning
The lexical learning algorithm constructs a lexicon L from a corpus of question paraphrases C =
Learning
Figure 1: Our lexicon learning algorithm.
Overview of the Approach
Learning: The learning algorithm induces a lexicon L and estimates the parameters θ of the linear ranking function.
Overview of the Approach
The final result is a scalable learning algorithm that requires no manual annotation of questions.
Related Work
The learning algorithms presented in this paper are similar to algorithms used for paraphrase extraction from sentence-aligned corpora (Barzilay and McKeown, 2001; Barzilay and Lee, 2003; Quirk et al., 2004; Bannard and Callison-Burch, 2005; Callison-Burch, 2008; Marton et al., 2009).
learning algorithm is mentioned in 16 sentences in this paper.
Topics mentioned in this paper:
Björkelund, Anders and Kuhn, Jonas
Introducing Nonlocal Features
In other words, it is unlikely that we can devise a feature set that is informative enough to allow the weight vector to converge towards a solution that lets the learning algorithm see the entire documents during training, at least in the situation when no external knowledge sources are used.
Introducing Nonlocal Features
Thus the learning algorithm always reaches the end of a document, avoiding the problem that early updates discard parts of the training data.
Introducing Nonlocal Features
When we applied LaSO, we noticed that it performed worse than the baseline learning algorithm when only using local features.
Introduction
The main reason why early updates underperform in our setting is that the task is too difficult and that the learning algorithm is not able to profit from all training data.
Introduction
Put another way, early updates happen too early, and the learning algorithm rarely reaches the end of the instances as it halts, updates, and moves on to the next instance.
Representation and Learning
Algorithm 1 shows pseudocode for the learning algorithm, which we will refer to as the baseline learning algorithm.
Results
Since early updates do not always make use of the complete documents during training, it can be expected that it will require either a very wide beam or more iterations to get up to par with the baseline learning algorithm.
Results
Recall that with only local features, delayed LaSO is equivalent to the baseline learning algorithm.
Results
From these results we conclude that we are better off when the learning algorithm handles one document at a time, instead of getting feedback within documents.
learning algorithm is mentioned in 12 sentences in this paper.
Topics mentioned in this paper:
Dhillon, Paramveer S. and Talukdar, Partha Pratim and Crammer, Koby
A ← METRICLEARNER(X, S, D)
We also experimented with the supervised large-margin metric learning algorithm (LMNN) presented in (Weinberger and Saul, 2009).
Abstract
We initiate a study comparing effectiveness of the transformed spaces learned by recently proposed supervised, and semi-supervised metric learning algorithms to those generated by previously proposed unsupervised dimensionality reduction methods (e.g., PCA).
Abstract
Through a variety of experiments on different real-world datasets, we find IDML-IT, a semi-supervised metric learning algorithm, to be the most effective.
Conclusion
In this paper, we compared the effectiveness of the transformed spaces learned by recently proposed supervised, and semi-supervised metric learning algorithms to those generated by previously proposed unsupervised dimensionality reduction methods (e.g., PCA).
Introduction
Recently, several supervised metric learning algorithms have been proposed (Davis et al., 2007; Weinberger and Saul, 2009).
Introduction
Even though different supervised and semi-supervised metric learning algorithms have recently been proposed, effectiveness of the transformed spaces learned by them in NLP
Introduction
We find IDML-IT, a semi-supervised metric learning algorithm, to be the most effective.
Metric Learning
We shall now review two recently proposed metric learning algorithms.
Metric Learning
The ITML metric learning algorithm, which we reviewed in Section 2.2, is supervised in nature, and hence it does not exploit widely available unlabeled data.
Metric Learning
In this section, we review Inference Driven Metric Learning (IDML) (Algorithm 1) (Dhillon et al., 2010), a recently proposed metric learning framework which combines an existing supervised metric learning algorithm (such as ITML) along with transductive graph-based label inference to learn a new distance metric from labeled as well as unlabeled data combined.
learning algorithm is mentioned in 11 sentences in this paper.
Topics mentioned in this paper:
Parikh, Ankur P. and Cohen, Shay B. and Xing, Eric P.
Abstract
Additive tree metrics can be leveraged by “meta-algorithms” such as neighbor-joining (Saitou and Nei, 1987) and recursive grouping (Choi et al., 2011) to provide consistent learning algorithms for latent trees.
Abstract
In our learning algorithm, we assume that examples of the form (w(i), x(i)) for i ∈ [N] = {1, .
Abstract
The word embeddings are used during the learning process, but the final decoder that the learning algorithm outputs maps a POS tag sequence x to a parse tree.
learning algorithm is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Tang, Duyu and Wei, Furu and Yang, Nan and Zhou, Ming and Liu, Ting and Qin, Bing
Introduction
(2002) and employ machine learning algorithms to build classifiers from tweets with manually annotated sentiment polarity.
Introduction
To this end, we extend the existing word embedding learning algorithm (Collobert et al., 2011) and develop three neural networks to effectively incorporate the supervision from sentiment polarity of text (e.g.
Introduction
In the accuracy of polarity consistency between each sentiment word and its top N closest words, SSWE outperforms existing word embedding learning algorithms.
Related Work
Under this assumption, many feature learning algorithms are proposed to obtain better classification performance (Pang and Lee, 2008; Liu, 2012; Feldman, 2013).
Related Work
We extend the existing word embedding learning algorithm (Collobert et al., 2011) and develop three neural networks to learn SSWE.
Related Work
In the following sections, we introduce the traditional method before presenting the details of SSWE learning algorithms.
learning algorithm is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Cohen, Shay B. and Collins, Michael
Abstract
We introduce a provably correct learning algorithm for latent-variable PCFGs.
Additional Details of the Algorithm
The learning algorithm for L-PCFGs can be used as an initializer for the EM algorithm for L-PCFGs.
Experiments on Parsing
This section describes parsing experiments using the learning algorithm for L-PCFGs.
Experiments on Parsing
Table 1: Results on the development data (section 22) and test data (section 23) for various learning algorithms for L-PCFGs.
Experiments on Parsing
In this special case, the L-PCFG learning algorithm is equivalent to a simple algorithm with the following steps: 1) define the matrix Q with entries Q_{w1,w2} = count(w1, w2)/N, where count(w1, w2) is the number of times that the bigram (w1, w2) is seen in the data, and N = Σ_{w1,w2} count(w1, w2).
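The count matrix described in this snippet can be computed directly; a small sketch over a toy corpus (not from the paper) follows.

from collections import Counter

corpus = ["the dog barks", "the cat sleeps"]  # toy corpus
bigrams = Counter()
for sentence in corpus:
    words = sentence.split()
    for w1, w2 in zip(words, words[1:]):
        bigrams[(w1, w2)] += 1
N = sum(bigrams.values())
# Q[(w1, w2)] = count(w1, w2) / N, the empirical bigram distribution.
Q = {pair: c / N for pair, c in bigrams.items()}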
The Learning Algorithm for L-PCFGs
Our goal is to design a learning algorithm for L-PCFGs.
The Learning Algorithm for L-PCFGs
4.2 The Learning Algorithm
The Learning Algorithm for L-PCFGs
Figure 1 shows the learning algorithm for L-PCFGs.
The Matrix Decomposition Algorithm
This section describes the matrix decomposition algorithm used in Step 1 of the learning algorithm.
learning algorithm is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Cao, Yuan and Khudanpur, Sanjeev
Abstract
We propose an online learning algorithm based on tensor-space models.
Abstract
We apply the proposed algorithm to a parsing task, and show that even with very little training data the learning algorithm based on a tensor model performs well, and gives significantly better results than standard learning algorithms based on traditional vector-space models.
Conclusion and Future Work
In this paper, we reformulated the traditional linear vector-space models as tensor-space models, and proposed an online learning algorithm named Tensor-MIRA.
Introduction
Many learning algorithms applied to NLP problems, such as the Perceptron (Collins,
Introduction
A tensor weight learning algorithm is then proposed in Section 4.
Online Learning Algorithm
Here we propose an online learning algorithm similar to MIRA but modified to accommodate tensor models.
Online Learning Algorithm
, m, where xi is the input and yi is the reference or oracle hypothesis, are fed to the weight learning algorithm in sequential order.
Tensor Model Construction
As a way out, we first run a simple vector-model based learning algorithm (say the Perceptron) on the training data and estimate a weight vector, which serves as a “surrogate”
Tensor Space Representation
Most of the learning algorithms for NLP problems are based on vector space models, which represent data as vectors φ ∈ R^n, and try to learn feature weight vectors w ∈ R^n such that a linear model y = w · φ is able to discriminate between, say, good and bad hypotheses.
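A minimal sketch of the vector-space setup this snippet describes, with toy numbers: feature vectors phi, a weight vector w, and the linear score y = w · phi.

import numpy as np

w = np.array([0.5, -1.0, 2.0])        # toy feature weights
phi_good = np.array([1.0, 0.0, 1.0])  # features of a good hypothesis
phi_bad = np.array([0.0, 1.0, 0.5])   # features of a bad hypothesis
# The linear model discriminates by score: here 2.5 > 0.0.
print(w @ phi_good > w @ phi_bad)     # True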
learning algorithm is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Wei, Wei and Gulla, Jon Atle
Introduction
We further propose a specific hierarchical learning algorithm, called the HL-SOT algorithm, which is developed based on generalizing the online learning algorithm H-RLS (Cesa-Bianchi et al., 2006).
Introduction
• A specific hierarchical learning algorithm is
Related Work
In (Turney, 2002), an unsupervised learning algorithm was proposed to classify reviews as recommended or not recommended by averaging sentiment annotation of phrases in reviews that contain adjectives or adverbs.
The HL-SOT Approach
Then a specific hierarchical learning algorithm is further proposed to solve the formulated problem.
The HL-SOT Approach
Therefore we propose a specific hierarchical learning algorithm, named the HL-SOT algorithm, that is able to train each node classifier in a batch-learning setting and allows separate learning of the threshold for each node classifier.
The HL-SOT Approach
Then the hierarchical classification function f is parameterized by the weight matrix W = (w1, ..., wN)^T and threshold vector θ = (θ1, ..., θN)^T. The hierarchical learning algorithm HL-SOT is proposed for learning the parameters of W and θ.
learning algorithm is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Kruengkrai, Canasai and Uchimoto, Kiyotaka and Kazama, Jun'ichi and Wang, Yiou and Torisawa, Kentaro and Isahara, Hitoshi
Background
The goal of our learning algorithm is to learn a mapping from inputs (unsegmented sentences) x ∈ X to outputs (segmented paths) y ∈ Y
Conclusion
The second is a discriminative online learning algorithm based on MIRA that enables us to incorporate arbitrary features to our hybrid model.
Related work
In this section, we discuss related approaches based on several aspects of learning algorithms and search space representation methods.
Related work
Our approach overcomes the limitation of the original hybrid model by a discriminative online learning algorithm for training.
Training method
Therefore, we require a learning algorithm that can efficiently handle large and complex lattice structures.
Training method
Algorithm 1 Generic Online Learning Algorithm
Training method
Algorithm 1 outlines the generic online learning algorithm (McDonald, 2006) used in our framework.
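A hedged sketch of a generic online learner of the kind this snippet refers to: decode each instance with the current weights, update on mistakes, and return averaged weights. The decode and features functions are hypothetical stand-ins, and the perceptron-style update is only one instance of the generic loop (a MIRA update would replace the update step).

import numpy as np

def online_train(data, decode, features, dim, epochs=5):
    w, w_sum, n = np.zeros(dim), np.zeros(dim), 0
    for _ in range(epochs):
        for x, y in data:            # instances fed in sequential order
            y_hat = decode(x, w)     # best hypothesis under current weights
            if y_hat != y:           # mistake-driven update
                w += features(x, y) - features(x, y_hat)
            w_sum += w
            n += 1
    return w_sum / n                 # weight averaging for stability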
learning algorithm is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Chen, David
Background
In this section we review the lexicon learning algorithm introduced by Chen and Mooney (2011) as well as the overall task they designed to test semantic understanding of navigation instructions.
Experiments
We evaluate our new lexicon learning algorithm as well as the other modifications to the navigation system using the same three tasks as Chen and Mooney (2011).
Experiments
Here SGOLL has a decidedly large advantage over the lexicon learning algorithm from Chen and Mooney, requiring an order of magnitude less time to run.
Experiments
We have introduced a novel, online lexicon learning algorithm that is much faster than the one proposed by Chen and Mooney and also performs better on the navigation tasks they devised.
Introduction
Their lexicon learning algorithm finds the common connected subgraph that occurs with a word by taking intersections of the graphs that represent the different contexts in which the word appears.
Introduction
In addition to the new lexicon learning algorithm, we also look at modifying the meaning representation grammar (MRG) for their formal semantic language.
Online Lexicon Learning Algorithm
In addition to introducing a new lexicon learning algorithm, we also made another modification to the original system proposed by Chen and Mooney (2011).
learning algorithm is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Tang, Hao and Keshet, Joseph and Livescu, Karen
Discussion
extension of the model and learning algorithm to word sequences and (2) feature functions that relate acoustic measurements to sub-word units.
Experiments
We compare the CRF, Passive-Aggressive (PA), and Pegasos learning algorithms.
Experiments
We use the term “CRF” since the learning algorithm corresponds to CRF learning, although the task is multiclass classification rather than a sequence or structure prediction task.
Experiments
Models labeled X/Y use learning algorithm X and feature set Y.
Problem setting
In the next section we present a learning algorithm that aims to minimize the expected zero-one loss.
learning algorithm is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Sogaard, Anders
Abstract
Many discriminative learning algorithms are sensitive to such shifts because highly indicative features may swamp other indicative features.
Abstract
Regularized and adversarial learning algorithms have been proposed to be more robust against covariate shifts.
Abstract
We present a new perceptron learning algorithm using antagonistic adversaries and compare it to previous proposals on 12 multilingual cross-domain part-of-speech tagging datasets.
Conclusion
We presented a discriminative learning algorithm for cross-domain structured prediction that seems more robust to covariate shifts than previous approaches.
Experiments
All learning algorithms do the same number of passes over each training data set.
Introduction
Most learning algorithms assume that training and test data are governed by identical distributions; and more specifically, in the case of part-of-speech (POS) tagging, that training and test sentences were sampled at random and that they are identically and independently distributed.
Introduction
Most discriminative learning algorithms only update parameters when training examples are misclassified.
learning algorithm is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Wang, Qin Iris and Schuurmans, Dale and Lin, Dekang
Abstract
By combining a supervised large margin loss with an unsupervised least squares loss, a discriminative, convex, semi-supervised learning algorithm can be obtained that is applicable to large-scale problems.
Conclusion and Future Work
Unlike previous proposed approaches, we introduce a convex objective for the semi-supervised learning algorithm by combining a convex structured SVM loss and a convex least square loss.
Introduction
Supervised learning algorithms still represent the state of the art approach for inferring dependency parsers from data (McDonald et al., 2005a; McDonald and Pereira, 2006; Wang et al., 2007).
Introduction
Unfortunately, although significant recent progress has been made in the area of semi-supervised learning, the performance of semi-supervised learning algorithms still falls far short of expectations, particularly in challenging real-world tasks such as natural language parsing or machine translation.
Introduction
The basic idea is to bootstrap a supervised learning algorithm by alternating between inferring the missing label information and retraining.
learning algorithm is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Davidov, Dmitry and Rappoport, Ari
Experimental Setup
5.3 Parameters and Learning Algorithm
Experimental Setup
Selection of the learning algorithm and its algorithm-specific parameters was done as follows.
Experimental Setup
Since each dataset has only 140 examples, the computation time of each learning algorithm is negligible.
Introduction
The standard classification process is to find in an auxiliary corpus a set of patterns in which a given training word pair co-appears, and use pattern-word pair co-appearance statistics as features for machine learning algorithms.
Related Work
Various learning algorithms have been used for relation classification.
Related Work
Freely available tools like Weka (Witten and Frank, 1999) allow easy experimentation with common learning algorithms (Hendrickx et al., 2007).
learning algorithm is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Sartorio, Francesco and Satta, Giorgio and Nivre, Joakim
Dependency Parser
Thus our system uses an unbounded number of transition relations, which has an apparent disadvantage for learning algorithms.
Model and Training
In this section we introduce the adopted learning algorithm and discuss the model parameters.
Model and Training
4.1 Learning Algorithm
Model and Training
Algorithm 2 Learning Algorithm
learning algorithm is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Cohen, Shay B. and Stratos, Karl and Collins, Michael and Foster, Dean P. and Ungar, Lyle
Abstract
We introduce a spectral learning algorithm for latent-variable PCFGs (Petrov et al., 2006).
Deriving Empirical Estimates
We now state a PAC-style theorem for the learning algorithm .
Introduction
Recent work has introduced polynomial-time learning algorithms (and consistent estimation methods) for two important cases of hidden-variable models: Gaussian mixture models (Dasgupta, 1999; Vempala and Wang, 2004) and hidden Markov models (Hsu et al., 2009).
Proofs
Figure 4: The spectral learning algorithm.
Related Work
(2011) consider spectral learning algorithms of tree-structured directed Bayes nets.
learning algorithm is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Yih, Wen-tau and Chang, Ming-Wei and Meek, Christopher and Pastusiak, Andrzej
Abstract
Experiments show that our systems can be consistently and significantly improved with rich lexical semantic information, regardless of the choice of learning algorithms.
Introduction
First, by incorporating the abundant information from a variety of lexical semantic models, the answer selection system can be enhanced substantially, regardless of the choice of learning algorithms and settings.
Learning QA Matching Models
A binary classifier can be trained easily using any machine learning algorithm in this standard supervised learning setting.
Learning QA Matching Models
We tested two learning algorithms in this setting: logistic regression and boosted decision trees (Friedman, 2001).
Learning QA Matching Models
The former is the log-linear model widely used in the NLP community and the latter is a robust nonlinear learning algorithm that has shown great empirical performance.
learning algorithm is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Yang, Xiaofeng and Su, Jian and Lang, Jun and Tan, Chew Lim and Liu, Ting and Li, Sheng
Entity-mention Model with ILP
However, normal machine learning algorithms work on attribute-value vectors, which only allow the representation of atomic propositions.
Entity-mention Model with ILP
This requirement motivates our use of Inductive Logic Programming (ILP), a learning algorithm capable of inferring logic programs.
Experiments and Results
Default parameters were applied for all the other settings in ALEPH as well as other learning algorithms used in the experiments.
Introduction
Even worse, the number of mentions in an entity is not fixed, which would result in variant-length feature vectors and make trouble for normal machine learning algorithms.
Modelling Coreference Resolution
Based on the training instances, a binary classifier can be generated using any discriminative learning algorithm.
learning algorithm is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Sonderegger, Morgan and Niyogi, Partha
Discussion
Each model describes the diachronic, population-level consequences of assuming a particular learning algorithm for individuals.
Discussion
By using simple models, we were able to consider a range of learning algorithms corresponding to different explanations for the observed diachronic dynamics.
Modeling preliminaries
This setting allows us to determine the diachronic, population-level consequences of assumptions about the learning algorithm used by individuals, as well as assumptions about population structure or the input they receive.
Models
We now describe 5 DS models, each corresponding to a learning algorithm A used by individual language learners.
Models
The models differ along two dimensions, corresponding to assumptions about the learning algorithm (A): whether or not it is assumed that the stress of examples is possibly mistransmitted (Models 1, 3, 5), and how the N and V probabil-
learning algorithm is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Branavan, S.R.K. and Zettlemoyer, Luke and Barzilay, Regina
Abstract
We present an efficient approximate approach for learning this environment model as part of a policy-gradient reinforcement learning algorithm for text interpretation.
Algorithm
The learning algorithm is provided with a set of documents d ∈ D, an environment in which to execute command sequences, and a reward function. The goal is to estimate two sets of parameters: 1) the parameters θ of the policy function, and 2) the partial environment transition model q(s′|s, c), which is the observed portion of the true model p(s′|s, c).
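As a rough illustration of the policy-gradient component mentioned here, a REINFORCE-style update for a log-linear policy follows; the feature matrix and learning rate are hypothetical, and this is not the paper's exact estimator.

import numpy as np

def action_probs(theta, feats):
    # Log-linear policy over actions; feats has shape (num_actions, dim).
    scores = feats @ theta
    p = np.exp(scores - scores.max())
    return p / p.sum()

def policy_gradient_update(theta, feats, action, reward, lr=0.01):
    # Move theta along reward * grad log pi(action | state).
    p = action_probs(theta, feats)
    grad_log = feats[action] - p @ feats  # observed minus expected features
    return theta + lr * reward * grad_log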
Introduction
Our method efficiently achieves both of these goals as part of a policy-gradient reinforcement learning algorithm.
Related Work
Interpreting Instructions Our approach is most closely related to the reinforcement learning algorithm for mapping text instructions to commands developed by Branavan et al.
Related Work
We address this limitation by expanding a policy learning algorithm to take advantage of a partial environment model estimated during learning.
learning algorithm is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Jiang, Wenbin and Sun, Meng and Lü, Yajuan and Yang, Yating and Liu, Qun
Abstract
In this paper we propose a discriminative learning algorithm to take advantage of the linguistic knowledge in large amounts of natural annotations on the Internet.
Character Classification Model
The classifier can be trained with online learning algorithms such as perceptron, or offline learning models such as support vector machines.
Conclusion and Future Work
This work presents a novel discriminative learning algorithm to utilize the knowledge in the massive natural annotations on the Internet.
Introduction
In this work we take for example a most important problem, word segmentation, and propose a novel discriminative learning algorithm to leverage the knowledge in massive natural annotations of web text.
learning algorithm is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Ciobanu, Alina Maria and Dinu, Liviu P.
Abstract
We use aligned subsequences as features for machine learning algorithms in order to infer rules for linguistic changes undergone by words when entering new languages and to discriminate between cognates and non-cognates.
Conclusions and Future Work
and Waterman, 1981), and other learning algorithms for discriminating between cognates and non-cognates.
Our Approach
Therefore, because the edit distance was widely used in this research area and produced good results, we are encouraged to employ orthographic alignment for identifying pairs of cognates, not only to compute similarity scores, as was previously done, but to use aligned subsequences as features for machine learning algorithms.
Our Approach
3.3 Learning Algorithms
learning algorithm is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Sun, Xu and Wang, Houfeng and Li, Wenjie
System Architecture
ADF learning algorithm: procedure ADF(q, c, α, β)
System Architecture
Figure 1: The proposed ADF online learning algorithm.
System Architecture
Prior work on convergence analysis of existing online learning algorithms (Murata, 1998; Hsu et
learning algorithm is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Ji, Yangfeng and Eisenstein, Jacob
Implementation
The learning algorithm is applied in a shift-reduce parser, where the training data consists of the (unique) list of shift and reduce operations required to produce the gold RST parses.
Introduction
Alternatively, our approach can be seen as a nonlinear learning algorithm for incremental structure prediction, which overcomes feature sparsity through effective parameter tying.
Large-Margin Learning Framework
of our learning algorithm are different.
Large-Margin Learning Framework
Algorithm 1 Mini-batch learning algorithm. Input: training set D, regularization parameters λ and τ, number of iterations T, initialization matrix A0, and threshold ε. while t = 1, ..., T do
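The reconstructed pseudocode header above has the shape of a standard mini-batch loop; a hedged sketch under that reading follows, with grad, lam, and eps as hypothetical placeholders for the paper's update, regularization, and stopping threshold.

import numpy as np

def minibatch_train(A0, batches, grad, T=100, lam=0.1, eps=1e-4):
    A = A0.copy()
    for t in range(T):
        A_prev = A.copy()
        for batch in batches:
            A -= lam * grad(A, batch)         # gradient step on one mini-batch
        if np.linalg.norm(A - A_prev) < eps:  # parameters stable: stop early
            break
    return A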
learning algorithm is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Martineau, Justin and Chen, Lu and Cheng, Doreen and Sheth, Amit
Feature Weighting Methods
ing, better classification and regression models can be built by using the feature weights generated by these models as a pre-weight on the data points for other machine learning algorithms .
Related Work
Noise tolerance techniques aim to improve the learning algorithm itself to avoid over-fitting caused by mislabeled instances in the training phase, so that the constructed classifier becomes more noise-tolerant.
Related Work
Decision tree (Mingers, 1989; Vannoorenberghe and Denoeux, 2002) and boosting (Jiang, 2001; Kalai and Servedio, 2005; Karmaker and Kwek, 2006) are two learning algorithms that have been investigated in many studies.
Related Work
For example, useful information can be removed with noise elimination, since annotation errors are likely to occur on ambiguous instances that are potentially valuable for learning algorithms.
learning algorithm is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Qian, Longhua and Hui, Haotian and Hu, Ya'nan and Zhou, Guodong and Zhu, Qiaoming
Abstract
This section first introduces the fundamental supervised learning method, and then describes a baseline active learning algorithm.
Abstract
3.2 Active Learning Algorithm
Abstract
4.4 Bilingual Active Learning Algorithm
learning algorithm is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Lin, Dekang and Wu, Xiaoyun
Introduction
Over the past decade, supervised learning algorithms have gained widespread acceptance in natural language processing (NLP).
Introduction
The learning algorithm then optimizes a regularized, convex objective function that is expressed in terms of these features.
Introduction
However, the supervised learning algorithms can typically identify useful clusters and assign proper weights to them, effectively adapting the clusters to the domain.
learning algorithm is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Titov, Ivan
Introduction
However, most learning algorithms operate under the assumption that the learning data originates from the same distribution as the test data, though in practice this assumption is often violated.
Introduction
We explain how the introduced regularizer can be integrated into the stochastic gradient descent learning algorithm for our model.
Learning and Inference
In this section we describe an approximate learning algorithm based on the mean-field approximation.
Learning and Inference
Though we believe that our approach is independent of the specific learning algorithm, we provide the description for completeness.
learning algorithm is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Hoffmann, Raphael and Zhang, Congle and Ling, Xiao and Zettlemoyer, Luke and Weld, Daniel S.
Abstract
Recently, researchers have developed multi-instance learning algorithms to combat the noisy training data that can come from heuristic labeling, but their models assume relations are disjoint — for example they cannot extract the pair Founded(Jobs, Apple) and CEO-of(Jobs, Apple).
Conclusion
Since the process of matching database tuples to sentences is inherently heuristic, researchers have proposed multi-instance learning algorithms as a means for coping with the resulting noisy data.
Learning
We now present a multi-instance learning algorithm for our weak-supervision model that treats the sentence-level extraction random variables Zi as latent, and uses facts from a database (e.g., Freebase) as supervision for the aggregate-level variables Yr.
Modeling Overlapping Relations
Figure 2: The MULTIR Learning Algorithm
learning algorithm is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Woodsend, Kristian and Lapata, Mirella
Experimental Setup
Training: We obtained phrase-based salience scores using a supervised machine learning algorithm.
Modeling
We obtain these scores from the output of a supervised machine learning algorithm that predicts for each phrase whether it should be included in the highlights or not (see Section 5 for details).
Modeling
Let fi denote the salience score for phrase i, determined by the machine learning algorithm, and li its length in tokens.
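The scores fi and lengths li set up a selection problem: maximize total salience subject to a length budget. A toy brute-force sketch follows (a real system would use an ILP solver, and all values here are made up).

from itertools import combinations

phrases = [("p1", 0.9, 10), ("p2", 0.5, 6), ("p3", 0.4, 5)]  # (id, f_i, l_i)
budget = 12  # maximum output length in tokens

best = max(
    (s for r in range(len(phrases) + 1) for s in combinations(phrases, r)
     if sum(l for _, _, l in s) <= budget),  # respect the length budget
    key=lambda s: sum(f for _, f, _ in s),   # maximize total salience
)
print([pid for pid, _, _ in best])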
learning algorithm is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Zhang, Yuan and Lei, Tao and Barzilay, Regina and Jaakkola, Tommi and Globerson, Amir
Related Work
Our work builds on one such approach — SampleRank (Wick et al., 2011), a sampling-based learning algorithm.
Sampling-Based Dependency Parsing with Global Features
We begin with the notation before addressing the decoding and learning algorithms .
Sampling-Based Dependency Parsing with Global Features
Figure 4 summarizes the learning algorithm.
learning algorithm is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Huang, Fei and Yates, Alexander
Smoothing Natural Language Sequences
Formally, we define the smoothing task as follows: let D = {(x, z) | x is a word sequence, z is a label sequence} be a labeled dataset of word sequences, and let M be a machine learning algorithm that will learn a function f to predict the correct labels.
Smoothing Natural Language Sequences
As an example, consider the string “Researchers test reformulated gasolines on newer engines.” In a common dataset for NP chunking, the word “reformulated” never appears in the training data, but appears four times in the test set as part of the NP “reformulated gasolines.” Thus, a learning algorithm supplied with word-level features would
Smoothing Natural Language Sequences
In particular, we seek to represent each word by a distribution over its contexts, and then provide the learning algorithm with features computed from this distribution.
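A small sketch of this idea, representing each word by a normalized distribution over its left and right contexts (toy data; the exact context definition in the paper may differ).

from collections import Counter, defaultdict

sentences = [["researchers", "test", "reformulated", "gasolines"]]
context_counts = defaultdict(Counter)
for sent in sentences:
    padded = ["<s>"] + sent + ["</s>"]
    for i, w in enumerate(sent):
        context_counts[w][("left", padded[i])] += 1       # word to the left
        context_counts[w][("right", padded[i + 2])] += 1  # word to the right

# Normalize counts into a distribution per word; its values can then be
# handed to the learner M as features in place of raw word identities.
context_dist = {w: {c: n / sum(ctr.values()) for c, n in ctr.items()}
                for w, ctr in context_counts.items()}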
learning algorithm is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Srivastava, Shashank and Hovy, Eduard
Introduction
Implicitly, the weight learning algorithm can be seen as a gradient descent procedure minimizing the difference between the scores of highest scoring (Viterbi) state sequences, and the label state sequences.
Introduction
Pseudocode of the learning algorithm for the partially labeled case is given in Algorithm 1.
Introduction
We see that while all three learning algorithms perform better than the baseline, the performance of the purely unsupervised system is inferior to supervised approaches.
learning algorithm is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Yang, Qiang and Chen, Yuqiang and Xue, Gui-Rong and Dai, Wenyuan and Yu, Yong
Related Works
However, because the labeled Chinese Web pages are still not sufficient, we often find it difficult to achieve high accuracy by applying traditional machine learning algorithms to the Chinese Web pages directly.
Related Works
Most learning algorithms for dealing with cross-language heterogeneous data require a translator to convert the data to the same feature space.
Related Works
For those data that are in different feature spaces where no translator is available, Davis and Domingos (2008) proposed a Markov-logic-based transfer learning algorithm, which is called deep transfer, for transferring knowledge between biological domains and Web domains.
learning algorithm is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Costa, Francisco and Branco, António
Comparing the two Datasets
Table 2: Performance of several machine learning algorithms on the English TempEval-1 training data, with cross-validation.
Comparing the two Datasets
Table 3: Performance of several machine learning algorithms on the Portuguese data for the TempEval-1 tasks.
Introduction
The results of machine learning algorithms over the data thus obtained are compared to those reported for the English TempEval-1 competition.
learning algorithm is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Li, Sujian and Wang, Liang and Cao, Ziqiang and Li, Wenjie
Add arc <e_C, e_j> to G_C with
As we employed the MIRA learning algorithm, it is possible to identify which specific features are useful, by looking at the weights learned for each feature using the training data.
Add arc <e_C, e_j> to G_C with
Other text-level discourse parsing methods include: (1) Percep-coarse: we replace MIRA with the averaged perceptron learning algorithm and the other settings are the same as Our-coarse; (2) HILDA-manual and HILDA-seg are from Hernault (2010b)’s work, and their inputted EDUs are from RST-DT and their own EDU segmenter respectively; (3) LeThanh indicates the results given by LeThanh et al.
Add arc <e_C, e_j> to G_C with
We can also see that the averaged perceptron learning algorithm, though simple, can achieve a comparable performance, better than HILDA-manual.
learning algorithm is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Hoffmann, Raphael and Zhang, Congle and Weld, Daniel S.
Extraction with Lexicons
A learning algorithm expands the seed phrases into a set of lexicons.
Extraction with Lexicons
The semantic lexicons are added as features to the CRF learning algorithm .
Introduction
When learning an extractor for relation R, LUCHS extracts seed phrases from R’s training data and uses a semi-supervised learning algorithm to create several relation-specific lexicons at different points on a precision-recall spectrum.
learning algorithm is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Huang, Hongzhao and Cao, Yunbo and Huang, Xiaojiang and Ji, Heng and Lin, Chin-Yew
Introduction
In order to address these unique challenges for wikification for the short tweets, we employ graph-based semi-supervised learning algorithms (Zhu et al., 2003; Smola and Kondor, 2003; Blum et al., 2004; Zhou et al., 2004; Talukdar and Crammer, 2009) for collective inference by exploiting the manifold (cluster) structure in both unlabeled and labeled data.
Introduction
effort to explore graph-based semi-supervised learning algorithms for the wikification task.
Semi-supervised Graph Regularization
We propose a novel semi-supervised graph regularization framework based on the graph-based semi-supervised learning algorithm (Zhu et al., 2003):
learning algorithm is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Poon, Hoifung and Domingos, Pedro
Introduction
We then present the OntoUSP Markov logic network and the inference and learning algorithms used with it.
Unsupervised Ontology Induction with Markov Logic
Finally, we describe the learning algorithm and how OntoUSP induces the ontology while learning the semantic parser.
Unsupervised Ontology Induction with Markov Logic
Algorithm 2 gives pseudo-code for OntoUSP’s learning algorithm.
learning algorithm is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Eidelman, Vladimir and Marton, Yuval and Resnik, Philip
Introduction
The contributions of this paper include (1) introduction of a loss function for structured RMM in the SMT setting, with surrogate reference translations and latent variables; (2) an online gradient-based solver, RM, with a closed-form parameter update to optimize the relative margin loss; and (3) an efficient implementation that integrates well with the open source cdec SMT system (Dyer et al., 2010). In addition, (4) as our solution is not dependent on any specific QP solver, it can be easily incorporated into practically any gradient-based learning algorithm.
Introduction
After background discussion on learning in SMT (§2), we introduce a novel online learning algorithm for relative margin maximization suitable for SMT (§3).
The Relative Margin Machine in SMT
We address the above-mentioned limitations by introducing a novel online learning algorithm for relative margin maximization, RM.
learning algorithm is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Wu, Fei and Weld, Daniel S.
Experiments
The CRF extractors are trained using the same learning algorithm and feature selection as TextRunner.
Introduction
• Using the same learning algorithm and features as TextRunner, we compare four different ways to generate positive and negative training data with TextRunner’s method, concluding that our Wikipedia heuristic is responsible for the bulk of WOE’s improved accuracy.
Wikipedia-based Open IE
WOEPOS uses the same learning algorithm and selection of features as TextRunner: a two-order CRF chain model is trained with the Mallet package (McCallum, 2002).
learning algorithm is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Bansal, Mohit and Klein, Dan
Analysis
We next investigate the features that were given high weight by our learning algorithm (in the constituent parsing case).
Web-count Features
A learning algorithm can then weight features so that they compare appropriately
Web-count Features
As discussed in Section 5, the top features learned by our learning algorithm duplicate the handcrafted configurations used in previous work (Nakov and Hearst, 2005b) but also add numerous others, and, of course, apply to many more attachment types.
learning algorithm is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Berant, Jonathan and Dagan, Ido and Goldberger, Jacob
Experimental Evaluation
(b) DIRT (Lin and Pantel, 2001): a widely-used rule learning algorithm.
Experimental Evaluation
(c) BInc (Szpektor and Dagan, 2008): a directional rule learning algorithm.
Learning Typed Entailment Graphs
Our learning algorithm is composed of two steps: (1) Given a set of typed predicates and their instances extracted from a corpus, we train a (local) entailment classifier that estimates for every pair of predicates whether one entails the other.
learning algorithm is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Sarioglu, Efsun and Yadav, Kabir and Choi, Hyeong-Ah
Background
Topic modeling is an unsupervised learning algorithm that can automatically discover themes of a document collection.
Background
Text classification is a supervised learning algorithm where documents’ categories are learned from a pre-labeled set of documents.
Experiments
Classification was conducted using Weka, a collection of machine learning algorithms for data mining tasks written in Java (Hall et al., 2009).
learning algorithm is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Croce, Danilo and Moschitti, Alessandro and Basili, Roberto and Palmer, Martha
Conclusion
We have proposed new approaches to characterize verb classes in learning algorithms.
Conclusion
The advantage of kernel methods is that they can be directly used in some learning algorithms, e.g., SVMs, to train verb classifiers.
Introduction
Note that their coding in learning algorithms is rather complex: we need to take into account syntactic structures, which may require an exponential number of syntactic features (i.e., all their possible substructures).
learning algorithm is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Guinaudeau, Camille and Strube, Michael
Introduction
From this grid Barzilay and Lapata (2008) derive probabilities of transitions between adjacent sentences which are used as features for machine learning algorithms .
The Entity Grid Model
To make this representation accessible to machine learning algorithms , Barzilay and Lapata (2008) compute for each document the probability of each transition and generate feature vectors representing the sentences.
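A compact sketch of the transition features described here, computed from a toy entity grid (S=subject, O=object, X=other, -=absent; one row of roles per entity, one column per sentence).

from collections import Counter
from itertools import product

grid = {"Microsoft": ["S", "O", "-"], "market": ["-", "X", "S"]}  # toy grid
transitions = Counter()
for roles in grid.values():
    for a, b in zip(roles, roles[1:]):  # transitions between adjacent sentences
        transitions[(a, b)] += 1
total = sum(transitions.values())
# One feature per possible length-2 transition: its probability in the document.
features = {t: transitions.get(t, 0) / total for t in product("SOX-", repeat=2)}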
The Entity Grid Model
(2011) use discourse relations to transform the entity grid representation into a discourse role matrix that is used to generate feature vectors for machine learning algorithms similarly to Barzilay and Lapata (2008).
learning algorithm is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Ferschke, Oliver and Gurevych, Iryna and Rittberger, Marc
Experiments
Instead of using Mallet (McCallum, 2002) as a machine learning toolkit, we employ the Weka Data Mining Software (Hall et al., 2009) for classification, since it offers a wider range of state-of-the-art machine learning algorithms .
Experiments
Learning Algorithms
Experiments
We evaluated several learning algorithms from the Weka toolkit with respect to their performance on
learning algorithm is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Cai, Qingqing and Yates, Alexander
Abstract
Leveraging techniques from each of these areas, we develop a semantic parser for Freebase that is capable of parsing questions with an F1 that improves by 0.42 over a purely-supervised learning algorithm .
Introduction
On a dataset of 917 questions taken from 81 domains of the Freebase database, a standard learning algorithm for semantic parsing yields a parser with an F1 of 0.21, in large part because of the number of logical symbols that appear during testing but never appear during training.
Previous Work
Owing to the complexity of the general case, researchers have resorted to defining standard similarity metrics between relations and attributes, as well as machine learning algorithms for learning and predicting matches between relations (Doan et al., 2004; Wick et al., 2008b; Wick et al., 2008a; Nottelmann and Straccia, 2007; Berlin and Motro, 2006).
learning algorithm is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: