Index of papers in Proc. ACL that mention
  • learning algorithm
Vogel, Adam and Jurafsky, Daniel
Abstract
We learn this correspondence with a reinforcement learning algorithm, using the deviation of the route we follow from the intended path as a reward signal.
Approximate Dynamic Programming
To learn these weights θ we use SARSA (Sutton and Barto, 1998), an online learning algorithm similar to Q-learning (Watkins and Dayan, 1992).
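As a concrete illustration of the SARSA update mentioned in this snippet, a minimal sketch with linear function approximation follows; the feature function phi and the step-size and discount values are hypothetical placeholders, not taken from the paper.

import numpy as np

def sarsa_update(theta, phi, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    # One SARSA step: theta <- theta + alpha * td_error * phi(s, a),
    # where the TD error uses the action actually taken in the next state.
    q_sa = theta @ phi(s, a)
    q_next = theta @ phi(s_next, a_next)
    td_error = r + gamma * q_next - q_sa
    return theta + alpha * td_error * phi(s, a)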
Approximate Dynamic Programming
Algorithm 1 details the learning algorithm, which we follow here.
Approximate Dynamic Programming
We also compare against the policy gradient learning algorithm of Branavan et al.
Introduction
Solved using a reinforcement learning algorithm, our system acquires the meaning of spatial words through
Introduction
We frame direction following as an apprenticeship learning problem and solve it with a reinforcement learning algorithm, extending previous work on interpreting instructions by Branavan et al.
Reinforcement Learning Formulation
Our learning algorithm is not dependent on a determin-
Reinforcement Learning Formulation
Learning exactly which words influence decision making is difficult; reinforcement learning algorithms have problems with the large, sparse feature vectors common in natural language processing.
learning algorithm is mentioned in 12 sentences in this paper.
Topics mentioned in this paper:
Fader, Anthony and Zettlemoyer, Luke and Etzioni, Oren
Introduction
We describe scalable learning algorithms that induce general question templates and lexical variants of entities and relations.
Learning
PARALEX uses a two-part learning algorithm; it first induces an overly general lexicon (Section 5.1) and then learns to score derivations to increase accuracy (Section 5.2).
Learning
The lexical learning algorithm constructs a lexicon L from a corpus of question paraphrases C =
Learning
Figure 1: Our lexicon learning algorithm.
Overview of the Approach
Learning: The learning algorithm induces a lexicon L and estimates the parameters θ of the linear ranking function.
Overview of the Approach
The final result is a scalable learning algorithm that requires no manual annotation of questions.
Related Work
The learning algorithms presented in this paper are similar to algorithms used for paraphrase extraction from sentence-aligned corpora (Barzilay and McKeown, 2001; Barzilay and Lee, 2003; Quirk et al., 2004; Bannard and Callison-Burch, 2005; Callison-Burch, 2008; Marton et al., 2009).
learning algorithm is mentioned in 16 sentences in this paper.
Topics mentioned in this paper:
Björkelund, Anders and Kuhn, Jonas
Introducing Nonlocal Features
In other words, it is unlikely that we can devise a feature set that is informative enough to allow the weight vector to converge towards a solution that lets the learning algorithm see the entire documents during training, at least in the situation when no external knowledge sources are used.
Introducing Nonlocal Features
Thus the learning algorithm always reaches the end of a document, avoiding the problem that early updates discard parts of the training data.
Introducing Nonlocal Features
When we applied LaSO, we noticed that it performed worse than the baseline learning algorithm when only using local features.
Introduction
The main reason why early updates underperform in our setting is that the task is too difficult and that the learning algorithm is not able to profit from all training data.
Introduction
Put another way, early updates happen too early, and the learning algorithm rarely reaches the end of the instances as it halts, updates, and moves on to the next instance.
Representation and Learning
Algorithm 1 shows pseudocode for the learning algorithm, which we will refer to as the baseline learning algorithm.
Results
Since early updates do not always make use of the complete documents during training, it can be expected that it will require either a very wide beam or more iterations to get up to par with the baseline learning algorithm.
Results
Recall that with only local features, delayed LaSO is equivalent to the baseline learning algorithm.
Results
From these results we conclude that we are better off when the learning algorithm handles one document at a time, instead of getting feedback within documents.
learning algorithm is mentioned in 12 sentences in this paper.
Topics mentioned in this paper:
Dhillon, Paramveer S. and Talukdar, Partha Pratim and Crammer, Koby
A ← METRICLEARNER(X, S, D)
We also experimented with the supervised large-margin metric learning algorithm (LMNN) presented in (Weinberger and Saul, 2009).
Abstract
We initiate a study comparing effectiveness of the transformed spaces learned by recently proposed supervised, and semi-supervised metric learning algorithms to those generated by previously proposed unsupervised dimensionality reduction methods (e.g., PCA).
Abstract
Through a variety of experiments on different real-world datasets, we find IDML-IT, a semi-supervised metric learning algorithm, to be the most effective.
Conclusion
In this paper, we compared the effectiveness of the transformed spaces learned by recently proposed supervised, and semi-supervised metric learning algorithms to those generated by previously proposed unsupervised dimensionality reduction methods (e.g., PCA).
Introduction
Recently, several supervised metric learning algorithms have been proposed (Davis et al., 2007; Weinberger and Saul, 2009).
Introduction
Even though different supervised and semi-supervised metric learning algorithms have recently been proposed, effectiveness of the transformed spaces learned by them in NLP
Introduction
We find IDML-IT, a semi-supervised metric learning algorithm, to be the most effective.
Metric Learning
We shall now review two recently proposed metric learning algorithms.
Metric Learning
The ITML metric learning algorithm, which we reviewed in Section 2.2, is supervised in nature, and hence it does not exploit widely available unlabeled data.
Metric Learning
In this section, we review Inference Driven Metric Learning (IDML) (Algorithm 1) (Dhillon et al., 2010), a recently proposed metric learning framework which combines an existing supervised metric learning algorithm (such as ITML) along with transductive graph-based label inference to learn a new distance metric from labeled as well as unlabeled data combined.
learning algorithm is mentioned in 11 sentences in this paper.
Topics mentioned in this paper:
Parikh, Ankur P. and Cohen, Shay B. and Xing, Eric P.
Abstract
Additive tree metrics can be leveraged by “meta-algorithms” such as neighbor-joining (Saitou and Nei, 1987) and recursive grouping (Choi et al., 2011) to provide consistent learning algorithms for latent trees.
Abstract
In our learning algorithm, we assume that examples of the form (w(i), x(i)) for i ∈ [N] = {1, .
Abstract
The word embeddings are used during the learning process, but the final decoder that the learning algorithm outputs maps a POS tag sequence x to a parse tree.
learning algorithm is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Tang, Duyu and Wei, Furu and Yang, Nan and Zhou, Ming and Liu, Ting and Qin, Bing
Introduction
(2002) and employ machine learning algorithms to build classifiers from tweets with manually annotated sentiment polarity.
Introduction
To this end, we extend the existing word embedding learning algorithm (Collobert et al., 2011) and develop three neural networks to effectively incorporate the supervision from sentiment polarity of text (e.g.
Introduction
In the accuracy of polarity consistency between each sentiment word and its top N closest words, SSWE outperforms existing word embedding learning algorithms.
Related Work
Under this assumption, many feature learning algorithms are proposed to obtain better classification performance (Pang and Lee, 2008; Liu, 2012; Feldman, 2013).
Related Work
We extend the existing word embedding learning algorithm (Collobert et al., 2011) and develop three neural networks to learn SSWE.
Related Work
In the following sections, we introduce the traditional method before presenting the details of SSWE learning algorithms.
learning algorithm is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Cohen, Shay B. and Collins, Michael
Abstract
We introduce a provably correct learning algorithm for latent-variable PCFGs.
Additional Details of the Algorithm
The learning algorithm for L-PCFGs can be used as an initializer for the EM algorithm for L-PCFGs.
Experiments on Parsing
This section describes parsing experiments using the learning algorithm for L-PCFGs.
Experiments on Parsing
Table 1: Results on the development data (section 22) and test data (section 23) for various learning algorithms for L-PCFGs.
Experiments on Parsing
In this special case, the L-PCFG learning algorithm is equivalent to a simple algorithm with the following steps: 1) define the matrix Q with entries Q_{w1,w2} = count(w1, w2)/N, where count(w1, w2) is the number of times that the bigram (w1, w2) is seen in the data, and N = Σ_{w1,w2} count(w1, w2).
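The count matrix described in this snippet can be computed directly; a small sketch over a toy corpus (not from the paper) follows.

from collections import Counter

corpus = ["the dog barks", "the cat sleeps"]  # toy corpus
bigrams = Counter()
for sentence in corpus:
    words = sentence.split()
    for w1, w2 in zip(words, words[1:]):
        bigrams[(w1, w2)] += 1
N = sum(bigrams.values())
# Q[(w1, w2)] = count(w1, w2) / N, the empirical bigram distribution.
Q = {pair: c / N for pair, c in bigrams.items()}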
The Learning Algorithm for L-PCFGs
Our goal is to design a learning algorithm for L-PCFGs.
The Learning Algorithm for L-PCFGs
4.2 The Learning Algorithm
The Learning Algorithm for L-PCFGs
Figure 1 shows the learning algorithm for L-PCFGs.
The Matrix Decomposition Algorithm
This section describes the matrix decomposition algorithm used in Step 1 of the learning algorithm.
learning algorithm is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Cao, Yuan and Khudanpur, Sanjeev
Abstract
We propose an online learning algorithm based on tensor-space models.
Abstract
We apply the proposed algorithm to a parsing task, and show that even with very little training data the learning algorithm based on a tensor model performs well, and gives significantly better results than standard learning algorithms based on traditional vector-space models.
Conclusion and Future Work
In this paper, we reformulated the traditional linear vector-space models as tensor-space models, and proposed an online learning algorithm named Tensor-MIRA.
Introduction
Many learning algorithms applied to NLP problems, such as the Perceptron (Collins,
Introduction
A tensor weight learning algorithm is then proposed in Section 4.
Online Learning Algorithm
Here we propose an online learning algorithm similar to MIRA but modified to accommodate tensor models.
Online Learning Algorithm
, m, where xi is the input and yi is the reference or oracle hypothesis, are fed to the weight learning algorithm in sequential order.
Tensor Model Construction
As a way out, we first run a simple vector-model based learning algorithm (say the Perceptron) on the training data and estimate a weight vector, which serves as a “surrogate”
Tensor Space Representation
Most of the learning algorithms for NLP problems are based on vector space models, which represent data as vectors φ ∈ R^n, and try to learn feature weight vectors w ∈ R^n such that a linear model y = w · φ is able to discriminate between, say, good and bad hypotheses.
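A minimal sketch of the vector-space setup this snippet describes, with toy numbers: feature vectors phi, a weight vector w, and the linear score y = w · phi.

import numpy as np

w = np.array([0.5, -1.0, 2.0])        # toy feature weights
phi_good = np.array([1.0, 0.0, 1.0])  # features of a good hypothesis
phi_bad = np.array([0.0, 1.0, 0.5])   # features of a bad hypothesis
# The linear model discriminates by score: here 2.5 > 0.0.
print(w @ phi_good > w @ phi_bad)     # True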
learning algorithm is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Wei, Wei and Gulla, Jon Atle
Introduction
We further propose a specific hierarchical learning algorithm, called the HL-SOT algorithm, which is developed based on generalizing the online learning algorithm H-RLS (Cesa-Bianchi et al., 2006).
Introduction
• A specific hierarchical learning algorithm is
Related Work
In (Turney, 2002), an unsupervised learning algorithm was proposed to classify reviews as recommended or not recommended by averaging sentiment annotation of phrases in reviews that contain adjectives or adverbs.
The HL-SOT Approach
Then a specific hierarchical learning algorithm is further proposed to solve the formulated problem.
The HL-SOT Approach
Therefore we propose a specific hierarchical learning algorithm, named the HL-SOT algorithm, that is able to train each node classifier in a batch-learning setting and allows separate learning of the threshold for each node classifier.
The HL-SOT Approach
Then the hierarchical classification function f is parameterized by the weight matrix W = (w1, ..., wN)^T and threshold vector θ = (θ1, ..., θN)^T. The hierarchical learning algorithm HL-SOT is proposed for learning the parameters of W and θ.
learning algorithm is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Kruengkrai, Canasai and Uchimoto, Kiyotaka and Kazama, Jun'ichi and Wang, Yiou and Torisawa, Kentaro and Isahara, Hitoshi
Background
The goal of our learning algorithm is to learn a mapping from inputs (unsegmented sentences) x ∈ X to outputs (segmented paths) y ∈ Y
Conclusion
The second is a discriminative online learning algorithm based on MIRA that enables us to incorporate arbitrary features to our hybrid model.
Related work
In this section, we discuss related approaches based on several aspects of learning algorithms and search space representation methods.
Related work
Our approach overcomes the limitation of the original hybrid model by a discriminative online learning algorithm for training.
Training method
Therefore, we require a learning algorithm that can efficiently handle large and complex lattice structures.
Training method
Algorithm 1 Generic Online Learning Algorithm
Training method
Algorithm 1 outlines the generic online learning algorithm (McDonald, 2006) used in our framework.
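A hedged sketch of a generic online learner of the kind this snippet refers to: decode each instance with the current weights, update on mistakes, and return averaged weights. The decode and features functions are hypothetical stand-ins, and the perceptron-style update is only one instance of the generic loop (a MIRA update would replace the update step).

import numpy as np

def online_train(data, decode, features, dim, epochs=5):
    w, w_sum, n = np.zeros(dim), np.zeros(dim), 0
    for _ in range(epochs):
        for x, y in data:            # instances fed in sequential order
            y_hat = decode(x, w)     # best hypothesis under current weights
            if y_hat != y:           # mistake-driven update
                w += features(x, y) - features(x, y_hat)
            w_sum += w
            n += 1
    return w_sum / n                 # weight averaging for stability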
learning algorithm is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Chen, David
Background
In this section we review the lexicon learning algorithm introduced by Chen and Mooney (2011) as well as the overall task they designed to test semantic understanding of navigation instructions.
Experiments
We evaluate our new lexicon learning algorithm as well as the other modifications to the navigation system using the same three tasks as Chen and Mooney (2011).
Experiments
Here SGOLL has a decidedly large advantage over the lexicon learning algorithm from Chen and Mooney, requiring an order of magnitude less time to run.
Experiments
We have introduced a novel, online lexicon learning algorithm that is much faster than the one proposed by Chen and Mooney and also performs better on the navigation tasks they devised.
Introduction
Their lexicon learning algorithm finds the common connected subgraph that occurs with a word by taking intersections of the graphs that represent the different contexts in which the word appears.
Introduction
In addition to the new lexicon learning algorithm, we also look at modifying the meaning representation grammar (MRG) for their formal semantic language.
Online Lexicon Learning Algorithm
In addition to introducing a new lexicon learning algorithm, we also made another modification to the original system proposed by Chen and Mooney (2011).
learning algorithm is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Tang, Hao and Keshet, Joseph and Livescu, Karen
Discussion
extension of the model and learning algorithm to word sequences and (2) feature functions that relate acoustic measurements to sub-word units.
Experiments
We compare the CRF, Passive-Aggressive (PA), and Pegasos learning algorithms.
Experiments
We use the term “CRF” since the learning algorithm corresponds to CRF learning, although the task is multiclass classification rather than a sequence or structure prediction task.
Experiments
Models labeled X/Y use learning algorithm X and feature set Y.
Problem setting
In the next section we present a learning algorithm that aims to minimize the expected zero-one loss.
learning algorithm is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Sogaard, Anders
Abstract
Many discriminative learning algorithms are sensitive to such shifts because highly indicative features may swamp other indicative features.
Abstract
Regularized and adversarial learning algorithms have been proposed to be more robust against covariate shifts.
Abstract
We present a new perceptron learning algorithm using antagonistic adversaries and compare it to previous proposals on 12 multilingual cross-domain part-of-speech tagging datasets.
Conclusion
We presented a discriminative learning algorithm for cross-domain structured prediction that seems more robust to covariate shifts than previous approaches.
Experiments
All learning algorithms do the same number of passes over each training data set.
Introduction
Most learning algorithms assume that training and test data are governed by identical distributions; and more specifically, in the case of part-of-speech (POS) tagging, that training and test sentences were sampled at random and that they are identically and independently distributed.
Introduction
Most discriminative learning algorithms only update parameters when training examples are misclassified.
learning algorithm is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Wang, Qin Iris and Schuurmans, Dale and Lin, Dekang
Abstract
By combining a supervised large margin loss with an unsupervised least squares loss, a discriminative, convex, semi-supervised learning algorithm can be obtained that is applicable to large-scale problems.
Conclusion and Future Work
Unlike previous proposed approaches, we introduce a convex objective for the semi-supervised learning algorithm by combining a convex structured SVM loss and a convex least square loss.
Introduction
Supervised learning algorithms still represent the state of the art approach for inferring dependency parsers from data (McDonald et al., 2005a; McDonald and Pereira, 2006; Wang et al., 2007).
Introduction
Unfortunately, although significant recent progress has been made in the area of semi-supervised learning, the performance of semi-supervised learning algorithms still falls far short of expectations, particularly in challenging real-world tasks such as natural language parsing or machine translation.
Introduction
The basic idea is to bootstrap a supervised learning algorithm by alternating between inferring the missing label information and retraining.
learning algorithm is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Davidov, Dmitry and Rappoport, Ari
Experimental Setup
5.3 Parameters and Learning Algorithm
Experimental Setup
Selection of the learning algorithm and its algorithm-specific parameters was done as follows.
Experimental Setup
Since each dataset has only 140 examples, the computation time of each learning algorithm is negligible.
Introduction
The standard classification process is to find in an auxiliary corpus a set of patterns in which a given training word pair co-appears, and use pattern-word pair co-appearance statistics as features for machine learning algorithms.
Related Work
Various learning algorithms have been used for relation classification.
Related Work
Freely available tools like Weka (Witten and Frank, 1999) allow easy experimentation with common learning algorithms (Hendrickx et al., 2007).
learning algorithm is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Sartorio, Francesco and Satta, Giorgio and Nivre, Joakim
Dependency Parser
Thus our system uses an unbounded number of transition relations, which has an apparent disadvantage for learning algorithms.
Model and Training
In this section we introduce the adopted learning algorithm and discuss the model parameters.
Model and Training
4.1 Learning Algorithm
Model and Training
Algorithm 2 Learning Algorithm
learning algorithm is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Cohen, Shay B. and Stratos, Karl and Collins, Michael and Foster, Dean P. and Ungar, Lyle
Abstract
We introduce a spectral learning algorithm for latent-variable PCFGs (Petrov et al., 2006).
Deriving Empirical Estimates
We now state a PAC-style theorem for the learning algorithm .
Introduction
Recent work has introduced polynomial-time learning algorithms (and consistent estimation methods) for two important cases of hidden-variable models: Gaussian mixture models (Dasgupta, 1999; Vempala and Wang, 2004) and hidden Markov models (Hsu et al., 2009).
Proofs
Figure 4: The spectral learning algorithm.
Related Work
(2011) consider spectral learning algorithms of tree-structured directed Bayes nets.
learning algorithm is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Yih, Wen-tau and Chang, Ming-Wei and Meek, Christopher and Pastusiak, Andrzej
Abstract
Experiments show that our systems can be consistently and significantly improved with rich lexical semantic information, regardless of the choice of learning algorithms.
Introduction
First, by incorporating the abundant information from a variety of lexical semantic models, the answer selection system can be enhanced substantially, regardless of the choice of learning algorithms and settings.
Learning QA Matching Models
A binary classifier can be trained easily using any machine learning algorithm in this standard supervised learning setting.
Learning QA Matching Models
We tested two learning algorithms in this setting: logistic regression and boosted decision trees (Friedman, 2001).
Learning QA Matching Models
The former is the log-linear model widely used in the NLP community and the latter is a robust nonlinear learning algorithm that has shown great empirical performance.
learning algorithm is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Yang, Xiaofeng and Su, Jian and Lang, Jun and Tan, Chew Lim and Liu, Ting and Li, Sheng
Entity-mention Model with ILP
However, normal machine learning algorithms work on attribute-value vectors, which only allow the representation of atomic propositions.
Entity-mention Model with ILP
This requirement motivates our use of Inductive Logic Programming (ILP), a learning algorithm capable of inferring logic programs.
Experiments and Results
Default parameters were applied for all the other settings in ALEPH as well as other learning algorithms used in the experiments.
Introduction
Even worse, the number of mentions in an entity is not fixed, which would result in variant-length feature vectors and make trouble for normal machine learning algorithms.
Modelling Coreference Resolution
Based on the training instances, a binary classifier can be generated using any discriminative learning algorithm.
learning algorithm is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Sonderegger, Morgan and Niyogi, Partha
Discussion
Each model describes the diachronic, population-level consequences of assuming a particular learning algorithm for individuals.
Discussion
By using simple models, we were able to consider a range of learning algorithms corresponding to different explanations for the observed diachronic dynamics.
Modeling preliminaries
This setting allows us to determine the diachronic, population-level consequences of assumptions about the learning algorithm used by individuals, as well as assumptions about population structure or the input they receive.
Models
We now describe 5 DS models, each corresponding to a learning algorithm A used by individual language learners.
Models
The models differ along two dimensions, corresponding to assumptions about the learning algorithm (A): whether or not it is assumed that the stress of examples is possibly mistransmitted (Models 1, 3, 5), and how the N and V probabil-
learning algorithm is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Branavan, S.R.K. and Zettlemoyer, Luke and Barzilay, Regina
Abstract
We present an efficient approximate approach for learning this environment model as part of a policy-gradient reinforcement learning algorithm for text interpretation.
Algorithm
The learning algorithm is provided with a set of documents d ∈ D, an environment in which to execute command sequences, and a reward function. The goal is to estimate two sets of parameters: 1) the parameters θ of the policy function, and 2) the partial environment transition model q(s′|s, c), which is the observed portion of the true model p(s′|s, c).
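As a rough illustration of the policy-gradient component mentioned here, a REINFORCE-style update for a log-linear policy follows; the feature matrix and learning rate are hypothetical, and this is not the paper's exact estimator.

import numpy as np

def action_probs(theta, feats):
    # Log-linear policy over actions; feats has shape (num_actions, dim).
    scores = feats @ theta
    p = np.exp(scores - scores.max())
    return p / p.sum()

def policy_gradient_update(theta, feats, action, reward, lr=0.01):
    # Move theta along reward * grad log pi(action | state).
    p = action_probs(theta, feats)
    grad_log = feats[action] - p @ feats  # observed minus expected features
    return theta + lr * reward * grad_log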
Introduction
Our method efficiently achieves both of these goals as part of a policy-gradient reinforcement learning algorithm.
Related Work
Interpreting Instructions Our approach is most closely related to the reinforcement learning algorithm for mapping text instructions to commands developed by Branavan et al.
Related Work
We address this limitation by expanding a policy learning algorithm to take advantage of a partial environment model estimated during learning.
learning algorithm is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Jiang, Wenbin and Sun, Meng and Lü, Yajuan and Yang, Yating and Liu, Qun
Abstract
In this paper we propose a discriminative learning algorithm to take advantage of the linguistic knowledge in large amounts of natural annotations on the Internet.
Character Classification Model
The classifier can be trained with online learning algorithms such as perceptron, or offline learning models such as support vector machines.
Conclusion and Future Work
This work presents a novel discriminative learning algorithm to utilize the knowledge in the massive natural annotations on the Internet.
Introduction
In this work we take for example a most important problem, word segmentation, and propose a novel discriminative learning algorithm to leverage the knowledge in massive natural annotations of web text.
learning algorithm is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Ciobanu, Alina Maria and Dinu, Liviu P.
Abstract
We use aligned subsequences as features for machine learning algorithms in order to infer rules for linguistic changes undergone by words when entering new languages and to discriminate between cognates and non-cognates.
Conclusions and Future Work
and Waterman, 1981), and other learning algorithms for discriminating between cognates and non-cognates.
Our Approach
Therefore, because the edit distance was widely used in this research area and produced good results, we are encouraged to employ orthographic alignment for identifying pairs of cognates, not only to compute similarity scores, as was previously done, but to use aligned subsequences as features for machine learning algorithms.
Our Approach
3.3 Learning Algorithms
learning algorithm is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Sun, Xu and Wang, Houfeng and Li, Wenjie
System Architecture
ADF learning algorithm: procedure ADF(q, c, α, β)
System Architecture
Figure 1: The proposed ADF online learning algorithm.
System Architecture
Prior work on convergence analysis of existing online learning algorithms (Murata, 1998; Hsu et
learning algorithm is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Ji, Yangfeng and Eisenstein, Jacob
Implementation
The learning algorithm is applied in a shift-reduce parser, where the training data consists of the (unique) list of shift and reduce operations required to produce the gold RST parses.
Introduction
Alternatively, our approach can be seen as a nonlinear learning algorithm for incremental structure prediction, which overcomes feature sparsity through effective parameter tying.
Large-Margin Learning Framework
of our learning algorithm are different.
Large-Margin Learning Framework
Algorithm 1 Mini-batch learning algorithm. Input: training set D, regularization parameters λ and τ, number of iterations T, initialization matrix A0, and threshold ε. while t = 1, ..., T do
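The reconstructed pseudocode header above has the shape of a standard mini-batch loop; a hedged sketch under that reading follows, with grad, lam, and eps as hypothetical placeholders for the paper's update, regularization, and stopping threshold.

import numpy as np

def minibatch_train(A0, batches, grad, T=100, lam=0.1, eps=1e-4):
    A = A0.copy()
    for t in range(T):
        A_prev = A.copy()
        for batch in batches:
            A -= lam * grad(A, batch)         # gradient step on one mini-batch
        if np.linalg.norm(A - A_prev) < eps:  # parameters stable: stop early
            break
    return A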
learning algorithm is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Martineau, Justin and Chen, Lu and Cheng, Doreen and Sheth, Amit
Feature Weighting Methods
ing, better classification and regression models can be built by using the feature weights generated by these models as a pre-weight on the data points for other machine learning algorithms .
Related Work
Noise tolerance techniques aim to improve the learning algorithm itself to avoid over-fitting caused by mislabeled instances in the training phase, so that the constructed classifier becomes more noise-tolerant.
Related Work
Decision tree (Mingers, 1989; Vannoorenberghe and Denoeux, 2002) and boosting (Jiang, 2001; Kalai and Servedio, 2005; Karmaker and Kwek, 2006) are two learning algorithms that have been investigated in many studies.
Related Work
For example, useful information can be removed with noise elimination, since annotation errors are likely to occur on ambiguous instances that are potentially valuable for learning algorithms.
learning algorithm is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Qian, Longhua and Hui, Haotian and Hu, Ya'nan and Zhou, Guodong and Zhu, Qiaoming
Abstract
This section first introduces the fundamental supervised learning method, and then describes a baseline active learning algorithm.
Abstract
3.2 Active Learning Algorithm
Abstract
4.4 Bilingual Active Learning Algorithm
learning algorithm is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Lin, Dekang and Wu, Xiaoyun
Introduction
Over the past decade, supervised learning algorithms have gained widespread acceptance in natural language processing (NLP).
Introduction
The learning algorithm then optimizes a regularized, convex objective function that is expressed in terms of these features.
Introduction
However, the supervised learning algorithms can typically identify useful clusters and assign proper weights to them, effectively adapting the clusters to the domain.
learning algorithm is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Titov, Ivan
Introduction
However, most learning algorithms operate under the assumption that the learning data originates from the same distribution as the test data, though in practice this assumption is often violated.
Introduction
We explain how the introduced regularizer can be integrated into the stochastic gradient descent learning algorithm for our model.
Learning and Inference
In this section we describe an approximate learning algorithm based on the mean-field approximation.
Learning and Inference
Though we believe that our approach is independent of the specific learning algorithm, we provide the description for completeness.
learning algorithm is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Hoffmann, Raphael and Zhang, Congle and Ling, Xiao and Zettlemoyer, Luke and Weld, Daniel S.
Abstract
Recently, researchers have developed multi-instance learning algorithms to combat the noisy training data that can come from heuristic labeling, but their models assume relations are disjoint — for example they cannot extract the pair Founded(Jobs, Apple) and CEO-of(Jobs, Apple).
Conclusion
Since the process of matching database tuples to sentences is inherently heuristic, researchers have proposed multi-instance learning algorithms as a means for coping with the resulting noisy data.
Learning
We now present a multi-instance learning algorithm for our weak-supervision model that treats the sentence-level extraction random variables Zi as latent, and uses facts from a database (e.g., Freebase) as supervision for the aggregate-level variables Yr.
Modeling Overlapping Relations
Figure 2: The MULTIR Learning Algorithm
learning algorithm is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Woodsend, Kristian and Lapata, Mirella
Experimental Setup
Training: We obtained phrase-based salience scores using a supervised machine learning algorithm.
Modeling
We obtain these scores from the output of a supervised machine learning algorithm that predicts for each phrase whether it should be included in the highlights or not (see Section 5 for details).
Modeling
Let fi denote the salience score for phrase i, determined by the machine learning algorithm, and li its length in tokens.
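The scores fi and lengths li set up a selection problem: maximize total salience subject to a length budget. A toy brute-force sketch follows (a real system would use an ILP solver, and all values here are made up).

from itertools import combinations

phrases = [("p1", 0.9, 10), ("p2", 0.5, 6), ("p3", 0.4, 5)]  # (id, f_i, l_i)
budget = 12  # maximum output length in tokens

best = max(
    (s for r in range(len(phrases) + 1) for s in combinations(phrases, r)
     if sum(l for _, _, l in s) <= budget),  # respect the length budget
    key=lambda s: sum(f for _, f, _ in s),   # maximize total salience
)
print([pid for pid, _, _ in best])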
learning algorithm is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Zhang, Yuan and Lei, Tao and Barzilay, Regina and Jaakkola, Tommi and Globerson, Amir
Related Work
Our work builds on one such approach — SampleRank (Wick et al., 2011), a sampling-based learning algorithm.
Sampling-Based Dependency Parsing with Global Features
We begin with the notation before addressing the decoding and learning algorithms .
Sampling-Based Dependency Parsing with Global Features
Figure 4 summarizes the learning algorithm.
learning algorithm is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Huang, Fei and Yates, Alexander
Smoothing Natural Language Sequences
Formally, we define the smoothing task as follows: let D = {(x, z) | x is a word sequence, z is a label sequence} be a labeled dataset of word sequences, and let M be a machine learning algorithm that will learn a function f to predict the correct labels.
Smoothing Natural Language Sequences
As an example, consider the string “Researchers test reformulated gasolines on newer engines.” In a common dataset for NP chunking, the word “reformulated” never appears in the training data, but appears four times in the test set as part of the NP “reformulated gasolines.” Thus, a learning algorithm supplied with word-level features would
Smoothing Natural Language Sequences
In particular, we seek to represent each word by a distribution over its contexts, and then provide the learning algorithm with features computed from this distribution.
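A small sketch of this idea, representing each word by a normalized distribution over its left and right contexts (toy data; the exact context definition in the paper may differ).

from collections import Counter, defaultdict

sentences = [["researchers", "test", "reformulated", "gasolines"]]
context_counts = defaultdict(Counter)
for sent in sentences:
    padded = ["<s>"] + sent + ["</s>"]
    for i, w in enumerate(sent):
        context_counts[w][("left", padded[i])] += 1       # word to the left
        context_counts[w][("right", padded[i + 2])] += 1  # word to the right

# Normalize counts into a distribution per word; its values can then be
# handed to the learner M as features in place of raw word identities.
context_dist = {w: {c: n / sum(ctr.values()) for c, n in ctr.items()}
                for w, ctr in context_counts.items()}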
learning algorithm is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Srivastava, Shashank and Hovy, Eduard
Introduction
Implicitly, the weight learning algorithm can be seen as a gradient descent procedure minimizing the difference between the scores of highest scoring (Viterbi) state sequences, and the label state sequences.
Introduction
Pseudocode of the learning algorithm for the partially labeled case is given in Algorithm 1.
Introduction
We see that while all three learning algorithms perform better than the baseline, the performance of the purely unsupervised system is inferior to supervised approaches.
learning algorithm is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Yang, Qiang and Chen, Yuqiang and Xue, Gui-Rong and Dai, Wenyuan and Yu, Yong
Related Works
However, because the labeled Chinese Web pages are still not sufficient, we often find it difficult to achieve high accuracy by applying traditional machine learning algorithms to the Chinese Web pages directly.
Related Works
Most learning algorithms for dealing with cross-language heterogeneous data require a translator to convert the data to the same feature space.
Related Works
For those data that are in different feature spaces where no translator is available, Davis and Domingos (2008) proposed a Markov-logic-based transfer learning algorithm, which is called deep transfer, for transferring knowledge between biological domains and Web domains.
learning algorithm is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Costa, Francisco and Branco, António
Comparing the two Datasets
Table 2: Performance of several machine learning algorithms on the English TempEval-1 training data, with cross-validation.
Comparing the two Datasets
Table 3: Performance of several machine learning algorithms on the Portuguese data for the TempEval-1 tasks.
Introduction
The results of machine learning algorithms over the data thus obtained are compared to those reported for the English TempEval-1 competition.
learning algorithm is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Li, Sujian and Wang, Liang and Cao, Ziqiang and Li, Wenjie
Add arc <e_C, e_j> to G_C with
As we employed the MIRA learning algorithm, it is possible to identify which specific features are useful, by looking at the weights learned for each feature using the training data.
Add arc <e_C, e_j> to G_C with
Other text-level discourse parsing methods include: (1) Percep-coarse: we replace MIRA with the averaged perceptron learning algorithm and the other settings are the same as Our-coarse; (2) HILDA-manual and HILDA-seg are from Hernault (2010b)’s work, and their inputted EDUs are from RST-DT and their own EDU segmenter respectively; (3) LeThanh indicates the results given by LeThanh et al.
Add arc <e_C, e_j> to G_C with
We can also see that the averaged perceptron learning algorithm, though simple, can achieve a comparable performance, better than HILDA-manual.
learning algorithm is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Hoffmann, Raphael and Zhang, Congle and Weld, Daniel S.
Extraction with Lexicons
A learning algorithm expands the seed phrases into a set of lexicons.
Extraction with Lexicons
The semantic lexicons are added as features to the CRF learning algorithm .
Introduction
When learning an extractor for relation R, LUCHS extracts seed phrases from R’s training data and uses a semi-supervised learning algorithm to create several relation-specific lexicons at different points on a precision-recall spectrum.
learning algorithm is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Huang, Hongzhao and Cao, Yunbo and Huang, Xiaojiang and Ji, Heng and Lin, Chin-Yew
Introduction
In order to address these unique challenges for wikification for the short tweets, we employ graph-based semi-supervised learning algorithms (Zhu et al., 2003; Smola and Kondor, 2003; Blum et al., 2004; Zhou et al., 2004; Talukdar and Crammer, 2009) for collective inference by exploiting the manifold (cluster) structure in both unlabeled and labeled data.
Introduction
effort to explore graph-based semi-supervised learning algorithms for the wikification task.
Semi-supervised Graph Regularization
We propose a novel semi-supervised graph regularization framework based on the graph-based semi-supervised learning algorithm (Zhu et al., 2003):
learning algorithm is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Poon, Hoifung and Domingos, Pedro
Introduction
We then present the OntoUSP Markov logic network and the inference and learning algorithms used with it.
Unsupervised Ontology Induction with Markov Logic
Finally, we describe the learning algorithm and how OntoUSP induces the ontology while learning the semantic parser.
Unsupervised Ontology Induction with Markov Logic
Algorithm 2 gives pseudo-code for OntoUSP’s learning algorithm.
learning algorithm is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Eidelman, Vladimir and Marton, Yuval and Resnik, Philip
Introduction
The contributions of this paper include (1) introduction of a loss function for structured RMM in the SMT setting, with surrogate reference translations and latent variables; (2) an online gradient-based solver, RM, with a closed-form parameter update to optimize the relative margin loss; and (3) an efficient implementation that integrates well with the open source cdec SMT system (Dyer et al., 2010). In addition, (4) as our solution is not dependent on any specific QP solver, it can be easily incorporated into practically any gradient-based learning algorithm.
Introduction
After background discussion on learning in SMT (§2), we introduce a novel online learning algorithm for relative margin maximization suitable for SMT (§3).
The Relative Margin Machine in SMT
We address the above-mentioned limitations by introducing a novel online learning algorithm for relative margin maximization, RM.
learning algorithm is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Wu, Fei and Weld, Daniel S.
Experiments
The CRF extractors are trained using the same learning algorithm and feature selection as TextRunner.
Introduction
• Using the same learning algorithm and features as TextRunner, we compare four different ways to generate positive and negative training data with TextRunner’s method, concluding that our Wikipedia heuristic is responsible for the bulk of WOE’s improved accuracy.
Wikipedia-based Open IE
WOEPOS uses the same learning algorithm and selection of features as TextRunner: a two-order CRF chain model is trained with the Mallet package (McCallum, 2002).
learning algorithm is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Bansal, Mohit and Klein, Dan
Analysis
We next investigate the features that were given high weight by our learning algorithm (in the constituent parsing case).
Web-count Features
A learning algorithm can then weight features so that they compare appropriately
Web-count Features
As discussed in Section 5, the top features learned by our learning algorithm duplicate the handcrafted configurations used in previous work (Nakov and Hearst, 2005b) but also add numerous others, and, of course, apply to many more attachment types.
learning algorithm is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Berant, Jonathan and Dagan, Ido and Goldberger, Jacob
Experimental Evaluation
(b) DIRT (Lin and Pantel, 2001): a widely-used rule learning algorithm.
Experimental Evaluation
(c) BInc (Szpektor and Dagan, 2008): a directional rule learning algorithm.
Learning Typed Entailment Graphs
Our learning algorithm is composed of two steps: (1) Given a set of typed predicates and their instances extracted from a corpus, we train a (local) entailment classifier that estimates for every pair of predicates whether one entails the other.
learning algorithm is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Sarioglu, Efsun and Yadav, Kabir and Choi, Hyeong-Ah
Background
Topic modeling is an unsupervised learning algorithm that can automatically discover themes of a document collection.
Background
Text classification is a supervised learning algorithm where documents’ categories are learned from a pre-labeled set of documents.
Experiments
Classification was conducted using Weka, a collection of machine learning algorithms for data mining tasks written in Java (Hall et al., 2009).
learning algorithm is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Croce, Danilo and Moschitti, Alessandro and Basili, Roberto and Palmer, Martha
Conclusion
We have proposed new approaches to characterize verb classes in learning algorithms.
Conclusion
The advantage of kernel methods is that they can be directly used in some learning algorithms, e.g., SVMs, to train verb classifiers.
Introduction
Note that their coding in learning algorithms is rather complex: we need to take into account syntactic structures, which may require an exponential number of syntactic features (i.e., all their possible substructures).
learning algorithm is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Guinaudeau, Camille and Strube, Michael
Introduction
From this grid Barzilay and Lapata (2008) derive probabilities of transitions between adjacent sentences which are used as features for machine learning algorithms .
The Entity Grid Model
To make this representation accessible to machine learning algorithms , Barzilay and Lapata (2008) compute for each document the probability of each transition and generate feature vectors representing the sentences.
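A compact sketch of the transition features described here, computed from a toy entity grid (S=subject, O=object, X=other, -=absent; one row of roles per entity, one column per sentence).

from collections import Counter
from itertools import product

grid = {"Microsoft": ["S", "O", "-"], "market": ["-", "X", "S"]}  # toy grid
transitions = Counter()
for roles in grid.values():
    for a, b in zip(roles, roles[1:]):  # transitions between adjacent sentences
        transitions[(a, b)] += 1
total = sum(transitions.values())
# One feature per possible length-2 transition: its probability in the document.
features = {t: transitions.get(t, 0) / total for t in product("SOX-", repeat=2)}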
The Entity Grid Model
(2011) use discourse relations to transform the entity grid representation into a discourse role matrix that is used to generate feature vectors for machine learning algorithms similarly to Barzilay and Lapata (2008).
learning algorithm is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Ferschke, Oliver and Gurevych, Iryna and Rittberger, Marc
Experiments
Instead of using Mallet (McCallum, 2002) as a machine learning toolkit, we employ the Weka Data Mining Software (Hall et al., 2009) for classification, since it offers a wider range of state-of-the-art machine learning algorithms .
Experiments
Learning Algorithms
Experiments
We evaluated several learning algorithms from the Weka toolkit with respect to their performance on
learning algorithm is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Cai, Qingqing and Yates, Alexander
Abstract
Leveraging techniques from each of these areas, we develop a semantic parser for Freebase that is capable of parsing questions with an F1 that improves by 0.42 over a purely-supervised learning algorithm .
Introduction
On a dataset of 917 questions taken from 81 domains of the Freebase database, a standard learning algorithm for semantic parsing yields a parser with an F1 of 0.21, in large part because of the number of logical symbols that appear during testing but never appear during training.
Previous Work
Owing to the complexity of the general case, researchers have resorted to defining standard similarity metrics between relations and attributes, as well as machine learning algorithms for learning and predicting matches between relations (Doan et al., 2004; Wick et al., 2008b; Wick et al., 2008a; Nottelmann and Straccia, 2007; Berlin and Motro, 2006).
learning algorithm is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: