Index of papers in Proc. ACL that mention

**learning algorithm**

Abstract | We learn this correspondence with a reinforcement learning algorithm, using the deviation of the route we follow from the intended path as a reward signal. |

Approximate Dynamic Programming | To learn these weights θ, we use SARSA (Sutton and Barto, 1998), an online learning algorithm similar to Q-learning (Watkins and Dayan, 1992).
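
For reference, the SARSA update mentioned above takes this shape; a minimal tabular sketch in Python, where `env` is a hypothetical environment exposing `actions`, `reset()`, and `step()` (an illustration, not the paper's implementation):

```python
import random
from collections import defaultdict

def sarsa(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular SARSA (Sutton and Barto, 1998): on-policy TD control."""
    Q = defaultdict(float)  # Q[(state, action)] -> estimated return

    def policy(s):
        # epsilon-greedy action selection under the current estimates
        if random.random() < epsilon:
            return random.choice(env.actions)
        return max(env.actions, key=lambda a: Q[(s, a)])

    for _ in range(episodes):
        s = env.reset()
        a = policy(s)
        done = False
        while not done:
            s_next, reward, done = env.step(a)
            a_next = policy(s_next)
            # unlike Q-learning, SARSA bootstraps from the action the
            # policy actually takes next, not from the greedy maximum
            target = reward + (0.0 if done else gamma * Q[(s_next, a_next)])
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s, a = s_next, a_next
    return Q
```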

Approximate Dynamic Programming | Algorithm 1 details the learning algorithm, which we follow here.

Approximate Dynamic Programming | We also compare against the policy gradient learning algorithm of Branavan et al. |

Introduction | Solved using a reinforcement learning algorithm, our system acquires the meaning of spatial words through |

Introduction | We frame direction following as an apprenticeship learning problem and solve it with a reinforcement learning algorithm, extending previous work on interpreting instructions by Branavan et al.

Reinforcement Learning Formulation | Our learning algorithm is not dependent on a determin-

Reinforcement Learning Formulation | Learning exactly which words influence decision making is difficult; reinforcement learning algorithms have problems with the large, sparse feature vectors common in natural language processing. |

learning algorithm is mentioned in 12 sentences in this paper.

Topics mentioned in this paper:

- learning algorithm (12)
- natural language (6)
- feature vector (4)

Introduction | We describe scalable learning algorithms that induce general question templates and lexical variants of entities and relations.

Learning | PARALEX uses a two-part learning algorithm; it first induces an overly general lexicon (Section 5.1) and then learns to score derivations to increase accuracy (Section 5.2).

Learning | The lexical learning algorithm constructs a lexicon L from a corpus of question paraphrases C = |

Learning | Figure 1: Our lexicon learning algorithm.

Overview of the Approach | Learning The learning algorithm induces a lexicon L and estimates the parameters θ of the linear ranking function.

Overview of the Approach | The final result is a scalable learning algorithm that requires no manual annotation of questions. |

Related Work | The learning algorithms presented in this paper are similar to algorithms used for paraphrase extraction from sentence-aligned corpora (Barzilay and McKeown, 2001; Barzilay and Lee, 2003; Quirk et al., 2004; Bannard and Callison-Burch, 2005; Callison-Burch, 2008; Marton et al., 2009). |

learning algorithm is mentioned in 16 sentences in this paper.

Topics mentioned in this paper:

- learning algorithm (16)
- word alignment (9)
- perceptron (8)

Introducing Nonlocal Features | In other words, it is unlikely that we can devise a feature set that is informative enough to allow the weight vector to converge towards a solution that lets the learning algorithm see the entire documents during training, at least in the situation when no external knowledge sources are used. |

Introducing Nonlocal Features | Thus the learning algorithm always reaches the end of a document, avoiding the problem that early updates discard parts of the training data. |

Introducing Nonlocal Features | When we applied LaSO, we noticed that it performed worse than the baseline learning algorithm when only using local features. |

Introduction | The main reason why early updates underperform in our setting is that the task is too difficult and that the learning algorithm is not able to profit from all training data.

Introduction | Put another way, early updates happen too early, and the learning algorithm rarely reaches the end of the instances as it halts, updates, and moves on to the next instance. |

Representation and Learning | Algorithm 1 shows pseudocode for the learning algorithm, which we will refer to as the baseline learning algorithm.

Results | Since early updates do not always make use of the complete documents during training, it can be expected that it will require either a very wide beam or more iterations to get up to par with the baseline learning algorithm.

Results | Recall that with only local features, delayed LaSO is equivalent to the baseline learning algorithm.

Results | From these results we conclude that we are better off when the learning algorithm handles one document at a time, instead of getting feedback within documents. |

learning algorithm is mentioned in 12 sentences in this paper.

Topics mentioned in this paper:

- coreference (25)
- CoNLL (19)
- weight vector (14)

A ← METRICLEARNER(X, S, D) | We also experimented with the supervised large-margin metric learning algorithm (LMNN) presented in (Weinberger and Saul, 2009).

Abstract | We initiate a study comparing the effectiveness of the transformed spaces learned by recently proposed supervised and semi-supervised metric learning algorithms to those generated by previously proposed unsupervised dimensionality reduction methods (e.g., PCA).

Abstract | Through a variety of experiments on different real-world datasets, we find IDML-IT, a semi-supervised metric learning algorithm, to be the most effective.

Conclusion | In this paper, we compared the effectiveness of the transformed spaces learned by recently proposed supervised and semi-supervised metric learning algorithms to those generated by previously proposed unsupervised dimensionality reduction methods (e.g., PCA).

Introduction | Recently, several supervised metric learning algorithms have been proposed (Davis et al., 2007; Weinberger and Saul, 2009). |

Introduction | Even though different supervised and semi-supervised metric learning algorithms have recently been proposed, effectiveness of the transformed spaces learned by them in NLP |

Introduction | We find IDML-IT, a semi-supervised metric learning algorithm, to be the most effective.

Metric Learning | We shall now review two recently proposed metric learning algorithms.

Metric Learning | The ITML metric learning algorithm, which we reviewed in Section 2.2, is supervised in nature, and hence it does not exploit widely available unlabeled data.

Metric Learning | In this section, we review Inference Driven Metric Learning (IDML) (Algorithm 1) (Dhillon et al., 2010), a recently proposed metric learning framework which combines an existing supervised metric learning algorithm (such as ITML) with transductive graph-based label inference to learn a new distance metric from labeled and unlabeled data.
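
The alternation IDML describes can be sketched as follows; a schematic Python version where `metric_learner` is a pluggable supervised learner (such as ITML) returning a positive-definite Mahalanobis matrix, and simple 5-NN voting stands in for the transductive graph-based inference (illustrative names, not the authors' code):

```python
import numpy as np

def idml(X, y, labeled_mask, metric_learner, n_iters=5, tau=0.9):
    """Schematic IDML: alternate a supervised metric learner with a
    transductive inference step, promoting confidently inferred labels."""
    labeled, labels = labeled_mask.copy(), y.copy()
    for _ in range(n_iters):
        # 1) learn a positive-definite Mahalanobis matrix A (e.g. via ITML)
        A = metric_learner(X[labeled], labels[labeled])
        # 2) distances in the transformed space equal Mahalanobis distances
        Z = X @ np.linalg.cholesky(A)
        for i in np.where(~labeled)[0]:
            d = np.linalg.norm(Z[labeled] - Z[i], axis=1)
            nn = np.argsort(d)[:5]  # 5-NN voting stands in for graph inference
            vals, cnts = np.unique(labels[labeled][nn], return_counts=True)
            if cnts.max() / cnts.sum() >= tau:  # 3) keep confident labels only
                labels[i] = vals[cnts.argmax()]
                labeled[i] = True
    return A, labels
```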

learning algorithm is mentioned in 11 sentences in this paper.

Topics mentioned in this paper:

- learning algorithms (11)
- semi-supervised (10)
- unlabeled data (4)

Abstract | Additive tree metrics can be leveraged by “meta-algorithms” such as neighbor-joining (Saitou and Nei, 1987) and recursive grouping (Choi et al., 2011) to provide consistent learning algorithms for latent trees. |

Abstract | In our learning algorithm, we assume that examples of the form (w^(i), x^(i)) for i ∈ [N] = {1, ..., N}

Abstract | The word embeddings are used during the learning process, but the final decoder that the learning algorithm outputs maps a POS tag sequence x to a parse tree.

learning algorithm is mentioned in 9 sentences in this paper.

Topics mentioned in this paper:

- POS tags (15)
- embeddings (10)
- learning algorithm (9)

Introduction | (2002) and employ machine learning algorithms to build classifiers from tweets with manually annotated sentiment polarity. |

Introduction | To this end, we extend the existing word embedding learning algorithm (Collobert et al., 2011) and develop three neural networks to effectively incorporate the supervision from sentiment polarity of text (e.g. |

Introduction | In the accuracy of polarity consistency between each sentiment word and its top N closest words, SSWE outperforms existing word embedding learning algorithms.

Related Work | Under this assumption, many feature learning algorithms are proposed to obtain better classification performance (Pang and Lee, 2008; Liu, 2012; Feldman, 2013). |

Related Work | We extend the existing word embedding learning algorithm (Collobert et al., 2011) and develop three neural networks to learn SSWE. |

Related Work | In the following sections, we introduce the traditional method before presenting the details of SSWE learning algorithms.

learning algorithm is mentioned in 9 sentences in this paper.

Topics mentioned in this paper:

- sentiment classification (52)
- word embedding (46)
- neural networks (16)

Abstract | We introduce a provably correct learning algorithm for latent-variable PCFGs. |

Additional Details of the Algorithm | The learning algorithm for L-PCFGs can be used as an initializer for the EM algorithm for L-PCFGs.

Experiments on Parsing | This section describes parsing experiments using the learning algorithm for L-PCFGs. |

Experiments on Parsing | Table 1: Results on the development data (section 22) and test data (section 23) for various learning algorithms for L-PCFGs. |

Experiments on Parsing | In this special case, the L-PCFG learning algorithm is equivalent to a simple algorithm, with the following steps: 1) define the matrix Q with entries Q_{w1,w2} = count(w1, w2)/N, where count(w1, w2) is the number of times that the bigram (w1, w2) is seen in the data, and N = Σ_{w1,w2} count(w1, w2).
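
Step 1 of this special case amounts to a normalized bigram-count matrix; a small self-contained sketch (the closing SVD line reflects the spectral character of the method and is an assumption, not a quote of the paper's remaining steps):

```python
import numpy as np
from collections import Counter

def bigram_matrix(tokens, vocab):
    """Q[w1, w2] = count(w1, w2) / N, with N the total number of bigrams."""
    idx = {w: i for i, w in enumerate(vocab)}
    Q = np.zeros((len(vocab), len(vocab)))
    for (w1, w2), c in Counter(zip(tokens, tokens[1:])).items():
        Q[idx[w1], idx[w2]] = c
    return Q / Q.sum()

tokens = "the dog saw the cat the dog barked".split()
vocab = sorted(set(tokens))
Q = bigram_matrix(tokens, vocab)
U, s, Vt = np.linalg.svd(Q)  # an SVD step, in the spirit of the algorithm
```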

The Learning Algorithm for L-PCFGS | Our goal is to design a learning algorithm for L-PCFGs. |

The Learning Algorithm for L-PCFGS | 4.2 The Learning Algorithm |

The Learning Algorithm for L-PCFGS | Figure 1 shows the learning algorithm for L-PCFGs. |

The Matrix Decomposition Algorithm | This section describes the matrix decomposition algorithm used in Step 1 of the learning algorithm.

learning algorithm is mentioned in 9 sentences in this paper.

Topics mentioned in this paper:

Abstract | We propose an online learning algorithm based on tensor-space models. |

Abstract | We apply the proposed algorithm to a parsing task, and show that even with very little training data the learning algorithm based on a tensor model performs well, and gives significantly better results than standard learning algorithms based on traditional vector-space models.

Conclusion and Future Work | In this paper, we reformulated the traditional linear vector-space models as tensor-space models, and proposed an online learning algorithm named Tensor-MIRA. |

Introduction | Many learning algorithms applied to NLP problems, such as the Perceptron (Collins, |

Introduction | A tensor weight learning algorithm is then proposed in Section 4.

Online Learning Algorithm | Here we propose an online learning algorithm similar to MIRA but modified to accommodate tensor models. |

Online Learning Algorithm | Training pairs (x_i, y_i), i = 1, ..., m, where x_i is the input and y_i is the reference or oracle hypothesis, are fed to the weight learning algorithm in sequential order.

Tensor Model Construction | As a way out, we first run a simple vector-model based learning algorithm (say the Perceptron) on the training data and estimate a weight vector, which serves as a “surro- |

Tensor Space Representation | Most of the learning algorithms for NLP problems are based on vector space models, which represent data as vectors φ ∈ R^n, and try to learn feature weight vectors w ∈ R^n such that a linear model y = w · φ is able to discriminate between, say, good and bad hypotheses.
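
The vector-space setup described here is the baseline the tensor model generalizes; a minimal perceptron-style sketch of learning w for y = w · φ (illustrative only; the paper's actual learner is the MIRA-like Tensor-MIRA):

```python
import numpy as np

def perceptron(train_pairs, n_feats, epochs=10):
    """Learn w for a linear model y = w . phi over feature vectors.
    train_pairs holds (phi_oracle, phi_competitor) feature-vector pairs."""
    w = np.zeros(n_feats)
    for _ in range(epochs):
        for phi_good, phi_bad in train_pairs:
            # update only when the model misranks the pair
            if w @ phi_good <= w @ phi_bad:
                w += phi_good - phi_bad
    return w
```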

learning algorithm is mentioned in 9 sentences in this paper.

Topics mentioned in this paper:

- feature weights (18)
- learning algorithm (9)
- Perceptron (9)

Introduction | We further propose a specific hierarchical learning algorithm, called the HL-SOT algorithm, which is developed based on generalizing an online learning algorithm, H-RLS (Cesa-Bianchi et al., 2006).

Introduction | A specific hierarchical learning algorithm is

Related Work | In (Turney, 2002), an unsupervised learning algorithm was proposed to classify reviews as recommended or not recommended by averaging sentiment annotation of phrases in reviews that contain adjectives or adverbs. |

The HL-SOT Approach | Then a specific hierarchical learning algorithm is further proposed to solve the formulated problem. |

The HL-SOT Approach | Therefore we propose a specific hierarchical learning algorithm, named the HL-SOT algorithm, that is able to train each node classifier in a batch-learning setting and allows separate learning of the threshold for each node classifier.

The HL-SOT Approach | Then the hierarchical classification function f is parameterized by the weight matrix W = (w1, ..., wN)^T and threshold vector θ = (θ1, ..., θN)^T. The hierarchical learning algorithm HL-SOT is proposed for learning the parameters W and θ.

learning algorithm is mentioned in 8 sentences in this paper.

Topics mentioned in this paper:

- sentiment analysis (33)
- learning algorithm (8)
- weight vector (3)

Background | The goal of our learning algorithm is to learn a mapping from inputs (unsegmented sentences) x ∈ X to outputs (segmented paths) y ∈ Y

Conclusion | The second is a discriminative online learning algorithm based on MIRA that enables us to incorporate arbitrary features to our hybrid model. |

Related work | In this section, we discuss related approaches based on several aspects of learning algorithms and search space representation methods. |

Related work | Our approach overcomes the limitation of the original hybrid model by a discriminative online learning algorithm for training. |

Training method | Therefore, we require a learning algorithm that can efficiently handle large and complex lattice structures. |

Training method | Algorithm 1 Generic Online Learning Algorithm |

Training method | Algorithm 1 outlines the generic online learning algorithm (McDonald, 2006) used in our framework. |
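
The generic template (McDonald, 2006) is compact enough to sketch; `decode` and `update` below are placeholders for the lattice inference step and the MIRA-style weight update (assumptions, not the paper's code):

```python
def online_learn(instances, decode, update, w, n_iters=10):
    """Generic online learning (schematic): decode each training
    instance under the current weights and update toward the gold."""
    for _ in range(n_iters):
        for x, y_gold in instances:
            y_hat = decode(x, w)  # inference over the lattice / search space
            if y_hat != y_gold:
                w = update(w, x, y_gold, y_hat)  # e.g. a MIRA-style update
    return w
```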

learning algorithm is mentioned in 8 sentences in this paper.

Topics mentioned in this paper:

- POS tagging (26)
- word segmentation (14)
- word-level (12)

Background | In this section we review the lexicon learning algorithm introduced by Chen and Mooney (2011) as well as the overall task they designed to test semantic understanding of navigation instructions.

Experiments | We evaluate our new lexicon learning algorithm as well as the other modifications to the navigation system using the same three tasks as Chen and Mooney (2011). |

Experiments | Here SGOLL has a decidedly large advantage over the lexicon learning algorithm from Chen and Mooney, requiring an order of magnitude less time to run. |

Experiments | We have introduced a novel, online lexicon learning algorithm that is much faster than the one proposed by Chen and Mooney and also performs better on the navigation tasks they devised. |

Introduction | Their lexicon learning algorithm finds the common connected subgraph that occurs with a word by taking intersections of the graphs that represent the different contexts in which the word appears. |

Introduction | In addition to the new lexicon learning algorithm, we also look at modifying the meaning representation grammar (MRG) for their formal semantic language.

Online Lexicon Learning Algorithm | In addition to introducing a new lexicon learning algorithm, we also made another modification to the original system proposed by Chen and Mooney (2011).

learning algorithm is mentioned in 7 sentences in this paper.

Topics mentioned in this paper:

- semantic parser (17)
- natural language (11)
- Mechanical Turk (10)

Discussion | extension of the model and learning algorithm to word sequences and (2) feature functions that relate acoustic measurements to sub-word units. |

Experiments | We compare the CRF, Passive-Aggressive (PA), and Pegasos learning algorithms.

Experiments | We use the term “CRF” since the learning algorithm corresponds to CRF learning, although the task is multiclass classification rather than a sequence or structure prediction task.

Experiments | Models labeled X/Y use learning algorithm X and feature set Y. |

Problem setting | In the next section we present a learning algorithm that aims to minimize the expected zero-one loss. |

learning algorithm is mentioned in 7 sentences in this paper.

Topics mentioned in this paper:

- n-gram (8)
- CRF (7)
- learning algorithm (7)

Abstract | Many discriminative learning algorithms are sensitive to such shifts because highly indicative features may swamp other indicative features. |

Abstract | Regularized and adversarial learning algorithms have been proposed to be more robust against covariate shifts. |

Abstract | We present a new perceptron learning algorithm using antagonistic adversaries and compare it to previous proposals on 12 multilingual cross-domain part-of-speech tagging datasets. |

Conclusion | We presented a discriminative learning algorithm for cross-domain structured prediction that seems more robust to covariate shifts than previous approaches.

Experiments | All learning algorithms do the same number of passes over each training data set. |

Introduction | Most learning algorithms assume that training and test data are governed by identical distributions; and more specifically, in the case of part-of-speech (POS) tagging, that training and test sentences were sampled at random and that they are identically and independently distributed. |

Introduction | Most discriminative learning algorithms only update parameters when training examples are misclassified.

learning algorithm is mentioned in 7 sentences in this paper.

Topics mentioned in this paper:

- perceptron (10)
- learning algorithms (7)
- POS tagging (7)

Abstract | By combining a supervised large margin loss with an unsupervised least squares loss, a discriminative, convex, semi-supervised learning algorithm can be obtained that is applicable to large-scale problems.

Conclusion and Future Work | Unlike previously proposed approaches, we introduce a convex objective for the semi-supervised learning algorithm by combining a convex structured SVM loss and a convex least squares loss.

Introduction | Supervised learning algorithms still represent the state of the art approach for inferring dependency parsers from data (McDonald et al., 2005a; McDonald and Pereira, 2006; Wang et al., 2007). |

Introduction | Unfortunately, although significant recent progress has been made in the area of semi-supervised learning, the performance of semi-supervised learning algorithms still falls far short of expectations, particularly in challenging real-world tasks such as natural language parsing or machine translation.

Introduction | The basic idea is to bootstrap a supervised learning algorithm by alternating between inferring the missing label information and retraining. |

learning algorithm is mentioned in 6 sentences in this paper.

Topics mentioned in this paper:

- semi-supervised (39)
- dependency parsing (20)
- unlabeled data (18)

Experimental Setup | 5.3 Parameters and Learning Algorithm |

Experimental Setup | Selection of the learning algorithm and its algorithm-specific parameters was done as follows.

Experimental Setup | Since each dataset has only 140 examples, the computation time of each learning algorithm is negligible. |

Introduction | The standard classification process is to find in an auxiliary corpus a set of patterns in which a given training word pair co-appears, and use pattern-word pair co-appearance statistics as features for machine learning algorithms.

Related Work | Various learning algorithms have been used for relation classification. |

Related Work | Freely available tools like Weka (Witten and Frank, 1999) allow easy experimentation with common learning algorithms (Hendrickx et al., 2007). |

learning algorithm is mentioned in 6 sentences in this paper.

Topics mentioned in this paper:

- WordNet (13)
- semantic relationships (8)
- learning algorithms (6)

Dependency Parser | Thus our system uses an unbounded number of transition relations, which has an apparent disadvantage for learning algorithms.

Model and Training | In this section we introduce the adopted learning algorithm and discuss the model parameters. |

Model and Training | 4.1 Learning Algorithm |

Model and Training | Algorithm 2 Learning Algorithm |

learning algorithm is mentioned in 6 sentences in this paper.

Topics mentioned in this paper:

- dependency tree (11)
- dependency parsing (10)
- learning algorithm (6)

Abstract | We introduce a spectral learning algorithm for latent-variable PCFGs (Petrov et al., 2006). |

Deriving Empirical Estimates | We now state a PAC-style theorem for the learning algorithm.

Introduction | Recent work has introduced polynomial-time learning algorithms (and consistent estimation methods) for two important cases of hidden-variable models: Gaussian mixture models (Dasgupta, 1999; Vempala and Wang, 2004) and hidden Markov models (Hsu et al., 2009). |

Proofs | Figure 4: The spectral learning algorithm.

Related Work | (2011) consider spectral learning algorithms of tree-structured directed Bayes nets.

learning algorithm is mentioned in 5 sentences in this paper.

Topics mentioned in this paper:

- learning algorithm (5)
- feature vectors (3)
- SVD (3)

Abstract | Experiments show that our systems can be consistently and significantly improved with rich lexical semantic information, regardless of the choice of learning algorithms.

Introduction | First, by incorporating the abundant information from a variety of lexical semantic models, the answer selection system can be enhanced substantially, regardless of the choice of learning algorithms and settings. |

Learning QA Matching Models | A binary classifier can be trained easily using any machine learning algorithm in this standard supervised learning setting. |

Learning QA Matching Models | We tested two learning algorithms in this setting: logistic regression and boosted decision trees (Friedman, 2001). |

Learning QA Matching Models | The former is the log-linear model widely used in the NLP community and the latter is a robust nonlinear learning algorithm that has shown great empirical performance. |
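
A sketch of this supervised setup with scikit-learn, using synthetic stand-in features (the real QA-pair features come from the paper's lexical semantic models; everything here is illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))            # stand-in QA-pair feature vectors
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # stand-in correctness labels

log_reg = LogisticRegression(max_iter=1000).fit(X, y)
gbdt = GradientBoostingClassifier(n_estimators=100).fit(X, y)

# rank candidate answers for a question by classifier confidence
scores = gbdt.predict_proba(X)[:, 1]
```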

learning algorithm is mentioned in 5 sentences in this paper.

Topics mentioned in this paper:

- lexical semantic (27)
- WordNet (11)
- bag-of-words (10)

Entity-mention Model with ILP | However, normal machine learning algorithms work on attribute-value vectors, which only allow the representation of atomic propositions.

Entity-mention Model with ILP | This requirement motivates our use of Inductive Logic Programming (ILP), a learning algorithm capable of inferring logic programs. |

Experiments and Results | Default parameters were applied for all the other settings in ALEPH as well as other learning algorithms used in the experiments. |

Introduction | Even worse, the number of mentions in an entity is not fixed, which would result in variant-length feature vectors and make trouble for normal machine learning algorithms.

Modelling Coreference Resolution | Based on the training instances, a binary classifier can be generated using any discriminative learning algorithm.

learning algorithm is mentioned in 5 sentences in this paper.

Topics mentioned in this paper:

- coreference (37)
- ILP (24)
- coreference resolution (22)

Discussion | Each model describes the diachronic, population-level consequences of assuming a particular learning algorithm for individuals. |

Discussion | By using simple models, we were able to consider a range of learning algorithms corresponding to different explanations for the observed diachronic dynamics. |

Modeling preliminaries | This setting allows us to determine the diachronic, population-level consequences of assumptions about the learning algorithm used by individuals, as well as assumptions about population structure or the input they receive. |

Models | We now describe 5 DS models, each corresponding to a learning algorithm A used by individual language learners. |

Models | The models differ along two dimensions, corresponding to assumptions about the learning algorithm (A): whether or not it is assumed that the stress of examples is possibly mistransmitted (Models 1, 3, 5), and how the N and V probabil- |

learning algorithm is mentioned in 5 sentences in this paper.

Topics mentioned in this paper:

Abstract | We present an efficient approximate approach for learning this environment model as part of a policy-gradient reinforcement learning algorithm for text interpretation. |

Algorithm | The learning algorithm is provided with a set of documents d ∈ D, an environment in which to execute command sequences s′, and a reward function. The goal is to estimate two sets of parameters: 1) the parameters θ of the policy function, and 2) the partial environment transition model q(s′|s, c), which is the observed portion of the true model p(s′|s, c).
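
The policy half of this estimation can be sketched as a REINFORCE-style update for a log-linear policy; the `features` function and trajectory format below are assumptions for illustration, and the environment-model half reduces to recording observed transitions:

```python
import numpy as np

def policy_gradient_step(theta, features, trajectory, reward, lr=0.01):
    """One REINFORCE-style update for a log-linear policy.
    trajectory: list of (state, chosen_command, candidate_commands);
    features(s, c) returns the feature vector of a state-command pair."""
    grad = np.zeros_like(theta)
    for s, c, candidates in trajectory:
        phis = [features(s, a) for a in candidates]
        logits = np.array([theta @ p for p in phis])
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        expected = sum(p * phi for p, phi in zip(probs, phis))
        grad += features(s, c) - expected  # gradient of log pi(c | s)
    return theta + lr * reward * grad

# the observed part of the transition model can simply be recorded:
# transitions[(state, command)] = observed_next_state
```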

Introduction | Our method efficiently achieves both of these goals as part of a policy-gradient reinforcement learning algorithm.

Related Work | Interpreting Instructions Our approach is most closely related to the reinforcement learning algorithm for mapping text instructions to commands developed by Branavan et al. |

Related Work | We address this limitation by expanding a policy learning algorithm to take advantage of a partial environment model estimated during learning. |

learning algorithm is mentioned in 5 sentences in this paper.

Topics mentioned in this paper:

- learning algorithm (5)
- log-linear (3)

Abstract | In this paper we propose a discriminative learning algorithm to take advantage of the linguistic knowledge in large amounts of natural annotations on the Internet. |

Character Classification Model | The classifier can be trained with online learning algorithms such as perceptron, or offline learning models such as support vector machines. |

Conclusion and Future Work | This work presents a novel discriminative learning algorithm to utilize the knowledge in the massive natural annotations on the Internet. |

Introduction | In this work we take for example a most important problem, word segmentation, and propose a novel discriminative learning algorithm to leverage the knowledge in massive natural annotations of web text. |

learning algorithm is mentioned in 4 sentences in this paper.

Topics mentioned in this paper:

- word segmentation (41)
- Chinese word (10)
- Chinese word segmentation (10)

Abstract | We use aligned subsequences as features for machine learning algorithms in order to infer rules for linguistic changes undergone by words when entering new languages and to discriminate between cognates and non-cognates. |

Conclusions and Future Work | and Waterman, 1981), and other learning algorithms for discriminating between cognates and non-cognates. |

Our Approach | Therefore, because the edit distance was widely used in this research area and produced good results, we are encouraged to employ orthographic alignment for identifying pairs of cognates, not only to compute similarity scores, as was previously done, but to use aligned subsequences as features for machine learning algorithms.

Our Approach | 3.3 Learning Algorithms |

learning algorithm is mentioned in 4 sentences in this paper.

Topics mentioned in this paper:

- edit distance (5)
- SVM (5)
- learning algorithms (4)

System Architecture | ADF learning algorithm: procedure ADF(q, c, α, β)

System Architecture | Figure 1: The proposed ADF online learning algorithm.

System Architecture | Prior work on convergence analysis of existing online learning algorithms (Murata, 1998; Hsu et |

learning algorithm is mentioned in 4 sentences in this paper.

Topics mentioned in this paper:

- word segmentation (23)
- Chinese word (10)
- Chinese word segmentation (10)

Implementation | The learning algorithm is applied in a shift-reduce parser, where the training data consists of the (unique) list of shift and reduce operations required to produce the gold RST parses. |

Introduction | Alternatively, our approach can be seen as a nonlinear learning algorithm for incremental structure prediction, which overcomes feature sparsity through effective parameter tying. |

Large-Margin Learning Framework | of our learning algorithm are different. |

Large-Margin Learning Framework | Algorithm 1 Mini-batch learning algorithm. Input: training set D, regularization parameters λ and τ, number of iterations T, initialization matrix A0, and threshold δ. while t = 1, ..., T do

learning algorithm is mentioned in 4 sentences in this paper.

Topics mentioned in this paper:

- discourse parsing (19)
- shift-reduce (16)
- EDUs (10)

Feature Weighting Methods | ing, better classification and regression models can be built by using the feature weights generated by these models as a pre-weight on the data points for other machine learning algorithms.

Related Work | Noise tolerance techniques aim to improve the learning algorithm itself to avoid over-fitting caused by mislabeled instances in the training phase, so that the constructed classifier becomes more noise-tolerant. |

Related Work | Decision tree (Mingers, 1989; Vannoorenberghe and Denoeux, 2002) and boosting (Jiang, 2001; Kalaia and Servediob, 2005; Karmaker and Kwek, 2006) are two learning algorithms that have been investigated in many studies. |

Related Work | For example, useful information can be removed with noise elimination, since annotation errors are likely to occur on ambiguous instances that are potentially valuable for learning algorithms.

learning algorithm is mentioned in 4 sentences in this paper.

Topics mentioned in this paper:

- ground truth (9)
- F1 Score (8)
- feature weighting (7)

Abstract | This section first introduces the fundamental supervised learning method, and then describes a baseline active learning algorithm.

Abstract | 3.2 Active Learning Algorithm |

Abstract | 4.4 Bilingual Active Learning Algorithm |

learning algorithm is mentioned in 4 sentences in this paper.

Topics mentioned in this paper:

- relation instances (19)
- relation extraction (16)
- parallel corpora (11)

Introduction | Over the past decade, supervised learning algorithms have gained widespread acceptance in natural language processing (NLP). |

Introduction | The learning algorithm then optimizes a regularized, convex objective function that is expressed in terms of these features. |

Introduction | However, the supervised learning algorithms can typically identify useful clusters and assign proper weights to them, effectively adapting the clusters to the domain. |

learning algorithm is mentioned in 4 sentences in this paper.

Topics mentioned in this paper:

- feature vectors (13)
- CoNLL (11)
- named entity (11)

Introduction | However, most learning algorithms operate under the assumption that the learning data originates from the same distribution as the test data, though in practice this assumption is often violated.

Introduction | We explain how the introduced regularizer can be integrated into the stochastic gradient descent learning algorithm for our model. |

Learning and Inference | In this section we describe an approximate learning algorithm based on the mean-field approximation. |

Learning and Inference | Though we believe that our approach is independent of the specific learning algorithm, we provide the description for completeness.

learning algorithm is mentioned in 4 sentences in this paper.

Topics mentioned in this paper:

- latent variables (23)
- semi-supervised (13)
- unlabeled data (12)

Abstract | Recently, researchers have developed multi-instance learning algorithms to combat the noisy training data that can come from heuristic labeling, but their models assume relations are disjoint — for example they cannot extract the pair Founded(Jobs, Apple) and CEO-of(Jobs, Apple).

Conclusion | Since the process of matching database tuples to sentences is inherently heuristic, researchers have proposed multi-instance learning algorithms as a means for coping with the resulting noisy data.

Learning | We now present a multi-instance learning algorithm for our weak-supervision model that treats the sentence-level extraction random variables Z_i as latent, and uses facts from a database (e.g., Freebase) as supervision for the aggregate-level variables Y^r.

Modeling Overlapping Relations | Figure 2: The MULTIR Learning Algorithm |

learning algorithm is mentioned in 4 sentences in this paper.

Topics mentioned in this paper:

- sentence-level (17)
- entity mentions (10)
- precision and recall (6)

Experimental Setup | Training We obtained phrase-based salience scores using a supervised machine learning algorithm.

Modeling | We obtain these scores from the output of a supervised machine learning algorithm that predicts for each phrase whether it should be included in the highlights or not (see Section 5 for details). |

Modeling | Let f_i denote the salience score for phrase i, determined by the machine learning algorithm, and l_i its length in tokens.
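
Those two quantities suggest the shape of the ILP: maximize total salience under a length budget. A toy version with the PuLP library (the paper's actual program adds grammaticality and compression constraints, omitted here):

```python
import pulp

def select_phrases(f, l, budget):
    """Toy highlight ILP: maximize total salience under a length budget."""
    prob = pulp.LpProblem("highlights", pulp.LpMaximize)
    x = [pulp.LpVariable(f"x{i}", cat="Binary") for i in range(len(f))]
    prob += pulp.lpSum(f[i] * x[i] for i in range(len(f)))            # objective
    prob += pulp.lpSum(l[i] * x[i] for i in range(len(f))) <= budget  # length
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return [i for i in range(len(f)) if x[i].value() == 1]

print(select_phrases(f=[3.0, 2.0, 1.5], l=[5, 4, 2], budget=7))  # -> [0, 2]
```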

learning algorithm is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

- ILP (33)
- sentence compression (8)
- phrase-based (7)

Related Work | Our work builds on one such approach — SampleRank (Wick et al., 2011), a sampling-based learning algorithm.

Sampling-Based Dependency Parsing with Global Features | We begin with the notation before addressing the decoding and learning algorithms . |

Sampling-Based Dependency Parsing with Global Features | Figure 4 summarizes the learning algorithm.

learning algorithm is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

- scoring function (29)
- POS tags (19)
- reranker (16)

Smoothing Natural Language Sequences | Formally, we define the smoothing task as follows: let D = {(x, z) | x is a word sequence, z is a label sequence} be a labeled dataset of word sequences, and let M be a machine learning algorithm that will learn a function f to predict the correct labels.

Smoothing Natural Language Sequences | As an example, consider the string “Researchers test reformulated gasolines on newer engines.” In a common dataset for NP chunking, the word “reformulated” never appears in the training data, but appears four times in the test set as part of the NP “reformulated gasolines.” Thus, a learning algorithm supplied with word-level features would |

Smoothing Natural Language Sequences | In particular, we seek to represent each word by a distribution over its contexts, and then provide the learning algorithm with features computed from this distribution. |
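
A minimal sketch of that representation, mapping each word type to a normalized distribution over its neighboring words, from which features can then be computed:

```python
from collections import Counter, defaultdict

def context_distributions(sentences, window=1):
    """Map each word type to a normalized distribution over the words
    seen within `window` positions of it."""
    ctx = defaultdict(Counter)
    for sent in sentences:
        for i, w in enumerate(sent):
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if j != i:
                    ctx[w][sent[j]] += 1
    return {w: {c: n / sum(cnt.values()) for c, n in cnt.items()}
            for w, cnt in ctx.items()}

dists = context_distributions([["test", "reformulated", "gasolines"],
                               ["burn", "reformulated", "fuels"]])
```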

learning algorithm is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

- distributional representations (18)
- POS tagging (16)
- CRF (11)

Introduction | Implicitly, the weight learning algorithm can be seen as a gradient descent procedure minimizing the difference between the scores of highest scoring (Viterbi) state sequences, and the label state sequences. |

Introduction | Pseudocode of the learning algorithm for the partially labeled case is given in Algorithm 1. |

Introduction | We see that while all three learning algorithms perform better than the baseline, the performance of the purely unsupervised system is inferior to supervised approaches. |

learning algorithm is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

- segmentation model (17)
- embeddings (8)
- distributional semantics (7)

Related Works | However, because the labeled Chinese Web pages are still not sufficient, we often find it difficult to achieve high accuracy by applying traditional machine learning algorithms to the Chinese Web pages directly. |

Related Works | Most learning algorithms for dealing with cross-language heterogeneous data require a translator to convert the data to the same feature space. |

Related Works | For those data that are in different feature spaces where no translator is available, Davis and Domingos (2008) proposed a Markov-logic-based transfer learning algorithm, which is called deep transfer, for transferring knowledge between biological domains and Web domains.

learning algorithm is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

- feature space (15)
- co-occurrence (13)
- latent variables (12)

Comparing the two Datasets | Table 2: Performance of several machine learning algorithms on the English TempEval-1 training data, with cross-validation.

Comparing the two Datasets | Table 3: Performance of several machine learning algorithms on the Portuguese data for the TempEval-1 tasks. |

Introduction | The results of machine learning algorithms over the data thus obtained are compared to those reported for the English TempEval-1 competition.

learning algorithm is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

- machine learning (8)
- learning algorithms (3)
- SVM (3)

Add arc <eC,ej> to GC with | As we employed the MIRA learning algorithm, it is possible to identify which specific features are useful by looking at the weights learned for each feature using the training data.

Add arc <eC,ej> to GC with | Other text-level discourse parsing methods include: (1) Percep-coarse: we replace MIRA with the averaged perceptron learning algorithm and the other settings are the same as Our-coarse; (2) HILDA-manual and HILDA-seg are from Hernault (2010b)’s work, and their input EDUs are from RST-DT and their own EDU segmenter respectively; (3) LeThanh indicates the results given by LeThanh et al.

Add arc <eC,ej> to GC with | We can also see that the averaged perceptron learning algorithm, though simple, can achieve a comparable performance, better than HILDA-manual.

learning algorithm is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

- discourse parsing (23)
- EDUs (23)
- dependency trees (22)

Extraction with Lexicons | A learning algorithm expands the seed phrases into a set of lexicons. |

Extraction with Lexicons | The semantic lexicons are added as features to the CRF learning algorithm.

Introduction | When learning an extractor for relation R, LUCHS extracts seed phrases from R’s training data and uses a semi-supervised learning algorithm to create several relation-specific lexicons at different points on a precision-recall spectrum. |

learning algorithm is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

- CRF (10)
- F1 score (8)
- overfitting (6)

Introduction | In order to address these unique challenges for wikification for the short tweets, we employ graph-based semi-supervised learning algorithms (Zhu et al., 2003; Smola and Kondor, 2003; Blum et al., 2004; Zhou et al., 2004; Talukdar and Crammer, 2009) for collective inference by exploiting the manifold (cluster) structure in both unlabeled and labeled data. |

Introduction | effort to explore graph-based semi-supervised learning algorithms for the wikification task. |

Semi-supervised Graph Regularization | We propose a novel semi-supervised graph regularization framework based on the graph-based semi-supervised learning algorithm (Zhu et al., 2003).
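
The Zhu et al. (2003) formulation this framework builds on has a closed-form harmonic solution; a small sketch, assuming a symmetric graph weight matrix W whose first l nodes are labeled:

```python
import numpy as np

def harmonic_labels(W, y_l):
    """Zhu et al. (2003): f_u = (D_uu - W_uu)^{-1} W_ul y_l, where the
    first len(y_l) nodes are labeled and the rest are unlabeled."""
    l = len(y_l)
    L = np.diag(W.sum(axis=1)) - W   # graph Laplacian L = D - W
    f_u = np.linalg.solve(L[l:, l:], W[l:, :l] @ y_l)
    return f_u                       # soft labels for the unlabeled nodes

W = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
print(harmonic_labels(W, y_l=np.array([1.0])))  # both unlabeled nodes -> 1.0
```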

learning algorithm is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

- semantic relatedness (16)
- semi-supervised (16)
- coreferential (15)

Introduction | We then present the OntoUSP Markov logic network and the inference and learning algorithms used with it. |

Unsupervised Ontology Induction with Markov Logic | Finally, we describe the learning algorithm and how OntoUSP induces the ontology while learning the semantic parser. |

Unsupervised Ontology Induction with Markov Logic | Algorithm 2 gives pseudo-code for OntoUSP’s learning algorithm.

learning algorithm is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

- semantic parses (27)
- dependency trees (7)
- logical form (5)

Introduction | The contributions of this paper include (1) introduction of a loss function for structured RMM in the SMT setting, with surrogate reference translations and latent variables; (2) an online gradient-based solver, RM, with a closed-form parameter update to optimize the relative margin loss; and (3) an efficient implementation that integrates well with the open source cdec SMT system (Dyer et al., 2010). In addition, (4) as our solution is not dependent on any specific QP solver, it can be easily incorporated into practically any gradient-based learning algorithm.

Introduction | After background discussion on learning in SMT (§2), we introduce a novel online learning algorithm for relative margin maximization suitable for SMT (§3). |

The Relative Margin Machine in SMT | We address the above-mentioned limitations by introducing a novel online learning algorithm for relative margin maximization, RM. |

learning algorithm is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

- feature set (25)
- BLEU (18)
- TER (11)

Experiments | The CRF extractors are trained using the same learning algorithm and feature selection as TextRunner. |

Introduction | Using the same learning algorithm and features as TextRunner, we compare four different ways to generate positive and negative training data with TextRunner’s method, concluding that our Wikipedia heuristic is responsible for the bulk of WOE’s improved accuracy.

Wikipedia-based Open IE | WOEPOS uses the same learning algorithm and selection of features as TextRunner: a two-order CRF chain model is trained with the Mallet package (McCallum, 2002). |

learning algorithm is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

- CRF (9)
- dependency path (7)
- POS tags (7)

Analysis | We next investigate the features that were given high weight by our learning algorithm (in the constituent parsing case). |

Web-count Features | A learning algorithm can then weight features so that they compare appropriately |

Web-count Features | As discussed in Section 5, the top features learned by our learning algorithm duplicate the handcrafted configurations used in previous work (Nakov and Hearst, 2005b) but also add numerous others, and, of course, apply to many more attachment types. |

learning algorithm is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

- n-grams (12)
- dependency parsing (11)
- Berkeley parser (10)

Experimental Evaluation | (b) DIRT: (Lin and Pantel, 2001) a widely-used rule learning algorithm.

Experimental Evaluation | (c) BInc: (Szpektor and Dagan, 2008) a directional rule learning algorithm.

Learning Typed Entailment Graphs | Our learning algorithm is composed of two steps: (1) Given a set of typed predicates and their instances extracted from a corpus, we train a (local) entailment classifier that estimates for every pair of predicates whether one entails the other. |

learning algorithm is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

- ILP (31)
- distributional similarity (7)
- WordNet (6)

Background | Topic modeling is an unsupervised learning algorithm that can automatically discover themes of a document collection. |

Background | Text classification is a supervised learning algorithm where documents’ categories are learned from a pre-labeled set of documents.

Experiments | Weka, a collection of machine learning algorithms for data mining tasks written in Java (Hall et al., 2009), was used to conduct classification.

learning algorithm is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

- topic model (23)
- SVM (14)
- LDA (6)

Conclusion | We have proposed new approaches to characterize verb classes in learning algorithms.

Conclusion | The advantage of kernel methods is that they can be directly used in some learning algorithms, e.g., SVMs, to train verb classifiers.

Introduction | Note that their coding in learning algorithms is rather complex: we need to take into account syntactic structures, which may require an exponential number of syntactic features (i.e., all their possible substructures). |

learning algorithm is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

- tree kernels (9)
- IM (5)
- semantic similarity (4)

Introduction | From this grid Barzilay and Lapata (2008) derive probabilities of transitions between adjacent sentences which are used as features for machine learning algorithms.

The Entity Grid Model | To make this representation accessible to machine learning algorithms , Barzilay and Lapata (2008) compute for each document the probability of each transition and generate feature vectors representing the sentences. |

The Entity Grid Model | (2011) use discourse relations to transform the entity grid representation into a discourse role matrix that is used to generate feature vectors for machine learning algorithms similarly to Barzilay and Lapata (2008). |
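
A sketch of how such transition-probability features can be computed from an entity grid (one row per sentence, one column per entity, roles in {S, O, X, -}; an illustration, not the authors' code):

```python
from collections import Counter
from itertools import product

def transition_features(grid, roles="SOX-"):
    """Probability of each length-2 column transition in an entity grid.
    grid: one string per sentence, one role character per entity."""
    counts, total = Counter(), 0
    for col in range(len(grid[0])):
        for row in range(len(grid) - 1):
            counts[(grid[row][col], grid[row + 1][col])] += 1
            total += 1
    return {t: counts[t] / total for t in product(roles, repeat=2)}

# two sentences, three entities: e1 goes S -> O, e2 goes - -> S, e3 X -> -
feats = transition_features(["S-X", "OS-"])
```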

learning algorithm is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

- coreference (19)
- graph-based (19)
- coreference resolution (15)

Experiments | Instead of using Mallet (McCallum, 2002) as a machine learning toolkit, we employ the Weka Data Mining Software (Hall et al., 2009) for classification, since it offers a wider range of state-of-the-art machine learning algorithms . |

Experiments | Learning Algorithms |

Experiments | We evaluated several learning algorithms from the Weka toolkit with respect to their performance on |

learning algorithm is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

- feature set (7)
- text classification (4)
- Cosine Similarity (3)

Abstract | Leveraging techniques from each of these areas, we develop a semantic parser for Freebase that is capable of parsing questions with an F1 that improves by 0.42 over a purely-supervised learning algorithm.

Introduction | On a dataset of 917 questions taken from 81 domains of the Freebase database, a standard learning algorithm for semantic parsing yields a parser with an F1 of 0.21, in large part because of the number of logical symbols that appear during testing but never appear during training. |

Previous Work | Owing to the complexity of the general case, researchers have resorted to defining standard similarity metrics between relations and attributes, as well as machine learning algorithms for learning and predicting matches between relations (Doan et al., 2004; Wick et al., 2008b; Wick et al., 2008a; Nottelmann and Straccia, 2007; Berlin and Motro, 2006). |

learning algorithm is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

- semantic parser (48)
- logical form (9)
- natural language (7)