Introduction | We describe scalable learning algorithms that induce general question templates and lexical variants of entities and relations.
Learning | PARALEX uses a two-part learning algorithm; it first induces an overly general lexicon (Section 5.1) and then learns to score derivations to increase accuracy (Section 5.2).
Learning | The lexical learning algorithm constructs a lexicon L from a corpus of question paraphrases C = |
Learning | Figure 1: Our lexicon learning algorithm.
Overview of the Approach | Learning: The learning algorithm induces a lexicon L and estimates the parameters θ of the linear ranking function.
Overview of the Approach | The final result is a scalable learning algorithm that requires no manual annotation of questions. |
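The linear ranking function mentioned in this overview, score(d) = θ · φ(d), can be sketched as follows. This is a minimal illustration, not the system's implementation; the feature names, weights, and derivations are invented placeholders.

```python
# A hedged sketch of a linear ranking function over derivation features:
# score(d) = theta . phi(d). Feature names and weights are illustrative.

def score(theta, features):
    """theta, features: dicts mapping feature name -> weight/value."""
    return sum(theta.get(f, 0.0) * v for f, v in features.items())

def rank(theta, derivations):
    """Return the highest-scoring derivation under the current parameters."""
    return max(derivations, key=lambda d: score(theta, d["features"]))

theta = {"lexical_match": 1.2, "template_prior": 0.5}
derivations = [
    {"id": "d1", "features": {"lexical_match": 1.0}},
    {"id": "d2", "features": {"lexical_match": 1.0, "template_prior": 1.0}},
]
best = rank(theta, derivations)
```

Learning then amounts to adjusting θ so that correct derivations outscore incorrect ones.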
Related Work | The learning algorithms presented in this paper are similar to algorithms used for paraphrase extraction from sentence-aligned corpora (Barzilay and McKeown, 2001; Barzilay and Lee, 2003; Quirk et al., 2004; Bannard and Callison-Burch, 2005; Callison-Burch, 2008; Marton et al., 2009). |
Abstract | Many discriminative learning algorithms are sensitive to such shifts because highly indicative features may swamp other indicative features. |
Abstract | Regularized and adversarial learning algorithms have been proposed to improve robustness to covariate shifts.
Abstract | We present a new perceptron learning algorithm using antagonistic adversaries and compare it to previous proposals on 12 multilingual cross-domain part-of-speech tagging datasets. |
Conclusion | We presented a discriminative learning algorithm for cross-domain structured prediction that appears more robust to covariate shifts than previous approaches.
Experiments | All learning algorithms do the same number of passes over each training data set. |
Introduction | Most learning algorithms assume that training and test data are governed by identical distributions and, more specifically in the case of part-of-speech (POS) tagging, that training and test sentences were sampled at random and are independently and identically distributed.
Introduction | Most discriminative learning algorithms only update parameters when training examples are misclassified.
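The classic example of that behavior is the perceptron: weights change only on a mistake. A minimal sketch on toy data (the feature vectors and labels are illustrative, not from any of the tagging datasets discussed here):

```python
# Standard perceptron: the weight vector is updated only when an
# example is misclassified (or lies exactly on the decision boundary).

def perceptron_train(examples, n_features, epochs=10):
    """examples: list of (feature_vector, label) with label in {-1, +1}."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for x, y in examples:
            score = sum(wi * xi for wi, xi in zip(w, x))
            if y * score <= 0:          # mistake: additive update
                for i in range(n_features):
                    w[i] += y * x[i]
            # correctly classified examples leave w untouched
    return w

# Toy linearly separable data: label follows the sign of the first feature.
data = [([1.0, 0.2], 1), ([0.8, -0.1], 1), ([-1.0, 0.3], -1), ([-0.7, 0.0], -1)]
w = perceptron_train(data, n_features=2)
```

Because correct predictions trigger no update, strongly indicative features seen early can come to dominate, which is the sensitivity to covariate shift noted above.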
Dependency Parser | Thus our system uses an unbounded number of transition relations, which poses an apparent disadvantage for learning algorithms.
Model and Training | In this section we introduce the adopted learning algorithm and discuss the model parameters. |
Model and Training | 4.1 Learning Algorithm |
Model and Training | Algorithm 2 Learning Algorithm |
Abstract | Experiments show that our systems can be consistently and significantly improved with rich lexical semantic information, regardless of the choice of learning algorithms.
Introduction | First, by incorporating the abundant information from a variety of lexical semantic models, the answer selection system can be enhanced substantially, regardless of the choice of learning algorithms and settings. |
Learning QA Matching Models | A binary classifier can be trained easily using any machine learning algorithm in this standard supervised learning setting. |
Learning QA Matching Models | We tested two learning algorithms in this setting: logistic regression and boosted decision trees (Friedman, 2001). |
Learning QA Matching Models | The former is the log-linear model widely used in the NLP community and the latter is a robust nonlinear learning algorithm that has shown great empirical performance. |
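As a concrete illustration of the first of those two learners, here is a bare-bones logistic regression trained by gradient ascent on the log-likelihood. This is a hedged sketch, not the paper's setup: the QA-pair feature vectors and labels below are invented toy data.

```python
import math

# Minimal logistic regression (the log-linear model mentioned above),
# trained by per-example gradient ascent. Data is illustrative only.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(examples, n_features, lr=0.5, epochs=200):
    """examples: list of (features, label) with label in {0, 1}."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for x, y in examples:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for i in range(n_features):
                w[i] += lr * (y - p) * x[i]  # gradient of the log-likelihood
    return w

# Toy data: first feature is a hypothetical "overlap" score, second a bias.
data = [([2.0, 1.0], 1), ([1.5, 1.0], 1), ([0.2, 1.0], 0), ([0.1, 1.0], 0)]
w = train_logreg(data, n_features=2)
```

In the standard supervised setting described above, any other binary classifier (such as the boosted decision trees) can be slotted in on the same feature vectors.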
Abstract | In this paper we propose a discriminative learning algorithm to take advantage of the linguistic knowledge in large amounts of natural annotations on the Internet. |
Character Classification Model | The classifier can be trained with online learning algorithms such as perceptron, or offline learning models such as support vector machines. |
Conclusion and Future Work | This work presents a novel discriminative learning algorithm to utilize the knowledge in the massive natural annotations on the Internet. |
Introduction | In this work we take word segmentation, one of the most important such problems, as an example and propose a novel discriminative learning algorithm to leverage the knowledge in massive natural annotations of web text.
Abstract | Leveraging techniques from each of these areas, we develop a semantic parser for Freebase that is capable of parsing questions with an F1 that improves by 0.42 over a purely-supervised learning algorithm.
Introduction | On a dataset of 917 questions taken from 81 domains of the Freebase database, a standard learning algorithm for semantic parsing yields a parser with an F1 of 0.21, in large part because of the number of logical symbols that appear during testing but never appear during training. |
Previous Work | Owing to the complexity of the general case, researchers have resorted to defining standard similarity metrics between relations and attributes, as well as machine learning algorithms for learning and predicting matches between relations (Doan et al., 2004; Wick et al., 2008b; Wick et al., 2008a; Nottelmann and Straccia, 2007; Berlin and Motro, 2006). |
Introduction | The contributions of this paper include (1) introduction of a loss function for structured RMM in the SMT setting, with surrogate reference translations and latent variables; (2) an online gradient-based solver, RM, with a closed-form parameter update to optimize the relative margin loss; and (3) an efficient implementation that integrates well with the open source cdec SMT system (Dyer et al., 2010). In addition, (4) as our solution is not dependent on any specific QP solver, it can be easily incorporated into practically any gradient-based learning algorithm.
Introduction | After background discussion on learning in SMT (§2), we introduce a novel online learning algorithm for relative margin maximization suitable for SMT (§3). |
The Relative Margin Machine in SMT | We address the above-mentioned limitations by introducing a novel online learning algorithm for relative margin maximization, RM. |
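The general skeleton of such an online learner can be sketched as follows. This is a toy illustration, not the RM solver: RM uses a closed-form update for the relative margin loss, whereas this sketch shows only the generic perceptron-style step of moving weights toward the gold features and away from the current best hypothesis. All feature names and values are invented.

```python
# A toy online large-margin-style update (NOT the RM closed-form update):
# reward the gold derivation's features, penalize the model's current best.

def online_update(w, feats_gold, feats_pred, lr=0.1):
    """w, feats_*: dicts mapping feature name -> weight/value.
    Returns a new weight dict after one update step."""
    new_w = dict(w)
    for f, v in feats_gold.items():
        new_w[f] = new_w.get(f, 0.0) + lr * v   # move toward gold
    for f, v in feats_pred.items():
        new_w[f] = new_w.get(f, 0.0) - lr * v   # move away from prediction
    return new_w

w = {}
w = online_update(w, feats_gold={"lm": 2.0}, feats_pred={"lm": 1.0, "len": 1.0})
```

A relative-margin variant would additionally bound how far the worst hypotheses fall below the best ones, rather than only enforcing a margin over the single prediction.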
Experiments | Instead of using Mallet (McCallum, 2002) as a machine learning toolkit, we employ the Weka Data Mining Software (Hall et al., 2009) for classification, since it offers a wider range of state-of-the-art machine learning algorithms.
Experiments | Learning Algorithms |
Experiments | We evaluated several learning algorithms from the Weka toolkit with respect to their performance on |
Introduction | From this grid Barzilay and Lapata (2008) derive probabilities of transitions between adjacent sentences which are used as features for machine learning algorithms . |
The Entity Grid Model | To make this representation accessible to machine learning algorithms, Barzilay and Lapata (2008) compute for each document the probability of each transition and generate feature vectors representing the sentences.
The Entity Grid Model | (2011) use discourse relations to transform the entity grid representation into a discourse role matrix that is used to generate feature vectors for machine learning algorithms similarly to Barzilay and Lapata (2008). |
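The transition-probability features described above can be sketched as follows. This is a minimal illustration in the spirit of Barzilay and Lapata (2008), not their implementation; the entity grid and role alphabet below are invented toy examples.

```python
from collections import Counter
from itertools import product

# Turn an entity grid into transition-probability features: for each
# role n-gram over adjacent sentences, its relative frequency in the grid.

ROLES = ["S", "O", "X", "-"]  # subject, object, other role, absent

def transition_features(grid, length=2):
    """grid: dict entity -> list of roles, one per sentence.
    Returns a dict mapping each role n-gram to its probability."""
    counts = Counter()
    total = 0
    for roles in grid.values():
        for i in range(len(roles) - length + 1):
            counts[tuple(roles[i:i + length])] += 1
            total += 1
    return {t: counts[t] / total for t in product(ROLES, repeat=length)}

# Toy grid: two entities tracked over three sentences.
grid = {"Obama": ["S", "S", "-"], "speech": ["O", "-", "X"]}
feats = transition_features(grid)
```

The resulting fixed-length vector (one dimension per possible transition) is what gets handed to the downstream classifier.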
Background | Topic modeling is an unsupervised learning technique that automatically discovers the themes of a document collection.
Background | Text classification is a supervised learning task in which documents’ categories are learned from a pre-labeled set of documents.
Experiments | Classification was conducted with Weka, a collection of machine learning algorithms for data mining tasks written in Java (Hall et al., 2009).