Abstract | This makes supervised machine learning difficult, through a combination of noisy features and unbalanced class distributions. |
Background | We ground this paper’s discussion of machine learning with a real problem, turning to the annotation of empowerment language in chat. |
Background | Users, of course, do not express empowerment in every thread in which they participate, which leads to a challenge for machine learning. |
Conclusion | Our experiments show that this model significantly improves machine learning performance. |
Introduction | While machine learning is highly effective for annotation tasks with relatively balanced labels, such as sentiment analysis (Pang and Lee, 2004), more complex social functions are often rarer. |
Introduction | We propose adaptations to existing machine learning algorithms which improve recognition of rare annotations in conversational text data. |
Introduction | We introduce the domain of empowerment in support contexts, along with previous studies on the challenges that these annotations (and similar others) bring to machine learning. |
Prediction | This approach is designed to bias the prediction of our machine learning algorithms in favor of minority classes in a coherent manner. |
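To make the minority-class biasing concrete, here is a minimal sketch of one standard way to achieve it, inverse-frequency class weighting in scikit-learn; the learner, the weighting scheme and the toy data are illustrative assumptions, not the paper's specific adaptation.

```python
# Sketch: biasing predictions toward minority classes via class weights.
# The learner (logistic regression) and the inverse-frequency weighting
# are illustrative assumptions, not the paper's specific adaptation.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["thanks, that really helped me", "what time is the meeting",
         "i feel much stronger now", "see you tomorrow"]
labels = ["empowerment", "other", "empowerment", "other"]

X = CountVectorizer().fit_transform(texts)
# class_weight="balanced" reweights each class inversely to its frequency,
# so rare annotations contribute as much to the loss as frequent ones.
clf = LogisticRegression(class_weight="balanced").fit(X, labels)
print(clf.predict(X))
```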
Abstract | By evaluating our model on the TempEval data we show that this approach leads to about 2% higher accuracy for all three types of relations, and to the best results for the task when compared to those of other machine learning based systems. |
Introduction | With the introduction of the TimeBank corpus (Pustejovsky et al., 2003), a set of documents annotated with temporal information, it became possible to apply machine learning to temporal ordering (Boguraev and Ando, 2005; Mani et al., 2006). These tasks have been regarded as essential for complete document understanding and are useful for a wide range of NLP applications such as question answering and machine translation. |
Introduction | First, it allows us to use off-the-shelf machine learning software that, up until now, has been mostly focused on the case of local classifiers. |
Introduction | Hence, in our future work we can focus entirely on temporal relations, as opposed to inference or learning techniques for machine learning. |
Markov Logic | It has long been clear that local classification alone cannot adequately solve all prediction problems we encounter in practice. This observation motivated a field within machine learning, often referred to as Statistical Relational Learning (SRL), which focuses on the incorporation of global correlations that hold between statistical variables (Getoor and Taskar, 2007). |
Results | Note that all but the strict scores of Task C are achieved by WVALI (Puscasu, 2007), a hybrid system that combines machine learning and hand-coded rules. |
Temporal Relation Identification | With the introduction of the TimeBank corpus (Pustejovsky et al., 2003), machine learning approaches to temporal ordering became possible. |
Temporal Relation Identification | Here one could argue that “the introduction of the TimeBank” may OVERLAP with “Machine learning becoming possible” because “introduction” can be understood as a process that is not finished with the release of the data but also includes later advertisements and announcements. |
Comparing the two Datasets | The authors’ objectives were to see “whether a ‘lite’ approach of this kind could yield reasonable performance, before pursuing possibilities that relied on ‘deeper’ NLP analysis methods”, “which of the features would contribute positively to system performance” and “if any [machine learning] approach was better suited to the TempEval tasks”. |
Comparing the two Datasets | For us, the results of (Hepple et al., 2007) are interesting as they allow for a straightforward evaluation of our adaptation efforts, since the same machine learning implementations can be used with the Portuguese data, and then compared to their results. |
Comparing the two Datasets | Table 2: Performance of several machine learning algorithms on the English TempEval-1 training data, with cross-validation. |
Introduction | Supervised machine learning approaches are pervasive in the tasks of temporal information processing. |
Introduction | Even when the best performing systems in these competitions are symbolic, there are machine learning solutions with results close to their performance. |
Introduction | In the TERN2004 competition (aimed at identifying and normalizing temporal expressions), a symbolic system performed best, but since then machine learning solutions, such as (Ahn et al., 2007), have appeared that obtain similar results. |
Domain Adaptation in Sentiment Research | Most text-level sentiment classifiers use standard machine learning techniques to learn and select features from labeled corpora. |
Domain Adaptation in Sentiment Research | There are two alternatives to supervised machine learning that can be used to get around this problem: on the one hand, general lists of sentiment clues/features can be acquired from domain-independent sources such as dictionaries or the Internet; on the other hand, unsupervised and weakly-supervised approaches can be used to take advantage of a small number of annotated in-domain examples and/or of unlabelled in-domain data. |
Domain Adaptation in Sentiment Research | On other domains, such as product reviews, the performance of systems that use general word lists is comparable to the performance of supervised machine learning approaches (Gamon and Aue, 2005). |
Experiments | results depends on the genre and size of the n-gram: on product reviews, all results are statistically significant at the α = 0.025 level; on movie reviews, the difference between Naïve Bayes and SVM is statistically significant at α = 0.01 but the significance diminishes as the size of the n-gram increases; on news, only bigrams produce a statistically significant (α = 0.01) difference between the two machine learning methods, while on blogs the difference between SVMs and Naïve Bayes is most pronounced when unigrams are used (α = 0.025). |
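As an illustration of how such per-genre comparisons can be tested, the sketch below runs a paired t-test over hypothetical per-fold accuracies of SVM and Naïve Bayes; the choice of test and all of the numbers are assumptions, since the excerpt does not state which significance test was used.

```python
# Sketch: comparing two classifiers' per-fold accuracies with a paired t-test.
# The test choice and the accuracy values are illustrative assumptions.
from scipy import stats

svm_acc = [0.81, 0.79, 0.83, 0.80, 0.82]   # hypothetical per-fold accuracies
nb_acc  = [0.76, 0.75, 0.78, 0.77, 0.74]

t, p = stats.ttest_rel(svm_acc, nb_acc)
print(f"t = {t:.2f}, p = {p:.4f}")  # significant at a given alpha if p < alpha
```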
Integrating the Corpus-based and Dictionary-based Approaches | For this reason, the numbers reported for the corpus-based classifier do not reflect the full potential of machine learning approaches when sufficient in-domain training data is available. |
Introduction | One of the emerging directions in NLP is the development of machine learning methods that perform well not only on the domain on which they were trained, but also on other domains, for which training data is not available or is not sufficient to ensure adequate machine learning. |
Introduction | For example, Osborne (2002) evaluates noise tolerance of shallow parsers, with random classification noise taken to be “crudely approximating annotation errors.” It has been shown, both theoretically and empirically, that this type of noise is tolerated well by the commonly used machine learning algorithms (Cohen, 1997; Blum et al., 1996; Osborne, 2002; Reidsma and Carletta, 2008). |
Introduction | When training data comes from one annotator and test data from another, the first annotator’s biases are sometimes systematic enough for a machine learner to pick them up, with detrimental results for the algorithm’s performance on the test data. |
Introduction | The different biases might not amount to much in the small doubly annotated subset, resulting in acceptable inter-annotator agreement; yet when enacted throughout a large number of instances they can be detrimental from a machine learner’s perspective. |
Experiments | In particular, the use of SVMs in (Pang et al., 2002) initially sparked interest in using machine learning methods for sentiment classification. |
Introduction | These methodologies are likely to be rooted in natural language processing and machine learning techniques. |
Introduction | Automatically classifying the sentiment expressed in a blog around selected topics of interest is a canonical machine learning task in this discussion. |
Introduction | However, the treatment of such dictionaries as forms of prior knowledge that can be incorporated in machine learning models is a relatively less explored topic; even less so in conjunction with semi-supervised models that attempt to utilize unlabeled data. |
Related Work | In this section, we briskly cover related work to position our contributions appropriately in the sentiment analysis and machine learning literature. |
Related Work | In this regard, our model brings two interrelated but distinct themes from machine learning to bear on this problem: semi-supervised learning and learning from labeled features. |
Related Work | Most work in machine learning literature on utilizing labeled features has focused on using them to generate weakly labeled examples that are then used for standard supervised learning: (Schapire et al., 2002) propose one such framework for boosting logistic regression; (Wu and Srihari, 2004) build a modified SVM and (Liu et al., 2004) use a combination of clustering and EM based methods to instantiate similar frameworks. |
Abstract | We combine several graph alignment features with lexical semantic similarity measures using machine learning techniques and show that the student answers can be more accurately graded than if the semantic measures were used in isolation. |
Answer Grading System | We define a total of 68 features to be used to train our machine learning system to compute node-node (more specifically, subgraph-subgraph) matches. |
Introduction | In this paper, we explore the possibility of improving upon existing bag-of-words (BOW) approaches to short answer grading by utilizing machine learning techniques. |
Introduction | First, to what extent can machine learning be leveraged to improve upon existing approaches to short answer grading. |
Related Work | A later implementation of the Oxford-UCLES system (Pulman and Sukkarieh, 2005) compares several machine learning techniques, including inductive logic programming, decision tree learning, and Bayesian learning, to the earlier pattern matching approach, with encouraging results. |
Results | Before applying any machine learning techniques, we first test the quality of the eight graph alignment features independently. |
Conclusion | We have conducted an exhaustive evaluation with multiple machine learning classifiers and different feature sets, spanning from lexical information to the psychological categories developed by Tausczik and Pennebaker (2010). |
Related Work | Multiple techniques have been employed, from various machine learning classifiers, to clustering and topic models. |
Task A: Polarity Classification | We tested five different machine learning algorithms: Naïve Bayes, SVM with a polynomial kernel, SVM with an RBF kernel, AdaBoost and Stacking, out of which AdaBoost performed the best. |
Task A: Polarity Classification | For our metaphor polarity task, we use LIWC’s statistics of all 64 categories and feed this information as features for the machine learning classifiers. |
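A minimal sketch of how per-document LIWC category statistics could be mapped onto a fixed-length feature vector for the classifiers; the category names, the input dict, and the helper liwc_features are hypothetical, and LIWC itself is an external tool assumed to have already produced the statistics.

```python
# Sketch: turning LIWC category statistics into a fixed-length feature vector.
# Category names and values are hypothetical; LIWC is a separate, proprietary
# tool whose per-document output is assumed to be available as a dict.
LIWC_CATEGORIES = ["posemo", "negemo", "cogmech", "social"]  # 64 in the paper

def liwc_features(liwc_stats):
    """Map a {category: proportion} dict onto a fixed-order feature vector."""
    return [liwc_stats.get(cat, 0.0) for cat in LIWC_CATEGORIES]

doc_stats = {"posemo": 3.1, "negemo": 0.4, "social": 7.9}
print(liwc_features(doc_stats))  # [3.1, 0.4, 0.0, 7.9]
```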
Task A: Polarity Classification | To summarize, in this section we have defined the task of polarity classification and we have presented a machine learning solution. |
Task B: Valence Prediction | The learned lessons from this study are: (1) valence prediction is a much harder task than polarity classification both for human annotation and for the machine learning algorithms; (2) the obtained results showed that despite its difficulty this is still a plausible problem; (3) similarly to the polarity classification task, valence prediction with LIWC is improved when shorter contexts (the metaphor/source/target information source) are considered. |
A Machine Learning based approach | We now propose a machine learning based approach to detect thwarting in documents. |
Abstract | In this paper, we propose a working definition of thwarting amenable to machine learning and create a system that detects if the document is thwarted or not. |
Abstract | We show that machine learning with annotated corpora (thwarted/non-thwarted) is more effective than the rule based system. |
Introduction | In section 5 we discuss a machine learning based approach which could be used to identify whether a document is thwarted or not. |
Results | Table 1 shows the results for the experiments with the machine learning model. |
Results | Table 1: Results of the machine learning based approach to thwarting detection |
Introduction | To detect and correct grammatical errors, two different approaches are typically used — knowledge engineering or machine learning. |
Introduction | In contrast, the machine learning approach formulates the task as a classification problem based on learning from training data. |
Introduction | On the other hand, the machine learning approach can learn from texts written by ESL learners where grammatical errors have been annotated. |
Related Work | As such, the machine learning approach has become the dominant approach in grammatical error correction. |
Related Work | Previous work in the machine learning approach typically formulates the task as a classification problem. |
Abstract | In particular, we use machine learning techniques to identify social power relationships between members of a social network, based purely on the content of their interpersonal communication. |
Abstract | Then, we apply machine learning to train classifiers with groups of these n-grams as features. |
Abstract | Our approach is corpus-driven like the Naïve Bayes approach, but we interject statistically driven feature selection between the corpus and the machine learning classifiers. |
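A small sketch of the general idea of interposing statistically driven feature selection between n-gram extraction and the classifier, here with a chi-squared selector and a linear SVM in scikit-learn; the selector, the value of k, and the toy data are assumptions rather than the authors' exact pipeline.

```python
# Sketch: statistically driven feature selection (chi-squared) inserted
# between n-gram extraction and the classifier. Selector, k and data are
# illustrative assumptions.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.svm import LinearSVC

pipeline = Pipeline([
    ("ngrams", CountVectorizer(ngram_range=(1, 2))),
    ("select", SelectKBest(chi2, k=10)),   # keep the k most informative n-grams
    ("clf", LinearSVC()),
])

texts = ["please review the attached report", "let's grab lunch sometime",
         "I need this on my desk by Friday", "great seeing you yesterday"] * 3
labels = ["boss", "peer", "boss", "peer"] * 3
pipeline.fit(texts, labels)
print(pipeline.predict(["send me the report by Monday"]))
```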
Introduction | We will argue that the automatic identification of generic expressions should be cast as a machine learning problem instead of a rule-based approach, as there is (i) no transparent marking of genericity in English (as in most other European languages) and (ii) the phenomenon is highly context dependent. |
Introduction | In this paper, we build on insights from formal semantics to establish a corpus-based machine learning approach for the automatic classification of generic expressions. |
Introduction | In our view, these observations call for a corpus-based machine learning approach that is able to capture a variety of factors indicating genericity in combination and in context. |
Introduction | In combination with machine learning methods, several statistical dependency parsing models have reached comparable high parsing accuracy (McDonald et al., 2005b; Nivre et al., 2007b). |
Parser Domain Adaptation | In recent years, two statistical dependency parsing systems, MaltParser (Nivre et al., 2007b) and MSTParser (McDonald et al., 2005b), representing different threads of research in data-driven machine learning approaches, have obtained high publicity for their state-of-the-art performances in open competitions such as the CoNLL Shared Tasks. |
Parser Domain Adaptation | Despite the differences between their approaches, both systems heavily rely on machine learning methods to estimate the parsing model from an annotated corpus used as a training set. |
Parser Domain Adaptation | Most of these approaches focused on the machine learning perspective instead of the linguistic knowledge embraced in the parsers. |
Discussion | Our work, however, focuses on developing a novel method which explores the relationship between machine learning models and the physical world, in order to investigate these models through the physical rules that describe our universe. |
Discussion | We hope our attempt will shed some light upon the application of quantum theory into the field of machine learning . |
Introduction | Some researchers have employed the principle and technology of quantum computation to improve the studies on Machine Learning (ML) (Aïmeur et al., 2006; Aïmeur et al., 2007; Chen et al., 2008; Gambs, 2008; Horn and Gottlieb, 2001; Nasios and Bors, 2007), a field which studies theories and constructions of systems that can learn from data, among which classification is a typical task. |
Introduction | build a computational model based on quantum computation theory to handle classification tasks in order to prove the feasibility of applying the QM model to machine learning . |
Feature Design | The proposed parsing algorithms both rely on machine learning methods. |
Feature Design | The shift-reduce parser (SRP) trains a machine learning classifier as the oracle o ∈ (C → T) to predict a transition t from a parser configuration c = (L1, L2, Q, E), using node features such as the heads of L1, L2 and Q, and edge features from the already predicted temporal relations in E. The graph-based maximum spanning tree (MST) parser trains a machine learning model to predict SCORE(e) for an edge e = (wi, rj, wk), using features of the nodes wi and wk. |
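A rough sketch of what an oracle feature extractor over a configuration c = (L1, L2, Q, E) might look like; the dictionary encoding, the feature names and the toy inputs are assumptions based only on the description above.

```python
# Sketch: extracting oracle features from a shift-reduce parser configuration
# c = (L1, L2, Q, E). The layout and feature names loosely follow the
# description above; the encoding itself is assumed.
def config_features(l1, l2, queue, edges):
    feats = {
        "L1.head": l1[-1] if l1 else "<none>",      # node feature: head of L1
        "L2.head": l2[-1] if l2 else "<none>",      # node feature: head of L2
        "Q.head": queue[0] if queue else "<none>",  # node feature: front of Q
    }
    # edge features from temporal relations already predicted in E
    for (src, rel, tgt) in edges:
        feats[f"edge:{rel}"] = feats.get(f"edge:{rel}", 0) + 1
    return feats

print(config_features(["ev1"], [], ["ev2", "ev3"], [("ev0", "BEFORE", "ev1")]))
```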
Parsing Models | The oracle o is typically defined as a machine learning classifier, which characterizes a parser configuration c in terms of a set of features. |
Parsing Models | The SCORE function is typically defined as a machine learning model that scores an edge based on a set of features. |
Introduction | For example, as shown in Figure 1, with the background knowledge that both Learning and Graphical models are topics related to Machine learning, while Machine learning is a subdomain of Computer science, a human can easily determine that the two “Michael Jordan” mentions in the 1st and 4th observations represent the same person. |
Introduction | [Figure 1 excerpt: an example observation reading “1) Michael Jordan is a … in Machine learning”, alongside concept labels such as Machine learning and Probability Theory] |
The Structural Semantic Relatedness Measure | [Figure/table excerpt: semantic relatedness scores between Machine learning and concepts such as Statistics, Basketball and MVP] |
Introduction | Traditional machine learning relies on the availability of a large amount of data to train a model, which is then applied to test data in the same feature space. |
Introduction | Various machine learning strategies have been proposed to address this problem, including semi-supervised learning (Zhu, 2007), domain adaptation (Wu and Dietterich, 2004; Blitzer et al., 2006; Blitzer et al., 2007; Arnold et al., 2007; Chan and Ng, 2007; Daume, 2007; Jiang and Zhai, 2007; Reichart |
Introduction | To consider how heterogeneous transfer learning relates to other types of learning, Figure 1 presents an intuitive illustration of four learning strategies, including traditional machine learning, transfer learning across different distributions, multi-view learning and heterogeneous transfer learning. |
Related Works | However, because the labeled Chinese Web pages are still not sufficient, we often find it difficult to achieve high accuracy by applying traditional machine learning algorithms to the Chinese Web pages directly. |
Abstract | Our method appropriately inserts linefeeds into a sentence by machine learning, based on information such as dependencies, clause boundaries, pauses and line length. |
Conclusion | Our method can insert linefeeds so that captions become easy to read, by using machine learning techniques on features such as morphemes, dependencies, clause boundaries, pauses and line length. |
Introduction | In our method, linefeeds are inserted only at the boundaries between bunsetsus, and they are appropriately inserted into a sentence by machine learning, based on information such as morphemes, dependencies, clause boundaries, pauses and line length. |
Preliminary Analysis about Linefeed Points | In our research, the points at which linefeeds should be inserted are detected by using machine learning. |
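As a sketch of how each bunsetsu boundary could be cast as a classification instance (insert a linefeed or not), the function below builds a small feature dictionary per boundary; the feature set and the Japanese example are assumptions that only loosely follow the cues listed above (dependencies, clause boundaries, line length).

```python
# Sketch: each bunsetsu boundary becomes a binary "insert a linefeed here?"
# instance. Feature names and the toy example are assumptions.
def boundary_features(bunsetsus, i, line_len):
    return {
        "left_bunsetsu": bunsetsus[i],
        "right_bunsetsu": bunsetsus[i + 1],
        "line_length": line_len,                        # tokens on the current line
        "clause_boundary": bunsetsus[i].endswith("、"),  # crude clause-boundary cue
    }

bunsetsus = ["今日は", "天気が", "よいので、", "散歩に", "行きます"]
print(boundary_features(bunsetsus, 2, 3))
```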
Introduction | We show that the data generated this way is highly reliable and can be used to train a machine learning algorithm. |
Language Identification | We then combine all three models in a machine learning framework using a novel approach. |
Language Identification | This way, we built a robust machine learning framework at a very low cost and without any human labour. |
Language Identification | We used the Weka Machine Learning Toolkit (Witten and Frank, 2005) to implement our DT classifier. |
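The combination step can be pictured as stacking: the scores of the component models become the features of a decision-tree classifier. The authors used the Weka toolkit; the scikit-learn analogue below, with invented scores, is only meant to illustrate that idea.

```python
# Sketch: combining the scores of three component models as features of a
# decision-tree classifier (a Python stand-in for the Weka DT setup).
# All scores and labels below are invented.
from sklearn.tree import DecisionTreeClassifier

# each row: [char n-gram model score, word model score, dictionary model score]
X = [[0.9, 0.8, 0.7], [0.2, 0.1, 0.3], [0.85, 0.9, 0.6], [0.15, 0.2, 0.1]]
y = ["lang_A", "lang_B", "lang_A", "lang_B"]

dt = DecisionTreeClassifier(max_depth=3).fit(X, y)
print(dt.predict([[0.8, 0.75, 0.65]]))
```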
Conclusion and Future Work | The basic idea is to measure the accuracy improvements of the PPI extraction task by incorporating the parser output as statistical features of a machine learning classifier. |
Evaluation Methodology | the parser output is embedded as statistical features of a machine learning classifier. |
Evaluation Methodology | Recent studies on PPI extraction demonstrated that dependency relations between target proteins are effective features for machine learning classifiers (Katrenko and Adriaans, 2006; Erkan et al., 2007; Seetre et al., 2007). |
Introduction | Our approach to parser evaluation is to measure accuracy improvement in the task of identifying protein-protein interaction (PPI) information in biomedical papers, by incorporating the output of different parsers as statistical features in a machine learning classifier (Yakushiji et al., 2005; Katrenko and Adriaans, 2006; Erkan et al., 2007; Seetre et al., 2007). |
Introduction | In every SMT system, and in machine learning in general, the goal of learning is to find a |
Introduction | Now, recent advances in machine learning have shown that the generalization ability of these learners can be improved by utilizing second order information, as in the Second Order Percep-tron (Cesa-Bianchi et al., 2005), Gaussian Margin Machines (Crammer et al., 2009b), confidence-weighted learning (Dredze and Crammer, 2008), AROW (Crammer et al., 2009a; Chiang, 2012) and Relative Margin Machines (RMM) (Shivaswamy and Jebara, 2009b). |
Introduction | Unfortunately, not all advances in machine learning are easy to apply to structured prediction problems such as SMT; the latter often involve latent variables and surrogate references, resulting in loss functions that have not been well explored in machine learning (Mcallester and Keshet, 2011; Gimpel and Smith, 2012). |
Learning in SMT | RAMPION aims to address the disconnect between MT and machine learning by optimizing a structured ramp loss with a concave-convex procedure. |
Introduction | Ensemble methods are widely used in machine learning and have been shown to be often very effective (Breiman, 1996; Freund and Schapire, 1997; Smyth and Wolpert, 1999; MacKay, 1991; Freund et al., 2004). |
Introduction | These models may have been derived using other machine learning algorithms or they may be based on |
Introduction | Variants of the ensemble problem just formulated have been studied in the past in the natural language processing and machine learning literature. |
Introduction | Later, with the release of manually annotated corpora such as the Penn Discourse Treebank 2.0 (PDTB) (Prasad et al., 2008), recent studies performed implicit discourse relation recognition on natural (i.e., genuine) implicit discourse data (Pitler et al., 2009; Lin et al., 2009; Wang et al., 2010) with the use of linguistically informed features and machine learning algorithms. |
Related Work | In their work, they collected word pairs from a synthetic data set as features and used a machine learning method to classify implicit discourse relations. |
Related Work | Multitask learning is a kind of machine learning method, which learns a main task together with |
Experiments | This paper represents one step towards the reconciliation of traditional formal approaches to compositional semantics with modern machine learning. |
Introduction | In this paper we bridge the gap between recent advances in machine learning and more traditional approaches within computational linguistics. |
Introduction | We show that this combination of state of the art machine learning and an advanced linguistic formalism translates into concise models with competitive performance on a variety of tasks. |
Abstract | Informative catenae are selected using supervised machine learning with linguistically informed features and compared to both nonlinguistic terms and catenae selected heuristically with filters derived from work on paths. |
Introduction | We also extend previous work with development of a linguistically informed, supervised machine learning technique for selection of informative catenae. |
Introduction | We also develop a linguistically informed machine learning technique for catenae selection that captures both key aspects of heuristic filters, and novel characteristics of catenae and paths. |
Introduction | The state-of-the-art approaches for solving this problem, such as (Go et al., 2009; Barbosa and Feng, 2010), basically follow (Pang et al., 2002), who utilize machine learning based classifiers for the sentiment classification of texts. |
Related Work | According to the experimental results, machine learning based classifiers outperform the unsupervised approach, where the best performance is achieved by the SVM classifier with unigram presences as features. |
Related Work | (Go et al., 2009; Parikh and Movassate, 2009; Barbosa and Feng, 2010; Davidiv et al., 2010) all follow the machine learning based approach for sentiment classification of tweets. |
Abstract | With the help of supervised machine learning , we achieve an accuracy of .87 for this task. |
Conclusion | We have presented a machine learning system to automatically detect corresponding edit-turn-pairs. |
Machine Learning with Edit-Turn-Pairs | We used DKPro TC (Daxenberger et al., 2014) to carry out the machine learning experiments on edit-turn-pairs. |
Abstract | Many machine learning datasets are noisy with a substantial number of mislabeled instances. |
Feature Weighting Methods | ing, better classification and regression models can be built by using the feature weights generated by these models as a pre-weight on the data points for other machine learning algorithms. |
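One simple way to realize such pre-weighting in practice is to pass per-instance weights into another learner's training procedure; the sketch below uses scikit-learn's sample_weight argument with invented data and weights, as an illustration rather than the paper's method.

```python
# Sketch: per-instance weights derived from one model act as "pre-weights"
# when training another model. Data and weights are invented.
from sklearn.linear_model import LogisticRegression

X = [[0.1], [0.9], [0.4], [0.8]]
y = [0, 1, 0, 1]
instance_weights = [1.0, 0.3, 0.9, 1.0]   # e.g. confidence in each label

clf = LogisticRegression().fit(X, y, sample_weight=instance_weights)
print(clf.predict([[0.5]]))
```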
Related Work | It uses the difference between the low quality label for each data point and a prediction of the label using supervised machine learning models built upon the low quality labels. |
Copula Models for Text Regression | In NLP, many statistical machine learning methods that capture the dependencies among random variables, including topic models (Blei et al., 2003; Lafferty and Blei, 2005; Wang et al., 2012), always have to make assumptions about the underlying distributions of the random variables, and make use of informative priors. |
Datasets | This mixed form of formal statement and informal speech brought difficulties to machine learning algorithms. |
Introduction | Copula models (Schweizer and Sklar, 1983; Nelsen, 1999) are often used by statisticians (Genest and Favre, 2007; Liu et al., 2012; Masarotto and Varin, 2012) and economists (Chen and Fan, 2006) to study the bivariate and multivariate stochastic dependency among random variables, but they are very new to the machine learning (Ghahramani et al., 2012; Han et al., 2012; Xiang and Neville, 2013; Lopez-paz et al., 2013) and related communities (Eickhoff et al., 2013). |
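For reference, the relationship that copula models exploit is Sklar's theorem: any joint distribution factors into its marginals plus a copula that carries the entire dependency structure.

```latex
% Sklar's theorem: F is the joint CDF, F_1, ..., F_d are the marginal CDFs,
% and C is the copula that captures the dependence among the variables.
F(x_1, \dots, x_d) \;=\; C\bigl(F_1(x_1), \dots, F_d(x_d)\bigr)
```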
Abstract | Most existing machine learning approaches suffer from limitations in the modeling of complex linguistic structures across sentences and often fail to capture nonlocal contextual cues that are important for sentiment interpretation. |
Introduction | machine learning algorithms with rich features and take into account the interactions between words to handle compositional effects such as polarity reversal (e.g. |
Related Work | Existing machine learning approaches for the task can be classified based on the use of two ideas. |
Introduction | From this grid Barzilay and Lapata (2008) derive probabilities of transitions between adjacent sentences which are used as features for machine learning algorithms. |
The Entity Grid Model | To make this representation accessible to machine learning algorithms, Barzilay and Lapata (2008) compute for each document the probability of each transition and generate feature vectors representing the sentences. |
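A compact sketch of the transition-probability computation described above, assuming a tiny invented grid in which rows are sentences, columns are entities and cells are syntactic roles (S, O, X, or '-'); the relative frequencies of the length-2 transitions become the feature vector.

```python
# Sketch: length-2 transition probabilities from an entity grid, in the
# spirit of Barzilay and Lapata (2008). The grid below is invented.
from collections import Counter
from itertools import product

grid = [["S", "O", "-"],
        ["O", "-", "X"],
        ["-", "-", "X"]]

transitions = Counter()
for col in range(len(grid[0])):
    for row in range(len(grid) - 1):
        transitions[(grid[row][col], grid[row + 1][col])] += 1
total = sum(transitions.values())

# one feature per possible transition, in a fixed order
features = [transitions[t] / total for t in product("SOX-", repeat=2)]
print(features)
```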
The Entity Grid Model | (2011) use discourse relations to transform the entity grid representation into a discourse role matrix that is used to generate feature vectors for machine learning algorithms similarly to Barzilay and Lapata (2008). |
Gaussian Process Regression | Machine learning models for quality estimation typically treat the problem as regression, seeking to model the relationship between features of the text input and the human quality judgement as a continuous response variable. |
Gaussian Process Regression | In this paper we consider Gaussian Processes (GP) (Rasmussen and Williams, 2006), a probabilistic machine learning framework incorporating kernels and Bayesian non-parametrics, widely considered state-of-the-art for regression. |
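A minimal GP regression sketch using scikit-learn; the RBF-plus-noise kernel and the toy feature/score pairs are assumptions for illustration, not the kernel configuration used in the paper.

```python
# Sketch: Gaussian Process regression for quality estimation.
# Kernel choice and data are illustrative assumptions.
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

X = [[17, 0.42], [25, 0.31], [9, 0.77], [31, 0.25]]  # e.g. length, LM score
y = [3.4, 2.1, 4.6, 1.8]                              # human quality judgements

gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel()).fit(X, y)
mean, std = gp.predict([[20, 0.5]], return_std=True)
print(mean, std)   # predictive mean and uncertainty for a new input
```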
Introduction | Most empirical work in Natural Language Processing (NLP) is based on supervised machine learning techniques which rely on human annotated data of some form or another. |
Abstract | Evidence from machine learning indicates that increasing the training sample size results in better prediction. |
Introduction | This contradicts theoretical and practical evidence from machine learning that suggests that larger training samples should be beneficial to improve prediction also in SMT. |
Related Work | The focus of many approaches thus has been on feature engineering and on adaptations of machine learning algorithms to the special case of SMT (where gold standard rankings have to be created automatically). |
Abstract | We demonstrate how supervised discriminative machine learning techniques can be used to automate the assessment of ‘English as a Second or Other Language’ (ESOL) examination scripts. |
Introduction | Different techniques have been used, including cosine similarity of vectors representing text in various ways (Attali and Burstein, 2006), often combined with dimensionality reduction techniques such as Latent Semantic Analysis (LSA) (Landauer et al., 2003), generative machine learning models (Rudner and Liang, 2002), domain-specific feature extraction (Attali and Burstein, 2006), and/or modified syntactic parsers (Lonsdale and Strong-Krause, 2003). |
Introduction | We address automated assessment as a supervised discriminative machine learning problem and particularly as a rank preference problem (Joachims, 2002). |
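The rank preference formulation can be reduced to binary classification over pairwise difference vectors, in the spirit of Joachims (2002); the sketch below shows that reduction with invented script features and grades.

```python
# Sketch: rank preference learning reduced to binary classification over
# pairwise difference vectors (ranking-SVM style). Data is invented.
import numpy as np
from sklearn.svm import LinearSVC

scripts = np.array([[3.0, 1.0], [2.0, 4.0], [5.0, 2.0]])  # script features
grades = np.array([2, 3, 1])                               # examiner grades

pairs, prefs = [], []
for i in range(len(scripts)):
    for j in range(len(scripts)):
        if grades[i] != grades[j]:
            pairs.append(scripts[i] - scripts[j])       # difference vector
            prefs.append(1 if grades[i] > grades[j] else 0)

ranker = LinearSVC().fit(pairs, prefs)
print(ranker.coef_)   # weight vector that orders scripts by predicted quality
```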
Introduction | Proposed solutions to NER fall into three categories: 1) rule-based methods (Krupka and Hausman, 1998); 2) machine learning based methods (Finkel and Manning, 2009; Singh et al., 2010); and 3) hybrid methods (Jansche and Abney, 2002). |
Our Method | Algorithm 1 outlines our method, where: trainc and traink denote two machine learning processes to get the CRF labeler and the KNN classifier, respectively; reprw converts a word in a tweet into a bag-of-words vector; the reprt function transforms a tweet into a feature matrix that is later fed into the CRF model; the knn function predicts the class of a word; the update function applies the class predicted by KNN to the input tweet; the crf function conducts word-level NE labeling; τ and γ represent the minimum labeling confidence of KNN and CRF, respectively, which are experimentally set to 0.1 and 0.001; N (1,000 in our work) denotes the maximum number of newly accumulated training data. |
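A highly simplified, runnable sketch of the loop that Algorithm 1 describes: KNN pre-labels words whose confidence reaches τ, a sequence labeler (a CRF in the paper) produces the final labels, and confident outputs are accumulated as new training data up to the cap N. Every function and constant below is a stand-in; none of it is the authors' implementation.

```python
# Sketch of the semi-supervised loop: KNN pre-labeling -> CRF labeling ->
# accumulation of confident predictions as new training data.
TAU, GAMMA, CAP = 0.1, 0.001, 1000
new_training_data = []

def knn_predict(word):                      # stand-in for the KNN classifier
    return ("PERSON", 0.6) if word.istitle() else ("O", 0.9)

def crf_label(words, hints):                # stand-in for the CRF labeler
    return [(hint if conf >= TAU else "O", 0.5) for hint, conf in hints]

def process(tweet):
    words = tweet.split()
    hints = [knn_predict(w) for w in words]             # KNN pre-labeling
    labels = crf_label(words, hints)                     # CRF labeling
    for w, (lbl, conf) in zip(words, labels):
        if conf >= GAMMA and len(new_training_data) < CAP:
            new_training_data.append((w, lbl))           # accumulate new data
    return [lbl for lbl, _ in labels]

print(process("Going to see Michael tonight"))
```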
Related Work | Machine learning based systems are commonly used and outperform the rule based systems. |
Approach | We use a machine learning technique for this purpose. |
Approach | We classify the citation sentences into the five categories mentioned above using a machine learning technique. |
Approach | To determine whether a reference is part of the sentence or not, we again use a machine learning approach. |
Experimental Setup | Training We obtained phrase-based salience scores using a supervised machine learning algorithm. |
Modeling | We obtain these scores from the output of a supervised machine learning algorithm that predicts for each phrase whether it should be included in the highlights or not (see Section 5 for details). |
Modeling | Let fi denote the salience score for phrase i, determined by the machine learning algorithm, and li is its length in tokens. |
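Using the salience scores f_i and lengths l_i just defined, highlight construction amounts to picking phrases under a token budget; the paper formulates this with an optimization model, whereas the greedy knapsack-style heuristic below, with invented phrases, is only an illustration.

```python
# Sketch: selecting phrases under a token budget using salience scores f_i
# and lengths l_i. The phrases and the greedy heuristic are illustrative.
phrases = [("police closed the road", 0.9, 4),
           ("after heavy flooding", 0.7, 3),
           ("a spokesman said", 0.2, 3),
           ("on Tuesday morning", 0.4, 3)]
BUDGET = 8  # maximum number of tokens in the highlights

selected, used = [], 0
for text, f_i, l_i in sorted(phrases, key=lambda p: p[1] / p[2], reverse=True):
    if used + l_i <= BUDGET:
        selected.append(text)
        used += l_i

print(selected)
```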
Conclusions | application of CRFs, which are a major advance of recent years in machine learning. |
Experimental design | As is the case with many machine learning methods, no strong guidance is available for choosing values for these parameters. |
History of automated hyphenation | Over the years, various machine learning methods have been applied to the hyphenation task. |
Adaptor Grammars | Nonparametric Bayesian inference, where the inference task involves learning not just the values of a finite vector of parameters but which parameters are relevant, has been the focus of intense research in machine learning recently. |
Introduction | Over the last few years there has been considerable interest in Bayesian inference for complex hierarchical models both in machine learning and in computational linguistics. |
Introduction | This paper establishes a theoretical connection between two very different kinds of probabilistic models: Probabilistic Context-Free Grammars (PCFGs) and a class of models known as Latent Dirichlet Allocation (Blei et al., 2003; Griffiths and Steyvers, 2004) models that have been used for a variety of tasks in machine learning . |
Entity-mention Model with ILP | However, normal machine learning algorithms work on attribute-value vectors, which only allows the representation of atomic propositions. |
Introduction | Even worse, the number of mentions in an entity is not fixed, which would result in variable-length feature vectors and cause trouble for normal machine learning algorithms. |
Modelling Coreference Resolution | Both (2) and (1) can be approximated with a machine learning method, leading to the traditional mention-pair model and the entity-mention model for coreference resolution, respectively. |
Experiments | We think this shows one of the strengths of machine learning methods such as CRFs. |
Related Work and Discussion | Parallelization has recently regained attention in the machine learning community because of the need for learning from very large sets of data. |
Related Work and Discussion | (2006) presented the MapReduce framework for a wide range of machine learning algorithms, including the EM algorithm. |
Introduction | The standard classification process is to find in an auxiliary corpus a set of patterns in which a given training word pair co-appears, and use pattern-word pair co-appearance statistics as features for machine learning algorithms. |
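A small sketch of the co-appearance statistics described above: for a training word pair, count how often each context pattern, instantiated with the pair, occurs in an auxiliary corpus, and use the counts as features. The patterns, corpus, and helper function are invented for illustration.

```python
# Sketch: pattern-word pair co-appearance counts as a feature vector.
# Patterns and corpus sentences are invented.
PATTERNS = ["{x} such as {y}", "{x} and other {y}", "{y} is a {x}"]

corpus = ["fruits such as apples are cheap",
          "apples and other fruits were sold",
          "an apple is a fruit"]

def pair_features(x, y):
    feats = []
    for pat in PATTERNS:
        filled = pat.format(x=x, y=y)
        feats.append(sum(filled in sent for sent in corpus))  # co-appearance count
    return feats

print(pair_features("fruits", "apples"))  # e.g. [1, 0, 0]
```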
Related Work | In this paper, we use these pattern clusters as the (only) source of machine learning features for a nominal relationship classification problem. |
Relationship Classification | In this method we treat the HITS measure for a cluster as a feature for a machine learning classification |
Introduction | Section 3 also shows how to automatically extract and collect counts for context patterns, and how to combine the information using a machine learned classifier. |
Related Work | In particular, note the pioneering work of Paice and Husk (1987), the inclusion of non-referential it detection in a full anaphora resolution system by Lappin and Leass (1994), and the machine learning approach of Evans (2001). |
Related Work | Although machine learned systems can flexibly balance the various indicators and contra-indicators of non-referentiality, a particular feature is only useful if it is relevant to an example in limited labelled training data. |