Abstract | In this paper, we formulate extractive summarization as a two-step learning problem: building a generative model for pattern discovery and a regression model for inference. |
Abstract | Then, using these scores, we train a regression model based on the lexical and structural characteristics of the sentences, and use the model to score sentences of new documents to form a summary. |
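Abstract | A minimal sketch of this regression step is given below; it is not the paper's implementation, and the features (length, position, vocabulary overlap), the toy sentences, and the scores are illustrative placeholders standing in for the generative model's output. |
```python
# Minimal sketch (not the paper's code): fit a regressor that maps simple
# lexical/structural sentence features to salience scores produced by the
# generative step. The features and data below are illustrative placeholders.
import numpy as np
from sklearn.svm import SVR

train_sentences = ["the economy grew last quarter",
                   "officials met to discuss trade policy",
                   "a new report was released today"]
train_scores = np.array([0.8, 0.3, 0.5])   # hypothetical scores from the generative model
doc_vocab = set(" ".join(train_sentences).split())

def features(sentence, position):
    tokens = sentence.split()
    overlap = len(set(tokens) & doc_vocab) / len(tokens)
    return [len(tokens), position, overlap]   # length, position, vocabulary overlap

X = np.array([features(s, i) for i, s in enumerate(train_sentences)])
regressor = SVR(kernel="rbf").fit(X, train_scores)
```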
Background and Motivation | Our approach differs from earlier work in that we combine a generative hierarchical model and a regression model to score sentences in new documents, eliminating the need to build a generative model for new document clusters. |
Experiments and Discussions | Later, we build a regression model with the same features as our HybHSum to create a summary. |
Experiments and Discussions | We keep the parameters and the features of the regression model of hierarchical HybHSum intact for consistency. |
Introduction | In this paper, we present a novel approach that formulates MDS as a prediction problem based on a two-step hybrid model: a generative model for hierarchical topic discovery and a regression model for inference. |
Introduction | We construct a hybrid learning algorithm by extracting salient features to characterize summary sentences, and implement a regression model for inference (Fig. 3). |
Introduction | Our aim is twofold: to find features that best represent summary sentences, as described in § 5, and to implement a feasible inference method based on a regression model that enables scoring of sentences in test document clusters without retraining (which has not been investigated in generative summarization models), as described in § 5.2. |
Regression Model | We build a regression model using sentence scores as the output variable and selected salient features, described below, as input variables: |
Regression Model | Using the sentence scores obtained in Eq. (4), we train a regression model. |
Regression Model | Once the SVR model is trained, we use it to predict the scores of the n_test sentences in test (unseen) document clusters, O^test = {o_1, ..., o_{n_test}}. Our HybHSum captures the sentence characteristics with a regression model using sentences in different document clusters. |
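Regression Model | The sketch below illustrates this inference step; the trained model, feature vectors, and word budget are placeholders rather than the paper's actual setup, and the greedy selection is only one plausible way to turn predicted scores into a summary. |
```python
# Minimal sketch, not the paper's implementation: score unseen sentences with a
# trained SVR and greedily pick the top-scoring ones under a word budget.
# Training data, feature vectors, and the budget are placeholders.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
svr = SVR(kernel="rbf").fit(rng.random((50, 3)), rng.random(50))   # placeholder training

test_sentences = ["markets rallied after the announcement",
                  "the committee will reconvene next month",
                  "analysts expect growth to continue"]
X_test = rng.random((len(test_sentences), 3))   # placeholder feature rows, one per sentence

scores = svr.predict(X_test)                    # predicted salience, one score per sentence
summary, budget = [], 12                        # summary word budget (illustrative)
for idx in np.argsort(-scores):                 # highest-scoring sentences first
    candidate = test_sentences[idx]
    used = sum(len(s.split()) for s in summary)
    if used + len(candidate.split()) <= budget:
        summary.append(candidate)
print(summary)
```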
Analysis and discussion | The fitted logistic regression model (black line) has a statistically significant coefficient for response entropy (p < 0.001). |
Analysis and discussion | Figure 5 plots the relationship between the response entropy and the accuracy of our decision procedure, along with a fitted logistic regression model using response entropy to predict whether our system’s inference was correct. |
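Analysis and discussion | A minimal sketch of such an analysis, on synthetic placeholder data rather than the authors' responses: fit a logistic regression of correctness on response entropy and inspect the entropy coefficient and its p-value. |
```python
# Minimal sketch on synthetic data: logistic regression of correctness on
# response entropy; the simulated relationship and sample size are placeholders.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
entropy = rng.uniform(0.0, 2.0, size=200)
# Simulate: higher response entropy -> lower probability of a correct inference.
p_correct = 1.0 / (1.0 + np.exp(-(2.0 - 2.0 * entropy)))
correct = rng.binomial(1, p_correct)

X = sm.add_constant(entropy)          # intercept plus the entropy predictor
fit = sm.Logit(correct, X).fit(disp=0)
print(fit.params)                     # fitted intercept and entropy coefficient
print(fit.pvalues)                    # significance of the entropy coefficient
```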
Methods | A logistic regression model can capture these facts. |
Cognitively Grounded Cost Modeling | Therefore, we learn a linear regression model with time (an operationalization of annotation costs) as the dependent variable. |
Cognitively Grounded Cost Modeling | We learned a simple linear regression model with annotation time as the dependent variable and the features described above as independent variables. |
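Cognitively Grounded Cost Modeling | The sketch below is a rough illustration with synthetic placeholder features, not the features or data used in the paper: it fits an ordinary least-squares model with annotation time as the dependent variable and predicts the cost of new instances. |
```python
# Minimal sketch on synthetic data: ordinary least squares with annotation time
# as the dependent variable; the feature matrix and weights are placeholders.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n_instances = 100
features = rng.random((n_instances, 4))          # hypothetical instance features
true_weights = np.array([5.0, 2.0, 0.5, 1.0])    # arbitrary synthetic effect sizes
annotation_time = features @ true_weights + rng.normal(0.0, 0.3, n_instances)

cost_model = LinearRegression().fit(features, annotation_time)
estimated_time = cost_model.predict(features[:5])   # predicted annotation cost
print(estimated_time)
```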
Summary and Conclusions | This optimization may include both the exploration of additional features (such as domain-specific ones) and experimentation with other, presumably nonlinear, regression models. |