Experiments | Generative model
Experiments | We found that the generative model gets confused by punctuation and tends to predict that periods at the end of sentences are the parents of words in the sentence. |
Experiments | We call the generic model described above “no-rules” to distinguish it from the language-specific constraints we introduce in the sequel. |
Parsing Models | We explored two parsing models: a generative model used by several authors for unsupervised induction and a discriminative model used for fully supervised training. |
Parsing Models | We also used a generative model based on dependency model with valence (Klein and Manning, 2004). |
Experiment | We observe that the features of our word generation model are more effective than those of the topic association model.
Experiment | Among the features of the word generation model, the largest improvement was achieved with BM25, which improved the MAP by 2.27%.
Experiment | Since BM25 performs best among the word generation models, its combination with other features was investigated.
Term Weighting and Sentiment Analysis | 3.2.3 Word Generation Model |
Term Weighting and Sentiment Analysis | Our word generation model p(w | d) evaluates the prominence and the discriminativeness of a word.
Term Weighting and Sentiment Analysis | Therefore, we estimate the word generation model with the relevance scores of popular IR models for a document d given the word w as a query.
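The estimation in the lines above can be sketched concretely. The snippet below is a minimal illustration, not the cited authors' implementation: each word w of a document d gets a BM25-style relevance score as a stand-in for p(w | d). All function names and the toy document-frequency table are our own.

```python
import math

def bm25_term_score(tf, df, doc_len, avg_doc_len, n_docs, k1=1.2, b=0.75):
    """BM25 score of one term in one document (standard idf * saturated tf)."""
    idf = math.log(1.0 + (n_docs - df + 0.5) / (df + 0.5))
    sat_tf = tf * (k1 + 1.0) / (tf + k1 * (1.0 - b + b * doc_len / avg_doc_len))
    return idf * sat_tf

def word_generation_scores(doc_tokens, df, n_docs, avg_doc_len):
    """Score every word w of document d -- an IR-style stand-in for p(w | d)."""
    counts = {}
    for w in doc_tokens:
        counts[w] = counts.get(w, 0) + 1
    return {
        w: bm25_term_score(tf, df.get(w, 1), len(doc_tokens), avg_doc_len, n_docs)
        for w, tf in counts.items()
    }
```

As the concordance line suggests, a discriminative word ("battery" with low document frequency) scores above a common one ("good"), which is what makes the score usable as a prominence estimate.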
Abstract | We present a novel generative model that directly models the heuristic labeling process of distant supervision. |
Conclusion | Our generative model directly models the labeling process of DS and predicts patterns that are wrongly labeled with a relation. |
Experiments | Experiment 1 aimed to evaluate the performance of our generative model itself, which predicts whether a pattern expresses a relation, given a labeled corpus created with the DS assumption. |
Experiments | In our method, we trained a classifier with a labeled corpus cleaned by Algorithm 1 using the negative pattern list predicted by the generative model.
Experiments | While our generative model does not use unlabeled examples as negative ones when detecting wrong labels, classifier-based approaches, including MultiR, do, and thus suffer from false negatives.
Generative Model | We now describe our generative model, which predicts whether a pattern expresses relation r or not via hidden variables.
Introduction | To make the pattern prediction, we propose a generative model that directly models the process of automatic labeling in DS.
Introduction | Our variational inference for our generative model lets us automatically calibrate parameters for each relation, which are sensitive to the performance (see Section 6).
Related Work | In our approach, parameters are calibrated for each relation by maximizing the likelihood of our generative model.
Wrong Label Reduction | In the first step, we introduce the novel generative model that directly models DS’s labeling process and make the prediction (see Section 5). |
Conclusion | Experimental results show our approach discovers precise relation clusters and outperforms a generative model approach and a clustering method which does not address sense disambiguation. |
Evaluations | The generative model approach with 300 topics achieves similar precision to the hierarchical clustering approach. |
Evaluations | With more topics, the precision increases; however, the recall of the generative model is much lower than that of the other approaches.
Evaluations | The generative model approach produces more coherent clusters when the number of relation topics increases. |
Experiments | We compare our approach against several baseline systems, including a generative model approach and variations of our own approach. |
Experiments | Rel-LDA: Generative models have been successfully applied to unsupervised relation extraction (Rink and Harabagiu, 2011; Yao et al., 2011). |
Introduction | We compare our approach with several baseline systems, including a generative model approach, a clustering method that does not disambiguate between senses, and our approach with different features. |
Our Approach | The two theme features are extracted from generative models, and each is a topic number.
Related Work | There has been considerable interest in unsupervised relation discovery, including clustering approaches, generative models, and many other approaches.
Related Work | Our approach employs generative models for path sense disambiguation, which achieves better performance than directly applying generative models to unsupervised relation discovery. |
Background | The baseline generative model we use for reranking employs the unsupervised PCFG induction approach introduced by Kim and Mooney (2012). |
Background | Our proposed reranking model is used to discriminatively reorder the top parses produced by this generative model.
Introduction | Since their system employs a generative model, discriminative reranking (Collins, 2000) could potentially improve its performance.
Introduction | By training a discriminative classifier that uses global features of complete parses to identify correct interpretations, a reranker can significantly improve the accuracy of a generative model . |
Modified Reranking Algorithm | In reranking, a baseline generative model is first trained and generates a set of candidate outputs for each training example. |
Modified Reranking Algorithm | The approach requires three subcomponents: 1) a GEN function that returns the list of top-n candidate parse trees produced by the generative model for each NL sentence; 2) a feature function Φ that maps an NL sentence e and a parse tree y into a real-valued feature vector Φ(e, y) ∈ R^d; and 3) a reference parse tree that is compared to the highest-scoring parse tree during training.
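The three subcomponents above can be sketched as a bare-bones perceptron reranker. This is our own simplified illustration, not the paper's algorithm: GEN is assumed to have already produced each candidate list, `phi` plays the role of the feature function Φ, and the reference is given as an index into the list.

```python
def dot(w, feats):
    """Sparse dot product between weight dict w and feature dict feats."""
    return sum(w.get(f, 0.0) * v for f, v in feats.items())

def perceptron_rerank_train(data, phi, epochs=5):
    """Train reranker weights. data: list of (sentence, candidates, ref_index)
    triples, where candidates is the base model's n-best list (GEN output) and
    phi(sentence, candidate) returns a sparse feature dict."""
    w = {}
    for _ in range(epochs):
        for sent, cands, ref_idx in data:
            # The current model's favorite candidate.
            best = max(range(len(cands)), key=lambda i: dot(w, phi(sent, cands[i])))
            if best != ref_idx:
                # Perceptron update: toward the reference, away from the mistake.
                for f, v in phi(sent, cands[ref_idx]).items():
                    w[f] = w.get(f, 0.0) + v
                for f, v in phi(sent, cands[best]).items():
                    w[f] = w.get(f, 0.0) - v
    return w
```

At test time, reranking is just picking the candidate that maximizes the learned score, exactly the `max` used inside training.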
Related Work | Discriminative reranking is a common machine learning technique to improve the output of generative models . |
Reranking Features | This section describes the features Φ extracted from parses produced by the generative model and used to rerank the candidates.
Reranking Features | Certainty assigned by the base generative model.
Abstract | Translations are induced using a generative model based on canonical correlation analysis, which explains the monolingual lexicons in terms of latent matchings. |
Analysis | We have presented a novel generative model for bilingual lexicon induction and reported results under a variety of data conditions (section 6.1) and languages (section 6.3), showing that our system can produce accurate lexicons even in highly adverse conditions.
Bilingual Lexicon Induction | 2.1 Generative Model |
Bilingual Lexicon Induction | We propose the following generative model over matchings m and word types (s, t), which we call matching canonical correlation analysis (MCCA).
Conclusion | We have presented a generative model for bilingual lexicon induction based on probabilistic CCA. |
Introduction | We define a generative model over (1) a source lexicon, (2) a target lexicon, and (3) a matching between them (section 2). |
Our Approach | 4.2 Generative model for Solution Posts |
Our Approach | Our generative model models the reply part of a (p, r) pair (in which r is a solution) as being generated from the statistical models in {83, 73} as follows. |
Our Approach | The generative model above is similar to the proposal in (Deepak et al., 2012), adapted suitably for our scenario. |
Related Work | Translation models were also seen to be useful in segmenting incident reports into the problem and solution parts (Deepak et al., 2012); we will use an adaptation of the generative model presented therein for our solution extraction formulation.
Abstract | In this paper, we propose a generative model that jointly identifies user-proposed refinements in instruction reviews at multiple granularities, and aligns them to the appropriate steps in the original instructions. |
Conclusion and Future Work | In this paper, we developed unsupervised methods based on generative models for mining refinements to online instructions from reviews. |
Introduction | Motivated by this, we propose a generative model for solving these tasks jointly without labeled data. |
Models | To identify refinements without labeled data, we propose a generative model of reviews (or more generally documents) with latent variables. |
Models | Foulds and Smyth (2011) propose a generative model for MIL in which the generation of the bag label y is conditioned on the instance labels 2. |
Related Work | We propose a generative model that makes predictions at both the review and review segment level. |
Abstract | Inspired by recent work in summarization, we propose extractive and abstractive caption generation models.
Abstractive Caption Generation | Despite its simplicity, the caption generation model in (7) has a major drawback. |
Abstractive Caption Generation | After integrating the attachment probabilities into equation (12), the caption generation model becomes: |
Conclusions | Rather than adopting a two-stage approach, where the image processing and caption generation are carried out sequentially, a more general model should integrate the two steps in a unified framework. |
Experimental Setup | In this section we discuss our experimental design for assessing the performance of the caption generation models presented above. |
Image Annotation | It is important to note that the caption generation models we propose are not especially tied |
Comparison With Related Work | The training set is very small, and it is a known fact that generative models tend to work better for small datasets and discriminative models tend to work better for larger datasets (Ng and Jordan, 2002). |
Experiments | For both WSJ15 and WSJ40, we trained a generative model; a discriminative model, which used lexicon features but no grammar features other than the rules themselves; and a feature-based model which had access to all features.
Experiments | The discriminatively trained generative model (discriminative in Table 3) took approximately 12 minutes per pass through the data, while the feature-based model (feature-based in Table 3) took 35 minutes per pass through the data. |
Experiments | In Figure 3 we show for an example from section 22 the parse trees produced by our generative model and our feature-based discriminative model, and the correct parse. |
Introduction | Although they take much longer to train than generative models , they typically produce higher performing systems, in large part due to the ability to incorporate arbitrary, potentially overlapping features. |
Experiments | Table 2: Perplexity of several generative models on Section 0 of the WSJ. |
Experiments | Our model outperforms all other generative models, though the improvement over the n-gram model is not statistically significant.
Experiments | We would like to use our model to make grammaticality judgements, but as a generative model it can only provide us with probabilities. |
Introduction | We present a new generative model specialized to transcribing printing-press era documents.
Model | We take a generative modeling approach inspired by the overall structure of the historical printing process. |
Model | Our generative model , which is depicted in Figure 3, reflects this process. |
Related Work | In the NLP community, generative models have been developed specifically for correcting outputs of OCR systems (Kolak et al., 2003), but these do not deal directly with images. |
Results and Analysis | As noted earlier, one strength of our generative model is that we can make the values of certain pixels unobserved in the model, and let inference fill them in. |
Proposed Methods | There are many ways to construct the three component models mentioned above, i.e., the sentence generative model P(D | S_j), the sentence prior model P(S_j), and the loss function L(S_i, S_j).
Proposed Methods | 4.1 Sentence generative model |
Proposed Methods | In the LM approach, each sentence in a document can be simply regarded as a probabilistic generative model consisting of a unigram distribution (the so-called “bag-of-words” assumption) for generating the document (Chen et al., 2009).
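Under the bag-of-words assumption just described, a sentence's score for a document is a product of unigram probabilities. A minimal sketch of our own, using add-alpha smoothing as a stand-in for whatever smoothing the cited work applies:

```python
import math
from collections import Counter

def sentence_generative_loglik(doc_tokens, sentence_tokens, vocab_size, alpha=1.0):
    """log P(D | S) under a unigram "bag-of-words" model of sentence S,
    with add-alpha smoothing over a vocabulary of the given size."""
    counts = Counter(sentence_tokens)
    total = len(sentence_tokens)
    return sum(
        math.log((counts[w] + alpha) / (total + alpha * vocab_size))
        for w in doc_tokens
    )
```

A sentence sharing vocabulary with the document scores higher than an unrelated one, which is the signal summarizers exploit when ranking sentences by P(D | S).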
Abstract | We propose a generative model for expanding queries using external collections in which dependencies between queries, documents, and expansion documents are explicitly modeled. |
Discussion | Theoretically, the main difference between these two instantiations of our general model is that EEM3 makes much stronger simplifying independence assumptions than EEM1.
Introduction | Our aim in this paper is to define and evaluate generative models for expanding queries using external collections. |
Related Work | As will become clear in §4, Diaz and Metzler’s approach is an instantiation of our general model for external expansion. |
Related Work | We are driven by the same motivation, but where they considered rank-based result combinations and simple mixtures of query models, we take a more principled and structured approach, and develop four versions of a generative model for query expansion using external collections. |
Abstract | To deal with the high degree of ambiguity present in this setting, we present a generative model that simultaneously segments the text into utterances and maps each utterance to a meaning representation grounded in the world state. |
Conclusion | We have presented a generative model of correspondences between a world state and an unsegmented stream of text. |
Generative Model | To learn the correspondence between a text w and a world state s, we propose a generative model p(w | s) with latent variables specifying this correspondence. |
Generative Model | We used a simple generic model of rendering string fields: let w be a word chosen uniformly from those in v.
Introduction | To cope with these challenges, we propose a probabilistic generative model that treats text segmentation, fact identification, and alignment in a single unified framework. |
Conclusion | A novel technique was also proposed to rank n-gram phrases where relevance based ranking was used in conjunction with a semi-supervised generative model.
Introduction | We employ a semi-supervised generative model called JTE-P to jointly model AD-expressions, pair interactions, and discussion topics in a single framework.
Model | JTE-P is a semi-supervised generative model motivated by the joint occurrence of expression types (agreement and disagreement), topics in discussion posts, and user pairwise interactions. |
Model | Like most generative models for text, a post (document) is viewed as a bag of n-grams and each n-gram (word/phrase) takes one value from a predefined vocabulary. |
Abstract | We use a Bayesian generative model to capture relevant natural language phenomena and translate the English specification into a specification tree, which is then translated into a C++ input parser. |
Model | We combine these two kinds of information into a Bayesian generative model in which the code quality of the specification tree is captured by the prior probability P (t) and the feature observations are encoded in the likelihood probability P (w|t). |
Model | We assume the generative model operates by first generating the model parameters from a set of Dirichlet distributions. |
Model | Generating Model Parameters: For every pair of feature type f and phrase tag z, draw a multinomial distribution parameter θ^f_z from a Dirichlet prior P(θ^f_z).
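The parameter-generation step above can be sketched directly. In the code below the feature types, phrase tags, and hyperparameters are invented for illustration; the Dirichlet draw uses the standard Gamma-normalization construction.

```python
import random

def sample_dirichlet(alphas, rng=random.Random(0)):
    """Draw theta ~ Dirichlet(alphas) by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

# One multinomial parameter theta^f_z per (feature type f, phrase tag z) pair.
feature_types = ["word", "context"]   # hypothetical feature types
phrase_tags = ["NP", "VP"]            # hypothetical phrase tags
theta = {(f, z): sample_dirichlet([0.5] * 5)   # symmetric prior, 5 outcomes
         for f in feature_types for z in phrase_tags}
```

Each sampled vector is a valid multinomial (non-negative, sums to one), which is exactly what the "draw θ from a Dirichlet prior" step of the generative story requires.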
Abstract | As an illustrative case, we study a generative model for dependency parsing. |
Discussion | In principle, our branch-and-bound method can approach ε-optimal solutions to Viterbi training of locally normalized generative models, including the NP-hard case of grammar induction with the DMV.
The Constrained Optimization Task | Other locally normalized log-linear generative models (Berg-Kirkpatrick et al., 2010) would have a similar formulation. |
The Constrained Optimization Task | This generative model defines a joint distribution over the sentences and their dependency trees. |
Background and related work | They propose a pipeline architecture involving two separate generative models , one for word-segmentation and one for phonological variation. |
Background and related work | This permits us to develop a joint generative model for both word segmentation and variation which we plan to extend to handle more phenomena in future work. |
Conclusion and outlook | A major advantage of our generative model is the ease and transparency with which its assumptions can be modified and extended. |
The computational model | One of the advantages of an explicitly defined generative model such as ours is that it is straightforward to gradually extend it by adding more cues, as we point out in the discussion. |
Evaluation with Native-Speakers | With respect to H, our discriminative models achieve 0.12 to 0.2 higher agreement than the baselines, indicating that the discriminative models can generate sound distractors more effectively than generative models.
Evaluation with Native-Speakers | The lower H on generative models may be because the distractors are semantically too close to the target (correct answer), as in the following examples:
Evaluation with Native-Speakers | As a result, the quiz from generative models is not reliable since both published and issued are correct. |
Proposed Method | We rank the candidates by a generative model to consider the surrounding context (e.g. |
Background | The model takes as its starting point two probabilistic models of syntax that have been developed for CCG parsing, Hockenmaier & Steedman’s (2002) generative model and Clark & Curran’s (2007) normal-form model.
Introduction | With this simple reranking strategy and each of three different Treebank parsers, we find that it is possible to improve BLEU scores on Penn Treebank development data with White & Rajkumar’s (2011; 2012) baseline generative model , but not with their averaged perceptron model. |
Simple Reranking | The first one is the baseline generative model (hereafter, generative model) used in training the averaged perceptron model.
Simple Reranking | Simple ranking with the Berkeley parser of the generative model’s n-best realizations raised the BLEU score from 85.55 to 86.07, well below the averaged perceptron model’s BLEU score of 87.93. |
Introduction | SAGE (Eisenstein et al., 2011a), a recently proposed sparse additive generative model of language, addresses many of the drawbacks of LDA. |
Prediction Experiments | In the second experiment, in addition to the linear kernel SVM, we also compare our SME model to a state-of-the-art sparse generative model of text (Eisenstein et al., 2011a), and vary the size of the input vocabulary W exponentially from 2^9 to the full size of our training vocabulary.
Prediction Experiments | In this experiment, we compare SME with a state-of-the-art sparse generative model : SAGE (Eisenstein et al., 2011a). |
Prediction Experiments | Unlike hierarchical Dirichlet processes (Teh et al., 2006), in parametric Bayesian generative models , the number of topics K is often set manually, and can influence the model’s accuracy significantly. |
Abstract | Most previous approaches have involved generative modeling of the distribution of pronunciations, usually trained to maximize likelihood. |
Introduction | In other words, these approaches optimize generative models using discriminative criteria. |
Introduction | We propose a general, flexible discriminative approach to pronunciation modeling, rather than discriminatively optimizing a generative model.
Introduction | For generative models, phonetic error rate of generated pronunciations (Venkataramani and Byrne, 2001) and
Abstract | Our generative model deterministically maps a POS sequence to a bracketing via an undirected |
Abstract | The complete generative model that we follow is then: |
Abstract | Our learning algorithm focuses on recovering the undirected tree underlying the generative model described above.
Experimental Results | Two representative methods were used as baselines: the generative model proposed by Brill and Moore (2000), referred to as generative, and the logistic regression model proposed by Okazaki et al. (2008)
Experimental Results | Usually a discriminative model works better than a generative model , and that seems to be what happens with small k’s. |
Introduction | In spelling error correction, Brill and Moore (2000) proposed employing a generative model for candidate generation and a hierarchy of trie structures for fast candidate retrieval. |
Related Work | For example, Brill and Moore (2000) developed a generative model including contextual substitution rules; and Toutanova and Moore (2002) further improved the model by adding pronunciation factors into the model. |
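The Brill and Moore-style generative correction described in these lines scores each candidate c by combining a source model P(c) with a channel model P(w | c). The sketch below is a bare-bones noisy-channel ranker with hand-specified probabilities; the real model learns substitution-rule probabilities, so everything here is a toy stand-in.

```python
import math

def noisy_channel_rank(observed, candidates, lm_prob, channel_prob):
    """Rank correction candidates c for an observed misspelling by
    log P(c) + log P(observed | c) -- the noisy-channel factorization."""
    scored = [
        (math.log(lm_prob[c]) + math.log(channel_prob[(observed, c)]), c)
        for c in candidates
    ]
    return [c for _, c in sorted(scored, reverse=True)]
```

The point of the factorization is that a frequent word with a plausible error channel ("the" typed as "teh") outranks a rarer word even when both are a single edit away.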
Introduction | This is achieved by constructing a generative model that includes phrases at many levels of granularity, from minimal phrases all the way up to full sentences. |
Phrase Extraction | As has been noted in previous work (Koehn et al., 2003; DeNero et al., 2006), exhaustive phrase extraction tends to outperform approaches that use syntax or generative models to limit phrase boundaries.
Phrase Extraction | DeNero et al. (2006) state that this is because generative models choose only a single phrase segmentation, and thus throw away many good phrase pairs that are in conflict with this segmentation.
Related Work | While they take a supervised approach based on discriminative methods, we present a fully unsupervised generative model . |
Conclusion | We presented a new generative model of word lists that automatically finds cognate groups from scrambled vocabulary lists. |
Introduction | In this paper, we present a new generative model for the automatic induction of cognate groups given only (1) a known family tree of languages and (2) word lists from those languages. |
Model | In this section, we describe a new generative model for vocabulary lists in multiple related languages given the phylogenetic relationship between the languages (their family tree). |
Model | Figure 1(a) graphically describes our generative model for three Romance languages: Italian, Portuguese, and Spanish. In each cognate group, each word w is generated from its parent according to a conditional distribution with parameter φ, which is specific to that edge in the tree, but shared between all cognate groups.
Modeling Multiparty Discussions | These topics are part of a generative model posited to have produced a corpus. |
Modeling Multiparty Discussions | In this section, we develop SITS, a generative model of multiparty discourse that jointly discovers topics and speaker-specific topic shifts from an unannotated corpus (Figure 1a).
Topic Segmentation as a Social Process | Topic segmentation approaches range from simple heuristic methods based on lexical similarity (Morris and Hirst, 1991; Hearst, 1997) to more intricate generative models and supervised methods (Georgescul et al., 2006; Purver et al., 2006; Gruber et al., 2007; Eisenstein and Barzilay, 2008), which have been shown to outperform the established heuristics. |
Abstract | This model alone improves on previous generative models by 77%. |
Introduction | Our model outperforms the generative models of previous work by 77%. |
Timestamp Classifiers | We thus begin with a bag-of-words approach, reproducing the generative model used by both de Jong (2005) and Kanhabua and Norvag (2008; 2009).
Introduction | Some techniques that have been used are Markov Random Fields (Poon and Domingos, 2009) and Bayesian generative models (Titov and Klementiev, 2011).
Unsupervised relational pattern learning | Figure 2: Plate diagram of the generative model used. |
Unsupervised relational pattern learning | Generative model Once these collections are built, we use the generative model from Figure 2 to learn the probability that a dependency path is conveying some relation between the entities it connects. |
Abstract | We introduce Distributional Semantic Hidden Markov Models, a novel variant of a hidden Markov model that integrates these two approaches by incorporating contextualized distributional semantic vectors into a generative model as observed emissions. |
Introduction | Second, the contextualization process allows the semantic vectors to implicitly encode disambiguated word sense and syntactic information, without further adding to the complexity of the generative model.
Related Work | Other related generative models include topic models and structured versions thereof (Blei et al., 2003; Gruber et al., 2007; Wallach, 2008). |
Enriched Two-Tiered Topic Model | Once the level and conditional path are drawn (see level generation for ETTM above), the rest of the generative model is the same as in TTM.
Introduction | In this paper, we introduce a series of new generative models for multiple-documents, based on a discovery of hierarchical topics and their correlations to extract topically coherent sentences. |
Multi-Document Summarization Models | We utilize the advantages of previous topic models and build an unsupervised generative model that associates each word in each document with three random variables: a sentence S, a higher-level topic H, and a lower-level topic T, in a manner analogous to PAM models (Li and McCallum, 2006), i.e., a directed acyclic graph (DAG) representing mixtures of hierarchical structure, where super-topics are multinomials over subtopics at lower levels in the DAG.
Experimental Evaluation | To investigate the generative models , we replace the two phrase translation probabilities and keep the other features identical to the baseline. |
Phrase Model Training | Additionally we consider smoothing by different kinds of interpolation of the generative model with the state-of-the-art heuristics. |
Related Work | For a generative model , (DeNero et al., 2006) gave a detailed analysis of the challenges and arising problems. |
Experiments | These similarity measures were shown to outperform the generative model of Rooth et al. |
Previous Work | On the other hand, generative models produce complete probability distributions of the data, and hence can be integrated with other systems and tasks in a more principled manner (see Sections 4.2.2 and 4.3.1). |
Topic Models for Selectional Prefs. | In the generative model for our data, each relation r has a corresponding multinomial over topics θ_r, drawn from a Dirichlet.
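The generative story implied here can be sampled directly: draw a topic from the relation's multinomial θ_r, then a word from that topic's word distribution. A toy sketch of our own — the relation, topics, and vocabularies are invented, and the real model infers these distributions rather than fixing them:

```python
import random

def sample_relation_args(relation, theta, topic_word, n, rng):
    """For each of n argument slots: draw a topic z from the relation's
    multinomial theta[relation], then a word from topic_word[z]."""
    topics = sorted(topic_word)  # fixed order, aligned with theta[relation]
    words = []
    for _ in range(n):
        z = rng.choices(topics, weights=theta[relation])[0]
        vocab = sorted(topic_word[z])
        w = rng.choices(vocab, weights=[topic_word[z][v] for v in vocab])[0]
        words.append(w)
    return words
```

Because θ_r concentrates mass on topics compatible with the relation, the sampled arguments reflect its selectional preferences, which is the behavior the topic model is meant to capture.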
Generative state tracking | In contrast to generative models, discriminative approaches to dialog state tracking directly predict the correct state hypothesis by leveraging discriminatively trained conditional models of the form b(g) = P(g | f), where f are features extracted from various sources, e.g.
Introduction | (2010); Thomson and Young (2010)) use generative models that capture how the SLU results are generated from hidden dialog states. |
Introduction | As an illustration, in Figure 1, a generative model might fail to assign the highest score to the correct hypothesis (61C) after the second turn. |
Conclusions and future work | Another potential direction for system improvement would be an integration of our generative model with Bergsma et al.’s (2008) discriminative model — this could be done in a number of ways, including using the induced classes of a topic model as features for a discriminative classifier or using the discriminative classifier to produce additional high-quality training data from noisy unparsed text. |
Introduction | Advantages of these models include a well-defined generative model that handles sparse data well, the ability to jointly induce semantic classes and predicate-specific distributions over those classes, and the enhanced statistical strength achieved by sharing knowledge across predicates. |
Results | For frequent predicate-argument pairs (Seen datasets), Web counts are clearly better; however, the BNC counts are unambiguously superior to LDA and ROOTH-LDA (whose predictions are based entirely on the generative model even for observed items) for the Seen verb-object data only. |
Abstract | In this paper, we formulate extractive summarization as a two step learning problem building a generative model for pattern discovery and a regression model for inference. |
Background and Motivation | Our approach differs from the early work in that we combine a generative hierarchical model and a regression model to score sentences in new documents, eliminating the need for building a generative model for new document clusters.
Introduction | In this paper, we present a novel approach that formulates MDS as a prediction problem based on a two-step hybrid model: a generative model for hierarchical topic discovery and a regression model for inference. |
A Sentence Trimmer with CRFs | To address the issue, rather than resort to statistical generation models as in the previous literature (Cohn and Lapata, 2007; Galley and McKeown, 2007), we pursue a particular rule-based approach we call a ‘dependency truncation,’ which, as we will see, gives us greater control over the form that compression takes.
Abstract | The paper presents a novel sentence trimmer in Japanese, which combines a non-statistical yet generic tree generation model and Conditional Random Fields (CRFs), to address improving the grammaticality of compression while retaining its relevance. |
Conclusions | This paper introduced a novel approach to sentence compression in Japanese, which combines a syntactically motivated generation model and CRFs, in or- |
Abstract | A generative model of mention generation is used to guide mention resolution. |
Introduction | Our principal contributions are the approaches we take to evidence generation (leveraging three ways of linking to other emails where evidence might be found: reply chains, social interaction, and topical similarity) and our approach to choosing among candidates (based on a generative model of reference production). |
Related Work | Similarly, approaches in unstructured data (e.g., text) have involved using clustering techniques over biographical facts (Mann and Yarowsky, 2003), within-document resolution (Blume, 2005), and discriminative unsupervised generative models (Li et al., 2005).