This is achieved by constructing a generative model that includes phrases at many levels of granularity, from minimal phrases all the way up to full sentences.
As has been noted in previous work (Koehn et al., 2003; DeNero et al., 2006), exhaustive phrase extraction tends to outperform approaches that use syntax or generative models to limit phrase boundaries.
DeNero et al. (2006) state that this is because generative models choose only a single phrase segmentation, and thus throw away many good phrase pairs that conflict with this segmentation.
While they take a supervised approach based on discriminative methods, we present a fully unsupervised generative model.
Two representative methods were used as baselines: the generative model proposed by Brill and Moore (2000), referred to as generative, and the logistic regression model proposed by Okazaki et al. (2008)
Usually a discriminative model works better than a generative model, and that seems to be what happens with small k’s.
In spelling error correction, Brill and Moore (2000) proposed employing a generative model for candidate generation and a hierarchy of trie structures for fast candidate retrieval.
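As a rough illustration of why a trie suits candidate retrieval, the sketch below stores a vocabulary in a single trie and returns every stored word that shares a given prefix. It is not Brill and Moore's hierarchy of tries; the class names `TrieNode` and `Trie` and the method `with_prefix` are assumptions made for this example.

```python
class TrieNode:
    """One node of the trie: child links plus an end-of-word flag."""

    def __init__(self):
        self.children = {}
        self.is_word = False


class Trie:
    """Hypothetical single-level trie over a vocabulary of strings."""

    def __init__(self, words=()):
        self.root = TrieNode()
        for word in words:
            self.insert(word)

    def insert(self, word):
        # Walk down the trie, creating nodes for unseen characters.
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def with_prefix(self, prefix):
        # Follow the prefix; an absent branch means no candidates.
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        # Collect every complete word in the subtree below the prefix.
        results = []
        stack = [(node, prefix)]
        while stack:
            current, text = stack.pop()
            if current.is_word:
                results.append(text)
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        return sorted(results)
```

Lookup cost depends on the prefix length and the size of the matching subtree rather than on the size of the whole vocabulary, which is the property that makes tries attractive for retrieval.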
For example, Brill and Moore (2000) developed a generative model that includes contextual substitution rules, and Toutanova and Moore (2002) further improved it by adding pronunciation factors.
|Enriched Two-Tiered Topic Model|
Once the level and conditional path are drawn (see level generation for ETTM above), the rest of the generative model is the same as in TTM.
In this paper, we introduce a series of new generative models for multiple documents, based on a discovery of hierarchical topics and their correlations to extract topically coherent sentences.
|Multi-Document Summarization Models|
we utilize the advantages of previous topic models and build an unsupervised generative model that can associate each word in each document with three random variables: a sentence S, a higher-level topic H, and a lower-level topic T, analogously to PAM models (Li and McCallum, 2006), i.e., a directed acyclic graph (DAG) representing mixtures of hierarchical structure, where super-topics are multinomials over subtopics at lower levels in the DAG.
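The two-level topic hierarchy described above can be sketched as a sampling procedure: for each word, draw a higher-level topic H, then a lower-level topic T from that super-topic's multinomial over subtopics, then the word from T's distribution over the vocabulary. This is a minimal sketch, not the authors' model: the sentence variable S is omitted, all distributions are passed in as fixed lists rather than drawn from Dirichlet priors, and the function name `sample_document` and its parameters are assumptions made for this example.

```python
import random


def sample_document(n_words, doc_super_probs, super_to_sub_probs,
                    sub_to_word_probs, vocab, rng):
    """Sample (word, H, T) triples from a fixed two-level topic hierarchy.

    doc_super_probs[h]       : probability of super-topic h in this document
    super_to_sub_probs[h][t] : probability of subtopic t given super-topic h
    sub_to_word_probs[t][v]  : probability of vocab[v] given subtopic t
    """
    samples = []
    for _ in range(n_words):
        # Higher-level topic H from the document's multinomial.
        h = rng.choices(range(len(doc_super_probs)),
                        weights=doc_super_probs)[0]
        # Lower-level topic T from the multinomial attached to H.
        sub_probs = super_to_sub_probs[h]
        t = rng.choices(range(len(sub_probs)), weights=sub_probs)[0]
        # Word from the multinomial attached to T.
        word = rng.choices(vocab, weights=sub_to_word_probs[t])[0]
        samples.append((word, h, t))
    return samples


if __name__ == "__main__":
    vocab = ["budget", "vote", "storm", "flood"]
    doc_super_probs = [0.7, 0.3]
    super_to_sub_probs = [[0.9, 0.1], [0.2, 0.8]]
    sub_to_word_probs = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
    rng = random.Random(0)
    for triple in sample_document(5, doc_super_probs, super_to_sub_probs,
                                  sub_to_word_probs, vocab, rng):
        print(triple)
```

Inference in such a model runs in the opposite direction: given only the observed words, it estimates the hidden H and T assignments and the distributions themselves.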