Abstract | Translations are induced using a generative model based on canonical correlation analysis, which explains the monolingual lexicons in terms of latent matchings. |
Analysis | We have presented a novel generative model for bilingual lexicon induction and presented results under a variety of data conditions (section 6.1) and languages (section 6.3) showing that our system can produce accurate lexicons even in highly adverse conditions. |
Bilingual Lexicon Induction | 2.1 Generative Model |
Bilingual Lexicon Induction | We propose the following generative model over matchings m and word types (s, t), which we call matching canonical correlation analysis (MCCA). |
Conclusion | We have presented a generative model for bilingual lexicon induction based on probabilistic CCA. |
Introduction | We define a generative model over (1) a source lexicon, (2) a target lexicon, and (3) a matching between them (section 2). |
Comparison With Related Work | The training set is very small, and it is a known fact that generative models tend to work better for small datasets and discriminative models tend to work better for larger datasets (Ng and Jordan, 2002). |
Experiments | For both WSJ15 and WSJ40, we trained a generative model; a discriminative model, which used lexicon features, but no grammar features other than the rules themselves; and a feature-based model which had access to all features. |
Experiments | The discriminatively trained generative model (discriminative in Table 3) took approximately 12 minutes per pass through the data, while the feature-based model (feature-based in Table 3) took 35 minutes per pass through the data. |
Experiments | In Figure 3 we show for an example from section 22 the parse trees produced by our generative model and our feature-based discriminative model, and the correct parse. |
Introduction | Although they take much longer to train than generative models, they typically produce higher performing systems, in large part due to the ability to incorporate arbitrary, potentially overlapping features. |
Abstract | A generative model of mention generation is used to guide mention resolution. |
Introduction | Our principal contributions are the approaches we take to evidence generation (leveraging three ways of linking to other emails where evidence might be found: reply chains, social interaction, and topical similarity) and our approach to choosing among candidates (based on a generative model of reference production). |
Related Work | Similarly, approaches in unstructured data (e.g., text) have involved using clustering techniques over biographical facts (Mann and Yarowsky, 2003), within-document resolution (Blume, 2005), and discriminative unsupervised generative models (Li et al., 2005). |
A Sentence Trimmer with CRFs | To address the issue, rather than resort to statistical generation models as in the previous literature (Cohn and Lapata, 2007; Galley and McKeown, 2007), we pursue a particular rule-based approach we call ‘dependency truncation,’ which, as we will see, gives us greater control over the form that compression takes. |
Abstract | The paper presents a novel sentence trimmer in Japanese, which combines a non-statistical yet generic tree generation model and Conditional Random Fields (CRFs), to address improving the grammaticality of compression while retaining its relevance. |
Conclusions | This paper introduced a novel approach to sentence compression in Japanese, which combines a syntactically motivated generation model and CRFs, in or- |