Abstract | One of the main obstacles to producing high-quality joint models is the lack of jointly annotated data.
Abstract | Joint modeling of multiple natural language processing tasks outperforms single-task models learned from the same data, but still underperforms single-task models learned on the more abundant quantities of available single-task annotated data.
Abstract | In this paper we present a novel model which makes use of additional single-task annotated data to improve the performance of a joint model.
Introduction | Joint models can be particularly useful for producing analyses of sentences which are used as input for higher-level, more semantically-oriented systems, such as question answering and machine translation. |
Introduction | However, designing joint models which actually improve performance has proven challenging. |
Introduction | There have been some recent successes with joint modeling.
Abstract | In evaluations on various highly inflected languages, this joint model outperforms both a baseline tagger in morphological disambiguation, and a pipeline parser in head selection.
Baselines | To ensure a meaningful comparison with the joint model, our two baselines are both implemented in the same graphical model framework and trained with the same machine-learning algorithm.
Baselines | Roughly speaking, they divide up the variables and factors of the joint model and train them separately. |
Baselines | The tagger is a graphical model with the WORD and TAG variables, connected by the local factors TAG-UNIGRAM, TAG-BIGRAM, and TAG-CONSISTENCY, all used in the joint model (§3). |
Experimental Results | We compare the performance of the pipeline model (§4) and the joint model (§3) on morphological disambiguation and unlabeled dependency parsing. |
Experimental Setup | The output of the joint model is the assignment to the TAG and LINK variables. |
Experimental Setup | In principle, the joint model should consider every possible combination of morphological attributes for every word. |
Introduction | After a description of previous work (§2), the joint model (§3) will be contrasted with the baseline pipeline model (§4). |
Joint Model | If fully implemented in our joint model, these features would necessitate two separate families of link factors: O(n³m³) factors for the POS trigrams, and O(n²m⁴) factors for the POS 4-grams.
Previous Work | Goldberg and Tsarfaty (2008) propose a generative joint model.
Abstract | We propose the first joint model for word segmentation, POS tagging, and dependency parsing for Chinese. |
Abstract | Based on an extension of the incremental joint model for POS tagging and dependency parsing (Hatori et al., 2011), we propose an efficient character-based decoding method that can combine features from state-of-the-art segmentation, POS tagging, and dependency parsing models. |
Abstract | We also perform comparison experiments with the partially joint models.
Introduction | Based on these observations, we aim at building a joint model that simultaneously processes word segmentation, POS tagging, and dependency parsing, trying to capture global interaction among |
Introduction | We also perform comparison experiments with partially joint models, and investigate the tradeoff between running speed and model performance.
Model | (2011), we build our joint model to solve word segmentation, POS tagging, and dependency parsing within a single framework. |
Model | In our joint model, the early update is invoked by mistakes in any of word segmentation, POS tagging, or dependency parsing.
Model | The list of the features used in our joint model is presented in Table 1, where S01-S05, W01-W21, and T01-T05 are taken from Zhang and Clark (2010), and P01-P28 are taken from Huang and Sagae (2010).
Related Works | In contrast, we built a joint model based on a dependency-based framework, with a rich set of structural features. |
Related Works | Because we found that even an incremental approach with beam search is intractable if we perform word-based decoding, we take a character-based approach to produce our joint model.
Abstract | This paper presents a joint model for template filling, where the goal is to automatically specify the fields of target relations such as seminar announcements or corporate acquisition events. |
Abstract | The approach models mention detection, unification and field extraction in a flexible, feature-rich model that allows for joint modeling of interdependencies at all levels and across fields. |
Corporate Acquisitions | Unfortunately, we cannot directly compare against a generative joint model evaluated on this dataset (Haghighi and Klein, 2010). The best results per attribute are shown in boldface.
Introduction | In this paper, we present a joint modeling and learning approach for the combined tasks of mention detection, unification, and template filling, as described above. |
Introduction | We also demonstrate, through ablation studies on the feature set, the need for joint modeling and the relative importance of the different types of joint constraints. |
Seminar Extraction Task | An important question to be addressed in evaluation is to what extent the joint modeling approach contributes to performance. |
Seminar Extraction Task | This is largely due to erroneous assignments of named entities of other types (mainly, person) as titles; such errors are avoided in the full joint model, where tuple validity is enforced.
Seminar Extraction Task | As argued before, joint modeling is especially important for irregular fields, such as title; we provide first results on this field. |
Summary and Future Work | This approach allows for joint modeling of interdependencies at all levels and across fields.
Summary and Future Work | Finally, it is worth exploring scaling the approach to unrestricted event extraction, and jointly modeling the extraction of more than one relation per document.
Abstract | In particular, we extend the monolingual infinite tree model (Finkel et al., 2007) to a bilingual scenario: each hidden state (POS tag) of a source-side dependency tree emits a source word together with its aligned target word, either jointly (joint model) or independently (independent model).
Abstract | Our independent model gains over 1 point in BLEU by resolving the sparseness problem introduced in the joint model . |
Bilingual Infinite Tree Model | This paper proposes two types of models that differ in their processes for generating observations: the joint model and the independent model. |
Bilingual Infinite Tree Model | 3.1 Joint Model |
Bilingual Infinite Tree Model | The joint model is a simple application of the infinite tree model under a bilingual scenario. |
Introduction | We investigate two types of models: (i) a joint model and (ii) an independent model. |
Introduction | In the joint model, each hidden state jointly emits both a source word and its aligned target word as an observation.
Related Work | Figure 4: An Example of the Joint Model |
A Joint Model with Unlabeled Parallel Text | 3.2 The Joint Model |
A Joint Model with Unlabeled Parallel Text | Since previous work (Banea et al., 2008; 2010; Wan, 2009) has shown that it could be useful to automatically translate the labeled data from the source language into the target language, we can further incorporate such translated labeled data into the joint model by adding the following component into Equation 6: |
Conclusion | In this paper, we study bilingual sentiment classification and propose a joint model to simultaneously learn better monolingual sentiment classifiers for each language by exploiting an unlabeled parallel corpus together with the labeled data available for each language. |
Experimental Setup 4.1 Data Sets and Preprocessing | In our experiments, the proposed joint model is compared with the following baseline methods. |
Introduction | In Section 3, the proposed joint model is described. |
Related Work | Another notable approach is the work of Boyd-Graber and Resnik (2010), which presents a generative model, supervised multilingual latent Dirichlet allocation, that jointly models topics that are consistent across languages, and employs them to better predict sentiment ratings.
Results and Analysis | We first compare the proposed joint model (Joint) with the baselines in Table 2. |
Results and Analysis | Overall, the unlabeled parallel data improves classification accuracy for both languages when using our proposed joint model and Co-SVM. |
Results and Analysis | The joint model makes better use of the unlabeled parallel data than Co-SVM or TSVMs presumably because of its attempt to jointly optimize the two monolingual models via soft (probabilistic) assignments of the unlabeled instances to classes in each iteration, instead of the hard assignments in Co-SVM and TSVMs. |
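The soft-versus-hard distinction can be sketched in a few lines. This is a toy illustration with made-up scores, not the paper's actual training procedure:

```python
# Toy contrast between hard and soft (probabilistic) label assignments
# for unlabeled instances shared between two classifiers. All scores
# are invented for illustration.

def hard_assign(p_positive):
    """Co-SVM/TSVM-style: commit to the most likely class (0 or 1)."""
    return 1 if p_positive >= 0.5 else 0

def soft_assign(p_positive):
    """Joint-model-style: keep the class probability as a fractional weight."""
    return p_positive

# Suppose one monolingual classifier scores three unlabeled sentences:
scores = [0.55, 0.95, 0.45]

hard = [hard_assign(p) for p in scores]  # [1, 1, 0]: confidence is discarded
soft = [soft_assign(p) for p in scores]  # [0.55, 0.95, 0.45]: confidence kept
```

Under hard assignment a borderline sentence (0.55) carries the same weight in the next iteration as a confident one (0.95); soft assignment lets each unlabeled instance contribute in proportion to the model's confidence.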
Approaches | Figure 2 shows the factor graph for this joint model.
Discussion and Future Work | We find that we can outperform prior work in the low-resource setting by coupling the selection of feature templates based on information gain with a joint model that marginalizes over latent syntax. |
Discussion and Future Work | Our discriminative joint models treat latent syntax as a structured feature to be optimized for the end-task of SRL, while our other grammar induction techniques optimize for unlabeled data likelihood, optionally with distant supervision.
Experiments | This highlights an important advantage of the pipeline-trained model: the features can consider any part of the syntax (e.g., arbitrary sub-trees), whereas the joint model is limited to those features over which it can efficiently marginalize (e.g., short dependency paths).
Experiments | In the low-resource setting of the CoNLL-2009 Shared Task without syntactic supervision, our joint model (Joint) with marginalized syntax obtains state-of-the-art results with the IGC features described in § 4.2.
Experiments | These results begin to answer a key research question in this work: The joint models outperform the pipeline models in the low-resource setting. |
Introduction | Comparison of pipeline and joint models for SRL.
Introduction | The joint models use a non-loopy conditional random field (CRF) with a global factor constraining latent syntactic edge variables to form a tree. |
Introduction | Even at the expense of no dependency path features, the joint models best pipeline-trained models for state-of-the-art performance in the low-resource setting (§ 4.4). |
Related Work | In both pipeline and joint models, we use features adapted from state-of-the-art approaches to SRL.
Abstract | This paper introduces a graph-based semi-supervised joint model of Chinese word segmentation and part-of-speech tagging. |
Abstract | An inductive character-based joint model is obtained eventually. |
Introduction | In the past years, several proposed supervised joint models (Ng and Low, 2004; Zhang and Clark, 2008; Jiang et al., 2009; Zhang and Clark, 2010) achieved reasonably accurate results, but the outstanding problem among these models is that they rely heavily on a large amount of labeled data, i.e., segmented texts with POS tags. |
Method | It is directed to maximize the conditional likelihood of hidden states with the derived label distributions on unlabeled data, i.e., p(y, v|x), where y and v are jointly modeled but
Method | Firstly, as expected, for the two supervised baselines, the joint model outperforms the pipeline one, especially on segmentation. |
Method | This outcome verifies the commonly accepted fact that the joint model can substantially improve the pipeline one, since POS tags provide additional information to word segmentation (Ng and Low, 2004). |
Related Work | The state-of-the-art joint models include reranking approaches (Shi and Wang, 2007), hybrid approaches (Nakagawa and Uchimoto, 2007; Jiang et al., 2008; Sun, 2011), and single-model approaches (Ng and Low, 2004; Zhang and Clark, 2008; Kruengkrai et al., 2009; Zhang and Clark, 2010). |
Bottom-up tree-building | However, the major distinction between our models and theirs is that we do not jointly model the structure and the relation; rather, we use two linear- |
Bottom-up tree-building | Although joint modeling has been shown to be effective in various NLP and computer vision applications (Sutton et al., 2007; Yang et al., 2009; Wojek and Schiele, 2008), our choice of using two separate models is for the following reasons:
Bottom-up tree-building | Then, in the tree-building process, we will have to deal with situations where the joint model yields conflicting predictions: it is possible that the model predicts S_j = 1 and R_j = NO-REL, or vice versa, and we will have to decide which node to trust (and thus, in some sense, the structure and the relation are no longer jointly modeled).
Related work | 2.2 Joty et al.’s joint model |
Related work | Second, they jointly modeled the structure and the relation for a given pair of discourse units. |
Related work | The strength of Joty et al.'s model is their joint modeling of the structure and the relation, such that information from each aspect can interact with the other.
Abstract | Experiments on Automatic Content Extraction (ACE) corpora demonstrate that our joint model significantly outperforms a strong pipelined baseline, which attains better performance than the best-reported end-to-end system.
Conclusions and Future Work | In addition, we aim to incorporate other IE components such as event extraction into the joint model.
Experiments | We compare our proposed method (Joint w/ Global) with the pipelined system (Pipeline), the joint model with only local features (Joint w/ Local), and two human annotators who annotated 73 documents in the ACE'05 corpus.
Experiments | Our joint model correctly identified the entity mentions and their relation. |
Experiments | Figure 7 shows the details when the joint model is applied to this sentence. |
Introduction | This is the first work to incrementally predict entity mentions and relations using a single joint model (Section 3). |
Abstract | We present a joint model for Chinese word segmentation and new word detection. |
Abstract | We present high-dimensional new features, including word-based features and enriched edge (label-transition) features, for the joint modeling.
Introduction | In this paper, we present high-dimensional new features, including word-based features and enriched edge (label-transition) features, for the joint modeling of Chinese word segmentation (CWS) and new word detection (NWD).
Introduction | While most state-of-the-art CWS systems use semi-Markov conditional random fields or latent-variable conditional random fields, we simply use a single first-order conditional random field (CRF) for the joint modeling.
Introduction | We propose a joint model for Chinese word segmentation and new word detection.
System Architecture | 3.1 A Joint Model Based on CRFs |
System Architecture | In this paper, we presented a joint model for Chinese word segmentation and new word detection. |
System Architecture | We presented new features, including word-based features and enriched edge features, for the joint modeling.
Abstract | We describe a joint model for understanding user actions in natural language utterances. |
Background | Only recent research has focused on the joint modeling of SLU (Jeong and Lee, 2008; Wang, 2010) taking into account the dependencies at learning time. |
Background | Our joint model can discover domain D and user's act A as higher-layer latent concepts of utterances in relation to lower-layer latent semantic topics (slots) S, such as named entities ("New York") or context-bearing non-named entities ("vegan").
Data and Approach Overview | Here we define several abstractions of our joint model as depicted in Fig. |
Experiments | * Tri-CRF: We used Triangular Chain CRF (Jeong and Lee, 2008) as our supervised joint model baseline.
Experiments | We evaluate the performance of our joint model on two experiments using two metrics. |
Experiments | The results show that our joint modeling approach has an advantage over the other joint models (i.e., Tri-CRF) in that it can leverage unlabeled NL utterances.
Introduction | Recent work on SLU (Jeong and Lee, 2008; Wang, 2010) presents joint modeling of two components, i.e., the domain and slot or dialog act and slot components together. |
Abstract | Here, we present a novel formulation for a neural network joint model (NNJM), which augments the NNLM with a source context window. |
Introduction | Specifically, we introduce a novel formulation for a neural network joint model (NNJM), which augments an n-gram target language model with an m-word source window.
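As a rough sketch of how such an input could be assembled, the target n-gram history is concatenated with a source window centered on the aligned source word. The embedding function, dimensions, and indices below are invented for illustration and are not the paper's implementation:

```python
# Toy sketch: an NNJM-style input concatenates embeddings of the
# (n-1)-word target history with a (2m+1)-word source window centered
# on the aligned source word. The embedding function below is a fake
# deterministic stand-in for a learned embedding table.
EMBED_DIM = 4

def embed(word_id):
    """Hypothetical embedding lookup: returns a fixed 4-dim vector."""
    return [float(word_id * (k + 1)) for k in range(EMBED_DIM)]

def nnjm_input(target_history, source_window):
    """Concatenate all word embeddings into a single input vector."""
    vec = []
    for w in target_history + source_window:
        vec.extend(embed(w))
    return vec

# 3 target history words (n = 4) and a 5-word source window (m = 2):
x = nnjm_input([4, 17, 23], [8, 9, 10, 11, 12])
print(len(x))  # (3 + 5) * 4 = 32
```

In the real model this concatenated vector would feed a feed-forward network that predicts the next target word; here the point is only the shape of the joint conditioning context.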
Introduction | Unlike previous approaches to joint modeling (Le et al., 2012), our feature can be easily integrated into any statistical machine translation (SMT) decoder, which leads to substantially larger improvements than k-best rescoring only. |
Model Variations | Although there has been a substantial amount of past work in lexicalized joint models (Marino et al., 2006; Crego and Yvon, 2010), nearly all of these papers have used older statistical techniques such as Kneser-Ney or Maximum Entropy. |
Model Variations | This is consistent with our rescoring-only result, which indicates that k-best rescoring is too shallow to take advantage of the power of a joint model.
Model Variations | We have described a novel formulation for a neural network-based machine translation joint model , along with several simple variations of this model. |
Neural Network Joint Model (NNJM) | To make this a joint model, we also condition on the source context vector s_i:
Abstract | We extend a nonparametric model of word segmentation by adding phonological rules that map from underlying forms to surface forms to produce a mathematically well-defined joint model as a first step towards handling variation and segmentation in a single model. |
Abstract | We analyse how our model handles /t/-deletion on a large corpus of transcribed speech, and show that the joint model can perform word segmentation and recover underlying /t/s. |
Background and related work | However, as they point out, combining the segmentation and the variation model into one joint model is not straightforward and usual inference procedures are infeasible, which requires the use of several heuristics. |
Background and related work | They do not aim for a joint model that also handles word segmentation, however, and rather than training their model on an actual corpus, they evaluate on constructed lists of examples, mimicking frequencies of real data. |
Conclusion and outlook | We presented a joint model for word segmentation and the learning of phonological rule probabilities from a corpus of transcribed speech. |
The computational model | Figure 1: The graphical model for our joint model of word-final /t/-deletion and Bigram word segmentation.
The computational model | (2009) segmentation models, exact inference is infeasible for our joint model.
Abstract | In a quantitative evaluation on the task of judging geographically informed semantic similarity between representations learned from 1.1 billion words of geo-located tweets, our joint model outperforms comparable independent models that learn meaning in isolation. |
Evaluation | To illustrate how the model described above can learn geographically-informed semantic representations of words, Table 1 displays the terms with the highest cosine similarity to wicked in Kansas and Massachusetts after running our joint model on the full 1.1 billion words of Twitter data; while wicked in Kansas is close to other evaluative terms like evil and pure and religious terms like gods and spirit, in Massachusetts it is most similar to other intensifiers like super, ridiculously and insanely.
Evaluation | As one concrete example of these differences between individual data points, the cosine similarity between city and seattle in the -GEO model is 0.728 (seattle is ranked as the 188th most similar term to city overall); in the INDIVIDUAL model using only tweets from Washington state, 6WA(city, seattle) = 0.780 (rank #32); and in the JOINT model, using information from the entire United States with deviations for Washington, 6WA(city, seattle) = 0.858 (rank #6).
Evaluation | While the two models that include geographical information naturally outperform the model that does not, the JOINT model generally far outperforms the INDIVIDUAL models trained on state-specific subsets of the data. A model that can exploit all of the information in the data, learning core vector-space representations for all words along with deviations for each contextual variable, is able to learn more geographically-informed representations for this task than strict geographical models alone.
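The core-plus-deviation idea can be sketched minimally: a word's representation in a region is its shared main vector plus a region-specific deviation. The vectors and names below are made up for illustration; the real model learns both parts jointly from data:

```python
# Sketch of a "shared base vector plus per-region deviation" lookup.
# main[word] is the core representation; deviations[(word, region)] is
# a learned offset, defaulting to zero where no deviation exists.

def region_vector(word, region, main, deviations):
    base = main[word]
    delta = deviations.get((word, region), [0.0] * len(base))
    return [b + d for b, d in zip(base, delta)]

main = {"city": [1.0, 0.0, 0.5]}                  # invented core vector
deviations = {("city", "WA"): [0.1, 0.3, -0.2]}   # invented WA offset

v_wa = region_vector("city", "WA", main, deviations)  # base + WA deviation
v_ks = region_vector("city", "KS", main, deviations)  # falls back to the base
```

Because every region shares the base vector, data from all regions shapes the core meaning, while sparse regions degrade gracefully to the shared representation.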
Model | A joint model has three a priori advantages over independent models: (i) sharing data across variable values encourages representations across those values to be similar; e.g., while city may be closer to Boston in Massachusetts and Chicago in Illinois, in both places it still generally connotes a municipality; (ii) such sharing can mitigate data sparseness for less-witnessed areas; and (iii) with a joint model, all representations are guaranteed to
Conclusion | In addition, the joint model is efficient enough for practical use. |
Experiments | Other evaluation metrics are also proposed by Zheng et al. (2011a), which are only suitable for their system, since our system uses a joint model.
Experiments | The selection of K also directly determines the running time of the joint model.
Experiments | using the proposed joint model are shown in Table 3 and Table 4. |
Pinyin Input Method Model | To make typo correction better, we consider integrating it with FTC conversion using a joint model.
Related Works | As we will propose a joint model |
Conclusions and Future Work | A natural extension of our unified framework is to construct a joint model in which the predictions for all three tasks inform each other at all stages of the prediction process. |
Introduction | (2012) presented a joint model for inducing simple syntactic frames and VCs. |
Introduction | (2012) introduced a joint model for SCF and SP acquisition. |
Previous Work | Joint Modeling A small number of works have recently investigated joint approaches to SCFs, SPs and VCs. |
Previous Work | Although evaluation of these recent joint models has been partial, the results have been encouraging and fur- |
The Unified Framework | DPPs are particularly suitable for joint modeling as they come with various simple and intuitive ways to combine individual model kernel matrices into a joint kernel. |
Introduction | Joint models of sentence extraction and compression have a great benefit in that they allow a large degree of freedom in controlling redundancy.
Introduction | In contrast, conventional two-stage approaches (Zajic et al., 2006), which first generate candidate compressed sentences and then use them to generate a summary, have less computational complexity than joint models.
Introduction | Joint models can prune unimportant or redundant descriptions without resorting to enumeration. |
Joint Model of Extraction and Compression | Therefore, the joint model can extract an arbitrarily compressed sentence as a subtree without enumerating all candidates. |
Joint Model of Extraction and Compression | The joint model can remove the redundant part as well as the irrelevant part of a sentence, because the model simultaneously extracts and compresses sentences. |
Joint Model of Extraction and Compression | In this joint model , we generate a compressed sentence by extracting an arbitrary subtree from a dependency tree of a sentence. |
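The subtree-extraction mechanism can be illustrated with a minimal sketch. The sentence, tree, and helper below are invented for illustration; the actual model scores candidate subtrees rather than choosing them by hand:

```python
# Sketch: generating a compressed sentence by extracting a subtree of a
# dependency tree. heads[i] is the head index of word i (-1 for root).

def extract_subtree(heads, root):
    """Return sorted indices of all words dominated by `root` (inclusive)."""
    keep = {root}
    changed = True
    while changed:
        changed = False
        for i, h in enumerate(heads):
            if h in keep and i not in keep:
                keep.add(i)
                changed = True
    return sorted(keep)

words = ["the", "committee", "quickly", "approved", "the", "new", "budget"]
heads = [1, 3, 3, -1, 6, 6, 3]  # invented dependency tree over the sentence

# Take the subtree rooted at "approved", then delete the subtree rooted
# at the adverb "quickly" (index 2) to form a compressed sentence.
full = extract_subtree(heads, 3)
drop = set(extract_subtree(heads, 2))
compressed = [words[i] for i in full if i not in drop]
print(" ".join(compressed))  # the committee approved the new budget
```

Deleting any subtree leaves a well-formed smaller tree, which is why compression candidates need not be enumerated explicitly.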
Abstract | In this paper, we propose Two-Neighbor Orientation (TNO) model that jointly models the orientation decisions between anchors and two neighboring multi-unit chunks which may cross phrase or rule boundaries. |
Conclusion | Our approach, which we formulate as a Two-Neighbor Orientation model, includes the joint modeling of two orientation decisions and the modeling of the maximal span of the reordered chunks through the concept of Maximal Orientation Span. |
Introduction | Then, we jointly model the orientations of chunks that immediately precede and follow the anchors (hence, the name “two-neighbor”) along with the maximal span of these chunks, to which we refer as Maximal Orientation Span (MOS). |
Introduction | To show the effectiveness of our model, we integrate our TNO model into a state-of-the-art syntax-based SMT system, which uses synchronous context-free grammar (SCFG) rules to jointly model reordering and lexical translation. |
Two-Neighbor Orientation Model | Our Two-Neighbor Orientation model (TNO) designates A C A(@) as anchors and jointly models the orientation of chunks that appear immediately to the left and to the right of the anchors as well as the identities of these chunks. |
Analysis using a joint model | According to our joint model , these effects still hold even after controlling for other features. |
Analysis using a joint model | Our joint model controls for the first two of these factors, suggesting that the third factor or some other explanation must account for the remaining differences between males and females. |
Analysis using a joint model | In the joint model , we see the same effect of pitch mean and an even stronger effect for intensity, with the predicted odds of an error dramatically higher for extreme intensity values. |
Conclusion | Using IWER, we analyzed the effects of various word-level lexical and prosodic features, both individually and in a joint model . |
Abstract | We jointly model the interplay between latent user intents that govern queries and unobserved entity types, leveraging observed signals from query formulations and document clicks. |
Conclusion | Jointly modeling the interplay between the underlying user intents and entity types in web search queries shows significant improvements over the current state of the art on the task of resolving entity types in head queries. |
Evaluation Methodology | In order to learn type distributions by jointly modeling user intents and a large number of types, we require a large set of training examples containing tagged entities and their potential types. |
Introduction | We show that jointly modeling user intent and entity type significantly outperforms the current state of the art on the task of entity type resolution in queries. |
Related Work | Our models also expand upon theirs by jointly modeling |
Introduction | We propose a novel joint event extraction algorithm to predict the triggers and arguments simultaneously, and use the structured perceptron (Collins, 2002) to train the joint model . |
Joint Framework for Event Extraction | Unfortunately, it is intractable to perform the exact search in our framework because: (1) by jointly modeling the trigger labeling and argument labeling, the search space becomes much more complex. |
Related Work | To the best of our knowledge, our work is the first attempt to jointly model these two ACE event subtasks. |
Related Work | There has been some previous work on joint modeling for biomedical events (Riedel and McCallum, 2011a; Riedel et al., 2009; McClosky et al., 2011; Riedel and McCallum, 2011b). |
Introduction | Our models also jointly model both aspects and aspect-specific sentiments.
Introduction | Our models are related to topic models in general (Blei et al., 2003) and joint models of aspects and sentiments in sentiment analysis in specific (e.g., Zhao et al., 2010). |
Introduction | First of all, we jointly model aspect and sentiment, while DF-LDA is only for topics/aspects. |
Experiments and Results | Finally, the Joint model combines the document and year mention classifiers as described in Section 4.3.
Experiments and Results | Table 4 shows the F1 scores of the Joint model by year. |
Experiments and Results | Table 4: Yearly results for the Joint model.
Learning Time Constraints | Finally, given the document classifiers of Section 3 and the constraint classifier just defined in Section 4, we create a joint model combining the two with the following linear interpolation: |
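A linear interpolation of two classifiers' probabilities can be sketched generically as follows. The weight 0.7 and the scores are arbitrary illustrative choices, not the paper's tuned values:

```python
# Generic sketch of linearly interpolating a document classifier's
# probability with a constraint classifier's probability. The weight
# lam is a made-up illustrative value.

def interpolate(p_doc, p_constraint, lam=0.7):
    """Convex combination of two classifiers' class probabilities."""
    return lam * p_doc + (1.0 - lam) * p_constraint

score = interpolate(0.8, 0.4)  # 0.7*0.8 + 0.3*0.4 = 0.68
```

In practice the weight would be tuned on held-out data; a convex combination keeps the result a valid probability whenever both inputs are.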
Experiments | We can see that both character-level joint models outperform the pipelined system; our model with annotated word structures gives an improvement of 0.97% in tagging accuracy and 2.17% in phrase-structure parsing accuracy. |
Experiments | The results also demonstrate that the annotated word structures are highly effective for syntactic parsing, giving an absolute improvement of 0.82% in phrase-structure parsing accuracy over the joint model with flat word structures. |
Experiments | (2011), which additionally uses the Chinese Gigaword Corpus; Li ’11 denotes a generative model that can perform word segmentation, POS tagging and phrase-structure parsing jointly (Li, 2011); Li+ ’12 denotes a unified dependency parsing model that can perform joint word segmentation, POS tagging and dependency parsing (Li and Zhou, 2012); Li ’11 and Li+ ’12 exploited annotated morphological-level word structures for Chinese; Hatori+ ’12 denotes an incremental joint model for word segmentation, POS tagging and dependency parsing (Hatori et al., 2012); they use external dictionary resources including HowNet Word List and page names from the Chinese Wikipedia; Qian+ ’12 denotes a joint segmentation, POS tagging and parsing system using a unified framework for decoding, incorporating a word segmentation model, a POS tagging model and a phrase-structure parsing model together (Qian and Liu, 2012); their word segmentation model is a combination of character-based model and word-based model. |
Related Work | Their work demonstrates that a joint model can improve the performance of the three tasks, particularly for POS tagging and dependency parsing. |
Abstract | We learn a joint model of sentence extraction and compression for multi-document summarization. |
Efficient Prediction | By solving the following ILP we can compute the arg max required for prediction in the joint model: |
Experiments | Figure 4: Example summaries produced by our learned joint model of extraction and compression. |
Joint Model | Learning weights for Objective 2, where Y(x) is the set of compressive summaries and C(y) is the set of broken edges that produce subtree deletions, gives our LEARNED COMPRESSIVE system, which is our joint model of extraction and compression.
Character-Level Dependency Tree | (2012) proposed a joint model for Chinese word segmentation, POS-tagging and dependency parsing, studying the influence of the joint model and character features for parsing. Their model is extended from the arc-standard transition-based model, and can be regarded as an alternative to the arc-standard model of our work when pseudo intra-word dependencies are used.
Character-Level Dependency Tree | (2012) investigate a joint model using pseudo intra-word dependencies. |
Character-Level Dependency Tree | To our knowledge, we are the first to apply the arc-eager system to joint models and achieve comparable performance to the arc-standard model.
FrameNet — Wiktionary Alignment | In Table 2, we report on the results of the best single models and the best joint model . |
FrameNet — Wiktionary Alignment | For the joint model, we employed the best single PPR configuration, and a COS configuration that uses sense glosses extended by Wiktionary hypernyms, synonyms, and the FrameNet frame name and frame definition, to achieve the highest score, an F1-score of 0.739.
FrameNet — Wiktionary Alignment | The BEST JOINT model performs well on nouns, slightly better on adjectives, and worse on verbs; see Table 2.
Joint Translation Model | Phrase pairs are emitted jointly, and the overall probabilistic SCFG is a joint model over parallel strings.
Joint Translation Model | By splitting the joint model into a hierarchical structure model and a lexical emission model, we facilitate estimating the two models separately.
Related Work | We show that a translation system based on such a joint model can perform competitively in comparison with conditional probability models, when it is augmented with a rich latent hierarchical structure trained adequately to avoid overfitting. |
Previous Work | It also focuses on jointly modeling the generation of both predicate and argument, and evaluation is performed on a set of human-plausibility judgments obtaining impressive results against Keller and Lapata’s (2003) Web hit-count based system. |
Topic Models for Selectional Prefs. | One weakness of IndependentLDA is that it doesn't jointly model a1 and a2.
Topic Models for Selectional Prefs. | On the one hand, JointLDA jointly models the generation of both arguments in an extracted tuple.
A Distributional Model for Argument Classification | 3.2 A Joint Model for Argument Classification |
Related Work | It incorporates strong dependencies within a comprehensive statistical joint model with a rich set of features over multiple argument phrases. |
Related Work | First, local models are applied to produce role labels over individual arguments; then the joint model is used to decide the entire argument sequence among the n-best competing solutions.
Corpus Details | gender/age) based on the prior and joint modeling of the partner speaker’s gender/age in the same discourse. |
Corpus Details | We employ several varieties of classifier stacking and joint modeling to be effectively sensitive to these differences. |
Corpus Details | A novel partner-sensitive model shows performance gains from the joint modeling of speaker attributes along with partner speaker attributes, given the differences in lexical usage and discourse style such as those observed between same-gender and mixed-gender conversations.