Index of papers in Proc. ACL 2014 that mention
  • joint model
Gormley, Matthew R. and Mitchell, Margaret and Van Durme, Benjamin and Dredze, Mark
Approaches
Figure 2 shows the factor graph for this joint model.
Discussion and Future Work
We find that we can outperform prior work in the low-resource setting by coupling the selection of feature templates based on information gain with a joint model that marginalizes over latent syntax.
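For illustration, a minimal sketch (in Python) of how information-gain-based selection of feature templates might be computed; the template names, feature values, and role labels below are hypothetical stand-ins, not data or identifiers from the paper.

```python
from collections import Counter
from math import log2

def entropy(counts):
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values() if c)

def information_gain(labels, feature_values):
    """Entropy of the labels minus their conditional entropy given one feature template."""
    by_value = {}
    for label, value in zip(labels, feature_values):
        by_value.setdefault(value, []).append(label)
    h_cond = sum(len(ys) / len(labels) * entropy(Counter(ys)) for ys in by_value.values())
    return entropy(Counter(labels)) - h_cond

# Rank candidate templates by information gain against SRL role labels and keep the best ones.
templates = {"pred_lemma": ["eat", "eat", "run"], "arg_pos": ["NN", "NN", "NN"]}
roles = ["A0", "A1", "A0"]
ranked = sorted(templates, key=lambda t: information_gain(roles, templates[t]), reverse=True)
```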
Discussion and Future Work
Our discriminative joint models treat latent syntax as a structured feature to be optimized for the end-task of SRL, while our other grammar induction techniques optimize for unlabeled data likelihood, optionally with distant supervision.
Experiments
This highlights an important advantage of the pipeline-trained model: the features can consider any part of the syntax (e.g., arbitrary sub-trees), whereas the joint model is limited to those features over which it can efficiently marginalize (e.g., short dependency paths).
Experiments
In the low-resource setting of the CoNLL-2009 Shared Task without syntactic supervision, our joint model (Joint) with marginalized syntax obtains state-of-the-art results with the IGC features described in § 4.2.
Experiments
These results begin to answer a key research question in this work: The joint models outperform the pipeline models in the low-resource setting.
Introduction
Comparison of pipeline and joint models for SRL.
Introduction
The joint models use a non-loopy conditional random field (CRF) with a global factor constraining latent syntactic edge variables to form a tree.
Introduction
Even without dependency path features, the joint models best the pipeline-trained models, achieving state-of-the-art performance in the low-resource setting (§ 4.4).
Related Work
In both pipeline and joint models, we use features adapted from state-of-the-art approaches to SRL.
joint model is mentioned in 10 sentences in this paper.
Topics mentioned in this paper:
Feng, Vanessa Wei and Hirst, Graeme
Bottom-up tree-building
However, the major distinction between our models and theirs is that we do not jointly model the structure and the relation; rather, we use two linear-
Bottom-up tree-building
Although joint modeling has been shown to be effective in various NLP and computer vision applications (Sutton et al., 2007; Yang et al., 2009; Wojek and Schiele, 2008), our choice of using two separate models is for the following reasons:
Bottom-up tree-building
Then, in the tree-building process, we will have to deal with situations where the joint model yields conflicting predictions: it is possible that the model predicts Sj = 1 and Rj = NO-REL, or vice versa, and we will have to decide which prediction to trust (and thus, in some sense, the structure and the relation are no longer jointly modeled).
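A tiny sketch of the inconsistency described above; the relation labels are illustrative RST-style names, and the consistency check is only an assumption about how such conflicts could be detected, not the authors' procedure.

```python
def is_consistent(s_j: int, r_j: str) -> bool:
    """S_j = 1 must come with a real relation label; S_j = 0 must come with NO-REL."""
    return (s_j == 1) == (r_j != "NO-REL")

assert not is_consistent(1, "NO-REL")        # "attach these units" yet "no relation holds"
assert not is_consistent(0, "ELABORATION")   # "do not attach" yet a relation is named
assert is_consistent(1, "CONTRAST")          # a consistent joint prediction
```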
Related work
2.2 Joty et al.’s joint model
Related work
Second, they jointly modeled the structure and the relation for a given pair of discourse units.
Related work
The strength of Joty et al.'s model is their joint modeling of the structure and the relation, such that information from each aspect can interact with the other.
joint model is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Li, Qi and Ji, Heng
Abstract
Experiments on Automatic Content Extraction (ACE) corpora demonstrate that our joint model significantly outperforms a strong pipelined baseline, which attains better performance than the best-reported end-to-end system.
Conclusions and Future Work
In addition, we aim to incorporate other IE components such as event extraction into the joint model.
Experiments
We compare our proposed method (Joint w/ Global) with the pipelined system (Pipeline), the joint model with only local features (Joint w/ Local), and two human annotators who annotated 73 documents in the ACE'05 corpus.
Experiments
Our joint model correctly identified the entity mentions and their relation.
Experiments
Figure 7 shows the details when the joint model is applied to this sentence.
Introduction
This is the first work to incrementally predict entity mentions and relations using a single joint model (Section 3).
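As a rough illustration of incremental joint decoding, a beam-search-style loop in which hypotheses are extended token by token with entity decisions and, whenever a mention closes, with relation decisions linking it to earlier mentions. The callbacks `entity_actions`, `relation_actions`, and `score` are placeholders, and this sketch is not a transcription of the paper's algorithm.

```python
def joint_decode(tokens, entity_actions, relation_actions, score, beam_size=8):
    """Beam search over joint (entity, relation) hypotheses; all callbacks are placeholders."""
    beam = [([], [])]                 # each hypothesis: (entity tags so far, relations so far)
    for i, token in enumerate(tokens):
        extended = []
        for tags, relations in beam:
            for tag in entity_actions(token, tags):              # e.g. O, B-PER, I-PER, ...
                new_tags = tags + [tag]
                # If a mention closes at position i, propose relation decisions linking it
                # to every previously recognised mention; otherwise extend with none.
                for rel_choice in relation_actions(i, new_tags, relations) or [[]]:
                    extended.append((new_tags, relations + rel_choice))
        # Keep only the highest-scoring partial hypotheses.
        beam = sorted(extended, key=lambda h: score(*h), reverse=True)[:beam_size]
    return max(beam, key=lambda h: score(*h))
```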
joint model is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Devlin, Jacob and Zbib, Rabih and Huang, Zhongqiang and Lamar, Thomas and Schwartz, Richard and Makhoul, John
Abstract
Here, we present a novel formulation for a neural network joint model (NNJM), which augments the NNLM with a source context window.
Introduction
Specifically, we introduce a novel formulation for a neural network joint model (NNJM), which augments an n-gram target language model with an m-word source window.
Introduction
Unlike previous approaches to joint modeling (Le et al., 2012), our feature can be easily integrated into any statistical machine translation (SMT) decoder, which leads to substantially larger improvements than k-best rescoring only.
Model Variations
Although there has been a substantial amount of past work in lexicalized joint models (Marino et al., 2006; Crego and Yvon, 2010), nearly all of these papers have used older statistical techniques such as Kneser-Ney or Maximum Entropy.
Model Variations
This is consistent with our rescoring-only result, which indicates that k-best rescoring is too shallow to take advantage of the power of a joint model.
Model Variations
We have described a novel formulation for a neural network-based machine translation joint model , along with several simple variations of this model.
Neural Network Joint Model (NNJM)
To make this a joint model, we also condition on the source context vector S_i:
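A minimal sketch of an NNJM-style scorer under assumed sizes: a feed-forward network over the concatenated embeddings of the n-1 previous target words and an m-word source window around the aligned source position. All dimensions, weights, and word indices below are illustrative, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
V_TGT, V_SRC, DIM, HIDDEN, N_MINUS_1, M = 5000, 5000, 64, 128, 3, 11

E_tgt = rng.normal(size=(V_TGT, DIM))                 # target-side embeddings
E_src = rng.normal(size=(V_SRC, DIM))                 # source-side embeddings
W1 = rng.normal(size=((N_MINUS_1 + M) * DIM, HIDDEN)) # hidden layer
W2 = rng.normal(size=(HIDDEN, V_TGT))                 # output layer over the target vocabulary

def nnjm_logprobs(target_history, source_window):
    """Log-probabilities of the next target word given target history + source window."""
    x = np.concatenate([E_tgt[target_history].ravel(), E_src[source_window].ravel()])
    h = np.tanh(x @ W1)
    logits = h @ W2
    return logits - np.logaddexp.reduce(logits)       # log-softmax

# Hypothetical indices: 3 previous target words and an 11-word source window.
scores = nnjm_logprobs([12, 7, 304], [5, 9, 13, 2, 88, 41, 6, 77, 3, 19, 20])
next_word_logprob = scores[42]
```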
joint model is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Bamman, David and Dyer, Chris and Smith, Noah A.
Abstract
In a quantitative evaluation on the task of judging geographically informed semantic similarity between representations learned from 1.1 billion words of geo-located tweets, our joint model outperforms comparable independent models that learn meaning in isolation.
Evaluation
To illustrate how the model described above can learn geographically-informed semantic representations of words, table 1 displays the terms with the highest cosine similarity to wicked in Kansas and Massachusetts after running our joint model on the full 1.1 billion words of Twitter data; while wicked in Kansas is close to other evaluative terms like evil and pure and religious terms like gods and spirit, in Massachusetts it is most similar to other intensifiers like super, ridiculously and insanely.
Evaluation
As one concrete example of these differences between individual data points, the cosine similarity between city and seattle in the –GEO model is 0.728 (seattle is ranked as the 188th most similar term to city overall); in the INDIVIDUAL model using only tweets from Washington state, the similarity is 0.780 (rank #32); and in the JOINT model, using information from the entire United States with deviations for Washington, it is 0.858 (rank #6).
Evaluation
While the two models that include geographical information naturally outperform the model that does not, the JOINT model generally far outperforms the INDIVIDUAL models trained on state-specific subsets of the data. A model that can exploit all of the information in the data, learning core vector-space representations for all words along with deviations for each contextual variable, is able to learn more geographically-informed representations for this task than strict geographical models alone.
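A small sketch of the composition implied here, in which a state-specific representation is the sum of a shared core vector and a per-state deviation vector; the vectors are random stand-ins rather than learned parameters, and the function names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 100
core = {w: rng.normal(size=DIM) for w in ["city", "seattle", "wicked"]}   # shared vectors
deviation_WA = {w: rng.normal(scale=0.1, size=DIM) for w in core}         # Washington-specific offsets

def state_vector(word, deviations):
    """Region-specific representation = core vector + that region's deviation."""
    return core[word] + deviations[word]

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Geographically informed similarity for Washington, analogous to the JOINT-model
# comparison of "city" and "seattle" quoted above.
sim_WA = cosine(state_vector("city", deviation_WA), state_vector("seattle", deviation_WA))
```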
Model
A joint model has three a priori advantages over independent models: (i) sharing data across variable values encourages representations across those values to be similar; e.g., while city may be closer to Boston in Massachusetts and Chicago in Illinois, in both places it still generally connotes a municipality; (ii) such sharing can mitigate data sparseness for less-witnessed areas; and (iii) with a joint model, all representations are guaranteed to
joint model is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Jia, Zhongye and Zhao, Hai
Conclusion
In addition, the joint model is efficient enough for practical use.
Experiments
Other evaluation metrics are also proposed by Zheng et al. (2011a), which are only suitable for their system since our system uses a joint model.
Experiments
The selection of K also directly bounds the running time of the joint model.
Experiments
Results using the proposed joint model are shown in Table 3 and Table 4.
Pinyin Input Method Model
To make typo correction better, we consider integrating it with PTC conversion using a joint model.
Related Works
As we will propose a joint model
joint model is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Zhang, Meishan and Zhang, Yue and Che, Wanxiang and Liu, Ting
Character-Level Dependency Tree
(2012) proposed a joint model for Chinese word segmentation, POS-tagging and dependency parsing, studying the influence of the joint model and character features on parsing. Their model is extended from the arc-standard transition-based model, and can be regarded as an alternative to the arc-standard model of our work when pseudo intra-word dependencies are used.
Character-Level Dependency Tree
(2012) investigate a joint model using pseudo intra-word dependencies.
Character-Level Dependency Tree
To our knowledge, we are the first to apply the arc-eager system to joint models and achieve performance comparable to the arc-standard model.
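For reference, a minimal sketch of the standard word-level arc-eager transition system (without the paper's character-level or joint extensions, and omitting the usual preconditions on LEFT-ARC and REDUCE).

```python
# A configuration is (stack, buffer, arcs); arcs are (head, dependent) pairs.
def shift(stack, buffer, arcs):
    return stack + [buffer[0]], buffer[1:], arcs

def left_arc(stack, buffer, arcs):      # stack top becomes a dependent of the buffer front
    return stack[:-1], buffer, arcs | {(buffer[0], stack[-1])}

def right_arc(stack, buffer, arcs):     # buffer front becomes a dependent of the stack top
    return stack + [buffer[0]], buffer[1:], arcs | {(stack[-1], buffer[0])}

def reduce_(stack, buffer, arcs):       # pop the stack top once it already has a head
    return stack[:-1], buffer, arcs

# Toy run over word indices 1..3 with 0 as the root; the action sequence is hand-picked.
config = ([0], [1, 2, 3], set())
for action in (right_arc, right_arc, reduce_, right_arc):
    config = action(*config)
# config[2] is now {(0, 1), (1, 2), (1, 3)}
```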
joint model is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: