Index of papers in Proc. ACL that mention
  • joint model
Finkel, Jenny Rose and Manning, Christopher D.
Abstract
One of the main obstacles to producing high quality joint models is the lack of jointly annotated data.
Abstract
Joint modeling of multiple natural language processing tasks outperforms single-task models learned from the same data, but still under-performs compared to single-task models learned on the more abundant quantities of available single-task annotated data.
Abstract
In this paper we present a novel model which makes use of additional single-task annotated data to improve the performance of a joint model.
Introduction
Joint models can be particularly useful for producing analyses of sentences which are used as input for higher-level, more semantically-oriented systems, such as question answering and machine translation.
Introduction
However, designing joint models which actually improve performance has proven challenging.
Introduction
There have been some recent successes with joint modeling.
joint model is mentioned in 49 sentences in this paper.
Topics mentioned in this paper:
Lee, John and Naradowsky, Jason and Smith, David A.
Abstract
In evaluations on various highly-inflected languages, this joint model outperforms both a baseline tagger in morphological disambiguation, and a pipeline parser in head selection.
Baselines
To ensure a meaningful comparison with the joint model, our two baselines are both implemented in the same graphical model framework, and trained with the same machine-learning algorithm.
Baselines
Roughly speaking, they divide up the variables and factors of the joint model and train them separately.
Baselines
The tagger is a graphical model with the WORD and TAG variables, connected by the local factors TAG-UNIGRAM, TAG-BIGRAM, and TAG-CONSISTENCY, all used in the joint model (§3).
Experimental Results
We compare the performance of the pipeline model (§4) and the joint model (§3) on morphological disambiguation and unlabeled dependency parsing.
Experimental Setup
The output of the joint model is the assignment to the TAG and LINK variables.
Experimental Setup
In principle, the joint model should consider every possible combination of morphological attributes for every word.
Introduction
After a description of previous work (§2), the joint model (§3) will be contrasted with the baseline pipeline model (§4).
Joint Model
If fully implemented in our joint model, these features would necessitate two separate families of link factors: O(n³m³) factors for the POS trigrams, and O(n²m⁴) factors for the POS 4-grams.
Previous Work
Goldberg and Tsarfaty (2008) propose a generative joint model.
joint model is mentioned in 19 sentences in this paper.
Topics mentioned in this paper:
Hatori, Jun and Matsuzaki, Takuya and Miyao, Yusuke and Tsujii, Jun'ichi
Abstract
We propose the first joint model for word segmentation, POS tagging, and dependency parsing for Chinese.
Abstract
Based on an extension of the incremental joint model for POS tagging and dependency parsing (Hatori et al., 2011), we propose an efficient character-based decoding method that can combine features from state-of-the-art segmentation, POS tagging, and dependency parsing models.
Abstract
We also perform comparison experiments with the partially joint models.
Introduction
Based on these observations, we aim at building a joint model that simultaneously processes word segmentation, POS tagging, and dependency parsing, trying to capture global interaction among these three tasks.
Introduction
We also perform comparison experiments with partially joint models, and investigate the tradeoff between the running speed and the model performance.
Model
(2011), we build our joint model to solve word segmentation, POS tagging, and dependency parsing within a single framework.
Model
In our joint model, the early update is invoked by mistakes in any of word segmentation, POS tagging, or dependency parsing.
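The early-update strategy mentioned in this excerpt follows Collins and Roark (2004): during beam-search training, as soon as the gold-standard partial analysis falls out of the beam, the model is updated against the current best partial hypothesis and the rest of the sentence is skipped. The sketch below illustrates that loop in Python; the transition system and feature extractor (next_actions, feats) are caller-supplied placeholders, not the paper's actual implementation.

# Sketch: perceptron training with beam search and early update (Collins and
# Roark, 2004), the strategy invoked whenever segmentation, tagging, or parsing
# goes wrong inside the beam. `next_actions(sent, prefix)` and
# `feats(sent, prefix, action)` are hypothetical, task-specific callables.
def train_early_update(data, weights, next_actions, feats, beam_size=32, epochs=10):
    def score(sent, prefix, action):
        return sum(weights.get(f, 0.0) for f in feats(sent, prefix, action))

    def adjust(sent, prefix, delta):
        for i, action in enumerate(prefix):
            for f in feats(sent, prefix[:i], action):
                weights[f] = weights.get(f, 0.0) + delta

    for _ in range(epochs):
        for sent, gold in data:                       # gold = full action sequence
            beam = [((), 0.0)]
            for step in range(len(gold)):
                expanded = [(p + (a,), s + score(sent, p, a))
                            for p, s in beam
                            for a in next_actions(sent, p)]
                beam = sorted(expanded, key=lambda x: -x[1])[:beam_size]
                gold_prefix = tuple(gold[:step + 1])
                if not any(p == gold_prefix for p, _ in beam):
                    adjust(sent, gold_prefix, +1.0)     # reward the gold prefix
                    if beam:
                        adjust(sent, beam[0][0], -1.0)  # penalize the best wrong prefix
                    break                               # early update: stop this sentence
    return weights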
Model
The list of the features used in our joint model is presented in Table 1, where S01–S05, W01–W21, and T01–T05 are taken from Zhang and Clark (2010), and P01–P28 are taken from Huang and Sagae (2010).
Related Works
In contrast, we built a joint model based on a dependency-based framework, with a rich set of structural features.
Related Works
Because we found that even an incremental approach with beam search is intractable if we perform the word-based decoding, we take a character-based approach to produce our joint model.
joint model is mentioned in 21 sentences in this paper.
Topics mentioned in this paper:
Minkov, Einat and Zettlemoyer, Luke
Abstract
This paper presents a joint model for template filling, where the goal is to automatically specify the fields of target relations such as seminar announcements or corporate acquisition events.
Abstract
The approach models mention detection, unification and field extraction in a flexible, feature-rich model that allows for joint modeling of interdependencies at all levels and across fields.
Corporate Acquisitions
Unfortunately, we cannot directly compare against a generative joint model evaluated on this dataset (Haghighi and Klein, 2010). The best results per attribute are shown in boldface.
Introduction
In this paper, we present a joint modeling and learning approach for the combined tasks of mention detection, unification, and template filling, as described above.
Introduction
We also demonstrate, through ablation studies on the feature set, the need for joint modeling and the relative importance of the different types of joint constraints.
Seminar Extraction Task
An important question to be addressed in evaluation is to what extent the joint modeling approach contributes to performance.
Seminar Extraction Task
This is largely due to erroneous assignments of named entities of other types (mainly, person) as titles; such errors are avoided in the full joint model, where tuple validity is enforced.
Seminar Extraction Task
As argued before, joint modeling is especially important for irregular fields, such as title; we provide first results on this field.
Summary and Future Work
This approach allows for joint modeling of interdependen-cies at all levels and across fields.
Summary and Future Work
Finally, it is worth exploring scaling the approach to unrestricted event extraction, and jointly modeling the extraction of more than one relation per document.
joint model is mentioned in 11 sentences in this paper.
Topics mentioned in this paper:
Tamura, Akihiro and Watanabe, Taro and Sumita, Eiichiro and Takamura, Hiroya and Okumura, Manabu
Abstract
In particular, we extend the monolingual infinite tree model (Finkel et al., 2007) to a bilingual scenario: each hidden state (POS tag) of a source-side dependency tree emits a source word together with its aligned target word, either jointly (joint model), or independently (independent model).
Abstract
Our independent model gains over 1 point in BLEU by resolving the sparseness problem introduced in the joint model.
Bilingual Infinite Tree Model
This paper proposes two types of models that differ in their processes for generating observations: the joint model and the independent model.
Bilingual Infinite Tree Model
3.1 Joint Model
Bilingual Infinite Tree Model
The joint model is a simple application of the infinite tree model under a bilingual scenario.
Introduction
We investigate two types of models: (i) a joint model and (ii) an independent model.
Introduction
In the joint model, each hidden state jointly emits both a source word and its aligned target word as an observation.
Related Work
Figure 4: An Example of the Joint Model
joint model is mentioned in 17 sentences in this paper.
Topics mentioned in this paper:
Lu, Bin and Tan, Chenhao and Cardie, Claire and K. Tsou, Benjamin
A Joint Model with Unlabeled Parallel Text
3.2 The Joint Model
A Joint Model with Unlabeled Parallel Text
Since previous work (Banea et al., 2008; 2010; Wan, 2009) has shown that it could be useful to automatically translate the labeled data from the source language into the target language, we can further incorporate such translated labeled data into the joint model by adding the following component into Equation 6:
Conclusion
In this paper, we study bilingual sentiment classification and propose a joint model to simultaneously learn better monolingual sentiment classifiers for each language by exploiting an unlabeled parallel corpus together with the labeled data available for each language.
Experimental Setup 4.1 Data Sets and Preprocessing
In our experiments, the proposed joint model is compared with the following baseline methods.
Introduction
In Section 3, the proposed joint model is described.
Related Work
Another notable approach is the work of Boyd-Graber and Resnik (2010), which presents a generative model (supervised multilingual latent Dirichlet allocation) that jointly models topics that are consistent across languages, and employs them to better predict sentiment ratings.
Results and Analysis
We first compare the proposed joint model (Joint) with the baselines in Table 2.
Results and Analysis
Overall, the unlabeled parallel data improves classification accuracy for both languages when using our proposed joint model and Co-SVM.
Results and Analysis
The joint model makes better use of the unlabeled parallel data than Co-SVM or TSVMs presumably because of its attempt to jointly optimize the two monolingual models via soft (probabilistic) assignments of the unlabeled instances to classes in each iteration, instead of the hard assignments in Co-SVM and TSVMs.
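The soft-versus-hard assignment contrast drawn in this excerpt can be made concrete: two monolingual classifiers score the two sides of each unlabeled parallel pair, their posteriors are combined, and the combined posterior is fed back as a fractional label weight when retraining. The following is an illustrative reconstruction with scikit-learn, not the paper's actual maximum-entropy formulation.

import numpy as np
from sklearn.linear_model import LogisticRegression

def joint_soft_training(Xe_lab, ye, Xc_lab, yc, Xe_par, Xc_par, iters=5):
    # Xe_lab/ye, Xc_lab/yc: labeled English/Chinese features and binary labels.
    # Xe_par, Xc_par: features for the two sides of the unlabeled parallel corpus
    # (row i of each matrix corresponds to the same sentence pair).
    clf_e = LogisticRegression(max_iter=1000).fit(Xe_lab, ye)
    clf_c = LogisticRegression(max_iter=1000).fit(Xc_lab, yc)
    for _ in range(iters):
        # Combine the two monolingual posteriors for each parallel pair.
        p = 0.5 * (clf_e.predict_proba(Xe_par)[:, 1] + clf_c.predict_proba(Xc_par)[:, 1])
        # Soft assignment: each unlabeled pair contributes to both classes,
        # weighted by the combined posterior, instead of receiving a hard 0/1 label.
        X_e = np.vstack([Xe_lab, Xe_par, Xe_par])
        y_e = np.concatenate([ye, np.ones(len(p)), np.zeros(len(p))])
        w_e = np.concatenate([np.ones(len(ye)), p, 1.0 - p])
        X_c = np.vstack([Xc_lab, Xc_par, Xc_par])
        y_c = np.concatenate([yc, np.ones(len(p)), np.zeros(len(p))])
        w_c = np.concatenate([np.ones(len(yc)), p, 1.0 - p])
        clf_e = LogisticRegression(max_iter=1000).fit(X_e, y_e, sample_weight=w_e)
        clf_c = LogisticRegression(max_iter=1000).fit(X_c, y_c, sample_weight=w_c)
    return clf_e, clf_c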
joint model is mentioned in 10 sentences in this paper.
Topics mentioned in this paper:
Gormley, Matthew R. and Mitchell, Margaret and Van Durme, Benjamin and Dredze, Mark
Approaches
Figure 2 shows the factor graph for this joint model.
Discussion and Future Work
We find that we can outperform prior work in the low-resource setting by coupling the selection of feature templates based on information gain with a joint model that marginalizes over latent syntax.
Discussion and Future Work
Our discriminative joint models treat latent syntax as a structured-feature to be optimized for the end-task of SRL, while our other grammar induction techniques optimize for unlabeled data likelihood—optionally with distant supervision.
Experiments
This highlights an important advantage of the pipeline-trained model: the features can consider any part of the syntax (e.g., arbitrary sub-trees), whereas the joint model is limited to those features over which it can efficiently marginalize (e.g., short dependency paths).
Experiments
In the low-resource setting of the CoNLL-2009 Shared task without syntactic supervision, our joint model (Joint) with marginalized syntax obtains state-of-the-art results with features IGC described in § 4.2.
Experiments
These results begin to answer a key research question in this work: The joint models outperform the pipeline models in the low-resource setting.
Introduction
• Comparison of pipeline and joint models for SRL.
Introduction
The joint models use a non-loopy conditional random field (CRF) with a global factor constraining latent syntactic edge variables to form a tree.
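One standard way to realize such a global tree factor over latent syntactic edges is to marginalize over all non-projective dependency trees with the matrix-tree theorem (Koo et al., 2007; Smith and Smith, 2007), where the log-partition function is a log-determinant of a Laplacian minor. The sketch below shows that computation on arbitrary edge scores; it illustrates the general technique, not necessarily this paper's exact inference procedure (and the single-root variant used in parsing needs the modified construction of Koo et al., 2007).

import numpy as np

def tree_log_partition(edge_scores):
    # edge_scores[h, m] is the log-score of an arc from head h to modifier m,
    # with index 0 reserved for the artificial root node.
    A = np.exp(edge_scores)
    np.fill_diagonal(A, 0.0)           # no self-loops
    A[:, 0] = 0.0                      # no arcs point into the root
    L = np.diag(A.sum(axis=0)) - A     # directed graph Laplacian (in-degree form)
    # Matrix-tree theorem: the total weight of all arborescences rooted at node 0
    # is the determinant of the Laplacian with the root row and column removed.
    sign, logdet = np.linalg.slogdet(L[1:, 1:])
    return logdet

# Toy usage: one root plus three words with random arc scores.
print(tree_log_partition(np.random.randn(4, 4)))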
Introduction
Even at the expense of no dependency path features, the joint models best pipeline-trained models for state-of-the-art performance in the low-resource setting (§ 4.4).
Related Work
In both pipeline and joint models, we use features adapted from state-of-the-art approaches to SRL.
joint model is mentioned in 10 sentences in this paper.
Topics mentioned in this paper:
Zeng, Xiaodong and Wong, Derek F. and Chao, Lidia S. and Trancoso, Isabel
Abstract
This paper introduces a graph-based semi-supervised joint model of Chinese word segmentation and part-of-speech tagging.
Abstract
An inductive character-based joint model is obtained eventually.
Introduction
In the past years, several proposed supervised joint models (Ng and Low, 2004; Zhang and Clark, 2008; Jiang et al., 2009; Zhang and Clark, 2010) achieved reasonably accurate results, but the outstanding problem among these models is that they rely heavily on a large amount of labeled data, i.e., segmented texts with POS tags.
Method
It is directed to maximize the conditional likelihood of hidden states with the derived label distributions on unlabeled data, i.e., p(y, v | x), where y and v are jointly modeled but
Method
Firstly, as expected, for the two supervised baselines, the joint model outperforms the pipeline one, especially on segmentation.
Method
This outcome verifies the commonly accepted fact that the joint model can substantially improve the pipeline one, since POS tags provide additional information to word segmentation (Ng and Low, 2004).
Related Work
The state-of-the-art joint models include reranking approaches (Shi and Wang, 2007), hybrid approaches (Nakagawa and Uchimoto, 2007; Jiang et al., 2008; Sun, 2011), and single-model approaches (Ng and Low, 2004; Zhang and Clark, 2008; Kruengkrai et al., 2009; Zhang and Clark, 2010).
joint model is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Feng, Vanessa Wei and Hirst, Graeme
Bottom-up tree-building
However, the major distinction between our models and theirs is that we do not jointly model the structure and the relation; rather, we use two linear-chain CRFs.
Bottom-up tree-building
Although joint modeling has been shown to be effective in various NLP and computer vision applications (Sutton et al., 2007; Yang et al., 2009; Wojek and Schiele, 2008), our choice of using two separate models is for the following reasons:
Bottom-up tree-building
Then, in the tree-building process, we will have to deal with the situations where the joint model yields conflicting predictions: it is possible that the model predicts S_j = 1 and R_j = NO-REL, or vice versa, and we will have to decide which node to trust (and thus, in some sense, the structure and the relation are no longer jointly modeled).
Related work
2.2 Joty et al.’s joint model
Related work
Second, they jointly modeled the structure and the relation for a given pair of discourse units.
Related work
The strength of Joty et al.’s model is their joint modeling of the structure and the relation, such that information from each aspect can interact with the other.
joint model is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Li, Qi and Ji, Heng
Abstract
Experiments on Automatic Content Extraction (ACE) corpora demonstrate that our joint model significantly outperforms a strong pipelined baseline, which attains better performance than the best-reported end-to-end system.
Conclusions and Future Work
In addition, we aim to incorporate other IE components such as event extraction into the joint model.
Experiments
We compare our proposed method (Joint w/ Global) with the pipelined system (Pipeline), the joint model with only local features (Joint w/ Local), and two human annotators who annotated 73 documents in the ACE’05 corpus.
Experiments
Our joint model correctly identified the entity mentions and their relation.
Experiments
Figure 7 shows the details when the joint model is applied to this sentence.
Introduction
This is the first work to incrementally predict entity mentions and relations using a single joint model (Section 3).
joint model is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Sun, Xu and Wang, Houfeng and Li, Wenjie
Abstract
We present a joint model for Chinese word segmentation and new word detection.
Abstract
We present high dimensional new features, including word-based features and enriched edge (label-transition) features, for the joint modeling.
Introduction
In this paper, we present high dimensional new features, including word-based features and enriched edge (label-transition) features, for the joint modeling of Chinese word segmentation (CWS) and new word detection (NWD).
Introduction
While most of the state-of-the-art CWS systems used semi-Markov conditional random fields or latent variable conditional random fields, we simply use a single first-order conditional random field (CRF) for the joint modeling.
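A single first-order CRF can cover the joint task by tagging characters with boundary labels, so that character features and label-transition (edge) features live in one model. The fragment below is a minimal illustration with sklearn-crfsuite and a toy feature template; it shows only the segmentation layer and is not the paper's high-dimensional word-based feature set.

import sklearn_crfsuite  # pip install sklearn-crfsuite

def char_features(chars, i):
    # Toy character-level template: current character plus a one-character window.
    feats = {"c0": chars[i]}
    if i > 0:
        feats["c-1"] = chars[i - 1]
        feats["c-1c0"] = chars[i - 1] + chars[i]
    if i < len(chars) - 1:
        feats["c+1"] = chars[i + 1]
    return feats

def to_instances(segmented_sentences):
    # Each training sentence is a list of words; emit per-character features and
    # B/M/E/S boundary labels, the usual encoding for character-based segmentation.
    X, Y = [], []
    for words in segmented_sentences:
        chars = "".join(words)
        labels = []
        for w in words:
            labels += ["S"] if len(w) == 1 else ["B"] + ["M"] * (len(w) - 2) + ["E"]
        X.append([char_features(chars, i) for i in range(len(chars))])
        Y.append(labels)
    return X, Y

X, Y = to_instances([["我们", "喜欢", "自然", "语言", "处理"]])
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c2=0.1, max_iterations=100)
crf.fit(X, Y)
print(crf.predict(X))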
Introduction
• We propose a joint model for Chinese word segmentation and new word detection.
System Architecture
3.1 A Joint Model Based on CRFs
System Architecture
In this paper, we presented a joint model for Chinese word segmentation and new word detection.
System Architecture
We presented new features, including word-based features and enriched edge features, for the joint modeling.
joint model is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Celikyilmaz, Asli and Hakkani-Tur, Dilek
Abstract
We describe a joint model for understanding user actions in natural language utterances.
Background
Only recent research has focused on the joint modeling of SLU (Jeong and Lee, 2008; Wang, 2010) taking into account the dependencies at learning time.
Background
Our joint model can discover domain D, and user’s act A as higher layer latent concepts of utterances in relation to lower layer latent semantic topics (slots) S such as named-entities (“New York”) or context bearing non-named entities (“vegan”).
Data and Approach Overview
Here we define several abstractions of our joint model as depicted in Fig.
Experiments
• Tri-CRF: We used the Triangular Chain CRF (Jeong and Lee, 2008) as our supervised joint model baseline.
Experiments
We evaluate the performance of our joint model on two experiments using two metrics.
Experiments
The results show that our joint modeling approach has an advantage over the other joint models (i.e., Tri-CRF) in that it can leverage unlabeled NL utterances.
Introduction
Recent work on SLU (Jeong and Lee, 2008; Wang, 2010) presents joint modeling of two components, i.e., the domain and slot or dialog act and slot components together.
joint model is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Devlin, Jacob and Zbib, Rabih and Huang, Zhongqiang and Lamar, Thomas and Schwartz, Richard and Makhoul, John
Abstract
Here, we present a novel formulation for a neural network joint model (NNJM), which augments the NNLM with a source context window.
Introduction
Specifically, we introduce a novel formulation for a neural network joint model (NNJM), which augments an n-gram target language model with an m-word source window.
Introduction
Unlike previous approaches to joint modeling (Le et al., 2012), our feature can be easily integrated into any statistical machine translation (SMT) decoder, which leads to substantially larger improvements than k-best rescoring only.
Model Variations
Although there has been a substantial amount of past work in lexicalized joint models (Marino et al., 2006; Crego and Yvon, 2010), nearly all of these papers have used older statistical techniques such as Kneser-Ney or Maximum Entropy.
Model Variations
This is consistent with our rescoring-only result, which indicates that k-best rescoring is too shallow to take advantage of the power of a joint model.
Model Variations
We have described a novel formulation for a neural network-based machine translation joint model, along with several simple variations of this model.
Neural Network Joint Model (NNJM)
To make this a joint model, we also condition on the source context vector S_i:
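The construction described in these excerpts amounts to an n-gram target language model whose context is augmented with an m-word window around the aligned (affiliated) source word. A schematic form of the resulting joint probability, written as a paraphrase of the standard NNJM factorization rather than a transcription of the paper's equation, is:

% n-gram target history plus an m-word source window centred on a_i, the
% source position affiliated with target word t_i (schematic form).
P(T \mid S) \approx \prod_{i=1}^{|T|}
  P\bigl(t_i \bigm| t_{i-1}, \ldots, t_{i-n+1},\;
         s_{a_i - \lfloor m/2 \rfloor}, \ldots, s_{a_i + \lfloor m/2 \rfloor}\bigr)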
joint model is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Börschinger, Benjamin and Johnson, Mark and Demuth, Katherine
Abstract
We extend a nonparametric model of word segmentation by adding phonological rules that map from underlying forms to surface forms to produce a mathematically well-defined joint model as a first step towards handling variation and segmentation in a single model.
Abstract
We analyse how our model handles /t/-deletion on a large corpus of transcribed speech, and show that the joint model can perform word segmentation and recover underlying /t/s.
Background and related work
However, as they point out, combining the segmentation and the variation model into one joint model is not straightforward and usual inference procedures are infeasible, which requires the use of several heuristics.
Background and related work
They do not aim for a joint model that also handles word segmentation, however, and rather than training their model on an actual corpus, they evaluate on constructed lists of examples, mimicking frequencies of real data.
Conclusion and outlook
We presented a joint model for word segmentation and the learning of phonological rule probabilities from a corpus of transcribed speech.
The computational model
Figure 1: The graphical model for our joint model of word-final /t/-deletion and Bigram word segmentation.
The computational model
Unlike the Goldwater et al. (2009) segmentation models, exact inference is infeasible for our joint model.
joint model is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Bamman, David and Dyer, Chris and Smith, Noah A.
Abstract
In a quantitative evaluation on the task of judging geographically informed semantic similarity between representations learned from 1.1 billion words of geo-located tweets, our joint model outperforms comparable independent models that learn meaning in isolation.
Evaluation
To illustrate how the model described above can learn geographically-informed semantic representations of words, table 1 displays the terms with the highest cosine similarity to wicked in Kansas and Massachusetts after running our joint model on the full 1.1 billion words of Twitter data; while wicked in Kansas is close to other evaluative terms like evil and pure and religious terms like gods and spirit, in Massachusetts it is most similar to other intensifiers like super, ridiculously and insanely.
Evaluation
As one concrete example of these differences between individual data points, the cosine similarity between city and seattle in the -GEO model is 0.728 (seattle is ranked as the 188th most similar term to city overall); in the INDIVIDUAL model, using only tweets from Washington state, the similarity between city and seattle is 0.780 (rank #32); and in the JOINT model, using information from the entire United States with deviations for Washington, it is 0.858 (rank #6).
Evaluation
While the two models that include geographical information naturally outperform the model that does not, the JOINT model generally far outperforms the INDIVIDUAL models trained on state-specific subsets of the data. A model that can exploit all of the information in the data, learning core vector-space representations for all words along with deviations for each contextual variable, is able to learn more geographically-informed representations for this task than strict geographical models alone.
Model
A joint model has three a priori advantages over independent models: (i) sharing data across variable values encourages representations across those values to be similar; e.g., while city may be closer to Boston in Massachusetts and Chicago in Illinois, in both places it still generally connotes a municipality; (ii) such sharing can mitigate data sparseness for less-witnessed areas; and (iii) with a joint model, all representations are guaranteed to be in the same vector space.
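The composition described in (i)-(iii), a shared core representation plus an additive per-state deviation, can be sketched as simple vector addition followed by cosine similarity; the array names and toy dimensions below are illustrative stand-ins, not the paper's learned parameters.

import numpy as np

def region_vector(word, base, deviation, region):
    # Geographically-informed representation: shared base vector plus the
    # additive deviation learned for one region (here, a U.S. state).
    return base[word] + deviation[region][word]

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy 8-dimensional vectors standing in for learned embeddings.
rng = np.random.default_rng(0)
vocab = ["city", "seattle", "wicked"]
base = {w: rng.normal(size=8) for w in vocab}
deviation = {"WA": {w: 0.1 * rng.normal(size=8) for w in vocab}}

u = region_vector("city", base, deviation, "WA")
v = region_vector("seattle", base, deviation, "WA")
print(cosine(u, v))  # region-specific similarity, analogous to the Washington example above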
joint model is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Jia, Zhongye and Zhao, Hai
Conclusion
In addition, the joint model is efficient enough for practical use.
Experiments
Other evaluation metrics are also proposed by Zheng et al. (2011a), which are only suitable for their system since our system uses a joint model.
Experiments
The selection of K also directly guarantees the running time of the joint model .
Experiments
using the proposed joint model are shown in Table 3 and Table 4.
Pinyin Input Method Model
To make typo correction better, we consider integrating it with PTC conversion using a joint model.
Related Works
As we will propose a joint model
joint model is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Reichart, Roi and Korhonen, Anna
Conclusions and Future Work
A natural extension of our unified framework is to construct a joint model in which the predictions for all three tasks inform each other at all stages of the prediction process.
Introduction
(2012) presented a joint model for inducing simple syntactic frames and VCs.
Introduction
(2012) introduced a joint model for SCF and SP acquisition.
Previous Work
Joint Modeling: A small number of works have recently investigated joint approaches to SCFs, SPs and VCs.
Previous Work
Although evaluation of these recent joint models has been partial, the results have been encouraging.
The Unified Framework
DPPs are particularly suitable for joint modeling as they come with various simple and intuitive ways to combine individual model kernel matrices into a joint kernel.
joint model is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Morita, Hajime and Sasano, Ryohei and Takamura, Hiroya and Okumura, Manabu
Introduction
Joint models of sentence extraction and compression have a great benefit in that they allow a large degree of freedom in controlling redundancy.
Introduction
In contrast, conventional two-stage approaches (Zajic et al., 2006), which first generate candidate compressed sentences and then use them to generate a summary, have less computational complexity than joint models.
Introduction
Joint models can prune unimportant or redundant descriptions without resorting to enumeration.
Joint Model of Extraction and Compression
Therefore, the joint model can extract an arbitrarily compressed sentence as a subtree without enumerating all candidates.
Joint Model of Extraction and Compression
The joint model can remove the redundant part as well as the irrelevant part of a sentence, because the model simultaneously extracts and compresses sentences.
Joint Model of Extraction and Compression
In this joint model, we generate a compressed sentence by extracting an arbitrary subtree from a dependency tree of a sentence.
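The compression operation described here, reading a compressed sentence off a subtree of the dependency tree, reduces to collecting a node's descendants and keeping the surviving words in surface order. A small self-contained sketch (with a made-up head-index representation, not the paper's scoring model):

def descendants(heads, node):
    # Indices of `node` and everything below it in the dependency tree.
    children = {}
    for i, h in enumerate(heads):
        children.setdefault(h, []).append(i)
    out, stack = set(), [node]
    while stack:
        n = stack.pop()
        out.add(n)
        stack.extend(children.get(n, []))
    return out

def compress(words, heads, kept_root, dropped=()):
    # Keep the subtree under `kept_root`, minus any subtrees rooted in `dropped`,
    # and read the remaining words off in surface order.
    keep = descendants(heads, kept_root)
    for d in dropped:
        keep -= descendants(heads, d)
    return [w for i, w in enumerate(words) if i in keep]

# "yesterday the committee approved the plan", heads pointing at "approved" (index 3).
words = ["yesterday", "the", "committee", "approved", "the", "plan"]
heads = [3, 2, 3, -1, 5, 3]
print(compress(words, heads, kept_root=3, dropped=(0,)))
# -> ['the', 'committee', 'approved', 'the', 'plan']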
joint model is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Setiawan, Hendra and Zhou, Bowen and Xiang, Bing and Shen, Libin
Abstract
In this paper, we propose a Two-Neighbor Orientation (TNO) model that jointly models the orientation decisions between anchors and two neighboring multi-unit chunks, which may cross phrase or rule boundaries.
Conclusion
Our approach, which we formulate as a Two-Neighbor Orientation model, includes the joint modeling of two orientation decisions and the modeling of the maximal span of the reordered chunks through the concept of Maximal Orientation Span.
Introduction
Then, we jointly model the orientations of chunks that immediately precede and follow the anchors (hence, the name “two-neighbor”) along with the maximal span of these chunks, to which we refer as Maximal Orientation Span (MOS).
Introduction
To show the effectiveness of our model, we integrate our TNO model into a state-of-the-art syntax-based SMT system, which uses synchronous context-free grammar (SCFG) rules to jointly model reordering and lexical translation.
Two-Neighbor Orientation Model
Our Two-Neighbor Orientation model (TNO) designates A ⊂ A(·) as anchors and jointly models the orientation of chunks that appear immediately to the left and to the right of the anchors as well as the identities of these chunks.
joint model is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Goldwater, Sharon and Jurafsky, Dan and Manning, Christopher D.
Analysis using a joint model
According to our joint model, these effects still hold even after controlling for other features.
Analysis using a joint model
Our joint model controls for the first two of these factors, suggesting that the third factor or some other explanation must account for the remaining differences between males and females.
Analysis using a joint model
In the joint model, we see the same effect of pitch mean and an even stronger effect for intensity, with the predicted odds of an error dramatically higher for extreme intensity values.
Conclusion
Using IWER, we analyzed the effects of various word-level lexical and prosodic features, both individually and in a joint model .
joint model is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Pantel, Patrick and Lin, Thomas and Gamon, Michael
Abstract
We jointly model the interplay between latent user intents that govern queries and unobserved entity types, leveraging observed signals from query formulations and document clicks.
Conclusion
Jointly modeling the interplay between the underlying user intents and entity types in web search queries shows significant improvements over the current state of the art on the task of resolving entity types in head queries.
Evaluation Methodology
In order to learn type distributions by jointly modeling user intents and a large number of types, we require a large set of training examples containing tagged entities and their potential types.
Introduction
We show that jointly modeling user intent and entity type significantly outperforms the current state of the art on the task of entity type resolution in queries.
Related Work
Our models also expand upon theirs by jointly modeling
joint model is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Li, Qi and Ji, Heng and Huang, Liang
Introduction
We propose a novel joint event extraction algorithm to predict the triggers and arguments simultaneously, and use the structured perceptron (Collins, 2002) to train the joint model .
Joint Framework for Event Extraction
Unfortunately, it is intractable to perform the exact search in our framework because: (1) by jointly modeling the trigger labeling and argument labeling, the search space becomes much more complex.
Related Work
To the best of our knowledge, our work is the first attempt to jointly model these two ACE event subtasks.
Related Work
There has been some previous work on joint modeling for biomedical events (Riedel and McCallum, 2011a; Riedel et al., 2009; McClosky et al., 2011; Riedel and McCallum, 2011b).
joint model is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Mukherjee, Arjun and Liu, Bing
Introduction
Our models also jointly model both aspects and aspect-specific sentiments.
Introduction
Our models are related to topic models in general (Blei et al., 2003) and joint models of aspects and sentiments in sentiment analysis in specific (e.g., Zhao et al., 2010).
Introduction
First of all, we jointly model aspect and sentiment, while DF-LDA is only for topics/aspects.
joint model is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Chambers, Nathanael
Experiments and Results
Finally, the Joint model combines the document and year-mention classifiers, as described in Section 4.3.
Experiments and Results
Table 4 shows the F1 scores of the Joint model by year.
Experiments and Results
Table 4: Yearly results for the Joint model.
Learning Time Constraints
Finally, given the document classifiers of Section 3 and the constraint classifier just defined in Section 4, we create a joint model combining the two with the following linear interpolation:
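The interpolation itself is elided in this excerpt; in generic form (a hedged reconstruction with a tuned mixing weight lambda, not the paper's exact notation), combining the document classifier with the year-mention constraint classifier looks like:

% Generic linear interpolation of the two classifiers' scores for a candidate
% year y given document d; \lambda is a mixing weight tuned on held-out data.
\mathrm{score}(y \mid d) = \lambda \, P_{\mathrm{doc}}(y \mid d) + (1 - \lambda) \, P_{\mathrm{constraint}}(y \mid d)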
joint model is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Zhang, Meishan and Zhang, Yue and Che, Wanxiang and Liu, Ting
Experiments
We can see that both character-level joint models outperform the pipelined system; our model with annotated word structures gives an improvement of 0.97% in tagging accuracy and 2.17% in phrase-structure parsing accuracy.
Experiments
The results also demonstrate that the annotated word structures are highly effective for syntactic parsing, giving an absolute improvement of 0.82% in phrase-structure parsing accuracy over the joint model with flat word structures.
Experiments
(2011), which additionally uses the Chinese Gigaword Corpus; Li ’11 denotes a generative model that can perform word segmentation, POS tagging and phrase-structure parsing jointly (Li, 2011); Li+ ’12 denotes a unified dependency parsing model that can perform joint word segmentation, POS tagging and dependency parsing (Li and Zhou, 2012); Li ’11 and Li+ ’12 exploited annotated morphological-level word structures for Chinese; Hatori+ ’12 denotes an incremental joint model for word segmentation, POS tagging and dependency parsing (Hatori et al., 2012); they use external dictionary resources including HowNet Word List and page names from the Chinese Wikipedia; Qian+ ’12 denotes a joint segmentation, POS tagging and parsing system using a unified framework for decoding, incorporating a word segmentation model, a POS tagging model and a phrase-structure parsing model together (Qian and Liu, 2012); their word segmentation model is a combination of character-based model and word-based model.
Related Work
Their work demonstrates that a joint model can improve the performance of the three tasks, particularly for POS tagging and dependency parsing.
joint model is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Berg-Kirkpatrick, Taylor and Gillick, Dan and Klein, Dan
Abstract
We learn a joint model of sentence extraction and compression for multi-document summarization.
Efficient Prediction
By solving the following ILP we can compute the arg max required for prediction in the joint model:
Experiments
Figure 4: Example summaries produced by our learned joint model of extraction and compression.
Joint Model
Learning weights for Objective 2, where Y(x) is the set of compressive summaries and C(y) is the set of broken edges that produce subtree deletions, gives our LEARNED COMPRESSIVE system, which is our joint model of extraction and compression.
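The arg max over compressive summaries is computed with an ILP; the sketch below solves a drastically simplified extraction-only variant with PuLP (per-sentence value scores and a length budget, no subtree-deletion variables), purely to illustrate the shape of the optimization rather than the paper's Objective 2.

from pulp import LpProblem, LpVariable, LpMaximize, LpBinary, lpSum, PULP_CBC_CMD

def extractive_ilp(values, lengths, budget):
    # Pick sentences maximizing total value subject to a summary length budget.
    # In the learned model the values would come from weighted features.
    n = len(values)
    prob = LpProblem("summary", LpMaximize)
    x = [LpVariable(f"x{i}", cat=LpBinary) for i in range(n)]
    prob += lpSum(values[i] * x[i] for i in range(n))             # objective
    prob += lpSum(lengths[i] * x[i] for i in range(n)) <= budget  # length constraint
    prob.solve(PULP_CBC_CMD(msg=False))
    return [i for i in range(n) if x[i].value() > 0.5]

print(extractive_ilp(values=[3.0, 2.0, 1.5], lengths=[12, 8, 5], budget=15))  # -> [1, 2]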
joint model is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Zhang, Meishan and Zhang, Yue and Che, Wanxiang and Liu, Ting
Character-Level Dependency Tree
(2012) proposed a joint model for Chinese word segmentation, POS-tagging and dependency parsing, studying the influence of the joint model and character features for parsing. Their model is extended from the arc-standard transition-based model, and can be regarded as an alternative to the arc-standard model of our work when pseudo intra-word dependencies are used.
Character-Level Dependency Tree
(2012) investigate a joint model using pseudo intra-word dependencies.
Character-Level Dependency Tree
To our knowledge, we are the first to apply the arc-eager system to joint models and achieve performance comparable to the arc-standard model.
joint model is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Hartmann, Silvana and Gurevych, Iryna
FrameNet — Wiktionary Alignment
In Table 2, we report on the results of the best single models and the best joint model .
FrameNet — Wiktionary Alignment
For the joint model, we employed the best single PPR configuration, and a COS configuration that uses sense gloss extended by Wiktionary hypernyms, synonyms and FrameNet frame name and frame definition, to achieve the highest score, an F1-score of 0.739.
FrameNet — Wiktionary Alignment
The BEST JOINT model performs well on nouns, slightly better on adjectives, and worse on verbs; see Table 2.
joint model is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Mylonakis, Markos and Sima'an, Khalil
Joint Translation Model
Phrase-pairs are emitted jointly and the overall probabilistic SCFG is a joint model over parallel strings.
Joint Translation Model
By splitting the joint model in a hierarchical structure model and a lexical emission one we facilitate estimating the two models separately.
Related Work
We show that a translation system based on such a joint model can perform competitively in comparison with conditional probability models, when it is augmented with a rich latent hierarchical structure trained adequately to avoid overfitting.
joint model is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Ritter, Alan and Mausam and Etzioni, Oren
Previous Work
It also focuses on jointly modeling the generation of both predicate and argument, and evaluation is performed on a set of human-plausibility judgments, obtaining impressive results against Keller and Lapata’s (2003) Web hit-count based system.
Topic Models for Selectional Prefs.
One weakness of IndependentLDA is that it doesn’t jointly model a1 and a2 together.
Topic Models for Selectional Prefs.
On the one hand, JointLDA jointly models the generation of both arguments in an extracted tuple.
joint model is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Croce, Danilo and Giannone, Cristina and Annesi, Paolo and Basili, Roberto
A Distributional Model for Argument Classification
3.2 A Joint Model for Argument Classification
Related Work
It incorporates strong dependencies within a comprehensive statistical joint model with a rich set of features over multiple argument phrases.
Related Work
First, local models are applied to produce role labels over individual arguments; then the joint model is used to decide the entire argument sequence among the set of the n-best competing solutions.
joint model is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Garera, Nikesh and Yarowsky, David
Corpus Details
gender/age) based on the prior and joint modeling of the partner speaker’s gender/age in the same discourse.
Corpus Details
We employ several varieties of classifier stacking and joint modeling to be effectively sensitive to these differences.
Corpus Details
A novel partner-sensitive model shows performance gains from the joint modeling of speaker attributes along with partner speaker attributes, given the differences in lexical usage and discourse style such as observed between same-gender and mixed-gender conversations.
joint model is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: