Abstract | In particular, we extend the monolingual infinite tree model (Finkel et al., 2007) to a bilingual scenario: each hidden state (POS tag) of a source-side dependency tree emits a source word together with its aligned target word, either jointly (joint model) or independently (independent model).
Abstract | Our independent model gains over 1 point in BLEU by resolving the sparseness problem introduced in the joint model.
Bilingual Infinite Tree Model | This paper proposes two types of models that differ in their processes for generating observations: the joint model and the independent model. |
Bilingual Infinite Tree Model | 3.1 Joint Model |
Bilingual Infinite Tree Model | The joint model is a simple application of the infinite tree model under a bilingual scenario. |
Introduction | We investigate two types of models: (i) a joint model and (ii) an independent model. |
Introduction | In the joint model, each hidden state jointly emits both a source word and its aligned target word as an observation.
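Introduction | As a rough sketch of the contrast between the two emission schemes, the code below estimates both from pair counts; the additive smoothing, vocabulary sizes, and all constants are illustrative assumptions rather than the paper's actual (nonparametric) parameterization.

```python
from collections import defaultdict

class JointEmission:
    """P(src, tgt | state) from counts over word pairs.

    Pair counts are sparse: most (src, tgt) combinations are never
    observed, which is the sparseness problem the abstract mentions.
    """

    def __init__(self, alpha=0.1, pair_vocab=10_000 ** 2):
        self.alpha = alpha            # additive smoothing (assumed)
        self.pair_vocab = pair_vocab  # assumed size of the pair space
        self.counts = defaultdict(lambda: defaultdict(int))
        self.totals = defaultdict(int)

    def observe(self, state, src, tgt):
        self.counts[state][(src, tgt)] += 1
        self.totals[state] += 1

    def prob(self, state, src, tgt):
        c = self.counts[state][(src, tgt)]
        return (c + self.alpha) / (self.totals[state] + self.alpha * self.pair_vocab)


class IndependentEmission:
    """P(src | state) * P(tgt | state): two much denser word tables."""

    def __init__(self, alpha=0.1, vocab=10_000):
        self.alpha, self.vocab = alpha, vocab
        self.src = defaultdict(lambda: defaultdict(int))
        self.tgt = defaultdict(lambda: defaultdict(int))
        self.totals = defaultdict(int)

    def observe(self, state, src, tgt):
        self.src[state][src] += 1
        self.tgt[state][tgt] += 1
        self.totals[state] += 1

    def _p(self, table, state, w):
        return (table[state][w] + self.alpha) / (self.totals[state] + self.alpha * self.vocab)

    def prob(self, state, src, tgt):
        return self._p(self.src, state, src) * self._p(self.tgt, state, tgt)


je, ie = JointEmission(), IndependentEmission()
for s, t in [("Haus", "house"), ("Haus", "home"), ("Hund", "dog")]:
    je.observe("NOUN", s, t)
    ie.observe("NOUN", s, t)

# Unseen pair of individually seen words: the joint model backs off
# to uniform smoothing, while the independent model still assigns it
# meaningfully more probability.
print(je.prob("NOUN", "Hund", "home"))  # ~1e-8
print(ie.prob("NOUN", "Hund", "home"))  # ~1e-6
```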
Related Work | Figure 4: An Example of the Joint Model |
Abstract | This paper introduces a graph-based semi-supervised joint model of Chinese word segmentation and part-of-speech tagging. |
Abstract | The result is an inductive character-based joint model.
Introduction | In recent years, several supervised joint models (Ng and Low, 2004; Zhang and Clark, 2008; Jiang et al., 2009; Zhang and Clark, 2010) have achieved reasonably accurate results, but the outstanding problem with these models is that they rely heavily on large amounts of labeled data, i.e., segmented texts with POS tags.
Method | It is directed to maximize the conditional likelihood of hidden states with the derived label distributions on unlabeled data, i.e., p(y, v | x), where y and v are jointly modeled but
Method | Firstly, as expected, for the two supervised baselines, the joint model outperforms the pipeline one, especially on segmentation. |
Method | This outcome verifies the commonly accepted fact that the joint model can substantially improve the pipeline one, since POS tags provide additional information to word segmentation (Ng and Low, 2004). |
Related Work | The state-of-the-art joint models include reranking approaches (Shi and Wang, 2007), hybrid approaches (Nakagawa and Uchimoto, 2007; Jiang et al., 2008; Sun, 2011), and single-model approaches (Ng and Low, 2004; Zhang and Clark, 2008; Kruengkrai et al., 2009; Zhang and Clark, 2010). |
Abstract | We extend a nonparametric model of word segmentation by adding phonological rules that map from underlying forms to surface forms to produce a mathematically well-defined joint model as a first step towards handling variation and segmentation in a single model. |
Abstract | We analyse how our model handles /t/-deletion on a large corpus of transcribed speech, and show that the joint model can perform word segmentation and recover underlying /t/s. |
Background and related work | However, as they point out, combining the segmentation model and the variation model into one joint model is not straightforward, and the usual inference procedures are infeasible, requiring the use of several heuristics.
Background and related work | They do not aim for a joint model that also handles word segmentation, however, and rather than training their model on an actual corpus, they evaluate on constructed lists of examples, mimicking frequencies of real data. |
Conclusion and outlook | We presented a joint model for word segmentation and the learning of phonological rule probabilities from a corpus of transcribed speech. |
The computational model | Figure 1: The graphical model for our joint model of word-final /t/-deletion and Bigram word segmentation.
The computational model | (2009) segmentation models, exact inference is infeasible for our joint model.
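The computational model | The generative story behind the joint model can be sketched in a few lines: sample underlying words, then rewrite each one to a surface form by deleting a word-final /t/ with some probability. The toy lexicon, the unigram word choice, and the fixed deletion probability rho below are illustrative assumptions; the actual model is a Bigram segmentation model whose rule probability is learned.

```python
import random

def generate_surface(lexicon, n_words, rho=0.3, seed=0):
    """Sample underlying words, then map each to its surface form by
    deleting a word-final /t/ with probability rho. Word boundaries
    are dropped, so the segmentation is latent in the output string."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_words):
        word = rng.choice(lexicon)              # underlying form
        if word.endswith("t") and rng.random() < rho:
            word = word[:-1]                    # /t/-deleted surface form
        out.append(word)
    return "".join(out)

print(generate_surface(["west", "end", "bit", "a"], 5))
```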
Introduction | Joint models of sentence extraction and compression have the great benefit of a large degree of freedom in controlling redundancy.
Introduction | In contrast, conventional two-stage approaches (Zajic et al., 2006), which first generate candidate compressed sentences and then use them to generate a summary, have lower computational complexity than joint models.
Introduction | Joint models can prune unimportant or redundant descriptions without resorting to enumeration. |
Joint Model of Extraction and Compression | Therefore, the joint model can extract an arbitrarily compressed sentence as a subtree without enumerating all candidates. |
Joint Model of Extraction and Compression | The joint model can remove the redundant part as well as the irrelevant part of a sentence, because the model simultaneously extracts and compresses sentences. |
Joint Model of Extraction and Compression | In this joint model , we generate a compressed sentence by extracting an arbitrary subtree from a dependency tree of a sentence. |
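Joint Model of Extraction and Compression | The key structural constraint here is that a compressed sentence must form a single connected subtree of the dependency tree. A minimal sketch of that validity check follows; the dictionary encoding and the explicit test are illustrative assumptions, since the paper encodes this constraint inside one optimization problem rather than checking candidates one by one.

```python
def is_valid_compression(heads, keep):
    """Check that the kept tokens form one connected subtree of the
    dependency tree, so they can be read off as a compressed sentence.

    heads : dict mapping each token to its head (the root maps to None)
    keep  : set of tokens selected for the compression
    """
    # Exactly one kept token may attach outside the kept set (the
    # subtree root); every other kept token must keep its head too.
    roots = [t for t in keep if heads[t] not in keep]
    return len(roots) == 1

# Toy tree: saw -> {John, movie}, movie -> {a, good}
heads = {"saw": None, "John": "saw", "movie": "saw", "a": "movie", "good": "movie"}
print(is_valid_compression(heads, {"saw", "John", "movie"}))  # True: one subtree
print(is_valid_compression(heads, {"John", "good"}))          # False: two fragments
```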
Conclusions and Future Work | A natural extension of our unified framework is to construct a joint model in which the predictions for all three tasks inform each other at all stages of the prediction process. |
Introduction | (2012) presented a joint model for inducing simple syntactic frames and VCs. |
Introduction | (2012) introduced a joint model for SCF and SP acquisition. |
Previous Work | Joint Modeling. A small number of works have recently investigated joint approaches to SCFs, SPs and VCs.
Previous Work | Although evaluation of these recent joint models has been partial, the results have been encouraging and fur- |
The Unified Framework | DPPs are particularly suitable for joint modeling as they come with various simple and intuitive ways to combine individual model kernel matrices into a joint kernel. |
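The Unified Framework | One concrete way to combine kernels, shown below as a hedged sketch: the elementwise (Hadamard) product of two positive semidefinite kernel matrices is again positive semidefinite by the Schur product theorem, so it is a valid joint DPP kernel. The toy Gram matrices and the choice of the product rule are illustrative assumptions, not necessarily the combination the paper uses.

```python
import numpy as np

def hadamard_joint_kernel(L_a, L_b):
    """Combine two DPP kernel matrices elementwise; the Schur product
    theorem guarantees the result is PSD, hence a valid joint kernel.
    Weighted sums of PSD kernels also preserve PSD-ness."""
    return L_a * L_b

# Toy PSD kernels over the same 3 items, built as Gram matrices.
rng = np.random.default_rng(0)
A, B = rng.normal(size=(3, 4)), rng.normal(size=(3, 4))
L_joint = hadamard_joint_kernel(A @ A.T, B @ B.T)
print(np.all(np.linalg.eigvalsh(L_joint) >= -1e-9))  # PSD check: True
```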
Abstract | In this paper, we propose the Two-Neighbor Orientation (TNO) model, which jointly models the orientation decisions between anchors and two neighboring multi-unit chunks that may cross phrase or rule boundaries.
Conclusion | Our approach, which we formulate as a Two-Neighbor Orientation model, includes the joint modeling of two orientation decisions and the modeling of the maximal span of the reordered chunks through the concept of Maximal Orientation Span. |
Introduction | Then, we jointly model the orientations of chunks that immediately precede and follow the anchors (hence, the name “two-neighbor”) along with the maximal span of these chunks, to which we refer as Maximal Orientation Span (MOS). |
Introduction | To show the effectiveness of our model, we integrate our TNO model into a state-of-the-art syntax-based SMT system, which uses synchronous context-free grammar (SCFG) rules to jointly model reordering and lexical translation. |
Two-Neighbor Orientation Model | Our Two-Neighbor Orientation (TNO) model designates A ⊂ A(@) as anchors and jointly models the orientation of chunks that appear immediately to the left and to the right of the anchors, as well as the identities of these chunks.
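Two-Neighbor Orientation Model | The orientation decision itself can be sketched as a simple test on target-side spans: a neighboring chunk is monotone if it stays on the same side of the anchor after translation, and reversed otherwise. The span encoding and the label names below are illustrative assumptions, not the paper's exact definitions.

```python
def orientation(anchor_tgt_span, chunk_tgt_span, chunk_on_left):
    """Classify a neighbor chunk's orientation relative to an anchor:
    monotone ('M') if its target-side span falls on the same side of
    the anchor's span as it does on the source side, reversed ('R')
    otherwise."""
    a_lo, _a_hi = anchor_tgt_span
    _c_lo, c_hi = chunk_tgt_span
    left_on_target = c_hi <= a_lo     # chunk precedes anchor on target side
    return "M" if left_on_target == chunk_on_left else "R"

print(orientation((3, 5), (0, 2), chunk_on_left=True))  # M: stays left
print(orientation((3, 5), (6, 8), chunk_on_left=True))  # R: jumps right
```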
Introduction | We propose a novel joint event extraction algorithm to predict the triggers and arguments simultaneously, and use the structured perceptron (Collins, 2002) to train the joint model.
Joint Framework for Event Extraction | Unfortunately, it is intractable to perform the exact search in our framework because: (1) by jointly modeling the trigger labeling and argument labeling, the search space becomes much more complex. |
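Joint Framework for Event Extraction | A compact sketch of the Collins-style structured-perceptron training loop used for such joint training; `decode` stands in for the inexact (beam) search over joint trigger/argument structures and `features` for the joint feature map, both placeholders to be supplied rather than the paper's implementation.

```python
from collections import defaultdict

def structured_perceptron(train, decode, features, epochs=5):
    """Structured perceptron (Collins, 2002) for a joint model.

    train    : list of (x, gold) pairs
    decode   : (x, w) -> highest-scoring structure under weights w
    features : (x, y) -> dict mapping feature names to counts
    """
    w = defaultdict(float)
    for _ in range(epochs):
        for x, gold in train:
            pred = decode(x, w)
            if pred != gold:
                # Promote gold features, demote predicted features.
                for f, v in features(x, gold).items():
                    w[f] += v
                for f, v in features(x, pred).items():
                    w[f] -= v
    return w
```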
Related Work | To the best of our knowledge, our work is the first attempt to jointly model these two ACE event subtasks. |
Related Work | There has been some previous work on joint modeling for biomedical events (Riedel and McCallum, 2011a; Riedel et al., 2009; McClosky et al., 2011; Riedel and McCallum, 2011b). |
Experiments | We can see that both character-level joint models outperform the pipelined system; our model with annotated word structures gives an improvement of 0.97% in tagging accuracy and 2.17% in phrase-structure parsing accuracy. |
Experiments | The results also demonstrate that the annotated word structures are highly effective for syntactic parsing, giving an absolute improvement of 0.82% in phrase-structure parsing accuracy over the joint model with flat word structures. |
Experiments | (2011), which additionally uses the Chinese Gigaword Corpus; Li ’11 denotes a generative model that performs word segmentation, POS tagging and phrase-structure parsing jointly (Li, 2011); Li+ ’12 denotes a unified dependency parsing model that performs joint word segmentation, POS tagging and dependency parsing (Li and Zhou, 2012); Li ’11 and Li+ ’12 exploited annotated morphological-level word structures for Chinese; Hatori+ ’12 denotes an incremental joint model for word segmentation, POS tagging and dependency parsing (Hatori et al., 2012), which uses external dictionary resources including the HowNet Word List and page names from Chinese Wikipedia; Qian+ ’12 denotes a joint segmentation, POS tagging and parsing system with a unified decoding framework that incorporates a word segmentation model, a POS tagging model and a phrase-structure parsing model (Qian and Liu, 2012); their word segmentation model is a combination of a character-based model and a word-based model.
Related Work | Their work demonstrates that a joint model can improve the performance of the three tasks, particularly for POS tagging and dependency parsing. |
FrameNet — Wiktionary Alignment | In Table 2, we report on the results of the best single models and the best joint model . |
FrameNet — Wiktionary Alignment | For the joint model, we employed the best single PPR configuration and a COS configuration that uses the sense gloss extended with Wiktionary hypernyms and synonyms and with the FrameNet frame name and frame definition, which achieves the highest score, an F1-score of 0.739.
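FrameNet — Wiktionary Alignment | One simple way such a joint model can combine its two similarity signals is linear interpolation of the PPR and COS scores followed by a tuned threshold; the interpolation weight, threshold, and function below are illustrative assumptions, not the paper's actual combination method.

```python
def joint_alignment_score(ppr_score, cos_score, weight=0.5):
    """Interpolate a Personalized PageRank similarity and a cosine
    gloss similarity into a single alignment score (weight assumed)."""
    return weight * ppr_score + (1 - weight) * cos_score

# Align a FrameNet sense with a Wiktionary sense if the joint score
# clears a threshold tuned on development data (0.5 here is assumed).
print(joint_alignment_score(0.42, 0.81) >= 0.5)  # True
```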
FrameNet — Wiktionary Alignment | The BEST JOINT model performs well on nouns, slightly better on adjectives, and worse on verbs (see Table 2).