Abstract | We observe that NER label information can be used to correct alignment mistakes, and present a graphical model that performs bilingual NER tagging jointly with word alignment, by combining two monolingual tagging models with two unidirectional alignment models. |
Bilingual NER by Agreement | In order to model this uncertainty, we extend the two previously independent CRF models into a larger undirected graphical model, by introducing a cross-lingual edge factor φ(i, j) for every pair of word positions (i, j) ∈ A. |
Bilingual NER by Agreement | The way DD algorithms work in decomposing undirected graphical models is analogous to other message passing algorithms such as loopy belief propagation, but DD gives a stronger optimality guarantee upon convergence (Rush et al., 2010). |
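The agreement mechanism behind dual decomposition can be illustrated on a toy instance. The sketch below (an assumption for illustration, not the paper's full factor graph) couples two sub-models over a single binary tag with a Lagrange multiplier and runs subgradient updates until they agree; agreement yields the optimality certificate the text mentions.

```python
# Toy dual decomposition: two sub-models must agree on one binary tag z.
# Each sub-model's preferences are given as a dict {tag: score}.

def decode(scores, u):
    """Pick the tag maximizing the model's own score plus the dual term u*z."""
    return max((0, 1), key=lambda z: scores[z] + u * z)

def dual_decomposition(scores_a, scores_b, steps=50, rate=0.5):
    u = 0.0  # Lagrange multiplier for the agreement constraint z_a == z_b
    z_a = None
    for t in range(steps):
        z_a = decode(scores_a, u)    # sub-model A decodes with +u
        z_b = decode(scores_b, -u)   # sub-model B decodes with -u
        if z_a == z_b:               # agreement: certificate of optimality
            return z_a
        u -= rate / (t + 1) * (z_a - z_b)  # subgradient step on the dual
    return z_a  # heuristic fallback if the models never agree
```

With scores_a = {0: 1.0, 1: 3.0} and scores_b = {0: 2.0, 1: 1.0}, the joint optimum is tag 1 (total 4.0 vs 3.0); a few subgradient steps move the multiplier until both decoders return it.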
Conclusion | We introduced a graphical model that combines two HMM word aligners and two CRF NER taggers into a joint model, and presented a dual decomposition inference method for performing efficient decoding over this model. |
Introduction | In this work, we first develop a bilingual NER model (denoted as BI-NER) by embedding two monolingual CRF-based NER models into a larger undirected graphical model, and introduce additional edge factors based on word alignment (WA). |
Introduction | Unlike previous applications of the DD method in NLP, where the model typically factors over two components and agreement is to be sought between the two (Rush et al., 2010; Koo et al., 2010; DeNero and Macherey, 2011; Chieu and Teow, 2012), our method decomposes the larger graphical model into many overlapping components where each alignment edge forms a separate factor. |
Joint Alignment and NER Decoding | We introduce a cross-lingual edge factor C(i, j) in the undirected graphical model for every pair of word indices (i, j), which predicts a binary variable. |
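One plausible shape for such a cross-lingual edge factor is a score over the binary alignment variable together with the two NER tags at positions (i, j). The sketch below is an illustrative assumption (the parameterization, argument names, and bonus term are not from the paper): it rewards tag agreement between aligned words and penalizes disagreement.

```python
# Hypothetical cross-lingual edge factor for word pair (i, j).
# Arguments: the two NER tags, the binary alignment decision, and the
# aligner's posterior probability for this link. All names are illustrative.
def edge_factor(tag_src, tag_tgt, aligned, align_prob, bonus=1.0):
    score = 0.0
    if aligned:
        score += align_prob        # evidence from the word aligner
        if tag_src == tag_tgt:
            score += bonus         # aligned words should share an NER tag
        else:
            score -= bonus         # penalize cross-lingual tag disagreement
    return score                   # unaligned pairs contribute nothing
```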
A Multigraph Model | Figure 1: An example graph modeling relations between mentions. |
A Multigraph Model | Many graph models for coreference resolution operate on A = V × V. Our multigraph model allows us to have multiple edges with different labels between mentions. |
A Multigraph Model | In contrast to previous work on similar graph models we do not learn any edge weights from training data. |
Introduction | Our approach belongs to a class of recently proposed graph models for coreference resolution (Cai and Strube, 2010; |
Relations | The graph model described in Section 3 is based on expressing relations between pairs of mentions via edges built from such relations. |
The computational model | Figure 1 shows the graphical model for our joint Bigram model (the Unigram case is trivially recovered by generating the U_{i,j}s directly from L rather than from L and U_{i,j-1}). |
The computational model | Figure 2 gives the mathematical description of the graphical model and Table 1 provides a key to the variables of our model. |
The computational model | Figure 1: The graphical model for our joint model of word-final /t/-deletion and Bigram word segmentation. |