Abstract | In this work we address both problems by incorporating cross-lingual features and knowledge bases from English using cross-lingual links. |
Abstract | We show the effectiveness of cross-lingual features and resources on a standard dataset as well as on two new test sets that cover both news and microblogs. |
Introduction | To address this problem, we introduce the use of cross-lingual links between a disadvantaged language, Arabic, and a language with good discriminative features and large resources, English, to improve Arabic NER. |
Introduction | Cross-lingual links are obtained using Wikipedia cross-language links and a large true-cased Machine Translation (MT) phrase table, in which word casing is preserved during training. |
Introduction | - Using cross-lingual links to exploit orthographic features in other languages. |
Related Work | 2.1 Using cross-lingual Features |
Related Work | If cross-lingual resources such as parallel data are available, then increased training data, better resources, or superior features can be used to improve the processing (ex. |
Related Work | To overcome these two problems, we use cross-lingual features to improve NER using large bilingual resources, and we incorporate confidences to avoid having a binary feature. |
Abstract | This approach is then evaluated on three language pairs, demonstrating competitive performance as compared to a state-of-the-art unsupervised SRL system and a cross-lingual annotation projection baseline. |
Background and Motivation | Cross-lingual annotation projection systems (Pado and Lapata, 2009), for example, propagate information directly via word alignment links. |
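The projection mechanism just described can be sketched in a few lines. This is a simplified illustration, assuming alignments are given as `(source_index, target_index)` pairs and unaligned target words default to the `O` label; the cited systems' actual interfaces differ.

```python
# Sketch of cross-lingual annotation projection: copy source-side labels
# to target words through word-alignment links.

def project_labels(src_labels, alignment, tgt_len, default="O"):
    """Propagate per-token labels from source to target via alignments."""
    tgt_labels = [default] * tgt_len
    for src_i, tgt_j in alignment:
        tgt_labels[tgt_j] = src_labels[src_i]
    return tgt_labels

# Source labels for "Obama visited Paris", aligned to a 3-word target:
project_labels(["B-PER", "O", "B-LOC"], [(0, 0), (2, 2)], 3)
# -> ['B-PER', 'O', 'B-LOC']
```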
Background and Motivation | An alternative approach, known as cross-lingual model transfer, or cross-lingual model adaptation, consists of modifying a source-language model to make it directly applicable to a new language. |
Background and Motivation | (2012) enriches this representation with cross-lingual word clusters, considerably improving the performance. |
Setup | The purpose of the study is not to develop yet another semantic role labeling system (any existing SRL system can, after some modification, be used in this setup) but to assess the practical applicability of cross-lingual model transfer to this problem, compare it against the alternatives, and identify its strong and weak points depending on the particular setup. |
Setup | With respect to the use of syntactic annotation we consider two options: using an existing dependency parser for the target language and obtaining one by means of cross-lingual transfer (see section 4.2). |
Setup | This can be achieved either by means of cross-lingual annotation projection (Yarowsky et al., 2001) or by cross-lingual model transfer (Zeman and Resnik, 2008). |
Abstract | To show the usefulness of such a resource, we present a case study of cross-lingual transfer parsing with more reliable evaluation than has been possible before. |
Experiments | One of the motivating factors in creating such a data set was improved cross-lingual transfer evaluation. |
Introduction | First, a homogeneous representation is critical for multilingual language technologies that require consistent cross-lingual analysis for downstream components. |
Introduction | Second, consistent syntactic representations are desirable in the evaluation of unsupervised (Klein and Manning, 2004) or cross-lingual syntactic parsers (Hwa et al., 2005). |
Introduction | In the cross-lingual study of McDonald et al. |
Towards A Universal Treebank | The selected sentences were preprocessed using cross-lingual taggers (Das and Petrov, 2011) and parsers (McDonald et al., 2011). |
Abstract | In this paper, the problem of data sparsity in sentiment analysis, both monolingual and cross-lingual, is addressed through the means of clustering. |
Clustering for Cross Lingual Sentiment Analysis | 4.3 Approach 3: Cross-Lingual Clustering (XC) |
Clustering for Cross Lingual Sentiment Analysis | (2012) introduced cross-lingual clustering. |
Clustering for Cross Lingual Sentiment Analysis | In cross-lingual clustering, the objective function maximizes the joint likelihood of monolingual and cross-lingual factors. |
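The shape of such a joint objective can be sketched as below. This is an illustrative stand-in, not the cited formulation: the monolingual scores are taken as given, and the cross-lingual factor is a made-up reward for aligned word pairs landing in linked clusters (here, clusters sharing an id); the weight `lam` is also an assumption.

```python
# Sketch: joint objective = monolingual factors + cross-lingual factor.
import math

def joint_objective(mono_scores, aligned_pairs, cluster_of_src,
                    cluster_of_tgt, lam=1.0):
    """mono_scores: per-language monolingual objective values.
    aligned_pairs: (src_word, tgt_word) pairs from a parallel corpus.
    The cross-lingual factor rewards aligned words whose clusters
    are linked (illustratively: same cluster id)."""
    agree = sum(1 for s, t in aligned_pairs
                if cluster_of_src[s] == cluster_of_tgt[t])
    cross = math.log1p(agree)  # illustrative cross-lingual factor
    return sum(mono_scores) + lam * cross
```

Maximizing an objective of this shape pushes the two monolingual clusterings toward agreement on aligned words while still fitting each language's own distributional evidence.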
Experimental Setup | Cross-lingual clustering for CLSA |
Introduction | Popular approaches for Cross-Lingual Sentiment Analysis (CLSA) (Wan, 2009; Duh et al., 2011) depend on Machine Translation (MT) for converting the labeled data from one language to the other (Hiroshi et al., 2004; Banea et al., 2008; Wan, 2009). |
Introduction | Instead, the language gap in performing CLSA is bridged using linked clusters, or cross-lingual clusters (explained in section 4), with the help of unlabelled monolingual corpora. |
Related Work | In situations where labeled data is not present in a language, approaches based on cross-lingual sentiment analysis are used. |
Results | Cross-lingual SA accuracies are presented in Table 3. |
Abstract | In this paper, we formulate the problem of cross-lingual knowledge extraction from multilingual Wikipedia sources, and present a novel framework, called WikiCiKE, to solve this problem. |
Experiments | It obtains the values by two steps: finding their counterparts (if available) in English using Wikipedia cross-lingual links and attribute alignments, and translating them into Chinese. |
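The two-step lookup can be sketched as follows. All data structures here are illustrative placeholders: `langlink` stands in for the Wikipedia cross-lingual link table, `en_infobox` for the aligned English attribute values, and `translate` for the MT component.

```python
# Sketch: fetch an attribute value via the English counterpart, then
# translate it into the target language.

def fetch_value(entity, attr, en_infobox, langlink, translate):
    """Find the English counterpart's attribute value and translate it."""
    en_entity = langlink.get(entity)        # step 1: cross-lingual link
    if en_entity is None:
        return None
    value = en_infobox.get((en_entity, attr))
    if value is None:
        return None
    return translate(value)                 # step 2: MT into target language
```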
Introduction | Some translation-based cross-lingual knowledge |
Introduction | The recall of new target infoboxes is highly limited by the number of equivalent cross-lingual articles and the number of existing source infoboxes. |
Introduction | In general, we address the problem of cross-lingual knowledge extraction by using the imbalance between Wikipedias of different languages. |
Our Approach | To generate the training data for the target attribute attrT, we first determine the equivalent cross-lingual attribute attrs. |
Our Approach | Therefore, it is convenient to align the cross-lingual attributes using English Wikipedia as bridge. |
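Using English as a bridge amounts to composing two attribute mappings, which can be sketched as below. The mapping dictionaries and attribute names are hypothetical stand-ins for the actual cross-lingual attribute alignments.

```python
# Sketch: align source- and target-language infobox attributes by
# composing each language's links to English attributes.

def align_attributes(src_to_en, tgt_to_en):
    """Return {src_attr: tgt_attr} for attributes mapping to the same
    English attribute."""
    en_to_tgt = {en: tgt for tgt, en in tgt_to_en.items()}
    return {src: en_to_tgt[en]
            for src, en in src_to_en.items() if en in en_to_tgt}

# Hypothetical example: both non-English attributes link to "birth_date".
align_attributes({"src_birth": "birth_date"}, {"tgt_birth": "birth_date"})
# -> {'src_birth': 'tgt_birth'}
```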
Preliminaries | In this section, we introduce some basic concepts regarding Wikipedia, formally defining the key problem of cross-lingual knowledge extraction and providing an overview of the WikiCiKE framework. |
Preliminaries | We will use “name in TEMPLATE PERSON” to refer to the attribute attrname in the template tpPERSON. In this cross-lingual task, we use the source (S) and target (T) languages to denote the languages of the auxiliary and target Wikipedias, respectively. |
Abstract | We introduce additional cross-lingual edge factors that encourage agreements between tagging and alignment decisions. |
Bilingual NER by Agreement | In order to model this uncertainty, we extend the two previously independent CRF models into a larger undirected graphical model, by introducing a cross-lingual edge factor φ(i, j) for every pair of word positions (i, j) ∈ A. |
Bilingual NER by Agreement | Initially, each of the cross-lingual edge factors will attempt to assign a pair of tags that has the highest PMI score, but if the monolingual taggers do not agree, a penalty will start accumulating over this pair, until some other pair that agrees better with the monolingual models takes the top spot. |
Bilingual NER by Agreement | Simultaneously, the monolingual models will also be encouraged to agree with the cross-lingual edge factors. |
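The penalty mechanism described above can be sketched in the spirit of dual decomposition: each cross-lingual edge keeps a penalty over tag pairs, proposes the pair maximizing PMI minus accumulated penalty, and penalties grow while the monolingual taggers disagree with the proposal. The PMI values, step size, and function names below are illustrative assumptions, not the paper's exact update rules.

```python
# Sketch: PMI-driven edge proposals with accumulating disagreement penalties.

def edge_propose(pmi, penalty):
    """Pick the tag pair with the highest (PMI - penalty) score."""
    return max(pmi, key=lambda pair: pmi[pair] - penalty.get(pair, 0.0))

def penalty_update(penalty, proposed, mono_pair, step=0.5):
    """If the monolingual taggers' pair differs from the edge's proposal,
    penalize the proposal so another pair can eventually take the top spot."""
    if proposed != mono_pair:
        penalty[proposed] = penalty.get(proposed, 0.0) + step
    return penalty
```

After enough updates, a pair that better agrees with the monolingual models overtakes the initially highest-PMI pair, which is exactly the behavior the passage describes.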
Joint Alignment and NER Decoding | We introduce a cross-lingual edge factor C(i, j) in the undirected graphical model for every pair of word indices (i, j), which predicts a binary variable
Joint Alignment and NER Decoding | One special note is that after each iteration when we consider updates to the dual constraint for entity tags, we only check tag agreements for cross-lingual edge factors that have an alignment assignment value of 1. |
Joint Alignment and NER Decoding | In other words, cross-lingual edges that are not aligned do not affect bilingual NER tagging. |
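The gating rule above can be sketched as a simple filter: tag-agreement dual updates apply only to edges whose alignment variable is currently 1. The data structures are illustrative assumptions.

```python
# Sketch: only aligned (value-1) edges with disagreeing endpoint tags
# trigger dual-constraint updates; unaligned edges are ignored.

def aligned_disagreements(edges, align_var, src_tags, tgt_tags):
    """Return the edges that are aligned but whose endpoint tags disagree."""
    return [(i, j) for (i, j) in edges
            if align_var[(i, j)] == 1 and src_tags[i] != tgt_tags[j]]
```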
Experiments | Cross-lingual information transfer |
Introduction | This need can be addressed in part by cross-lingual information access tasks such as entity linking (McNamee et al., 2011; Cassidy et al., 2012), event extraction (Hakkani-Tur et al., 2007), slot filling (Snover et al., 2011) and question answering (Parton et al., 2009; Parton and McKeown, 2010). |
Introduction | A key bottleneck of high-quality cross-lingual information access lies in the performance of Machine Translation (MT). |
Name-aware MT | Traditional name tagging approaches for single languages cannot address this requirement because they were all built on data and resources which are specific to each language without using any cross-lingual features. |
Name-aware MT | We developed a bilingual joint name tagger (Li et al., 2012) based on conditional random fields that incorporates both monolingual and cross-lingual features and conducts joint inference, so that name tagging from two languages can mutually enhance each other and therefore inconsistent results can be corrected simultaneously. |
Name-aware MT Evaluation | However, for cross-lingual information processing applications, we should acknowledge that certain informationally critical words are more important than other common words. |
Related Work | Postprocessing: in a cross-lingual information retrieval or question answering framework, online query names can be utilized to obtain translation and post-edit MT output (Parton et al., 2009; Ma and McKeown, 2009; Parton and McKeown, 2010; Parton et al., 2012). |
Discussion: a Multilingual FrameNet | Another issue that applies to all automatic (and also manual) approaches of cross-lingual FrameNet extension is the restricted cross-language applicability of frames. |
Discussion: a Multilingual FrameNet | Unlike corpus-based approaches for cross-lingual FrameNet extension, our approach does not provide frame-semantic annotations for the example |
Discussion: a Multilingual FrameNet | Example annotations can be additionally obtained via cross-lingual annotation projection (Pado and Lapata, 2009), and the lexical information in FNWKde can be used to guide this process. |
Introduction | Previous cross-lingual transfer of FrameNet used corpus-based approaches, or resource alignment with multilingual expert-built resources, such as EuroWordNet. |
Introduction | To our knowledge, Wiktionary has not been evaluated as an interlingual index for the cross-lingual extension of lexical-semantic resources. |
Related Work | In this vein, Pado and Lapata (2005) propose a cross-lingual FrameNet extension to German and French; Johansson and Nugues (2005) and Johansson and Nugues (2006) do this for Spanish and Swedish, and Basili et al. |
Introduction | It can also be viewed as a simplified version of the Cross-Lingual Lexical Substitution (Mihalcea et al., 2010) and Cross-Lingual Word Sense Disambiguation (Lefever and Hoste, 2010) tasks, as defined in SemEval-2010. |
Typical Resource Requirements for Translation Selection | While LSI is more frequently used in information retrieval, the translation knowledge acquisition task can be recast as a cross-lingual indexing task, following (Dumais et al., 1997). |
Typical Resource Requirements for Translation Selection | Basile and Semeraro (2010) also used Wikipedia articles as a parallel corpus for their participation in the SemEval 2010 Cross-Lingual Lexical Substitution task. |
Typical Resource Requirements for Translation Selection | (2011) also tackled the problem of cross-lingual disambiguation for under-resourced language pairs (English-Persian) using Wikipedia articles, by applying the one sense per collocation and one sense per discourse heuristics on a comparable corpus. |
Conclusion | Monolingual and cross-lingual textual entailment in particular would be interesting applications, because they require finding shared meaning in two text fragments. |
Cross Language Text Categorization | 3.2 Raw cross-lingual text categorization |
Cross Language Text Categorization | 3.4 Cross-lingual text categorization in a latent semantic space adding etymology |
Introduction | words) from the source to the target language, word etymologies are a novel source of cross-lingual knowledge. |
Abstract | We present an information-theoretic objective for bilingual word clustering that incorporates both monolingual distributional evidence and cross-lingual evidence from parallel corpora to learn high-quality word clusters jointly in any number of languages. |
Related Work | (2012) use cross-lingual word clusters to show transfer of linguistic structure. |
Related Work | Also closely related is the technique of cross-lingual annotation projection. |