Features and Similarities | Our feature space consists of both these standard monolingual features and cross-lingual similarities among documents. |
Features and Similarities | The cross-lingual similarities are evaluated using different translation mechanisms, e.g., dictionary-based translation or machine translation, or even without any translation at all. |
Features and Similarities | 4.2 Cross-lingual Document Similarities |
Introduction | Our problem setting differs from cross-lingual web search, where the goal is to return machine-translated results from one language in response to a query from another (Lavrenko et al., 2002). |
Learning to Rank Using Bilingual Information | To allow for cross-lingual information, we extend the ordering of individual documents to an ordering of bilingual document pairs: given two bilingual document pairs, we will write $(e^{(1)}, c^{(1)}) \succ (e^{(2)}, c^{(2)})$ to indicate that the first pair is ranked above the second. |
Learning to Rank Using Bilingual Information | The advantages are twofold: (1) we can treat multiple cross-lingual document similarities in the same uniform learning framework as the commonly used query-document features; (2) with the similarities, the relevance estimation on bilingual document pairs can be enhanced, which in turn can improve the ranking of documents. |
Learning to Rank Using Bilingual Information | $\ldots, n$. Based on that, we produce a set of bilingual ranking instances $S = \{\Phi_{ij}, z_{ij}\}$, where each $\Phi_{ij} = \{x_i; y_j; s_{ij}\}$ is the feature vector of $(e_i, c_j)$ consisting of three components: $x_i = f(q_e, e_i)$ is the vector of monolingual relevancy features of $e_i$, $y_j = f(q_c, c_j)$ is the vector of monolingual relevancy features of $c_j$, and $s_{ij} = \mathrm{sim}(e_i, c_j)$ is the vector of cross-lingual similarities between $e_i$ and $c_j$; $z_{ij} = (C(e_i), C(c_j))$ is the corresponding pair of click counts. |
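To make the construction of these ranking instances concrete, here is a minimal sketch directly mirroring the reconstructed notation. The helper names mono_feats (query-document feature extractor), cross_sim (cross-lingual similarity vector), and the clicks lookup are hypothetical, not from the paper.

```python
import numpy as np

def build_instances(q_e, q_c, docs_e, docs_c, mono_feats, cross_sim, clicks):
    """Pair every English document e_i with every Chinese document c_j."""
    instances = []
    for e_i in docs_e:
        x_i = mono_feats(q_e, e_i)             # monolingual relevancy features of e_i
        for c_j in docs_c:
            y_j = mono_feats(q_c, c_j)         # monolingual relevancy features of c_j
            s_ij = cross_sim(e_i, c_j)         # cross-lingual similarity vector
            phi_ij = np.concatenate([x_i, y_j, s_ij])
            z_ij = (clicks[e_i], clicks[c_j])  # click counts supervising the pair
            instances.append((phi_ij, z_ij))
    return instances
```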
Abstract | To show the usefulness of such a resource, we present a case study of cross-lingual transfer parsing with more reliable evaluation than has been possible before. |
Experiments | One of the motivating factors in creating such a data set was improved cross-lingual transfer evaluation. |
Introduction | First, a homogeneous representation is critical for multilingual language technologies that require consistent cross-lingual analysis for downstream components. |
Introduction | Second, consistent syntactic representations are desirable in the evaluation of unsupervised (Klein and Manning, 2004) or cross-lingual syntactic parsers (Hwa et al., 2005). |
Introduction | In the cross-lingual study of McDonald et al. |
Towards A Universal Treebank | The selected sentences were preprocessed using cross-lingual taggers (Das and Petrov, 2011) and parsers (McDonald et al., 2011). |
Abstract | In this paper, the problem of data sparsity in sentiment analysis, both monolingual and cross-lingual, is addressed by means of clustering. |
Clustering for Cross Lingual Sentiment Analysis | 4.3 Approach 3: Cross-Lingual Clustering (XC) |
Clustering for Cross Lingual Sentiment Analysis | (2012) introduced cross-lingual clustering. |
Clustering for Cross Lingual Sentiment Analysis | In cross-lingual clustering, the objective function maximizes the joint likelihood of monolingual and cross-lingual factors. |
Experimental Setup | Cross-lingual clustering for CLSA |
Introduction | Popular approaches for Cross-Lingual Sentiment Analysis (CLSA) (Wan, 2009; Duh et al., 2011) depend on Machine Translation (MT) for converting the labeled data from one language to the other (Hiroshi et al., 2004; Banea et al., 2008; Wan, 2009). |
Introduction | Instead, the language gap in performing CLSA is bridged using linked clusters, or cross-lingual clusters (explained in Section 4), with the help of unlabelled monolingual corpora. |
Related Work | In situations where labeled data is not present in a language, approaches based on cross-lingual sentiment analysis are used. |
Results | Cross-lingual SA accuracies are presented in Table 3. |
Abstract | This approach is then evaluated on three language pairs, demonstrating competitive performance as compared to a state-of-the-art unsupervised SRL system and a cross-lingual annotation projection baseline. |
Background and Motivation | Cross-lingual annotation projection systems (Pado and Lapata, 2009), for example, propagate information directly via word alignment links. |
Background and Motivation | An alternative approach, known as cross-lingual model transfer, or cross-lingual model adaptation, consists of modifying a source-language model to make it directly applicable to a new language. |
Background and Motivation | (2012) enriches this representation with cross-lingual word clusters, considerably improving the performance. |
Setup | The purpose of the study is not to develop yet another semantic role labeling system (any existing SRL system can, after some modification, be used in this setup), but to assess the practical applicability of cross-lingual model transfer to this problem, compare it against the alternatives, and identify its strong and weak points depending on the particular setup. |
Setup | With respect to the use of syntactic annotation we consider two options: using an existing dependency parser for the target language and obtaining one by means of cross-lingual transfer (see section 4.2). |
Setup | This can be achieved either by means of cross-lingual annotation projection (Yarowsky et al., 2001) or by cross-lingual model transfer (Zeman and Resnik, 2008). |
Abstract | In this paper, we formulate the problem of cross-lingual knowledge extraction from multilingual Wikipedia sources, and present a novel framework, called WikiCiKE, to solve this problem. |
Experiments | It obtains the values in two steps: finding their counterparts (if available) in English using Wikipedia cross-lingual links and attribute alignments, and translating them into Chinese. |
Introduction | Some translation-based cross-lingual knowledge |
Introduction | The recall of new target infoboxes is highly limited by the number of equivalent cross-lingual articles and the number of existing source infoboxes. |
Introduction | In general, we address the problem of cross-lingual knowledge extraction by using the imbalance between Wikipedias of different languages. |
Our Approach | To generate the training data for the target attribute attr_T, we first determine the equivalent cross-lingual attribute attr_S. |
Our Approach | Therefore, it is convenient to align the cross-lingual attributes using English Wikipedia as a bridge. |
Preliminaries | In this section, we introduce some basic concepts regarding Wikipedia, formally defining the key problem of cross-lingual knowledge extraction and providing an overview of the WikiCiKE framework. |
Preliminaries | We will use “name in TEMPLATE PERSON” to refer to the attribute attr_name in the template tp_PERSON. In this cross-lingual task, we use the source (S) and target (T) languages to denote the languages of the auxiliary and target Wikipedias, respectively. |
Abstract | In this work we address both problems by incorporating cross-lingual features and knowledge bases from English using cross-lingual links. |
Abstract | We show the effectiveness of cross-lingual features and resources on a standard dataset as well as on two new test sets that cover both news and microblogs. |
Introduction | To address this problem, we introduce the use of cross-lingual links between a disadvantaged language, Arabic, and a language with good discriminative features and large resources, English, to improve Arabic NER. |
Introduction | Cross-lingual links are obtained using Wikipedia cross-language links and a large Machine Translation (MT) phrase table that is true-cased, i.e., word casing is preserved during training. |
Introduction | - Using cross-lingual links to exploit orthographic features in other languages. |
Related Work | 2.1 Using Cross-lingual Features |
Related Work | If cross-lingual resources such as parallel data are available, increased training data, better resources, or superior features can be used to improve processing (e.g. |
Related Work | To overcome these two problems, we use cross-lingual features to improve NER using large bilingual resources, and we incorporate confidences to avoid having a binary feature. |
Abstract | Such a disproportion has aroused interest in cross-lingual sentiment classification, which aims to conduct sentiment classification in the target language (e.g. |
Abstract | In this paper, we propose a generative cross-lingual mixture model (CLMM) to leverage unlabeled bilingual parallel data. |
Cross-Lingual Mixture Model for Sentiment Classification | In this section we present the cross-lingual mixture model (CLMM) for sentiment classification. |
Cross-Lingual Mixture Model for Sentiment Classification | We first formalize the task of cross-lingual sentiment classification. |
Introduction | In this paper we propose a cross-lingual mixture model (CLMM) for cross-lingual sentiment classification. |
Introduction | Besides, CLMM can improve the accuracy of cross-lingual sentiment classification consistently regardless of whether labeled data in the target language are present or not. |
Introduction | This paper makes two contributions: (1) we propose a model to effectively leverage large bilingual parallel data for improving vocabulary coverage; and (2) the proposed model is applicable in both settings of cross-lingual sentiment classification, irrespective of the availability of labeled data in the target language. |
Related Work | In this section, we present a brief review of the related work on monolingual sentiment classification and cross-lingual sentiment classification. |
Related Work | 2.2 Cross-Lingual Sentiment Classification |
Related Work | Cross-lingual sentiment classification, which aims to conduct sentiment classification in the target language (e.g. Chinese) with labeled data in the source |
Abstract | One common deficiency of existing topic models, though, is that they would not work well for extracting cross-lingual latent topics simply because words in different languages generally do not co-occur with each other. |
Abstract | Specifically, we propose a new topic model called Probabilistic Cross-Lingual Latent Semantic Analysis (PCLSA) which extends the Probabilistic Latent Semantic Analysis (PLSA) model by regularizing its likelihood function with soft constraints defined based on a bilingual dictionary. |
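A hedged LaTeX sketch of what such a regularized objective could look like: the PLSA log-likelihood traded off against a soft constraint pulling the topic probabilities of dictionary translation pairs ⟨u, v⟩ together. The trade-off λ, the pair weights w_{u,v}, and the squared-difference penalty are assumptions for illustration, not necessarily the paper's exact regularizer.

```latex
% Sketch: regularized PCLSA-style objective (assumed form)
\mathcal{O} = (1-\lambda)\,\log p(\mathcal{C}\mid\Lambda)
  \;-\; \lambda \sum_{\langle u,v\rangle \in \mathcal{D}} w_{u,v}
        \sum_{j=1}^{k} \bigl( p(u \mid \theta_j) - p(v \mid \theta_j) \bigr)^{2}
```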
Abstract | Both qualitative and quantitative experimental results show that the PCLSA model can effectively extract cross-lingual latent topics from multilingual text data. |
Introduction | Although many topic models have been proposed and shown to be useful (see Section 2 for more detailed discussion of related work), most of them share a common deficiency: they are designed to work only for monolingual text data and would not work well for extracting cross-lingual latent topics, i.e. |
Introduction | In this paper, we propose a novel topic model, called Probabilistic Cross-Lingual Latent Semantic Analysis (PCLSA) model, which can be used to mine shared latent topics from unaligned text data in different languages. |
Introduction | However, the goals of their work are different from ours in that their models mainly focus on mining cross-lingual topics of matching word pairs and discovering the correspondence at the vocabulary level. |
Problem Formulation | In general, the problem of cross-lingual topic extraction can be defined as to extract a set of common cross-lingual latent topics covered in text collections in different natural languages. |
Problem Formulation | A cross-lingual latent topic will be represented as a multinomial word distribution over the words in all the languages, i.e. |
Related Work | (Fung, 1995; Franz et al., 1998; Masuichi et al., 2000; Sadat et al., 2003; Gliozzo and Strapparava, 2006)), but most previous work aims at acquiring word translation knowledge or cross-lingual text categorization from comparable corpora. |
Abstract | We introduce XMEANT, a new cross-lingual version of the semantic frame based MT evaluation metric MEANT, which can correlate even more closely with human adequacy judgments than monolingual MEANT and eliminates the need for expensive human references. |
Abstract | However, to go beyond tuning weights in the loglinear SMT model, a cross-lingual objective function that can deeply integrate semantic frame criteria into the MT training pipeline is needed. |
Abstract | We show that cross-lingual XMEANT outperforms monolingual MEANT by (1) replacing the monolingual context vector model in MEANT with simple translation probabilities, and (2) incorporating bracketing ITG constraints. |
Introduction | We show that XMEANT, a new cross-lingual version of MEANT (Lo et al., 2012), correlates with human judgment even more closely than MEANT for evaluating MT adequacy via semantic frames, despite discarding the need for expensive human reference translations. |
Introduction | Our results suggest that MT translation adequacy is more accurately evaluated via the cross-lingual semantic frame similarities of the input and the MT output which may obviate the need for expensive human reference translations. |
Introduction | In order to continue driving MT towards better translation adequacy by deeply integrating semantic frame criteria into the MT training pipeline, it is necessary to have a cross-lingual semantic objective function that assesses the semantic frame similarities of input and output sentences. |
Related Work | Evaluating cross-lingual MT quality is similar to the work of MT quality estimation (QE). |
Related Work | Figure 3: Cross-lingual XMEANT algorithm. |
XMEANT: a cross-lingual MEANT | But whereas MEANT measures lexical similarity using a monolingual context vector model, XMEANT instead substitutes simple cross-lingual lexical translation probabilities. |
XMEANT: a cross-lingual MEANT | To aggregate individual lexical translation probabilities into phrasal similarities between cross-lingual semantic role fillers, we compared two natural approaches to generalizing MEANT’s method of comparing semantic parses, as described below. |
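As an illustration of one such natural aggregation (not necessarily either of the two the paper compares), each word in one role filler can be matched against its best translation in the other filler, averaging in both directions; p_trans is an assumed lexical translation probability function.

```python
def phrasal_similarity(src_filler, tgt_filler, p_trans):
    """Aggregate lexical translation probabilities into a phrasal score.

    src_filler, tgt_filler: token lists filling aligned semantic roles.
    p_trans(f, e): assumed lexical translation probability of f given e.
    """
    if not src_filler or not tgt_filler:
        return 0.0
    # For each source token, keep its best match in the target filler ...
    fwd = sum(max(p_trans(f, e) for e in tgt_filler) for f in src_filler)
    # ... and symmetrically for each target token.
    bwd = sum(max(p_trans(f, e) for f in src_filler) for e in tgt_filler)
    return 0.5 * (fwd / len(src_filler) + bwd / len(tgt_filler))
```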
Abstract | The approach uses unlabeled documents, along with a simple word translation oracle, in order to induce task-specific, cross-lingual word correspondences. |
Cross-Language Structural Correspondence Learning | The resulting classifier fST, which will operate in the cross-lingual setting, is defined as follows: |
Cross-Language Text Classification | One way to overcome this “feature barrier” is to find a cross-lingual representation for documents written in S and T, which enables the transfer of classification knowledge between the two languages. |
Cross-Language Text Classification | Intuitively, one can understand such a cross-lingual representation as a concept space that underlies both languages. |
Cross-Language Text Classification | In the following, we will use $\theta$ to denote a map that associates the original $|V|$-dimensional representation of a document $d$ written in S or T with its cross-lingual representation. |
Experiments | In this study, we are interested in whether the cross-lingual representation induced by CL-SCL captures the difference between positive and negative reviews; by balancing the reviews we ensure that the imbalance does not affect the learned model. |
Experiments | Since SGD is sensitive to feature scaling, the projection $\theta x$ is post-processed as follows: (1) each feature of the cross-lingual representation is standardized to zero mean and unit variance, where mean and variance are estimated on $D_S \cup D_u$. |
Experiments | (2) The cross-lingual document representations are scaled by a constant $\alpha$ such that $|D_S|^{-1} \sum_{x \in D_S} \|\alpha\,\theta x\| = 1$. |
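A minimal sketch of these two post-processing steps, assuming theta is the (k × |V|) projection matrix and D_s, D_u are row-wise document matrices; the exact normalization constant in the paper may differ from the reconstruction above.

```python
import numpy as np

def postprocess(theta, D_s, D_u):
    """(1) standardize each projected feature on D_s ∪ D_u;
    (2) rescale so the average norm over D_s equals 1."""
    Z = np.vstack([D_s, D_u]) @ theta.T        # project all documents
    mu, sigma = Z.mean(axis=0), Z.std(axis=0) + 1e-12
    Z = (Z - mu) / sigma                       # step (1): zero mean, unit variance
    Z_s = (D_s @ theta.T - mu) / sigma
    alpha = 1.0 / np.mean(np.linalg.norm(Z_s, axis=1))
    return alpha * Z                           # step (2): scale by constant alpha
```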
Introduction | As we will see, a small number of pivots can capture a sufficiently large part of the correspondences between S and T in order to (1) construct a cross-lingual representation and (2) learn a classifier $f_\tau$ for the task $\tau$ that operates on this representation. |
Introduction | Third, an in-depth analysis with respect to important hyperparameters such as the ratio of labeled and unlabeled documents, the number of pivots, and the optimum dimensionality of the cross-lingual representation. |
Conclusion | Second, cross-lingual model transfer can be improved with more careful cross-lingual feature projection. |
Experiments | 5.3 Cross-lingual experiments |
Experiments | Figure 2: Cross-lingual experiment: ROC curves for classifiers trained on the English data using a combination of all features, and applied to SVO and AN metaphoric and literal relations in four test languages: English, Russian, Spanish, and Farsi. |
Experiments | Table 5: Cross-lingual experiment: f-scores for classifiers trained on the English data using a combination of all features, and applied, with optimal thresholds, to SVO and AN metaphoric and literal relations in four test languages: English, Russian, Spanish, and Farsi. |
Methodology | To test this hypothesis, we use a cross-lingual model transfer approach: we use bilingual dictionaries to project words from other syntactic constructions found in other languages into English and then apply the English model on the derived conceptual representations. |
Methodology | In addition, this coarse semantic categorization is preserved in translation (Schneider et al., 2013), which makes supersense features suitable for cross-lingual approaches such as ours. |
Methodology | (2013) reveal an interesting cross-lingual property of distributed word representations: there is a strong similarity between the vector spaces across languages that can be easily captured by linear mapping. |
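This linear-mapping observation reduces to ordinary least squares over a seed dictionary of embedding pairs; a minimal sketch (function and variable names are illustrative):

```python
import numpy as np

def fit_linear_map(X_src, Y_tgt):
    """X_src, Y_tgt: (n, d) embeddings of n dictionary word pairs.

    Returns W minimizing ||X_src @ W - Y_tgt||^2 in the least-squares sense.
    """
    W, *_ = np.linalg.lstsq(X_src, Y_tgt, rcond=None)
    return W

# To "translate" a new source embedding v: map it as v @ W, then take the
# nearest neighbour among target-language embeddings.
```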
Model and Feature Extraction | In this section we describe a classification model, and provide details on mono- and cross-lingual implementation of features. |
Model and Feature Extraction | 3.3 Cross-lingual feature projection |
Related Work | (2013) propose a cross-lingual detection method that uses only English lexical resources and a dependency parser. |
Related Work | Studies showed that cross-lingual evidence allows one to achieve state-of-the-art performance in the WSD task; yet most cross-lingual WSD methods employ parallel corpora (Navigli, 2009). |
Abstract | Cross-lingual tasks are especially difficult due to the compounding effect of errors in language processing and errors in machine translation (MT). |
Abstract | In this paper, we present an error analysis of a new cross-lingual task: the 5W task, a sentence-level understanding task which seeks to return the English 5W's (Who, What, When, Where and Why) corresponding to a Chinese sentence. |
Abstract | The best cross-lingual 5W system was still 19% worse than the best monolingual 5W system, which shows that MT significantly degrades sentence-level understanding. |
Introduction | Cross-lingual applications address this need by presenting information in the speaker’s language even when it originally appeared in some other language, using machine translation. |
Introduction | In this paper, we present an evaluation and error analysis of a cross-lingual application that we developed for a government-sponsored evaluation, the 5W task. |
Introduction | In this paper, we address the cross-lingual 5W task: given a source-language sentence, return the 5W’s translated (comprehensibly) into the target language. |
Prior Work | The cross-lingual 5W task is closely related to cross-lingual information retrieval and cross-lingual question answering (Wang and Oard 2006; Mitamura et al. |
Prior Work | In cross-lingual information extraction (Sudo et al. |
Abstract | Using a combination of lexical, syntactic, and semantic features to train a cross-lingual textual entailment system, we report promising results on different datasets. |
Conclusion | Our results in different cross-lingual settings prove the feasibility of the approach, with significant state-of-the-art improvements also on RTE-derived data. |
Experiments and results | Recently, a new dataset including “Unknown” pairs has been used in the “Cross-Lingual Textual Entailment for Content Synchronization” task at SemEval-2012 (Negri et al., 2012). |
Experiments and results | (3-way) demonstrates the effectiveness of our approach in capturing meaning equivalence and information disparity in cross-lingual texts. |
Experiments and results | Cross-lingual models also significantly outperform pivoting methods. |
Introduction | In this paper we frame this problem as an application-oriented, cross-lingual variant of the Textual Entailment (TE) recognition task (Dagan and Glickman, 2004). |
Introduction | (a) Experiments with multidirectional cross-lingual textual entailment. |
Introduction | So far, cross-lingual |
Abstract | We evaluate these models on two cross-lingual document classification tasks, outperforming the prior state of the art. |
Conclusion | Coupled with very simple composition functions, vectors learned with this method outperform the state of the art on the task of cross-lingual document classification. |
Corpora | The Europarl corpus v7 (Koehn, 2005) was used during initial development and testing of our approach, as well as to learn the representations used for the Cross-Lingual Document Classification task described in §5.2. |
Experiments | First, we replicate the cross-lingual document classification task of Klementiev et al. |
Experiments | We evaluate our models on the cross-lingual document classification (CLDC, henceforth) task first described in Klementiev et al. |
Experiments | Cross-lingual compositional representations (ADD, BI and their multilingual extensions), the I-Matrix model (Klementiev et al., 2012), translated (MT) and glossed (Glossed) word baselines, and the majority class baseline. |
Introduction | First, we show that for cross-lingual document classification on the Reuters RCV1/RCV2 corpora (Lewis et al., 2004), we outperform the prior state of the art (Klementiev et al., 2012). |
Related Work | (2013) train a cross-lingual encoder, where an autoencoder is used to recreate words in two languages in parallel. |
Abstract | This paper focuses on the problem of cross-lingual sentiment classification, which leverages an available English corpus for Chinese sentiment classification by using the English corpus as training data. |
Conclusion and Future Work | In this paper, we propose to use the co-training approach to address the problem of cross-lingual sentiment classification. |
Introduction | In this study, we focus on the problem of cross-lingual sentiment classification, which leverages only English training data for supervised sentiment classification of Chinese product reviews, without using any Chinese resources. |
Related Work 2.1 Sentiment Classification | In this study, we focus on improving the corpus-based method for cross-lingual sentiment classification of Chinese product reviews by developing novel approaches. |
Related Work 2.1 Sentiment Classification | Cross-domain text classification can be considered as a more general task than cross-lingual sentiment classification. |
Related Work 2.1 Sentiment Classification | In particular, several previous studies focus on the problem of cross-lingual text classification, which can be considered as a special case of general cross-domain text classification. |
The Co-Training Approach | In the context of cross-lingual sentiment classification, each labeled English review or unlabeled Chinese review has two views of features: English features and Chinese features. |
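A deliberately simplified sketch of such a two-view co-training loop, using logistic regression and a pooled confidence heuristic; the paper's exact selection and labeling scheme may differ, and all names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def co_train(L_en, L_cn, y, U_en, U_cn, rounds=10, k=5):
    """Two views per instance: English features and (MT-derived) Chinese features."""
    L_en, L_cn, y = np.asarray(L_en), np.asarray(L_cn), np.asarray(y)
    U_en, U_cn = np.asarray(U_en), np.asarray(U_cn)
    for _ in range(rounds):
        clf_en = LogisticRegression(max_iter=1000).fit(L_en, y)
        clf_cn = LogisticRegression(max_iter=1000).fit(L_cn, y)
        if len(U_en) < k:
            break
        # Simplification: pool both views' confidences and move the k most
        # confident unlabeled instances, with predicted labels, into L.
        conf = clf_en.predict_proba(U_en).max(1) + clf_cn.predict_proba(U_cn).max(1)
        pick = np.argsort(-conf)[:k]
        y = np.concatenate([y, clf_en.predict(U_en[pick])])
        L_en = np.vstack([L_en, U_en[pick]])
        L_cn = np.vstack([L_cn, U_cn[pick]])
        keep = np.setdiff1d(np.arange(len(U_en)), pick)
        U_en, U_cn = U_en[keep], U_cn[keep]
    return clf_en, clf_cn
```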
Abstract | We introduce additional cross-lingual edge factors that encourage agreements between tagging and alignment decisions. |
Bilingual NER by Agreement | In order to model this uncertainty, we extend the two previously independent CRF models into a larger undirected graphical model, by introducing a cross-lingual edge factor $\phi(i, j)$ for every pair of word positions $(i, j) \in A$. |
Bilingual NER by Agreement | Initially, each of the cross-lingual edge factors will attempt to assign a pair of tags that has the highest PMI score, but if the monolingual taggers do not agree, a penalty will start accumulating over this pair, until some other pair that agrees better with the monolingual models takes the top spot. |
Bilingual NER by Agreement | Simultaneously, the monolingual models will also be encouraged to agree with the cross-lingual edge factors. |
Joint Alignment and NER Decoding | We introduce a cross-lingual edge factor $C(i, j)$ in the undirected graphical model for every pair of word indices $(i, j)$, which predicts a binary variable |
Joint Alignment and NER Decoding | One special note is that after each iteration when we consider updates to the dual constraint for entity tags, we only check tag agreements for cross-lingual edge factors that have an alignment assignment value of 1. |
Joint Alignment and NER Decoding | In other words, cross-lingual edges that are not aligned do not affect bilingual NER tagging. |
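The agreement mechanism can be caricatured as follows, in a one-sided sketch where the monolingual taggers are frozen and only the edge factor's proposal is penalized; in the actual model the monolingual CRFs are simultaneously pushed toward the edge factors. All names are illustrative.

```python
def agree(pmi, mono_en, mono_cn, steps=50, eta=0.1):
    """pmi: dict mapping (en_tag, cn_tag) -> PMI score for one aligned pair."""
    penalty = {}
    for _ in range(steps):
        # Edge factor proposes the tag pair with the highest penalized PMI.
        t_en, t_cn = max(pmi, key=lambda p: pmi[p] - penalty.get(p, 0.0))
        if (t_en, t_cn) == (mono_en, mono_cn):
            return t_en, t_cn                       # consensus reached
        # Disagreement: accumulate a penalty on this pair until another
        # pair that agrees better with the monolingual models wins.
        penalty[(t_en, t_cn)] = penalty.get((t_en, t_cn), 0.0) + eta
    return mono_en, mono_cn                         # fall back to monolingual tags
```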
Abstract | In large-scale experiments for patent prior art search and cross-lingual retrieval in Wikipedia, our approach yields considerable improvements over learning-to-rank with either only dense or only sparse features, and over very competitive baselines that combine state-of-the-art machine translation and retrieval. |
Conclusion | Special domains such as patents or Wikipedia offer the possibility to extract cross-lingual relevance data from citation and link graphs. |
Conclusion | These data can be used to directly optimize cross-lingual ranking models. |
Introduction | (2010) show that for the domain of Wikipedia, learning a sparse matrix of word associations between the query and document vocabularies from relevance rankings is useful in monolingual and cross-lingual retrieval. |
Introduction | (2013) apply the idea of learning a sparse matrix of bilingual phrase associations from relevance rankings to cross-lingual retrieval in the patent domain. |
Related Work | Both approaches work in a cross-lingual setting, the former on Wikipedia data, the latter on patents. |
Related Work | features, and by evaluating both approaches on cross-lingual retrieval for patents and Wikipedia. |
Experiments | Cross-lingual information transfer |
Introduction | This need can be addressed in part by cross-lingual information access tasks such as entity linking (McNamee et al., 2011; Cassidy et al., 2012), event extraction (Hakkani-Tur et al., 2007), slot filling (Snover et al., 2011) and question answering (Parton et al., 2009; Parton and McKeown, 2010). |
Introduction | A key bottleneck of high-quality cross-lingual information access lies in the performance of Machine Translation (MT). |
Name-aware MT | Traditional name tagging approaches for single languages cannot address this requirement because they were all built on data and resources which are specific to each language without using any cross-lingual features. |
Name-aware MT | We developed a bilingual joint name tagger (Li et al., 2012) based on conditional random fields that incorporates both monolingual and cross-lingual features and conducts joint inference, so that name tagging from two languages can mutually enhance each other and therefore inconsistent results can be corrected simultaneously. |
Name-aware MT Evaluation | However, for cross-lingual information processing applications, we should acknowledge that certain informationally critical words are more important than other common words. |
Related Work | Postprocessing: in a cross-lingual information retrieval or question answering framework, online query names can be utilized to obtain translation and post-edit MT output (Parton et al., 2009; Ma and McKeown, 2009; Parton and McKeown, 2010; Parton et al., 2012). |
Abstract | We present a nonparametric Bayesian model that jointly induces morpheme segmentations of each language under consideration and at the same time identifies cross-lingual morpheme patterns, or abstract morphemes. |
Conclusions and Future Work | We started out by posing two questions: (i) Can we exploit cross-lingual patterns to improve unsupervised analysis? |
Introduction | In this paper we investigate two questions: (i) Can we exploit cross-lingual correspondences to improve unsupervised language |
Introduction | Our results indicate that cross-lingual patterns can indeed be exploited successfully for the task of unsupervised morphological segmentation. |
Model | The model is fully unsupervised and is driven by a preference for stable and high frequency cross-lingual morpheme patterns. |
Model | Our model utilizes a Dirichlet process prior for each language, as well as for the cross-lingual links (abstract morphemes). |
Multilingual Morphological Segmentation | In the following section, we describe a model that captures both generic cross-lingual patterns and cognates between related languages (ktb for Hebrew and Arabic). |
Abstract | Our algorithm introduces nonaligned signatures (NAS), a cross-lingual word context similarity score that avoids the over-constrained and inefficient nature of alignment-based methods. |
Conclusion | At the heart of our method is the nonaligned signatures (NAS) context similarity score, used for removing incorrect translations using cross-lingual co-occurrences. |
Conclusion | It would be interesting to further investigate this observation with other sources of lexicons (e.g., obtained from parallel or comparable corpora) and for other tasks, such as cross-lingual word sense disambiguation and information retrieval. |
Previous Work | 2.3 Cross-lingual Co-occurrences in Lexicon Construction |
Previous Work | Rapp (1999) and Fung (1998) discussed semantic similarity estimation using cross-lingual context vector alignment. |
Previous Work | Using cross-lingual co-occurrences to improve a lexicon generated using a pivot language was suggested by Tanaka and Iwasaki (1996). |
Discussion: a Multilingual FrameNet | Another issue that applies to all automatic (and also manual) approaches of cross-lingual FrameNet extension is the restricted cross-language applicability of frames. |
Discussion: a Multilingual FrameNet | Unlike corpus-based approaches for cross-lingual FrameNet extension, our approach does not provide frame-semantic annotations for the example |
Discussion: a Multilingual FrameNet | Example annotations can be additionally obtained via cross-lingual annotation projection (Pado and Lapata, 2009), and the lexical information in FNWKde can be used to guide this process. |
Introduction | Previous cross-lingual transfer of FrameNet used corpus-based approaches, or resource alignment with multilingual expert-built resources, such as EuroWordNet. |
Introduction | To our knowledge, Wiktionary has not been evaluated as an interlingual index for the cross-lingual extension of lexical-semantic resources. |
Related Work | In this vein, Pado and Lapata (2005) propose a cross-lingual FrameNet extension to German and French; Johansson and Nugues (2005) and Johansson and Nugues (2006) do this for Spanish and Swedish, and Basili et al. |
Abstract | We test this in a monolingual scenario (paraphrase generation) as well as in a cross-lingual setting (translation by synthesizing adjective-noun phrase vectors in English and generating the equivalent expressions in Italian). |
Evaluation setting | The Italian language vectors for the cross-lingual experiments of Section 6 were trained on 1.6 billion tokens from itWaC. A word token is a word-form + POS-tag string. |
Noun phrase translation | This section describes preliminary experiments performed in a cross-lingual setting on the task of composing English AN phrases and generating Italian translations. |
Noun phrase translation | Creation of cross-lingual vector spaces A common semantic space is required in order to map words and phrases across languages. |
Noun phrase translation | Cross-lingual decomposition training Training proceeds as in the monolingual case, this time concatenating the training data sets and estimating a single (de)-composition function for the two languages in the shared semantic space. |
Abstract | To build a relation extractor without significant annotation effort, we can exploit cross-lingual annotation projection, which leverages parallel corpora as external resources for supervision. |
Conclusions | Experimental results show that our graph-based projection helped to improve the performance of the cross-lingual annotation projection of the semantic relations, and our system outperforms the other systems, which incorporate monolingual external resources. |
Cross-lingual Annotation Projection for Relation Extraction | Cross-lingual annotation projection intends to learn an extractor $f_t$ that achieves good performance without significant effort toward building resources for the resource-poor target language $L_t$. |
Cross-lingual Annotation Projection for Relation Extraction | Early studies in cross-lingual annotation projection were accomplished for various natural language processing tasks (Yarowsky and Ngai, 2001; Yarowsky et al., 2001; Hwa et al., 2005; Zitouni and Florian, 2008; Pado and Lapata, 2009). |
Introduction | To obtain training examples in the resource-poor target language, this approach exploited a cross-lingual annotation projection by propagating annotations that were generated by a relation extraction system in a resource-rich source language. |
Abstract | We train probabilistic parsing models for resource-poor languages by transferring cross-lingual knowledge from a resource-rich language with entropy regularization. |
Data and Tools | However, previous studies (McDonald et al., 2011; McDonald et al., 2013) have demonstrated that a homogeneous representation is critical for multilingual language technologies that require consistent cross-lingual analysis for downstream components, and that the heterogeneous representations used in CoNLL shared-task treebanks weaken any conclusion that can be drawn. |
Experiments | By using IGT data, not only can we obtain more accurate word alignments, but we can also extract useful cross-lingual information for the resource-poor language. |
Introduction | We extend this learning framework so that it can be used to transfer cross-lingual knowledge between different languages. |
Our Approach | For the purpose of transferring cross-lingual information from the English parser via parallel text, we explore the model training method proposed by Smith and Eisner (2007), which presented a generalization of the K function (Abney, 2004) and related it to another semi-supervised learning technique, entropy regularization (Jiao et al., 2006; Mann and McCallum, 2007). |
Abstract | (2010) propose a cross-lingual annotation projection approach which uses parallel corpora to acquire a relation detector on the target language. |
Abstract | SL-CR (Supervised Learning with cross-lingual labeled instances): in addition to monolingual labeled instances (SL-MO), the training data for supervised learning contain labeled instances translated from the other language. |
Abstract | AL-CR (Active Learning with cross-lingual instances): both the manually labeled instances and their translated ones are added to the respective training data. |
Abstract | We study the feasibility of exploiting cross-lingual context to obtain high-quality translation suggestions that improve over statistical language modelling and word-sense disambiguation baselines. |
Experiments & Results | The latter study on cross-lingual WSD finds a positive impact when conducting feature selection per classifier. |
Introduction | The cross-lingual context in our research question may at first seem artificial, but its design explicitly aims at applications related to computer-aided language learning (Laghos and Panayiotis, 2005; Levy, 1997) and computer-aided translation (Barrachina et al., 2009). |
Introduction | The main question in this research is how to disambiguate an L1 word or phrase to its L2 translation based on an L2 context, and whether such cross-lingual contextual approaches provide added value compared to baseline models that are not context-informed or compared to standard language models. |
System | The choice for this algorithm is motivated by the fact that it handles multiple classes with ease, but first and foremost because it has been successfully employed for word sense disambiguation in other studies (Hoste et al., 2002; Decadt et al., 2004), in particular in cross-lingual word sense disambiguation, a task closely resembling our current task (van Gompel and van den Bosch, 2013). |
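For illustration, a memory-based (k-NN) classifier in this spirit can be assembled per L1 word from scikit-learn parts; the featurization below is an assumption, not the cited IB1/TiMBL configuration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

def train_translation_classifier(l2_contexts, l2_translations, k=3):
    """One classifier per L1 word: l2_contexts are target-language context
    windows, l2_translations the attested L2 translation in each context."""
    model = make_pipeline(CountVectorizer(ngram_range=(1, 2)),
                          KNeighborsClassifier(n_neighbors=k))
    return model.fit(l2_contexts, l2_translations)
```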
Conclusions and Future Work | 3) we will apply the BRAE model in other monolingual and cross-lingual tasks. |
Discussions | Besides SMT, the semantic phrase embeddings can be used in other cross-lingual tasks, such as cross-lingual question answering, since the semantic similarity between phrases in different languages can be calculated accurately. |
Discussions | In addition to the cross-lingual applications, we believe the BRAE model can be applied in many |
Introduction | Besides SMT, the semantic phrase embeddings can be used in other cross-lingual tasks (e.g. |
Introduction | cross-lingual question answering) and monolingual applications such as textual entailment, question answering and paraphrase detection. |
Introduction | It can also be viewed as a simplified version of the Cross-Lingual Lexical Substitution (Mihalcea et al., 2010) and Cross-Lingual Word Sense Disambiguation (Lefever and Hoste, 2010) tasks, as defined in SemEval-2010. |
Typical Resource Requirements for Translation Selection | While LSI is more frequently used in information retrieval, the translation knowledge acquisition task can be recast as a cross-lingual indexing task, following (Dumais et al., 1997). |
Typical Resource Requirements for Translation Selection | Basile and Semeraro (2010) also used Wikipedia articles as a parallel corpus for their participation in the SemEval 2010 Cross-Lingual Lexical Substitution task. |
Typical Resource Requirements for Translation Selection | (2011) also tackled the problem of cross-lingual disambiguation for under-resourced language pairs (English-Persian) using Wikipedia articles, by applying the one sense per collocation and one sense per discourse heuristics on a comparable corpus. |
Conclusion | Monolingual and cross-lingual textual entailment in particular would be interesting applications, because they require finding shared meaning on two text fragments. |
Cross Language Text Categorization | 3.2 Raw cross-lingual text categorization |
Cross Language Text Categorization | 3.4 Cross-lingual text categorization in a latent semantic space adding etymology |
Introduction | words) from the source to the target language, word etymologies are a novel source of cross-lingual knowledge. |
Abstract | We present an information-theoretic objective for bilingual word clustering that incorporates both monolingual distributional evidence as well as cross-lingual evidence from parallel corpora to learn high-quality word clusters jointly in any number of languages. |
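One plausible formalization of such an objective, stated with heavy hedging: a convex combination of monolingual clustering terms and a cross-lingual term computed over word alignments A. The weighting α and the conditional-entropy form of the cross-lingual term are assumptions, not necessarily the paper's exact objective.

```latex
% Sketch: joint bilingual clustering objective (assumed form)
\min_{C_e,\,C_f}\;
  \alpha \bigl[ H(C_e) + H(C_f) \bigr]
  \;+\; (1-\alpha) \bigl[ H(C_e \mid C_f; A) + H(C_f \mid C_e; A) \bigr]
```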
Related Work | (2012) use cross-lingual word clusters to show transfer of linguistic structure. |
Related Work | Also closely related is the technique of cross-lingual annotation projection. |
Introduction | One of the main challenges of unsupervised multilingual learning is to exploit cross-lingual patterns discovered in data, while still allowing a wide range of language-specific idiosyncrasies. |
Introduction | For each pair of coupled bilingual constituents, a pair of part-of-speech sequences are drawn jointly from a cross-lingual distribution. |
Related Work | Research in this direction was pioneered by (Wu, 1997), who developed Inversion Transduction Grammars to capture cross-lingual grammar variations such as phrase re-orderings. |
Conclusions | We believe that this property would be useful in transliteration extraction and cross-lingual information retrieval applications. |
Related Work | Denoting the number of cross-lingual mappings that are common to both $A$ and $G$ as $C_{AG}$, the number of cross-lingual mappings in $A$ as $C_A$ and the number of cross-lingual mappings in $G$ as $C_G$, precision $Pr$ is given as $C_{AG}/C_A$, recall $Rc$ as $C_{AG}/C_G$, and F-score as $2 \cdot Pr \cdot Rc / (Pr + Rc)$. |
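These reconstructed definitions transcribe directly into code:

```python
def precision_recall_f(c_ag, c_a, c_g):
    """c_ag: mappings common to A and G; c_a, c_g: mappings in A and in G."""
    pr = c_ag / c_a
    rc = c_ag / c_g
    return pr, rc, 2 * pr * rc / (pr + rc)
```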
Transliteration alignment entropy | We expect a good alignment to have a sharp cross-lingual mapping with low alignment entropy. |