Index of papers in Proc. ACL that mention
  • cross-lingual
Gao, Wei and Blitzer, John and Zhou, Ming and Wong, Kam-Fai
Features and Similarities
Our feature space consists of both these standard monolingual features and cross-lingual similarities among documents.
Features and Similarities
The cross-lingual similarities are evaluated using different translation mechanisms, e.g., dictionary-based translation or machine translation, or even without any translation at all.
Features and Similarities
4.2 Cross-lingual Document Similarities
Introduction
Our problem setting differs from cross-lingual web search, where the goal is to return machine-translated results from one language in response to a query from another (Lavrenko et al., 2002).
Learning to Rank Using Bilingual Information
To allow for cross-lingual information, we extend the order of individual documents into that of bilingual document pairs: given two bilingual document pairs, we will write (e^(1), c^(1)) ≻ (e^(2), c^(2)) to indicate that the first pair is ranked above the second.
Learning to Rank Using Bilingual Information
The advantages are twofold: (1) we can treat multiple cross-lingual document similarities the same way as the commonly used query-document features in a uniform manner of learning; (2) with the similarities, the relevance estimation on bilingual document pairs can be enhanced, which in turn can improve the ranking of documents.
Learning to Rank Using Bilingual Information
…, n. Based on that, we produce a set of bilingual ranking instances S = {Φ_ij, z_ij}, where each Φ_ij = {x_i; y_j; s_ij} is the feature vector of (e_i, c_j) consisting of three components: x_i = f(q_e, e_i) is the vector of monolingual relevancy features of e_i, y_j = f(q_c, c_j) is the vector of monolingual relevancy features of c_j, s_ij = sim(e_i, c_j) is the vector of cross-lingual similarities between e_i and c_j, and z_ij = (C(e_i), C(c_j)) is the corresponding click counts.
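As a rough illustration, a bilingual ranking instance of this form could be assembled as below (the feature functions f and sim and the click-count table are hypothetical stand-ins, not the paper's implementation):

    import numpy as np

    def make_instance(q_e, q_c, e_i, c_j, clicks, f, sim):
        """Assemble Phi_ij = [x_i; y_j; s_ij] and its click-count pair z_ij."""
        x_i = f(q_e, e_i)                      # monolingual relevancy features of the English document
        y_j = f(q_c, c_j)                      # monolingual relevancy features of the Chinese document
        s_ij = sim(e_i, c_j)                   # cross-lingual similarity features between the two documents
        phi_ij = np.concatenate([x_i, y_j, s_ij])
        z_ij = (clicks[e_i], clicks[c_j])      # click counts C(e_i), C(c_j)
        return phi_ij, z_ij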
cross-lingual is mentioned in 14 sentences in this paper.
McDonald, Ryan and Nivre, Joakim and Quirmbach-Brundage, Yvonne and Goldberg, Yoav and Das, Dipanjan and Ganchev, Kuzman and Hall, Keith and Petrov, Slav and Zhang, Hao and Täckström, Oscar and Bedini, Claudia and Bertomeu Castelló, Núria and Lee, Jungmee
Abstract
To show the usefulness of such a resource, we present a case study of cross-lingual transfer parsing with more reliable evaluation than has been possible before.
Experiments
One of the motivating factors in creating such a data set was improved cross-lingual transfer evaluation.
Introduction
First, a homogeneous representation is critical for multilingual language technologies that require consistent cross-lingual analysis for downstream components.
Introduction
Second, consistent syntactic representations are desirable in the evaluation of unsupervised (Klein and Manning, 2004) or cross-lingual syntactic parsers (Hwa et al., 2005).
Introduction
In the cross-lingual study of McDonald et al.
Towards A Universal Treebank
The selected sentences were preprocessed using cross-lingual taggers (Das and Petrov, 2011) and parsers (McDonald et al., 2011).
cross-lingual is mentioned in 16 sentences in this paper.
Popat, Kashyap and A.R, Balamurali and Bhattacharyya, Pushpak and Haffari, Gholamreza
Abstract
In this paper, the problem of data sparsity in sentiment analysis, both monolingual and cross-lingual, is addressed by means of clustering.
Clustering for Cross Lingual Sentiment Analysis
4.3 Approach 3: Cross-Lingual Clustering (XC)
Clustering for Cross Lingual Sentiment Analysis
(2012) introduced cross-lingual clustering.
Clustering for Cross Lingual Sentiment Analysis
In cross-lingual clustering, the objective function maximizes the joint likelihood of monolingual and cross-lingual factors.
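As a hedged illustration only (a generic form of such an objective, not necessarily the one used in the paper), the coupled clustering objective can be sketched as:

    \mathcal{L}(C_s, C_t) = \sum_{w \in D_s} \log p(w \mid C_s)
                          + \sum_{w' \in D_t} \log p(w' \mid C_t)
                          + \lambda \sum_{(w, w') \in A} \log p\big(C_s(w), C_t(w')\big)

where the first two terms are the monolingual factors over the source and target corpora D_s and D_t, the last term couples cluster assignments across languages through word-aligned pairs A, and λ weights the cross-lingual factor.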
Experimental Setup
Cross-lingual clustering for CLSA
Introduction
Popular approaches for Cross-Lingual Sentiment Analysis (CLSA) (Wan, 2009; Duh et al., 2011) depend on Machine Translation (MT) for converting the labeled data from one language to the other (Hiroshi et al., 2004; Banea et al., 2008; Wan, 2009).
Introduction
Instead, the language gap for performing CLSA is bridged using linked clusters or cross-lingual clusters (explained in section 4) with the help of unlabelled monolingual corpora.
Related Work
In situations where labeled data is not present in a language, approaches based on cross-lingual sentiment analysis are used.
Results
Cross-lingual SA accuracies are presented in Table 3.
cross-lingual is mentioned in 15 sentences in this paper.
Kozhevnikov, Mikhail and Titov, Ivan
Abstract
This approach is then evaluated on three language pairs, demonstrating competitive performance as compared to a state-of-the-art unsupervised SRL system and a cross-lingual annotation projection baseline.
Background and Motivation
Cross-lingual annotation projection systems (Pado and Lapata, 2009), for example, propagate information directly via word alignment links.
Background and Motivation
An alternative approach, known as cross-lingual model transfer, or cross-lingual model adaptation, consists of modifying a source-language model to make it directly applicable to a new language.
Background and Motivation
(2012) enriches this representation with cross-lingual word clusters, considerably improving the performance.
Setup
The purpose of the study is not to develop yet another semantic role labeling system — any existing SRL system can (after some modification) be used in this setup — but to assess the practical applicability of cross-lingual model transfer to this problem, compare it against the alternatives and identify its strong/weak points depending on a particular setup.
Setup
With respect to the use of syntactic annotation we consider two options: using an existing dependency parser for the target language and obtaining one by means of cross-lingual transfer (see section 4.2).
Setup
This can be achieved either by means of cross-lingual annotation projection (Yarowsky et al., 2001) or by cross-lingual model transfer (Zeman and Resnik, 2008).
cross-lingual is mentioned in 31 sentences in this paper.
Wang, Zhigang and Li, Zhixing and Li, Juanzi and Tang, Jie and Z. Pan, Jeff
Abstract
In this paper, we formulate the problem of cross-lingual knowledge extraction from multilingual Wikipedia sources, and present a novel framework, called WikiCiKE, to solve this problem.
Experiments
It obtains the values in two steps: finding their counterparts (if available) in English using Wikipedia cross-lingual links and attribute alignments, and translating them into Chinese.
Introduction
Some translation-based cross-lingual knowledge
Introduction
The recall of new target infoboxes is highly limited by the number of equivalent cross-lingual articles and the number of existing source infoboxes.
Introduction
In general, we address the problem of cross-lingual knowledge extraction by using the imbalance between Wikipedias of different languages.
Our Approach
To generate the training data for the target attribute attrT, we first determine the equivalent cross-lingual attribute attrs.
Our Approach
Therefore, it is convenient to align the cross-lingual attributes using English Wikipedia as bridge.
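A toy sketch of pivot-based attribute alignment in this spirit (the dictionaries and example attribute names below are hypothetical, not WikiCiKE's actual data):

    def align_attributes(src_to_en, tgt_to_en):
        """Align source- and target-language infobox attributes that map onto
        the same English attribute, using English Wikipedia as the bridge."""
        en_to_tgt = {en: tgt for tgt, en in tgt_to_en.items()}
        return {src: en_to_tgt[en]
                for src, en in src_to_en.items() if en in en_to_tgt}

    # align_attributes({"geburtsort": "birth_place"}, {"出生地": "birth_place"})
    # returns {"geburtsort": "出生地"}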
Preliminaries
In this section, we introduce some basic concepts regarding Wikipedia, formally defining the key problem of cross-lingual knowledge extraction and providing an overview of the WikiCiKE framework.
Preliminaries
We will use “name in TEMPLATE PERSON” to refer to the attribute attr_name in the template tp_PERSON. In this cross-lingual task, we use the source (S) and target (T) languages to denote the languages of the auxiliary and target Wikipedias, respectively.
cross-lingual is mentioned in 20 sentences in this paper.
Darwish, Kareem
Abstract
In this work we address both problems by incorporating cross-lingual features and knowledge bases from English using cross-lingual links.
Abstract
We show the effectiveness of cross-lingual features and resources on a standard dataset as well as on two new test sets that cover both news and microblogs.
Introduction
To address this problem, we introduce the use of cross-lingual links between a disadvantaged language, Arabic, and a language with good discriminative features and large resources, English, to improve Arabic NER.
Introduction
Cross-lingual links are obtained using Wikipedia cross-language links and a large Machine Translation (MT) phrase table that is true cased, where word casing is preserved during training.
Introduction
- Using cross-lingual links to exploit orthographic features in other languages.
Related Work
2.1 Using cross-lingual Features
Related Work
If cross-lingual resources such as parallel data are available, increased training data, better resources, or superior features can be used to improve the processing (e.g.
Related Work
To overcome these two problems, we use cross-lingual features to improve NER using large bilingual resources, and we incorporate confidences to avoid having a binary feature.
cross-lingual is mentioned in 35 sentences in this paper.
Meng, Xinfan and Wei, Furu and Liu, Xiaohua and Zhou, Ming and Xu, Ge and Wang, Houfeng
Abstract
Such a disproportion arouses interest in cross-lingual sentiment classification, which aims to conduct sentiment classification in the target language (e.g.
Abstract
In this paper, we propose a generative cross-lingual mixture model (CLMM) to leverage unlabeled bilingual parallel data.
Cross-Lingual Mixture Model for Sentiment Classification
In this section we present the cross-lingual mixture model (CLMM) for sentiment classification.
Cross-Lingual Mixture Model for Sentiment Classification
We first formalize the task of cross-lingual sentiment classification.
Introduction
In this paper we propose a cross-lingual mixture model (CLMM) for cross-lingual sentiment classification.
Introduction
Besides, CLMM can improve the accuracy of cross-lingual sentiment classification consistently regardless of whether labeled data in the target language are present or not.
Introduction
This paper makes two contributions: (1) we propose a model to effectively leverage large bilingual parallel data for improving vocabulary coverage; and (2) the proposed model is applicable in both settings of cross-lingual sentiment classification, irrespective of the availability of labeled data in the target language.
Related Work
In this section, we present a brief review of the related work on monolingual sentiment classification and cross-lingual sentiment classification.
Related Work
2.2 Cross-Lingual Sentiment Classification
Related Work
Cross-lingual sentiment classification, which aims to conduct sentiment classification in the target language (e.g. Chinese) with labeled data in the source
cross-lingual is mentioned in 16 sentences in this paper.
Zhang, Duo and Mei, Qiaozhu and Zhai, ChengXiang
Abstract
One common deficiency of existing topic models, though, is that they would not work well for extracting cross-lingual latent topics simply because words in different languages generally do not co-occur with each other.
Abstract
Specifically, we propose a new topic model called Probabilistic Cross-Lingual Latent Semantic Analysis (PCLSA) which extends the Probabilistic Latent Semantic Analysis (PLSA) model by regularizing its likelihood function with soft constraints defined based on a bilingual dictionary.
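One plausible form of such a dictionary-based soft constraint (a sketch, not necessarily the paper's exact regularizer) is:

    \ell_{\mathrm{PCLSA}} = \ell_{\mathrm{PLSA}}
        - \lambda \sum_{(u, v) \in \mathcal{D}} \sum_{k=1}^{K}
          \big( p(u \mid \theta_k) - p(v \mid \theta_k) \big)^2

where D is the set of translation pairs from the bilingual dictionary, θ_1, …, θ_K are the cross-lingual topics, and λ controls how strongly each translation pair is pushed toward similar probabilities under every topic.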
Abstract
Both qualitative and quantitative experimental results show that the PCLSA model can effectively extract cross-lingual latent topics from multilingual text data.
Introduction
Although many topic models have been proposed and shown to be useful (see Section 2 for more detailed discussion of related work), most of them share a common deficiency: they are designed to work only for monolingual text data and would not work well for extracting cross-lingual latent topics, i.e.
Introduction
In this paper, we propose a novel topic model, called Probabilistic Cross-Lingual Latent Semantic Analysis (PCLSA) model, which can be used to mine shared latent topics from unaligned text data in different languages.
Introduction
However, the goals of their work are different from ours in that their models mainly focus on mining cross-lingual topics of matching word pairs and discovering the correspondence at the vocabulary level.
Problem Formulation
In general, the problem of cross-lingual topic extraction can be defined as to extract a set of common cross-lingual latent topics covered in text collections in different natural languages.
Problem Formulation
A cross-lingual latent topic will be represented as a multinomial word distribution over the words in all the languages, i.e.
Related Work
(Fung, 1995; Franz et al., 1998; Masuichi et al., 2000; Sadat et al., 2003; Gliozzo and Strapparava, 2006)), but most previous work aims at acquiring word translation knowledge or cross-lingual text categorization from comparable corpora.
cross-lingual is mentioned in 37 sentences in this paper.
Lo, Chi-kiu and Beloucif, Meriem and Saers, Markus and Wu, Dekai
Abstract
We introduce XMEANT—a new cross-lingual version of the semantic frame based MT evaluation metric MEANT—which can correlate even more closely with human adequacy judgments than monolingual MEANT and eliminates the need for expensive human references.
Abstract
However, to go beyond tuning weights in the loglinear SMT model, a cross-lingual objective function that can deeply integrate semantic frame criteria into the MT training pipeline is needed.
Abstract
We show that cross-lingual XMEANT outperforms monolingual MEANT by (1) replacing the monolingual context vector model in MEANT with simple translation probabilities, and (2) incorporating bracketing ITG constraints.
Introduction
We show that XMEANT, a new cross-lingual version of MEANT (Lo et al., 2012), correlates with human judgment even more closely than MEANT for evaluating MT adequacy via semantic frames, despite discarding the need for expensive human reference translations.
Introduction
Our results suggest that MT translation adequacy is more accurately evaluated via the cross-lingual semantic frame similarities of the input and the MT output which may obviate the need for expensive human reference translations.
Introduction
In order to continue driving MT towards better translation adequacy by deeply integrating semantic frame criteria into the MT training pipeline, it is necessary to have a cross-lingual semantic objective function that assesses the semantic frame similarities of input and output sentences.
Related Work
Evaluating cross-lingual MT quality is similar to the work of MT quality estimation (QE).
Related Work
Figure 3: Cross-lingual XMEANT algorithm.
XMEANT: a cross-lingual MEANT
But whereas MEANT measures lexical similarity using a monolingual context vector model, XMEANT instead substitutes simple cross-lingual lexical translation probabilities.
XMEANT: a cross-lingual MEANT
To aggregate individual lexical translation probabilities into phrasal similarities between cross-lingual semantic role fillers, we compared two natural approaches to generalizing MEANT’s method of comparing semantic parses, as described below.
cross-lingual is mentioned in 18 sentences in this paper.
Prettenhofer, Peter and Stein, Benno
Abstract
The approach uses unlabeled documents, along with a simple word translation oracle, in order to induce task-specific, cross-lingual word correspondences.
Cross-Language Structural Correspondence Learning
The resulting classifier fST, which will operate in the cross-lingual setting, is defined as follows:
Cross-Language Text Classification
One way to overcome this “feature barrier” is to find a cross-lingual representation for documents written in S and T, which enables the transfer of classification knowledge between the two languages.
Cross-Language Text Classification
Intuitively, one can understand such a cross-lingual representation as a concept space that underlies both languages.
Cross-Language Text Classification
In the following, we will use θ to denote a map that associates the original |V|-dimensional representation of a document d written in S or T with its cross-lingual representation.
Experiments
In this study, we are interested in whether the cross-lingual representation induced by CL-SCL captures the difference between positive and negative reviews; by balancing the reviews we ensure that the imbalance does not affect the learned model.
Experiments
Since SGD is sensitive to feature scaling, the projection θX is post-processed as follows: (1) Each feature of the cross-lingual representation is standardized to zero mean and unit variance, where mean and variance are estimated on D_S ∪ D_u.
Experiments
(2) The cross-lingual document representations are scaled by a constant α such that |D_S|^(-1) Σ_{x ∈ D_S} …
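A rough sketch of this two-step post-processing (the criterion for choosing α is truncated in the excerpt, so it is left as a parameter here):

    import numpy as np

    def postprocess(theta_X, fit_rows, alpha):
        """(1) standardize each cross-lingual feature using statistics estimated
        on D_S ∪ D_u (the rows in fit_rows); (2) rescale by the constant alpha."""
        mu = theta_X[fit_rows].mean(axis=0)
        sd = theta_X[fit_rows].std(axis=0) + 1e-12   # guard against zero variance
        return alpha * (theta_X - mu) / sd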
Introduction
As we will see, a small number of pivots can capture a sufficiently large part of the correspondences between S and T in order to (1) construct a cross-lingual representation and (2) learn a classifier f_τ for the task τ that operates on this representation.
Introduction
Third, an in-depth analysis with respect to important hyperparameters such as the ratio of labeled and unlabeled documents, the number of pivots, and the optimum dimensionality of the cross-lingual representation.
cross-lingual is mentioned in 15 sentences in this paper.
Tsvetkov, Yulia and Boytsov, Leonid and Gershman, Anatole and Nyberg, Eric and Dyer, Chris
Conclusion
Second, cross-lingual model transfer can be improved with more careful cross-lingual feature projection.
Experiments
5.3 Cross-lingual experiments
Experiments
Figure 2: Cross-lingual experiment: ROC curves for classifiers trained on the English data using a combination of all features, and applied to SVO and AN metaphoric and literal relations in four test languages: English, Russian, Spanish, and Farsi.
Experiments
Table 5: Cross-lingual experiment: f-scores for classifiers trained on the English data using a combination of all features, and applied, with optimal thresholds, to SVO and AN metaphoric and literal relations in four test languages: English, Russian, Spanish, and Farsi.
Methodology
To test this hypothesis, we use a cross-lingual model transfer approach: we use bilingual dictionaries to project words from other syntactic constructions found in other languages into English and then apply the English model on the derived conceptual representations.
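A minimal sketch of this dictionary-based projection (the dictionary, feature extractor, and trained English model below are hypothetical placeholders):

    def classify_projected(words, bilingual_dict, to_features, english_model):
        """Project a foreign-language construction word-by-word into English,
        then apply the English-trained classifier to the projected features."""
        projected = [bilingual_dict.get(w, w) for w in words]   # dictionary projection
        feats = to_features(projected)                          # e.g. conceptual/supersense features
        return english_model.predict([feats])[0]                # metaphorical vs. literal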
Methodology
In addition, this coarse semantic categorization is preserved in translation (Schneider et al., 2013), which makes supersense features suitable for cross-lingual approaches such as ours.
Methodology
(2013) reveal an interesting cross-lingual property of distributed word representations: there is a strong similarity between the vector spaces across languages that can be easily captured by linear mapping.
Model and Feature Extraction
In this section we describe a classification model, and provide details on mono- and cross-lingual implementation of features.
Model and Feature Extraction
3.3 Cross-lingual feature projection
Related Work
(2013) propose a cross-lingual detection method that uses only English lexical resources and a dependency parser.
Related Work
Studies showed that cross-lingual evidence allows one to achieve a state-of-the-art performance in the WSD task, yet, most cross-lingual WSD methods employ parallel corpora (Navigli, 2009).
cross-lingual is mentioned in 11 sentences in this paper.
Parton, Kristen and McKeown, Kathleen R. and Coyne, Bob and Diab, Mona T. and Grishman, Ralph and Hakkani-Tür, Dilek and Harper, Mary and Ji, Heng and Ma, Wei Yun and Meyers, Adam and Stolbach, Sara and Sun, Ang and Tur, Gokhan and Xu, Wei and Yaman, Sibel
Abstract
Cross-lingual tasks are especially difficult due to the compounding effect of errors in language processing and errors in machine translation (MT).
Abstract
In this paper, we present an error analysis of a new cross-lingual task: the 5W task, a sentence-level understanding task which seeks to return the English 5W's (Who, What, When, Where and Why) corresponding to a Chinese sentence.
Abstract
The best cross-lingual 5W system was still 19% worse than the best monolingual 5W system, which shows that MT significantly degrades sentence-level understanding.
Introduction
Cross-lingual applications address this need by presenting information in the speaker’s language even when it originally appeared in some other language, using machine
Introduction
In this paper, we present an evaluation and error analysis of a cross-lingual application that we developed for a government-sponsored evaluation, the 5W task.
Introduction
In this paper, we address the cross-lingual 5W task: given a source-language sentence, return the 5W's translated (comprehensibly) into the target language.
Prior Work
The cross-lingual 5W task is closely related to cross-lingual information retrieval and cross-lingual question answering (Wang and Oard 2006; Mitamura et al.
Prior Work
In cross-lingual information extraction (Sudo et al.
cross-lingual is mentioned in 19 sentences in this paper.
Mehdad, Yashar and Negri, Matteo and Federico, Marcello
Abstract
Using a combination of lexical, syntactic, and semantic features to train a cross-lingual textual entailment system, we report promising results on different datasets.
Conclusion
Our results in different cross-lingual settings prove the feasibility of the approach, with significant state-of-the-art improvements also on RTE-derived data.
Experiments and results
Recently, a new dataset including “Unknown” pairs has been used in the “Cross-Lingual Textual Entailment for Content Synchronization” task at SemEval-2012 (Negri et al., 2012).
Experiments and results
(3-way) demonstrates the effectiveness of our approach to capture meaning equivalence and information disparity in cross-lingual texts.
Experiments and results
Cross-lingual models also significantly outperform pivoting methods.
Introduction
In this paper we set such problem as an application-oriented, cross-lingual variant of the Textual Entailment (TE) recognition task (Dagan and Glickman, 2004).
Introduction
(a) Experiments with multidirectional cross-lingual textual entailment.
Introduction
So far, cross-lingual
cross-lingual is mentioned in 10 sentences in this paper.
Hermann, Karl Moritz and Blunsom, Phil
Abstract
We evaluate these models on two cross-lingual document classification tasks, outperforming the prior state of the art.
Conclusion
Coupled with very simple composition functions, vectors learned with this method outperform the state of the art on the task of cross-lingual document classification.
Corpora
The Europarl corpus v7 (Koehn, 2005) was used during initial development and testing of our approach, as well as to learn the representations used for the Cross-Lingual Document Classification task described in §5.2.
Experiments
First, we replicate the cross-lingual document classification task of Klementiev et al.
Experiments
We evaluate our models on the cross-lingual document classification (CLDC, henceforth) task first described in Klementiev et al.
Experiments
Cross-lingual compositional representations (ADD, BI and their multilingual extensions), I-Matrix (Klementiev et al., 2012), translated (MT) and glossed (Glossed) word baselines, and the majority class baseline.
Introduction
First, we show that for cross-lingual document classification on the Reuters RCV1/RCV2 corpora (Lewis et al., 2004), we outperform the prior state of the art (Klementiev et al., 2012).
Related Work
(2013) train a cross-lingual encoder, where an autoencoder is used to recreate words in two languages in parallel.
cross-lingual is mentioned in 8 sentences in this paper.
Wan, Xiaojun
Abstract
This paper focuses on the problem of cross-lingual sentiment classification, which leverages an available English corpus for Chinese sentiment classification by using the English corpus as training data.
Conclusion and Future Work
In this paper, we propose to use the co-training approach to address the problem of cross-lingual sentiment classification.
Introduction
In this study, we focus on the problem of cross-lingual sentiment classification, which leverages only English training data for supervised sentiment classification of Chinese product reviews, without using any Chinese resources.
Related Work 2.1 Sentiment Classification
In this study, we focus on improving the corpus-based method for cross-lingual sentiment classification of Chinese product reviews by developing novel approaches.
Related Work 2.1 Sentiment Classification
Cross-domain text classification can be considered as a more general task than cross-lingual sentiment classification.
Related Work 2.1 Sentiment Classification
In particular, several previous studies focus on the problem of cross-lingual text classification, which can be considered as a special case of general cross-domain text classification.
The Co-Training Approach
In the context of cross-lingual sentiment classification, each labeled English review or unlabeled Chinese review has two views of features: English features and Chinese features.
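A schematic co-training loop over these two views (a generic sketch of co-training, not the paper's exact procedure; the train and predict helpers are placeholders):

    def co_train(labeled, unlabeled, rounds, k, train, predict):
        """Co-training over two views: each review x carries an English view x["en"]
        and a Chinese view x["cn"]; predict returns a (label, confidence) pair."""
        for _ in range(rounds):
            clf_en = train([(x["en"], y) for x, y in labeled])
            clf_cn = train([(x["cn"], y) for x, y in labeled])
            for clf, view in ((clf_en, "en"), (clf_cn, "cn")):
                # label the unlabeled pool and keep this view's k most confident reviews
                ranked = sorted(unlabeled, key=lambda x: -predict(clf, x[view])[1])
                picked = ranked[:k]
                labeled += [(x, predict(clf, x[view])[0]) for x in picked]
                unlabeled = [x for x in unlabeled if x not in picked]
        return clf_en, clf_cn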
cross-lingual is mentioned in 8 sentences in this paper.
Wang, Mengqiu and Che, Wanxiang and Manning, Christopher D.
Abstract
We introduce additional cross-lingual edge factors that encourage agreements between tagging and alignment decisions.
Bilingual NER by Agreement
In order to model this uncertainty, we extend the two previously independent CRF models into a larger undirected graphical model, by introducing a cross-lingual edge factor φ(i, j) for every pair of word positions (i, j) ∈ A.
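Schematically, and with notation simplified, such a construction couples the two monolingual CRFs through the edge factors (a hedged reading, not the paper's exact factorization):

    p(y^e, y^c \mid x^e, x^c) \;\propto\;
        p_{\mathrm{CRF}}(y^e \mid x^e)\; p_{\mathrm{CRF}}(y^c \mid x^c)
        \prod_{(i, j) \in A} \phi\big(y^e_i, y^c_j\big)

where A is the set of aligned word-position pairs and each φ scores the compatibility of the two entity tags it connects.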
Bilingual NER by Agreement
Initially, each of the cross-lingual edge factors will attempt to assign a pair of tags that has the highest PMI score, but if the monolingual taggers do not agree, a penalty will start accumulating over this pair, until some other pair that agrees better with the monolingual models takes the top spot.
Bilingual NER by Agreement
Simultaneously, the monolingual models will also be encouraged to agree with the cross-lingual edge factors.
Joint Alignment and NER Decoding
We introduce a cross-lingual edge factor C(i, j) in the undirected graphical model for every pair of word indices (i, j), which predicts a binary variable indicating whether the two words are aligned.
Joint Alignment and NER Decoding
One special note is that after each iteration when we consider updates to the dual constraint for entity tags, we only check tag agreements for cross-lingual edge factors that have an alignment assignment value of 1.
Joint Alignment and NER Decoding
In other words, cross-lingual edges that are not aligned do not affect bilingual NER tagging.
cross-lingual is mentioned in 8 sentences in this paper.
Schamoni, Shigehiko and Hieber, Felix and Sokolov, Artem and Riezler, Stefan
Abstract
In large-scale experiments for patent prior art search and cross-lingual retrieval in Wikipedia, our approach yields considerable improvements over learning-to-rank with either only dense or only sparse features, and over very competitive baselines that combine state-of-the-art machine translation and retrieval.
Conclusion
Special domains such as patents or Wikipedia offer the possibility to extract cross-lingual relevance data from citation and link graphs.
Conclusion
These data can be used to directly optimize cross-lingual ranking models.
Introduction
(2010) show that for the domain of Wikipedia, learning a sparse matrix of word associations between the query and document vocabularies from relevance rankings is useful in monolingual and cross-lingual retrieval.
Introduction
(2013) apply the idea of learning a sparse matrix of bilingual phrase associations from relevance rankings to cross-lingual retrieval in the patent domain.
Related Work
Both approaches work in a cross-lingual setting, the former on Wikipedia data, the latter on patents.
Related Work
features, and by evaluating both approaches on cross-lingual retrieval for patents and Wikipedia.
cross-lingual is mentioned in 7 sentences in this paper.
Li, Haibo and Zheng, Jing and Ji, Heng and Li, Qi and Wang, Wen
Experiments
Cross-lingual information transfer
Introduction
This need can be addressed in part by cross-lingual information access tasks such as entity linking (McNamee et al., 2011; Cassidy et al., 2012), event extraction (Hakkani-Tur et al., 2007), slot filling (Snover et al., 2011) and question answering (Parton et al., 2009; Parton and McKeown, 2010).
Introduction
A key bottleneck of high-quality cross-lingual information access lies in the performance of Machine Translation (MT).
Name-aware MT
Traditional name tagging approaches for single languages cannot address this requirement because they were all built on data and resources which are specific to each language without using any cross-lingual features.
Name-aware MT
We developed a bilingual joint name tagger (Li et al., 2012) based on conditional random fields that incorporates both monolingual and cross-lingual features and conducts joint inference, so that name tagging from two languages can mutually enhance each other and therefore inconsistent results can be corrected simultaneously.
Name-aware MT Evaluation
However, for cross-lingual information processing applications, we should acknowledge that certain informationally critical words are more important than other common words.
Related Work
Postprocessing: in a cross-lingual information retrieval or question answering framework, online query names can be utilized to obtain translation and post-edit MT output (Parton et al., 2009; Ma and McKeown, 2009; Parton and McKeown, 2010; Parton et al., 2012).
cross-lingual is mentioned in 7 sentences in this paper.
Snyder, Benjamin and Barzilay, Regina
Abstract
We present a nonparametric Bayesian model that jointly induces morpheme segmentations of each language under consideration and at the same time identifies cross-lingual morpheme patterns, or abstract morphemes.
Conclusions and Future Work
We started out by posing two questions: (i) Can we exploit cross-lingual patterns to improve unsupervised analysis?
Introduction
In this paper we investigate two questions: (i) Can we exploit cross-lingual correspondences to improve unsupervised language
Introduction
Our results indicate that cross-lingual patterns can indeed be exploited successfully for the task of unsupervised morphological segmentation.
Model
The model is fully unsupervised and is driven by a preference for stable and high frequency cross-lingual morpheme patterns.
Model
Our model utilizes a Dirichlet process prior for each language, as well as for the cross-lingual links (abstract morphemes).
Multilingual Morphological Segmentation
In the following section, we describe a model that can model both generic cross-lingual patterns (fy and 19-), as well as cognates between related languages (ktb for Hebrew and Arabic).
cross-lingual is mentioned in 7 sentences in this paper.
Shezaf, Daphna and Rappoport, Ari
Abstract
Our algorithm introduces nonaligned signatures (NAS), a cross-lingual word context similarity score that avoids the over-constrained and inefficient nature of alignment-based methods.
Conclusion
At the heart of our method is the nonaligned signatures (NAS) context similarity score, used for removing incorrect translations using cross-lingual co-occurrences.
Conclusion
It would be interesting to further investigate this observation with other sources of lexicons (e.g., obtained from parallel or comparable corpora) and for other tasks, such as cross-lingual word sense disambiguation and information retrieval.
Previous Work
2.3 Cross-lingual Co-occurrences in Lexicon Construction
Previous Work
Rapp (1999) and Fung (1998) discussed semantic similarity estimation using cross-lingual context vector alignment.
Previous Work
Using cross-lingual co-occurrences to improve a lexicon generated using a pivot language was suggested by Tanaka and Iwasaki (1996).
cross-lingual is mentioned in 7 sentences in this paper.
Hartmann, Silvana and Gurevych, Iryna
Discussion: a Multilingual FrameNet
Another issue that applies to all automatic (and also manual) approaches of cross-lingual FrameNet extension is the restricted cross-language applicability of frames.
Discussion: a Multilingual FrameNet
Unlike corpus-based approaches for cross-lingual FrameNet extension, our approach does not provide frame-semantic annotations for the example
Discussion: a Multilingual FrameNet
Example annotations can be additionally obtained via cross-lingual annotation projection (Pado and Lapata, 2009), and the lexical information in FNWKde can be used to guide this process.
Introduction
Previous cross-lingual transfer of FrameNet used corpus-based approaches, or resource alignment with multilingual expert-built resources, such as EuroWordNet.
Introduction
To our knowledge, Wiktionary has not been evaluated as an interlingual index for the cross-lingual extension of lexical-semantic resources.
Related Work
In this vein, Pado and Lapata (2005) propose a cross-lingual FrameNet extension to German and French; Johansson and Nugues (2005) and Johansson and Nugues (2006) do this for Spanish and Swedish, and Basili et al.
cross-lingual is mentioned in 6 sentences in this paper.
Dinu, Georgiana and Baroni, Marco
Abstract
We test this in a monolingual scenario (paraphrase generation) as well as in a cross-lingual setting (translation by synthesizing adjective-noun phrase vectors in English and generating the equivalent expressions in Italian).
Evaluation setting
The Italian language vectors for the cross-lingual experiments of Section 6 were trained on 1.6 billion tokens from itWaC. A word token is a word-form + POS-tag string.
Noun phrase translation
This section describes preliminary experiments performed in a cross-lingual setting on the task of composing English AN phrases and generating Italian translations.
Noun phrase translation
Creation of cross-lingual vector spaces A common semantic space is required in order to map words and phrases across languages.
Noun phrase translation
Cross-lingual decomposition training Training proceeds as in the monolingual case, this time concatenating the training data sets and estimating a single (de)-composition function for the two languages in the shared semantic space.
cross-lingual is mentioned in 5 sentences in this paper.
Kim, Seokhwan and Lee, Gary Geunbae
Abstract
To build a relation extractor without significant annotation effort, we can exploit cross-lingual annotation projection, which leverages parallel corpora as external resources for supervision.
Conclusions
Experimental results show that our graph-based projection helped to improve the performance of the cross-lingual annotation projection of the semantic relations, and our system outperforms the other systems, which incorporate monolingual external resources.
Cross-lingual Annotation Projection for Relation Extraction
Cross-lingual annotation projection intends to learn an extractor f_t for good performance without significant effort toward building resources for a resource-poor target language L_t.
Cross-lingual Annotation Projection for Relation Extraction
Early studies in cross-lingual annotation projection were accomplished for various natural language processing tasks (Yarowsky and Ngai, 2001; Yarowsky et al., 2001; Hwa et al., 2005; Zitouni and Florian, 2008; Pado and Lapata, 2009).
Introduction
To obtain training examples in the resource-poor target language, this approach exploited a cross-lingual annotation projection by propagating annotations that were generated by a relation extraction system in a resource-rich source language.
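A bare-bones sketch of this kind of projection (alignment format and source extractor are hypothetical placeholders):

    def project_relations(src_tokens, alignment, src_extractor):
        """Run a source-language relation extractor and project each relation's
        argument positions onto the target sentence through word alignments."""
        projected = []
        for rel_type, a1, a2 in src_extractor(src_tokens):       # (type, arg1 index, arg2 index)
            t1 = [j for i, j in alignment if i == a1]            # target positions aligned to arg1
            t2 = [j for i, j in alignment if i == a2]            # target positions aligned to arg2
            if t1 and t2:                                        # keep only fully projected relations
                projected.append((rel_type, t1, t2))
        return projected                                         # noisy target-language training examples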
cross-lingual is mentioned in 5 sentences in this paper.
Ma, Xuezhe and Xia, Fei
Abstract
We train probabilistic parsing models for resource-poor languages by transferring cross-lingual knowledge from resource-rich language with entropy regularization.
Data and Tools
However, previous studies (McDonald et al., 2011; McDonald et al., 2013) have demonstrated that a homogeneous representation is critical for multilingual language technologies that require consistent cross-lingual analysis for downstream components, and the heterogeneous representations used in CoNLL shared-task treebanks weaken any conclusion that can be drawn.
Experiments
By using IGT Data, not only can we obtain more accurate word alignments, but also extract useful cross-lingual information for the resource-poor language.
Introduction
We extend this learning framework so that it can be used to transfer cross-lingual knowledge between different languages.
Our Approach
For the purpose of transferring cross-lingual information from the English parser via parallel text, we explore the model training method proposed by Smith and Eisner (2007), which presented a generalization of K function (Abney, 2004), and related it to another semi-supervised learning technique, entropy regularization (Jiao et al., 2006; Mann and McCallum, 2007).
cross-lingual is mentioned in 5 sentences in this paper.
Qian, Longhua and Hui, Haotian and Hu, Ya'nan and Zhou, Guodong and Zhu, Qiaoming
Abstract
(2010) propose a cross-lingual annotation projection approach which uses parallel corpora to acquire a relation detector on the target language.
Abstract
SL-CR (Supervised Learning with cross-lingual labeled instances): in addition to monolingual labeled instances (SL-MO), the training data for supervised learning contain labeled instances translated from the other language.
Abstract
AL-CR (Active Learning with cross-lingual instances): both the manually labeled instances and their translated ones are added to the respective training data.
cross-lingual is mentioned in 5 sentences in this paper.
van Gompel, Maarten and van den Bosch, Antal
Abstract
We study the feasibility of exploiting cross-lingual context to obtain high-quality translation suggestions that improve over statistical language modelling and word-sense disambiguation baselines.
Experiments & Results
The latter study on cross-lingual WSD finds a positive impact when conducting feature selection per classifier.
Introduction
The cross-lingual context in our research question may at first seem artificial, but its design explicitly aims at applications related to computer-aided language learning (Laghos and Panayiotis, 2005; Levy, 1997) and computer-aided translation (Barrachina et al., 2009).
Introduction
The main research question in this research is how to disambiguate an L1 word or phrase to its L2 translation based on an L2 context, and whether such cross-lingual contextual approaches provide added value compared to baseline models that are not context informed or compared to standard language models.
System
The choice for this algorithm is motivated by the fact that it handles multiple classes with ease, but first and foremost because it has been successfully employed for word sense disambiguation in other studies (Hoste et al., 2002; Decadt et al., 2004), in particular in cross-lingual word sense disambiguation, a task closely resembling our current task (van Gompel and van den Bosch, 2013).
cross-lingual is mentioned in 5 sentences in this paper.
Zhang, Jiajun and Liu, Shujie and Li, Mu and Zhou, Ming and Zong, Chengqing
Conclusions and Future Work
3) we will apply the BRAE model in other monolingual and cross-lingual tasks.
Discussions
Besides SMT, the semantic phrase embeddings can be used in other cross-lingual tasks, such as cross-lingual question answering, since the semantic similarity between phrases in different languages can be calculated accurately.
Discussions
In addition to the cross-lingual applications, we believe the BRAE model can be applied in many
Introduction
Besides SMT, the semantic phrase embeddings can be used in other cross-lingual tasks (e.g.
Introduction
cross-lingual question answering) and monolingual applications such as textual entailment, question answering and paraphrase detection.
cross-lingual is mentioned in 5 sentences in this paper.
Lim, Lian Tze and Soon, Lay-Ki and Lim, Tek Yong and Tang, Enya Kong and Ranaivo-Malançon, Bali
Introduction
It can also be viewed as a simplified version of the Cross-Lingual Lexical Substitution (Mihalcea et al., 2010) and Cross-Lingual Word Sense Disambiguation (Lefever and Hoste, 2010) tasks, as defined in SemEval-2010.
Typical Resource Requirements for Translation Selection
While LSI is more frequently used in information retrieval, the translation knowledge acquisition task can be recast as a cross-lingual indexing task, following (Dumais et al., 1997).
Typical Resource Requirements for Translation Selection
Basile and Semeraro (2010) also used Wikipedia articles as a parallel corpus for their participation in the SemEval 2010 Cross-Lingual Lexical Substitution task.
Typical Resource Requirements for Translation Selection
(2011) also tackled the problem of cross-lingual disambiguation for under-resourced language pairs (English—Persian) using Wikipedia articles, by applying the one sense per collocation and one sense per discourse heuristics on a comparable corpus.
cross-lingual is mentioned in 4 sentences in this paper.
Nastase, Vivi and Strapparava, Carlo
Conclusion
Monolingual and cross-lingual textual entailment in particular would be interesting applications, because they require finding shared meaning on two text fragments.
Cross Language Text Categorization
3.2 Raw cross-lingual text categorization
Cross Language Text Categorization
3.4 Cross-lingual text categorization in a latent semantic space adding etymology
Introduction
words) from the source to the target language, word etymologies are a novel source of cross-lingual knowledge.
cross-lingual is mentioned in 4 sentences in this paper.
Faruqui, Manaal and Dyer, Chris
Abstract
We present an information theoretic objective for bilingual word clustering that incorporates both monolingual distributional evidence as well as cross-lingual evidence from parallel corpora to learn high quality word clusters jointly in any number of languages.
Related Work
(2012) use cross-lingual word clusters to show transfer of linguistic structure.
Related Work
Also closely related is the technique of cross-lingual annotation projection.
cross-lingual is mentioned in 3 sentences in this paper.
Snyder, Benjamin and Naseem, Tahira and Barzilay, Regina
Introduction
One of the main challenges of unsupervised multilingual learning is to exploit cross-lingual patterns discovered in data, while still allowing a wide range of language-specific idiosyncrasies.
Introduction
For each pair of coupled bilingual constituents, a pair of part-of-speech sequences are drawn jointly from a cross-lingual distribution.
Related Work
Research in this direction was pioneered by (Wu, 1997), who developed Inversion Transduction Grammars to capture cross-lingual grammar variations such as phrase re-orderings.
cross-lingual is mentioned in 3 sentences in this paper.
Pervouchine, Vladimir and Li, Haizhou and Lin, Bo
Conclusions
We believe that this property would be useful in transliteration extraction and cross-lingual information retrieval applications.
Related Work
Denoting the number of cross-lingual mappings that are common in both A and G as C_AG, the number of cross-lingual mappings in A as C_A and the number of cross-lingual mappings in G as C_G, precision Pr is given as C_AG/C_A, recall Rc as C_AG/C_G and F-score as 2·Pr·Rc/(Pr + Rc).
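In code form, a direct transcription of these definitions:

    def alignment_prf(c_ag, c_a, c_g):
        """Precision, recall and F-score of an alignment A against a reference G."""
        pr = c_ag / c_a                    # mappings of A that also occur in G
        rc = c_ag / c_g                    # mappings of G recovered by A
        return pr, rc, 2 * pr * rc / (pr + rc)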
Transliteration alignment entropy
We expect a good alignment to have a sharp cross-lingual mapping with low alignment entropy.
cross-lingual is mentioned in 3 sentences in this paper.