Index of papers in Proc. ACL 2010 that mention
  • cross-lingual
Prettenhofer, Peter and Stein, Benno
Abstract
The approach uses unlabeled documents, along with a simple word translation oracle, in order to induce task-specific, cross-lingual word correspondences.
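To make the oracle-based induction concrete, here is a minimal Python sketch that pairs candidate source words with their oracle translations and keeps only pairs frequent in the unlabeled data of both languages. The function names, counting helpers, and frequency threshold are illustrative assumptions, not the paper's API.

```python
# Sketch: inducing task-specific cross-lingual word correspondences from
# unlabeled documents plus a simple word translation oracle.
# All names and the threshold below are illustrative assumptions.

def induce_correspondences(candidates, translate, count_src, count_tgt,
                           min_count=30):
    """Pair each candidate source word with its oracle translation and keep
    the pair only if both words are frequent in the unlabeled data, so the
    pair can serve as a reliable cross-lingual anchor."""
    pairs = []
    for w_s in candidates:
        w_t = translate(w_s)                      # word translation oracle
        if w_t and count_src(w_s) >= min_count and count_tgt(w_t) >= min_count:
            pairs.append((w_s, w_t))
    return pairs

# Toy usage with a dictionary-backed oracle and hard-coded counts.
oracle = {"excellent": "ausgezeichnet", "poor": "schlecht"}.get
src = {"excellent": 120, "poor": 80}
tgt = {"ausgezeichnet": 95, "schlecht": 60}
print(induce_correspondences(["excellent", "poor", "boring"], oracle,
                             lambda w: src.get(w, 0), lambda w: tgt.get(w, 0)))
# [('excellent', 'ausgezeichnet'), ('poor', 'schlecht')]
```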
Cross-Language Structural Correspondence Learning
The resulting classifier fST, which will operate in the cross-lingual setting, is defined as follows:
Cross-Language Text Classification
One way to overcome this “feature barrier” is to find a cross-lingual representation for documents written in S and T, which enables the transfer of classification knowledge between the two languages.
Cross-Language Text Classification
Intuitively, one can understand such a cross-lingual representation as a concept space that underlies both languages.
Cross-Language Text Classification
In the following, we will use θ to denote a map that associates the original |V|-dimensional representation of a document d written in S or T with its cross-lingual representation.
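If θ is realized as a k × |V| projection matrix, as in structural-correspondence-style methods, applying the map is a single matrix-vector product. A minimal sketch under that assumption (the random matrix stands in for a learned θ):

```python
import numpy as np

V, k = 1000, 50                  # vocabulary size, concept-space dimension
rng = np.random.default_rng(0)
theta = rng.normal(size=(k, V))  # learned in practice; random stand-in here

x = np.zeros(V)                  # |V|-dimensional bag-of-words vector
x[[3, 17, 256]] = 1.0            # toy document d with three word types

z = theta @ x                    # cross-lingual representation of d
print(z.shape)                   # (50,)
```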
Experiments
In this study, we are interested in whether the cross-lingual representation induced by CL-SCL captures the difference between positive and negative reviews; by balancing the reviews we ensure that the imbalance does not affect the learned model.
Experiments
Since SGD is sensitive to feature scaling, the projection θx is post-processed as follows: (1) Each feature of the cross-lingual representation is standardized to zero mean and unit variance, where mean and variance are estimated on D_S ∪ D_u.
Experiments
(2) The cross-lingual document representations are scaled by a constant α such that |D_S|⁻¹ Σ_{x ∈ D_S} ‖αθx‖ = 1.
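Read together, steps (1) and (2) amount to column-wise standardization followed by one global rescaling so that the projected source documents have average norm 1. A minimal numpy sketch of that reading (the array names are mine):

```python
import numpy as np

rng = np.random.default_rng(1)
Z_all = rng.normal(2.0, 3.0, size=(200, 50))  # theta*x for all docs in D_S ∪ D_u
Z_S = Z_all[:120]                             # rows belonging to D_S

# (1) Standardize each feature to zero mean / unit variance on D_S ∪ D_u.
mu, sigma = Z_all.mean(axis=0), Z_all.std(axis=0)
Z_S_std = (Z_S - mu) / sigma

# (2) Choose alpha so the mean norm over D_S equals 1, then rescale.
alpha = 1.0 / np.linalg.norm(Z_S_std, axis=1).mean()
Z_S_scaled = alpha * Z_S_std
print(np.linalg.norm(Z_S_scaled, axis=1).mean())  # ~1.0
```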
Introduction
As we will see, a small number of pivots can capture a sufficiently large part of the correspondences between S and T in order to (1) construct a cross-lingual representation and (2) learn a classifier fτ for the task τ that operates on this representation.
Introduction
Third, an in-depth analysis with respect to important hyperparameters such as the ratio of labeled and unlabeled documents, the number of pivots, and the optimum dimensionality of the cross-lingual representation.
cross-lingual is mentioned in 15 sentences in this paper.
Zhang, Duo and Mei, Qiaozhu and Zhai, ChengXiang
Abstract
One common deficiency of existing topic models, though, is that they would not work well for extracting cross-lingual latent topics simply because words in different languages generally do not co-occur with each other.
Abstract
Specifically, we propose a new topic model called Probabilistic Cross-Lingual Latent Semantic Analysis (PCLSA) which extends the Probabilistic Latent Semantic Analysis (PLSA) model by regularizing its likelihood function with soft constraints defined based on a bilingual dictionary.
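As a rough illustration of what a dictionary-based soft constraint on the likelihood can look like, the sketch below subtracts a squared-difference penalty that pulls a topic's probabilities for dictionary-linked word pairs toward each other. PCLSA's actual regularizer is graph-based over the bilingual dictionary and differs in form; this is only a hedged approximation with invented names.

```python
import numpy as np

def regularized_objective(loglik, topic_word, dict_pairs, lam=0.1):
    """loglik: PLSA data log-likelihood (scalar).
    topic_word: (K, |V|) topic-word distributions over the merged vocabulary.
    dict_pairs: (i, j) index pairs linked by the bilingual dictionary.
    The penalty rewards p(w_i|topic) ~ p(w_j|topic), letting topics place
    mass on translation-equivalent words in both languages."""
    penalty = sum(((topic_word[:, i] - topic_word[:, j]) ** 2).sum()
                  for i, j in dict_pairs)
    return loglik - lam * penalty

# Toy check: two topics over a four-word merged vocabulary, one dictionary link.
tw = np.array([[0.7, 0.1, 0.1, 0.1],
               [0.1, 0.1, 0.2, 0.6]])
print(regularized_objective(-123.4, tw, [(0, 3)]))
```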
Abstract
Both qualitative and quantitative experimental results show that the PCLSA model can effectively extract cross-lingual latent topics from multilingual text data.
Introduction
Although many topic models have been proposed and shown to be useful (see Section 2 for more detailed discussion of related work), most of them share a common deficiency: they are designed to work only for monolingual text data and would not work well for extracting cross-lingual latent topics, i.e., topics shared across text collections in different natural languages.
Introduction
In this paper, we propose a novel topic model, called Probabilistic Cross-Lingual Latent Semantic Analysis (PCLSA) model, which can be used to mine shared latent topics from unaligned text data in different languages.
Introduction
However, the goals of their work are different from ours in that their models mainly focus on mining cross-lingual topics of matching word pairs and discovering the correspondence at the vocabulary level.
Problem Formulation
In general, the problem of cross-lingual topic extraction can be defined as extracting a set of common cross-lingual latent topics covered in text collections in different natural languages.
Problem Formulation
A cross-lingual latent topic will be represented as a multinomial word distribution over the words in all the languages, i.e., a single distribution defined over the union of the languages' vocabularies.
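In code, such a topic is simply one normalized distribution whose support is the union of the vocabularies; prefixing each word with a language tag keeps the vocabularies disjoint. A minimal sketch with invented words and probabilities:

```python
# One cross-lingual latent topic: a single multinomial over the union of
# the English and Chinese vocabularies (words and values are invented).
topic = {
    "en:market": 0.22,
    "en:stock":  0.18,
    "en:price":  0.15,
    "zh:市场":    0.20,   # "market"
    "zh:股票":    0.25,   # "stock"
}
assert abs(sum(topic.values()) - 1.0) < 1e-9  # a proper distribution
```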
Related Work
(Fung, 1995; Franz et al., 1998; Masuichi et al., 2000; Sadat et al., 2003; Gliozzo and Strapparava, 2006)), but most previous work aims at acquiring word translation knowledge or at performing cross-lingual text categorization from comparable corpora.
cross-lingual is mentioned in 37 sentences in this paper.
Shezaf, Daphna and Rappoport, Ari
Abstract
Our algorithm introduces nonaligned signatures (NAS), a cross-lingual word context similarity score that avoids the over-constrained and inefficient nature of alignment-based methods.
Conclusion
At the heart of our method is the nonaligned signatures (NAS) context similarity score, used for removing incorrect translations using cross-lingual co-occurrences.
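Combining the two excerpts: NAS scores a candidate pair by how much of one word's context "signature" maps, through the lexicon, into the other word's signature, with no position-by-position alignment of context vectors. A hedged sketch of that idea follows; the signature size, normalization, and helper names are my assumptions rather than the paper's exact definition.

```python
def signature(word, cooc, n=5):
    """Top-n context words of `word` by co-occurrence count (illustrative)."""
    return {w for w, _ in sorted(cooc[word].items(), key=lambda kv: -kv[1])[:n]}

def nas(w1, w2, cooc1, cooc2, lexicon, n=5):
    """NAS-style score: fraction of w1's signature whose lexicon translations
    land anywhere in w2's signature. No alignment of vector positions is
    needed, avoiding the over-constrained matching the paper criticizes."""
    sig1, sig2 = signature(w1, cooc1, n), signature(w2, cooc2, n)
    return sum(1 for c in sig1 if lexicon.get(c) in sig2) / n

# Toy usage: is German "Hund" a plausible translation of English "dog"?
cooc_en = {"dog": {"bark": 9, "leash": 7, "cat": 5, "food": 2, "run": 1}}
cooc_de = {"Hund": {"bellen": 8, "Leine": 6, "Katze": 4, "Haus": 1, "laufen": 1}}
lex = {"bark": "bellen", "leash": "Leine", "cat": "Katze"}
print(nas("dog", "Hund", cooc_en, cooc_de, lex))  # 0.6
```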
Conclusion
It would be interesting to further investigate this observation with other sources of lexicons (e.g., obtained from parallel or comparable corpora) and for other tasks, such as cross-lingual word sense disambiguation and information retrieval.
Previous Work
2.3 Cross-lingual Co-occurrences in Lexicon Construction
Previous Work
Rapp (1999) and Fung (1998) discussed semantic similarity estimation using cross-lingual context vector alignment.
Previous Work
Using cross-lingual co-occurrences to improve a lexicon generated using a pivot language was suggested by Tanaka and Iwasaki (1996).
cross-lingual is mentioned in 7 sentences in this paper.