Learning Translational and Knowledge-based Similarities from Relevance Rankings for Cross-Language Retrieval
Schamoni, Shigehiko and Hieber, Felix and Sokolov, Artem and Riezler, Stefan

Article Structure

Abstract

We present an approach to cross-language retrieval that combines dense knowledge-based features and sparse word translations.

Introduction

Cross-Language Information Retrieval (CLIR) for the domain of web search successfully leverages state-of-the-art Statistical Machine Translation (SMT) to produce either a single most probable translation or a weighted list of alternatives that is used as the search query for a standard search engine (Chin et al., 2008; Ture et al., 2012).

Related Work

CLIR addresses the problem of translating or projecting a query into the language of the document repository across which retrieval is performed.

Translation and Ranking for CLIR

SMT-based Models.

Model Combination

Combination by Borda Counts.

Data

Patent Prior Art Search (JP-EN).

Experiments

Experiment Settings.

Conclusion

Special domains such as patents or Wikipedia offer the possibility to extract cross-lingual relevance data from citation and link graphs.

Topics

cross-lingual

Appears in 7 sentences as: cross-lingual (7)
In Learning Translational and Knowledge-based Similarities from Relevance Rankings for Cross-Language Retrieval
  1. In large-scale experiments for patent prior art search and cross-lingual retrieval in Wikipedia, our approach yields considerable improvements over learning-to-rank with either only dense or only sparse features, and over very competitive baselines that combine state-of-the-art machine translation and retrieval.
    Page 1, “Abstract”
  2. (2010) show that for the domain of Wikipedia, learning a sparse matrix of word associations between the query and document vocabularies from relevance rankings is useful in monolingual and cross-lingual retrieval.
    Page 1, “Introduction”
  3. (2013) apply the idea of learning a sparse matrix of bilingual phrase associations from relevance rankings to cross-lingual retrieval in the patent domain.
    Page 1, “Introduction”
  4. Both approaches work in a cross-lingual setting, the former on Wikipedia data, the latter on patents.
    Page 2, “Related Work”
  5. features, and by evaluating both approaches on cross-lingual retrieval for patents and Wikipedia.
    Page 2, “Related Work”
  6. Special domains such as patents or Wikipedia offer the possibility to extract cross-lingual relevance data from citation and link graphs.
    Page 5, “Conclusion”
  7. These data can be used to directly optimize cross-lingual ranking models.
    Page 5, “Conclusion”

See all papers in Proc. ACL 2014 that mention cross-lingual.


SMT system

Appears in 4 sentences as: SMT system (2) SMT systems (2)
In Learning Translational and Knowledge-based Similarities from Relevance Rankings for Cross-Language Retrieval
  1. This approach is advantageous if large amounts of in-domain sentence-parallel data are available to train SMT systems, but relevance rankings to train retrieval models are not.
    Page 1, “Introduction”
  2. In a direct translation approach (DT), a state-of-the-art SMT system is used to produce a single best translation that is used as search query in the target language.
    Page 2, “Related Work”
  3. For example, Google’s CLIR approach combines their state-of-the-art SMT system with their proprietary search engine (Chin et al., 2008).
    Page 2, “Related Work”
  4. In order to simulate pass-through behavior of out-of-vocabulary terms in SMT systems, additional features accounting for source and target term identity were added to the DK and BM models.
    Page 5, “Experiments”

best result

Appears in 3 sentences as: best result (2) best results (1)
In Learning Translational and Knowledge-based Similarities from Relevance Rankings for Cross-Language Retrieval
  1. The best result is achieved by combining DT and PSQ with DK and VW.
    Page 5, “Experiments”
  2. Borda voting gives the best result under MAP, which is probably due to the adjustment of the interpolation parameter for MAP on the development set.
    Page 5, “Experiments”
  3. Under NDCG and PRES, LinLearn achieves the best results, showing the advantage of automatically learning combination weights, which leads to stable results across various metrics.
    Page 5, “Experiments”
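
The Borda-count combination referenced above can be sketched as a weighted rank aggregation: each model's ranking is converted to points, and an interpolation parameter (tuned on a development set in the paper) weights the two point lists before re-sorting. This is a minimal illustrative sketch; the function name and tie-breaking behavior are assumptions, not taken from the paper.

```python
def borda_combine(ranking_a, ranking_b, lam=0.5):
    """Weighted Borda-count combination of two rankings (illustrative sketch).

    ranking_a, ranking_b: lists of document ids ordered best-first.
    lam: interpolation weight for ranking_a (tuned on dev data in the paper).
    """
    def points(ranking):
        # A document at rank r among n documents receives n - r Borda points.
        n = len(ranking)
        return {doc: n - r for r, doc in enumerate(ranking)}

    pa, pb = points(ranking_a), points(ranking_b)
    docs = set(pa) | set(pb)
    # Interpolate the two point lists; unseen documents score zero.
    scored = {d: lam * pa.get(d, 0) + (1 - lam) * pb.get(d, 0) for d in docs}
    # Re-rank by combined score, best first (ties broken arbitrarily).
    return sorted(docs, key=lambda d: -scored[d])
```

For example, with `lam=0.7` the first ranking dominates, so a document ranked first by it stays on top unless the second ranking strongly disagrees.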

parallel sentences

Appears in 3 sentences as: parallel sentences (4)
In Learning Translational and Knowledge-based Similarities from Relevance Rankings for Cross-Language Retrieval
  1. (2013), consisting of 1.8M parallel sentences from the NTCIR-7 JP-EN PatentMT subtask (Fujii et al., 2008) and 2k parallel sentences for parameter development from the NTCIR-8 test collection.
    Page 4, “Data”
  2. For Wikipedia, we trained a DE-EN system on 4.1M parallel sentences from Europarl, Common Crawl, and News-Commentary.
    Page 4, “Data”
  3. Parameter tuning was done on 3k parallel sentences from the WMT'11 test set.
    Page 4, “Data”

vector representation

Appears in 3 sentences as: vector representation (3)
In Learning Translational and Knowledge-based Similarities from Relevance Rankings for Cross-Language Retrieval
  1. Optimization for these additional models including domain knowledge features was done by overloading the vector representation of queries q and documents d in the VW linear learner: Instead of sparse word-based features, q and d are represented by real-valued vectors of dense domain-knowledge features.
    Page 3, “Translation and Ranking for CLIR”
  2. This means that the vector representation of queries q and documents d in the VW linear learner is overloaded once more: In addition to dense domain-knowledge features, we incorporate arbitrary ranking models as dense features whose value is the score of the ranking model.
    Page 3, “Model Combination”
  3. LinLearn denotes model combination by overloading the vector representation of queries q and documents d in the VW linear learner by incorporating arbitrary ranking models as dense features.
    Page 5, “Experiments”
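
The overloading idea described above amounts to representing each query-document pair by a dense vector in which each component is the score of one ranking model (or one domain-knowledge feature), and then learning a linear weighting over those components. The sketch below illustrates only this feature construction and the resulting linear score; the function names are hypothetical, and the actual weight learning in the paper is done by VW's pairwise ranking training, which is not reproduced here.

```python
def dense_features(query, doc, models):
    """Build a dense feature vector for a (query, document) pair.

    models: a list of scoring functions, each mapping (query, doc) to a
    real-valued score, e.g. component ranking models or domain-knowledge
    features (illustrative; names are not from the paper).
    """
    return [m(query, doc) for m in models]


def linear_score(features, weights):
    """Score a pair as a learned linear combination of its dense features."""
    return sum(w * f for w, f in zip(weights, features))
```

In a LinLearn-style setup, the weights would be trained from pairwise relevance rankings rather than set by hand.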
