Private Access to Phrase Tables for Statistical Machine Translation
Cancedda, Nicola

Article Structure

Abstract

Some Statistical Machine Translation systems never see the light because the owner of the appropriate training data cannot release them, and the potential user of the system cannot disclose what should be translated.

Introduction

It is generally taken for granted that whoever is deploying a Statistical Machine Translation (SMT) system has unrestricted rights to access and use the parallel data required for its training.

Private access to phrase tables

Let Alice be the owner of a PT, Bob the owner of the SMT decoder who would like to use the table, and Tina a trusted third party.

Implementation

For clarity of exposition, in Section 2.2 we presented a method for looking up PT entries involving one interaction for each phrase lookup.

Related work

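We are not aware of any previous work directly addressing the problem we solve, i.e. private access to a phrase table or other resources for the purpose of performing statistical machine translation.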

Experiments

We validated our simple implementation using a phrase table of 38,488,777 lines created with the Moses toolkit (Koehn et al., 2007) phrase-based SMT system, corresponding to 15,764,069 entries

Conclusions

Some SMT systems never get deployed because of legitimate and incompatible concerns of the prospective users and of the training data owners.

Topics

phrase table

Appears in 10 sentences as: Phrase Table (1) phrase table (6) phrase tables (3)
In Private Access to Phrase Tables for Statistical Machine Translation
  1. In this method, the owner of the TM generates a Phrase Table (PT) from it, and makes it accessible to the user following a special procedure.
    Page 1, “Introduction”
  2. The user acquires all and only the phrase table entries required to perform the decoding of a specific file, thus avoiding complete transfer of the TM to the user;
    Page 1, “Introduction”
  3. While the exposition will focus on phrase tables, there is nothing in the method precluding its use with other resources, provided that they can be represented as lookup tables, a very mild constraint.
    Page 1, “Introduction”
  4. phrase tables where each record contains only a translation in the target language, and no associated statistics.
    Page 2, “Introduction”
  5. This mirrors the standard practice of filtering the phrase table for a given source file to translate before starting the actual decoding.
    Page 3, “Implementation”
  6. private access to a phrase table or other resources for the purpose of performing statistical machine translation.
    Page 3, “Related work”
  7. We validated our simple implementation using a phrase table of 38,488,777 lines created with the Moses toolkit (Koehn et al., 2007) phrase-based SMT system, corresponding to 15,764,069 entries
    Page 3, “Experiments”
  8. Figure 4 displays the time required to complete retrieval for subsets of increasing size of the 2,000 sentence test set, and for phrase tables uniformly sampled at 25%, 50%, 75% and 100%.
    Page 4, “Experiments”
  9. 217,019 distinct digests are generated for all possible phrases of length up to 6 from the full test set, resulting in the retrieval of 47,072 entries (596,560 lines) from the full phrase table.
    Page 4, “Experiments”
  10. Figure 4: Time required for retrieval as a function of the number of sentences in the query, for different subsets of the original phrase table.
    Page 4, “Experiments”
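
The excerpts above mention that digests are generated for every phrase of length up to 6 in the text to translate, and that only the matching phrase table entries are retrieved. The sketch below illustrates just that enumeration and digest-keyed lookup. It is not the paper's protocol: the choice of SHA-256, the in-memory dictionary, and all function names are assumptions made for illustration, and the encryption and trusted third-party steps are left out entirely.

```python
import hashlib

MAX_PHRASE_LEN = 6  # the excerpts mention phrases of length up to 6


def phrase_digest(phrase: str) -> str:
    """Digest of a source phrase. SHA-256 is an assumption, not the paper's stated choice."""
    return hashlib.sha256(phrase.encode("utf-8")).hexdigest()


def candidate_phrases(sentences, max_len=MAX_PHRASE_LEN):
    """Collect every distinct contiguous token sequence of length 1..max_len."""
    phrases = set()
    for sentence in sentences:
        tokens = sentence.split()
        for start in range(len(tokens)):
            last = min(start + max_len, len(tokens))
            for end in range(start + 1, last + 1):
                phrases.add(" ".join(tokens[start:end]))
    return phrases


def build_digest_index(phrase_table):
    """Key a toy phrase table (source phrase -> list of lines) by digest instead of by text."""
    return {phrase_digest(src): lines for src, lines in phrase_table.items()}


def retrieve(sentences, digest_index):
    """Return only the entries whose digest matches some phrase of the input text."""
    digests = {phrase_digest(p) for p in candidate_phrases(sentences)}
    return {d: digest_index[d] for d in digests if d in digest_index}
```

With a toy table such as `{"a b": ["a b ||| x y"], "z": ["z ||| q"]}`, calling `retrieve(["a b c"], build_digest_index(table))` returns the single entry for "a b", which mirrors filtering the phrase table for a given source file before decoding.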

See all papers in Proc. ACL 2012 that mention phrase table.

SMT system

Appears in 4 sentences as: SMT system (3) SMT systems (2)
In Private Access to Phrase Tables for Statistical Machine Translation
  1. At the same time, the prospective user of the SMT system that could be derived from such TM might be subject to confidentiality constraints on the text stream needing translation, so that sending out text to translate to an SMT system deployed by the owner of the PT is not an option.
    Page 1, “Introduction”
  2. We validated our simple implementation using a phrase table of 38,488,777 lines created with the Moses toolkit (Koehn et al., 2007) phrase-based SMT system, corresponding to 15,764,069 entries
    Page 3, “Experiments”
  3. Some SMT systems never get deployed because of legitimate and incompatible concerns of the prospective users and of the training data owners.
    Page 4, “Conclusions”
  4. This same method can be easily extended to other resources used by SMT systems, and indeed even beyond SMT itself, whenever similar constraints on data access exist.
    Page 4, “Conclusions”

Machine Translation

Appears in 3 sentences as: Machine Translation (2) machine translation (1)
In Private Access to Phrase Tables for Statistical Machine Translation
  1. Some Statistical Machine Translation systems never see the light because the owner of the appropriate training data cannot release them, and the potential user of the system cannot disclose what should be translated.
    Page 1, “Abstract”
  2. It is generally taken for granted that whoever is deploying a Statistical Machine Translation (SMT) system has unrestricted rights to access and use the parallel data required for its training.
    Page 1, “Introduction”
  3. private access to a phrase table or other resources for the purpose of performing statistical machine translation.
    Page 3, “Related work”

Statistical Machine Translation

Appears in 3 sentences as: Statistical Machine Translation (2) statistical machine translation (1)
In Private Access to Phrase Tables for Statistical Machine Translation
  1. Some Statistical Machine Translation systems never see the light because the owner of the appropriate training data cannot release them, and the potential user of the system cannot disclose what should be translated.
    Page 1, “Abstract”
  2. It is generally taken for granted that whoever is deploying a Statistical Machine Translation (SMT) system has unrestricted rights to access and use the parallel data required for its training.
    Page 1, “Introduction”
  3. private access to a phrase table or other resources for the purpose of performing statistical machine translation.
    Page 3, “Related work”