Non-Monotonic Sentence Alignment via Semisupervised Learning
Quan, Xiaojun and Kit, Chunyu and Song, Yan

Article Structure

Abstract

This paper studies the problem of non-monotonic sentence alignment, motivated by the observation that coupled sentences in real bitexts do not necessarily occur monotonically, and proposes a semisupervised learning approach based on two assumptions: (1) sentences with high affinity in one language tend to have their counterparts with similar relatedness in the other; and (2) initial alignment is readily available with existing alignment techniques.

Introduction

Bilingual sentence alignment is a fundamental task to undertake for the purpose of facilitating many important natural language processing applications such as statistical machine translation (Brown et al., 1993), bilingual lexicography (Klavans et al., 1990), and cross-language information retrieval (Nie et al., 1999).

Methodology 2.1 The Problem

An alignment algorithm accepts as input a bitext consisting of a set of source-language sentences, S = {s1, s2, ...}.

Topics

sentence pairs

Appears in 7 sentences as: sentence pair (1) sentence pairs (6)
In Non-Monotonic Sentence Alignment via Semisupervised Learning
  1. Its output is then double-checked and corrected by two experts in bilingual studies, resulting in a data set of 1747 1-1 and 70 1-0 or 0-1 sentence pairs.
    Page 5, “Methodology 2.1 The Problem”
  2. [Figure residue: horizontal-axis label "Similarity of English sentence pair", ticks 0.6-1.0]
    Page 6, “Methodology 2.1 The Problem”
  3. The horizontal axis is the similarity of English sentence pairs and the vertical is the similarity of the corresponding pairs in Chinese.
    Page 6, “Methodology 2.1 The Problem”
  4. Figure 4 confirms this, indicating that sentence pairs with high affinity in one language do have their counterparts with similarly high affinity in the other language.
    Page 6, “Methodology 2.1 The Problem”
  5. Hunalign is configured with the option [-realign], which triggers a three-step procedure: after an initial alignment, Hunalign heuristically enriches its dictionary using word co-occurrences in identified sentence pairs; then, it reruns the alignment process using the updated dictionary.
    Page 6, “Methodology 2.1 The Problem”
  6. This step generates a set of strictly selected sentence pairs for use in training an IBM Model 1 translation model (Brown et al., 1993).
    Page 8, “Methodology 2.1 The Problem”
  7. For this purpose, the input bitexts are first divided into smaller aligned fragments before applying Champollion to derive finer-grained sentence pairs.
    Page 8, “Methodology 2.1 The Problem”
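The monolingual-consistency observation behind snippets 3 and 4 can be checked with a small script: for every pair of already-aligned sentence pairs, compare the source-side similarity with the target-side similarity of the counterparts. This is an illustrative sketch, not the paper's code; `similarity` stands in for any monolingual sentence-similarity function, and the 0.72 threshold merely mirrors the high-affinity cutoff mentioned elsewhere in this index.

```python
def consistency_points(aligned, similarity, threshold=0.72):
    """For each pair of aligned sentence pairs (s_i, t_i) and (s_j, t_j),
    collect (sim(s_i, s_j), sim(t_i, t_j)) whenever the source-side
    similarity exceeds the threshold. Plotting these points as a scatter
    plot would reproduce the kind of figure the snippets describe.

    `aligned`    -- list of (source_sentence, target_sentence) tuples
    `similarity` -- any monolingual sentence-similarity function
    """
    points = []
    for a in range(len(aligned)):
        for b in range(a + 1, len(aligned)):
            s_sim = similarity(aligned[a][0], aligned[b][0])
            if s_sim > threshold:
                t_sim = similarity(aligned[a][1], aligned[b][1])
                points.append((s_sim, t_sim))
    return points
```

If the consistency assumption holds, the collected target-side similarities should cluster near the high end whenever the source-side similarity does.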

See all papers in Proc. ACL 2013 that mention sentence pairs.


dynamic programming

Appears in 4 sentences as: dynamic programming (4)
In Non-Monotonic Sentence Alignment via Semisupervised Learning
  1. Consequently, the task of sentence alignment becomes readily solvable with basic techniques such as dynamic programming.
    Page 1, “Introduction”
  2. Note that it is relatively straightforward to identify the type of many-to-many alignment in monotonic alignment using techniques such as dynamic programming if there is no scrambled pairing or the scrambled pairings are local, limited to a short distance.
    Page 2, “Methodology 2.1 The Problem”
  3. Both use dynamic programming to search for the best alignment.
    Page 8, “Methodology 2.1 The Problem”
  4. (2007), a generative model is proposed, accompanied by two specific alignment strategies, i.e., dynamic programming and divisive clustering.
    Page 8, “Methodology 2.1 The Problem”
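The dynamic-programming search these snippets refer to can be sketched generically. The following is a minimal illustration restricted to 1-1, 1-0, and 0-1 links with a fixed skip cost; real aligners such as Gale-Church or Hunalign also handle 1-2/2-1 links and use length- or lexicon-based costs, and none of the names below come from the paper.

```python
def dp_align(src, tgt, score):
    """Monotonic sentence alignment by dynamic programming over a cost
    lattice, allowing 1-1, 1-0, and 0-1 links (a minimal sketch).
    `score(s, t)` is the cost of linking s to t; SKIP is the fixed cost
    of leaving a sentence unaligned (an illustrative constant)."""
    SKIP = 1.0
    m, n = len(src), len(tgt)
    INF = float("inf")
    cost = [[INF] * (n + 1) for _ in range(m + 1)]
    back = [[None] * (n + 1) for _ in range(m + 1)]
    cost[0][0] = 0.0
    for i in range(m + 1):
        for j in range(n + 1):
            if cost[i][j] == INF:
                continue
            # 1-1 link: align src[i] with tgt[j].
            if i < m and j < n and cost[i][j] + score(src[i], tgt[j]) < cost[i + 1][j + 1]:
                cost[i + 1][j + 1] = cost[i][j] + score(src[i], tgt[j])
                back[i + 1][j + 1] = (i, j)
            # 1-0 link: skip a source sentence.
            if i < m and cost[i][j] + SKIP < cost[i + 1][j]:
                cost[i + 1][j] = cost[i][j] + SKIP
                back[i + 1][j] = (i, j)
            # 0-1 link: skip a target sentence.
            if j < n and cost[i][j] + SKIP < cost[i][j + 1]:
                cost[i][j + 1] = cost[i][j] + SKIP
                back[i][j + 1] = (i, j)
    # Trace the optimal path backwards, keeping only the 1-1 links.
    links, i, j = [], m, n
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        if i - pi == 1 and j - pj == 1:
            links.append((pi, pj))
        i, j = pi, pj
    return list(reversed(links))
```

Note that a monotonic search like this cannot recover scrambled (non-monotonic) pairings, which is exactly the limitation the paper's semisupervised approach targets.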


semantic similarity

Appears in 3 sentences as: semantic similarity (3)
In Non-Monotonic Sentence Alignment via Semisupervised Learning
  1. When two sentences in S or T are not too short, or their content is not divergent in meaning, their semantic similarity can be estimated in terms of common words.
    Page 4, “Methodology 2.1 The Problem”
  2. Although semantic similarity estimation is a straightforward approach to deriving the two affinity matrices, other approaches are also feasible.
    Page 5, “Methodology 2.1 The Problem”
  3. To demonstrate the validity of the monolingual consistency, the semantic similarity defined above is evaluated as follows.
    Page 5, “Methodology 2.1 The Problem”
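A common-word similarity of the kind snippet 1 describes can be as simple as cosine similarity over bag-of-words counts. This is a hedged sketch of the general idea, not the paper's exact definition:

```python
from collections import Counter
from math import sqrt

def common_word_similarity(sent_a, sent_b):
    """Cosine similarity over bag-of-words counts: a simple estimate of
    semantic relatedness in terms of common words (illustrative only;
    real systems would add tokenization, stopword, and weighting choices)."""
    ca = Counter(sent_a.lower().split())
    cb = Counter(sent_b.lower().split())
    dot = sum(ca[w] * cb[w] for w in set(ca) & set(cb))
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0
```

For example, `common_word_similarity("the cat sat", "the cat slept")` shares two of three words on each side and scores 2/3, while sentences with no common words score 0.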


similarity scores

Appears in 3 sentences as: similarity score (1) similarity scores (2)
In Non-Monotonic Sentence Alignment via Semisupervised Learning
  1. All of these high-affinity pairs have a similarity score higher than 0.72.
    Page 6, “Methodology 2.1 The Problem”
  2. These two sets of similarity scores are then plotted in a scatter plot, as in Figure 4.
    Page 6, “Methodology 2.1 The Problem”
  3. Then, the relation matrix of a bitext is built from similarity scores between the rough translation and the actual translation at sentence level.
    Page 8, “Methodology 2.1 The Problem”
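Snippet 3's relation matrix can be sketched as a grid of similarity scores between rough translations of the source sentences and the actual target sentences. `translate` and `similarity` below are hypothetical placeholders (e.g. a word-by-word dictionary translator and a monolingual word-overlap measure), not the paper's components:

```python
def relation_matrix(src, tgt, translate, similarity):
    """Build an m x n relation matrix whose (i, j) entry scores how well
    a rough translation of source sentence i matches target sentence j.
    High entries suggest candidate sentence pairs, monotonic or not."""
    rough = [translate(s) for s in src]          # rough translations of src
    return [[similarity(r, t) for t in tgt] for r in rough]
```

With a two-sentence bitext and an exact-match similarity, a swapped ordering shows up as off-diagonal mass in the matrix, which is the kind of signal a non-monotonic aligner can exploit.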
