Generating Code-switched Text for Lexical Learning
Labutov, Igor and Lipson, Hod

Article Structure

Abstract

A vast majority of L1 vocabulary acquisition occurs through incidental learning during reading (Nation, 2001; Schmitt et al., 2001).

Introduction

Today, an adult trying to learn a new language is likely to embrace an age-old and widely accepted practice of learning vocabulary through curated word lists and rote memorization.

Related Work

Our proposed approach to the computational generation of code-switched text, for the purpose of L2 pedagogy, is influenced by a number of fields that studied aspects of this phenomenon from distinct perspectives.

Model

3.1 Overview

Experiments

We carried out experiments on the effectiveness of our approach using the Amazon Mechanical Turk platform.

Results

We perform a one-way ANOVA across the four groups listed above, with the resulting F = 11.38 and p = 9.76e-7.
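The one-way ANOVA across the four treatment groups can be sketched in a few lines of pure Python. The per-group recall scores below are invented for illustration only, so the F statistic they yield is unrelated to the paper's F = 11.38:

```python
# Sketch of a one-way ANOVA across four treatment groups
# (OPT, HF, RANDOM, MAN). The recall scores are made-up
# illustrative numbers, not the paper's data.

def one_way_anova(groups):
    """Return the F statistic for a list of sample groups."""
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total observations
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group sum of squares
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    # Within-group sum of squares
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g)
                    for g in groups)
    ms_between = ss_between / (k - 1)    # between-group mean square
    ms_within = ss_within / (n - k)      # within-group mean square
    return ms_between / ms_within

groups = [
    [0.81, 0.76, 0.79, 0.83],  # OPT (hypothetical)
    [0.74, 0.70, 0.77, 0.72],  # HF (hypothetical)
    [0.55, 0.60, 0.58, 0.52],  # RANDOM (hypothetical)
    [0.68, 0.71, 0.65, 0.69],  # MAN (hypothetical)
]
print(one_way_anova(groups))
```

In practice one would use `scipy.stats.f_oneway`, which also returns the p-value; the hand-rolled version above just makes the between/within decomposition explicit.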

Discussion

We observe from our experiments that the optimization-based approach does not in general outperform the HF baseline.

Future Work

In this work we demonstrated a pilot implementation of a model-based, optimization-driven approach to content generation for assisting reading-based L2 language acquisition.

Topics

turkers

Appears in 12 sentences as: turker (1) Turkers (4) turkers (10)
In Generating Code-switched Text for Lexical Learning
  1. For collecting data about which words are likely to be “predicted” given their content, we developed an Amazon Mechanical Turk task that presented turkers with excerpts of a short story (English translation of “The Man who Repented” by
    Page 6, “Model”
  2. Turkers were required to type in their best guess, and the number of semantically similar guesses was counted by an average of 6 other turkers.
    Page 7, “Model”
  3. Turkers that judged the semantic similarity of the guesses of other turkers achieved an average Cohen’s kappa agreement of 0.44, indicating fair to poor agreement.
    Page 7, “Model”
  4. Our experimental procedure was as follows: 162 turkers were partitioned into four groups, each corresponding to a treatment condition: OPT (N=34), HF (N=41), RANDOM (N=43), MAN (N=44).
    Page 7, “Experiments”
  5. Font-size correlates with the score given by judge turkers in evaluating guesses of other turkers that were presented with the same text, but the word replaced with a blank.
    Page 7, “Experiments”
  6. Turkers were solicited to participate in a study that involved “reading a short story with a twist” (title of HIT).
    Page 8, “Experiments”
  7. Time was not controlled for this study, but on average turkers took 27 minutes to complete the reading.
    Page 8, “Experiments”
  8. Upon completing the reading portion of the task, turkers were presented with novel sentences that featured the words observed during reading, where only one of the sentences used the word in a semantically correct way.
    Page 8, “Experiments”
  9. Turkers were asked to select the sentence that “made the most sense”.
    Page 8, “Experiments”
  10. A “recall” metric was computed for each turker, defined as the ratio of correctly selected sentences to the total number of sentences presented.
    Page 8, “Experiments”
  11. The “grand-average recall” across all turkers was then computed and reported here.
    Page 8, “Experiments”
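The recall metric defined in items 10 and 11 above is straightforward to sketch: per-turker recall is the fraction of correctly selected sentences, and the grand-average recall is the mean of those per-turker ratios. The selection data below is invented for illustration:

```python
# Sketch of the per-turker recall and grand-average recall
# described above. The judgments are hypothetical.

def turker_recall(selections):
    """selections: list of booleans, one per presented sentence;
    True means the turker picked the semantically correct sentence."""
    return sum(selections) / len(selections)

def grand_average_recall(all_selections):
    """Mean of per-turker recall scores."""
    recalls = [turker_recall(s) for s in all_selections]
    return sum(recalls) / len(recalls)

# Three hypothetical turkers, four sentence-completion items each
data = [
    [True, True, False, True],   # recall 0.75
    [True, False, False, True],  # recall 0.5
    [True, True, True, True],    # recall 1.0
]
print(grand_average_recall(data))  # 0.75
```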

See all papers in Proc. ACL 2014 that mention turkers.

See all papers in Proc. ACL that mention turkers.


n-gram

Appears in 4 sentences as: N-gram (2) n-gram (3)
In Generating Code-switched Text for Lexical Learning
  1. trailing word in the n-gram).
    Page 3, “Model”
  2. Intuitively, an update to the parameter θ_{w_i} occurs after the learner observes word w_i in a context (this may be an n-gram, an entire sentence or paragraph containing w_i, but we will restrict our attention to fixed-length n-grams).
    Page 5, “Model”
  3. For this task, we focus only on the following context features for predicting the “predictability” of words: n-gram probability, vector-space similarity score, coreferring mentions.
    Page 6, “Model”
  4. N-gram probability and vector-space similarity score are both computed within a fixed-size window of the word (trigrams using the Microsoft N-gram service).
    Page 6, “Model”
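The fixed-window trigram feature from item 4 above can be sketched with a toy maximum-likelihood trigram model standing in for the Microsoft N-gram service the paper used (the corpus and smoothing choice here are illustrative assumptions, not the paper's setup):

```python
import math
from collections import Counter

# Toy add-one-smoothed MLE trigram model as a stand-in for a real
# n-gram service. The corpus is an invented fragment for illustration.
corpus = "the man who repented read the book the man read".split()

trigrams = Counter(zip(corpus, corpus[1:], corpus[2:]))
bigrams = Counter(zip(corpus, corpus[1:]))
vocab = len(set(corpus))

def trigram_logprob(w1, w2, w3):
    """log P(w3 | w1, w2) under the add-one-smoothed trigram model."""
    return math.log((trigrams[(w1, w2, w3)] + 1) /
                    (bigrams[(w1, w2)] + vocab))

# Trigram feature for the word "who" within its fixed-size window
print(trigram_logprob("the", "man", "who"))  # log(2/8) ≈ -1.386
```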


language model

Appears in 3 sentences as: language model (3)
In Generating Code-switched Text for Lexical Learning
  1. Building on their work, (Adel et al., 2012) employ additional features and a recurrent network language model for modeling code-switching in conversational speech.
    Page 2, “Related Work”
  2. Broadly, as the learner progresses from one sentence to the next, exposing herself to more novel words, the updated parameters of the language model in turn guide the selection of new “switch-points” for replacing source words with the target foreign words.
    Page 5, “Model”
  3. Generally, this value may come directly from the surprisal quantity given by a language model, or may incorporate additional features that are found informative in predicting the constraint on the word.
    Page 5, “Model”
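The surprisal quantity mentioned in item 3 above is just the negative log probability a language model assigns to a word in its context. A minimal sketch, with the probability treated as coming from any language model:

```python
import math

# Surprisal in bits: -log2 P(word | context). The probabilities
# below are illustrative placeholders for language-model outputs.

def surprisal(prob):
    """Surprisal (in bits) of a word the model assigns probability prob."""
    return -math.log2(prob)

# A predictable word carries little surprisal ...
print(surprisal(0.5))    # 1.0 bit
# ... while an unexpected word carries much more.
print(surprisal(0.01))   # ~6.64 bits
```

A highly constrained (predictable) word is thus a natural candidate "switch-point", since the reader can recover its meaning from context even when it is replaced with a foreign word.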


Mechanical Turk

Appears in 3 sentences as: Mechanical Turk (3)
In Generating Code-switched Text for Lexical Learning
  1. Using an artificial language vocabulary, we evaluate a set of algorithms for generating code-switched text automatically by presenting it to Mechanical Turk subjects and measuring recall in a sentence completion task.
    Page 1, “Abstract”
  2. For collecting data about which words are likely to be “predicted” given their content, we developed an Amazon Mechanical Turk task that presented turkers with excerpts of a short story (English translation of “The Man who Repented” by
    Page 6, “Model”
  3. We carried out experiments on the effectiveness of our approach using the Amazon Mechanical Turk platform.
    Page 7, “Experiments”


semantically similar

Appears in 3 sentences as: semantic similarity (1) semantically similar (2)
In Generating Code-switched Text for Lexical Learning
  1. Turkers were required to type in their best guess, and the number of semantically similar guesses was counted by an average of 6 other turkers.
    Page 7, “Model”
  2. A ratio of the median of semantically similar guesses to the total number of guesses was then taken as the score representing “predictability” of the word being guessed in the given context.
    Page 7, “Model”
  3. Turkers that judged the semantic similarity of the guesses of other turkers achieved an average Cohen’s kappa agreement of 0.44, indicating fair to poor agreement.
    Page 7, “Model”
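The "predictability" score from item 2 above, the ratio of the median count of semantically similar guesses to the total number of guesses, can be sketched directly. The judge counts below are invented for illustration:

```python
import statistics

# Sketch of the "predictability" score: median similar-guess count
# over the total number of guesses collected for a blanked-out word.
# The judgments are hypothetical.

def predictability(similar_counts, total_guesses):
    """similar_counts: per-judge counts of semantically similar guesses;
    total_guesses: number of guesses collected for the word."""
    return statistics.median(similar_counts) / total_guesses

# Six hypothetical judges, each counting how many of 10 guesses
# were semantically similar to the blanked-out word
counts = [7, 6, 8, 5, 7, 6]
print(predictability(counts, 10))  # 0.65
```

Using the median rather than the mean limits the influence of any single judge, which matters given the fair-to-poor inter-judge agreement (kappa = 0.44) reported above.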
