String Extension Learning
Heinz, Jeffrey

Article Structure

Abstract

This paper provides a unified, leaming-theoretic analysis of several learnable classes of languages discussed previously in the literature.

Introduction

The problem of generalizing from examples to patterns is an important one in linguistics and computer science.

Preliminaries

This section establishes notation and recalls basic definitions for formal languages, the paradigm of identification in the limit from positive data (Gold, 1967).

Grammars and Languages

Consider some set A.

String Extension Learning

Learning string extension classes is simple.

Subregular examples

This section shows how classes which make up the subregular hierarchies (McNaughton and Papert, 1971) are string extension language classes.

Other examples

This section provides examples of infinite and nonregular language classes that are string extension leamable.

Conclusion and open questions

One contribution of this paper is a unified way of thinking about many formal language classes, all of which have been shown to be identifiable in the limit from positive data by a string extension learner.

Topics

natural language

Appears in 6 sentences as: natural language (5) natural languages (1)
In String Extension Learning
  1. Potential applications include learnable models for aspects of natural language and cognition.
    Page 1, “Abstract”
  2. One notable case is the Strictly Piecewise (SP) languages, which was originally motivated for two reasons: the leamability properties discussed here and its ability to describe long-distance dependencies in natural language phonology (Heinz, 2007; Heinz, to appear).
    Page 1, “Introduction”
  3. Another example is the Strictly Local (SL) languages which are the categorical, symbolic version of n-gram models, which are widely used in natural language processing (Jurafsky and Martin, 2008).
    Page 1, “Introduction”
  4. Heinz (2007,2009a) shows that consonantal harmony patterns in natural language are describable by such SP2 languages and hypothesizes that humans learn them in the way suggested by $3122.
    Page 5, “Subregular examples”
  5. For theoretical linguistics, it appears that the string extension function f = (LR13,P2), which defines a class of languages which obey restrictions on both contiguous subsequences of length 3 and on discontiguous subsequences of length 2, provides a good first approximation to the segmental phonotactic patterns in natural languages (Heinz, 2007).
    Page 8, “Conclusion and open questions”
  6. Finally, since the stochastic counterpart of k:-SL class is the n-gram model, it is plausible that probabilistic string extension language classes can form the basis of new natural language processing techniques.
    Page 8, “Conclusion and open questions”

See all papers in Proc. ACL 2010 that mention natural language.

See all papers in Proc. ACL that mention natural language.

Back to top.