The S-Space Package: An Open Source Package for Word Space Models
Jurgens, David and Stevens, Keith

Article Structure

Abstract

We present the S-Space Package, an open source framework for developing and evaluating word space algorithms.

Introduction

Word similarity is an essential part of understanding natural language.

Word Space Models

Word space models are based on the contextual distribution in which a word occurs.

The S-Space Framework

The S-Space framework is designed to be extensible, simple to use, and scalable.

Benchmarks

Word space benchmarks assess the semantic content of the space by analyzing the geometric properties of the space itself.

Algorithm Analysis

The content of a word space is fundamentally dependent upon the corpus used to construct it.

Future Work and Conclusion

We have described a framework for developing and evaluating word space algorithms.

Topics

vector space

Appears in 9 sentences as: Vector Space (1) vector space (8)
In The S-Space Package: An Open Source Package for Word Space Models
  1. Figure 1 illustrates the shared algorithmic structure of all the approaches, which is divided into four components: corpus processing, context selection, feature extraction and global vector space operations.
    Page 2, “Word Space Models”
  2. Feature extraction determines the dimensions of the vector space by selecting which tokens in the context will count as features.
    Page 2, “Word Space Models”
  3. Global vector space operations are applied to the entire space once the initial word features have been computed.
    Page 2, “Word Space Models”
  4. Document-Based Models: LSA (Landauer and Dumais, 1997), ESA (Gabrilovich and Markovitch, 2007), Vector Space Model (Salton et al., 1975)
    Page 2, “Word Space Models”
  5. Document-based models divide a corpus into discrete documents and construct the vector space from word frequencies in the documents.
    Page 2, “The S-Space Framework”
  6. Co-occurrence models build the vector space using the distribution of co-occurring words in a context, which is typically defined as a region around a word or paths rooted in a parse tree.
    Page 2, “The S-Space Framework”
  7. WSI models also use co-occurrence, but additionally attempt to discover distinct word senses while building the vector space.
    Page 3, “The S-Space Framework”
  8. The choice of similarity function for the vector space is the least standardized across approaches.
    Page 3, “The S-Space Framework”
  9. w2 is the kth most-similar word to w1 in the vector space.
    Page 4, “Benchmarks”
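
The excerpts above describe the shared four-stage structure of word space models (corpus processing, context selection, feature extraction, global operations) and the notion of the k most-similar words in the resulting space. The following is a minimal conceptual sketch in Python, not the S-Space Java API; the function names and toy corpus are assumptions made for illustration. It builds a small document-based space from word-by-document counts and retrieves nearest neighbors by cosine similarity.

"""Minimal document-based word space, sketched in Python (not the S-Space API)."""
from collections import Counter
import numpy as np

def tokenize(text):
    # Corpus processing: lowercase and split on whitespace.
    return text.lower().split()

def build_word_by_document(documents):
    # Feature extraction: each document is one dimension; cell (w, d) counts
    # how often word w occurs in document d.
    vocab = sorted({w for doc in documents for w in tokenize(doc)})
    index = {w: i for i, w in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(documents)))
    for d, doc in enumerate(documents):
        for w, count in Counter(tokenize(doc)).items():
            matrix[index[w], d] = count
    return matrix, index

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def most_similar(word, matrix, index, k=3):
    # Return the k most-similar words to `word` in the vector space.
    target = matrix[index[word]]
    scores = {w: cosine(target, matrix[i]) for w, i in index.items() if w != word}
    return sorted(scores, key=scores.get, reverse=True)[:k]

docs = ["the cat sat on the mat", "the dog sat on the log", "stocks fell sharply today"]
m, idx = build_word_by_document(docs)
print(most_similar("cat", m, idx, k=2))

No global operation is applied here; a sketch of SVD and random projection as global operations appears under the SVD topic below.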

co-occurrence

Appears in 7 sentences as: Co-occurrence (2) co-occurrence (5)
In The S-Space Package: An Open Source Package for Word Space Models
  1. Later models have expanded the notion of co-occurrence but retain the premise that distributional similarity can be used to extract meaningful relationships between words.
    Page 2, “Word Space Models”
  2. Common approaches use a lexical distance, syntactic relation, or document co-occurrence to define the context.
    Page 2, “Word Space Models”
  3. Co-occurrence Models: HAL (Burgess and Lund, 1997), COALS (Rohde et al., 2009)
    Page 2, “Word Space Models”
  4. We divide the algorithms into four categories based on their structural similarity: document-based, co-occurrence, approximation, and Word Sense Induction (WSI) models.
    Page 2, “The S-Space Framework”
  5. Co-occurrence models build the vector space using the distribution of co-occurring words in a context, which is typically defined as a region around a word or paths rooted in a parse tree.
    Page 2, “The S-Space Framework”
  6. co-occurrence data rather than model it explicitly in order to achieve better scalability for larger data sets.
    Page 3, “The S-Space Framework”
  7. WSI models also use co-occurrence, but additionally attempt to discover distinct word senses while building the vector space.
    Page 3, “The S-Space Framework”
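
As a rough illustration of the lexical-distance contexts mentioned above, the sketch below counts co-occurrences within a fixed window around each word and assembles them into a word-by-word matrix. This is a conceptual Python sketch, not code from the S-Space Package; the function names and window size are assumptions for illustration.

"""Sliding-window co-occurrence counting (conceptual sketch)."""
from collections import defaultdict
import numpy as np

def cooccurrence_counts(tokens, window=2):
    # The context of a word is every token within `window` positions of it.
    counts = defaultdict(lambda: defaultdict(int))
    for i, w in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[w][tokens[j]] += 1
    return counts

def to_matrix(counts):
    # Rows are focus words, columns are co-occurring words (the features).
    vocab = sorted(counts)
    index = {w: i for i, w in enumerate(vocab)}
    m = np.zeros((len(vocab), len(vocab)))
    for w, neighbors in counts.items():
        for c, n in neighbors.items():
            m[index[w], index[c]] = n
    return m, index

tokens = "the quick brown fox jumps over the lazy dog".split()
matrix, index = to_matrix(cooccurrence_counts(tokens, window=2))
print(matrix[index["fox"]])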

semantic relatedness

Appears in 3 sentences as: semantic relatedness (1) semantic relation (1) semantic relations (1)
In The S-Space Package: An Open Source Package for Word Space Models
  1. Word association tests measure the semantic relatedness of two words by comparing word space similarity with human judgements.
    Page 4, “Benchmarks”
  2. This test is notably more challenging for word space models because human ratings are not tied to a specific semantic relation.
    Page 4, “Benchmarks”
  3. The variation in scoring illustrates that different algorithms are more effective at capturing certain semantic relations .
    Page 5, “Algorithm Analysis”
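
A word association test of the kind described above can be scored by correlating the model's pairwise similarities with human ratings. The sketch below is a hedged illustration in Python, independent of the S-Space evaluation code; the toy vectors and ratings are invented for the example, and Spearman rank correlation is used as one common choice of metric.

"""Scoring a word-association test (conceptual sketch with made-up data)."""
import numpy as np
from scipy.stats import spearmanr

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def association_score(word_vectors, human_judgements):
    # human_judgements: list of (word1, word2, rating) triples.
    model, gold = [], []
    for w1, w2, rating in human_judgements:
        if w1 in word_vectors and w2 in word_vectors:
            model.append(cosine(word_vectors[w1], word_vectors[w2]))
            gold.append(rating)
    correlation, _ = spearmanr(model, gold)
    return correlation

# Toy vectors and ratings purely for illustration.
vectors = {"cat": np.array([1.0, 0.9, 0.1]), "dog": np.array([0.9, 1.0, 0.2]),
           "car": np.array([0.1, 0.2, 1.0])}
pairs = [("cat", "dog", 8.5), ("cat", "car", 2.0), ("dog", "car", 2.5)]
print(association_score(vectors, pairs))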

SVD

Appears in 3 sentences as: SVD (4)
In The S-Space Package: An Open Source Package for Word Space Models
  1. The S-Space Package supports two common techniques: the Singular Value Decomposition (SVD) and randomized projections.
    Page 3, “The S-Space Framework”
  2. All matrix data structures are designed to seamlessly integrate with six SVD implementations for maximum portability, including SVDLIBJ, a Java port of SVDLIBC, a scalable sparse SVD library.
    Page 3, “The S-Space Framework”
  3. Moreover, algorithms which use operations such as the SVD have a limit to the corpora sizes they
    Page 4, “Algorithm Analysis”
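
To make the global operations concrete, the sketch below applies a truncated SVD and, as a cheaper alternative, a Gaussian random projection to a word-by-feature matrix. This is a conceptual Python sketch, not the SVDLIBJ integration in the S-Space Package; the matrix and dimensionalities are arbitrary.

"""Dimensionality reduction as a global vector space operation (conceptual sketch)."""
import numpy as np

def truncated_svd(matrix, k):
    # Reduced word representation: U_k * S_k, one k-dimensional row per word.
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k]

def random_projection(matrix, k, seed=0):
    # Project the features onto k random directions; distances are only
    # approximately preserved, which is the price paid for scalability.
    rng = np.random.default_rng(seed)
    r = rng.normal(size=(matrix.shape[1], k)) / np.sqrt(k)
    return matrix @ r

words_by_features = np.random.rand(100, 1000)
print(truncated_svd(words_by_features, 10).shape)      # (100, 10)
print(random_projection(words_by_features, 10).shape)  # (100, 10)

The random projection never materializes a dense decomposition, which is one reason approximation models scale to corpus sizes where an exact SVD becomes impractical.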

Word Sense

Appears in 3 sentences as: Word Sense (2) word senses (1)
In The S-Space Package: An Open Source Package for Word Space Models
  1. Word Sense Induction Models: Purandare and Pedersen (Purandare and Pedersen, 2004), HERMIT (Jurgens and Stevens, 2010)
    Page 2, “Word Space Models”
  2. We divide the algorithms into four categories based on their structural similarity: document-based, co-occurrence, approximation, and Word Sense Induction (WSI) models.
    Page 2, “The S-Space Framework”
  3. WSI models also use co-occurrence, but additionally attempt to discover distinct word senses while building the vector space.
    Page 3, “The S-Space Framework”
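
One common way to realize word sense induction, in the spirit of the excerpts above, is to represent each occurrence of a target word by a vector built from its context and then cluster those occurrence vectors. The following Python sketch uses a tiny k-means for that step; it illustrates the general idea only, not the HERMIT or Purandare-and-Pedersen algorithms, and all names and toy data are assumptions.

"""Word sense induction by clustering context vectors (conceptual sketch)."""
import numpy as np

def kmeans(points, k, iterations=20, seed=0):
    # A tiny k-means; real WSI systems use more robust clustering.
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iterations):
        labels = np.argmin(((points[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = points[labels == c].mean(axis=0)
    return labels

def induce_senses(occurrences, word_vectors, k=2):
    # occurrences: list of token lists, each one context of the target word.
    dim = len(next(iter(word_vectors.values())))
    contexts = []
    for occ in occurrences:
        v = np.zeros(dim)
        for w in occ:
            if w in word_vectors:
                v += word_vectors[w]
        contexts.append(v)
    return kmeans(np.array(contexts), k)

# Toy contexts for an ambiguous word such as "bank".
vecs = {"money": np.array([1.0, 0.0]), "loan": np.array([0.9, 0.1]),
        "river": np.array([0.0, 1.0]), "water": np.array([0.1, 0.9])}
uses = [["money", "loan", "loan"], ["money", "money", "loan"],
        ["river", "water"], ["water", "water", "river"]]
print(induce_senses(uses, vecs, k=2))  # one cluster label per occurrence, e.g. [0 0 1 1]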
