Applying Grammar Induction to Text Mining
Salway, Andrew and Touileb, Samia

Article Structure

Abstract

We report the first steps of a novel investigation into how a grammar induction algorithm can be modified and used to identify salient information structures in a corpus.

Introduction

There is an obvious need for text mining techniques to deal with large volumes of very diverse material, especially since the advent of social media and user-generated content which includes dynamic discussions of wide-ranging and controversial topics.

Background

Harris (1954; 1988) demonstrated how linguistic units and structures can be identified (manually) through a distributional analysis of partially aligned sentential contexts.

Approach

For text mining purposes we do not see the need to induce a complete grammar for the corpus that we are mining.

Implementation

Here we report our first attempt to apply grammar induction to text mining.

Closing Remarks

At this stage in the research any conclusions must be tentative.

Topics

grammar induction

Appears in 5 sentences as: grammar induction (5)
In Applying Grammar Induction to Text Mining
  1. We report the first steps of a novel investigation into how a grammar induction algorithm can be modified and used to identify salient information structures in a corpus.
    Page 1, “Abstract”
  2. Our approach is to induce information structures from an unannotated corpus by modifying and applying the ADIOS grammar induction algorithm (Solan et al., 2005): the modifications serve to focus the algorithm on what is typically written about key-terms (Section 3).
    Page 1, “Introduction”
  3. Thus, we propose to use a grammar induction algorithm to identify the most salient information structures in a corpus and take these as representations of important semantic content.
    Page 2, “Approach”
  4. Here we report our first attempt to apply grammar induction to text mining.
    Page 3, “Implementation”
  5. the use of grammar induction to elucidate semantic content for text mining purposes shows promise.
    Page 5, “Closing Remarks”

See all papers in Proc. ACL 2014 that mention grammar induction.

social media

Appears in 3 sentences as: social media (3)
In Applying Grammar Induction to Text Mining
  1. There is an obvious need for text mining techniques to deal with large volumes of very diverse material, especially since the advent of social media and user-generated content which includes dynamic discussions of wide-ranging and controversial topics.
    Page 1, “Introduction”
  2. We see one particular area of application in elucidating the semantic content of social media debates about controversial topics, like climate change, both for casual users, and for social scientists studying online discourses.
    Page 1, “Introduction”
  3. To address the large scale and complexity of language use in social media, we modify the way in which text is presented to ADIOS by focusing separately on text around key terms of interest, rather than processing all sentences en masse.
    Page 2, “Approach”
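
The key-term focus described in sentence 3 can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `snippets_by_key_term`, the `window` parameter, the simple regex tokenizer, and the restriction to single-token key terms are all assumptions made for illustration. The idea is only to group the text around each key term separately, so that each group could be handed to an induction algorithm on its own rather than processing all sentences en masse.

```python
import re
from collections import defaultdict


def snippets_by_key_term(sentences, key_terms, window=5):
    """Group token windows around each key term (hypothetical helper).

    Assumes single-token key terms and a naive regex tokenizer;
    `window` is the number of tokens kept on each side of the term.
    """
    terms = {term.lower() for term in key_terms}
    grouped = defaultdict(list)
    for sentence in sentences:
        # Lowercase and split into word tokens and punctuation tokens.
        tokens = re.findall(r"\w+|[^\w\s]", sentence.lower())
        for i, token in enumerate(tokens):
            if token in terms:
                start = max(0, i - window)
                grouped[token].append(tokens[start:i + window + 1])
    return dict(grouped)


sentences = [
    "Climate change is a hoax.",
    "We must act on climate change now!",
]
snippets = snippets_by_key_term(sentences, ["climate"], window=2)
```

Each value in `snippets` is a list of token windows for one key term; how such windows would actually be presented to ADIOS is not specified in the excerpt above and is left open here.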

text classification

Appears in 3 sentences as: text classification (3)
In Applying Grammar Induction to Text Mining
  1. For broad topics it is desirable to perform finer-grained text classification and retrieval.
    Page 4, “Implementation”
  2. The alternation in V-groups contained by H-groups may reflect different beliefs and opinions which could be used for text classification and opinion mining.
    Page 4, “Implementation”
  3. The H-groups shown in Table 1 provide richer semantic descriptions of the domain than keywords do, and we noted potential applications for high-level summarization of a whole corpus, the creation of information extraction templates and finer-grained text classification and retrieval.
    Page 5, “Closing Remarks”
