Addressing Ambiguity in Unsupervised Part-of-Speech Induction with Substitute Vectors
Cirik, Volkan

Article Structure

Abstract

We study substitute vectors to solve the part-of-speech ambiguity problem in an unsupervised setting.

Introduction

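Learning syntactic categories of words (i.e. part-of-speech or POS tagging) is an important preprocessing step for many natural language processing applications because grammatical rules are not functions of individual words, instead, they are functions of word categories.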

Algorithm

We claim that if we categorize contexts a word type occurs in, we can address ambiguity by separating its occurrences before POS induction.
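
The idea of separating a word type's occurrences by context can be illustrated with a small sketch. It is not the paper's implementation: the toy vectors stand in for substitute vectors, plain k-means with cosine similarity is an assumed clustering choice, and the function names and the "word_k" naming of the resulting groups are hypothetical.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def split_occurrences(vectors, k=2, iters=20, seed=0):
    """Group the context vectors of ONE word type into k clusters.

    Assumption: plain k-means with cosine similarity; the source does not
    specify this exact procedure.
    """
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assign each occurrence to its most similar centroid.
        labels = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                  for v in vectors]
        # Recompute each centroid as the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def pseudo_types(word, labels):
    """Rename each occurrence after its context group (hypothetical naming)."""
    return [f"{word}_{lab}" for lab in labels]


# Toy stand-ins for substitute vectors of four occurrences of "offers":
# the first two contexts favour noun-like substitutes, the last two verb-like.
toy_vectors = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.1, 0.9, 0.0],
    [0.0, 0.8, 0.2],
]
labels = split_occurrences(toy_vectors, k=2)
print(pseudo_types("offers", labels))
```

Each group can then be treated as its own word type by a type-based induction method, which is how the separation is meant to precede POS induction.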

Experiments

In this section, the setup of each experiment will be presented.

Conclusion

Table 2 shows that the baseline experiment is slightly better than our experiments.

Topics

POS tags

Appears in 15 sentences as: POS tag (2) POS tagger (1) POS tagging (2) POS tags (10)
In Addressing Ambiguity in Unsupervised Part-of-Speech Induction with Substitute Vectors
  1. part-of-speech or POS tagging) is an important preprocessing step for many natural language processing applications because grammatical rules are not functions of individual words, instead, they are functions of word categories.
    Page 1, “Introduction”
  2. Unlike supervised POS tagging systems, POS induction systems make use of unsupervised methods.
    Page 1, “Introduction”
  3. Type based methods suffer from POS ambiguity because one POS tag is assigned to each word type.
    Page 1, “Introduction”
  4. However, occurrences of many words may have different POS tags.
    Page 1, “Introduction”
  5. They illustrate a situation where two occurrences of the word “offers” have different POS tags.
    Page 1, “Introduction”
  6. In this study, we try to extend the state-of-the-art unsupervised POS tagger (Yatbaz et al., 2012) by solving the ambiguity problem it suffers from because of its type-based approach.
    Page 1, “Introduction”
  7. In other words, if we categorize the contexts of a word type we can determine different POS tags of the word.
    Page 1, “Introduction”
  8. We induce the number of POS tags of a word type at this step.
    Page 2, “Algorithm”
  9. Furthermore, they will have the same POS tags.
    Page 3, “Algorithm”
  10. As a result, this method inaccurately induces POS tags for the occurrences of word types with high gold tag perplexity.
    Page 4, “Experiments”
  11. In other words, we assume that the number of different POS tags of each word type is equal to 2.
    Page 4, “Experiments”
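
The "gold tag perplexity" in item 10 is not defined in these excerpts. A common reading, assumed here, is 2 raised to the entropy of a word type's gold tag distribution. The sketch below computes that quantity; the function name and the tag counts are hypothetical and are not taken from the corpus.

```python
import math
from collections import Counter


def gold_tag_perplexity(tags):
    """Return 2 ** H, where H is the entropy (in bits) of the tag distribution.

    `tags` holds the gold tags of every occurrence of a single word type.
    Assumption: this is the standard perplexity definition; the excerpts
    on this page do not spell it out.
    """
    counts = Counter(tags)
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return 2 ** entropy


# Hypothetical counts, for illustration only.
unambiguous = ["NN"] * 10              # a single tag
balanced = ["NNS"] * 5 + ["VBZ"] * 5   # two equally frequent tags
print(gold_tag_perplexity(unambiguous))
print(gold_tag_perplexity(balanced))
```

Under this assumption, a word type that always carries the same tag scores 1, and a value close to 2 (such as the 1.966 quoted elsewhere on this page for “offers”) would be consistent with two tags that occur almost equally often.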

See all papers in Proc. ACL 2013 that mention POS tags.

See all papers in Proc. ACL that mention POS tags.

part-of-speech

Appears in 9 sentences as: Part-of-speech (1) part-of-speech (9)
In Addressing Ambiguity in Unsupervised Part-of-Speech Induction with Substitute Vectors
  1. We study substitute vectors to solve the part-of-speech ambiguity problem in an unsupervised setting.
    Page 1, “Abstract”
  2. Part-of-speech tagging is a crucial preliminary process in many natural language processing applications.
    Page 1, “Abstract”
  3. Because many words in natural languages have more than one part-of-speech tag, resolving part-of-speech ambiguity is an important task.
    Page 1, “Abstract”
  4. We claim that part-of-speech ambiguity can be solved using substitute vectors.
    Page 1, “Abstract”
  5. This study is built on previous work which has proven that word substitutes are very fruitful for part-of-speech induction.
    Page 1, “Abstract”
  6. part-of-speech or POS tagging) is an important preprocessing step for many natural language processing applications because grammatical rules are not functions of individual words, instead, they are functions of word categories.
    Page 1, “Introduction”
  7. In addition, we suggest that the occurrences with different part-of-speech categories of a word should be seen in different contexts.
    Page 1, “Introduction”
  8. Previous work (Yatbaz et al., 2012) demonstrates that clustering substitute vectors of all word types alone has limited success in predicting the part-of-speech tag of a word.
    Page 2, “Algorithm”
  9. The output of clustering induces part-of-speech categories of word tokens.
    Page 4, “Algorithm”

Treebank

Appears in 4 sentences as: Treebank (3) treebank (1)
In Addressing Ambiguity in Unsupervised Part-of-Speech Induction with Substitute Vectors
  1. For instance, the gold tag perplexity of the word “offers” in the Penn Treebank Wall Street Journal corpus we worked on equals 1.966.
    Page 2, “Introduction”
  2. Lastly, in step (i) of Figure 1, we run the k-means clustering method on the S-CODE sphere and split word-substitute word pairs into 45 clusters because the treebank we worked on uses 45 part-of-speech tags.
    Page 4, “Algorithm”
  3. The experiments are conducted on the Penn Treebank Wall Street Journal corpus.
    Page 4, “Experiments”
  4. Because we are trying to improve on (Yatbaz et al., 2012), we select the experiment on the Penn Treebank Wall Street Journal corpus in that work as our baseline and replicate it.
    Page 4, “Experiments”

context information

Appears in 3 sentences as: context information (3)
In Addressing Ambiguity in Unsupervised Part-of-Speech Induction with Substitute Vectors
  1. of a target word, we separate occurrences of the word into different groups depending on the context information represented by substitute vectors.
    Page 2, “Introduction”
  2. To make use of both word identity and context information of a given type, we use S-CODE co-occurrence modeling (Maron et al., 2010) as (Yatbaz et al., 2012) does.
    Page 2, “Algorithm”
  3. In that experiment, POS induction is done by using word identities and context information represented by substitute words.
    Page 4, “Experiments”

natural language

Appears in 3 sentences as: natural language (2) natural languages (1)
In Addressing Ambiguity in Unsupervised Part-of-Speech Induction with Substitute Vectors
  1. Part-of-speech tagging is a crucial preliminary process in many natural language processing applications.
    Page 1, “Abstract”
  2. Because many words in natural languages have more than one part-of-speech tag, resolving part-of-speech ambiguity is an important task.
    Page 1, “Abstract”
  3. part-of-speech or POS tagging) is an important preprocessing step for many natural language processing applications because grammatical rules are not functions of individual words, instead, they are functions of word categories.
    Page 1, “Introduction”

part-of-speech tag

Appears in 3 sentences as: part-of-speech tag (2) Part-of-speech tagging (1)
In Addressing Ambiguity in Unsupervised Part-of-Speech Induction with Substitute Vectors
  1. Part-of-speech tagging is a crucial preliminary process in many natural language processing applications.
    Page 1, “Abstract”
  2. Because many words in natural languages have more than one part-of-speech tag, resolving part-of-speech ambiguity is an important task.
    Page 1, “Abstract”
  3. Previous work (Yatbaz et al., 2012) demonstrates that clustering substitute vectors of all word types alone has limited success in predicting the part-of-speech tag of a word.
    Page 2, “Algorithm”

Penn Treebank

Appears in 3 sentences as: Penn Treebank (3)
In Addressing Ambiguity in Unsupervised Part-of-Speech Induction with Substitute Vectors
  1. For instance, the gold tag perplexity of the word “offers” in the Penn Treebank Wall Street Journal corpus we worked on equals 1.966.
    Page 2, “Introduction”
  2. The experiments are conducted on the Penn Treebank Wall Street Journal corpus.
    Page 4, “Experiments”
  3. Because we are trying to improve on (Yatbaz et al., 2012), we select the experiment on the Penn Treebank Wall Street Journal corpus in that work as our baseline and replicate it.
    Page 4, “Experiments”
