Unsupervised Lexicon-Based Resolution of Unknown Words for Full Morphological Analysis
Meni Adler, Yoav Goldberg, David Gabay, and Michael Elhadad

Article Structure

Abstract

Morphological disambiguation proceeds in two stages: (1) an analyzer provides all possible analyses for a given token, and (2) a stochastic disambiguation module picks the most likely analysis in context.
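
A minimal sketch of this two-stage architecture, assuming hypothetical names (Analyzer, disambiguate, score); this is an illustration of the pipeline shape, not the paper's code:

```python
# Sketch of the two-stage pipeline; all names here are illustrative.
from typing import Dict, List

Analysis = str  # e.g. a full morphological tag such as "noun.sg.fem.def"

class Analyzer:
    """Stage 1: the lexicon maps each token to all its possible analyses."""
    def __init__(self, lexicon: Dict[str, List[Analysis]]):
        self.lexicon = lexicon

    def analyze(self, token: str) -> List[Analysis]:
        # Unknown tokens get an empty list; resolving them is the
        # problem this paper addresses.
        return self.lexicon.get(token, [])

def disambiguate(tokens: List[str], analyzer: Analyzer, score) -> List[Analysis]:
    """Stage 2: pick the most likely analysis for each token in context.
    score(tokens, i, analysis) stands in for a trained stochastic model."""
    picked = []
    for i, tok in enumerate(tokens):
        candidates = analyzer.analyze(tok) or ["unknown"]  # unknowns need handling
        picked.append(max(candidates, key=lambda a: score(tokens, i, a)))
    return picked
```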

Introduction

The term unknowns denotes tokens in a text that cannot be resolved in a given lexicon.

Previous Work

Most of the work that dealt with unknowns in the last decade focused on unknown tokens (OOV).

Method

Our objective is: given an unknown word, provide a distribution of possible tags that can serve as the analysis of the unknown word.

Evaluation

For testing, we manually tagged the text used in the Hebrew Treebank (about 90K tokens) according to our tagging guidelines.

Conclusion

We have addressed the task of computing the distribution p(t|w) for unknown words for full morphological disambiguation in Hebrew.

Topics

morphological analysis

Appears in 11 sentences as: morphological analyses (2) morphological analysis (8) morphological analyzer (3)
In Unsupervised Lexicon-Based Resolution of Unknown Words for Full Morphological Analysis
  1. For the task of full morphological analysis, the lexicon must provide all possible morphological analyses for any given token.
    Page 1, “Introduction”
  2. In this paper, we investigate the characteristics of Hebrew unknowns for full morphological analysis, and propose a new method for handling such unavoidable lack of information.
    Page 1, “Introduction”
  3. In our evaluation, these learned distributions include the correct analysis for unknown words in 85% of the cases, contributing an error reduction of over 30% over a competitive baseline for the overall task of full morphological analysis in Hebrew.
    Page 1, “Introduction”
  4. The task of a morphological analyzer is to produce all possible analyses for a given token.
    Page 1, “Introduction”
  5. In this work, we refer to the morphological analyzer of MILA — the Knowledge Center for Processing Hebrew (hereafter KC analyzer).
    Page 2, “Introduction”
  6. As a result of this observation, our strategy is to focus on full morphological analysis for unknown tokens and apply a proper name classifier for unknown analyses and unknown tokens.
    Page 2, “Introduction”
  7. In this paper, we investigate various methods for achieving full morphological analysis distribution for unknown tokens.
    Page 2, “Introduction”
  8. Habash and Rambow (2006) used the root+pattern+features representation of Arabic tokens for morphological analysis and generation of Arabic dialects, which have no lexicon.
    Page 4, “Previous Work”
  9. They report high recall (95%–98%) but low precision (37%–63%) for token types and token instances, against gold-standard morphological analysis.
    Page 4, “Previous Work”
  10. Unlike Nakagawa, our model does not use any segmented text, and, on the other hand, it aims to select a full morphological analysis for each token…
    Page 4, “Previous Work”
  11. …model application is a set of possible full morphological analyses for the token — in exactly the same format as the morphological analyzer provides.
    Page 5, “Method”
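
Item 6 above summarizes the overall strategy: use the lexicon where it has coverage, and fall back to a learned distribution for unknowns. A short sketch of that glue logic, assuming the illustrative Analyzer above and a hypothetical unknown-word model:

```python
# Hypothetical glue code (names are illustrative): query the lexicon first,
# and back off to a learned p(t|w) distribution for unknown tokens.
def analyses_with_fallback(token, analyzer, unknown_model):
    known = analyzer.analyze(token)  # all lexicon analyses, possibly empty
    if known:
        # Known token: the lexicon already enumerates its analyses.
        return {a: 1.0 / len(known) for a in known}
    # Unknown token: use the learned distribution over full analyses.
    return unknown_model.tag_distribution(token)
```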

maximum entropy

Appears in 6 sentences as: Maximum Entropy (1) maximum entropy (5)
In Unsupervised Lexicon-Based Resolution of Unknown Words for Full Morphological Analysis
  1. We introduce a novel algorithm that is language independent: it exploits a maximum entropy letters model trained over the known words observed in the corpus and the distribution of the unknown words in known tag contexts, through iterative approximation.
    Page 1, “Abstract”
  2. Letters: A maximum entropy model is built for all unknown tokens in order to estimate their tag distribution.
    Page 5, “Method”
  3. For each possible such segmentation, the full feature vector is constructed, and submitted to the Maximum Entropy model.
    Page 5, “Method”
  4. To address this lack of precision, we learn a maximum entropy model on the basis of the following binary features: one feature for each pattern listed in column Formation of Table 3 (40 distinct patterns) and one feature for “no pattern”.
    Page 5, “Method”
  5. Pattern-Letters: This maximum entropy model is learned by combining the features of the letters model and the patterns model.
    Page 5, “Method”
  6. The algorithm we have proposed is language independent: it exploits a maximum entropy letters model trained over the known words observed in the corpus and the distribution of the unknown words in known tag contexts, through iterative approximation.
    Page 8, “Conclusion”
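
A minimal sketch of such a letters model, assuming scikit-learn (multiclass logistic regression is a standard stand-in for a maximum entropy classifier); the feature set here, prefix/suffix n-grams plus a bucketed length, is an assumption, not necessarily the paper's exact features:

```python
# Sketch only: a maxent model over letter features, assuming scikit-learn.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

def letter_features(word: str) -> dict:
    feats = {}
    for n in (1, 2, 3):
        feats[f"prefix_{n}"] = word[:n]   # leading letters
        feats[f"suffix_{n}"] = word[-n:]  # trailing letters
    feats["length"] = str(min(len(word), 7))  # bucketed word length
    return feats

def train_letters_model(known_words, tags):
    """known_words: words observed in the corpus; tags: their lexicon tags."""
    vec = DictVectorizer()
    X = vec.fit_transform(letter_features(w) for w in known_words)
    model = LogisticRegression(max_iter=1000)
    model.fit(X, tags)
    return vec, model

def tag_distribution(word, vec, model):
    """Estimate p(t|w) for an unseen word from its letter features alone."""
    probs = model.predict_proba(vec.transform([letter_features(word)]))[0]
    return dict(zip(model.classes_, probs))
```

Trained over the known words observed in the corpus, such a model yields a tag distribution for any unseen word from its letters alone, which is the role the letters model plays in the quoted sentences.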

POS tagging

Appears in 6 sentences as: POS tag (2) POS taggers (1) POS tagging (3)
In Unsupervised Lexicon-Based Resolution of Unknown Words for Full Morphological Analysis
  1. On the one hand, this tagset is much larger than the largest tagset used in English (from 17 tags in most unsupervised POS tagging experiments, to the 46 tags of the WSJ corpus and the roughly 150 tags of the LOB corpus).
    Page 2, “Introduction”
  2. On average, each token in the 42M corpus is given 2.7 possible analyses by the analyzer (much higher than the average 1.41 POS tag ambiguity reported in English (Dermatas and Kokkinakis, 1995)).
    Page 2, “Introduction”
  3. At the word level, a segmented word is attached to a POS, where the character model is based on the observed characters and their classification: Begin of word (B), In the middle of a word (I), End of word (E), or the character is a word itself (S). They apply Baum-Welch training over a segmented corpus, where the segmentation of each word and its character classification are observed, and the POS tagging is ambiguous.
    Page 4, “Previous Work”
  4. …(of all words in a given sentence) and the POS tagging (of the known words) is based on a Viterbi search over a lattice composed of all possible word segmentations and the possible classifications of all observed characters.
    Page 4, “Previous Work”
  5. They report a very slight improvement on Hebrew and Arabic supervised POS taggers.
    Page 4, “Previous Work”
  6. Table 5 shows the result of the disambiguation when we only take into account the POS tag of the unknown tokens.
    Page 8, “Evaluation”
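
Items 3 and 4 above describe a Viterbi search over a lattice of possible word segmentations. A toy version, with a placeholder log-score table standing in for the word- and character-level models:

```python
# Toy Viterbi over a segmentation lattice; scores are placeholders.
import math

def viterbi_segment(text, word_logscores, max_len=6):
    """word_logscores: log-score per known word; unseen spans get a
    length-penalized default so the search can still cover them."""
    n = len(text)
    best = [float("-inf")] * (n + 1)
    back = [0] * (n + 1)
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            span = text[start:end]
            score = word_logscores.get(span, math.log(1e-6) * len(span))
            if best[start] + score > best[end]:
                best[end] = best[start] + score
                back[end] = start
    # Follow back-pointers to recover the best segmentation.
    segments, end = [], n
    while end > 0:
        segments.append(text[back[end]:end])
        end = back[end]
    return segments[::-1]
```

A real tagger of the kind quoted above would also attach POS hypotheses to each lattice arc and search jointly over segmentations and tags, rather than over segmentations alone.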

segmentations

Appears in 3 sentences as: segmentations (3)
In Unsupervised Lexicon-Based Resolution of Unknown Words for Full Morphological Analysis
  1. …(of all words in a given sentence) and the POS tagging (of the known words) is based on a Viterbi search over a lattice composed of all possible word segmentations and the possible classifications of all observed characters.
    Page 4, “Previous Work”
  2. We hypothesize a uniform distribution among the possible segmentations and aggregate a distribution of possible tags for the analysis.
    Page 5, “Method”
  3. Our baseline proposes the most frequent tag (proper name) for all possible segmentations of the token, in a uniform distribution.
    Page 7, “Evaluation”
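
A short sketch of the uniform aggregation in items 2 and 3: each possible segmentation receives equal mass, split uniformly among its candidate tags (function names are illustrative):

```python
# Sketch of the uniform-over-segmentations aggregation; names are illustrative.
from collections import defaultdict

def aggregate_tag_distribution(segmentations, tags_for):
    """segmentations: possible segmentations of one token;
    tags_for: callable returning the candidate tags for a segmentation."""
    dist = defaultdict(float)
    weight = 1.0 / len(segmentations)        # uniform over segmentations
    for seg in segmentations:
        tags = tags_for(seg)
        if not tags:
            continue
        for tag in tags:
            dist[tag] += weight / len(tags)  # uniform within a segmentation
    return dict(dist)
```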

word segmentation

Appears in 3 sentences as: word segmentation (2) word segmentations (1)
In Unsupervised Lexicon-Based Resolution of Unknown Words for Full Morphological Analysis
  1. Nakagawa (2004) combines word-level and character-level information for Chinese and Japanese word segmentation.
    Page 4, “Previous Work”
  2. …(of all words in a given sentence) and the POS tagging (of the known words) is based on a Viterbi search over a lattice composed of all possible word segmentations and the possible classifications of all observed characters.
    Page 4, “Previous Work”
  3. Their experimental results show that the method achieves high accuracy over state-of-the-art methods for Chinese and Japanese word segmentation.
    Page 4, “Previous Work”
