Index of papers in Proc. ACL 2008 that mention
  • segmentations
Snyder, Benjamin and Barzilay, Regina
Abstract
We present a nonparametric Bayesian model that jointly induces morpheme segmentations of each language under consideration and at the same time identifies cross-lingual morpheme patterns, or abstract morphemes.
Experimental SetUp
We obtained gold standard segmentations of the Arabic translation with a handcrafted Arabic morphological analyzer which utilizes manually constructed word lists and compatibility rules and is further trained on a large corpus of hand-annotated Arabic data (Habash and Ram-bow, 2005).
Experimental SetUp
We don’t have gold standard segmentations for the English and Aramaic portions of the data, and thus restrict our evaluation to Hebrew and Arabic.
Introduction
the space of joint segmentations .
Introduction
For each language in the pair, the model favors segmentations which yield high frequency morphemes.
Model
For word 21) in language 5, we consider at once all possible segmentations , and for each segmentation all possible alignments.
Model
We are thus considering at once: all possible segmentations of 212 along with all possible alignments involving morphemes in 21) with some subset of previously sampled language-]: morphemes.3
segmentations is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Goldberg, Yoav and Tsarfaty, Reut
A Generative PCFG Model
Our use of an unweighted lattice reflects our belief that all the segmentations of the given input sentence are a-priori equally likely; the only reason to prefer one segmentation over the another is due to the overall syntactic context which is modeled via the PCFG derivations.
A Generative PCFG Model
(1996) who consider the kind of probabilities a generative parser should get from a PoS tagger, and concludes that these should be P(w|t) “and nothing fancier”.3 In our setting, therefore, the Lattice is not used to induce a probability distribution on a linear context, but rather, it is used as a common-denominator of state-indexation of all segmentations possibilities of a surface form.
Experimental Setup
We use the HSPELL9 (Har’el and Kenigsberg, 2004) wordlist as a lexeme-based lexicon for pruning segmentations involving invalid segments.
Experimental Setup
To evaluate the performance on the segmentation task, we report SEG , the standard harmonic means for segmentation Precision and Recall F1 (as defined in Bar-Haim et a1.
Previous Work on Hebrew Processing
Morphological analyzers for Hebrew that analyze a surface form in isolation have been proposed by Segal (2000), Yona and Wintner (2005), and recently by the knowledge center for processing Hebrew (Itai et al., 2006).
Previous Work on Hebrew Processing
Tsarfaty (2006) used a morphological analyzer ( Segal , 2000), a PoS tagger (Bar-Haim et al., 2005), and a general purpose parser (Schmid, 2000) in an integrated framework in which morphological and syntactic components interact to share information, leading to improved performance on the joint task.
segmentations is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Adler, Meni and Goldberg, Yoav and Gabay, David and Elhadad, Michael
Evaluation
Our baseline proposes the most frequent tag (proper name) for all possible segmentations of the token, in a uniform distribution.
Method
We hypothesize a uniform distribution among the possible segmentations and aggregate a distribution of possible tags for the analysis.
Previous Work
(of all words in a given sentence) and the POS tagging (of the known words) is based on a Viterbi search over a lattice composed of all possible word segmentations and the possible classifications of all observed characters.
segmentations is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: