Index of papers in Proc. ACL 2014 that mention
  • segmentations
Wang, Xiaolin and Utiyama, Masao and Finch, Andrew and Sumita, Eiichiro
Abstract
Experimental results show that the proposed method is comparable to supervised segmenters on the in-domain NIST OpenMT corpus, and yields a relative increase of 0.96 BLEU on the out-of-domain NTCIR PatentMT corpus.
Complexity Analysis
Character-based segmentation, the LDC segmenter, and the Stanford Chinese segmenters were used as the baseline methods.
Complexity Analysis
The training was started from the assumption that there were no previous segmentations of each sentence (pair), and the number of iterations was fixed.
Complexity Analysis
The monolingual bigram model, however, was slower to converge, so we initialized it from the segmentations of the unigram model and used 10 iterations.
Introduction
Though supervised-learning approaches, which involve training segmenters on manually segmented corpora, are widely used (Chang et al., 2008), the criteria for manually annotating words are arbitrary, and the available annotated corpora are limited in both quantity and genre variety.
Methods
The set F is chosen to represent an unsegmented foreign language sentence (a sequence of characters), because an unsegmented sentence can be seen as the set of all possible segmentations of the sentence, denoted F, i.e.
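Treating an unsegmented sentence as the set of all its possible segmentations can be made concrete: a sentence of n characters has 2^(n-1) segmentations, one per subset of the n-1 boundary positions. A minimal sketch (the function name and representation are illustrative, not from the paper):

```python
def all_segmentations(chars):
    """Enumerate every segmentation of a character sequence.

    Each bitmask over the n-1 internal positions selects which
    boundaries are active, so a length-n string yields 2**(n-1) lists.
    """
    n = len(chars)
    segs = []
    for mask in range(1 << max(n - 1, 0)):
        words, start = [], 0
        for i in range(n - 1):
            if mask & (1 << i):  # boundary after position i
                words.append(chars[start:i + 1])
                start = i + 1
        words.append(chars[start:])
        segs.append(words)
    return segs
```

In practice the set is far too large to enumerate for real sentences, which is why models sum or search over it implicitly with dynamic programming.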
segmentations is mentioned in 11 sentences in this paper.
Topics mentioned in this paper:
Zeng, Xiaodong and Chao, Lidia S. and Wong, Derek F. and Trancoso, Isabel and Tian, Liang
Experiments
• Self-training Segmenters (STS): two variant models were defined by the approach reported in (Subramanya et al., 2010), which uses the supervised CRF model's decodings for unlabeled examples, incorporating empirical and constraint information, as additional labeled data to retrain a CRF model.
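The self-training scheme behind the STS baselines can be sketched generically: decode unlabeled data with the current model and fold the decodings back in as extra labeled data before retraining. Here `train_fn` and `decode_fn` are hypothetical stand-ins for a CRF toolkit's train and decode calls, not the paper's actual API:

```python
def self_train(train_fn, decode_fn, labeled, unlabeled, rounds=1):
    """Generic self-training loop (a sketch, not the paper's system).

    train_fn(pairs)      -> model, trained on (input, output) pairs
    decode_fn(model, x)  -> predicted output for input x
    """
    model = train_fn(labeled)
    for _ in range(rounds):
        # Treat the model's own decodings as pseudo-labels.
        pseudo = [(x, decode_fn(model, x)) for x in unlabeled]
        model = train_fn(labeled + pseudo)
    return model
```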
Experiments
• Virtual Evidences Segmenters (VES): two variant models based on the approach in (Zeng et al., 2013) were defined.
Experiments
This behaviour illustrates that conventional optimizations to the monolingual supervised model, e.g., accumulating more supervised data or predefining segmentation properties, are insufficient to help the model achieve better segmentations for SMT.
Introduction
The prior works showed that these models help to find some segmentations tailored for SMT, since the bilingual word occurrence feature can be captured by the character-based alignment (Och and Ney, 2003).
Introduction
Instead of directly merging the characters into concrete segmentations, this work attempts to extract word boundary distributions for character-level trigrams (types) from the “chars-to-word” mappings.
Methodology
It is worth mentioning that prior works presented a straightforward usage for candidate words, treating them as golden segmentations, either dictionary units or labeled resources.
segmentations is mentioned in 18 sentences in this paper.
Topics mentioned in this paper:
Srivastava, Shashank and Hovy, Eduard
Introduction
The model accounts for possible segmentations of a sentence into potential motifs, and prefers recurrent and cohesive motifs through features that capture frequency-based and statistical
Introduction
A slightly modified version of Viterbi could also be used to find segmentations that are constrained to agree with some given motif boundaries, but can segment other parts of the sentence optimally under these constraints.
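A boundary-constrained Viterbi search of this kind can be sketched as a dynamic program over segment end positions, where segments are forbidden from straddling any required boundary. The scoring function and constraint encoding here are illustrative assumptions, not the paper's model:

```python
def best_segmentation(chars, score, required=frozenset()):
    """Viterbi-style search for the best-scoring segmentation.

    score(word) is a hypothetical log-score for a candidate segment;
    every index in `required` must be a segment boundary, and the
    rest of the sentence is segmented optimally under that constraint.
    """
    n = len(chars)
    best = [float("-inf")] * (n + 1)
    back = [0] * (n + 1)
    best[0] = 0.0
    for j in range(1, n + 1):
        for i in range(j):
            # A segment chars[i:j] may not straddle a required boundary.
            if any(i < b < j for b in required):
                continue
            s = best[i] + score(chars[i:j])
            if s > best[j]:
                best[j], back[j] = s, i
    words, j = [], n
    while j > 0:  # recover segments from backpointers
        words.append(chars[back[j]:j])
        j = back[j]
    return words[::-1]
```

With no constraints the search reduces to ordinary Viterbi segmentation; each required boundary simply prunes the arcs that cross it.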
Introduction
Additionally, a few features for the segmentation model contained minor orthographic features based on word shape (length and capitalization patterns).
segmentations is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Johnson, Mark and Christophe, Anne and Dupoux, Emmanuel and Demuth, Katherine
Word segmentation results
800 sample segmentations of each utterance.
Word segmentation results
The most frequent segmentation in these 800 sample segmentations is the one we score in the evaluations below.
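Picking the most frequent segmentation among posterior samples is a simple mode computation; a minimal sketch (names are my own):

```python
from collections import Counter

def modal_segmentation(samples):
    """Return the most frequent segmentation among sampled ones.

    Each sample is a list of words; tuples make the lists hashable
    so Counter can tally identical segmentations.
    """
    counts = Counter(tuple(s) for s in samples)
    return list(counts.most_common(1)[0][0])
```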
Word segmentation results
Here we evaluate the word segmentations found by the “function word” Adaptor Grammar model described in section 2.3 and compare it to the baseline grammar with collocations and phonotactics from Johnson and Goldwater (2009).
segmentations is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Monroe, Will and Green, Spence and Manning, Christopher D.
Abstract
However, state-of-the-art Arabic word segmenters are either limited to formal Modern Standard Arabic, performing poorly on Arabic text featuring dialectal vocabulary and grammar, or rely on linguistic knowledge that is hand-tuned for each dialect.
Arabic Word Segmentation Model
Some incorrect segmentations produced by the original system could be ruled out with the knowledge of these statistics.
Error Analysis
In 36 of the 100 sampled errors, we conjecture that the presence of the error indicates a shortcoming of the feature set, resulting in segmentations that make sense locally but are not plausible given the full token.
Error Analysis
4.3 Context-sensitive segmentations and multiple word senses
segmentations is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Salameh, Mohammad and Cherry, Colin and Kondrak, Grzegorz
Experimental Setup
To generate the desegmentation table, we analyze the segmentations from the Arabic side of the parallel training data to collect mappings from morpheme sequences to surface forms.
Methods
where [prefix], [stem] and [suffix] are non-overlapping sets of morphemes, whose members are easily determined using the segmenter’s segment boundary markers. The second disjunct of Equation 1 covers words that have no clear stem, such as the Arabic له lh “for him”, segmented as l+ “for” +h “him”.
Related Work
For many segmentations, especially unsupervised ones, this amounts to simple concatenation.
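Concatenative desegmentation with "+" boundary markers can be sketched as follows (the marker convention, with prefixes ending in "+" and suffixes starting with "+", is an assumption for illustration; schemes like MADA would additionally need orthographic rules to reverse normalizations):

```python
def desegment(tokens):
    """Rejoin segmented morphemes into surface words by concatenation.

    Tokens ending in '+' are prefixes attached to what follows;
    tokens starting with '+' are suffixes attached to what precedes;
    anything else is a stem that closes the current word.
    """
    words, buf = [], ""
    for tok in tokens:
        if tok.endswith("+"):        # prefix: hold until the stem arrives
            buf += tok[:-1]
        elif tok.startswith("+"):    # suffix: attach to the previous word
            if words:
                words[-1] += tok[1:]
            else:                    # no stem yet (e.g. l+ +h "for him")
                buf += tok[1:]
        else:                        # stem: flush any pending prefixes
            words.append(buf + tok)
            buf = ""
    if buf:                          # word with prefixes/suffixes but no stem
        words.append(buf)
    return words
```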
Related Work
However, more complex segmentations, such as the Arabic tokenization provided by MADA (Habash et al., 2009), require further orthographic adjustments to reverse normalizations performed during segmentation.
segmentations is mentioned in 4 sentences in this paper.
Topics mentioned in this paper: