Index of papers in Proc. ACL that mention
  • segmentation model
Srivastava, Shashank and Hovy, Eduard
Abstract
We design a segmentation model to optimally partition a sentence into lineal constituents, which can be used to define distributional contexts that are less noisy, semantically more interpretable, and linguistically disambiguated.
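As a concrete illustration of what "optimally partition" involves, the sketch below uses standard dynamic programming over contiguous spans; the scorer, the span-length limit, and the motif list are hypothetical stand-ins, not the paper's trained feature-based model.

def best_segmentation(tokens, score, max_len=4):
    """Return the highest-scoring partition of tokens into contiguous spans.
    score(span) -> float is assumed to come from the trained segmentation model."""
    n = len(tokens)
    best = [float("-inf")] * (n + 1)   # best[j]: score of the best partition of tokens[:j]
    best[0] = 0.0
    back = [0] * (n + 1)               # backpointer to the start of the last span
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            s = best[i] + score(tokens[i:j])
            if s > best[j]:
                best[j], back[j] = s, i
    segs, j = [], n
    while j > 0:                        # recover the spans right to left
        segs.append(tokens[back[j]:j])
        j = back[j]
    return list(reversed(segs))

# Toy usage with a hypothetical scorer that rewards known multiword motifs:
MOTIFS = {("as", "well", "as"), ("in", "spite", "of")}
score = lambda span: 3.5 if tuple(span) in MOTIFS else (1.0 if len(span) == 1 else -1.0)
print(best_segmentation("he sang as well as danced".split(), score))
# -> [['he'], ['sang'], ['as', 'well', 'as'], ['danced']]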
Introduction
While existing work has focused on the classification task of categorizing a phrasal constituent as an MWE or a non-MWE, the general ideas of most of these works are in line with our current framework, and the feature set for our motif segmentation model is designed to subsume most of these ideas.
Introduction
3.1 Linear segmentation model
Introduction
The segmentation model forms the core of the framework.
segmentation model is mentioned in 17 sentences in this paper.
Zeng, Xiaodong and Chao, Lidia S. and Wong, Derek F. and Trancoso, Isabel and Tian, Liang
Abstract
This study investigates building a better Chinese word segmentation model for statistical machine translation.
Abstract
It aims at leveraging word boundary information, automatically learned by bilingual character-based alignments, to induce a preferable segmentation model.
Experiments
4.2 Various Segmentation Models
Introduction
The practice in state-of-the-art MT systems is that Chinese sentences are tokenized by a monolingual supervised word segmentation model trained on hand-annotated treebank data, e.g., the Chinese Treebank.
Introduction
In recent years, a number of works (Xu et al., 2005; Chang et al., 2008; Ma and Way, 2009; Xi et al., 2012) attempted to build segmentation models for SMT based on bilingual unsegmented data, instead of monolingual segmented data.
Introduction
We propose leveraging bilingual knowledge to form learning constraints that guide a supervised segmentation model toward a better solution for SMT.
Methodology
An intuitive approach is to directly leverage the induced boundary distributions as label constraints that regularize segmentation model learning, via a constrained learning algorithm.
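The sketch below illustrates only the intuition, not the paper's constrained-learning algorithm (which regularizes training itself rather than post-hoc decoding): alignment-induced boundary distributions act as soft constraints on a supervised segmenter's per-boundary posteriors.

def constrained_boundaries(p_model, p_align, weight=0.5, threshold=0.5):
    """p_model[i], p_align[i]: P(word boundary after character i) according to
    the supervised model and to bilingual character-based alignments."""
    combined = [(1 - weight) * m + weight * a for m, a in zip(p_model, p_align)]
    return [p > threshold for p in combined]

# e.g. the alignment signal can flip an uncertain model decision:
print(constrained_boundaries([0.45, 0.9, 0.1], [0.8, 0.9, 0.0]))
# -> [True, True, False]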
Related Work
(2008) enhanced a CRF segmentation model in MT tasks by tuning the word granularity and improving segmentation consistency.
Related Work
(2008) produced a better segmentation model for SMT by concatenating various corpora regardless of their different specifications.
Related Work
(2011) used the words learned from “chars-to-word” alignments to train a maximum entropy segmentation model.
segmentation model is mentioned in 14 sentences in this paper.
Clifton, Ann and Sarkar, Anoop
Experimental Results
Table 2: Segmented Model Scores.
Experimental Results
We find that when using a good segmentation model, segmentation of the morphologically complex target language improves model performance over an unsegmented baseline (the confidence scores come from bootstrap resampling).
Experimental Results
So, we ran the outputs of the word-based baseline system, the segmented model (Unsup L-match), and the prediction model (CRF-LM), along with the reference translation, through the supervised morphological analyzer Omorfi (Pirinen and Listenmaa, 2007).
Models 2.1 Baseline Models
These are derived from the same unsupervised segmentation model used in other experiments.
Models 2.1 Baseline Models
performance of unsupervised segmentation for translation, our third baseline is a segmented translation model based on a supervised segmentation model (called Sup), using the hand-built Omorfi morphological analyzer (Pirinen and Listenmaa, 2007), which provided slightly higher BLEU scores than the word-based baseline.
Models 2.1 Baseline Models
We therefore trained several different segmentation models, considering factors of granularity, coverage, and source-target symmetry.
Related Work
The goal of this experiment was to control the segmented model’s tendency to overfit by rewarding it for using correct whole-word forms.
segmentation model is mentioned in 10 sentences in this paper.
Johnson, Mark and Christophe, Anne and Dupoux, Emmanuel and Demuth, Katherine
Abstract
Inspired by experimental psychological findings suggesting that function words play a special role in word learning, we make a simple modification to an Adaptor Grammar-based Bayesian word segmentation model to allow it to learn sequences of monosyllabic “function words” at the beginnings and endings of collocations of (possibly multisyllabic) words.
Introduction
(1996) and Brent (1999), our word segmentation models identify word boundaries from unsegmented sequences of phonemes corresponding to utterances, effectively performing unsupervised learning of a lexicon.
Introduction
a word segmentation model should segment this as ju want tu si ðə buk, which is the IPA representation of “you want to see the book”.
Introduction
Section 2 describes the specific word segmentation models studied in this paper, and the way we extended them to capture certain properties of function words.
Word segmentation with Adaptor Grammars
Perhaps the simplest word segmentation model is the unigram model, where utterances are modeled as sequences of words, and where each word is a sequence of segments (Brent, 1999; Goldwater et al., 2009).
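Concretely (a standard formulation following Goldwater et al. (2009), sketched here rather than quoted from the paper), each novel word of $m$ segments is generated by a base distribution

  $P_0(w = x_1 \ldots x_m) = p_\# \, (1 - p_\#)^{m-1} \prod_{j=1}^{m} P(x_j)$

where $P$ is a distribution over segments (phonemes) and $p_\#$ is the probability of ending the word; an utterance is then a sequence of words drawn from the lexicon distribution this induces.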
Word segmentation with Adaptor Grammars
The next two subsections review the Adaptor Grammar word segmentation models presented in Johnson (2008) and Johnson and Goldwater (2009): section 2.1 reviews how phonotactic syllable-structure constraints can be expressed with Adaptor Grammars, while section 2.2 reviews how phrase-like units called “collocations” capture inter-word dependencies.
Word segmentation with Adaptor Grammars
(2009) point out the detrimental effect that inter-word dependencies can have on word segmentation models that assume that the words of an utterance are independently generated.
segmentation model is mentioned in 7 sentences in this paper.
Johnson, Mark
Conclusion and future work
We showed how adaptor grammars can implement a previously investigated model of unsupervised word segmentation, the unigram word segmentation model.
Word segmentation with adaptor grammars
(2007a) presented an adaptor grammar that defines a unigram model of word segmentation and showed that it performs as well as the unigram DP word segmentation model presented by Goldwater et al. (2006a).
Word segmentation with adaptor grammars
The adaptor grammar that encodes a unigram word segmentation model is shown in Figure 1.
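Since Figure 1 is not reproduced in this index, a schematic reconstruction of such a unigram adaptor grammar (following Johnson et al. (2007a); rule names may differ from the paper's figure) is:

Words -> Word Words | Word
Word  -> Phons            (adapted: cached Word subtrees serve as the lexicon)
Phons -> Phon Phons | Phon

Adapting the Word nonterminal lets the model memoize entire phoneme sequences and reuse them as words, instead of re-deriving them segment by segment.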
Word segmentation with adaptor grammars
(2007), a unigram word segmentation model tends to undersegment and misanalyse collocations as individual words.
segmentation model is mentioned in 6 sentences in this paper.
Mayfield, Elijah and Penstein Rosé, Carolyn
Background
We also build in parallel a segmentation model to select a label from the set {NEW, SAME}.
Background
We build two segmentation models, one trained on contributions of less than four tokens, and another trained on contributions of four or more tokens, to distinguish between characteristics of contentful and non-contentful contributions.
Background
No segmentation model is used and no ILP constraints are enforced.
segmentation model is mentioned in 6 sentences in this paper.
Börschinger, Benjamin and Johnson, Mark and Demuth, Katherine
Discussion
NO-ρ gives the scores for running just the word segmentation model (with no /t/-deletion rule) on the data that includes /t/-deletion; NO-VAR gives the scores for running just the word segmentation model on the data with no /t/-deletions.
Experiments 4.1 The data
To give an impression of the impact of /t/-deletion, we also report numbers for running only the segmentation model on the Buckeye data with no deleted /t/s and on the data with deleted /t/s.
Experiments 4.1 The data
Also note that in the GOLD-ρ condition, our joint Bigram model performs almost as well on data with /t/-deletions as the word segmentation model does on data that includes no variation at all.
The computational model
Bayesian word segmentation models try to compactly represent the observed data in terms of a small set of units (word types) and a short analysis (a small number of word tokens).
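This trade-off can be made concrete with the Dirichlet process predictive probability that such models typically use (a standard formulation, not quoted from this paper):

  $P(w_{n+1} = w \mid w_1, \ldots, w_n) = \frac{n_w + \alpha P_0(w)}{n + \alpha}$

where $n_w$ is the number of earlier tokens of type $w$ and $P_0$ generates novel words. Reusing a frequent type is cheap, while each new type pays the $\alpha P_0(w)$ penalty, so analyses with a small set of frequently reused word types receive higher probability.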
The computational model
(2009) segmentation models, exact inference is infeasible for our joint model.
segmentation model is mentioned in 5 sentences in this paper.
Jiang, Wenbin and Sun, Meng and Lü, Yajuan and Yang, Yating and Liu, Qun
Character Classification Model
Although natural annotations in web text do not directly support the discriminative training of segmentation models, they do rule out implausible candidates for the predictions of related characters.
Character Classification Model
We choose the perceptron algorithm (Collins, 2002) to train the classifier for the character classification-based word segmentation model .
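A minimal sketch of this setup, assuming the common B/M/E/S character-tag scheme and a toy feature template (an illustration of perceptron-trained character classification, not the paper's exact feature set or decoder):

from collections import defaultdict

TAGS = ["B", "M", "E", "S"]  # begin/middle/end of a word, single-character word

def features(chars, i):
    """Character unigram/bigram window features for position i."""
    c = lambda k: chars[i + k] if 0 <= i + k < len(chars) else "#"
    return [f"c0={c(0)}", f"c-1={c(-1)}", f"c+1={c(1)}",
            f"c-1c0={c(-1)}{c(0)}", f"c0c+1={c(0)}{c(1)}"]

def score(w, feats, tag):
    return sum(w[(f, tag)] for f in feats)

def train(data, epochs=5):
    """data: list of (chars, tags); standard perceptron updates (Collins, 2002)."""
    w = defaultdict(float)
    for _ in range(epochs):
        for chars, gold in data:
            for i, g in enumerate(gold):
                feats = features(chars, i)
                pred = max(TAGS, key=lambda t: score(w, feats, t))
                if pred != g:  # promote gold-tag features, demote predicted-tag features
                    for f in feats:
                        w[(f, g)] += 1.0
                        w[(f, pred)] -= 1.0
    return w

def segment(w, chars):
    """Tag each character, then cut a word boundary before every B or S tag."""
    words, cur = [], ""
    for i, ch in enumerate(chars):
        feats = features(chars, i)
        tag = max(TAGS, key=lambda t: score(w, feats, t))
        if tag in ("B", "S") and cur:
            words.append(cur)
            cur = ""
        cur += ch
    if cur:
        words.append(cur)
    return words

# Toy usage on a single hand-tagged sentence ("南京市 长江 大桥"):
data = [(list("南京市长江大桥"), list("BMEBEBE"))]
w = train(data)
print(segment(w, list("南京市长江大桥")))  # -> ['南京市', '长江', '大桥']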
Character Classification Model
Figure 2: Shrinkage of the search space for the character classification-based word segmentation model.
Introduction
Table 1: Feature templates and instances for the character classification-based word segmentation model.
segmentation model is mentioned in 4 sentences in this paper.
Snyder, Benjamin and Barzilay, Regina
Experimental SetUp
The probabilistic formulation of this model is close to our monolingual segmentation model , but it uses a greedy search specifically designed for the segmentation task.
Model
Our segmentation model is based on the notion that stable recurring string patterns within words are indicative of morphemes.
Model
We note that these single-language morpheme distributions also serve as monolingual segmentation models, and similar models have been successfully applied to the task of word boundary detection (Goldwater et al., 2006).
segmentation model is mentioned in 3 sentences in this paper.
Wang, Xiaolin and Utiyama, Masao and Finch, Andrew and Sumita, Eiichiro
Methods
M monolingual segmentation model
Methods
B bilingual segmentation model
Methods
segmentation model M or B.
segmentation model is mentioned in 3 sentences in this paper.