Experiments | Japanese word segmentation, with all supervised segmentations removed in advance. |
Experiments | Semi-supervised results used only 10K sentences (1/5) of supervised segmentations.
Inference | When we repeat this process, the sampler is expected to mix rapidly because it implicitly considers all possible segmentations of the given string at the same time.
Inference | Segmentations before the final k characters are marginalized using the following recursive relationship: |
Inference | Figure 4: Forward filtering of α[t][k] to marginalize out possible segmentations j before t − k.
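The recursion can be implemented as a short dynamic program. Below is a minimal sketch, assuming a hypothetical bigram word model `word_prob(word, prev_word)`, a maximum word length, and a `"<s>"` sentence-start symbol; these names are illustrative, not from the paper's implementation.

```python
from collections import defaultdict

def forward_filter(s, word_prob, max_word_len):
    """alpha[(t, k)]: probability of the first t characters of s, with the
    final k characters forming one word, marginalizing every segmentation
    of the characters before position t - k."""
    n = len(s)
    alpha = defaultdict(float)
    alpha[(0, 0)] = 1.0  # empty-prefix base case
    for t in range(1, n + 1):
        for k in range(1, min(max_word_len, t) + 1):
            word = s[t - k:t]
            if t - k == 0:
                # The word spans the whole prefix: condition on sentence start.
                alpha[(t, k)] = word_prob(word, "<s>")
            else:
                # Sum over the length j of the preceding word, which
                # marginalizes all segmentations before position t - k.
                alpha[(t, k)] = sum(
                    word_prob(word, s[t - k - j:t - k]) * alpha[(t - k, j)]
                    for j in range(1, min(max_word_len, t - k) + 1)
                )
    return alpha
```

A backward pass can then sample word boundaries right to left in proportion to these values, which is why each sampling sweep implicitly considers all segmentations of the string at once.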
Introduction | In order to extract “words” from text streams, unsupervised word segmentation is an important research area, because the criteria for creating supervised training data can be arbitrary and will be suboptimal for applications that rely on the segmentations.
Introduction | It is particularly difficult to create “correct” training data for speech transcripts, colloquial texts, and classics, where segmentations are often ambiguous; it is impossible for unknown languages whose properties computational linguists might seek to uncover.
Experiments | We evaluated both word segmentation (Seg) and joint word segmentation and POS tagging (Seg & Tag).
Experiments | For Seg, a token is considered correct if its word boundaries are correctly identified.
Experiments | For Seg & Tag, both the word boundaries and the POS tag have to be correctly identified for a token to be counted as correct.
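As a concrete reading of these two criteria, here is a small sketch in which tokens are represented as (start, end) character spans, optionally paired with a POS tag; the representation and numbers are our illustrative choices, not the paper's.

```python
def f1(gold, pred):
    """gold, pred: sets of tokens (spans, or (span, tag) pairs)."""
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Seg: a token counts as correct when its span matches a gold span.
gold_seg = {(0, 2), (2, 3), (3, 5)}
pred_seg = {(0, 2), (2, 5)}
print(f1(gold_seg, pred_seg))  # 2 * (1/2) * (1/3) / (1/2 + 1/3) = 0.4

# Seg & Tag: the span *and* the POS tag must both match.
gold_tag = {((0, 2), "N"), ((2, 3), "P"), ((3, 5), "V")}
pred_tag = {((0, 2), "N"), ((2, 5), "V")}
print(f1(gold_tag, pred_tag))  # 0.4 here as well
```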
Sentence Selection: Single Language Pair | where Hx is the space of all possible segmentations for the OOV fragment x.
Sentence Selection: Single Language Pair | We let Hx be all possible segmentations of the fragment x for which the resulting phrase lengths are not greater than the maximum length constraint for phrase extraction in the underlying SMT model.
Sentence Selection: Single Language Pair | Since we do not know anything about the segmentations a priori, we place a uniform distribution over such segmentations.
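A minimal sketch of how Hx and the uniform prior could be built, assuming the fragment is a plain string and `max_len` stands for the phrase-extraction length limit of the underlying SMT model; the function name is hypothetical.

```python
def segmentations(x, max_len):
    """Enumerate Hx: every split of x into contiguous segments whose
    lengths do not exceed max_len."""
    if not x:
        yield []
        return
    for k in range(1, min(max_len, len(x)) + 1):
        head, tail = x[:k], x[k:]
        for rest in segmentations(tail, max_len):
            yield [head] + rest

H_x = list(segmentations("abcd", max_len=3))  # 7 segmentations
p_uniform = 1.0 / len(H_x)  # each segmentation gets probability 1/|Hx|
```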
Experiments | We then used the SEG algorithm to learn the weight distribution model. |
Methods 2.1 Document Level and Profile Based CDC | The chained entities are first objectified into the relation strength matrix R using SEG, the details of which are described in the following section.
Methods 2.1 Document Level and Profile Based CDC | Algorithm 2 SEG (Freund et al., 1997). Input: initial weight distribution p^1; learning rate η > 0; training set {⟨s^t, y^t⟩}. 1: for t = 1 to T do 2: Predict using:
Methods 2.1 Document Level and Profile Based CDC | We adopt the Specialist Exponentiated Gradient (SEG) algorithm (Freund et al., 1997) to learn the mixing weights of the specialists' predictions (Algorithm 2) in an online manner.
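For concreteness, here is a sketch of one SEG trial under square loss, following the update of Freund et al. (1997); variable names are ours, and `awake` stands for the set of specialists that make a prediction at trial t.

```python
import math

def seg_trial(p, x, y, awake, eta):
    """p: weight vector over all specialists; x: specialists' predictions
    (only awake entries are used); y: true outcome in [0, 1].
    Returns (prediction, updated weights)."""
    awake_mass = sum(p[i] for i in awake)
    # Predict with the weighted average of the awake specialists.
    y_hat = sum(p[i] * x[i] for i in awake) / awake_mass
    # Exponentiated-gradient step on the awake specialists only
    # (gradient of square loss (y_hat - y)^2 w.r.t. y_hat is 2*(y_hat - y)).
    factors = {i: math.exp(-2.0 * eta * x[i] * (y_hat - y)) for i in awake}
    z = sum(p[i] * factors[i] for i in awake)
    p = list(p)
    for i in awake:
        # Renormalize so the total mass of awake specialists is preserved;
        # sleeping specialists keep their weights unchanged.
        p[i] = p[i] * factors[i] * awake_mass / z
    return y_hat, p
```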
Abstract | That is, the probability of an output string is split among many distinct derivations (e.g., trees or segmentations).
Background 2.1 Terminology | (2003)), where different segmentations lead to the same translation string (Figure 1), and in syntax-based systems (e.g., Chiang (2007)), where different derivation trees yield the same string (Figure 2).
Background 2.1 Terminology | Figure 1: Segmentation ambiguity in phrase-based MT: two different segmentations lead to the same translation string.
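A toy numerical example (ours, not the paper's) makes the split concrete: once derivation probabilities are summed per string, the highest-probability string can differ from the string of the single best derivation.

```python
from collections import defaultdict

# Each entry: (output string, segmentation that derived it, model probability).
derivations = [
    ("the blue house", "[the][blue house]", 0.20),
    ("the blue house", "[the blue][house]", 0.15),
    ("a blue house",   "[a blue house]",    0.30),
]

# Marginalize over derivations to get per-string probabilities.
string_prob = defaultdict(float)
for string, _, prob in derivations:
    string_prob[string] += prob

best_single = max(derivations, key=lambda d: d[2])[0]  # "a blue house" (0.30)
best_string = max(string_prob, key=string_prob.get)    # "the blue house" (0.35)
```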
Data and Evaluation | use the coauthor’s segmentations as the gold standard. |
Discussion | A quantitative study of the effects of high-precision/low-recall versus low-precision/high-recall segmenters on the construction of discourse trees also remains to be investigated.
Results | Additionally, we compared SLSeg and SPADE to the original RST segmentations of the three RST texts taken from the RST literature.