Datasets | For evaluation, we used a standard set of reference segmentations (Galley et al., 2003) of 25 meetings. |
Datasets | Segmentations are binary, i.e., each point of the document is either a segment boundary or not, and on average each meeting has 8 segment boundaries. |
Datasets | To get reference segmentations, we assign each turn a real value from 0 to 1 indicating how much the turn changes the topic. |
Topic Segmentation Experiments | Evaluation Metrics. To evaluate segmentations, we use P_k (Beeferman et al., 1999) and WindowDiff (WD) (Pevzner and Hearst, 2002). |
Topic Segmentation Experiments | First, both metrics require the hypothesized and the reference segmentations to be binary. |
Topic Segmentation Experiments | Many algorithms (e.g., probabilistic approaches) produce non-binary segmentations in which candidate boundaries have real-valued scores (e.g., a probability or a confidence). |
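As a rough illustration of the two metrics and of why real-valued output must first be made binary, here is a minimal Python sketch following the usual window-based formulation of P_k and WindowDiff. It assumes a segmentation is encoded as a 0/1 list with one entry per gap between adjacent units (turns), and it sets the window size k to half the mean reference segment length, a common convention. The names `binarize`, `default_k`, `pk`, and `window_diff`, and the fixed 0.5 threshold in `binarize`, are hypothetical and not taken from the source.

```python
from typing import List, Optional, Sequence, Tuple


def binarize(scores: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Turn real-valued boundary scores into 0/1 boundaries (hypothetical fixed threshold)."""
    return [int(s >= threshold) for s in scores]


def default_k(ref: Sequence[int]) -> int:
    """Half the mean reference segment length, rounded, and at least 1."""
    n_units = len(ref) + 1       # one entry per gap, so units = gaps + 1
    n_segments = sum(ref) + 1    # b boundaries split the units into b + 1 segments
    return max(1, round(n_units / (2 * n_segments)))


def _windows(ref: Sequence[int], hyp: Sequence[int],
             k: Optional[int]) -> List[Tuple[int, int]]:
    """Boundary counts (ref, hyp) inside every window of k gaps."""
    if len(ref) != len(hyp):
        raise ValueError("ref and hyp must cover the same units")
    if k is None:
        k = default_k(ref)
    if not 1 <= k <= len(ref):
        raise ValueError("window size k must be between 1 and the number of gaps")
    # A window spans units i and i + k, i.e. the k gaps ref[i:i + k].
    return [(sum(ref[i:i + k]), sum(hyp[i:i + k])) for i in range(len(ref) - k + 1)]


def pk(ref: Sequence[int], hyp: Sequence[int], k: Optional[int] = None) -> float:
    """P_k: fraction of windows where ref and hyp disagree on 'same segment or not'."""
    pairs = _windows(ref, hyp, k)
    return sum((r > 0) != (h > 0) for r, h in pairs) / len(pairs)


def window_diff(ref: Sequence[int], hyp: Sequence[int], k: Optional[int] = None) -> float:
    """WindowDiff: fraction of windows where the number of boundaries differs."""
    pairs = _windows(ref, hyp, k)
    return sum(r != h for r, h in pairs) / len(pairs)
```

For example, with `ref = [0, 0, 1, 0, 0, 1, 0, 0]` (nine turns in three segments) and a hypothesis with one extra boundary, `hyp = [0, 1, 1, 0, 0, 1, 0, 0]`, a window size of k = 2 gives P_k = 1/7 but WD = 2/7: P_k ignores the window that contains two hypothesized boundaries against one reference boundary, while WindowDiff counts it. Scores such as the per-turn values in [0, 1] would have to pass through something like `binarize` before either function applies.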
Model | Figure 2 shows the F1 scores of the proposed model (SegTagDep) on CTB-Sc-l with respect to the training epoch and different parsing feature weights, where “Seg”, “Tag”, and “Dep” denote the F1 scores of word segmentation, POS tagging, and dependency parsing, respectively. |
Model | Table columns: Beam, Seg, Tag, Dep, Speed |
Model | System (Seg / Tag F1): Kruengkrai ’09 97.87 / 93.67; Zhang ’10 97.78 / 93.67; Sun ’11 98.17 / 94.02; Wang ’11 98.11 / 94.18; SegTag 97.66 / 93.61; SegTagDep 97.73 / 94.46; SegTag(d) 98.18 / 94.08; SegTagDep(d) 98.26 / 94.64 |
Abstract | Since it is hard to achieve the best segmentations with the IB tagset, in the following section we propose an indirect way to use these constraints, instead of applying them as directly as in English POS tagging. |
Abstract | [equation fragment: $W \in \mathrm{segGEN}(c)$, $i = 1$] where the function segGEN maps a character sequence $c$ to the set of all possible segmentations of $c$. For example, $W = (c_1 \ldots c_{l_1}) \ldots (c_{n-l_k+1} \ldots c_n)$ represents a segmentation into $k$ words, where the lengths of the first and last words are $l_1$ and $l_k$, respectively. |
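To make the candidate set concrete, the following Python sketch enumerates segGEN(c) for a short character sequence. Each of the n − 1 gaps between adjacent characters either is or is not a word boundary, so a sequence of n characters has 2^(n−1) segmentations. The function names `seg_gen` and `word_lengths` are hypothetical; the sketch covers only the enumeration, not the scoring function that the source maximizes over this set.

```python
from itertools import product
from typing import Iterator, List


def seg_gen(chars: str) -> Iterator[List[str]]:
    """Yield every segmentation of `chars` into contiguous words (2**(n-1) of them)."""
    n = len(chars)
    if n == 0:
        yield []
        return
    # cuts[g - 1] says whether there is a word boundary between chars[g - 1] and chars[g].
    for cuts in product((False, True), repeat=n - 1):
        words, start = [], 0
        for gap, cut in enumerate(cuts, start=1):
            if cut:
                words.append(chars[start:gap])
                start = gap
        words.append(chars[start:])
        yield words


def word_lengths(segmentation: List[str]) -> List[int]:
    """The lengths l_1, ..., l_k of the k words in one segmentation."""
    return [len(w) for w in segmentation]
```

For `"abc"` this yields `["abc"]`, `["ab", "c"]`, `["a", "bc"]`, and `["a", "b", "c"]`. Because the set grows exponentially, exhaustive enumeration is only practical for very short inputs; a real decoder would search it with dynamic programming or a beam instead.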
Abstract | We first transform tagged character sequences into word segmentations and then evaluate the word segmentations by F-measure, as defined in Section 5.2. |
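The two-step evaluation can be sketched in Python as follows. The sketch assumes that the IB tagset marks each character as B (word-initial) or I (word-internal), and that the F-measure is the standard span-matching one: a predicted word counts as correct only if its start and end positions match a gold word. Section 5.2 is not available here, so its exact definition may differ. The names `tags_to_spans`, `spans_to_words`, and `seg_f1` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Span = Tuple[int, int]  # [start, end) character offsets of one word


def tags_to_spans(tags: Sequence[str]) -> List[Span]:
    """Convert an IB tag sequence (B = word-initial, I = word-internal) into word spans."""
    spans: List[Span] = []
    start: Optional[int] = None
    for i, tag in enumerate(tags):
        if tag not in ("B", "I"):
            raise ValueError(f"unexpected tag {tag!r}")
        if tag == "B" or start is None:  # a leading I is treated as starting a word
            if start is not None:
                spans.append((start, i))
            start = i
    if start is not None:
        spans.append((start, len(tags)))
    return spans


def spans_to_words(chars: str, spans: List[Span]) -> List[str]:
    """Recover the words themselves from the character sequence."""
    return [chars[s:e] for s, e in spans]


def seg_f1(gold_tags: Sequence[str], pred_tags: Sequence[str]) -> float:
    """Word-level F1: harmonic mean of precision and recall over exactly matching spans."""
    if len(gold_tags) != len(pred_tags):
        raise ValueError("gold and predicted tag sequences must have the same length")
    gold, pred = set(tags_to_spans(gold_tags)), set(tags_to_spans(pred_tags))
    correct = len(gold & pred)
    if correct == 0:
        return 0.0
    precision, recall = correct / len(pred), correct / len(gold)
    return 2 * precision * recall / (precision + recall)
```

For instance, gold tags `BIBBI` over `"abcde"` give the words `ab | c | de`, while predicted tags `BIIBI` give `abc | de`. Only `de` matches, so precision is 1/2, recall is 1/3, and F1 is 0.4. Evaluating on words rather than on individual tags means a single wrong tag can make several words incorrect at once.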