Experiments and Results | Trained and tested on the Penn WSJ TreeBank, the POS tagger achieves an accuracy of 97%, and the NP chunker an F-measure above 94% (Zhou and Su, 2000). |
Experiments and Results | Evaluated on the MUC-6 and MUC-7 Named-Entity tasks, the NER module (Zhou and Su, 2002) achieves F-measures of 96.6% (MUC-6) and 94.1% (MUC-7). |
Experiments and Results | The overall F-measures for NWire, NPaper and BNews are 60.4%, 57.9% and 62.9%, respectively. |
Abstract | Without using any additional labeled data, this new approach achieves an F-measure 7.6% higher in trigger labeling and 6% higher in argument labeling than a state-of-the-art IE system that extracts events independently for each sentence. |
Experimental Results and Analysis | We select the thresholds (d_k with k = 1~13) for the various confidence metrics by optimizing the F-measure of each rule on the development set, as shown in Figures 2 and 3. |
Experimental Results and Analysis | The labeled point on each curve shows the best F-measure that can be obtained on the development set by adjusting the threshold for that rule. |
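Experimental Results and Analysis | The threshold-selection step above can be sketched as a simple sweep: for each candidate threshold, count the rule's true/false positives and false negatives on the development set and keep the threshold with the highest F-measure. This is a minimal illustration of the procedure, not the paper's implementation; the confidence scores, gold labels, and candidate thresholds below are invented for the example.

```python
# Hypothetical sketch of threshold selection by dev-set F-measure,
# illustrating the procedure described above. All data are invented.

def f_measure(tp, fp, fn):
    """Balanced F-measure (F1) from true-positive, false-positive,
    and false-negative counts."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def best_threshold(scored_dev_set, candidate_thresholds):
    """Sweep candidate thresholds; return (best F-measure, chosen threshold).
    scored_dev_set is a list of (confidence score, gold label) pairs."""
    best_f, best_d = 0.0, None
    for d in candidate_thresholds:
        tp = sum(1 for s, gold in scored_dev_set if s >= d and gold)
        fp = sum(1 for s, gold in scored_dev_set if s >= d and not gold)
        fn = sum(1 for s, gold in scored_dev_set if s < d and gold)
        f = f_measure(tp, fp, fn)
        if f > best_f:
            best_f, best_d = f, d
    return best_f, best_d

# Toy development set: (confidence score, gold label) -- illustrative only.
dev = [(0.9, True), (0.8, True), (0.7, False),
       (0.6, True), (0.4, False), (0.3, False)]
print(best_threshold(dev, [0.2, 0.5, 0.75]))
```

On this toy data, the middle threshold (0.5) wins: lowering it admits more false positives, raising it loses a true positive, mirroring the precision/recall trade-off the curves in Figures 2 and 3 visualize.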
Experimental Results and Analysis | Table 5 shows the overall Precision (P), Recall (R) and F-Measure (F) scores for the blind test set. |
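Experimental Results and Analysis | The Precision (P), Recall (R) and F-measure (F) scores reported in Table 5 follow the standard definitions; a minimal sketch (the counts below are illustrative, not results from the paper):

```python
def precision_recall_f(tp, fp, fn):
    """Standard scores: P = tp/(tp+fp), R = tp/(tp+fn), F = 2PR/(P+R).
    Returns 0.0 for any score whose denominator is zero."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# Illustrative counts only, not taken from Table 5:
p, r, f = precision_recall_f(tp=80, fp=20, fn=40)
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")  # P=0.800 R=0.667 F=0.727
```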