Abstract | To evaluate our method, we use the word clusters in an NER system and demonstrate a statistically significant improvement in F1 score when using bilingual word clusters instead of monolingual clusters. |
Experiments | We treat the F1 score |
Experiments | Table 1 shows the F1 score of NER when trained on these monolingual German word clusters. |
Experiments | For Turkish, the F1 score improves by 1.0 point over the setting with no distributional clusters, which clearly shows that the word alignment information improves the clustering quality. |
Experimental Results | We compute the precision, recall and F1 scores for each EC on the test set, and collect their counts in the reference and system output. |
Experimental Results | The F1 scores for the majority of the ECs are above 70%, except for “*”, which is relatively rare in the data. |
Experimental Results | For the two categories that are interesting to MT, *pro* and *PRO*, the predictor achieves F1 scores of 74.3% and 81.5%, respectively. |
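The per-category scoring described in the Experimental Results rows can be sketched as follows. This is a minimal illustration, not the authors' evaluation script: it assumes the reference and system output are aligned label sequences with one label per position, and the function name `per_label_prf` and the toy labels are hypothetical.

```python
from collections import Counter

def per_label_prf(reference, predicted):
    """Return {label: (precision, recall, f1, ref_count, sys_count)}.

    Assumes `reference` and `predicted` are equal-length, position-aligned
    label sequences (an assumption of this sketch, not stated in the source).
    """
    ref_counts = Counter(reference)   # occurrences of each label in the reference
    sys_counts = Counter(predicted)   # occurrences of each label in the system output
    correct = Counter(r for r, p in zip(reference, predicted) if r == p)

    scores = {}
    for label in set(ref_counts) | set(sys_counts):
        tp = correct[label]
        precision = tp / sys_counts[label] if sys_counts[label] else 0.0
        recall = tp / ref_counts[label] if ref_counts[label] else 0.0
        # F1 is the harmonic mean of precision and recall.
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1, ref_counts[label], sys_counts[label])
    return scores

# Toy data, invented for illustration only.
reference = ["*pro*", "*PRO*", "*pro*", "*"]
predicted = ["*pro*", "*pro*", "*pro*", "*PRO*"]
scores = per_label_prf(reference, predicted)
```

Reporting the reference and system counts next to each score makes it easy to see when a low F1 belongs to a rare category.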
Evaluation | For the final evaluation, we optimized the number of clusters based on F1 score on calibration and validation sets (cf. |
Results | We omit the F1 score because its use for precision and recall estimates from different samples is unclear. |
Results | Note that for these methods, precision and recall can be traded off against each other by varying the number of clusters; we chose the number of clusters by optimizing the F1 score on the calibration and validation sets. |
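Choosing the number of clusters by F1 can be sketched as below. This is an illustration under stated assumptions: the precision and recall values per cluster count are invented, and `select_num_clusters` is a hypothetical helper, not part of any system described in the source.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def select_num_clusters(results):
    """Pick the cluster count whose held-out (precision, recall) maximizes F1.

    `results` maps a candidate number of clusters to a (precision, recall)
    pair measured on held-out data. Ties go to the first candidate in
    iteration order.
    """
    return max(results, key=lambda k: f1_score(*results[k]))

# Hypothetical measurements: fewer clusters favor precision here,
# more clusters favor recall.
held_out = {10: (0.9, 0.3), 50: (0.7, 0.6), 200: (0.4, 0.8)}
best_k = select_num_clusters(held_out)
```

Because F1 rewards balance between the two quantities, the selected setting is usually neither the most precise nor the highest-recall candidate.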