Current practice in summary evaluation | Since manual evaluation is still the undisputed gold standard, both TAC and DUC devoted much effort to manually evaluating as much data as possible. |
Current practice in summary evaluation | 2.1 Manual evaluation |
Current practice in summary evaluation | Automatic metrics, because of their relative speed, can be applied more widely than manual evaluation. |
Experimental results | The first question we have to ask is: which of the manual evaluation categories do we want our metric to imitate? |
Experimental results | The Pyramid is, at the same time, a costly manual evaluation method, so an automatic metric that successfully emulates it would be a useful replacement. |
Experimental results | Table 1: System-level Pearson’s correlation between automatic and manual evaluation metrics for TAC 2008 data. |
Introduction | However, manual evaluation of the large number of documents necessary for a relatively unbiased view is often infeasible, especially in contexts where repeated evaluations are needed. |
Introduction | A more detailed description of BE and ROUGE is presented in Section 2, which also gives an account of manual evaluation methods employed at TAC 2008. |
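
The Table 1 caption quoted above refers to system-level Pearson's correlation between automatic and manual scores. As a rough illustration of what such a figure measures, the sketch below averages each system's per-summary scores and then correlates the two resulting lists of system averages. The function names, the dictionary layout, and the averaging step are assumptions made for this example, not the procedure described in the excerpts.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson's r between two equally long lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def system_averages(per_summary_scores):
    """Collapse {system: [score per summary]} to {system: mean score}.

    Assumption: one average per system is what 'system-level' means here.
    """
    return {
        system: sum(scores) / len(scores)
        for system, scores in per_summary_scores.items()
    }


def system_level_correlation(automatic, manual):
    """Correlate system averages of an automatic and a manual metric."""
    auto_avg = system_averages(automatic)
    manual_avg = system_averages(manual)
    # Only systems scored by both metrics can be compared.
    systems = sorted(set(auto_avg) & set(manual_avg))
    return pearson(
        [auto_avg[s] for s in systems],
        [manual_avg[s] for s in systems],
    )
```

Called with two hypothetical dictionaries that map a system identifier to its list of per-summary scores, `system_level_correlation` returns a single value between -1 and 1; values near 1 would indicate that the automatic metric ranks and spaces the systems much as the manual metric does.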