Getting Humans in the Loop | Figure 6: The relative error rate (using round 0 as a baseline) of the best Mechanical Turk user session for each of the four numbers of topics. |
Simulation Experiment | The lower the classification error rate, the better the model has captured the structure of the corpus. |
Simulation Experiment | While Null sees no constraints, it serves as an upper baseline for the error rate (lower error being better) but shows the effect of additional inference. |
Simulation Experiment | All Full is a lower baseline for the error rate since it both sees the constraints at the beginning and also runs for the maximum number of total iterations. |
Abstract | The experimental results demonstrate that our model is able to significantly outperform the state-of-the-art coherence model by Barzilay and Lapata (2005), reducing the error rate of the previous approach by an average of 29% over three data sets against human upper bounds. |
Experiments | For the combined model, the error rates are significantly reduced in all three data sets. |
Experiments | The average error rate reductions against 100% are 9.57% for the full model and 26.37% for the combined model. |
Experiments | If we compute the average error rate reductions against the human upper bounds (rather than an oracular 100%), the average error rate reduction for the full model is 29% and that for the combined model is 73%. |
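The difference between reductions measured "against 100%" and "against the human upper bounds" comes down to which ceiling the error is measured from. The sketch below is one plausible reading of that computation, not the authors' script: the function name, the `ceiling` parameter, and the accuracy values in the usage note are hypothetical and chosen only for illustration.

```python
def relative_error_reduction(baseline_acc, system_acc, ceiling=100.0):
    """Fraction of the baseline's error that the system removes.

    Error is measured as the gap between an accuracy (in percent) and a
    ceiling. The ceiling is 100.0 for an oracle, or a human upper bound
    if one is available. All names here are illustrative assumptions.
    """
    baseline_err = ceiling - baseline_acc
    if baseline_err <= 0:
        raise ValueError("baseline accuracy must be below the ceiling")
    system_err = ceiling - system_acc
    return (baseline_err - system_err) / baseline_err
```

With hypothetical accuracies of 80 for a baseline and 85 for a new system, the reduction is 0.25 against a ceiling of 100 but 0.5 against a ceiling of 90. The same absolute gain looks larger once the ceiling is lowered to what humans actually achieve, which is the pattern the excerpt reports.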
Experimental Results | Table 2: Alignment error rate results for the bidirectional model versus the baseline directional models. |
Experimental Results | First, we measure alignment error rate (AER), which compares the pro- |
Experimental Results | The translation model weights were tuned for both the baseline and bidirectional alignments using lattice-based minimum error rate training (Kumar et al., 2009). |
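For reference, alignment error rate as commonly defined (Och and Ney's formulation) compares a predicted set of alignment links A with hand-annotated sure links S and possible links P: AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|). The sketch below assumes that standard definition applies to the excerpt above, and assumes links are represented as (source index, target index) pairs. The function and argument names are illustrative.

```python
def alignment_error_rate(predicted, sure, possible):
    """AER = 1 - (|A & S| + |A & P|) / (|A| + |S|).

    predicted, sure, possible: iterables of (source_idx, target_idx) links.
    Sure links are treated as a subset of possible links, as is standard.
    """
    a = set(predicted)
    s = set(sure)
    p = set(possible) | s  # every sure link is also a possible link
    denominator = len(a) + len(s)
    if denominator == 0:
        raise ValueError("no predicted or sure links to compare")
    return 1.0 - (len(a & s) + len(a & p)) / denominator
```

Lower is better: a prediction that matches the sure links exactly scores 0, and a predicted link that falls outside both S and P raises the score.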
Discussion and Conclusions | First, we can see from the results that several systems appear better when evaluating on a correlation measure like Pearson’s ρ, while others appear better when analyzing error rate. |
Discussion and Conclusions | Evaluating with a correlative measure yields predictably poor results, but evaluating the error rate indicates that it is comparable to (or better than) the more intelligent BOW metrics. |
Results | However, as the perceptron is designed to minimize error rate, this may not reflect an optimal objective when seeking to detect matches. |