Graph-based Local Coherence Modeling
Guinaudeau, Camille and Strube, Michael

Article Structure

Abstract

We propose a computationally efficient graph-based approach for local coherence modeling.

Introduction

Many NLP applications which process or generate texts rely on information about local coherence, i.e.

The Entity Grid Model

Barzilay and Lapata (2005; 2008) introduced the entity grid, a method for local coherence modeling that captures the distribution of discourse entities across sentences in a text.
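
To make the data structure concrete, the following is a minimal sketch of an entity grid in Python. The role labels S (subject), O (object), X (other) and "-" (absent) follow Barzilay and Lapata's convention; the toy sentences and entity names are invented purely for illustration.

```python
# Minimal entity-grid sketch: one row per sentence, one column per discourse
# entity, each cell holding the entity's grammatical role in that sentence
# (S = subject, O = object, X = other, "-" = absent).
# The toy document below is invented for illustration only.
toy_document = [
    {"judge": "S", "microsoft": "O"},    # sentence 1
    {"microsoft": "S", "markets": "O"},  # sentence 2
    {"judge": "S", "markets": "X"},      # sentence 3
]

def build_entity_grid(doc):
    """Return (entities, grid) where grid[i][j] is the role of entity j in sentence i."""
    entities = sorted({e for sentence in doc for e in sentence})
    grid = [[sentence.get(e, "-") for e in entities] for sentence in doc]
    return entities, grid

entities, grid = build_entity_grid(toy_document)
print(entities)      # ['judge', 'markets', 'microsoft']
for row in grid:
    print(row)       # e.g. ['S', '-', 'O'] for sentence 1
```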

Method

Our model is based on the insight that the entity grid (Barzilay and Lapata, 2008) corresponds to the incidence matrix of a bipartite graph representing the text (see Newman (2010) for more details on graph representation).
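
Using the same toy representation, the correspondence can be sketched as follows: sentence nodes and entity nodes form the two sides of a bipartite graph, an edge links a sentence to every entity it contains, and the entity grid is exactly the incidence matrix of that graph. The 3/2/1 weights for subject/object/other occurrences are an assumption used here to illustrate a syntax-sensitive weighting; an unweighted variant simply uses 1.

```python
# Sketch of the bipartite view: one node per sentence, one node per entity,
# and an edge wherever an entity occurs in a sentence.  The entity grid is
# the incidence matrix of this graph.  The 3/2/1 weights for subject/object/
# other are assumed for illustration; a plain variant uses weight 1.
ROLE_WEIGHT = {"S": 3, "O": 2, "X": 1}

toy_document = [
    {"judge": "S", "microsoft": "O"},
    {"microsoft": "S", "markets": "O"},
    {"judge": "S", "markets": "X"},
]

def bipartite_edges(doc, use_syntax=True):
    """Return {(sentence_index, entity): weight} for every entity occurrence."""
    return {
        (i, entity): (ROLE_WEIGHT[role] if use_syntax else 1)
        for i, sentence in enumerate(doc)
        for entity, role in sentence.items()
    }

print(bipartite_edges(toy_document))
# {(0, 'judge'): 3, (0, 'microsoft'): 2, (1, 'microsoft'): 3, ...}
```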

Experiments

We compare our model with the entity grid approach and evaluate the influence of the different weighting schemes used in the projection graphs, either PW or PACC, where weights are potentially decreased by distance information Dist.
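
As a rough illustration of these schemes, the sketch below builds the sentence projection graph and scores a document by the average outdegree, which the paper uses as its local coherence measure. PW counts the entities shared by two sentences, and Dist divides each edge weight by the distance between the sentences; the exact PAcc formula (products of the assumed 3/2/1 syntactic weights, summed over shared entities) is our reading and should be checked against the paper.

```python
from collections import defaultdict

# Sketch of the sentence projection graph and the average-outdegree coherence
# score.  PW weights an edge by the number of entities shared by two
# sentences; dividing by the sentence distance implements the Dist variant.
# The toy document, the 3/2/1 role weights and the exact PAcc formula are
# assumptions made for illustration.
ROLE_WEIGHT = {"S": 3, "O": 2, "X": 1}

toy_document = [
    {"judge": "S", "microsoft": "O"},
    {"microsoft": "S", "markets": "O"},
    {"judge": "S", "markets": "X"},
]

def projection_weights(doc, scheme="PW", use_dist=False):
    """Return {(i, j): weight} for sentence pairs i < j that share entities."""
    weights = defaultdict(float)
    for i in range(len(doc)):
        for j in range(i + 1, len(doc)):
            shared = set(doc[i]) & set(doc[j])
            if not shared:
                continue
            if scheme == "PW":        # number of shared entities
                w = float(len(shared))
            elif scheme == "PACC":    # syntax-weighted variant (assumed formula)
                w = float(sum(ROLE_WEIGHT[doc[i][e]] * ROLE_WEIGHT[doc[j][e]]
                              for e in shared))
            else:                     # PU: unweighted
                w = 1.0
            if use_dist:              # Dist: decay with sentence distance
                w /= (j - i)
            weights[(i, j)] = w
    return dict(weights)

def local_coherence(doc, scheme="PW", use_dist=True):
    """Average outdegree of the projection graph, used as the coherence score."""
    return sum(projection_weights(doc, scheme, use_dist).values()) / len(doc)

print(round(local_coherence(toy_document), 3))  # 0.833 on the toy document
```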

Conclusions

In this paper, we proposed an unsupervised and computationally efficient graph-based local coherence model.

Topics

coreference

Appears in 19 sentences as: coref (8) coreference (16) coreferent (1)
In Graph-based Local Coherence Modeling
  1. Finally, they include a heuristic coreference resolution component by linking mentions which share a
    Page 2, “The Entity Grid Model”
  2. We also propose to use a coreference resolution system and consider coreferent entities to be the same discourse entity.
    Page 4, “Experiments”
  3. As the coreference resolution system is trained on well-formed textual documents and expects a correct sentence ordering, we use in all our experiments only features that do not rely on sentence order (e.g.
    Page 4, “Experiments”
  4. Second, we want to evaluate the influence of automatically performed coreference resolution in a controlled fashion.
    Page 5, “Experiments”
  5. The coreference resolution system used performs well on the CoNLL 2012 data.
    Page 5, “Experiments”
  6. B&L: 0.877 / 0.877; E&C: 0.915 / 0.915 (columns: wo coref / w coref)
    Page 5, “Experiments”
  7. We also evaluated the influence of coreference resolution on the performance of our system.
    Page 5, “Experiments”
  8. Random: 0.028 / 0.071; E&C: 0.068 / 0.167 (columns: wo coref / w coref)
    Page 6, “Experiments”
  9. Using coreference resolution improves the performance of the system when distance information is used alone in the system (Table 3).
    Page 6, “Experiments”
  10. When the coreference resolution system is used, the best accuracy value decreases while the insertion score increases from 0.114 to 0.138 (Table 4).
    Page 6, “Experiments”
  11. Therefore, coreference resolution tends to associate positions that are closer to the original ones.
    Page 6, “Experiments”

graph-based

Appears in 19 sentences as: graph-based (19)
In Graph-based Local Coherence Modeling
  1. We propose a computationally efficient graph-based approach for local coherence modeling.
    Page 1, “Abstract”
  2. Similar to the application of graph-based methods in other areas of NLP (e.g.
    Page 1, “Introduction”
  3. work on word sense disambiguation by Navigli and Lapata (2010); for an overview of graph-based methods in NLP see Mihalcea and Radev (2011)) we model local coherence by relying only on centrality measures applied to the nodes in the graph.
    Page 1, “Introduction”
  4. We apply our graph-based model to the three tasks handled by Barzilay and Lapata (2008) to show that it provides the same flexibility over disparate tasks as the entity grid model: sentence ordering (Section 4.1), summary coherence ranking (Section 4.2), and readability assessment (Section 4.3).
    Page 1, “Introduction”
  5. In contrast to Barzilay and Lapata’s entity grid that contains information about absent entities, our graph-based representation only contains “positive” information.
    Page 3, “Method”
  6. From this graph-based representation, the local coherence of a text T can be measured by computing the average outdegree of a projection graph P. This centrality measure was chosen for two main reasons.
    Page 4, “Method”
  7. We evaluate the ability of our graph-based model to estimate the local coherence of a textual document with three different experiments.
    Page 4, “Experiments”
  8. Our graph-based model obtains for the discrimination task an accuracy of 0.846 and 0.635 on the ACCIDENTS and EARTHQUAKES datasets, respectively, compared to 0.904 and 0.872 as reported by Barzilay and Lapata (2008).
    Page 5, “Experiments”
  9. Table 3: Discrimination, reproduced baselines (B&L: Barzilay and Lapata (2008); E&C Elsner and Charniak (2011)) vs. graph-based
    Page 5, “Experiments”
  10. Table 4: Insertion, reproduced baselines vs. graph-based
    Page 6, “Experiments”
  11. Table 5: Summary Coherence Rating, reported results from Barzilay and Lapata (2008) vs. graph-based
    Page 7, “Experiments”

coreference resolution

Appears in 15 sentences as: coreference resolution (16)
In Graph-based Local Coherence Modeling
  1. Finally, they include a heuristic coreference resolution component by linking mentions which share a
    Page 2, “The Entity Grid Model”
  2. We also propose to use a coreference resolution system and consider coreferent entities to be the same discourse entity.
    Page 4, “Experiments”
  3. As the coreference resolution system is trained on well-formed textual documents and expects a correct sentence ordering, we use in all our experiments only features that do not rely on sentence order (e.g.
    Page 4, “Experiments”
  4. Second, we want to evaluate the influence of automatically performed coreference resolution in a controlled fashion.
    Page 5, “Experiments”
  5. The coreference resolution system used performs well on the CoNLL 2012 data.
    Page 5, “Experiments”
  6. We also evaluated the influence of coreference resolution on the performance of our system.
    Page 5, “Experiments”
  7. Using coreference resolution improves the performance of the system when distance information is used alone in the system (Table 3).
    Page 6, “Experiments”
  8. When the coreference resolution system is used, the best accuracy value decreases while the insertion score increases from 0.114 to 0.138 (Table 4).
    Page 6, “Experiments”
  9. Therefore, coreference resolution tends to associate positions that are closer to the original ones.
    Page 6, “Experiments”
  10. Finally, Table 5 also shows that using a coreference resolution system for document representation does not improve the performance of our system.
    Page 7, “Experiments”
  11. We believe that, as mentioned by Barzilay and Lapata (2008), this degradation is related to the fact that automatic summarization systems do not use anaphoric expressions, which makes the coreference resolution system useless in this case.
    Page 7, “Experiments”

best results

Appears in 8 sentences as: best result (1) best results (7)
In Graph-based Local Coherence Modeling
  1. These extensions led to the best results reported so far for the sentence ordering task.
    Page 3, “The Entity Grid Model”
  2. Indeed, the difference between our best results and those of Elsner and Charniak is not statistically significant.
    Page 5, “Experiments”
  3. Table 3 finally shows that syntactic information improves the performance of our system (yet not significantly) and gives the best results (PACC).
    Page 5, “Experiments”
  4. The best results, which present a statistically significant improvement when compared to the random baseline, are obtained when distance information and the number of entities “shared” by two sentences are taken into account (PW).
    Page 6, “Experiments”
  5. When combined with distance information, syntactic information still improves the results (PACC), though not significantly, but does not lead to the best results for this task.
    Page 7, “Experiments”
  6. With our graph-based model, the best results are
    Page 7, “Experiments”
  7. As before, syntactic information leads to the best results, but does not allow the accuracy to be higher than random anymore.
    Page 8, “Experiments”
  8. As previously, syntactic information improves the results and, for this comparison, the best result is obtained when syntactic information alone is accounted for.
    Page 9, “Experiments”

data sparsity

Appears in 6 sentences as: data sparsity (6)
In Graph-based Local Coherence Modeling
  1. The performance is comparable to entity grid based approaches though these rely on a computationally expensive training phase and face data sparsity problems.
    Page 1, “Abstract”
  2. However, their approach has some disadvantages which they point out themselves: data sparsity, domain dependence and computational complexity, especially in terms of feature space issues while building their model (Barzilay and Lapata (2008, p.8, p.10, p.30), Elsner and Charniak (2011, p.126, p.127)).
    Page 1, “Introduction”
  3. The graph can easily span the entire text without leading to computational complexity and data sparsity problems.
    Page 1, “Introduction”
  4. From this we conclude that a graph is an alternative to the entity grid model: it is computationally more tractable for modeling local coherence and does not suffer from data sparsity problems (Section 5).
    Page 2, “Introduction”
  5. Second, as it relies only on graph centrality, our model does not suffer from the computational complexity and data sparsity problems mentioned by Barzilay and Lapata (2008).
    Page 9, “Conclusions”
  6. This can be easily done by adding edges in the projection graphs when sentences contain entities related from a discourse point of view, while Lin et al.’s approach suffers from complexity and data sparsity problems similar to the entity grid model.
    Page 9, “Conclusions”

F-measure

Appears in 6 sentences as: F-measure (6)
In Graph-based Local Coherence Modeling
  1. Since the model can give the same score for a permutation and the original document, we also compute the F-measure, where recall is correct/total and precision equals correct/decisions (a worked example follows this list).
    Page 5, “Experiments”
  2. For evaluation purposes, the accuracy still corresponds to the number of correct ratings divided by the number of comparisons, while the F-measure combines recall and precision measures.
    Page 6, “Experiments”
  3. Moreover, in contrast to the first experiment, when accounting for the number of entities “shared” by two sentences (PW), values of accuracy and F-measure are lower.
    Page 7, “Experiments”
  4. Values presented in the following section correspond to accuracy, where the system is correct if it assigns the higher local coherence score to the most “easy to read” document, and F-measure.
    Page 7, “Experiments”
  5. It can also be seen that accuracy and F-measure are lower when comparing these two corpora.
    Page 9, “Experiments”
  6. Concerning the comparison between Britannica Student and Britannica Elementary articles, Table 7 shows that integrating distance information gives slightly different results and tends to decrease the values of accuracy and F-measure.
    Page 9, “Experiments”
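
A small worked example of how such an F-measure can be computed from the quantities defined in sentence 1 above; combining precision and recall with the harmonic mean (i.e. F1) is an assumption, and the counts are invented.

```python
# Worked example of the F-measure built from the quantities defined above:
# recall = correct / total, precision = correct / decisions.  Combining the
# two with the harmonic mean (F1) is an assumption; the counts are invented.
def f_measure(correct, total, decisions):
    recall = correct / total
    precision = correct / decisions
    return 2 * precision * recall / (precision + recall)

# e.g. 60 correct rankings out of 100 pairs, with the model deciding on 80
print(round(f_measure(correct=60, total=100, decisions=80), 3))  # 0.667
```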

statistically significant

Appears in 5 sentences as: statistically significant (5)
In Graph-based Local Coherence Modeling
  1. Indeed, the difference between our best results and those of Elsner and Charniak is not statistically significant.
    Page 5, “Experiments”
  2. However, this improvement is not statistically significant.
    Page 6, “Experiments”
  3. The best results, which present a statistically significant improvement when compared to the random baseline, are obtained when distance information and the number of entities “shared” by two sentences are taken into account (PW).
    Page 6, “Experiments”
  4. A statistically significant improvement is provided by including syntactic information.
    Page 8, “Experiments”
  5. When articles from Britannica Student are compared to articles extracted from Encyclopedia Britannica, Table 7 shows that the different parameters have the same influence as for comparing between Encyclopedia Britannica and Britannica Elementary: statistically significant improvement with syntactic information, higher values when distance is taken into account, etc.
    Page 8, “Experiments”

CoNLL

Appears in 4 sentences as: CoNLL (4)
In Graph-based Local Coherence Modeling
  1. To do so, we use one of the top performing systems from the CoNLL 2012 shared task (Martschat et al., 2012).
    Page 4, “Experiments”
  2. These two tasks were performed on documents extracted from the English test part of the CoNLL 2012 shared task (Pradhan et al., 2012).
    Page 5, “Experiments”
  3. The coreference resolution system used performs well on the CoNLL 2012 data.
    Page 5, “Experiments”
  4. The system was trained on the English training part of the CoNLL 2012 shared task filtered in the same way as the test part.
    Page 5, “Experiments”

significant improvement

Appears in 4 sentences as: significant improvement (3) significantly improved (1)
In Graph-based Local Coherence Modeling
  1. The best results, which present a statistically significant improvement when compared to the random baseline, are obtained when distance information and the number of entities “shared” by two sentences are taken into account (PW).
    Page 6, “Experiments”
  2. A statistically significant improvement is provided by including syntactic information.
    Page 8, “Experiments”
  3. Finally, when distance is accounted for together with syntactic information, the accuracy is significantly improved (p < 0.01) with regard to the results obtained with syntactic information only.
    Page 8, “Experiments”
  4. When articles from Britannica Student are compared to articles extracted from Encyclopedia Britannica, Table 7 shows that the different parameters have the same influence as for comparing between Encyclopedia Britannica and Britannica Elementary: statistically significant improvement with syntactic information, higher values when distance is taken into account, etc.
    Page 8, “Experiments”

feature vectors

Appears in 3 sentences as: feature vectors (3)
In Graph-based Local Coherence Modeling
  1. To make this representation accessible to machine learning algorithms, Barzilay and Lapata (2008) compute for each document the probability of each transition and generate feature vectors representing the sentences (a minimal sketch follows this list).
    Page 2, “The Entity Grid Model”
  2. Lin et al. (2011) use discourse relations to transform the entity grid representation into a discourse role matrix that is used to generate feature vectors for machine learning algorithms similarly to Barzilay and Lapata (2008).
    Page 2, “The Entity Grid Model”
  3. A fundamental assumption underlying our model is that this bipartite graph contains the entity transition information needed for local coherence computation, rendering feature vectors and learning phase unnecessary.
    Page 3, “Method”
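
For concreteness, here is a minimal sketch of the transition-probability features described in sentence 1 above: for each ordered pair of roles, count how often an entity takes the first role in one sentence and the second role in the next, then normalise by the total number of transitions. The role inventory, the restriction to transitions of length two, and the toy grid are assumptions made for illustration.

```python
from itertools import product

# Sketch of transition-probability features over an entity grid: for every
# role pair (e.g. S -> O), count how often an entity moves from the first
# role in one sentence to the second role in the next sentence, then
# normalise by the total number of transitions.  The toy grid is invented.
ROLES = ["S", "O", "X", "-"]

toy_grid = [            # rows = sentences, columns = entities
    ["S", "O", "-"],
    ["O", "-", "S"],
    ["S", "-", "-"],
]

def transition_features(grid):
    """Return a feature vector of length-2 transition probabilities."""
    counts = {t: 0 for t in product(ROLES, repeat=2)}
    total = 0
    for prev_row, next_row in zip(grid, grid[1:]):
        for prev_role, next_role in zip(prev_row, next_row):
            counts[(prev_role, next_role)] += 1
            total += 1
    return [counts[t] / total for t in sorted(counts)]

print(transition_features(toy_grid))  # 16 probabilities summing to 1.0
```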

learning algorithms

Appears in 3 sentences as: learning algorithms (3)
In Graph-based Local Coherence Modeling
  1. From this grid Barzilay and Lapata (2008) derive probabilities of transitions between adjacent sentences which are used as features for machine learning algorithms.
    Page 1, “Introduction”
  2. To make this representation accessible to machine learning algorithms, Barzilay and Lapata (2008) compute for each document the probability of each transition and generate feature vectors representing the sentences.
    Page 2, “The Entity Grid Model”
  3. Lin et al. (2011) use discourse relations to transform the entity grid representation into a discourse role matrix that is used to generate feature vectors for machine learning algorithms similarly to Barzilay and Lapata (2008).
    Page 2, “The Entity Grid Model”

machine learning

Appears in 3 sentences as: machine learning (3)
In Graph-based Local Coherence Modeling
  1. From this grid Barzilay and Lapata (2008) derive probabilities of transitions between adjacent sentences which are used as features for machine learning algorithms.
    Page 1, “Introduction”
  2. To make this representation accessible to machine learning algorithms, Barzilay and Lapata (2008) compute for each document the probability of each transition and generate feature vectors representing the sentences.
    Page 2, “The Entity Grid Model”
  3. Lin et al. (2011) use discourse relations to transform the entity grid representation into a discourse role matrix that is used to generate feature vectors for machine learning algorithms similarly to Barzilay and Lapata (2008).
    Page 2, “The Entity Grid Model”

shared task

Appears in 3 sentences as: shared task (3)
In Graph-based Local Coherence Modeling
  1. To do so, we use one of the top performing systems from the CoNLL 2012 shared task (Martschat et al., 2012).
    Page 4, “Experiments”
  2. These two tasks were performed on documents extracted from the English test part of the CoNLL 2012 shared task (Pradhan et al., 2012).
    Page 5, “Experiments”
  3. The system was trained on the English training part of the CoNLL 2012 shared task filtered in the same way as the test part.
    Page 5, “Experiments”
