Towards Robust Multi-Tool Tagging. An OWL/DL-Based Approach
Chiarcos, Christian

Article Structure

Abstract

This paper describes a series of experiments to test the hypothesis that the parallel application of multiple NLP tools and the integration of their results improves the correctness and robustness of the resulting analysis.

Motivation and overview

NLP systems for higher-level operations or complex annotations often integrate redundant modules that provide alternative analyses for the same linguistic phenomenon. This lets them benefit from the modules' respective strengths and compensate for their respective weaknesses, e.g., in parsing (Crysmann et al., 2002) or in machine translation (Carl et al., 2000).

Ontologies and annotations

Various repositories of linguistic annotation terminology have been developed over the last decades. They range from early texts on annotation standards (Bakker et al., 1993; Leech and Wilson, 1996) through relational database models (Bickel and Nichols, 2000; Bickel and Nichols, 2002) to more recent formalizations in OWL/RDF (or with OWL/RDF export), e.g., the General Ontology of Linguistic Description (Farrar and Langendoen, 2003, GOLD), the ISO TC37/SC4 Data Category Registry (Ide and Romary, 2004; Kemps-

Processing linguistic annotations

3.1 Evaluation setup

Evaluation

Six experiments were conducted to evaluate the prediction of word classes and morphological features on parts of three corpora of German newspaper articles: NEGRA (Skut et al., 1998), TIGER (Brants et al., 2002), and the Potsdam Commentary Corpus (Stede, 2004, PCC).

Summary and Discussion

With the ontology-based approach described in this paper, the performance of annotation tools can be evaluated on a conceptual basis rather than by means of a string comparison with target annotations.
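The contrast between conceptual and string-based evaluation can be illustrated with a small sketch. It assumes that each tag is mapped to a set of ontological descriptions and scores a prediction by the descriptions it shares with the gold annotation. The STTS tags `NN`, `NE`, `ADJA`, and `ADJD` are real, but the description names, the mapping `TAG_TO_DESCRIPTIONS`, and the functions `string_accuracy` and `conceptual_scores` are hypothetical illustrations, not the paper's ontology or evaluation code.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical mapping from STTS tags to sets of ontological descriptions.
# The description names are illustrative, not taken from the paper's ontology.
TAG_TO_DESCRIPTIONS: Dict[str, FrozenSet[str]] = {
    "NN": frozenset({"Noun", "CommonNoun"}),
    "NE": frozenset({"Noun", "ProperNoun"}),
    "ADJA": frozenset({"Adjective", "AttributiveAdjective"}),
    "ADJD": frozenset({"Adjective", "PredicativeAdjective"}),
}


def string_accuracy(pairs: List[Tuple[str, str]]) -> float:
    """Share of (predicted, gold) tag pairs that match as plain strings."""
    if not pairs:
        return 0.0
    return sum(pred == gold for pred, gold in pairs) / len(pairs)


def conceptual_scores(pairs: List[Tuple[str, str]]) -> Tuple[float, float]:
    """Micro-averaged precision and recall over ontological descriptions."""
    matched = predicted_total = gold_total = 0
    for pred, gold in pairs:
        p = TAG_TO_DESCRIPTIONS[pred]
        g = TAG_TO_DESCRIPTIONS[gold]
        matched += len(p & g)  # descriptions that prediction and gold share
        predicted_total += len(p)
        gold_total += len(g)
    precision = matched / predicted_total if predicted_total else 0.0
    recall = matched / gold_total if gold_total else 0.0
    return precision, recall


pairs = [("NE", "NN"), ("ADJA", "ADJA")]
print(string_accuracy(pairs))    # 0.5
print(conceptual_scores(pairs))  # (0.75, 0.75)
```

Under string comparison, confusing a proper noun (`NE`) with a common noun (`NN`) counts as a complete miss. At the conceptual level, the shared description `Noun` still counts as correct.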

Extensions and Related Research

Natural extensions of the approach described in this paper include:

Topics

morphological analyses

Appears in 5 sentences as: morphological analyses (4), morphological analyzer (1)
In Towards Robust Multi-Tool Tagging. An OWL/DL-Based Approach
  1. 2.2 Integrating different morphosyntactic and morphological analyses
    Page 4, “Ontologies and annotations”
  2. (i) Morphisto, a morphological analyzer without contextual disambiguation (Zielinski and Simon, 2008),
    Page 5, “Processing linguistic annotations”
  3. Morphisto, for example, generates alternative morphological analyses, so the disambiguation algorithm chooses randomly among them.
    Page 7, “Evaluation”
  4. (v) Integration with other ontological knowledge sources to improve the recall of morphosyntactic and morphological analyses (e.g., for disambiguating grammatical case).
    Page 9, “Extensions and Related Research”
  5. These observations provide further support for our conclusion that the ontology-based integration of morphosyntactic analyses enhances both the robustness and the level of detail of morphosyntactic and morphological analyses.
    Page 9, “Extensions and Related Research”
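The sentences above describe two points. A part-of-speech tagger and an analyzer without contextual disambiguation such as Morphisto can complement each other. Where the analyzer offers several alternatives, one of them is picked at random. The following minimal sketch shows both points under stated assumptions. The function `merge_with_alternatives`, the string encoding of descriptions, and the description `hasNumber=Singular` are hypothetical. The compatibility filter, which keeps only alternatives that contain every description the tagger proposes, is an assumption of this sketch and is not described in the text.

```python
import random
from typing import FrozenSet, List, Set


def merge_with_alternatives(
    tagger: FrozenSet[str],
    alternatives: List[FrozenSet[str]],
    rng: random.Random,
) -> Set[str]:
    """Merge a tagger's descriptions with one of an analyzer's alternatives.

    Assumption (hypothetical): an alternative is compatible if it contains
    every description the tagger proposes; if none is, all stay candidates.
    Without contextual disambiguation, one candidate is picked at random.
    """
    if not alternatives:
        return set(tagger)
    compatible = [alt for alt in alternatives if tagger <= alt]
    candidates = compatible or alternatives
    chosen = rng.choice(candidates)
    # The union keeps the tagger's word class and adds the analyzer's features,
    # so the result is more detailed than either tool's output alone.
    return set(tagger) | set(chosen)


tagger = frozenset({"Noun"})
alternatives = [
    frozenset({"Noun", "hasCase=Nominative", "hasNumber=Singular"}),
    frozenset({"Noun", "hasCase=Accusative", "hasNumber=Singular"}),
    frozenset({"Verb", "hasTense=Present"}),
]
result = merge_with_alternatives(tagger, alternatives, random.Random(0))
print(sorted(result))
```

The `Verb` reading is excluded by the tagger. The number feature is the same in both remaining alternatives. Only the case value depends on the random choice, which is the gap that contextual disambiguation (e.g., of grammatical case) would be meant to close.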

See all papers in Proc. ACL 2010 that mention morphological analyses.

confidence scores

Appears in 4 sentences as: confidence score (1), confidence scores (4)
In Towards Robust Multi-Tool Tagging. An OWL/DL-Based Approach
  1. 1: If a tool supports a description with its analysis, the confidence score is increased by 1 (or by 1/n if the tool proposes n alternative annotations).
    Page 6, “Processing linguistic annotations”
  2. The hasCase descriptions have identical confidence scores, so the first hasCase description that the algorithm encounters is chosen for the set of resulting descriptions; the other one is ruled out because the two are inconsistent.
    Page 6, “Processing linguistic annotations”
  3. (ii) Context-sensitive disambiguation of morphological features (e.g., by combination with a chunker and adjustment of confidence scores for morphological features over all tokens in the current chunk, cf.
    Page 8, “Extensions and Related Research”
  4. A novel feature of our approach as compared to existing applications of these methods is that confidence scores are not attached to plain strings, but to ontological descriptions: Tufis, for example, assigned confidence scores not to tools (as in a weighted majority vote), but rather, assessed the ‘credibility’ of a tool with respect to the predicted tag.
    Page 9, “Extensions and Related Research”
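The scoring rule and the tie-breaking behavior quoted above can be sketched as follows. Every description supported by a tool gains 1, or 1/n when the tool proposes n alternatives. Descriptions are then considered in descending order of score, and a later description that conflicts with an already chosen one is ruled out. This is a sketch under assumptions, not the paper's implementation. The (property, value) encoding, the property name `wordClass`, and the functions `score_descriptions` and `select_consistent` are hypothetical. Consistency is reduced to "at most one value per property", which stands in for the ontological consistency check.

```python
from fractions import Fraction
from typing import Dict, List, Set, Tuple

# Hypothetical encoding: a description is a (property, value) pair.
Description = Tuple[str, str]


def score_descriptions(
    tool_outputs: List[List[Set[Description]]],
) -> Dict[Description, Fraction]:
    """Add 1/n to every description in each of a tool's n alternatives."""
    scores: Dict[Description, Fraction] = {}
    for alternatives in tool_outputs:
        n = len(alternatives)
        for analysis in alternatives:
            # Sorting fixes the encounter order; set iteration order is arbitrary.
            for d in sorted(analysis):
                scores[d] = scores.get(d, Fraction(0)) + Fraction(1, n)
    return scores


def select_consistent(scores: Dict[Description, Fraction]) -> Dict[str, str]:
    """Keep the best-scored value per property and rule out conflicting ones.

    Assumption: every property is functional, so two values for the same
    property are inconsistent. sorted() is stable, so among equal scores
    the description encountered first wins.
    """
    chosen: Dict[str, str] = {}
    for (prop, value), _score in sorted(scores.items(), key=lambda kv: -kv[1]):
        if prop not in chosen:
            chosen[prop] = value
    return chosen


tools = [
    [{("wordClass", "Noun")}],  # tagger: a single analysis, weight 1
    [
        {("wordClass", "Noun"), ("hasCase", "Nominative")},  # analyzer: two
        {("wordClass", "Noun"), ("hasCase", "Accusative")},  # alternatives, 1/2 each
    ],
]
scores = score_descriptions(tools)
print(select_consistent(scores))  # {'wordClass': 'Noun', 'hasCase': 'Nominative'}
```

`Fraction` keeps scores such as 1/2 + 1/2 exact, so ties like the two hasCase values compare as truly equal rather than depending on floating-point rounding.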
