Temporal Information Processing of a New Language: Fast Porting with Minimal Resources
Costa, Francisco and Branco, António

Article Structure

Abstract

We describe the semiautomatic adaptation of a TimeML annotated corpus from English to Portuguese, a language for which no TimeML annotated data was yet available.

Introduction

Temporal information processing is a topic of natural language processing that has been boosted by recent evaluation campaigns such as TERN2004, TempEval-1 (Verhagen et al., 2007) and the forthcoming TempEval-2 (Pustejovsky and Verhagen, 2009).

Brief Description of the Annotations

Figure 1 contains an example of a document from the TempEval-1 corpus, which is similar to the TimeBank corpus (Pustejovsky et al., 2003).

Data Adaptation

We cleaned all TimeML markup from the TempEval-1 data, and the result was fed to the Google Translator Toolkit. This tool combines machine translation with a translation memory.
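A minimal sketch of this cleaning step, assuming the TimeML annotations appear as inline XML elements (element names such as EVENT, TIMEX3, and TLINK come from the TimeML standard; the helper function below is illustrative, not the authors' actual tooling):

```python
import re

# TimeML annotates events, temporal expressions, and links with inline
# XML elements; stripping the tags leaves plain text that can be fed
# to a machine translation tool.
TIMEML_TAG = re.compile(
    r"</?(?:EVENT|TIMEX3|SIGNAL|TLINK|SLINK|ALINK|MAKEINSTANCE)\b[^>]*>"
)

def strip_timeml(annotated: str) -> str:
    """Remove TimeML markup, keeping only the raw document text."""
    return TIMEML_TAG.sub("", annotated)

sample = ('John <EVENT eid="e1" class="OCCURRENCE">left</EVENT> on '
          '<TIMEX3 tid="t1" type="DATE" value="1998-03-22">Sunday</TIMEX3>.')
print(strip_timeml(sample))  # → John left on Sunday.
```

The annotations themselves are kept aside so they can be re-attached to the translated text afterwards.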

Data Description

The original English data for TempEval-1 are based on the TimeBank data and are split into one dataset for training and development and another for evaluation.

Comparing the two Datasets

One of the systems participating in the TempEval-1 competition, the USFD system (Hepple et al., 2007), implemented a very straightforward solution: it simply trained classifiers with Weka (Witten and Frank, 2005), using as attributes information that was readily available in the data and required no extra natural language processing (for all tasks, the attribute relType of <TLINK> elements is unknown and must be discovered, but all other information is given).
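This "lite" setup can be sketched as follows; scikit-learn stands in here for Weka, and the feature names and values are invented for illustration rather than taken from the actual corpus:

```python
# Sketch: predict the relType of each <TLINK> from attributes already
# present in the annotations (event class, tense, aspect, ...), with
# no extra NLP processing. The training examples below are invented.
from sklearn.feature_extraction import DictVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_features = [
    {"eventClass": "OCCURRENCE", "tense": "PAST", "aspect": "NONE"},
    {"eventClass": "REPORTING", "tense": "PRESENT", "aspect": "NONE"},
    {"eventClass": "OCCURRENCE", "tense": "FUTURE", "aspect": "NONE"},
]
train_labels = ["BEFORE", "OVERLAP", "AFTER"]  # TLINK relType values

# One-hot encode the symbolic attributes, then train a classifier.
model = make_pipeline(DictVectorizer(), MultinomialNB())
model.fit(train_features, train_labels)

prediction = model.predict(
    [{"eventClass": "OCCURRENCE", "tense": "PAST", "aspect": "NONE"}]
)
print(prediction[0])  # → BEFORE
```

The point of the approach is that the attribute values come straight from the annotated data, so the same pipeline can be retrained on the Portuguese corpus without any language-specific processing.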

Discussion

In this paper, we described the semiautomatic adaptation of a TimeML annotated corpus from English to Portuguese, a language for which no TimeML annotated data was yet available.

Topics

machine learning

Appears in 8 sentences as: machine learning (8)
In Temporal Information Processing of a New Language: Fast Porting with Minimal Resources
  1. Supervised machine learning approaches are pervasive in the tasks of temporal information processing.
    Page 1, “Introduction”
  2. Even when the best performing systems in these competitions are symbolic, there are machine learning solutions with results close to their performance.
    Page 1, “Introduction”
  3. In the TERN2004 competition (aimed at identifying and normalizing temporal expressions), a symbolic system performed best, but since then machine learning solutions, such as (Ahn et al., 2007), have appeared that obtain similar results.
    Page 1, “Introduction”
  4. The results of machine learning algorithms over the data thus obtained are compared to those reported for the English TempEval-1 competition.
    Page 1, “Introduction”
  5. The authors’ objectives were to see “whether a ‘lite’ approach of this kind could yield reasonable performance, before pursuing possibilities that relied on ‘deeper’ NLP analysis methods”, “which of the features would contribute positively to system performance” and “if any [machine learning] approach was better suited to the TempEval tasks”.
    Page 4, “Comparing the two Datasets”
  6. For us, the results of Hepple et al. (2007) are interesting, as they allow for a straightforward evaluation of our adaptation efforts: the same machine learning implementations can be used with the Portuguese data and the results then compared to theirs.
    Page 4, “Comparing the two Datasets”
  7. Table 2: Performance of several machine learning algorithms on the English TempEval-1 training data, with cross-validation.
    Page 5, “Comparing the two Datasets”
  8. Table 3: Performance of several machine learning algorithms on the Portuguese data for the TempEval-1 tasks.
    Page 6, “Comparing the two Datasets”


learning algorithms

Appears in 3 sentences as: learning algorithms (3)
In Temporal Information Processing of a New Language: Fast Porting with Minimal Resources
  1. The results of machine learning algorithms over the data thus obtained are compared to those reported for the English TempEval-1 competition.
    Page 1, “Introduction”
  2. Table 2: Performance of several machine learning algorithms on the English TempEval-1 training data, with cross-validation.
    Page 5, “Comparing the two Datasets”
  3. Table 3: Performance of several machine learning algorithms on the Portuguese data for the TempEval-1 tasks.
    Page 6, “Comparing the two Datasets”


SVM

Appears in 3 sentences as: SVM (3)
In Temporal Information Processing of a New Language: Fast Porting with Minimal Resources
  1. SMO is an implementation of Support Vector Machines ( SVM ), rules.JRip is the RIPPER algorithm, and bayes.NaiveBayes is a Naive Bayes classifier.
    Page 5, “Comparing the two Datasets”
  2. In task C, the SVM algorithm was the best performing among those that were also tried on the English data, but decision trees produced even better results here.
    Page 6, “Comparing the two Datasets”
  3. The results are: in task A the lazy.KStar classifier scored 58.6%, the SVM classifier scored 75.5% in task B, and trees.J48 scored 59.4% in task C.
    Page 6, “Comparing the two Datasets”
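The evaluation setup behind these comparisons can be sketched as follows; scikit-learn again stands in for Weka (whose SMO, rules.JRip, and bayes.NaiveBayes classes are named in the paper), and the dataset is a synthetic stand-in for the TempEval features:

```python
# Sketch: several classifiers compared by cross-validation on the same
# feature set, as in the paper's Tables 2 and 3. The data here is
# synthetic; in the paper the features come from the TimeML annotations.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

classifiers = {
    "SVM": SVC(),                                      # cf. Weka's SMO
    "DecisionTree": DecisionTreeClassifier(random_state=0),
    "NaiveBayes": GaussianNB(),                        # cf. bayes.NaiveBayes
}

# Mean accuracy over 10-fold cross-validation for each classifier.
scores = {name: cross_val_score(clf, X, y, cv=10).mean()
          for name, clf in classifiers.items()}
for name, acc in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {acc:.3f}")
```

Running the identical comparison on the English and the Portuguese feature sets is what lets the adapted corpus be evaluated against the published English results.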
