Labeling Documents with Timestamps: Learning from their Time Expressions
Chambers, Nathanael

Article Structure

Abstract

Temporal reasoners for document understanding typically assume that a document’s creation date is known.

Introduction

This paper addresses a relatively new task in the NLP community: automatic document dating.

Previous Work

Most work on dating documents has come from the IR and knowledge management communities interested in dating documents with unknown origins.

Timestamp Classifiers

Labeling documents with timestamps is similar to topic classification, but instead of choosing among topics, we choose the most likely year (or other granularity) in which the document was written.

Learning Time Constraints

This section departs from the above document classifiers and instead classifies individual year mentions.

Datasets

This paper uses the New York Times section of the Gigaword Corpus (Graff, 2002) for evaluation.

Experiments and Results

We experiment on the Gigaword corpus as described in Section 5.

Discussion

The main contribution of this paper is the discriminative model (54% improvement) and a new set of …

Topics

MaxEnt

Appears in 15 sentences as: MaxEnt (17)
In Labeling Documents with Timestamps: Learning from their Time Expressions
  1. We used a MaxEnt model and evaluated with the same filtering methods based on POS tags and tf-idf scores.
    Page 3, “Timestamp Classifiers”
  2. Ultimately, this MaxEnt model vastly outperforms these NLLR models.
    Page 3, “Timestamp Classifiers”
  3. The above language modeling and MaxEnt approaches are token-based classifiers that one could apply to any topic classification domain.
    Page 3, “Timestamp Classifiers”
  4. We then train a MaxEnt classifier and compare against previous work.
    Page 4, “Timestamp Classifiers”
  5. Figure 2: Distribution over years for a single document as output by a MaxEnt classifier.
    Page 5, “Learning Time Constraints”
  6. We train a MaxEnt model on each year mention, to be described next.
    Page 5, “Learning Time Constraints”
  7. We use a MaxEnt classifier trained on the individual year mentions.
    Page 5, “Learning Time Constraints”
  8. Let T_d be the set of mentions in document d. We represent a MaxEnt classifier by P_y(R|t) for a time mention t ∈ T_d and possible relations R. We map this distribution over relations to a distribution over years by defining P_year(Y|d):
    Page 6, “Learning Time Constraints”
  9. The MaxEnt classifiers are also from the Stanford toolkit, and both the document and year mention classifiers use its default settings (quadratic prior).
    Page 7, “Experiments and Results”
  10. MaxEnt Unigram is our new discriminative model for this task.
    Page 7, “Experiments and Results”
  11. MaxEnt Time is the discriminative model with rich time features (but not NER) as described in Section 3.3.2 (Time+NER includes NER).
    Page 7, “Experiments and Results”
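
As a rough illustration of the document-level classifier these sentences describe, here is a minimal sketch of a MaxEnt (multinomial logistic regression) year classifier over lowercased unigrams. The paper uses the Stanford classifier toolkit with a quadratic prior; scikit-learn stands in here, and the toy documents, years, and settings are illustrative assumptions, not the paper's data.

```python
# Minimal sketch of a MaxEnt (multinomial logistic regression) document
# dater over lowercased unigrams. Toy data; the paper itself uses the
# Stanford classifier toolkit, not scikit-learn.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical training documents paired with their creation years.
docs = [
    "the senate debated the budget in washington",
    "the new millennium celebrations filled times square",
    "troops were deployed after the senate vote",
]
years = [1997, 2000, 1997]

vectorizer = CountVectorizer(lowercase=True)  # lowercased unigram features
X = vectorizer.fit_transform(docs)

# L2 regularization plays the role of the "quadratic prior".
clf = LogisticRegression(max_iter=1000)
clf.fit(X, years)

# Distribution over years for a new document, as in Figure 2.
probs = clf.predict_proba(vectorizer.transform(["the senate budget vote"]))
for year, p in zip(clf.classes_, probs[0]):
    print(year, round(float(p), 3))
```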


unigrams

Appears in 15 sentences as: Unigram (5) unigram (3) Unigrams (1) unigrams (7)
In Labeling Documents with Timestamps: Learning from their Time Expressions
  1. They learned unigram language models (LMs) for specific time periods and scored articles with log-likelihood ratio scores.
    Page 2, “Previous Work”
  2. Kanhabua and Nørvåg (2008; 2009) extended this approach with the same model, but expanded its unigrams with POS tags, collocations, and tf-idf scores.
    Page 2, “Previous Work”
  3. As above, they learned unigram LMs, but instead measured the KL-divergence between a document and a time period’s LM.
    Page 2, “Previous Work”
  4. The unigrams w are lowercased tokens.
    Page 3, “Timestamp Classifiers”
  5. … model as the Unigram NLLR.
    Page 3, “Timestamp Classifiers”
  6. Follow-up work by Kanhabua and Nørvåg (2008) applied two filtering techniques to the unigrams in the model:
    Page 3, “Timestamp Classifiers”
  7. We compare the NER features against the Unigram and Filtered NLLR models in our final experiments.
    Page 3, “Timestamp Classifiers”
  8. The final results (Section 6) only use the lowercased unigrams.
    Page 3, “Timestamp Classifiers”
  9. Barring other knowledge, the learners solely rely on the observed frequencies of unigrams in order to decide which class is most likely.
    Page 3, “Timestamp Classifiers”
  10. Previous work on document dating does not integrate this information except to include the unigram ‘2000’ in the model.
    Page 3, “Timestamp Classifiers”
  11. We extract NER features as sequences of uninterrupted tokens labeled with the same NER tag, ignoring unigrams (since unigrams are already included in the base model).
    Page 4, “Timestamp Classifiers”
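
Sentences 1–5 refer to the normalized log-likelihood ratio (NLLR) scoring of previous work, which dates a document by comparing its unigram distribution to each year's language model relative to a background corpus model. The sketch below is one common formulation of that score; the exact normalization in de Jong et al.'s model may differ, and the toy language models are assumptions.

```python
import math
from collections import Counter

def nllr_score(doc_tokens, year_lm, corpus_lm):
    """Normalized log-likelihood ratio of a document under one year's LM:
    NLLR(d, y) = sum_w P(w|d) * log(P(w|y) / P(w|C)).
    year_lm and corpus_lm map word -> smoothed probability."""
    counts = Counter(doc_tokens)
    total = sum(counts.values())
    score = 0.0
    for w, c in counts.items():
        # Assumes both LMs are smoothed so no probability is zero.
        score += (c / total) * math.log(year_lm[w] / corpus_lm[w])
    return score

# Hypothetical smoothed LMs; the most likely year maximizes the score.
year_lm = {"euro": 0.020, "vote": 0.010, "the": 0.050}
corpus_lm = {"euro": 0.005, "vote": 0.010, "the": 0.050}
print(nllr_score(["the", "euro", "vote"], year_lm, corpus_lm))
```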


NER

Appears in 7 sentences as: NER (9)
In Labeling Documents with Timestamps: Learning from their Time Expressions
  1. However, we instead propose using NER labels to extract what may have counted as collocations in their data.
    Page 3, “Timestamp Classifiers”
  2. We compare the NER features against the Unigram and Filtered NLLR models in our final experiments.
    Page 3, “Timestamp Classifiers”
  3. We use the freely available Stanford Parser and NER system to generate the syntactic interpretation for these features.
    Page 4, “Timestamp Classifiers”
  4. Named Entities: Although not directly related to time expressions, we also include n-grams of tokens that are labeled by an NER system using Person, Organization, or Location.
    Page 4, “Timestamp Classifiers”
  5. We extract NER features as sequences of uninterrupted tokens labeled with the same NER tag, ignoring unigrams (since unigrams are already included in the base model).
    Page 4, “Timestamp Classifiers”
  6. MaxEnt Time is the discriminative model with rich time features (but not NER) as described in Section 3.3.2 (Time+NER includes NER).
    Page 7, “Experiments and Results”
  7. … 7%, and adding NER by another 6%.
    Page 8, “Experiments and Results”
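
Sentence 5 describes the NER features as maximal runs of tokens sharing the same NER tag, with single-token runs dropped because unigrams are already in the base model. A minimal sketch of that grouping, assuming (token, tag) pairs from a tagger such as Stanford NER; the feature-string format is an assumption.

```python
from itertools import groupby

def ner_ngram_features(tagged_tokens):
    """Turn maximal runs of same-tag tokens into features, ignoring
    unigrams since those are already in the base model.
    tagged_tokens: list of (token, tag), tag in
    {"PERSON", "ORGANIZATION", "LOCATION", "O"}."""
    features = []
    for tag, run in groupby(tagged_tokens, key=lambda pair: pair[1]):
        tokens = [tok for tok, _ in run]
        if tag != "O" and len(tokens) > 1:
            features.append(tag + ":" + "_".join(tokens))
    return features

# Hypothetical tagger output for "Bill Clinton visited New York".
tagged = [("Bill", "PERSON"), ("Clinton", "PERSON"), ("visited", "O"),
          ("New", "LOCATION"), ("York", "LOCATION")]
print(ner_ngram_features(tagged))
# ['PERSON:Bill_Clinton', 'LOCATION:New_York']
```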


development set

Appears in 5 sentences as: Development set (1) development set (4)
In Labeling Documents with Timestamps: Learning from their Time Expressions
  1. Figure 4: Development set accuracy and λ values.
    Page 6, “Learning Time Constraints”
  2. In other words, the development set includes documents from July 1995, July 1996, July 1997, etc.
    Page 7, “Datasets”
  3. The development set includes 7,300 from July of each year.
    Page 7, “Datasets”
  4. The λ factor in the joint classifier is optimized on the development set as described in Section 4.3.
    Page 7, “Experiments and Results”
  5. The features described in this paper were selected solely by studying performance on the development set.
    Page 7, “Experiments and Results”
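
Sentence 4 says the interpolation weight λ of the joint classifier is tuned on the development set. Below is a sketch of that tuning loop under the linear-interpolation reading of Section 4.3 (see the "Joint model" entry below); the grid of λ values and the accuracy criterion are assumptions.

```python
def tune_lambda(dev_docs, gold_years, p_doc, p_year, grid=None):
    """Pick the interpolation weight by development-set accuracy.
    p_doc(d) and p_year(d) each return {year: probability} from the
    document classifier and the year-mention classifier, respectively."""
    grid = grid if grid is not None else [i / 10 for i in range(11)]
    best_lam, best_acc = 0.0, -1.0
    for lam in grid:
        correct = 0
        for d, gold in zip(dev_docs, gold_years):
            pd, py = p_doc(d), p_year(d)
            # Interpolate the two distributions over candidate years.
            scores = {y: lam * pd.get(y, 0.0) + (1 - lam) * py.get(y, 0.0)
                      for y in set(pd) | set(py)}
            if max(scores, key=scores.get) == gold:
                correct += 1
        acc = correct / len(dev_docs)
        if acc > best_acc:
            best_lam, best_acc = lam, acc
    return best_lam, best_acc
```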


language models

Appears in 5 sentences as: language model (1) language modeling (1) Language Models (1) language models (2)
In Labeling Documents with Timestamps: Learning from their Time Expressions
  1. They learned unigram language models (LMs) for specific time periods and scored articles with log-likelihood ratio scores.
    Page 2, “Previous Work”
  2. 3.1 Language Models
    Page 2, “Timestamp Classifiers”
  3. We apply Dirichlet-smoothing to the language models (as in de Jong et al. (2005)) …
    Page 3, “Timestamp Classifiers”
  4. The above language modeling and MaxEnt approaches are token-based classifiers that one could apply to any topic classification domain.
    Page 3, “Timestamp Classifiers”
  5. Unigram NLLR and Filtered NLLR are the language model implementations of previous work as described in Section 3.1.
    Page 7, “Experiments and Results”
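
Sentence 3 mentions Dirichlet smoothing of the per-year language models. A standard Dirichlet-smoothed unigram estimate is sketched below; the pseudo-count μ and the toy data are arbitrary illustrations, not the paper's settings.

```python
from collections import Counter

def dirichlet_lm(year_tokens, corpus_probs, mu=2000.0):
    """Dirichlet-smoothed unigram LM for one year:
    P(w|y) = (count(w, y) + mu * P(w|C)) / (len(y) + mu)."""
    counts = Counter(year_tokens)
    n = len(year_tokens)
    def prob(w):
        return (counts[w] + mu * corpus_probs.get(w, 0.0)) / (n + mu)
    return prob

# Hypothetical background probabilities estimated from the whole corpus.
corpus_probs = {"euro": 0.001, "vote": 0.002}
p = dirichlet_lm(["euro", "vote", "euro"], corpus_probs)
print(p("euro"), p("vote"))
```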


POS tags

Appears in 5 sentences as: POS tagger (1) POS tags (4)
In Labeling Documents with Timestamps: Learning from their Time Expressions
  1. Kanhabua and Nørvåg (2008; 2009) extended this approach with the same model, but expanded its unigrams with POS tags, collocations, and tf-idf scores.
    Page 2, “Previous Work”
  2. Word Classes: include only nouns, verbs, and adjectives as labeled by a POS tagger.
    Page 3, “Timestamp Classifiers”
  3. We used a MaxEnt model and evaluated with the same filtering methods based on POS tags and tf-idf scores.
    Page 3, “Timestamp Classifiers”
  4. Typed Dependency POS: Similar to Typed Dependency, this feature uses POS tags of the dependency relation’s governor.
    Page 4, “Timestamp Classifiers”
  5. n-gram POS: The 4-gram and 3-gram of POS tags that end with the year.
    Page 6, “Learning Time Constraints”
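
Sentence 5 lists 3-gram and 4-gram POS features that end at the year mention. A minimal sketch of extracting those windows; the feature-naming scheme is an assumption, and the tags follow Penn Treebank conventions.

```python
def pos_ngram_features(pos_tags, year_index):
    """The 4-gram and 3-gram of POS tags that end at the year mention."""
    feats = []
    for n in (3, 4):
        start = year_index - n + 1
        if start >= 0:  # skip windows that run off the sentence start
            feats.append(f"pos{n}:" + "_".join(pos_tags[start:year_index + 1]))
    return feats

# "The treaty was signed in 1997": the year is the CD tag at index 5.
tags = ["DT", "NN", "VBD", "VBN", "IN", "CD"]
print(pos_ngram_features(tags, 5))
# ['pos3:VBN_IN_CD', 'pos4:VBD_VBN_IN_CD']
```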


fine-grained

Appears in 4 sentences as: fine-grained (4)
In Labeling Documents with Timestamps: Learning from their Time Expressions
  1. We also experiment with 7 fine-grained relations:
    Page 5, “Learning Time Constraints”
  2. Obviously the more fine-grained a relation, the better it can inform a classifier.
    Page 5, “Learning Time Constraints”
  3. We use a similar function for the seven fine-grained relations.
    Page 6, “Learning Time Constraints”
  4. Not surprisingly, the fine-grained performance is quite a bit lower than the core relations.
    Page 8, “Experiments and Results”
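
These sentences, together with sentence 8 under "MaxEnt", describe mapping each year mention's distribution over relations to a distribution over document years, P_year(Y|d). The sketch below does this for three assumed core relations (before / same / after); the paper's seven fine-grained relations refine these, and the consistency convention used here is an assumption.

```python
def p_year_from_mentions(mention_preds, candidate_years):
    """Aggregate per-mention relation distributions into P(year | doc).
    mention_preds: list of (mention_value, {relation: probability}),
    where the relation describes the mentioned year relative to the
    document's unknown creation year."""
    def consistent(doc_year, value, rel):
        # Assumed convention: "before" = the mentioned year precedes
        # the document year, "after" = it follows it.
        return ((rel == "before" and value < doc_year) or
                (rel == "same" and value == doc_year) or
                (rel == "after" and value > doc_year))

    scores = {}
    for y in candidate_years:
        total = 0.0
        for value, rel_probs in mention_preds:
            total += sum(p for rel, p in rel_probs.items()
                         if consistent(y, value, rel))
        scores[y] = total
    z = sum(scores.values()) or 1.0  # normalize to a distribution
    return {y: s / z for y, s in scores.items()}

# One mention of "1995" that the classifier thinks precedes the document.
preds = [(1995, {"before": 0.7, "same": 0.2, "after": 0.1})]
print(p_year_from_mentions(preds, range(1994, 1998)))
```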


Joint model

Appears in 4 sentences as: Joint model (3) joint model (1)
In Labeling Documents with Timestamps: Learning from their Time Expressions
  1. Finally, given the document classifiers of Section 3 and the constraint classifier just defined in Section 4, we create a joint model combining the two with the following linear interpolation:
    Page 6, “Learning Time Constraints”
  2. Finally, the Joint model is the combined document and year mention classifiers as described in Section 4.3.
    Page 7, “Experiments and Results”
  3. Table 4 shows the F1 scores of the Joint model by year.
    Page 7, “Experiments and Results”
  4. Table 4: Yearly results for the Joint model.
    Page 8, “Experiments and Results”
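
Sentence 1 defines the joint model as a linear interpolation of the document classifier and the year-mention classifier. A direct sketch of that combination, with λ tuned on the development set as described above; the toy distributions are illustrative.

```python
def joint_predict(p_doc, p_year, lam):
    """Linear interpolation over years:
    P(y|d) = lam * P_doc(y|d) + (1 - lam) * P_year(y|d)."""
    years = set(p_doc) | set(p_year)
    return {y: lam * p_doc.get(y, 0.0) + (1 - lam) * p_year.get(y, 0.0)
            for y in years}

# Toy distributions from the two component classifiers.
p_doc = {1996: 0.6, 1997: 0.4}
p_year = {1996: 0.3, 1997: 0.7}
combined = joint_predict(p_doc, p_year, lam=0.5)
print(max(combined, key=combined.get))  # most likely year
```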


generative models

Appears in 3 sentences as: generative model (1) generative models (2)
In Labeling Documents with Timestamps: Learning from their Time Expressions
  1. This model alone improves on previous generative models by 77%.
    Page 1, “Abstract”
  2. Our model outperforms the generative models of previous work by 77%.
    Page 2, “Introduction”
  3. We thus begin with a bag-of-words approach, reproducing the generative model used by both de Jong (2005) and Kanhabua and Nørvåg (2008; 2009).
    Page 2, “Timestamp Classifiers”


Named Entities

Appears in 3 sentences as: Named Entities (1) Named entities (1) named entity (1)
In Labeling Documents with Timestamps: Learning from their Time Expressions
  1. Named entities are important to document dating due to the nature of people and places coming in and out of the news at precise moments in time.
    Page 3, “Timestamp Classifiers”
  2. Named Entities: Although not directly related to time expressions, we also include n-grams of tokens that are labeled by an NER system using Person, Organization, or Location.
    Page 4, “Timestamp Classifiers”
  3. Collecting named entity mentions will differentiate between an article discussing a bill and one discussing the US President, Bill Clinton.
    Page 4, “Timestamp Classifiers”
