Extracting Sequences from the Web
Anthony Fader, Stephen Soderland, and Oren Etzioni

Abstract

Classical Information Extraction (IE) systems fill slots in domain-specific frames.

Introduction

Classical IE systems fill slots in domain-specific frames such as the time and location slots in seminar announcements (Freitag, 2000) or the terrorist organization slot in news stories (Chieu et al., 2003).

The SEQ System

Sequence extraction has two parts: identifying possible extractions $(a_t, k, s)$ from text, and then classifying those extractions as either correct or incorrect.
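The two stages can be illustrated with a small sketch. Everything concrete in it is an assumption: the single regular expression, the `Extraction` tuple (whose fields mirror the item, position `k`, and sequence `s` of the extraction tuple above), and the majority-vote rule in `classify` are hypothetical stand-ins, not SEQ's actual extraction patterns or classifier. The vote only shows how one sequence regularity, "each position holds a single item", can separate correct from incorrect candidates.

```python
import re
from collections import Counter
from typing import NamedTuple

# Hypothetical pattern: "<Item> was the <k>th <sequence name>."
# One illustrative pattern only; not SEQ's extraction rules.
PATTERN = re.compile(
    r"([A-Z][\w.]*(?: [A-Z][\w.]*)*) was the (\d+)(?:st|nd|rd|th) ([A-Za-z ]+?)[.,]"
)


class Extraction(NamedTuple):
    item: str  # candidate item
    k: int     # claimed position of the item in the sequence
    seq: str   # normalized sequence name s


def identify(sentences):
    """Stage 1: collect every candidate (item, k, s) tuple found in the text."""
    out = []
    for sent in sentences:
        for m in PATTERN.finditer(sent):
            out.append(Extraction(m.group(1), int(m.group(2)), m.group(3).strip().lower()))
    return out


def classify(candidates):
    """Stage 2: label each distinct candidate as correct (True) or incorrect (False).

    Stand-in heuristic (an assumption): a position k in sequence s holds exactly
    one item, so only the most frequently extracted item for each (k, s) slot is
    labeled correct; on a tie, the first candidate seen wins.
    """
    counts = Counter(candidates)
    best = {}
    for cand, n in counts.items():  # Counter preserves first-seen order
        slot = (cand.k, cand.seq)
        if slot not in best or n > counts[best[slot]]:
            best[slot] = cand
    winners = set(best.values())
    return {cand: cand in winners for cand in counts}


if __name__ == "__main__":
    text = [
        "George Washington was the 1st President of the United States.",
        "John Adams was the 2nd President of the United States.",
        "Thomas Jefferson was the 2nd President of the United States.",
        "John Adams was the 2nd President of the United States.",
    ]
    for cand, ok in classify(identify(text)).items():
        print("correct  " if ok else "incorrect", cand)
```

Majority voting is chosen here only for brevity; it is not meant to reproduce the classifier used by SEQ.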

Experimental Results

This section reports on two experiments.

Related Work

There has been extensive work in extracting lists or sets of entities from the Web.

Conclusions

We have demonstrated that an extractor leveraging sequence regularities can greatly outperform extractors without this knowledge.
