Analysis | Although a few researchers have presented a sample of their systems’ output and the corresponding gold keyphrases to show the differences between them (Witten et al., 1999; Nguyen and Kan, 2007; Medelyan et al., 2009), a systematic analysis of the major types of errors made by state-of-the-art keyphrase extraction systems is missing. |
Analysis | To fill this gap, we ran four keyphrase extraction systems on four commonly-used datasets of varying sources, including Inspec abstracts (Hulth, 2003), DUC-2001 news articles (Over, 2001), scientific papers (Kim et al., 2010b), and meeting transcripts (Liu et al., 2009a). |
Corpora | Automatic keyphrase extraction systems have been evaluated on corpora from a variety of |
Evaluation | In this section, we describe metrics for evaluating keyphrase extraction systems as well as state-of-the-art results on commonly-used datasets. |
Evaluation | To score the output of a keyphrase extraction system, the typical approach, which is also adopted by the SemEval-2010 shared task on keyphrase extraction, is (1) to create a mapping between the keyphrases in the gold standard and those in the system output using exact match, and then (2) to score the output using evaluation metrics such as precision (P), recall (R), and F-score (F). |
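The two-step scoring procedure described in the row above can be sketched as follows. This is a minimal illustration, not the official SemEval-2010 scorer: the function names `normalize` and `score_keyphrases` are hypothetical, and the normalization used here (lowercasing and whitespace collapsing, with no stemming) is an assumption.

```python
def normalize(phrase):
    # Assumed normalization: lowercase and collapse runs of whitespace.
    return " ".join(phrase.lower().split())


def score_keyphrases(gold, predicted):
    """Return (precision, recall, F-score) under exact match."""
    gold_set = {normalize(p) for p in gold}
    pred_set = {normalize(p) for p in predicted}

    # Step 1: map system output to gold keyphrases by exact match.
    matched = gold_set & pred_set

    # Step 2: score the mapping.
    precision = len(matched) / len(pred_set) if pred_set else 0.0
    recall = len(matched) / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


if __name__ == "__main__":
    gold = ["keyphrase extraction", "graph ranking", "news articles"]
    predicted = ["Keyphrase  Extraction", "neural network"]
    print(score_keyphrases(gold, predicted))
```

Because only identical strings count as matches, a predicted phrase that differs from a gold phrase by a single word contributes nothing to either precision or recall.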
Keyphrase Extraction Approaches | A keyphrase extraction system typically operates in two steps: (1) extracting a list of words/phrases that serve as candidate keyphrases using some |
Experimental Setup | Moreover, we compare our abstractive system with the first part of our framework (utterance extraction in Figure 1), which can be presented as an extractive query-based summarization system (our extractive system). |
Experimental Setup | We also show the results of the version we use in our pipeline (our pipeline extractive system). |
Experimental Setup | In contrast, in the stand-alone version (extractive system) we limit the number of retrieved sentences to the desired length of the summary. |
Abstract | However, it is not straightforward to adapt the existing event extraction systems since texts in social media are fragmented and noisy. |
Experiments | In this section, we first describe the Twitter corpus used in our experiments and then present how we build a baseline based on the previously proposed TwiCal system (Ritter et al., 2012), the state-of-the-art open event extraction system on tweets. |
Experiments | To evaluate the performance of the proposed approach, we use precision, recall, and F-measure as in general information extraction systems (Makhoul et al., 1999). |
Experiments | It can be observed from Table 2 that by using NW-NER, the performance of the event extraction system is improved significantly, by 7.5% and 3% in F-measure when evaluated on 3-tuples (without keywords) and 4-tuples (with keywords), respectively. |
Introduction | We have conducted experiments on a Twitter corpus and the results show that our proposed approach outperforms TwiCal, the state-of-the-art open event extraction system, by 7.7% in F-measure. |
Introduction | Instead, it can be more cost-effective to adapt an existing relation extraction system to the new domain using a small set of labeled data. |
Introduction | This paper considers relation adaptation, where a relation extraction system trained on many source domains is adapted to a new target domain. |
Introduction | There are at least three challenges when adapting a relation extraction system to a new domain. |
Related Work | Among these studies, Plank and Moschitti’s is the closest to ours because it adapts relation extraction systems to new domains. |
Abstract | In order to extract entities of a fine-grained category from semistructured data in web pages, existing information extraction systems rely on seed examples or redundancy across multiple web pages. |
Approach | We represent each web page w as a DOM tree, a common representation among wrapper induction and web information extraction systems (Sahuguet and Azavant, 1999; Liu et al., 2000; Crescenzi et al., 2001). |
Approach | In the literature, many information extraction systems employ more versatile extraction predicates (Wang and Cohen, 2009; Fumarola et al., 2011). |
Discussion | In contrast to information extraction systems that extract homogeneous records from web pages (Liu et al., 2003; Zheng et al., 2009), our system must choose the correct field from each record and also identify the relevant part of the page based on the query. |
Identifying Key Medical Relations | The first step in building a relation extraction system for the medical domain is to identify the relations that are important for clinical decision making. |
Introduction | To construct a medical relation extraction system, several challenges have to be addressed: |
Introduction | The medical corpus underlying our relation extraction system contains 80M sentences (11 gigabytes of pure text). |