Analysis | To fill this gap, we ran four keyphrase extraction systems on four commonly used datasets of varying sources: Inspec abstracts (Hulth, 2003), DUC-2001 news articles (Over, 2001), scientific papers (Kim et al., 2010b), and meeting transcripts (Liu et al., 2009a). |
Analysis | To be more concrete, consider the news article on athlete Ben Johnson in Figure 1, where the keyphrases are boldfaced. |
Analysis | Figure 1: A news article on Ben Johnson from the DUC-2001 dataset. |
Corpora | Consequently, it is harder to extract keyphrases from scientific papers, technical reports, and meeting transcripts than from abstracts, emails, and news articles. |
Corpora | Topic change An observation commonly exploited in keyphrase extraction from scientific articles and news articles is that keyphrases typically appear not only at the beginning (Witten et al., 1999) but also at the end (Medelyan et al., 2009) of a document. |
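The position heuristic above can be sketched as a simple scoring function. This is a minimal illustration, not the exact formula of the cited systems; the function name and the both-ends weighting are assumptions.

```python
# Minimal sketch of the position heuristic: score a candidate phrase
# by how close its first occurrence is to the start or end of the
# document. All names and weights here are illustrative.

def position_score(doc_tokens, phrase_tokens):
    """Return a score in [0, 1]; higher when the phrase first occurs
    near the beginning or the end of the document, 0.0 if absent."""
    n, m = len(doc_tokens), len(phrase_tokens)
    for i in range(n - m + 1):
        if doc_tokens[i:i + m] == phrase_tokens:
            rel = i / max(n - m, 1)       # relative first position in [0, 1]
            return max(1.0 - rel, rel)    # favor both ends of the document
    return 0.0

doc = "keyphrase extraction is studied here we conclude keyphrase extraction matters".split()
print(position_score(doc, "keyphrase extraction".split()))  # first occurrence at the very start
```

Real systems typically combine such a position feature with frequency and other evidence rather than using it alone.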
Corpora | Topic correlation Another observation commonly exploited in keyphrase extraction from scientific articles and news articles is that the keyphrases in a document are typically related to each other (Turney, 2003; Mihalcea and Tarau, 2004). |
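The topic-correlation observation underlies graph-based rankers such as TextRank (Mihalcea and Tarau, 2004). A crude but runnable proxy, assuming candidates and sentence strings as input, is to rank candidates by how many other candidates they co-occur with:

```python
# Hedged sketch of the topic-correlation idea: build a co-occurrence
# graph over candidate phrases and rank each candidate by its degree,
# i.e. how many other candidates it shares a sentence with. This is a
# simplification of TextRank-style ranking, not the cited algorithm.
from collections import defaultdict
from itertools import combinations

def cooccurrence_degree(candidates, sentences):
    """Rank candidates by the number of distinct co-occurring candidates."""
    graph = defaultdict(set)
    for sent in sentences:
        present = [c for c in candidates if c in sent]
        for a, b in combinations(present, 2):
            graph[a].add(b)
            graph[b].add(a)
    return sorted(candidates, key=lambda c: len(graph[c]), reverse=True)

candidates = ["neural networks", "deep learning", "cooking"]
sents = ["deep learning with neural networks", "neural networks are popular"]
print(cooccurrence_degree(candidates, sents))  # "cooking" ranks last: no co-occurrences
```

TextRank proper iterates PageRank over this graph instead of using raw degree, but the intuition, that keyphrases reinforce each other through relatedness, is the same.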
Experiments | The other uses the traditional Stanford NER to extract named entities from news articles published in the same period and then performs fuzzy matching to identify named entities in tweets. |
Introduction | Previous work in event extraction has focused largely on news articles, as newswire texts have been the best source of information on current events (Hogenboom et al., 2011). |
Methodology | However, it is often observed that events mentioned in tweets are also reported in news articles in the same period (Petrovic et al., 2013). |
Methodology | Therefore, named entities mentioned in tweets are likely to appear in news articles as well. |
Methodology | First, a traditional NER tool such as the Stanford Named Entity Recognizer is used to identify named entities from the news articles crawled from BBC and CNN during the same period that the tweets were published. |
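The fuzzy-matching step that follows can be sketched as below, assuming the news-side entity list has already been produced by an NER tool. `difflib` and the 0.8 cutoff are stand-ins for whatever similarity measure and threshold the authors actually used.

```python
# Illustrative sketch: match (possibly misspelled) tweet tokens against
# a list of named entities extracted from contemporaneous news articles.
# The similarity measure (difflib ratio) and cutoff are assumptions.
import difflib

def match_tweet_entities(tweet_tokens, news_entities, cutoff=0.8):
    """Return a dict mapping tweet tokens to fuzzily matched news entities."""
    matches = {}
    for tok in tweet_tokens:
        hit = difflib.get_close_matches(tok, news_entities, n=1, cutoff=cutoff)
        if hit:
            matches[tok] = hit[0]
    return matches

# A misspelled "Jhonson" in a tweet still maps to the news entity "Johnson".
print(match_tweet_entities(["Jhonson", "won", "gold"], ["Johnson", "Seoul"]))
```

In practice, multi-word entities would be matched against tweet n-grams rather than single tokens, but the core idea, tolerant string matching against a high-precision news-side gazetteer, is as shown.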
Evaluation | Given the title or first sentence of a news article, run the same pattern extraction method that was used in training and, if possible, obtain a pattern p involving some entities. |
Evaluation | Replace the entity placeholders in the top-scored patterns pj with the entities that were actually mentioned in the input news article. |
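The placeholder-replacement step might look like the following. The slot syntax (`[PER]`, `[LOC]`) and the helper name are illustrative assumptions, not the paper's notation.

```python
# Sketch of instantiating a learned pattern: each entity slot in the
# pattern is filled with the corresponding entity actually found in
# the input article. Slot syntax here is an assumption for illustration.
def instantiate(pattern, entities):
    """Fill each [TYPE] slot in `pattern` with the entity of that type."""
    out = pattern
    for etype, name in entities.items():
        out = out.replace(f"[{etype}]", name)
    return out

p = "[PER] wins gold medal in [LOC]"
print(instantiate(p, {"PER": "Ben Johnson", "LOC": "Seoul"}))
# prints "Ben Johnson wins gold medal in Seoul"
```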
Evaluation | Compared to this model, with heuristics we can obtain patterns for more than twice as many news articles. |
Introduction | Then, the same extraction method is used to collect patterns from sentences in never-seen-before news articles . |
Fact Candidates | The first task was titled “Trustworthiness of News Articles”, where annotators were given a link to a news article and |
Fact Candidates | The second task was titled “Objectivity of News Articles”. |
Fact Candidates | We randomly selected 500 news articles from a corpus of about 300,000 news articles obtained from Google News from the topics of Top News, Business, Entertainment, and SciTech. |