Abstract | One motivating example of its application is increasing user engagement around news articles by suggesting relevant comparable questions, such as “is Beyonce a better singer than Madonna?”, for the user to answer. |
Comparable Question Mining | Input: A news article. Output: A sorted list of comparable questions. 1: Identify all target named entities (NEs) in the article. 2: Infer the distribution of LDA topics for the article. 3: For each comparable relation R in the database, compute its relevance score as the similarity between the topic distributions of R and the article. 4: Rank all the relations according to their relevance score and pick the top M as relevant. 5: for each relevant relation R in the order of relevance ranking do 6: Filter out all the target NEs that do not pass the single entity classifier for R. 7: Generate all possible NE pairs from those that passed the single entity classifier. 8: Filter out all the generated NE pairs that do not pass the entity pair classifier for R. 9: Pick the top N pairs with positive classification score to be qualified for generation. |
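The steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: cosine similarity stands in for the unspecified similarity between topic distributions, pairs are treated as unordered, a pair "passes" the pair classifier when its score is positive, and the names `rank_relations`, `candidate_pairs`, `single_ok`, and `pair_score` are hypothetical.

```python
import math
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple


def cosine(p: Sequence[float], q: Sequence[float]) -> float:
    """Cosine similarity between two topic distributions (an assumed choice)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def rank_relations(
    article_topics: Sequence[float],
    relation_topics: Dict[str, Sequence[float]],
    top_m: int,
) -> List[str]:
    """Steps 3-4: score every relation against the article and keep the top M."""
    ranked = sorted(
        relation_topics,
        key=lambda r: cosine(article_topics, relation_topics[r]),
        reverse=True,
    )
    return ranked[:top_m]


def candidate_pairs(
    entities: Sequence[str],
    single_ok: Callable[[str], bool],
    pair_score: Callable[[str, str], float],
    top_n: int,
) -> List[Tuple[str, str]]:
    """Steps 6-9 for one relation: filter entities, pair them, keep the top N."""
    passed = [e for e in entities if single_ok(e)]              # step 6
    scored = [(pair_score(a, b), (a, b))                        # step 7
              for a, b in combinations(passed, 2)]
    positive = [(s, pair) for s, pair in scored if s > 0]       # step 8
    positive.sort(key=lambda sp: sp[0], reverse=True)           # step 9
    return [pair for _, pair in positive[:top_n]]
```

In use, `rank_relations` would be called once per article and `candidate_pairs` once per relevant relation, with the two callables supplied by whatever classifiers were trained for that relation.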
Introduction | In this paper we propose a new way to increase user engagement around news articles, namely suggesting questions for the user to answer that are related to the viewed article. |
Introduction | Sadly, fun and engaging comparative questions are typically not found within the text of news articles. |
Introduction | However, it is highly unlikely that such sources will contain enough relevant questions for any news article due to typical sparseness issues as well as differences in interests between askers in CQA sites and news reporters. |
Motivation and Algorithmic Overview | Given a news article, our algorithm generates a set of comparable questions for the article from question templates, e.g. |
Online Question Generation | The online part of our automatic generation algorithm takes as input a news article and generates concrete comparable questions for it. |
Introduction | To enable the NLP tools to better understand Twitter feeds, we propose the task of linking a tweet to a news article that is relevant to the tweet, thereby augmenting the context of the tweet. |
Introduction | For example, we want to supplement the implicit context of the above tweet with a news article such as the following entitled: |
Introduction | To create a gold standard dataset, we download tweets spanning over 18 days, each with a url linking to a news article of CNN or NYTIMES, as well as all the news of CNN and NYTIMES published during the period. |
Task and Data | The task is as follows: given the text of a tweet, a system aims to find the most relevant news article. |
Task and Data | For gold standard data, we harvest all the tweets that have a single url link to a CNN or NYTIMES news article, dated from the 11th of Jan to the 27th of Jan, 2013. |
Task and Data | In evaluation, we consider this url-referred news article as the gold standard — the most relevant document for the tweet, and remove the url from the text of the tweet. |
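The gold-standard construction described above (keep tweets with exactly one news url, then strip the url from the tweet text) might look like the sketch below. The domain strings in `NEWS_DOMAINS` and the function name `make_gold_pair` are assumptions for illustration; real tweets often carry shortened links that would first need resolving, which this sketch does not attempt.

```python
import re
from typing import Optional, Tuple

URL_RE = re.compile(r"https?://\S+")
NEWS_DOMAINS = ("cnn.com", "nytimes.com")  # assumed domain names, not from the source


def make_gold_pair(tweet: str) -> Optional[Tuple[str, str]]:
    """Return (tweet text without the url, gold url), or None if the tweet is unusable."""
    urls = URL_RE.findall(tweet)
    if len(urls) != 1:
        return None  # keep only tweets with a single url
    url = urls[0]
    host = re.sub(r"^https?://", "", url).split("/")[0].lower()
    if not any(host == d or host.endswith("." + d) for d in NEWS_DOMAINS):
        return None  # the url must point at one of the news sites
    # Remove the url from the text and normalise whitespace.
    text = " ".join(URL_RE.sub(" ", tweet).split())
    return text, url
```

The returned url identifies the gold article, while the stripped text is what a linking system would receive as input.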
Abstract | We present a disputant relation-based method for classifying news articles on contentious issues. |
Abstract | It performs unsupervised classification on news articles based on disputant relations, and helps readers intuitively view the articles through the opponent-based frame. |
Background and Related Work | The discourse of contentious issues in news articles shows different characteristics from those studied in sentiment classification tasks. |
Background and Related Work | For example, a news article can cast a negative light on a government program simply by covering the increase of deficit caused by it. |
Background and Related Work | News articles of a contentious issue are more diverse than debate articles conveying explicit argument of a specific side. |
Introduction | However, news articles are frequently biased and fail to fairly deliver conflicting arguments of the issue. |
Introduction | In this paper, we present a disputant relation-based method for classifying news articles on con- |
Introduction | The method helps readers intuitively view the news articles through the opponent-based frame. |
Introduction | Specifically, you are given a corpus of news articles in which all tokens have been labeled as either belonging to personal name mentions or not. |
Introduction | Clearly the problems of identifying names in news articles and e-mails are closely related, and learning to do well on one should help your performance on the other. |
Introduction | When only the type of data being examined is allowed to vary (from news articles to e-mails, for example), the problem is called domain adaptation (Daumé III and Marcu, 2006). |
Investigation | These are: abstracts from biological journals [UT (Bunescu et al., 2004), Yapex (Franzen et al., 2002)]; news articles [MUC6 (Fisher et al., 1995), MUC7 (Borthwick et al., 1998)]; and personal e-mails [CSPACE (Kraut et al., 2004)]. |
Investigation | • person names in news articles and e-mails. We chose this array of corpora so that we could evaluate our hierarchical prior’s ability to generalize across and incorporate information from a variety of domains, genres and tasks. |
Investigation | Figure 3 shows the results of an experiment in learning to recognize person names in MUC6 news articles. |
Abstract | We create a database of pictures that are naturally embedded into news articles and propose to use their captions as a proxy for annotation keywords. |
Abstract | We also demonstrate that the news article associated with the picture can be used to boost image annotation performance. |
BBC News Database | Many online news providers supply pictures with news articles; some even classify news into broad topic categories (e.g., business, world, sports, entertainment). |
BBC News Database | We downloaded 3,361 news articles from the BBC News website. Each article was accompanied by an image and its caption. |
Introduction | News articles associated with images and their captions spring readily to mind (e.g., BBC News, Yahoo News). |
Introduction | Importantly, our images are not standalone; they come with news articles whose content is shared with the image. |
Related Work | For example, news articles often contain images whose captions can be thought of as annotations. |
Experiments | We extracted a set of news articles and corresponding user comments from Yahoo! |
Experiments | We then run our summarization algorithm on the instantiated graph to produce a summary for each news article. |
Experiments | In addition, each news article and corresponding set of comments were presented to three human annotators. |
Framework | Depending on the summarization application, 0 can refer to the set of documents (e.g., newswire) related to a particular topic as in standard summarization; in other scenarios (e.g., user-generated content), it is a collection of comments associated with a news article or a blog post, etc. |
Introduction | On the other hand, in the case of user-generated content (say, comments on a news article), even though the text is short, one is faced with a different set of problems: volume (popular articles generate more than 10,000 comments), noise (most comments are vacuous, linguistically deficient, and tangential to the article), and redundancy (similar views are expressed by multiple commenters). |
Introduction | We then conduct experiments on two corpora: the DUC 2004 corpus and a corpus of user comments on news articles. |
Analysis | To fill this gap, we ran four keyphrase extraction systems on four commonly-used datasets of varying sources, including Inspec abstracts (Hulth, 2003), DUC-2001 news articles (Over, 2001), scientific papers (Kim et al., 2010b), and meeting transcripts (Liu et al., 2009a). |
Analysis | To be more concrete, consider the news article on athlete Ben Johnson in Figure 1, where the keyphrases are boldfaced. |
Analysis | Figure 1: A news article on Ben Johnson from the DUC-2001 dataset. |
Corpora | Consequently, it is harder to extract keyphrases from scientific papers, technical reports, and meeting transcripts than from abstracts, emails, and news articles. |
Corpora | Topic change An observation commonly exploited in keyphrase extraction from scientific articles and news articles is that keyphrases typically appear not only at the beginning (Witten et al., 1999) but also at the end (Medelyan et al., 2009) of a document. |
Corpora | Topic correlation Another observation commonly exploited in keyphrase extraction from scientific articles and news articles is that the keyphrases in a document are typically related to each other (Turney, 2003; Mihalcea and Tarau, 2004). |
Experiment settings | All six are large collections with 50 news articles, so this baseline is significantly different from a random baseline. |
Headline generation | Our approach takes as input, for training, a corpus of news articles organized in news collections. |
Headline generation | Algorithm 2 EXTRACTPATTERNS_Ψ(n, E): n is the list of sentences in a news article. |
Introduction | For some applications it is important to understand, given a collection of related news articles and re- |
Related work | Most headline generation work in the past has focused on the problem of single-document summarization: given the main passage of a single news article, generate a very short summary of the article. |
Related work | Filippova (2010) reports a system that is very close to our settings: the input is a collection of related news articles, and the system generates a headline that describes the main event. |
Experimental Setup | Participants were presented with a news article and its corresponding highlights and were asked to rate the latter along three dimensions: informativeness (do the highlights represent the article’s main topics? |
Introduction | If our goal is to summarize news articles, then we may be better off selecting the first n sentences of the document. |
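The lead baseline mentioned here, taking the first n sentences as the summary, can be illustrated with a minimal sketch. The regular-expression sentence splitter is a simplifying assumption (it will mis-split abbreviations such as "U.S."), and `lead_summary` is a hypothetical name.

```python
import re
from typing import List


def lead_summary(document: str, n: int) -> List[str]:
    """Return the first n sentences of a document as a lead baseline summary."""
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    return sentences[:n]
```

A proper sentence tokenizer would be preferable in practice; the point is only that the baseline needs no model at all.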
Introduction | Examples of CNN news articles with human-authored highlights are shown in Table 1. |
Modeling | Highlights on a small screen device would presumably be shorter than highlights for news articles on the web. |
The Task | Given a document, we aim to produce three or four short sentences covering its main topics, much like the “Story Highlights” accompanying the (online) CNN news articles. |
The Task | The majority were news articles, but the set also contained a mixture of editorials, commentary, interviews and reviews. |
Experiments | The other uses the traditional Stanford NER to extract named entities from news articles published in the same period and then performs fuzzy matching to identify named entities from tweets. |
Introduction | Previous work in event extraction has focused largely on news articles, as the newswire texts have been the best source of information on current events (Hogenboom et al., 2011). |
Methodology | However, it is often observed that events mentioned in tweets are also reported in news articles in the same period (Petrovic et al., 2013). |
Methodology | Therefore, named entities mentioned in tweets are likely to appear in news articles as well. |
Methodology | First, a traditional NER tool such as the Stanford Named Entity Recognizer is used to identify named entities from the news articles crawled from BBC and CNN during the same period that the tweets were published. |
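Once entities have been extracted from the news articles, matching them against noisy tweet text could be approximated as below. This is only a sketch: the source does not say which fuzzy matching method was used, so the character-level ratio from Python's standard `difflib`, the 0.85 threshold, and the name `fuzzy_entities_in_tweet` are all assumptions.

```python
from difflib import SequenceMatcher
from typing import Iterable, List


def fuzzy_entities_in_tweet(
    tweet: str, news_entities: Iterable[str], threshold: float = 0.85
) -> List[str]:
    """Return the news-derived entities that approximately occur in the tweet."""
    tokens = tweet.lower().split()
    found = []
    for entity in news_entities:
        target = entity.lower()
        width = len(target.split())
        # Slide a window with the same number of tokens as the entity.
        for i in range(len(tokens) - width + 1):
            window = " ".join(tokens[i:i + width])
            if SequenceMatcher(None, window, target).ratio() >= threshold:
                found.append(entity)
                break
    return found
```

The threshold trades recall on misspellings against false matches and would need tuning on held-out tweets.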
Evaluation | Given the title or first sentence of a news article, run the same pattern extraction method that was used in training and, if possible, obtain a pattern p involving some entities. |
Evaluation | Replace the entity placeholders in the top-scored patterns pj with the entities that were actually mentioned in the input news article. |
Evaluation | Compared to this model, with heuristics we can obtain patterns for more than twice as many news articles. |
Introduction | Then, the same extraction method is used to collect patterns from sentences in never-seen-before news articles. |
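The placeholder replacement step can be pictured with a small sketch. The bracketed placeholder syntax (`[e1]`, `[e2]`), the `(score, pattern)` tuple layout, and the function name `fill_patterns` are hypothetical choices made for illustration; the source does not specify how patterns are represented.

```python
from typing import Dict, List, Tuple


def fill_patterns(
    scored_patterns: List[Tuple[float, str]],
    entities: Dict[str, str],
    top_k: int = 1,
) -> List[str]:
    """Instantiate the top-k scored patterns with the article's actual entities."""
    best = sorted(scored_patterns, key=lambda sp: sp[0], reverse=True)[:top_k]
    headlines = []
    for _, pattern in best:
        # Substitute each placeholder with the entity mentioned in the article.
        for placeholder, name in entities.items():
            pattern = pattern.replace(placeholder, name)
        headlines.append(pattern)
    return headlines
```

For example, a pattern such as `[e1] marries [e2]` with a mapping for both placeholders yields a concrete headline string.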
Machine Translation Quality Prediction | DUC2001 provided 309 English news articles for document summarization tasks, and the articles were grouped into 30 document sets. |
Machine Translation Quality Prediction | The news articles were selected from TREC-9. |
Machine Translation Quality Prediction | We chose five document sets (d04, d05, d06, d08, d11) with 54 news articles out of the DUC2001 document sets. |
Related Work 2.1 Machine Translation Quality Prediction | and Chiorean (2008) propose to produce summaries with the MMR method from Romanian news articles and then automatically translate the summaries into English. |
Conclusions & Future Work | els do not seem to be effective in the context of a news article search task, they are a good indicator of effectiveness in the context of web search. |
Evaluation | Robust04 is composed of 528,155 news articles coming from three newspapers and the FBIS. |
Evaluation | We hypothesize that the heterogeneous nature of the web makes it possible to model very different topics covering several aspects of the query, while news articles are contributions focused on a single subject. |
Evaluation | Although topics coming from news articles may be limited, they benefit from the rich vocabulary of professional writers who are trained to avoid repetition. |
Document-level Parsing Approaches | For example, this is true for 75% of cases in our development set containing 20 news articles from RST-DT and for 79% of cases in our development set containing 20 how-to-do manuals from the Instructional corpus. |
Introduction | While previous approaches have been tested on only one corpus, we evaluate our approach on texts from two very different genres: news articles and instructional how-to-do manuals. |
Related work | They evaluate their approach on the RST-DT corpus (Carlson et al., 2002) of news articles. |
Fact Candidates | The first task was titled “Trustworthiness of News Articles”, where annotators were given a link to a news article and |
Fact Candidates | The second task was titled “Objectivity of News Articles”. |
Fact Candidates | We randomly selected 500 news articles from a corpus of about 300,000 news articles obtained from Google News from the topics of Top News, Business, Entertainment, and SciTech. |
Data | The TR-CONLL corpus (Leidner, 2008) contains 946 REUTERS news articles published in August 1996. |
Error Analysis | An instance of California in a baseball-related news article is incorrectly predicted to be the town California, Pennsylvania. |
Introduction | ically recorded travel costs on the shaping of empires (Scheidel et al., 2012), and systems that convey the geographic content in news articles (Teitler et al., 2008; Sankaranarayanan et al., 2009) and microblogs (Gelernter and Mushegian, 2011). |
Introduction | Then, it produces a new article by selecting content from the Internet for each part of this template. |
Method | Our model constructs a new article by following these two steps: Ranking First, we attempt to rank candidate excerpts based on how representative they are of each individual topic. |
Rank(eij1...€ij7~,Wj) | 5 We are continually submitting new articles; however, we report results on those that have at least a 6 month history at time of writing. |
Cross Language Text Categorization | The data we work with consists of comparable corpora of news articles in English and Italian. |
Cross Language Text Categorization | Each news article is annotated with one of the four categories: culture_and_school, tourism, quality_of_life, made_in_Italy. |
Introduction | To test the usefulness of etymological information we work with comparable collections of news articles in English and Italian, whose articles are assigned one of four categories: culture_and_school, tourism, quality_of_life, made_in_Italy. |
Experiments | We processed a full year (2008) of news articles from Xinhua News, which publishes news in both English and Chinese; these articles were also used by Kim et al. |
Experiments | The English corpus consists of 100,746 news articles, and the Chinese corpus consists of 88,031 news articles. |
Experiments | • D0: All news articles are used. |
Conclusions | We have presented extractive and abstractive models that generate image captions for news articles . |
Related Work | Instead of relying on manual annotation or background ontological information we exploit a multimodal database of news articles, images, and their captions. |
Results | It is well known that news articles are written so that the lead contains the most important information in a story. This is an encouraging result as it highlights the importance of the visual information for the caption generation task. |
Evaluation | Since our major focus is to detect events from news articles, we only keep the web pages with the keyword “news” in the URL field. |
Introduction | One standard way to do that is to cluster news articles as events by following a two-step approach (Yang et al., 1998): 1) represent documents as vectors and calculate similarities between documents; 2) run the clustering algorithm to obtain document clusters as events. The underlying text representation often plays a critical role in this approach, especially for long text streams. |
Introduction | D1 and D2 are news articles about the U.S. presidential elections in 2004 and 2008, respectively. |
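The two-step approach can be made concrete with a small, dependency-free sketch. The smoothed TF-IDF weighting, the cosine similarity, the greedy single-pass clustering against each cluster's first document, and the 0.3 threshold are all illustrative assumptions; the excerpt does not fix any of these choices, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(docs: List[str]) -> List[Dict[str, float]]:
    """Step 1a: represent each document as a sparse TF-IDF vector."""
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter(term for tokens in tokenized for term in set(tokens))
    n = len(docs)
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed idf, so that no term receives a zero weight.
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()})
    return vectors


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Step 1b: cosine similarity between two sparse vectors."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def cluster_events(docs: List[str], threshold: float = 0.3) -> List[List[int]]:
    """Step 2: greedy single-pass clustering; each cluster is one candidate event."""
    vectors = tfidf_vectors(docs)
    clusters: List[List[int]] = []
    for i, vec in enumerate(vectors):
        for members in clusters:
            # Compare against the cluster's first document (its seed).
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters
```

Because similarity is computed purely on surface terms, two articles about different elections would likely land in the same cluster here, which is the kind of limitation that motivates richer text representations.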
Related Work | Important events are those reported in a large number of news articles and each event is constructed according to one single query and represented by a set of sentences. |
Temporal and Linguistic Processing | In news articles, this is the DCT. |
Temporal and Linguistic Processing | Figure 4 shows an example of an analyzed excerpt of a news article. |
Abstract | Although bursty event detection from text streams has been studied before, previous work may not be suitable for microblogs because compared with other text streams such as news articles and scientific publications, microblog posts are particularly diverse and noisy. |
Introduction | More importantly, their model was applied to news articles and scientific publications, where most documents follow the global topical trends. |
Method | Unlike news articles from traditional media, which are mostly about current affairs, an important property of microblog posts is that many posts are about users’ personal encounters and interests rather than global events. |
Introduction | Summarization has been applied to different genres, such as news articles, scientific articles, and speech domains including broadcast news, meetings, conversations and lectures. |
Opinion Summarization Methods | To obtain this, we trained a maximum entropy classifier with a bag-of-words model using a combination of data sets from several domains, including movie data (Pang and Lee, 2004), news articles from MPQA corpus (Wilson and Wiebe, 2003), and meeting transcripts from AMI corpus (Wilson, 2008a). |
Related Work | Previous studies have used various domains, including news articles, scientific articles, web documents, and reviews. |