Abstract | Experiments on Automatic Content Extraction (ACE) corpora demonstrate that our joint model significantly outperforms a strong pipelined baseline, which in turn attains better performance than the best-reported end-to-end system.
Background | Most previous research on relation extraction assumed that entity mentions were given. In this work we aim to address the problem of end-to-end entity mention and relation extraction from raw texts.
Conclusions and Future Work | In this paper we introduced a new architecture for more powerful end-to-end entity mention and relation extraction. |
Experiments | Furthermore, we combine these two criteria to evaluate the performance of end-to-end entity mention and relation extraction. |
Experiments | The human F1 score on end-to-end relation extraction is only about 70%, which indicates it is a very challenging task. |
Experiments | For end-to-end entity mention and relation extraction, both the joint approach and the pipelined baseline outperform the best results reported by Chan and Roth (2011) under the same setting.
Introduction | The goal of end-to-end entity mention and relation extraction is to discover relational structures of entity mentions from unstructured texts. |
Related Work | We extend a similar idea to our end-to-end task by incrementally predicting relations along with entity mention segments.
Abstract | Experiments on benchmark datasets show that our approach outperforms previous state-of-the-art systems, with error reductions of 13% to 21% in end-to-end performance. |
Experimental Setup | For end-to-end performance, value F1 is the primary metric. |
Experimental Setup | Comparison Systems We compare our system primarily to HeidelTime (Strötgen and Gertz, 2013), which is the state of the art for the end-to-end task.
Formal Overview | We compare to the state-of-the-art systems for end-to-end resolution (Strötgen and Gertz, 2013) and for resolution given gold mentions (Bethard, 2013b), neither of which uses any machine learning techniques.
Introduction | On these benchmark datasets, we present new state-of-the-art results, with error reductions of up to 28% for the detection task and 21% for the end-to-end task. |
Results | End-to-end results Figure 4 shows development and test results for TempEval-3. |
Results | Precision vs. Recall Our probabilistic model of time expression resolution allows us to easily trade off precision and recall for end-to-end performance by varying the resolution probability threshold.
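The thresholding mechanism described above can be sketched as follows. This is a minimal illustration, not the paper's implementation; the function name, the mention identifiers, and the toy probabilities are all hypothetical.

```python
# Hypothetical sketch: trading precision for recall by keeping only
# resolutions whose model probability clears a threshold.

def precision_recall_at(predictions, gold, threshold):
    """predictions: mention -> (resolved value, probability); gold: mention -> value."""
    kept = {m: v for m, (v, p) in predictions.items() if p >= threshold}
    correct = sum(1 for m, v in kept.items() if gold.get(m) == v)
    precision = correct / len(kept) if kept else 1.0
    recall = correct / len(gold) if gold else 1.0
    return precision, recall

# Toy data: three time-expression mentions with resolved dates and confidences.
preds = {"m1": ("2013-03-01", 0.9), "m2": ("2013-03-08", 0.6), "m3": ("2013-04-01", 0.3)}
gold = {"m1": "2013-03-01", "m2": "2013-03-08", "m3": "2013-05-01"}
for t in (0.2, 0.5, 0.8):
    print(t, precision_recall_at(preds, gold, t))
```

Raising the threshold drops low-confidence resolutions, so precision rises while recall falls; sweeping the threshold traces out the precision/recall curve.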
Results | We also manually categorized all resolution errors for end-to-end performance with 10-fold cross-validation on the TempEval-3 Dev dataset.
Background | Figure 1: End-to-end question answering by GUSP for the sentence "get flight from toronto to san diego stopping in dtw".
Experiments | Since our goal is not to produce a specific logical form, we directly evaluate on the end-to-end task of translating questions into database queries and measure question-answering accuracy. |
Experiments | The numbers for GUSP-FULL and GUSP++ are end-to-end question answering accuracy, whereas the numbers for ZC07 and FUBL are recall on exact match in logical forms. |
Grounded Unsupervised Semantic Parsing | Figure 1 shows an example of end-to-end question answering using GUSP. |
Introduction | We evaluated GUSP on end-to-end question answering using the ATIS dataset for semantic parsing (Zettlemoyer and Collins, 2007). |
Introduction | Despite these challenges, GUSP attains an accuracy of 84% in end-to-end question answering, effectively tying with the state-of-the-art supervised approaches (85% by Zettlemoyer & Collins (2007), 83% by Kwiatkowski et al. |
Abstract | We present a method to transliterate names in the framework of end-to-end statistical machine translation. |
End-to-End results | Finally, here are end-to-end machine translation results for three sentences, with and without the transliteration module, along with a human reference translation. |
Evaluation | In the result section of this paper, we will use the NEWA metric to measure and compare the accuracy of NE translations in our end-to-end SMT translations and four human reference translations. |
Introduction | The task of transliterating names (independent of end-to-end MT) has received a significant amount of research, e.g., (Knight and Graehl, 1997; Chen et al., 1998; Al-Onaizan, 2002). |
Introduction | Most of this work has been disconnected from end-to-end MT, a problem which we address head-on in this paper. |
Abstract | In this paper, we propose a novel recursive recurrent neural network (R2NN) to model the end-to-end decoding process for statistical machine translation.
Introduction | Different from the work mentioned above, which applies DNN to components of conventional SMT framework, in this paper, we propose a novel R2NN to model the end-to-end decoding process.
Our Model | In this section, we leverage DNN to model the end-to-end SMT decoding process with a novel recursive recurrent neural network (R2NN), in contrast to the above-mentioned work that applies DNN to components of the conventional SMT framework.
Our Model | Our R2NN is used to model the end-to-end translation process, with recurrent global information added. |
Related Work | Unfortunately, the better word alignment result generated by this model cannot bring significant performance improvement on an end-to-end SMT evaluation task.
Conclusions | We currently do not evaluate the end-to-end system over different corpora.
Conclusions | Table 4: Extrinsic evaluation, where we plugged the two merging models into an end-to-end feedback detection system by Swanson and Yamangil. |
Correction Detection | In comparison, phrase extraction systems aim to improve the end-to-end MT or paraphrasing systems. |
Experimental Setup | In addition to evaluating the merging algorithms on the standalone task of correction detection, we have also plugged in the merging algorithms into an end-to-end system in which every automatically detected correction is further classified into an error type. |
Experiments | We start by evaluating end-to-end performance of LUCHS when applied to Wikipedia text, then analyze the characteristics of its components. |
Experiments | Figure 2: Precision / recall curve for end-to-end system performance on 100 random articles. |
Experiments | To evaluate the end-to-end performance of LUCHS, we test the pipeline which first classifies incoming pages, activating a small set of extractors on the text. |
Introduction | We evaluate the overall end-to-end performance of LUCHS, showing an F1 score of 61% when extracting relations from randomly selected Wikipedia pages.
Abstract | Although learning approaches to many of its subtasks have been developed (e.g., parsing, taxonomy induction, information extraction), all end-to-end solutions to date require heavy supervision and/or manual engineering, limiting their scope and scalability. |
Conclusion | This paper introduced OntoUSP, the first unsupervised end-to-end system for ontology induction and knowledge extraction from text. |
Introduction | (2006)), but to date there is no sufficiently automatic end-to-end solution. |
Introduction | Ideally, we would like to have an end-to-end unsupervised (or lightly supervised) solution to the problem of knowledge acquisition from text. |
Abstract | When added to a state-of-the-art coreference baseline, our Web features give significant gains on multiple datasets (ACE 2004 and ACE 2005) and metrics (MUC and B³), yielding the best results reported to date for the end-to-end task of coreference resolution.
Introduction | There is also work on end-to-end coreference resolution that uses large noun-similarity lists (Daumé III and Marcu, 2005) or structured knowledge bases such as Wikipedia (Yang and Su, 2007; Haghighi and Klein, 2009; Kobdani et al., 2011) and YAGO (Rahman and Ng, 2011). |
Introduction | Altogether, our final system produces the best numbers reported to date on end-to-end coreference resolution (with automatically detected system mentions) on multiple data sets (ACE 2004 and ACE 2005) and metrics (MUC and B³), achieving significant improvements over the Reconcile DT baseline and over the state-of-the-art results of Haghighi and Klein (2010).
Semantics via Web Features | datasets for end-to-end coreference resolution (see Section 4.3). |
Abstract | We evaluate our proposed method on two end-to-end SMT tasks (phrase table pruning and decoding with phrasal semantic similarities) which need to measure semantic similarity between a source phrase and its translation candidates. |
Conclusions and Future Work | Two end-to-end SMT tasks are used to test the power of the proposed model at learning semantic phrase embeddings.
Introduction | Accordingly, we evaluate the BRAE model on two end-to-end SMT tasks (phrase table pruning and decoding with phrasal semantic similarities) which need to check whether a translation candidate and the source phrase have the same meaning.
Abstract | We present a generic phrase training algorithm which is parameterized with feature functions and can be optimized jointly with the translation engine to directly maximize the end-to-end system performance. |
Conclusions | It can be optimized jointly with the translation engine to directly maximize the end-to-end translation performance. |
Experimental Results | Since the translation engine implements a log-linear model, the discriminative training of feature weights in the decoder should be embedded in the whole end-to-end system jointly with the discriminative phrase table training process. |
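For reference, the log-linear model mentioned above scores a candidate translation in the standard way, combining feature functions with tunable weights; a sketch of the familiar form (the decoder picks the highest-scoring translation $e$ of source $f$ over $M$ features $h_i$ with weights $\lambda_i$):

```latex
\hat{e} = \arg\max_{e} \sum_{i=1}^{M} \lambda_i \, h_i(e, f)
```

Because the phrase-table scores enter as some of the $h_i$, tuning the weights $\lambda_i$ and training the phrase table interact, which is why the two are embedded in one joint end-to-end optimization.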
Experiments | The relatively low performance of the baseline system TagMe demonstrates that relying only on prior popularity and topical information within a single tweet is not enough for an end-to-end wikification system for short tweets.
Experiments | However, when μ ≥ 0.4, the system performance dramatically decreases, showing that prior popularity is not enough for an end-to-end wikification system.
Introduction | An end-to-end wikification system needs to solve two sub-problems: (i) concept mention detection, (ii) concept mention disambiguation. |
Introduction | Our model is also the first to directly learn relational patterns as part of the process of training an end-to-end taxonomic induction system, rather than using patterns that were hand-selected or learned via pairwise classifiers on manually annotated co-occurrence patterns. |
Introduction | Finally, it is the first end-to-end (i.e., non-incremental) system to include sibling (e.g., coordination) patterns at all. |
Related Work | Our model also automatically learns relational patterns as a part of the taxonomic training phase, instead of relying on handpicked rules or pairwise classifiers on manually annotated co-occurrence patterns, and it is the first end-to-end (i.e., non-incremental) system to include heterogeneous relational information via sibling (e.g., coordination) patterns. |
Conclusion | Performance remains good when resolving toponyms identified automatically, indicating that end-to-end systems based on our models may improve the experience of digital humanities scholars interested in finding and visualizing toponyms in large corpora. |
Introduction | However, it is important to consider the utility of an end-to-end toponym identification and resolution system, so we also demonstrate that performance is still strong when toponyms are detected with a standard named entity recognizer. |
Toponym Resolvers | States and countries are not annotated in CWAR, so we do not evaluate end-to-end using NER plus toponym resolution for it, as there would be many spurious false positives.
Introduction | We performed an end-to-end evaluation against a database of 15 million facts automatically extracted from general web text (Fader et al., 2011). |
Introduction | We introduce PARALEX, an end-to-end open-domain question answering system.
Question Answering Model | For the end-to-end QA task, we return a ranked list of answers from the k highest scoring queries. |
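The answer-ranking step described above can be sketched as follows. This is an illustrative reconstruction, not the system's code; the function name and the toy query scores are hypothetical.

```python
# Hypothetical sketch: take the k highest-scoring database queries,
# execute each, and return their answers in rank order, de-duplicated.

def end_to_end_answers(queries, score, execute, k):
    top = sorted(queries, key=score, reverse=True)[:k]
    answers, seen = [], set()
    for q in top:
        for a in execute(q):
            if a not in seen:
                seen.add(a)
                answers.append(a)
    return answers

# Toy data: three candidate queries with scores and result sets.
queries = ["q1", "q2", "q3"]
score = {"q1": 0.2, "q2": 0.9, "q3": 0.5}.get
execute = {"q1": ["x"], "q2": ["y", "x"], "q3": ["z"]}.get
print(end_to_end_answers(queries, score, execute, k=2))
```

Answers inherit the rank of the best query that produced them, so the returned list is ordered by query score.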
Experiments | 5.4 End-to-End Result |
Experiments | We compare their influence on RankingSVM accuracy, alignment crossing-link number, end-to-end BLEU score, and the model size. |
Experiments | These features also correspond to BLEU score improvement for End-to-End evaluations. |
Experiments | Finally, the third task is to complete the end-to-end navigation task. |
Experiments | Finally, we evaluate the system on the end-to-end navigation task. |
Experiments | Table 4: End-to-end navigation task completion rates. |
Coreference Subtask Analysis | This section examines the role of three such subtasks — named entity recognition, anaphoricity determination, and coreference element detection — in the performance of an end-to-end coreference resolution system. |
Coreference Subtask Analysis | We expect CE detection to be an important subproblem for an end-to-end coreference system. |
Introduction | (2003) represents a fully automatic end-to-end resolver. |
Abstract | This preference for sparse solutions together with effective pruning methods forms a phrase alignment regimen that produces better end-to-end translations than standard word alignment approaches. |
Experiments | 7.2 End-to-end Evaluation |
Experiments | Given an unlimited amount of time, we would tune the prior to maximize end-to-end performance, using an objective function such as BLEU. |