Conclusions and Future Work | Our results show that discourse-based metrics can improve state-of-the-art MT metrics by increasing correlation with human judgments, even when only sentence-level discourse information is used. |
Conclusions and Future Work | First, at the sentence level, we can use discourse information to re-rank alternative MT hypotheses; this could be applied either for MT parameter tuning or as a postprocessing step for the MT output. |
Experimental Results | We speculate that this might be caused by the fact that the lexical information in DR-LEX is incorporated only in the form of unigram matching at the sentence level, while the metrics in group IV are already complex combined metrics, which take into account stronger lexical models. |
Experimental Results | This is remarkable given that DR has a strong negative Tau as an individual metric at the sentence level. |
Experimental Setup | As in the WMT12 experimental setup, we use these rankings to calculate correlation with human judgments at the sentence level, i.e.
Introduction | From its foundations, Statistical Machine Translation (SMT) had two defining characteristics: first, translation was modeled as a generative process at the sentence level. |
Introduction | Recently, there have been two promising research directions for improving SMT and its evaluation: (a) by using more structured linguistic information, such as syntax (Galley et al., 2004; Quirk et al., 2005), hierarchical structures (Chiang, 2005), and semantic roles (Wu and Fung, 2009; Lo et al., 2012), and (b) by going beyond the sentence level, e.g., translating at the document level (Hardmeier et al., 2012). |
Introduction | Going beyond the sentence level is important since sentences rarely stand on their own in a well-written text. |
Related Work | Unlike their work, which measures lexical cohesion at the document level, here we are concerned with coherence (rhetorical) structure, primarily at the sentence level. |
Approach | We formulate the sentence-level sentiment classification task as a sequence labeling problem. |
Approach | The inputs to the model are sentence-segmented documents annotated with sentence-level sentiment labels (positive, negative or neutral) along with a set of unlabeled documents. |
Introduction | In this paper, we focus on the task of sentence-level sentiment classification in online reviews. |
Introduction | Semi-supervised techniques have been proposed for sentence-level sentiment classification (Täckström and McDonald, 2011a; Qu et al., 2012). |
Introduction | In this paper, we propose a sentence-level sentiment classification method that can (1) incorporate rich discourse information at both local and global levels; (2) encode discourse knowledge as soft constraints during learning; (3) make use of unlabeled data to enhance learning. |
Related Work | In this paper, we focus on the study of sentence-level sentiment classification. |
Related Work | Compared to the existing work on semi-supervised learning for sentence-level sentiment classification (Täckström and McDonald, 2011a; Täckström and McDonald, 2011b; Qu et al., 2012), our work does not rely on a large amount of coarse-grained (document-level) labeled data; instead, distant supervision mainly comes from linguistically motivated constraints. |
Related Work | We also show that constraints derived from the discourse context can be highly useful for disambiguating sentence-level sentiment. |
Conclusions | In this paper we performed a sentence-level correlation analysis of automatic evaluation measures against expert human judgements for the automatic image description task. |
Conclusions | We found that sentence-level unigram BLEU is only weakly correlated with human judgements, even though it has been extensively reported in the literature for this task. |
Methodology | (2011) to perform a sentence-level analysis, setting n = 1 with no brevity penalty to get the unigram BLEU measure, or n = 4 with the brevity penalty to get the Smoothed BLEU measure. |
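The two settings described above can be illustrated with a toy sentence-level BLEU; this is a minimal sketch (the function name and add-one smoothing are assumptions for illustration, not the evaluation code used in the paper):

```python
import math
from collections import Counter

def sentence_bleu(hyp, ref, n=4, brevity_penalty=True, smooth=True):
    """Toy sentence-level BLEU: geometric mean of clipped n-gram precisions.

    n=1, smooth=False, brevity_penalty=False gives a unigram-precision
    variant; n=4 with smoothing and the brevity penalty gives a
    Smoothed-BLEU-style score.
    """
    hyp, ref = hyp.split(), ref.split()
    precisions = []
    for k in range(1, n + 1):
        h = Counter(tuple(hyp[i:i + k]) for i in range(len(hyp) - k + 1))
        r = Counter(tuple(ref[i:i + k]) for i in range(len(ref) - k + 1))
        match = sum(min(c, r[g]) for g, c in h.items())   # clipped matches
        total = max(sum(h.values()), 1)
        if smooth:  # add-one smoothing avoids zero precisions
            match, total = match + 1, total + 1
        precisions.append(match / total)
    if not all(precisions):
        return 0.0
    score = math.exp(sum(math.log(p) for p in precisions) / n)
    if brevity_penalty and len(hyp) < len(ref):
        score *= math.exp(1 - len(ref) / max(len(hyp), 1))
    return score
```

Calling `sentence_bleu(h, r, n=1, brevity_penalty=False, smooth=False)` then corresponds to the unigram setting, and the defaults to the smoothed four-gram setting.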
Methodology | The sentence-level evaluation measures were calculated for each image-description-reference tuple. |
Methodology | The evaluation measure scores were then compared with the human judgements using Spearman’s correlation estimated at the sentence level. |
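Spearman’s ρ is the Pearson correlation of the rank vectors of the two score lists; a self-contained sketch of how such a sentence-level correlation could be computed (with average ranks for ties; not the paper’s exact tooling):

```python
def _ranks(xs):
    """Average 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman's rho = Pearson correlation of the two rank vectors."""
    rx, ry = _ranks(xs), _ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)
```

With metric scores in `xs` and human judgments in `ys`, `spearman(xs, ys)` returns a value in [-1, 1].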
Results | Sentence-level automated measure score |
Bottom-up tree-building | In particular, starting from the constituents on the bottom level (EDUs for intra-sentential parsing and sentence-level discourse trees for multi-sentential parsing), at each step of the tree-building, we greedily merge a pair of adjacent discourse constituents such that the merged constituent has the highest probability as predicted by our structure model. |
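The greedy bottom-up step can be sketched as follows; the `score_merge` callback stands in for the structure model’s probability, and all names are hypothetical:

```python
def greedy_tree_build(constituents, score_merge):
    """Repeatedly merge the adjacent pair of discourse constituents with
    the highest model score until a single tree remains.

    constituents: list of leaf nodes (EDUs or sentence-level trees)
    score_merge(left, right): stand-in for the structure model's
      probability of merging two adjacent constituents
    """
    nodes = list(constituents)
    while len(nodes) > 1:
        # find the adjacent pair whose merge the model scores highest
        best = max(range(len(nodes) - 1),
                   key=lambda i: score_merge(nodes[i], nodes[i + 1]))
        merged = (nodes[best], nodes[best + 1])
        nodes[best:best + 2] = [merged]  # replace the pair with its parent
    return nodes[0]
```

Each iteration removes one constituent, so n leaves yield exactly n - 1 merges.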
Linear time complexity | The total time to generate sentence-level discourse trees for n sentences is Σ_{k=1}^{n} O(m_k). |
Overall work flow | (2013), we perform sentence-level parsing for each sentence first, followed by text-level parsing to generate a full discourse tree for the whole document. |
Overall work flow | Each sentence S_i, after being segmented into EDUs (not shown in the figure), goes through an intra-sentential bottom-up tree-building model M_intra to form a sentence-level discourse tree T_{S_i}, with the EDUs as leaf nodes. |
Overall work flow | We then combine all sentence-level discourse trees T_{S_i} using our multi-sentential bottom-up tree-building model M_multi to generate the text-level discourse tree T_D. |
Related work | First, they decomposed the problem of text-level discourse parsing into two stages: intra-sentential parsing to produce a discourse tree for each sentence, followed by multi-sentential parsing to combine the sentence-level discourse trees and produce the text-level discourse tree. |
Abstract | In this paper we study the use of sentence-level dialect identification in optimizing machine translation system selection when translating mixed dialect input. |
Conclusion and Future Work | We presented a sentence-level classification approach for MT system selection for diglossic languages. |
Discussion and Error Analysis | In 21% of the error cases, our classifier predicted a better translation than the one considered gold by BLEU due to BLEU bias, e.g., a severe sentence-level length penalty caused by an extra punctuation mark in a short sentence. |
Introduction | In this paper we study the use of sentence-level dialect identification together with various linguistic features in optimizing the selection of outputs of four different MT systems on input text that includes a mix of dialects. |
MT System Selection | For baseline system selection, we use the classification decision of Elfardy and Diab (2013)’s sentence-level dialect identification system to decide on the target MT system. |
MT System Selection | We run the 5,562 sentences of the classification training data through our four MT systems and produce sentence-level BLEU scores (with length penalty). |
Related Work | used features from their token-level system to train a classifier that performs sentence-level dialect ID (Elfardy and Diab, 2013). |
Background: Deep Learning | Inspired by previous successful research, we first learn sentence representations using topic-related monolingual texts in the pre-training phase, and then optimize the bilingual similarity by leveraging sentence-level parallel data in the fine-tuning phase. |
Introduction | In this case, people understand the meaning because of the IT topical context, which goes beyond sentence-level analysis and requires more relevant knowledge. |
Introduction | This underlying topic space is learned from sentence-level parallel data in order to share topic information across the source and target languages as much as possible. |
Related Work | our method is that it is applicable to both sentence-level and document-level SMT, since we do not place any restrictions on the input. |
Related Work | • We directly optimized bilingual topic similarity in the deep learning framework with the help of sentence-level parallel data, so that the learned representation could be easily used in the SMT decoding procedure. |
Topic Similarity Model with Neural Network | learn topic representations using sentence-level parallel data. |
Experiments | Each of these models has the same task: to predict sentence-level ideology labels for sentences in a test set. |
Experiments | Table 1: Sentence-level bias detection accuracy. |
Experiments | RNN1 initializes all parameters randomly and uses only sentence-level labels for training. |
Recursive Neural Networks | They have achieved state-of-the-art performance on a variety of sentence-level NLP tasks, including sentiment analysis, paraphrase detection, and parsing (Socher et al., 2011a; Hermann and Blunsom, 2013). |
Related Work | Finally, combining sentence-level and document-level models might improve bias detection at both levels. |
Experiments | Method 4, named REBOL, implements REsponse-Based Online Learning by instantiating y+ and y− to the form described in Section 4: in addition to the model score s, it uses a cost function c based on sentence-level BLEU (Nakov et al., 2012) and tests translation hypotheses for task-based feedback using a binary execution function e. |
Experiments | This can be attributed to the use of sentence-level BLEU as cost function in RAMPION and REBOL. |
Response-based Online Learning | Computation of distance to the reference translation usually involves cost functions based on sentence-level BLEU (Nakov et al. |
Response-based Online Learning | In addition, we can use translation-specific cost functions based on sentence-level BLEU in order to boost similarity of translations to human reference translations. |
Response-based Online Learning | Our cost function c(y^(i), y) = 1 − BLEU(y^(i), y) is based on a version of sentence-level BLEU (Nakov et al.
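A rough sketch of a response-based update under the quantities named above: y+ is the best hypothesis that receives positive task feedback, y− the best one that does not, with the cost c (e.g., 1 − sentence-level BLEU) discouraging low-quality positives. All helper names are stand-ins, not the authors’ implementation:

```python
def response_based_update(w, phi, hyps, score, cost, execute, eta=0.1):
    """One perceptron-style update: move the weight vector w toward the
    feature vector of y+ (best hypothesis the binary execution function
    accepts, rewarded for low cost) and away from y- (best hypothesis it
    rejects, penalized for high cost).

    phi(y): feature vector of hypothesis y (same length as w)
    score(y), cost(y): model score s and cost c of y
    execute(y): binary task-based feedback
    """
    good = [y for y in hyps if execute(y)]
    bad = [y for y in hyps if not execute(y)]
    if not good or not bad:
        return w  # no informative pair for this input
    y_pos = max(good, key=lambda y: score(y) - cost(y))
    y_neg = max(bad, key=lambda y: score(y) + cost(y))
    return [wi + eta * (p - n) for wi, p, n in zip(w, phi(y_pos), phi(y_neg))]
```

The update leaves w unchanged when the k-best list contains no executable (or no failing) hypothesis, since no contrastive pair exists.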
Discussion and Conclusions | This is the most realistic evaluation of methods for predicting sentence-level grammaticality to date. |
Introduction | While some applications (e.g., grammar checking) rely on such fine-grained predictions, others might be better addressed by sentence-level grammaticality judgments (e.g., machine translation evaluation). |
Introduction | Regarding sentence-level grammaticality, there has been much work on rating the grammatical- |
Introduction | With this unique data set, which we will release to the research community, it is now possible to conduct realistic evaluations for predicting sentence-level grammaticality. |
Conventional Neural Network | To model the tag dependency, previous neural network models (Collobert et al., 2011; Zheng et al., 2013) introduce a transition score A_ij for jumping from tag i ∈ T to tag j ∈ T. For an input sentence c[1:n] with a tag sequence t[1:n], a sentence-level score is then given by the sum of transition and network scores:
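The equation after the colon is truncated in the extract; the sum it describes can be sketched as follows (a minimal illustration with an assumed start tag, not the exact model):

```python
def sentence_score(A, net, tags, start=0):
    """Sentence-level score: sum of transition and network scores.

    A[i][j]   : transition score for jumping from tag i to tag j
    net[k][t] : network score for assigning tag t at position k
    tags      : tag sequence t_1 .. t_n (indices into A and net)
    start     : assumed initial tag for the first transition
    """
    score, prev = 0.0, start
    for k, t in enumerate(tags):
        score += A[prev][t] + net[k][t]  # transition + network score
        prev = t
    return score
```

Decoding then amounts to finding the tag sequence maximizing this score, e.g., with Viterbi search.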
Conventional Neural Network | Given the sentence-level score, Zheng et al. |
Conventional Neural Network | (2013), their model is a global one where training and inference are performed at the sentence level. |
Max-Margin Tensor Neural Network | (2013), our model is also trained at the sentence level and carries out inference globally. |
Experiments | The task is to predict sentence-level sentiment, so each training example is a sentence. |
Experiments | It has been shown that syntactic information is helpful for sentence-level predictions (Socher et al., 2013), so the parse tree regularizer is naturally suitable for this task. |
Structured Regularizers for Text | This regularizer captures the idea that phrases might be selected as relevant or (in most cases) irrelevant to a task, and is expected to be especially useful in sentence-level prediction tasks. |
Structured Regularizers for Text | In sentence-level prediction tasks, such as sentence-level sentiment analysis, it is known that most constituents (especially those corresponding to shorter phrases) in a parse tree are uninformative (neutral in sentiment). |
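The idea of selecting whole phrases as relevant or irrelevant can be sketched as a group-lasso penalty with one group per parse-tree constituent; this is a sketch of the general idea only, and the names and exact penalty form are assumptions rather than the paper’s formulation:

```python
import math

def tree_group_lasso(weights, constituents, lam=1.0):
    """Group-lasso penalty with one group per parse-tree constituent.

    weights: dict mapping word -> learned weight
    constituents: list of constituents, each a list of the words in
      that span (nested spans yield overlapping groups)
    Returns lam * sum over groups of the L2 norm of the group's
    weights; minimizing it encourages zeroing out whole constituents,
    e.g., neutral-sentiment phrases.
    """
    penalty = 0.0
    for span in constituents:
        penalty += math.sqrt(sum(weights.get(w, 0.0) ** 2 for w in span))
    return lam * penalty
```

Because the L2 norm of a group hits zero only when every weight in the group is zero, the penalty prunes constituents as units rather than individual words.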
Related Work | (2004) introduced a sentence-level QE system where an arbitrary threshold is used to classify the MT output as good or bad. |
Related Work | To address this problem, Quirk (2004) related the sentence-level correctness of the QE model to human judgment and achieved a high correlation with human judgment for a small annotated corpus; however, the proposed model does not scale well to larger data sets. |
Results | Table 1: Sentence-level correlation with HAJ