Abstract | For languages such as Chinese where words usually have meaningful internal structure and word boundaries are often fuzzy, TESLA-CELAB acknowledges the advantage of character-level evaluation over word-level evaluation.
Experiments | Although word-level BLEU has often been found inferior to the new-generation metrics when the target language is English or other European languages, prior research has shown that character-level BLEU is highly competitive when the target language is Chinese (Li et al., 2011). |
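To make the contrast concrete, the following is a minimal sketch (using NLTK's sentence_bleu; the Chinese example strings and segmentations are hypothetical, not drawn from the cited work) of computing BLEU at the word level versus the character level, where the character-level variant simply treats every character as a token:

```python
# Minimal sketch: word-level vs. character-level BLEU for a Chinese hypothesis.
# The example strings and segmentations are hypothetical.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "机器翻译很有用"
hypothesis = "机器翻译非常有用"

smooth = SmoothingFunction().method1

# Word-level BLEU presupposes a segmenter; here the token lists are given by hand.
ref_words = ["机器", "翻译", "很", "有用"]
hyp_words = ["机器", "翻译", "非常", "有用"]
word_bleu = sentence_bleu([ref_words], hyp_words, smoothing_function=smooth)

# Character-level BLEU sidesteps segmentation by treating every character as a token.
char_bleu = sentence_bleu([list(reference)], list(hypothesis), smoothing_function=smooth)

print(f"word-level BLEU: {word_bleu:.3f}  character-level BLEU: {char_bleu:.3f}")
```

Because the character-level variant needs no word segmentation, it avoids the fuzzy-boundary problem noted above for Chinese.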
Motivation | In this work, we attempt to address both of these issues by introducing TESLA-CELAB, a character-level metric that also models word-level linguistic phenomena.
Abstract | Character-level information can benefit downstream applications by offering flexible granularities for word segmentation while improving word-level dependency parsing accuracies. |
Character-Level Dependency Tree | Inner-word dependencies can also bring benefits to parsing word-level dependencies. |
Character-Level Dependency Tree | When the internal structures of words are annotated, character-level dependency parsing can be treated as a special case of word-level dependency parsing, with “words” being “characters”. |
Character-Level Dependency Tree | The word-level dependency parsing features are added when the inter-word actions are applied, and the features for joint word segmentation and POS-tagging are added when the actions PW, SHW and SHC are applied. |
Introduction | Moreover, manually annotated intra-word dependencies can give higher word-level dependency accuracies than pseudo intra-word dependencies.
Abstract | We propose a method that performs character-level POS tagging jointly with word segmentation and word-level POS tagging. |
Character-level POS Tagset | Some of these tags are directly derived from the commonly accepted word-level parts of speech, such as noun, verb, adjective and adverb.
Chinese Morphological Analysis with Character-level POS | This hybrid model constructs a lattice that consists of word-level and character-level nodes from a given input sentence. |
Chinese Morphological Analysis with Character-level POS | Word-level nodes correspond to words found in the system’s lexicon, which has been compiled from training data. |
Chinese Morphological Analysis with Character-level POS | The upper part of the lattice (word-level nodes) represents known words, where each node carries information such as character form, character-level POS, and word-level POS.
Introduction | Table 1. Character-level POS sequence as a more specified version of word-level POS: an example of verb.
Introduction | Another advantage of character-level POS is that the sequence of character-level POS in a word can be seen as a more fine-grained version of word-level POS.
Introduction | The five words in this table are very likely to be tagged with the same word-level POS, verb, in any available annotated corpus, yet native speakers of Chinese would commonly agree that these words differ from each other in syntactic behavior because of the differences in their word construction.
Evaluation II: Human Evaluation on ConnotationWordNet | We collect two separate sets of labels: a set of labels at the word-level, and another set at the sense-level.
Evaluation II: Human Evaluation on ConnotationWordNet | For word-level labels we apply a similar procedure as above.
Evaluation II: Human Evaluation on ConnotationWordNet | Lexicon (word-level / sense-level): SentiWordNet 27.22 / 14.29; OpinionFinder 31.95 / -; Feng2013 62.72 / -; GWORD+SENSE(95%) 84.91 / 83.43; GWORD+SENSE(99%) 84.91 / 83.71; E-GWORD+SENSE(95%) 86.98 / 86.29; E-GWORD+SENSE(99%) 86.69 / 85.71
Introduction | For non-polysemous words, which constitute a significant portion of English vocabulary, learning the general connotation at the word-level (rather than at the sense-level) would be a natural operational choice. |
Introduction | As a result, researchers often would need to aggregate labels across different senses to derive the word-level label. |
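As an illustration of such aggregation, here is a minimal sketch (the sense labels and the majority-vote rule are hypothetical placeholders, not the method of the cited work) of deriving a word-level label from sense-level labels:

```python
# Minimal sketch (hypothetical data): deriving a word-level connotation label
# by majority vote over that word's sense-level labels.
from collections import Counter

sense_labels = {
    "bank#n#1": "neutral",   # financial institution
    "bank#n#2": "neutral",   # river bank
    "bank#n#3": "positive",  # "you can bank on it" (reliability)
}

word_label, _ = Counter(sense_labels.values()).most_common(1)[0]
print(word_label)  # -> "neutral"
```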
Introduction | Therefore, in this work, we present the first unified approach that learns both sense- and word-level connotations simultaneously. |
Pairwise Markov Random Fields and Loopy Belief Propagation | We formulate the task of learning sense- and word-level connotation lexicon as a graph-based classification task (Sen et al., 2008). |
Evaluation | Degorski (2011) uses concatenation of word-level base forms assigned by the tagger as a baseline. |
Introduction | According to the lemmatisation principles accompanying the NCP tagset, adjectives are lemmatised as masculine forms (główny); hence neither the word-level lemma nor the orthographic form is sufficient to obtain the phrase lemmatisation.
Introduction | It is worth stressing that even the task of word-level lemmatisation is nontrivial for inflectional languages due to a large number of inflected forms and even larger number of syncretisms. |
Phrase lemmatisation as a tagging problem | To show the real setting, this time we give full NCP tags and word-level lemmas assigned as a result of tagging. |
Phrase lemmatisation as a tagging problem | The notation cas=nom means that to obtain the desired form (e.g. główne) you need to find an entry in a morphological dictionary that bears the same word-level lemma as the inflected form (główny) and a tag that results from taking the tag of the inflected form (adj:sg:inst:n:pos) and setting the value of the tagset attribute cas (grammatical case) to the value nom (nominative).
Phrase lemmatisation as a tagging problem | Our idea is simple: by expressing phrase lemmatisation in terms of word-level transformations we can reduce the task to a tagging problem and apply well-known Machine Learning techniques that have been devised for solving such problems (e.g. CRF).
Preparation of training data | The development set was enhanced with word-level transformations that were induced automatically in the following manner. |
Preparation of training data | The dictionary is stored as a set of (orthographic form, word-level lemma, tag) triples.
Preparation of training data | The task is to find a suitable transformation given the inflected form from the original phrase, its tag and word-level lemma, as well as the desired form that is part of the human-assigned lemma.
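The following minimal sketch (toy morphological dictionary, simplified NCP-like tag layout, hypothetical helper names) illustrates how a word-level transformation such as cas=nom can be applied: keep the word-level lemma, override the case attribute of the inflected form's tag, and look the resulting (lemma, tag) pair up in the dictionary:

```python
from typing import Optional

# Index built from (orthographic form, word-level lemma, tag) dictionary triples,
# keyed here by (lemma, tag) for lookup convenience (toy data).
MORPH_DICT = {
    ("główny", "adj:sg:inst:n:pos"): "głównym",
    ("główny", "adj:sg:nom:n:pos"): "główne",
}

CASE_INDEX = 2  # position of the case attribute in this simplified adjective tag

def apply_transformation(lemma: str, tag: str, transformation: str) -> Optional[str]:
    """Apply a word-level transformation such as 'cas=nom' and return the new form."""
    attr, value = transformation.split("=")
    parts = tag.split(":")
    if attr == "cas":
        parts[CASE_INDEX] = value
    return MORPH_DICT.get((lemma, ":".join(parts)))

# Inflected form with lemma 'główny' and tag adj:sg:inst:n:pos; cas=nom yields
# the form that appears in the human-assigned phrase lemma.
print(apply_transformation("główny", "adj:sg:inst:n:pos", "cas=nom"))  # -> główne
```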
Abstract | We propose a novel two-level model, which computes similarities between word-level vectors that are biased by topic-level context representations. |
Abstract | Evaluations on a naturally-distributed dataset show that our model significantly outperforms prior word-level and topic-level models. |
Background and Model Setting | However, while DIRT computes sim(v, v') over vectors in the original word-level space, topic-level models compute sim(d, d', w) by measuring similarity of vectors in a reduced-dimensionality latent space.
Background and Model Setting | slots in the original word-level space while biasing the similarity measure through topic-level context models. |
Introduction | To address this hypothesized caveat of prior context-sensitive rule scoring methods, we propose a novel generic scheme that integrates word-level and topic-level representations. |
Introduction | Rather than computing a single context-insensitive rule score, we compute a distinct word-level similarity score for each topic in an LDA model. |
Results | Specifically, topics are leveraged for high-level domain disambiguation, while fine grained word-level distributional similarity is computed for each rule under each such domain. |
Results | This result more explicitly shows the advantages of integrating word-level and context-sensitive topic-level similarities for differentiating valid and invalid contexts for rule applications. |
Two-level Context-sensitive Inference | Thus, our model computes similarity over word-level (rather than topic-level) argument vectors, while biasing it according to the specific argument words in the given rule application context. |
Two-level Context-sensitive Inference | The core of our contribution is thus defining the context-sensitive word-level vector similarity measure sim(v, v’ , w), as described in the remainder of this section. |
Two-level Context-sensitive Inference | This way, rather than replacing altogether the word-level values v(w) by the topic probabilities p(t|dv, w), as done in the topic-level models, we use the latter to only bias the former while preserving fine-grained word-level representations. |
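A minimal sketch of this biasing idea (toy vectors and topic posteriors; not the exact model of the paper) is given below: the word-level slot vectors are reweighted by the topic posterior inferred from the rule-application context, and similarity is then computed over the biased, still word-level, vectors:

```python
# Minimal sketch (hypothetical vectors and topic posteriors) of biasing
# word-level slot vectors with topic-level context before measuring similarity,
# rather than replacing the word-level values with topic probabilities.
import numpy as np

vocab = ["doctor", "patient", "court", "judge"]

# Word-level slot vectors: association scores v(w) per argument word (toy values).
v1 = np.array([2.1, 1.7, 0.2, 0.1])
v2 = np.array([1.9, 1.5, 0.3, 0.2])

# p(t | w) for the topic inferred from the given rule-application context
# (toy posteriors; in the model these would come from an LDA topic model).
topic_posterior = np.array([0.9, 0.8, 0.1, 0.1])

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Bias the word-level values by the topic posterior, preserving the
# fine-grained word-level representation.
biased_sim = cosine(v1 * topic_posterior, v2 * topic_posterior)
plain_sim = cosine(v1, v2)
print(f"plain word-level sim: {plain_sim:.3f}, topic-biased sim: {biased_sim:.3f}")
```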
Background | In the hybrid model, given an input sentence, a lattice that consists of word-level and character-level nodes is constructed. |
Background | Word-level nodes, which correspond to words found in the system's lexicon, represent known words.
Background | In other words, we use word-level nodes to identify known words and character-level nodes to identify unknown words. |
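A minimal sketch (toy lexicon, simplified node representation) of how the nodes of such a lattice can be enumerated is shown below: word-level nodes are added for every substring found in the lexicon, while single-character nodes are always added so that unknown words remain reachable:

```python
# Minimal sketch (toy lexicon) of building the two kinds of lattice nodes:
# word-level nodes for lexicon matches (known words) and character-level
# nodes that keep unknown words reachable.
LEXICON = {"机器", "翻译", "学习"}
MAX_WORD_LEN = 4

def build_nodes(sentence):
    nodes = []
    for start in range(len(sentence)):
        # character-level node: always added so unknown words stay reachable
        nodes.append((start, start + 1, sentence[start], "char"))
        # word-level nodes: substrings found in the lexicon
        for end in range(start + 1, min(start + MAX_WORD_LEN, len(sentence)) + 1):
            if sentence[start:end] in LEXICON:
                nodes.append((start, end, sentence[start:end], "word"))
    return nodes

for node in build_nodes("机器翻译很棒"):
    print(node)
```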
Experiments | Since we were interested in finding an optimal combination of word-level and character-level nodes for training, we focused on tuning 7“. |
Policies for correct path selection | Ideally, we need to build a word-character hybrid model that effectively learns the characteristics of unknown words (with character-level nodes) as well as those of known words (with word-level nodes). |
Policies for correct path selection | If we select the correct path yt that corresponds to the annotated sentence, it will only consist of word-level nodes that do not allow learning for unknown words. |
Policies for correct path selection | We therefore need to choose character-level nodes as correct nodes instead of word-level nodes for some words. |
Training method | Templates W0 <w0> and W1 <p0> are for word-level nodes.
Training method | Templates W0-W3 are basic word-level unigram features, where Length(w0) denotes the length of the word w0.
Training method | Templates B0-B9 are basic word-level bigram features. |
Introduction | Word-level alignments are then drawn based on the tree alignment. |
Introduction | Finally, parallel sentences are assembled from these generated part-of-speech sequences and word-level alignments. |
Introduction | The model is trained using bilingual data with automatically induced word-level alignments, but is tested on purely monolingual data for each language. |
Model | word-level alignments, as observed data. |
Model | We obtain these word-level alignments using GIZA++ (Och and Ney, 2003). |
Model | Finally, word-level alignments are drawn based on the structure of the alignment tree. |
Related Work | Assuming that trees induced over parallel sentences have to exhibit certain structural regularities, Kuhn manually specifies a set of rules for determining when parsing decisions in the two languages are inconsistent with GIZA++ word-level alignments. |
Introduction | Most state-of-the-art statistical machine translation systems are based on large phrase tables extracted from parallel text using word-level alignments.
Introduction | These word-level alignments are most often obtained using Expectation Maximization on the conditional generative models of Brown et al. |
Introduction | As these word-level alignment models restrict the word alignment complexity by requiring each target word to align to zero or one source words, results are improved by aligning both source-to-target as well as target-to-source.
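A common follow-up step is to symmetrize the two directional word-level alignments; the sketch below (toy alignments, not from the cited work) shows the simple intersection and union heuristics:

```python
# Minimal sketch (toy alignments) of symmetrizing two directional word-level
# alignments; intersection is a common high-precision heuristic, union a
# high-recall one (grow-diag-final style methods interpolate between them).
src2tgt = {(0, 0), (1, 2), (2, 1), (3, 3)}   # (source index, target index)
tgt2src = {(0, 0), (1, 2), (3, 3), (3, 4)}

intersection = src2tgt & tgt2src
union = src2tgt | tgt2src
print("intersection:", sorted(intersection))
print("union:", sorted(union))
```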
Phrasal Inversion Transduction Grammar | Combining the two approaches, we have a staged training procedure going from the simplest unconstrained word based model to a constrained Bayesian word-level ITG model, and finally proceeding to a constrained Bayesian phrasal model. |
Automatic Evaluation Method using Noun-Phrase Chunking | Secondly, the system calculates word-level scores based on the correctly matched words using the determined correspondences of noun phrases.
Automatic Evaluation Method using Noun-Phrase Chunking | The system calculates the final scores by combining word-level scores and phrase-level scores.
Automatic Evaluation Method using Noun-Phrase Chunking | 2.2 Word-level Score |
Experiments and Results | One should note that, in general, better performance was obtained when using character-level rather than word-level information. |
Experiments and Results | This confirms the results already reported by other researchers that have used character-level and word-level information for AA (Houvardas and Stamatatos, 2006; |
Experiments and Results | Also, n-gram information is more dense in documents than word-level information. |
Introduction | Also, we empirically show that local histograms at the character-level are more helpful than local histograms at the word-level for AA. |
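To illustrate the two feature granularities, here is a minimal sketch (hypothetical documents; local histograms over document regions are not modelled) using scikit-learn's CountVectorizer:

```python
# Minimal sketch contrasting character-level and word-level n-gram features
# with scikit-learn's CountVectorizer; the documents are hypothetical.
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the cat sat on the mat", "the dog sat on the log"]

char_vec = CountVectorizer(analyzer="char", ngram_range=(3, 3))
word_vec = CountVectorizer(analyzer="word", ngram_range=(1, 1))

X_char = char_vec.fit_transform(docs)
X_word = word_vec.fit_transform(docs)
print("char 3-gram features:", X_char.shape[1])
print("word unigram features:", X_word.shape[1])
```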
Related Work | Some researchers have gone a step further and have attempted to capture sequential information by using n-grams at the word-level (Peng et al., 2004) or by discovering maximal frequent word sequences (Coyotl-Morales et al., 2006). |
Collaborative Decoding | Word-level system combination (Rosti et al., 2007) of member decoders' n-best outputs
Discussion | Word-level system combination (system combination hereafter) (Rosti et al., 2007; He et al., 2008) has been proven to be an effective way to improve machine translation quality by using outputs from multiple systems. |
Experiments | We also implemented the word-level system combination (Rosti et al., 2007) and the hypothesis selection method (Hildebrand and Vogel, 2008). |
Experiments | Word-level Comb 40.45/40.85 29.52/30.35; Hypo Selection 40.09/40.50 29.02/29.71
Introduction | Most of the work focused on seeking better word alignment for consensus-based confusion network decoding (Matusov et al., 2006) or word-level system combination (He et al., 2008; Ayan et al., 2008). |
Introduction | We also conduct extensive investigations with different settings of co-decoding, and make comparisons with related methods such as word-level system combination or hypothesis selection from multiple n-best lists.
Experimental design | We report both word-level and letter-level error rates. |
Experimental design | The word-level error rate is the fraction of words on which a method makes at least one mistake. |
Experimental design | Specifically, for English our word-level accuracy (“ower”) is 96.33% while their best (“WA”) is 95.65%. |
Experimental results | For both languages, PATGEN has higher serious letter-level and word-level error rates than TeX using the existing pattern files.
History of automated hyphenation | The accuracy we achieve is slightly higher: word-level accuracy of 96.33% compared to their 95.65%.
Abstract | While operating at the character level, the model makes use of word-level and contextual information. |
Conclusions | In the future, we plan to extend the model to use word-level language models to select between top character predictions in the output. |
Experiments | The word error rate (WER) metric is computed by summing the total number of word-level substitution errors, insertion errors, and deletion errors in the output, and dividing by the number of words in the reference.
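A minimal sketch of this computation, via a standard Levenshtein alignment over word-level tokens, is given below:

```python
# Minimal sketch of word error rate via Levenshtein alignment over word-level
# tokens: (substitutions + insertions + deletions) / reference length.
def word_error_rate(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("the cat sat on the mat", "the cat is on the mat"))  # 1/6
```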
Related Work | Discriminative models have been proposed at the word-level for error correction (Duan et al., 2012) and for error detection (Habash and Roth, 2011). |
The GSEC Approach | We implemented another approach for error correction based on a word-level maximum likelihood model. |
Gaining Dependency Structures | Arrows in red (upper): PASs; orange (bottom): word-level dependencies generated from PASs; blue: newly appended dependencies.
Gaining Dependency Structures | In order to generate word-level dependency trees from the PCFG tree, we use the LTH constituent-to-dependency conversion tool written by Johansson and Nugues (2007).
Gaining Dependency Structures | Table 1 lists the mapping from HPSG’s PAS types to word-level dependency arcs. |
Conclusion | It is the first model of lexical-phonetic acquisition to include word-level context and to be tested on an infant-directed corpus with realistic phonetic variability. |
Conclusion | Whether trained using gold standard or automatically induced word boundaries, the model recovers lexical items more effectively than a system that assumes no phonetic variability; moreover, the use of word-level context is key to the model’s success. |
Introduction | Previous models with similar goals have learned from an artificial corpus with a small vocabulary (Driesen et al., 2009; Rasanen, 2011) or have modeled variability only in vowels (Feldman et al., 2009); to our knowledge, this paper is the first to use a naturalistic infant-directed corpus while modeling variability in all segments, and to incorporate word-level context (a bigram language model). |
Related work | In contrast, our model uses a symbolic representation for sounds, but models variability in all segment types and incorporates a bigram word-level language model. |
Related work | Here, we use a naturalistic corpus, demonstrating that lexical-phonetic learning is possible in this more general setting and that word-level context information is important for doing so. |
Evaluation | Segmentation performance is measured using word-level precision (P), recall (R), and F-measure (F). |
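A minimal sketch of these word-level segmentation measures (comparing predicted and gold words as character spans; the example sentence is hypothetical) is shown below:

```python
# Minimal sketch: word-level precision/recall/F-measure for segmentation,
# comparing predicted vs. gold words as (start, end) character spans.
def to_spans(words):
    spans, pos = set(), 0
    for w in words:
        spans.add((pos, pos + len(w)))
        pos += len(w)
    return spans

def prf(gold_words, pred_words):
    gold, pred = to_spans(gold_words), to_spans(pred_words)
    correct = len(gold & pred)
    p = correct / len(pred)
    r = correct / len(gold)
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

print(prf(["我", "喜欢", "自然", "语言"], ["我", "喜欢", "自然语言"]))
```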
Evaluation | The best performance result achieved by G2 in our experiment is 81.7 in word-level F-measure, although this was obtained from search setting (c), using a heuristic p value 0.37. |
Evaluation | It would be interesting to confirm this by studying the correlation between description length and word-level F-measure. |
Introduction | Furthermore, the word-level information is often augmented with the POS tags, which, along with segmentation, form the basic foundation of statistical NLP. |
Model | We use standard measures of word-level precision, recall, and F1 score for evaluating each task.
Related Works | To incorporate the word-level features into the character-based decoder, the features are decomposed into substring-level features, which allow incomplete words in the beam to receive scores comparable to those of complete words.
Polylingual Tree-based Topic Models | In this section, we bring existing tree-based topic models (Boyd-Graber et al., 2007, tLDA) and polylingual topic models (Mimno et al., 2009, pLDA) together and create the polylingual tree-based topic model (ptLDA) that incorporates both word-level correlations and document-level alignment information. |
Polylingual Tree-based Topic Models | Word-level Correlations: Tree-based topic models incorporate the correlations between words by
Polylingual Tree-based Topic Models | Build Prior Tree Structures One remaining question is the source of the word-level connections across languages for the tree prior. |
Introduction | In Section 2, we review the previous work on word-level confidence estimation which is used for error detection. |
Related Work | Ueffing and Ney (2007) exhaustively explore various word-level confidence measures to label each word in a generated translation hypothesis as correct or incorrect. |
Related Work | (2009) study several confidence features based on mutual information between words and n-gram and backward n-gram language model for word-level and sentence-level CE. |
Related Work | Nevertheless, any such word-level representation can be used to offset inherent sparsity problems associated with full lexicalization (Cirik and Sensoy, 2013).
Related Work | Word-level vector space embeddings have so far had limited impact on parsing performance. |
Related Work | While this method learns to map word combinations into vectors, it builds on existing word-level vector representations. |
Methods | In this section, we discuss how a lattice from a multi-stack phrase-based decoder such as Moses (Koehn et al., 2007) can be desegmented to enable word-level features. |
Methods | We now have a desegmented lattice, but it has not been annotated with an unsegmented ( word-level ) language model. |
Methods | Indeed, the expanded word-level context is one of the main benefits of incorporating a word-level LM. |
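A much simplified sketch of the idea is given below; it desegments a single token sequence rather than a full lattice, and the "+" continuation marker, the example tokens, and the toy unigram LM are assumptions rather than the paper's actual setup:

```python
# Simplified sketch: desegment morpheme tokens into words, then score the
# resulting word sequence with a word-level language model.
import math

def desegment(morphemes):
    """Join tokens whose predecessor ends with '+', e.g. ['yap+', 'tı'] -> ['yaptı']."""
    words, buffer = [], ""
    for tok in morphemes:
        if tok.endswith("+"):
            buffer += tok[:-1]
        else:
            words.append(buffer + tok)
            buffer = ""
    if buffer:
        words.append(buffer)
    return words

# Toy unigram word-level LM standing in for a real language model.
WORD_LM = {"yaptı": 0.01, "bunu": 0.02, "<unk>": 0.001}

def lm_logprob(words):
    return sum(math.log(WORD_LM.get(w, WORD_LM["<unk>"])) for w in words)

hyp = ["bunu", "yap+", "tı"]
words = desegment(hyp)
print(words, lm_logprob(words))
```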
A Phrase-Based Error Model | Furthermore, the word-level alignments between Q and C can most often be identified with little ambiguity. |
A Phrase-Based Error Model | Thus we restrict our attention to those phrase transformations consistent with a good word-level alignment. |
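A minimal sketch of the standard consistency check (the function name and toy alignment are illustrative) is given below: a candidate phrase pair is kept only if no word-level alignment link crosses its boundaries and at least one link falls inside:

```python
# Minimal sketch of the standard consistency check: a phrase pair (Q span,
# C span) is kept only if no word-level alignment link crosses the spans.
def consistent(alignment, q_start, q_end, c_start, c_end):
    """alignment: set of (q_index, c_index) word-level links; spans are inclusive."""
    inside = False
    for qi, ci in alignment:
        q_in = q_start <= qi <= q_end
        c_in = c_start <= ci <= c_end
        if q_in != c_in:      # a link leaves the phrase pair on one side only
            return False
        inside |= q_in and c_in
    return inside             # require at least one internal link

align = {(0, 0), (1, 1), (2, 2)}
print(consistent(align, 1, 2, 1, 2))  # True
print(consistent(align, 1, 2, 0, 1))  # False: link (0, 0) crosses the boundary
```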
Related Work | (2006) extend the error model by capturing word-level similarities learned from query logs. |
Abstract | We consider generative and discriminative models for dependency grammar induction that use word-level alignments and a source language parser (English) to constrain the space of possible target trees. |
Approach | A parallel corpus is word-level aligned using an alignment toolkit (Graca et al., 2009) and the source (English) is parsed using a dependency parser (McDonald et al., 2005). |
Introduction | For example, several early works (Yarowsky and Ngai, 2001; Yarowsky et al., 2001; Merlo et al., 2002) demonstrate transfer of shallow processing tools such as part-of-speech taggers and noun-phrase chunkers by using word-level alignment models (Brown et al., 1994; Och and Ney, 2000). |