Abstract | For languages such as Chinese, where words usually have meaningful internal structure and word boundaries are often fuzzy, TESLA-CELAB acknowledges the advantage of character-level evaluation over word-level evaluation.
Experiments | Although word-level BLEU has often been found inferior to the new-generation metrics when the target language is English or other European languages, prior research has shown that character-level BLEU is highly competitive when the target language is Chinese (Li et al., 2011). |
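To make the character-level variant concrete, below is a minimal sketch of single-reference sentence BLEU computed over character n-grams; the uniform n-gram weights and the smoothing floor are simplifying assumptions, not the exact configuration of Li et al. (2011).

```python
from collections import Counter
import math

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(reference, hypothesis, max_n=4):
    """Single-reference sentence BLEU: uniform weights, brevity penalty."""
    if not hypothesis:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        ref_counts = ngrams(reference, n)
        hyp_counts = ngrams(hypothesis, n)
        overlap = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = max(sum(hyp_counts.values()), 1)
        # tiny floor avoids log(0) on short sentences (a smoothing assumption)
        log_precisions.append(math.log(max(overlap, 1e-9) / total))
    brevity = min(1.0, math.exp(1.0 - len(reference) / len(hypothesis)))
    return brevity * math.exp(sum(log_precisions) / max_n)

# Character-level BLEU needs no word segmenter: tokenize into characters.
reference = list("今天天气很好")
hypothesis = list("今天天气不错")
print(bleu(reference, hypothesis))  # word-level BLEU would segment first
```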
Motivation | In this work, we attempt to address both of these issues by introducing TESLA-CELAB, a character-level metric that also models word-level linguistic phenomena.
Conclusion | It is the first model of lexical-phonetic acquisition to include word-level context and to be tested on an infant-directed corpus with realistic phonetic variability. |
Conclusion | Whether trained using gold-standard or automatically induced word boundaries, the model recovers lexical items more effectively than a system that assumes no phonetic variability; moreover, the use of word-level context is key to the model’s success.
Introduction | Previous models with similar goals have learned from an artificial corpus with a small vocabulary (Driesen et al., 2009; Räsänen, 2011) or have modeled variability only in vowels (Feldman et al., 2009); to our knowledge, this paper is the first to use a naturalistic infant-directed corpus while modeling variability in all segments, and to incorporate word-level context (a bigram language model).
Related work | In contrast, our model uses a symbolic representation for sounds, but models variability in all segment types and incorporates a bigram word-level language model. |
Related work | Here, we use a naturalistic corpus, demonstrating that lexical-phonetic learning is possible in this more general setting and that word-level context information is important for doing so. |
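As an illustration of the word-level context component, the sketch below implements an add-alpha smoothed bigram language model over words; the class name BigramLM, the smoothing scheme, and the toy corpus are assumptions for illustration only — in the model described here, the lexicon and the language model are induced jointly from phonetic input rather than trained on clean text.

```python
from collections import Counter, defaultdict
import math

class BigramLM:
    """Add-alpha smoothed bigram model over word tokens (illustrative)."""

    def __init__(self, sentences, alpha=0.1):
        self.alpha = alpha
        self.unigrams = Counter()
        self.bigrams = defaultdict(Counter)
        self.vocab = set()
        for sent in sentences:
            tokens = ["<s>"] + sent + ["</s>"]
            self.vocab.update(tokens)
            for prev, cur in zip(tokens, tokens[1:]):
                self.unigrams[prev] += 1
                self.bigrams[prev][cur] += 1

    def logprob(self, sent):
        """Log probability of a word sequence under the smoothed model."""
        tokens = ["<s>"] + sent + ["</s>"]
        V = len(self.vocab)
        lp = 0.0
        for prev, cur in zip(tokens, tokens[1:]):
            num = self.bigrams[prev][cur] + self.alpha
            den = self.unigrams[prev] + self.alpha * V
            lp += math.log(num / den)
        return lp

lm = BigramLM([["look", "at", "the", "dog"], ["see", "the", "dog"]])
# Word-level context favors sequences attested in the corpus.
print(lm.logprob(["see", "the", "dog"]) > lm.logprob(["dog", "the", "see"]))
```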
Gaining Dependency Structures | Arrows in red (upper) = PASs, orange (bottom) = word-level dependencies generated from PASs, blue = newly appended dependencies.
Gaining Dependency Structures | In order to generate word-level dependency trees from the PCFG tree, we use the LTH constituent-to-dependency conversion tool written by Johansson and Nugues (2007).
Gaining Dependency Structures | Table 1 lists the mapping from HPSG’s PAS types to word-level dependency arcs. |
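A hypothetical sketch of how such a lookup could be applied is shown below; the PAS type names, argument slots, and dependency labels are illustrative stand-ins, not the actual contents of Table 1.

```python
# Illustrative mapping in the spirit of Table 1: PAS type -> list of
# (argument slot, dependency arc label). Names are assumptions.
PAS_TO_ARCS = {
    "verb_arg12": [("ARG1", "SBJ"), ("ARG2", "OBJ")],
    "adj_arg1":   [("ARG1", "AMOD")],
    "prep_arg12": [("ARG1", "PMOD"), ("ARG2", "POBJ")],
}

def pas_to_dependencies(predicates):
    """Convert predicate-argument structures to (head, dep, label) arcs.

    Each predicate is (word_index, pas_type, {slot: argument_index});
    the predicate word is taken as the head of each generated arc.
    """
    arcs = []
    for head, pas_type, args in predicates:
        for slot, label in PAS_TO_ARCS.get(pas_type, []):
            if slot in args:
                arcs.append((head, args[slot], label))
    return arcs

# "dogs chase cats": predicate "chase" (index 1) with two arguments.
print(pas_to_dependencies([(1, "verb_arg12", {"ARG1": 0, "ARG2": 2})]))
# -> [(1, 0, 'SBJ'), (1, 2, 'OBJ')]
```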
Introduction | Furthermore, word-level information is often augmented with POS tags, which, along with segmentation, form the basic foundation of statistical NLP.
Model | We use the standard measures of word-level precision, recall, and F1 score to evaluate each task.
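For concreteness, the sketch below computes word-level precision, recall, and F1 by exact span matching against the gold segmentation, the usual convention for segmentation-style tasks; whether the evaluation here also matches labels is an assumption left out of the sketch.

```python
def word_prf(gold, pred):
    """Word-level precision, recall, and F1 via exact span matching.

    gold and pred are lists of words over the same character sequence;
    a predicted word is correct only if its character span appears in
    the gold segmentation.
    """
    def spans(words):
        out, start = set(), 0
        for w in words:
            out.add((start, start + len(w)))
            start += len(w)
        return out

    g, p = spans(gold), spans(pred)
    correct = len(g & p)
    precision = correct / len(p) if p else 0.0
    recall = correct / len(g) if g else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

print(word_prf(["今天", "天气", "很", "好"], ["今天", "天气", "很好"]))
# -> (0.666..., 0.5, 0.571...)
```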
Related Works | To incorporate word-level features into the character-based decoder, the features are decomposed into substring-level features, which allow incomplete words in the beam to receive scores comparable to those of complete words.
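A minimal sketch of one such decomposition is given below, assuming a uniform split of each word feature's weight across the word's character prefixes; the function name and the even split are illustrative assumptions, not the decomposition actually used.

```python
def substring_features(word, weight):
    """Give each successive prefix of `word` an equal share of the word
    feature's weight, so a partial word in the beam accumulates a
    proportional score instead of all-or-nothing at completion."""
    share = weight / len(word)
    return [(word[: i + 1], share) for i in range(len(word))]

# The full word '北京大学' carries weight 4.0; each character extension
# adds 1.0, so an incomplete hypothesis covering '北京' has already
# accumulated 2.0 and remains comparable to completed words in the beam.
print(substring_features("北京大学", 4.0))
# -> [('北', 1.0), ('北京', 1.0), ('北京大', 1.0), ('北京大学', 1.0)]
```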