Evaluation | Word Generation Tools and Settings. For unsupervised learning of morphology, we use Morfessor CAT-MAP (v. 0.9.2), which was shown to be a very accurate morphological analyzer for morphologically rich languages (Creutz and Lagus, 2007). |
Evaluation | and thus we also have a morphological analyzer that can give all possible segmentations for a given word. |
Evaluation | By running the morphological analyzer on the OOVs, we obtain the potential upper bound of OOV reduction by the system (labeled “∞” in Tables 2 and 3). |
Introduction | For low-resource languages, resources such as morphological analyzers are not usually available, and even good scholarly descriptions of the morphology (from which a tool could be built) are often not available. |
Abstract | In this paper, we investigate the usefulness of character-level part-of-speech in the task of Chinese morphological analysis. |
Abstract | Through experiments, we demonstrate that by introducing character-level POS information, the performance of a baseline morphological analyzer can be significantly improved. |
Conclusion | In our error analysis, we believe that by exploring the character-level POS and the internal word structure (Zhang et al., 2013) at the same time, it is possible to further improve the performance of morphological analysis and parsing. |
Conclusion | Corpus-based Japanese Morphological Analysis. |
Evaluation | In Table 6 we compare our approach with morphological analyzers in previous studies. |
Introduction | Therefore, compared to word-level POS, the character-level POS can produce information for more expressive features during the learning process of a morphological analyzer. |
Introduction | In this paper, we investigate the usefulness of character-level POS in the task of Chinese morphological analysis. |
Introduction | Through experiments, we demonstrate that by introducing character-level POS information, the performance of a baseline morphological analyzer can be significantly improved. |
Conclusion | In the future, we plan to explore introducing multiple segmentation options into the lattice, and the application of our method to a full morphological analysis (as opposed to segmentation) of the target language. |
Related Work | The transformation might take the form of a morphological analysis or a morphological segmentation. |
Related Work | 2.1 Morphological Analysis |
Related Work | Many languages have access to morphological analyzers, which annotate surface forms with their lemmas and morphological features. |
MT System Selection | These features rely on language models, MSA and Egyptian morphological analyzers, and a Highly Dialectal Egyptian lexicon to decide whether each word is MSA, Egyptian, Both, or Out of Vocabulary. |
MT System Selection | These features are: sentence length (in words), percentage of selected words and phrases, number of selected words, number of selected phrases, number of words morphologically selected as dialectal by a mainly Levantine morphological analyzer, number of words selected as dialectal by the tool’s DA-MSA lexicons, number of OOV words against the MSA-Pivot system training data, number of words in the sentences that appeared less than 5 times in the training data, number of words in the sentences that appeared between 5 and 10 times in the training data, number of words in the sentences that appeared between 10 and 15 times in the training data, number of words that have spelling errors and are corrected by this tool (e.g., word-lengthening), number of punctuation marks, and number of words that are written in Latin script. |
Machine Translation Experiments | The MSA portion of the Arabic side is segmented according to the Arabic Treebank (ATB) tokenization scheme (Maamouri et al., 2004; Sadat and Habash, 2006) using the MADA+TOKAN morphological analyzer and tokenizer v3.1 (Roth et al., 2008), while the DA portion is ATB-tokenized with MADA-ARZ (Habash et al., 2013). |
Related Work | Sawaf (2010) and Salloum and Habash (2013) used hybrid solutions that combine rule-based algorithms and resources such as lexicons and morphological analyzers with statistical models to map DA to MSA before using MSA-to-English MT systems. |
Conclusion | As a next step, we will focus on morphological analysis and disambiguation of Turkish words. |
Conclusion | After determining the correct morphological analysis of Turkish words, we will use the parts of these analyses to replace the leaf nodes that we intentionally left as “*NONE*”. |
Corpus construction strategy | corresponds to the morphological analysis “gec-NEG-FUT-2SG” of the verb “gecmeyeceksin”. |