Index of papers in Proc. ACL that mention
  • morphological analysis
Goldberg, Yoav and Tsarfaty, Reut
A Generative PCFG Model
The Input. The set of analyses for a token is thus represented as a lattice in which every arc corresponds to a specific lexeme l, as shown in Figure 1. A morphological analyzer M : 𝒲 → ℒ is a function mapping sentences in Hebrew (W ∈ 𝒲) to their corresponding lattices (M(W) = L ∈ ℒ).
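To make the lattice idea concrete, here is a minimal Python sketch, not the authors' implementation. The `Arc` class, the `TOY_ANALYSES` table, and the tokens and tags in it are all hypothetical placeholders; a real analyzer would supply the analyses, and a real lattice would typically merge nodes that mark the same segment boundary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    start: int  # lattice node where the lexeme begins
    end: int    # lattice node where the lexeme ends
    form: str   # segment of the surface token
    tag: str    # part-of-speech tag of the lexeme


# Hypothetical analyzer output: each token maps to its alternative analyses,
# and each analysis is a sequence of (form, tag) lexemes.
TOY_ANALYSES = {
    "abc": [[("abc", "NOUN")], [("a", "PREP"), ("bc", "NOUN")]],
    "de": [[("de", "VERB")]],
}


def build_lattice(tokens):
    """Return the arcs of a lattice covering every analysis of every token."""
    arcs = []
    token_start = 0  # node 0 is the start of the sentence
    next_free = 1    # next unused node id
    for token in tokens:
        analyses = TOY_ANALYSES[token]
        # One internal node per segment boundary inside each analysis.
        internal_needed = sum(len(analysis) - 1 for analysis in analyses)
        token_end = next_free + internal_needed
        for analysis in analyses:
            prev = token_start
            for i, (form, tag) in enumerate(analysis):
                if i == len(analysis) - 1:
                    node = token_end  # every analysis ends at the token boundary
                else:
                    node = next_free
                    next_free += 1
                arcs.append(Arc(prev, node, form, tag))
                prev = node
        token_start = token_end
        next_free = token_end + 1
    return arcs


if __name__ == "__main__":
    for arc in build_lattice(["abc", "de"]):
        print(arc)
```

Every path from node 0 to the last node picks one analysis per token, which is what lets a parser choose segmentation and syntax jointly.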
A Generative PCFG Model
… and a morphological analyzer M, we look for the most probable parse tree π s.t.
A Generative PCFG Model
Since the lattice L for a given sentence W is determined by the morphological analyzer M we have
Experimental Setup
Morphological Analyzer. Ideally, we would use an off-the-shelf morphological analyzer for mapping each input token to its possible analyses.
Experimental Setup
patible with the one of the Hebrew Treebank. For this reason, we use a data-driven morphological analyzer derived from the training data similar to (Cohen and Smith, 2007).
Experimental Setup
To control for the effect of the HSPELL-based pruning, we also experimented with a morphological analyzer that does not perform this pruning.
Model Preliminaries
We represent all morphological analyses of a given utterance using a lattice structure.
Previous Work on Hebrew Processing
Morphological analyzers for Hebrew that analyze a surface form in isolation have been proposed by Segal (2000), Yona and Wintner (2005), and recently by the knowledge center for processing Hebrew (Itai et al., 2006).
Previous Work on Hebrew Processing
Morphological disambiguators that consider a token in context (an utterance) and propose the most likely morphological analysis of an utterance (including segmentation) were presented by Bar-Haim et al.
Previous Work on Hebrew Processing
Tsarfaty (2006) used a morphological analyzer (Segal, 2000), a PoS tagger (Bar-Haim et al., 2005), and a general purpose parser (Schmid, 2000) in an integrated framework in which morphological and syntactic components interact to share information, leading to improved performance on the joint task.
morphological analysis is mentioned in 13 sentences in this paper.
Adler, Meni and Goldberg, Yoav and Gabay, David and Elhadad, Michael
Introduction
For the task of full morphological analysis, the lexicon must provide all possible morphological analyses for any given token.
Introduction
In this paper, we investigate the characteristics of Hebrew unknowns for full morphological analysis, and propose a new method for handling such unavoidable lack of information.
Introduction
In our evaluation, these learned distributions include the correct analysis for unknown words in 85% of the cases, contributing an error reduction of over 30% over a competitive baseline for the overall task of full morphological analysis in Hebrew.
Method
model application is a set of possible full morphological analyses for the token, in exactly the same format as the morphological analyzer provides.
Previous Work
Habash and Rambow (2006) used the root+pattern+features representation of Arabic tokens for morphological analysis and generation of Arabic dialects, which have no lexicon.
Previous Work
They report high recall (95%–98%) but low precision (37%–63%) for token types and token instances, against gold-standard morphological analysis.
Previous Work
Unlike Nakagawa, our model does not use any segmented text, and, on the other hand, it aims to select full morphological analysis for each token,
morphological analysis is mentioned in 11 sentences in this paper.
Spiegler, Sebastian and Flach, Peter A.
Abstract
The first algorithm, PROMODES, which participated in the Morpho Challenge 2009 (an international competition for unsupervised morphological analysis), employs a lower order model, whereas the second algorithm, PROMODES-H, is a novel development of the first using a higher order model.
Introduction
This study is called morphological analysis.
Introduction
four tasks are assigned to morphological analysis: word decomposition into morphemes, building morpheme dictionaries, defining morphosyntactical rules which state how morphemes can be combined to valid words and defining morphophonological rules that specify phonological changes morphemes undergo when they are combined to words.
Introduction
Results of morphological analysis are applied in speech synthesis (Sproat, 1996) and recognition (Hirsimaki et al., 2006), machine translation (Amtrup, 2003) and information retrieval (Kettunen, 2009).
Related work
We have presented two probabilistic generative models for word decomposition, PROMODES and PROMODES-H. Another generative model for morphological analysis has been described by Snover and Brent (2001) and Snover et al.
Related work
Combining different morphological analysers has been performed, for example, by Atwell and Roberts (2006) and Spiegler et al.
morphological analysis is mentioned in 9 sentences in this paper.
Shen, Mo and Liu, Hongxiao and Kawahara, Daisuke and Kurohashi, Sadao
Abstract
In this paper, we investigate the usefulness of character-level part-of-speech in the task of Chinese morphological analysis.
Abstract
Through experiments, we demonstrate that by introducing character-level POS information, the performance of a baseline morphological analyzer can be significantly improved.
Conclusion
In our error analysis, we believe that by exploring the character-level POS and the internal word structure (Zhang et al., 2013) at the same time, it is possible to further improve the performance of morphological analysis and parsing.
Conclusion
Corpus-based Japanese Morphological Analysis.
Evaluation
In Table 6 we compare our approach with morphological analyzers in previous studies.
Introduction
Therefore, compared to word-level POS, the character-level POS can produce information for more expressive features during the learning process of a morphological analyzer.
Introduction
In this paper, we investigate the usefulness of character-level POS in the task of Chinese morphological analysis.
Introduction
Through experiments, we demonstrate that by introducing character-level POS information, the performance of a baseline morphological analyzer can be significantly improved.
morphological analysis is mentioned in 8 sentences in this paper.
Rasooli, Mohammad Sadegh and Lippincott, Thomas and Habash, Nizar and Rambow, Owen
Evaluation
Word Generation Tools and Settings. For unsupervised learning of morphology, we use Morfessor CAT-MAP (V. 0.9.2) which was shown to be a very accurate morphological analyzer for morphologically rich languages (Creutz and Lagus, 2007).
Evaluation
and thus we also have a morphological analyzer that can give all possible segmentations for a given word.
Evaluation
By running the morphological analyzer on the OOVs, we can have the potential upper bound of OOV reduction by the system (labeled “∞” in Tables 2 and 3).
Introduction
For low-resource languages, resources such as morphological analyzers are not usually available, and even good scholarly descriptions of the morphology (from which a tool could be built) are often not available.
morphological analysis is mentioned in 8 sentences in this paper.
Clifton, Ann and Sarkar, Anoop
Conclusion and Future Work
In order to help with replication of the results in this paper, we have run the various morphological analysis steps and created the necessary training, tuning and test data files needed in order to train, tune and test any phrase-based machine translation system with our data.
Conclusion and Future Work
We would particularly like to thank the developers of the open-source Moses machine translation toolkit and the Omorfi morphological analyzer for Finnish which we used for our experiments.
Experimental Results
So, we ran the word-based baseline system, the segmented model (Unsup L-match), and the prediction model (CRF-LM) outputs, along with the reference translation through the supervised morphological analyzer Omorfi (Pirinen and Listenmaa, 2007).
Models 2.1 Baseline Models
performance of unsupervised segmentation for translation, our third baseline is a segmented translation model based on a supervised segmentation model (called Sup), using the hand-built Omorfi morphological analyzer (Pirinen and Listenmaa, 2007), which provided slightly higher BLEU scores than the word-based baseline.
Related Work
Segmented translation performs morphological analysis on the morphologically complex text for use in the translation model (Brown et al., 1993; Goldwater and McClosky, 2005; de Gispert and Mariño, 2008).
Related Work
Previous work in segmented translation has often used linguistically motivated morphological analysis selectively applied based on a language-specific heuristic.
Translation and Morphology
In fact, in our experiments, unsupervised morphology always outperforms the use of a hand-built morphological analyzer.
morphological analysis is mentioned in 7 sentences in this paper.
Sajjad, Hassan and Darwish, Kareem and Belinkov, Yonatan
Conclusion
For future work, we want to expand our work to other dialects, while utilizing dialectal morphological analysis to improve conversion.
Previous Work
Sawaf (2010) proposed a dialect to MSA normalization that used character-level rules and morphological analysis.
Previous Work
We tokenized Egyptian and Arabic according to the ATB tokenization scheme using the MADA+TOKAN morphological analyzer and tokenizer v3.1 (Roth et al., 2008).
Proposed Methods 3.1 Egyptian to EG’ Conversion
Perhaps a morphological analyzer, or just a part-of-speech tagger, could enforce (or probabilistically encourage) a match in parts of speech.
Proposed Methods 3.1 Egyptian to EG’ Conversion
In particular, using a morphological analyzer seems like a promising possibility.
Proposed Methods 3.1 Egyptian to EG’ Conversion
One approach could be to run a morphological analyzer for dialectal Arabic (e.g.
morphological analysis is mentioned in 6 sentences in this paper.
Chiarcos, Christian
Evaluation
Morphisto, for example, generates alternative morphological analyses, so that the disambiguation algorithm performs a random choice between these.
Extensions and Related Research
(V) Integration with other ontological knowledge sources in order to improve the recall of morphosyntactic and morphological analyses (e.g., for disambiguating grammatical case).
Extensions and Related Research
These observations provide further support for our conclusion that the ontology-based integration of morphosyntactic analyses enhances both the robustness and the level of detail of morphosyntactic and morphological analyses.
Ontologies and annotations
2.2 Integrating different morphosyntactic and morphological analyses
Processing linguistic annotations
(i) Morphisto, a morphological analyzer without contextual disambiguation (Zielinski and Simon, 2008),
morphological analysis is mentioned in 5 sentences in this paper.
Snyder, Benjamin and Barzilay, Regina and Knight, Kevin
Introduction
In addition, morphological analysis plays a crucial role here, as highly frequent morpheme correspondences can be particularly revealing.
Introduction
In addition, our model carries out an implicit morphological analysis of the lost language, utilizing the known morphological structure of the related language.
Model
This interplay implicitly relies on a morphological analysis of words in the lost language, while utilizing knowledge of the known language’s lexicon and morphology.
Problem Formulation
rect morphological analysis of words in the lost language must be learned, we assume that the inventory and frequencies of prefixes and suffixes in the known language are given.
Problem Formulation
In summary, the observed input to the model consists of two elements: (i) a list of unanalyzed word types derived from a corpus in the lost language, and (ii) a morphologically analyzed lexicon in a known related language derived from a separate corpus, in our case nonparallel.
morphological analysis is mentioned in 5 sentences in this paper.
Toutanova, Kristina and Suzuki, Hisami and Ruopp, Achim
Inflection prediction models
Morphological analysis: returns the set of possible morphological analyses A_w = {a_1, ..., a_n} for w. A morphological analysis a is a vector of categorical values, where each dimension and its possible values are defined by L.
Inflection prediction models
For the morphological analysis operation, we used the same set of morphological features described in (Minkov et al., 2007), that is, seven features for Russian (POS, Person, Number, Gender, Tense, Mood and Case) and 12 for Arabic (POS, Person, Number, Gender, Tense, Mood, Negation, Determiner, Conjunction, Preposition, Object and Possessive pronouns).
Inflection prediction models
The same is true with the operation of morphological analysis.
Introduction
Work in this area is motivated by two advantages offered by morphological analysis: (1) it provides linguistically motivated clustering of words and makes the data less sparse; (2) it captures morphological constraints applicable on the target side, such as agreement phenomena.
morphological analysis is mentioned in 5 sentences in this paper.
Garrette, Dan and Mielens, Jason and Baldridge, Jason
Abstract
We also show that finite-state morphological analyzers are effective sources of type information when few labeled examples are available.
Data
While we do not explore a rule-writing approach to POS-tagging, we do consider the impact of rule-based morphological analyzers as a component in our semi-supervised POS-tagging system.
Introduction
We also did not consider morphological analyzers as a form of type supervision, as suggested by Merialdo (1994).
Introduction
Also, morphological analyzers help for morphologically rich languages when there are few labeled types or tokens (and, it never hurts to use them).
Morphological Transducers
We use FSTs for morphological analysis: the FST accepts a word type and produces a set of morphological features.
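As a rough illustration of a transducer that accepts a word type and returns a set of feature strings, the following Python sketch builds a tiny nondeterministic FST by hand. It is a toy under stated assumptions, not the system used in the paper: the `ToyFST` class, the `build_fst` helper, the English stems, and the tag names (`+N+SG`, `+V+3SG`, and so on) are all hypothetical.

```python
from collections import defaultdict

EPS = ""  # epsilon label: the arc consumes no input character


class ToyFST:
    """A hand-rolled nondeterministic finite-state transducer (toy example)."""

    def __init__(self):
        # (state, input symbol) -> list of (next state, output string)
        self.transitions = defaultdict(list)
        self.finals = set()
        self._next_state = 1  # state 0 is the start state

    def new_state(self):
        state = self._next_state
        self._next_state += 1
        return state

    def add_arc(self, src, in_sym, out, dst):
        self.transitions[(src, in_sym)].append((dst, out))

    def analyze(self, word):
        """Return every output string for which the FST accepts `word`."""
        results = set()
        stack = [(0, 0, "")]  # (state, input position, output so far)
        while stack:
            state, pos, out = stack.pop()
            if pos == len(word) and state in self.finals:
                results.add(out)
            for dst, o in self.transitions.get((state, EPS), []):
                stack.append((dst, pos, out + o))
            if pos < len(word):
                for dst, o in self.transitions.get((state, word[pos]), []):
                    stack.append((dst, pos + 1, out + o))
        return results


def build_fst(nouns, verbs):
    """Compile toy stem lists into an FST with a bare form and an -s form."""
    fst = ToyFST()
    final = fst.new_state()
    fst.finals.add(final)
    classes = ((nouns, "+N+SG", "+N+PL"), (verbs, "+V+INF", "+V+3SG"))
    for stems, bare_tag, s_tag in classes:
        for stem in stems:
            state = 0
            for ch in stem:  # copy each stem character to the output
                nxt = fst.new_state()
                fst.add_arc(state, ch, ch, nxt)
                state = nxt
            fst.add_arc(state, EPS, bare_tag, final)  # no suffix
            fst.add_arc(state, "s", s_tag, final)     # -s suffix
    return fst


if __name__ == "__main__":
    fst = build_fst(nouns=["cat", "walk"], verbs=["walk"])
    print(sorted(fst.analyze("walks")))  # ['walk+N+PL', 'walk+V+3SG']
```

An ambiguous form such as "walks" yields more than one analysis, which is the property that makes analyzer output usable as type-level supervision for a tagger.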
morphological analysis is mentioned in 5 sentences in this paper.
Yeniterzi, Reyyan and Oflazer, Kemal
Abstract
On the target side (Turkish), we only perform morphological analysis and disambiguation but treat the complete complex morphological tag as a factor, instead of separating morphemes.
Experimental Setup and Results
On the Turkish side, we perform a full morphological analysis (Oflazer, 1994) and morphological disambiguation (Yuret and Ture, 2006) to select the contextually salient interpretation of words.
Experimental Setup and Results
For example, the morphological analyzer outputs +A3sg to mark a singular noun, if there is no explicit plural morpheme.
Related Work
Goldwater and McClosky (2005) use morphological analysis on the Czech side to get improvements in Czech-to-English statistical machine translation.
morphological analysis is mentioned in 4 sentences in this paper.
Neubig, Graham and Watanabe, Taro and Mori, Shinsuke and Kawahara, Tatsuya
Alignment Methods
morphological analyzers to normalize or split the sentence into morpheme streams (Corston-Oliver and Gamon, 2004).
Introduction
A myriad of methods have been proposed to handle each of these phenomena individually, including morphological analysis, stemming, compound breaking, number regularization, optimizing word segmentation, and transliteration, which we outline in more detail in Section 2.
Related Work on Data Sparsity in SMT
Previous works have attempted to handle morphology, decompounding and regularization through lemmatization, morphological analysis, or unsupervised techniques (Nießen and Ney, 2000; Brown, 2002; Lee, 2004; Goldwater and McClosky, 2005; Talbot and Osborne, 2006; Mermer and Akin, 2010; Macherey et al., 2011).
Related Work on Data Sparsity in SMT
unified framework, requiring no language specific tools such as morphological analyzers or word segmenters.
morphological analysis is mentioned in 4 sentences in this paper.
Snyder, Benjamin and Barzilay, Regina
Experimental SetUp
This Bible edition is augmented by gold standard morphological analysis (including segmentation) performed by biblical scholars.
Experimental SetUp
We obtained gold standard segmentations of the Arabic translation with a handcrafted Arabic morphological analyzer which utilizes manually constructed word lists and compatibility rules and is further trained on a large corpus of hand-annotated Arabic data (Habash and Rambow, 2005).
Experimental SetUp
The accuracy of this analyzer is reported to be 94% for full morphological analyses, and 98%–99% when part-of-speech tag accuracy is not included.
Multilingual Morphological Segmentation
The underlying assumption of our work is that structural commonality across different languages is a powerful source of information for morphological analysis.
morphological analysis is mentioned in 4 sentences in this paper.
Salameh, Mohammad and Cherry, Colin and Kondrak, Grzegorz
Conclusion
In the future, we plan to explore introducing multiple segmentation options into the lattice, and the application of our method to a full morphological analysis (as opposed to segmentation) of the target language.
Related Work
The transformation might take the form of a morphological analysis or a morphological segmentation.
Related Work
2.1 Morphological Analysis
Related Work
Many languages have access to morphological analyzers, which annotate surface forms with their lemmas and morphological features.
morphological analysis is mentioned in 4 sentences in this paper.
Salloum, Wael and Elfardy, Heba and Alamir-Salloum, Linda and Habash, Nizar and Diab, Mona
MT System Selection
These features rely on language models, MSA and Egyptian morphological analyzers and a Highly Dialectal Egyptian lexicon to decide whether each word is MSA, Egyptian, Both, or Out of Vocabulary.
MT System Selection
These features are: sentence length (in words), percentage of selected words and phrases, number of selected words, number of selected phrases, number of words morphologically selected as dialectal by a mainly Levantine morphological analyzer, number of words selected as dialectal by the tool’s DA-MSA lexicons, number of OOV words against the MSA-Pivot system training data, number of words in the sentences that appeared less than 5 times in the training data, number of words in the sentences that appeared between 5 and 10 times in the training data, number of words in the sentences that appeared between 10 and 15 times in the training data, number of words that have spelling errors and corrected by this tool (e.g., word-lengthening), number of punctuation marks, and number of words that are written in Latin script.
Machine Translation Experiments
The MSA portion of the Arabic side is segmented according to the Arabic Treebank (ATB) tokenization scheme (Maamouri et al., 2004; Sadat and Habash, 2006) using the MADA+TOKAN morphological analyzer and tokenizer v3.1 (Roth et al., 2008), while the DA portion is ATB-tokenized with MADA-ARZ (Habash et al., 2013).
Related Work
Sawaf (2010) and Salloum and Habash (2013) used hybrid solutions that combine rule-based algorithms and resources such as lexicons and morphological analyzers with statistical models to map DA to MSA before using MSA-to-English MT systems.
morphological analysis is mentioned in 4 sentences in this paper.
Ohno, Tomohiro and Murata, Masaki and Matsubara, Shigeki
Experiment
All the data are annotated with information on morphological analysis, clause boundary detection and dependency analysis by hand.
Linefeed Insertion Technique
In our method, a sentence, on which morphological analysis, bunsetsu segmentation, clause boundary analysis and dependency analysis are performed, is considered the input.
Preliminary Analysis about Linefeed Points
The data is annotated by hand with information on morphological analysis, bunsetsu segmentation, dependency analysis, clause boundary detection, and linefeeds insertion.
morphological analysis is mentioned in 3 sentences in this paper.
Lee, John and Naradowsky, Jason and Smith, David A.
Experimental Results
In these cases, the joint model, entertaining all morphological possibilities, was able to find the combination of links and morphological analyses that are collectively more likely.
Introduction
To date, studies of morphological analysis and dependency parsing have been pursued more or less independently.
Previous Work
Since space does not allow a full review of the vast literature on morphological analysis and parsing, we focus only on past research involving joint morphological and syntactic inference (§2.1); we then discuss Latin (§2.2), a language representative of the challenges that motivated our approach.
morphological analysis is mentioned in 3 sentences in this paper.
Elfardy, Heba and Diab, Mona
Approach to Sentence-Level Dialect Identification
The aforementioned approach relies on language models (LM) and MSA and EDA Morphological Analyzer to decide whether each word is (a) MSA, (b) EDA, (c) Both (MSA & EDA) or (d) OOV.
Approach to Sentence-Level Dialect Identification
Percentage of words in the sentence that is analyzable by an MSA morphological analyzer.
Approach to Sentence-Level Dialect Identification
Percentage of words in the sentence that is analyzable by an EDA morphological analyzer.
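The two percentage features quoted above could be computed along the following lines. This is only a sketch: the function name `analyzable_percentage` and the set-based stand-ins for the MSA and EDA analyzers are hypothetical, and a real system would call actual morphological analyzers instead of checking set membership.

```python
def analyzable_percentage(words, can_analyze):
    """Percentage of `words` for which `can_analyze(word)` is true."""
    if not words:
        return 0.0  # avoid division by zero on an empty sentence
    hits = sum(1 for word in words if can_analyze(word))
    return 100.0 * hits / len(words)


# Hypothetical stand-ins: each "analyzer" simply knows a small vocabulary.
MSA_VOCAB = {"w1", "w2", "w3"}
EDA_VOCAB = {"w3", "w4"}

sentence = ["w1", "w2", "w3", "w4"]
msa_feature = analyzable_percentage(sentence, lambda w: w in MSA_VOCAB)
eda_feature = analyzable_percentage(sentence, lambda w: w in EDA_VOCAB)
```

Both values can then be passed to a sentence-level classifier alongside the other features.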
morphological analysis is mentioned in 3 sentences in this paper.
Kazama, Jun'ichi and Torisawa, Kentaro
Experiments
We used MeCab as a morphological analyzer and CaboCha (Kudo and Matsumoto, 2002) as the dependency parser to find the boundaries of the bunsetsu.
Gazetteer Induction 2.1 Induction by MN Clustering
After preprocessing the first sentence of an article using a morphological analyzer, MeCab, we extracted the last noun after the appearance of Japanese postposition “は (wa)” (≈ “is”).
Using Gazetteers as Features of NER
Asahara and Matsumoto (2003) proposed using characters instead of morphemes as the unit to alleviate the effect of segmentation errors in morphological analysis and we also used their character-based method.
morphological analysis is mentioned in 3 sentences in this paper.
Yıldız, Olcay Taner and Solak, Ercan and Görgün, Onur and Ehsani, Razieh
Conclusion
As a next step, we will focus on morphological analysis and disambiguation of Turkish words.
Conclusion
After determining the correct morphological analysis of Turkish words, we will use the parts of these analyses to replace the leaf nodes that we intentionally left as “*NONE*”.
Corpus construction strategy
corresponds to the morphological analysis “geç-NEG-FUT-2SG” of the verb “geçmeyeceksin”.
morphological analysis is mentioned in 3 sentences in this paper.