Abstract | The idea draws on the observation that the lemmatisation of almost all Polish noun phrases may be decomposed into transformations of the individual words (tokens) that make up each phrase. |
Conclusions and further work | We presented a novel approach to lemmatisation of Polish noun phrases. |
Introduction | A similar task may be defined for whole noun phrases (Degorski, 2011). |
Introduction | By lemmatisation of noun phrases (NPs) we will understand assigning each NP a grammatically correct NP corresponding to the same phrase that could stand as a dictionary entry. |
Phrase lemmatisation as a tagging problem | One of the assumptions of KPWr annotation is that actual noun phrases and prepositional phrases are labelled collectively as NP chunks. |
Phrase lemmatisation as a tagging problem | To obtain real noun phrases, phrase-initial prepositions must be stripped off. |
Related works | Other named entity types may be realised as arbitrary noun phrases. |
Related works | As he notes, organisation names are often built of noun phrases, hence it is important to understand their internal structure. |
Detection of New Entities | To detect noun phrases that potentially refer to entities, we apply a part-of-speech tagger to the input text. |
Introduction | However, state-of-the-art open IE methods extract all noun phrases that are likely to denote entities. |
Introduction | ture are typed noun phrases. |
Introduction | Therefore, our setting resembles the established task of fine-grained typing for noun phrases (Fleischmann 2002), with the difference being that we disregard common nouns and phrases for prominent in-KB entities and instead exclusively focus on the difficult case of phrases that likely denote new entities. |
Related Work | Most well-known is the Stanford named entity recognition (NER) tagger (Finkel 2005) which assigns coarse-grained types like person, organization, location, and other to noun phrases that are likely to denote entities. |
Related Work | Noun phrases in the subject role in a large collection of fact triples are heuristically linked to Freebase entities. |
Introduction | In this paper, we focus on improving case prediction for noun phrases (NPs) in German translations. |
Introduction | German sentences exhibit a freer constituent order, and thus case is an important indicator of the grammatical functions of noun phrases. |
Introduction | In all four examples, the verb and the participating noun phrases Mitarbeiter (employee), Kollege (colleague) and Bericht (report) are identical, and the noun phrases are assigned the same case. |
Using subcategorization information | Verb–noun tuples referring to specific syntactic functions within verb subcategorization (verb–noun subcat case prediction) are integrated with an associated probability for accusative (direct object), dative (indirect object) and nominative (subject). Further to the subject and object noun phrases, the subcategorization information provides quantitative triples for verb–preposition–noun pairs, thus predicting the case of NPs within prepositional phrases (we do this only when the prepositions are ambiguous, i.e., they could subcategorize either a dative or an accusative NP). |
Using subcategorization information | In addition to modelling subcategorization information, it is also important to differentiate between subcategorized noun phrases (such as object or subject), and noun phrases |
Distributional Semantic Hidden Markov Models | Given a document consisting of a sequence of T clauses headed by propositional heads H (verbs or event nouns), and argument noun phrases A, a DSHMM models the joint probability of observations H, A, and latent random variables E and S representing domain events and slots respectively; i.e., P(H, A, E, S |
Distributional Semantic Hidden Markov Models | We assume that event heads are verbs or event nouns, while arguments are the head words of their syntactically dependent noun phrases. |
Guided Summarization Slot Induction | First, the maximal noun phrases are extracted from the contributors and clustered based on the TAC slot of the contributor. |
Guided Summarization Slot Induction | These clusters of noun phrases then become the gold standard clusters against which automatic systems are compared. |
Guided Summarization Slot Induction | Noun phrases are considered to be matched if the lemmata of their head words are the same and they are extracted from the same summary. |
Model | Generating Specification Tree: For each text specification, draw a specification tree t from all possible trees over the sequence of noun phrases in this specification. |
Model | For example, at the unigram level we aim to capture that noun phrases containing specific words such as “cases” and “lines” may be key phrases (i.e., they correspond to data chunks that appear in the input), and that verbs such as “contain” may indicate that the next noun phrase is a key phrase. |
Model | Total # of words: 7330; Total # of noun phrases: 1829; Vocabulary size: 781; Avg. |
Problem Formulation | As input, we are given a set of text specifications w = {w1, ..., wN}, where each wi is a text specification represented as a sequence of noun phrases. We use the UIUC shallow parser to preprocess each text specification into a sequence of noun phrases. In addition, we are given a set of input examples for each wi. |
Problem Formulation | Our model predicts specification trees t = {t1, ..., tN} for the text specifications, where each specification tree ti is a dependency tree over noun phrases. In general, many program input formats are nested tree structures, in which the tree root denotes the entire chunk of program input data and each chunk (tree node) can be further divided into sub-chunks or primitive fields that appear in the program input (see Figure 3). |
Target Candidate Ranking | Then we apply a hierarchical Hidden Markov Model (HMM) based Chinese lexical analyzer ICTCLAS (Zhang et al., 2003) to extract named entities, noun phrases and events. |
Target Candidate Ranking | Therefore we limited the types of vertices to Morph (M), Entity (E), which includes target candidates, Event (EV), and NonEntity Noun Phrases (NP), and used co-occurrence as the edge type. |
Target Candidate Ranking | We extract entities, events, and nonentity noun phrases that occur in more than one tweet as neighbors. |
Large-Scale Harvesting of Semantic Predicates | We search the English Wikipedia for all the token sequences which match n, resulting in a list of noun phrases filling the * argument. |
Large-Scale Harvesting of Semantic Predicates | As can be seen, a wide range of noun phrases are extracted, from quantities such as glass and cap to other aspects, such as brand and constituent. |
Preliminaries | While in principle * could match any sequence of words, since we aim at generalizing nouns, in what follows we allow * to match only noun phrases (e.g., glass, hot cup, very big bottle, etc. |
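
The matching criterion quoted in the Guided Summarization Slot Induction excerpt (two noun phrases match if the lemmata of their head words are the same and they come from the same summary) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `NounPhrase` record, its field names, and the `is_match` helper are hypothetical, and the head lemma is assumed to have been computed beforehand by some parser and lemmatiser.

```python
from typing import NamedTuple


class NounPhrase(NamedTuple):
    """Hypothetical record for an extracted noun phrase."""
    text: str        # surface form of the phrase
    head_lemma: str  # lemma of the head word (assumed precomputed)
    summary_id: str  # identifier of the summary the phrase came from


def is_match(a: NounPhrase, b: NounPhrase) -> bool:
    """Return True if both phrases share a head lemma and a source summary."""
    return a.head_lemma == b.head_lemma and a.summary_id == b.summary_id


# Example phrases (invented for illustration only).
np1 = NounPhrase("the injured workers", "worker", "summary-A")
np2 = NounPhrase("three workers", "worker", "summary-A")
np3 = NounPhrase("a worker", "worker", "summary-B")

print(is_match(np1, np2))  # same head lemma, same summary
print(is_match(np1, np3))  # same head lemma, different summary
```

Exact string equality on the lemma is a design assumption here; a real evaluation might normalise case or handle multi-word heads differently.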