MWE-dedicated Features | We use part-of-speech unigrams and bigrams in order to capture MWEs with irregular syntactic structures that might indicate the idiomaticity of a word sequence. |
MWE-dedicated Features | We also integrate mixed bigrams made up of a word and a part-of-speech. |
MWE-dedicated Features | We associate each word with its part-of-speech tags found in our external morphological lexicon. |
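The three feature families described above (part-of-speech unigrams, part-of-speech bigrams, and mixed word/part-of-speech bigrams), together with the lexicon lookup, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `mwe_features` and `lexicon_tags`, the feature-string prefixes, the lowercased lookup, and the toy tag names are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def mwe_features(words: List[str], tags: List[str]) -> List[str]:
    """Collect POS unigrams, POS bigrams and mixed word/POS bigrams.

    Hypothetical helper: the feature-string format is an assumption.
    """
    if len(words) != len(tags):
        raise ValueError("words and tags must have the same length")
    # POS unigrams, one per token.
    feats = [f"pos1={t}" for t in tags]
    for i in range(len(tags) - 1):
        # POS bigram over adjacent tokens.
        feats.append(f"pos2={tags[i]}_{tags[i + 1]}")
        # Mixed bigrams: word followed by POS, then POS followed by word.
        feats.append(f"mix={words[i]}_{tags[i + 1]}")
        feats.append(f"mix={tags[i]}_{words[i + 1]}")
    return feats


def lexicon_tags(words: List[str], lexicon: Dict[str, Set[str]]) -> List[List[str]]:
    """Look up every POS tag a (hypothetical) morphological lexicon lists per word."""
    # Lowercasing before lookup is an assumption; unknown words get an empty list.
    return [sorted(lexicon.get(w.lower(), set())) for w in words]


if __name__ == "__main__":
    toy_lexicon = {"because": {"CONJ"}, "of": {"PREP"}}  # toy data, not a real lexicon
    print(mwe_features(["because", "of"], ["CONJ", "PREP"]))
    print(lexicon_tags(["Because", "of", "xyz"], toy_lexicon))
```

Encoding each feature as a prefixed string keeps the three families distinguishable when they are fed to a linear model; any other encoding that separates them would serve equally well.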
Multiword expressions | In this paper, we focus on contiguous MWEs that form a lexical unit which can be marked by a part-of-speech tag (e.g., at night is an adverb, because of is a preposition). |
Resources | Compounds are identified with a specific nonterminal symbol "MWX", where X is the part-of-speech of the expression. |
Resources | They have a flat structure made of the part-of-speech of their components as shown in figure 1. |
Resources | The nonterminal tagset is composed of 14 part-of-speech labels and 24 phrasal ones (including 11 MWE labels). |
Two strategies, two discriminative models | Constant and Sigogne (2011) proposed to combine MWE segmentation and part-of-speech tagging into a single sequence labelling task by assigning to each token a tag of the form TAG+X where TAG is the part-of-speech (POS) of the lexical unit the token belongs to and X is either B (i.e. |
Hello. My name is Inigo Montoya. | Interestingly, this distinctiveness takes place at the level of words, but not at the level of other syntactic features: the part-of-speech composition of memorable quotes is in fact more likely with respect to newswire. |
Hello. My name is Inigo Montoya. | Thus, we can think of memorable quotes as consisting, in an aggregate sense, of unusual word choices built on a scaffolding of common part-of-speech patterns. |
Hello. My name is Inigo Montoya. | In particular, we analyze a corpus of advertising slogans, and we show that these slogans have significantly greater likelihood at both the word level and the part-of-speech level with respect to a language model trained on memorable movie quotes, compared to a corresponding language model trained on non-memorable movie quotes. |
Never send a human to do a machine’s job. | We then develop models using features based on the measures formulated earlier in this section: generality measures (the four listed in Table 4); distinctiveness measures (likelihood according to 1-, 2-, and 3-gram “common language” models at the lexical and part-of-speech level for each quote in the pair, their differences, and pairwise comparisons between them); and similarity-to-slogans measures (likelihood according to 1-, 2-, and 3-gram slogan-language models at the lexical and part-of-speech level for each quote in the pair, their differences, and pairwise comparisons between them). |
Implementation Details | We first perform word segmentation (if needed) and part-of-speech tagging. |
Implementation Details | After that, we obtain the word-segmented sentences with the part-of-speech tags. |
Parsing with dependency language model | The feature templates are outlined in Table 1, where TYPE refers to one of the types PL or PR, h_pos refers to the part-of-speech tag of $x_h$, h_word refers to the lexical form of $x_h$, ch_pos refers to the part-of-speech tag of $x_{ch}$, and ch_word refers to the lexical form of $x_{ch}$. |