Introduction | Their experiments suggest that function words play a special role in the acquisition process: children learn function words before they learn the vast bulk of the associated content words, and they use function words to help identify context words. |
Introduction | Traditional descriptive linguistics distinguishes function words, such as determiners and prepositions, from content words, such as nouns and verbs, corresponding roughly to the distinction between functional categories and lexical categories of modern generative linguistics (Fromkin, 2001). |
Introduction | Function words differ from content words in at |
Word segmentation results | Thus, the present model, initially aimed at segmenting words from continuous speech, shows three interesting characteristics that are also exhibited by human infants: it distinguishes between function words and content words (Shi and Werker, 2001); it allows learners to acquire at least some of the function words of their language (e.g., Shi et al., 2006); and furthermore, it may also allow them to start grouping together function words according to their category (Cauvet et al., 2014; Shi and Melançon, 2010). |
Word segmentation with Adaptor Grammars | This means that “function words” are memoised independently of the “content words” that Word expands to; i.e., the model learns distinct “function word” and “content word” vocabularies. |
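Word segmentation with Adaptor Grammars | As a rough illustration of what this memoisation means computationally, here is a minimal Python sketch of a CRP-style cache per adapted nonterminal; the class name, concentration parameter, and base samplers are our own illustrative choices, not the paper's implementation: |

```python
import random
from collections import Counter

class AdaptedNonterminal:
    """CRP-style cache for one adapted nonterminal: previously generated
    yields are reused with probability proportional to their counts.
    A minimal sketch of adaptor-grammar memoisation, not the authors' code."""

    def __init__(self, alpha, base_sampler):
        self.alpha = alpha        # concentration parameter (assumed hyperparameter)
        self.base = base_sampler  # draws a fresh yield from the base grammar
        self.cache = Counter()    # memoised yields with their reuse counts

    def sample(self):
        n = sum(self.cache.values())
        if n and random.random() < n / (n + self.alpha):
            # reuse a memoised yield, proportional to its count
            items, weights = zip(*self.cache.items())
            y = random.choices(items, weights=weights)[0]
        else:
            y = self.base()       # generate a new yield from the base grammar
        self.cache[y] += 1
        return y

# Because the "function word" and Word nonterminals are adapted separately,
# each keeps its own cache, i.e. the model maintains two distinct
# vocabularies (the base samplers below are hypothetical placeholders):
# function_words = AdaptedNonterminal(1.0, sample_function_word)
# content_words  = AdaptedNonterminal(1.0, sample_content_word)
```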
Experimental Setup | A ReVerb argument is represented as the conjunction of its content words that appear more than 10 times in the corpus. |
Experimental Setup | The first, LEFTMOST, selects the leftmost content word for each predicate. |
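Experimental Setup | A hedged sketch of both steps, assuming a Universal-POS-style content-word tag set and a precomputed corpus frequency table (all helper names are ours): |

```python
from collections import Counter

CONTENT_POS = {"NOUN", "VERB", "ADJ", "ADV"}  # assumed content-word tag set

def content_words(tagged_tokens, corpus_counts, min_count=10):
    """Content words of a phrase that appear more than min_count times
    in the corpus (the argument representation described above)."""
    return [w for w, pos in tagged_tokens
            if pos in CONTENT_POS and corpus_counts[w] > min_count]

def leftmost(tagged_tokens, corpus_counts):
    """LEFTMOST baseline: select the leftmost content word."""
    cws = content_words(tagged_tokens, corpus_counts)
    return cws[0] if cws else None

# usage with toy counts:
counts = Counter({"president": 42, "united": 25, "states": 30})
arg = [("the", "DET"), ("president", "NOUN"), ("of", "ADP"),
       ("the", "DET"), ("united", "ADJ"), ("states", "NOUN")]
print(leftmost(arg, counts))  # -> president
```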
Our Proposal: A Latent LC Approach | In our experiments we attempt to keep the approach maximally general, and define H_p to be the set of all subsets of size 1 or 2 of the content words in W_pl. |
Our Proposal: A Latent LC Approach | We use a POS tagger to identify content words. |
Our Proposal: A Latent LC Approach | Prepositions are considered content words under this definition. |
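Our Proposal: A Latent LC Approach | Under this definition, H_p can be enumerated directly; a small sketch using the notation above (the function name and input format are illustrative): |

```python
from itertools import combinations

def candidate_subsets(wpl_content_words):
    """H_p: all subsets of size 1 or 2 of the content words in W_pl.
    Prepositions stay in, since they count as content words here."""
    words = sorted(set(wpl_content_words))
    return [frozenset(c) for k in (1, 2) for c in combinations(words, k)]

# e.g. candidate_subsets(["rely", "on"]) yields the singletons
# {'rely'} and {'on'} plus the pair {'rely', 'on'}.
```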
Experiments | For ontologizing WT and OW, the bag of content words W is given by the content words in sense definitions and, if available, additional related words obtained from lexicon relations (see Section 3). |
Experiments | tively small in number, are already disambiguated and, therefore, the ontologization was performed only on the definition's content words. |
Lexical Resource Ontologization | We first create the empty undirected graph G_L = (V, E) such that V is the set of concepts in L and E = ∅. For each source concept c ∈ V we create a bag of content words W = {w_1, ..., w_n} which includes all the content words in its definition d and, if available, additional related words obtained from lexicon relations (e.g., synonyms in Wiktionary). |
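Lexical Resource Ontologization | A minimal sketch of this initialization step, assuming networkx for the graph and dictionaries mapping each concept to its definition words and lexicon-related words (argument names are ours): |

```python
import networkx as nx

def init_graph_and_bags(concepts, definition_words, lexicon_related):
    """Create the empty undirected graph G_L = (V, E) with V the concepts
    of L and E = ∅, plus a bag of content words W = {w_1, ..., w_n} per
    source concept (definition content words and, if available,
    lexicon-related words such as Wiktionary synonyms)."""
    G = nx.Graph()
    G.add_nodes_from(concepts)   # V = concepts; E starts empty
    bags = {c: set(definition_words.get(c, [])) | set(lexicon_related.get(c, []))
            for c in concepts}
    return G, bags
```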
Lexical Resource Ontologization | The definition contains two content words: fruit_n and conifer_n. |
Resource Alignment | In this component the personalization vector v_i is set by uniformly distributing the probability mass over the nodes corresponding to the senses of all the content words in the extended definition of d_i according to the sense inventory of a semantic network H. We use the same semantic graph H for computing the semantic signatures of both definitions. |
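Resource Alignment | If the semantic signature is computed with Personalized PageRank, as the term "personalization vector" suggests, the setup reduces to a one-line vector; a sketch assuming networkx (where nodes absent from the personalization dict receive zero mass): |

```python
import networkx as nx

def semantic_signature(H, sense_nodes):
    """Personalized PageRank over the semantic network H, with the
    probability mass spread uniformly over the sense nodes of the
    content words in the extended definition."""
    v = {s: 1.0 / len(sense_nodes) for s in sense_nodes}
    return nx.pagerank(H, personalization=v)
```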
Baselines | Since such correlation is semantic rather than grammatical, only content words are considered in our graph model, ignoring function words (e.g., the, to, ...). |
Baselines | In particular, the content words are limited to those with part-of- |
Baselines | While the above word-based graph model can capture the relatedness between content words well, it can only partially model the focus of a negation expression, since negation focus is more directly related to the topic than to the content. |
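Baselines | For concreteness, a sketch of a content-word graph of this kind; the function-word tag set and the sentence-level co-occurrence edges are our assumptions, not necessarily the paper's exact construction: |

```python
import networkx as nx
from itertools import combinations

FUNCTION_POS = {"DET", "ADP", "PRON", "AUX", "PART", "CCONJ", "SCONJ"}  # assumed

def content_word_graph(tagged_sentences):
    """Word-based graph over content words only: function words (the, to, ...)
    are dropped and co-occurring content words are linked."""
    G = nx.Graph()
    for sent in tagged_sentences:        # sent: [(word, pos), ...]
        cws = {w for w, pos in sent if pos not in FUNCTION_POS}
        for u, v in combinations(sorted(cws), 2):
            prev = G.get_edge_data(u, v, default={"weight": 0})["weight"]
            G.add_edge(u, v, weight=prev + 1)
    return G
```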
The Idea | Our solution is to redefine DCS trees without the aid of any databases, by considering each node of a DCS tree as a content word in a sentence (which may no longer be a table in a specific database), while each edge represents semantic relations between two words. |
The Idea | • Content words: a content word (e.g. |
The Idea | A DCS tree T = (N, E) is defined as a rooted tree, where each node σ ∈ N is labeled with a content word w(σ) and each edge (σ, σ′) ∈ E ⊆ N × N is labeled with a pair of semantic roles (r, r′). |
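The Idea | The definition translates directly into a small recursive data structure; a sketch (class and role labels are illustrative, not the paper's code): |

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class DCSNode:
    """A node sigma in N of a DCS tree T = (N, E), labeled with a content
    word w(sigma); each outgoing edge carries a pair of semantic roles (r, r')."""
    word: str
    children: List[Tuple["DCSNode", Tuple[str, str]]] = field(default_factory=list)

    def attach(self, child: "DCSNode", roles: Tuple[str, str]) -> None:
        self.children.append((child, roles))

# a rooted tree for "Mary loves John" (role labels are illustrative):
root = DCSNode("love")
root.attach(DCSNode("Mary"), ("SUBJ", "ARG"))
root.attach(DCSNode("John"), ("OBJ", "ARG"))
```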
Introduction | Since the information within the sentence is insufficient for topic modeling, we first enrich sentence contexts via Information Retrieval (IR) methods using content words in the sentence as queries, so that topic-related monolingual documents can be collected. |
Topic Similarity Model with Neural Network | One problem with the auto-encoder is that it treats all words in the same way, making no distinction between function words and content words. |
Topic Similarity Model with Neural Network | For each positive instance (f, e), we select e′ which contains at least 30% different content words from e. |
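Topic Similarity Model with Neural Network | One plausible reading of the 30% criterion, as a sketch ("different" taken as the fraction of e′'s content words not shared with e; the content-word extractor is an assumed helper): |

```python
def differs_enough(e, e_prime, content_words, threshold=0.30):
    """Accept e' as a negative for the positive instance (f, e) if at
    least 30% of its content words are not shared with e. This reading
    of '30% different content words' is our interpretation."""
    cw_e, cw_ep = set(content_words(e)), set(content_words(e_prime))
    return bool(cw_ep) and len(cw_ep - cw_e) / len(cw_ep) >= threshold
```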
Experiments | We restrict the corpus to content words by retaining only words tagged as adj, n, part and v (adjectives, nouns, particles, and verbs). |
Experiments | As well as function words, we also remove the five most frequent content words (be, go, get, want, come). |
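Experiments | Both filters amount to a single pass over the POS-tagged corpus; a sketch (tokens assumed lemmatized, as the stop list suggests): |

```python
CONTENT_TAGS = {"adj", "n", "part", "v"}           # adjectives, nouns, particles, verbs
TOP_CONTENT = {"be", "go", "get", "want", "come"}  # five most frequent content words

def restrict_to_content(tagged_tokens):
    """Keep only content words by POS tag, then drop the five most
    frequent content words, as described above."""
    return [w for w, tag in tagged_tokens
            if tag in CONTENT_TAGS and w not in TOP_CONTENT]
```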
Experiments | On average, situations are only 59 words long, reflecting the relative lack of content words in CDS utterances. |