Introduction | Their experiments suggest that function words play a special role in the acquisition process: children learn function words before they learn the vast bulk of the associated content words, and they use function words to help identify content words. |
Introduction | Traditional descriptive linguistics distinguishes function words, such as determiners and prepositions, from content words, such as nouns and verbs, corresponding roughly to the distinction between functional categories and lexical categories of modern generative linguistics (Fromkin, 2001). |
Introduction | Function words differ from content words in at |
Word segmentation results | Thus, the present model, initially aimed at segmenting words from continuous speech, shows three interesting characteristics that are also exhibited by human infants: it distinguishes between function words and content words (Shi and Werker, 2001); it allows learners to acquire at least some of the function words of their language (e.g., Shi et al., 2006); and furthermore, it may also allow them to start grouping together function words according to their category (Cauvet et al., 2014; Shi and Melancon, 2010). |
Word segmentation with Adaptor Grammars | This means that “function words” are memoised independently of the “content words” that Word expands to; i.e., the model learns distinct “function word” and “content word” vocabularies. |
Experimental Setup | A Reverb argument is represented as the conjunction of its content words that appear more than 10 times in the corpus. |
Experimental Setup | The first, LEFTMOST, selects the leftmost content word for each predicate. |
Our Proposal: A Latent LC Approach | In our experiments we attempt to keep the approach maximally general, and define H_p to be the set of all subsets of size 1 or 2 of content words in Wpl. |
Our Proposal: A Latent LC Approach | We use a POS tagger to identify content words. |
Our Proposal: A Latent LC Approach | Prepositions are considered content words under this definition. |
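A POS-based content-word filter of the kind the excerpt describes can be sketched as follows. This is a minimal illustration, not the paper's actual code; it assumes Penn Treebank tags on pre-tagged input and, per the definition above, counts prepositions (IN) as content words.

```python
# Hypothetical sketch: selecting "content words" from POS-tagged tokens.
# Tag prefixes follow the Penn Treebank convention; including "IN"
# mirrors the definition quoted above, under which prepositions count.

CONTENT_TAG_PREFIXES = ("NN", "VB", "JJ", "RB", "IN")  # nouns, verbs, adjectives, adverbs, prepositions

def content_words(tagged_tokens):
    """Return the tokens whose POS tag marks them as content words."""
    return [word for word, tag in tagged_tokens
            if tag.startswith(CONTENT_TAG_PREFIXES)]

tagged = [("the", "DT"), ("dog", "NN"), ("ran", "VBD"),
          ("through", "IN"), ("a", "DT"), ("park", "NN")]
print(content_words(tagged))  # ['dog', 'ran', 'through', 'park']
```

In practice the tag inventory and the treatment of adverbs and prepositions vary by paper; the prefix set above is only one plausible choice.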
Causal Relations for Why-QA | A bunsetsu is a syntactic constituent composed of a content word and several function words such as postpositions and case markers. |
Causal Relations for Why-QA | Our term matching method judges a causal relation to be a candidate for an appropriate causal relation if its effect part contains at least one content word (noun, verb, or adjective) from the question. |
Causal Relations for Why-QA | The n-gram features are restricted to those n-grams containing at least one content word in a question. |
System Architecture | We retrieved documents from Japanese web texts using Boolean AND and OR queries generated from the content words in why-questions. |
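Query construction of the kind described in the excerpt can be sketched as below. The query syntax is an assumption (a generic Boolean retrieval syntax), and the example content words are invented, not taken from the paper.

```python
# Minimal sketch (assumed query syntax): building Boolean retrieval
# queries from the content words of a why-question.

def boolean_queries(content_words):
    """Return an AND query and an OR query over the given content words."""
    and_query = " AND ".join(content_words)
    or_query = " OR ".join(content_words)
    return and_query, or_query

and_q, or_q = boolean_queries(["tsunami", "occur", "earthquake"])
print(and_q)  # tsunami AND occur AND earthquake
print(or_q)   # tsunami OR occur OR earthquake
```

The AND query favors precision while the OR query favors recall; issuing both, as the excerpt suggests, trades the two off at retrieval time.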
Baselines | Since such correlation is more from the semantic perspective than the grammatical perspective, only content words are considered in our graph model, ignoring function words (e.g., the, to). |
Baselines | In particular, the content words are limited to those with part-of- |
Baselines | While the above word-based graph model can well capture the relatedness between content words, it can only partially model the focus of a negation expression, since negation focus is more directly related to topic than to content. |
Experiments | For ontologizing WT and OW, the bag of content words W is given by the content words in sense definitions and, if available, additional related words obtained from lexicon relations (see Section 3). |
Experiments | tively small in number, are already disambiguated and, therefore, the ontologization was just performed on the definition’s content words. |
Lexical Resource Ontologization | We first create the empty undirected graph G_L = (V, E) such that V is the set of concepts in L and E = ∅. |
Lexical Resource Ontologization | For each source concept c ∈ V we create a bag of content words W = {w_1, . . . , w_n} which includes all the content words in its definition d and, if available, additional related words obtained from lexicon relations (e.g., synonyms in Wiktionary). |
Lexical Resource Ontologization | The definition contains two content words: fruit_n and conifer_n. |
Resource Alignment | In this component the personalization vector vi is set by uniformly distributing the probability mass over the nodes corresponding to the senses of all the content words in the extended definition of di according to the sense inventory of a semantic network H. We use the same semantic graph H for computing the semantic signatures of both definitions. |
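The personalization scheme the excerpt describes, mass spread uniformly over the sense nodes of the definition's content words, can be sketched with a small power-iteration Personalized PageRank. The graph, node names, and seed senses below are invented toy data, not the paper's semantic network.

```python
# Sketch of Personalized PageRank by power iteration over a toy semantic
# network. The personalization vector v spreads its mass uniformly over
# the seed sense nodes, as in the excerpt; node names are made up.

def personalized_pagerank(graph, seeds, alpha=0.85, iters=50):
    nodes = sorted(graph)
    v = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(v)  # start from the personalization vector
    for _ in range(iters):
        new = {}
        for n in nodes:
            # mass flowing into n from its neighbors (undirected graph)
            rank = sum(p[m] / len(graph[m]) for m in nodes if n in graph[m])
            new[n] = (1 - alpha) * v[n] + alpha * rank
        p = new
    return p

g = {"fruit#n": {"tree#n", "apple#n"},
     "tree#n": {"fruit#n", "conifer#n"},
     "apple#n": {"fruit#n"},
     "conifer#n": {"tree#n"}}
scores = personalized_pagerank(g, seeds={"fruit#n", "conifer#n"})
print(max(scores, key=scores.get))
```

Because every node has at least one neighbor, each iteration preserves total probability mass, so the result remains a distribution over nodes; the resulting vector is the "semantic signature" of the definition.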
Experiments | Because the features for QA pairs are quite sparse and the content words in the questions are usually morphologically different from the words with the same meaning in the answers, the Cosine Similarity method becomes less powerful. |
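The failure mode the excerpt describes can be made concrete with a bag-of-words cosine similarity; the question and answer word lists below are invented to show that morphological variants produce zero exact overlap.

```python
# A minimal sketch of bag-of-words cosine similarity, illustrating why
# morphological variation between question and answer words defeats
# exact-match overlap (the example token lists are invented).

import math
from collections import Counter

def cosine(a, b):
    va, vb = Counter(a), Counter(b)
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

q = ["run", "marathon", "training"]
a = ["running", "marathons", "train"]  # same meanings, different forms
print(cosine(q, a))  # 0.0: no exact token overlap despite shared meaning
```

Stemming or lemmatizing both sides before comparison is the usual remedy for exactly this sparsity.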
Learning with Homogenous Data | Figure 4 shows the percentage of the concurrent words in the top-ranked content words with high frequency. |
Learning with Homogenous Data | The number k on the horizontal axis in Figure 4 represents the top k content words in the |
Learning with Homogenous Data | Percentage of concurrent content words |
The Idea | Our solution is to redefine DCS trees without the aid of any databases, by considering each node of a DCS tree as a content word in a sentence (which may no longer be a table in a specific database), while each edge represents a semantic relation between two words. |
The Idea | • Content words: a content word (e.g. |
The Idea | A DCS tree T = (N, E) is defined as a rooted tree, where each node σ ∈ N is labeled with a content word w(σ) and each edge (σ, σ′) ∈ E ⊆ N × N is labeled with a pair of semantic roles (r, r′). |
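The DCS tree definition above maps naturally onto a small recursive data structure. The sketch below is illustrative only; the role names are invented placeholders, not the paper's actual semantic-role inventory.

```python
# A small sketch of the DCS tree definition above: nodes labeled with
# content words, edges labeled with a pair of semantic roles (r, r').

from dataclasses import dataclass, field

@dataclass
class DCSNode:
    word: str                                     # content word w(sigma)
    children: list = field(default_factory=list)  # (r, r', child) triples

    def attach(self, r, r_child, child):
        self.children.append((r, r_child, child))

# "A student reads a book": root 'read' with subject and object edges.
# Role names here are illustrative, not the paper's exact inventory.
root = DCSNode("read")
root.attach("SUBJ", "ARG", DCSNode("student"))
root.attach("OBJ", "ARG", DCSNode("book"))
print([(r, rc, c.word) for r, rc, c in root.children])
# [('SUBJ', 'ARG', 'student'), ('OBJ', 'ARG', 'book')]
```

Each edge stores both roles, one at the parent end and one at the child end, which is what distinguishes a DCS edge label from a single dependency relation.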
Backgrounds | Particles are suffixes or tokens in Japanese grammar that immediately follow modified content words or sentences. |
Composed Rule Extraction | A chunk contains roughly one content word (usually the head) and affixed function words, such as case markers (e.g., ga) and verbal morphemes (e.g., sa re ta, which indicate past tense and passive voice). |
Introduction | This indicates that the alignments of function words are more easily mistaken than those of content words. |
Related Research | Specifically, we observed that most incorrect or ambiguous word alignments are caused by function words rather than content words. |
Corpora and Parameters | FC (the upper bound for content word frequency in patterns) influences which words are considered as hook and target words. |
Corpora and Parameters | Since content words determine the joining of patterns into clusters, the more ambiguous a word is, the noisier the resulting clusters. |
Corpora and Parameters | The value we use for FH is lower than that used for FC, in order to allow as HFWs function words of relatively low frequency (e.g., ‘through’), while allowing as content words some frequent words that participate in meaningful relationships (e.g., ‘game’). |
Pattern Clustering Algorithm | Following (Davidov and Rappoport, 2006), we classified words into high-frequency words (HFWs) and content words (CWs). |
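The frequency-based HFW/CW split can be sketched as below. The thresholds are assumed to be corpus frequencies per million words, and the particular numbers are invented; as the preceding excerpts note, setting the HFW lower bound F_H below the CW upper bound F_C lets some words qualify as both.

```python
# Sketch of the frequency-based word split, assuming per-million
# thresholds f_h (minimum frequency for HFWs) and f_c (maximum
# frequency for CWs). The threshold values below are invented.

def classify(freq_per_million, f_h=100, f_c=1000):
    labels = set()
    if freq_per_million >= f_h:
        labels.add("HFW")
    if freq_per_million <= f_c:
        labels.add("CW")
    return labels

print(classify(50))    # {'CW'}: a rare content word
print(classify(500))   # qualifies as both, e.g. 'game' or 'through'
print(classify(5000))  # {'HFW'}: a typical function word
```

The overlap region between f_h and f_c is exactly what lets low-frequency function words serve as HFWs while frequent meaningful words still act as CWs.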
Conclusion | With this approach, using a stop list does not have a major effect on results for most relation classes, which suggests most of the word pairs affecting performance are content word pairs which may truly be semantically related to the discourse structure. |
Introduction | We show that our formulation outperforms the original one while requiring fewer features, and that using a stop list of functional words does not significantly affect performance, suggesting that these features indeed represent semantically related content word pairs. |
Word Pairs | An analysis in Pitler et al. (2009) also shows that the top word pairs (ranked by information gain) all contain common functional words, and are not at all the semantically related content words that were imagined. |
Experiments | We restrict the corpus to content words by retaining only words tagged as adj, n, part and v (adjectives, nouns, particles, and verbs). |
Experiments | As well as function words, we also remove the five most frequent content words (be, go, get, want, come). |
Experiments | On average, situations are only 59 words long, reflecting the relative lack of content words in CDS utterances. |
Introduction | Since the information within the sentence is insufficient for topic modeling, we first enrich sentence contexts via Information Retrieval (IR) methods using content words in the sentence as queries, so that topic-related monolingual documents can be collected. |
Topic Similarity Model with Neural Network | One problem with the auto-encoder is that it treats all words in the same way, making no distinction between function words and content words. |
Topic Similarity Model with Neural Network | For each positive instance (f, e), we select e′ which contains at least 30% different content words from e. |
Content Selection | A valid indicator-argument pair should have at least one content word and satisfy one of the following constraints: |
Content Selection | For training data construction, we consider a relation instance to be a positive example if it shares any content word with its corresponding abstracts, and a negative example otherwise. |
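The labeling rule in the excerpt, positive iff the instance shares any content word with the corresponding abstract, is a one-line set intersection. The content-word sets below are invented examples, and extraction of content words is assumed to happen beforehand.

```python
# Sketch of the overlap-based labeling rule above: a relation instance
# is positive iff it shares a content word with its abstract.
# Content-word sets are assumed precomputed; the examples are invented.

def label(instance_cws, abstract_cws):
    return "positive" if set(instance_cws) & set(abstract_cws) else "negative"

abstract = {"protein", "binding", "site"}
print(label({"binding", "affinity"}, abstract))  # positive
print(label({"method", "novel"}, abstract))      # negative
```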
Surface Realization | number of content words in indicator/argument; number of content words that are also in the previous DA; whether the indicator/argument contains only stopwords |
Architecture of BRAINSUP | ning, 2003) and produce the patterns by stripping away all content words from the parses. |
Evaluation | Two or three content words appearing in each slogan were randomly selected as the target words. |
Evaluation | Furthermore, we only considered sentences in which all the content words are listed in WordNet (Miller, 1995) with the observed part of speech. The LSA space used for the semantic feature functions was also learned on BNC data, but in this case no filtering was applied. |
Application to Essay Scoring | Likewise, a feature that calculates the average PMI for all pairs of content word types in the text failed to produce an improvement over the baseline for sets p1–p6. |
Methodology | The second is which pairs of words in a text to consider when building a profile for the text; we opted for all pairs of content word types occurring in a text, irrespective of the distance between them. |
Methodology | Thus, the text “The dog barked and wagged its tail” is much tighter than the text “Green ideas sleep furiously”, with all six content word pairs scoring above PMI = 5.5 in the first and below PMI = 2.2 in the second. |
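The PMI-over-content-word-pairs idea in the preceding excerpts can be sketched with document-level co-occurrence counts. The toy collection below is invented, so the resulting scores illustrate the mechanism only, not the paper's corpus-derived values.

```python
# A toy PMI computation over content-word types, sketching the
# "lexical tightness" idea above. The document collection is invented.

import math

docs = [
    {"dog", "bark", "tail"},
    {"dog", "tail", "wag"},
    {"dog", "bark"},
    {"green", "idea"},
]

def pmi(w1, w2, docs):
    """Pointwise mutual information from document co-occurrence counts."""
    n = len(docs)
    p1 = sum(w1 in d for d in docs) / n
    p2 = sum(w2 in d for d in docs) / n
    p12 = sum(w1 in d and w2 in d for d in docs) / n
    return math.log2(p12 / (p1 * p2)) if p12 > 0 else float("-inf")

print(round(pmi("dog", "tail", docs), 3))  # 0.415: a related pair
print(pmi("dog", "green", docs))           # -inf: never co-occur
```

A tight text is then one whose content word pairs mostly have high PMI under such corpus statistics, as in the dog/tail versus green/idea contrast above.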
Capturing Paradigmatic Relations via Word Clustering | Another interesting fact is that almost all of them are content words . |
Capturing Syntagmatic Relations via Constituency Parsing | 4.1.1 Content Words vs. Function Words |
Capturing Syntagmatic Relations via Constituency Parsing | The majority of the words that are better labeled by the tagger are content words, including nouns (NN, NR, NT), numbers (CD, OD), predicates (VA, VC, VE), adverbs (AD), nominal modifiers (JJ), and so on. |
Elementary Trees to String Grammar | (∅, γ′) deletes a src content word |
Elementary Trees to String Grammar | (∅, γ′) over-generates a tgt content word (v) |
The Projectable Structures | The transformations could be as simple as merging two adjacent nonterminals into one bracket to accommodate non-contiguity on the target side, or lexicalizing those words which have fork-style, many-to-many alignment, or unaligned content words, to enable the rest of the span to be generalized into nonterminals. |
Method | Finally, because our focus is the influence of semantic context, we selected only content words whose prior sentential context contained at least two further content words . |
Models of Processing Difficulty | The model takes into account only content words; function words are of little interest here, as they can be found in any context. |
Models of Processing Difficulty | common content words, and each vector component is given by the ratio of the probability of c_i given t to the overall probability of c_i. |
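The vector construction described above, one component per common content word, each the ratio p(c_i | t) / p(c_i), can be sketched directly from counts. The counts below are invented toy numbers for a "bank"-like context, not corpus data.

```python
# Sketch of the ratio vector described above: each component is
# p(c_i | t) / p(c_i) for common content words c_i. Counts are toy values.

def ratio_vector(context_counts, corpus_counts):
    ctx_total = sum(context_counts.values())
    corpus_total = sum(corpus_counts.values())
    vec = {}
    for c, n in corpus_counts.items():
        p_c = n / corpus_total                        # overall probability
        p_c_given_t = context_counts.get(c, 0) / ctx_total  # in-context probability
        vec[c] = p_c_given_t / p_c
    return vec

corpus = {"money": 100, "river": 100, "water": 200}
bank_context = {"money": 8, "water": 2}
v = ratio_vector(bank_context, corpus)
print(v)
```

A component above 1 means the content word is over-represented in the target's contexts relative to its base rate, which is what makes the vector informative about the target's semantics.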
Model | During initial unsupervised parsing we experiment with incorporating knowledge through a combination of statistical priors favoring a skewed distribution of words into classes, and an initial hard clustering of the vocabulary into function and content words . |
Unsupervised Parsing | Because the function and content word preclustering preceded parameter estimation, it can be combined with either EM or VB learning. |
Unsupervised Parsing | Although this initial split forces sparsity on the emission matrix and allows more uniform sized clusters, Dirichlet priors may still help, if word clusters within the function or content word subsets vary in size and frequency. |
Conclusion | Feature counts by column (or, hr, rl, st, vn):

                     or   hr   rl   st   vn
  frame               0    4    0    1    0
  evoking word        3    4    7    3    0
  ew & hw stem        9   34   20    8    0
  ew & phrase type   11    7   11    3    1
  head word          13   19    8    3    1
  hw stem            11   17    8    8    1
  content word        7   19   12    3    0
  cw stem            11   26   13    5    0
  cw POS              4    5   14   15    2
  directed path      19   27   24    6    7
  undirected path    21   35   17    2    6
  partial path       15   18   16   13    5
  last word          15   18   12    3    2
  first word         11   23   53   26   10
  supersense          7    7   35   25    4
  position            4    6   30    9    5
  others             27   29   33   19    6
  total             188  298  313  152   50
|
Experiment and Discussion | The characteristics of x are: frame, frame evoking word, head word, content word (Surdeanu et al., 2003), first/last word, head word of left/right sister, phrase type, position, voice, syntactic path (directed/undirected/partial), governing category (Gildea and Jurafsky, 2002), WordNet supersense in the phrase, combination features of frame evoking word & head word, combination features of frame evoking word & phrase type, and combination features of voice & phrase type. |
Experiment and Discussion | associations with lexical and structural characteristics such as the syntactic path, content word, and head word. |
Alignment Link Confidence Measure | F-content and F-function are the F-scores for content words and function words, respectively. |
Alignment Link Confidence Measure | Overall it improves the F-score by 1.5 points (from 69.3 to 70.8), with a 1.8-point improvement for content words and 1.0 point for function words. |
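Computing separate F-scores for content-word and function-word alignment links, as in the excerpts above, amounts to partitioning the link sets before scoring. The gold/predicted links and the content-word indicator below are invented toy data.

```python
# Sketch of alignment F-scores computed separately for content and
# function words, given gold and predicted link sets plus a content-word
# indicator over source positions. All data below are invented.

def f_score(pred, gold):
    if not pred or not gold:
        return 0.0
    tp = len(pred & gold)
    p = tp / len(pred)
    r = tp / len(gold)
    return 2 * p * r / (p + r) if p + r else 0.0

gold = {(0, 0), (1, 2), (2, 1), (3, 3)}   # (src, tgt) alignment links
pred = {(0, 0), (1, 2), (2, 2), (3, 3)}
is_content = {0: True, 1: True, 2: False, 3: True}  # src position -> content?

def split_links(links):
    content = {l for l in links if is_content[l[0]]}
    return content, links - content

gc, gf = split_links(gold)
pc, pf = split_links(pred)
print(round(f_score(pc, gc), 2), round(f_score(pf, gf), 2))  # 1.0 0.0
```

In this toy case the only alignment error falls on the function word, matching the observation above that function-word links are mistaken more often.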
Related Work | On the other hand, removing incorrect content word links produced cleaner phrase translation tables. |
BBC News Database | We randomly selected 240 image-caption pairs and manually assessed whether the caption content words (i.e., nouns, verbs, and adjectives) could describe the image. |
BBC News Database | We rank the document’s content words (i.e., nouns, verbs, and adjectives) according to their tf * idf weight and select the top k to be the final annotations. |
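The tf*idf ranking step described in the excerpt can be sketched as follows. The corpus and the value of k are toy inputs, and the idf variant (plain log of inverse document frequency) is an assumption; the paper may use a different weighting.

```python
# Sketch of ranking a document's content words by tf*idf and keeping
# the top k as annotations. Corpus, k, and idf variant are assumptions.

import math
from collections import Counter

corpus = [
    ["election", "vote", "minister", "vote"],
    ["storm", "flood", "rain"],
    ["election", "debate", "minister"],
]

def top_k_tfidf(doc, corpus, k=2):
    tf = Counter(doc)                       # term frequency in this document
    n = len(corpus)
    def idf(w):
        df = sum(w in d for d in corpus)    # document frequency
        return math.log(n / df)
    scored = {w: tf[w] * idf(w) for w in tf}
    return sorted(scored, key=scored.get, reverse=True)[:k]

print(top_k_tfidf(corpus[0], corpus, k=2))  # ['vote', 'election']
```

Words frequent in the document but rare across the collection ("vote" here) rank highest, which is why the selected words tend to be good caption-style annotations.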
BBC News Database | Again we only use content words (the average title length in the training set was 4.0 words). |