Corpora and Parameters | F0 (the upper bound on content-word frequency in patterns) influences which words are considered hook and target words. |
Corpora and Parameters | Since content words determine how patterns are joined into clusters, the more ambiguous a content word is, the noisier the resulting clusters. |
Corpora and Parameters | We set FH lower than F0 so that function words of relatively low frequency (e.g., ‘through’) can still act as HFWs, while some frequent words that participate in meaningful relationships (e.g., ‘game’) can still act as content words. |
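The overlap created by setting FH below F0 can be shown with a minimal sketch. It assumes that FH is a lower bound on HFW frequency, that F0 is an upper bound on content-word frequency, and that both are measured per million corpus tokens. The function name `classify_words`, its parameters `f_h` and `f_0`, and the toy values below are hypothetical illustrations. They are not the authors' code or settings.

```python
from collections import Counter


def classify_words(tokens, f_h, f_0):
    """Split the vocabulary into HFWs and content words by corpus frequency.

    Assumption: frequencies are measured per million tokens. A word is an
    HFW if its frequency exceeds f_h and a content word if its frequency is
    below f_0. Because f_h < f_0, words whose frequency falls between the two
    thresholds end up in both sets.
    """
    counts = Counter(tokens)
    total = len(tokens)
    hfws, cws = set(), set()
    for word, count in counts.items():
        freq_per_million = count * 1_000_000 / total
        if freq_per_million > f_h:
            hfws.add(word)
        if freq_per_million < f_0:
            cws.add(word)
    return hfws, cws
```

Consider a toy corpus of 1,000 tokens in which ‘the’ occurs 400 times, ‘of’ 300, ‘through’ 100, ‘game’ 100, ‘chess’ 50 and ‘board’ 50. Calling `classify_words(tokens, 80_000, 150_000)` puts ‘through’ and ‘game’ in both sets. The lower-frequency function word is still usable as an HFW, and the frequent noun is still usable as a content word.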
Pattern Clustering Algorithm | Following Davidov and Rappoport (2006), we classified words into high-frequency words (HFWs) and content words (CWs).
BBC News Database | We randomly selected 240 image-caption pairs and manually assessed whether the caption content words (i.e., nouns, verbs, and adjectives) could describe the image. |
BBC News Database | We rank the document’s content words (i.e., nouns, verbs, and adjectives) by their tf-idf weight and select the top k as the final annotations.
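The ranking step can be illustrated with a minimal sketch under several assumptions. Each document is assumed to arrive as (word, tag) pairs from some POS tagger that uses Universal POS tags (`NOUN`, `VERB`, `ADJ`). The term frequency is taken as the raw count in the document, and idf is taken as log(N / df) over the document collection. The exact tf and idf variants behind the original annotations are not stated here. The names `CONTENT_POS`, `content_words` and `top_k_annotations` are hypothetical.

```python
import math
from collections import Counter

# Assumed Universal POS tags for nouns, verbs, and adjectives.
CONTENT_POS = {"NOUN", "VERB", "ADJ"}


def content_words(tagged_doc):
    """Keep only lowercased nouns, verbs, and adjectives from (word, tag) pairs."""
    return [word.lower() for word, tag in tagged_doc if tag in CONTENT_POS]


def top_k_annotations(tagged_docs, doc_index, k):
    """Rank one document's content words by tf * log(N / df); return the top k."""
    docs = [content_words(d) for d in tagged_docs]
    n_docs = len(docs)
    # Document frequency: the number of documents containing each word.
    df = Counter()
    for words in docs:
        df.update(set(words))
    tf = Counter(docs[doc_index])
    scores = {w: count * math.log(n_docs / df[w]) for w, count in tf.items()}
    # Sort by descending score and break ties alphabetically so the output is deterministic.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:k]
```

In a three-document toy collection, a word repeated within the target document and absent from the others (e.g., ‘storm’ occurring twice) outranks one that also appears elsewhere (e.g., ‘town’). Function words such as ‘the’ never appear as annotations because the POS filter drops them before scoring.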
BBC News Database | Again, we use only content words (the average title length in the training set was 4.0 words).