Experiments | Because the features for QA pairs are quite sparse, and the content words in the questions are often morphologically different from words with the same meaning in the answers, the Cosine Similarity method becomes less powerful. |
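The weakness described above can be illustrated with a minimal sketch (not the paper's implementation): a bag-of-words cosine similarity over surface tokens, where morphological variants such as "ran" and "runners" share no component and contribute nothing to the score.

```python
from collections import Counter
import math

def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between two bag-of-words (surface-token) vectors."""
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

# Morphological variants have distinct surface forms, so they are
# treated as unrelated words and the similarity of this QA pair drops:
question = "who ran the marathon"
answer = "the marathon runners were exhausted"
score = cosine_similarity(question, answer)  # only "the" and "marathon" match
```

Stemming or lemmatizing both sides before vectorization is the usual remedy for this mismatch.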
Learning with Homogenous Data | Figure 4 shows the percentage of concurrent words among the top-ranked high-frequency content words. |
Learning with Homogenous Data | The number k on the horizontal axis in Figure 4 represents the top k content words in the |
Learning with Homogenous Data | (Figure 4 axis label: percentage of concurrent content words) |
Model | During initial unsupervised parsing we experiment with incorporating knowledge through a combination of statistical priors favoring a skewed distribution of words into classes, and an initial hard clustering of the vocabulary into function and content words. |
Unsupervised Parsing | Because the function and content word preclustering preceded parameter estimation, it can be combined with either EM or VB learning. |
Unsupervised Parsing | Although this initial split forces sparsity on the emission matrix and allows more uniformly sized clusters, Dirichlet priors may still help if word clusters within the function or content word subsets vary in size and frequency. |
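The hard function/content split can be pictured as a constraint on an HMM emission matrix: function-word states may emit only function words, content states only content words. The sketch below is a hypothetical helper (the `precluster_mask` name and state layout are assumptions, not the paper's code) showing how such a split forces sparsity on the emissions.

```python
import numpy as np

def precluster_mask(vocab, function_words, n_func_states, n_content_states):
    """Binary mask over (state, word) emissions: the first n_func_states
    states may emit only function words; the remaining states may emit
    only content words. A sketch of the hard-preclustering idea."""
    n_states = n_func_states + n_content_states
    mask = np.zeros((n_states, len(vocab)))
    for j, w in enumerate(vocab):
        if w in function_words:
            mask[:n_func_states, j] = 1.0   # function-word states only
        else:
            mask[n_func_states:, j] = 1.0   # content-word states only
    return mask

# A random emission initialization multiplied by this mask (and row-
# renormalized) respects the split; EM or VB then never moves mass
# across the function/content boundary.
vocab = ["the", "of", "dog", "runs"]
mask = precluster_mask(vocab, {"the", "of"}, n_func_states=2, n_content_states=3)
```

Because zero emission entries stay zero under EM's multiplicative updates, the split survives training unchanged.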
Method | Finally, because our focus is the influence of semantic context, we selected only content words whose prior sentential context contained at least two further content words. |
Models of Processing Difficulty | The model takes into account only content words; function words are of little interest here, as they can be found in any context. |
Models of Processing Difficulty | common content words, and each vector component is given by the ratio of the probability of c_i given t to the overall probability of c_i. |
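The ratio-based context vector above can be sketched as follows. This is a hypothetical reconstruction under the reading that each component is P(c | target context) / P(c); the function name `ratio_vector` and its inputs are assumptions, not the original model's code.

```python
from collections import Counter

def ratio_vector(target_context, corpus_words, content_words):
    """One component per content word c: P(c | target context) / P(c).
    Values > 1 mean c is over-represented near the target word."""
    corpus_counts = Counter(corpus_words)
    total = len(corpus_words)
    ctx_counts = Counter(w for w in target_context if w in content_words)
    ctx_total = sum(ctx_counts.values()) or 1  # avoid division by zero
    return {
        c: (ctx_counts[c] / ctx_total) / (corpus_counts[c] / total)
        for c in content_words
        if corpus_counts[c]  # skip words unseen in the corpus
    }

# Toy corpus and a context window around some target word:
corpus = ["dog", "cat", "dog", "bird", "cat", "dog"]
context = ["dog", "dog", "cat", "the"]  # "the" is filtered as a function word
vec = ratio_vector(context, corpus, {"dog", "cat", "bird"})
```

A component of 1.0 means the content word is no more likely near the target than in the corpus overall.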