Experiments 4.1 The data | Best performance for both the Unigram and the Bigram model in the GOLD-p condition is achieved under the left-right setting, in line with the standard analyses of /t/-deletion as primarily being determined by the preceding and the following context.
Experiments 4.1 The data | For the LEARN-p condition, the Bigram model still performs best in the left-right setting, but the Unigram model’s performance drops.
Introduction | We find that models that capture bigram dependencies between underlying forms provide considerably more accurate estimates of those probabilities than corresponding unigram or “bag of words” models of underlying forms. |
The computational model | Our models build on the Unigram and the Bigram model introduced in Goldwater et al. |
The computational model | Figure 1 shows the graphical model for our joint Bigram model (the Unigram case is trivially recovered by generating the U_{i,j}s directly from L rather than from U_{i,j-1}).
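The Unigram/Bigram distinction above can be sketched in a few lines. This is a deliberately simplified toy (the lexicon L and the transition distributions below are invented fixed multinomials, standing in for the learned distributions of the actual model): the unigram generator draws each underlying form directly from L, while the bigram generator conditions each draw on the previous underlying form.

```python
import random

# Invented toy lexicon probabilities (stand-in for L) and bigram
# transition probabilities over underlying forms.
LEX_P = {"want": 0.5, "to": 0.3, "go": 0.2}
BIGRAM_P = {
    "<s>":  {"want": 0.8, "to": 0.1, "go": 0.1},
    "want": {"want": 0.1, "to": 0.8, "go": 0.1},
    "to":   {"want": 0.1, "to": 0.1, "go": 0.8},
    "go":   {"want": 0.4, "to": 0.3, "go": 0.3},
}

def sample(dist, rng):
    # Draw one item from a {item: probability} dict.
    r, acc = rng.random(), 0.0
    for item, p in dist.items():
        acc += p
        if r < acc:
            return item
    return item

def generate_unigram(n, rng):
    # Unigram model: each underlying form is drawn directly from L.
    return [sample(LEX_P, rng) for _ in range(n)]

def generate_bigram(n, rng):
    # Bigram model: each U_{i,j} is drawn conditioned on U_{i,j-1}.
    out, prev = [], "<s>"
    for _ in range(n):
        prev = sample(BIGRAM_P[prev], rng)
        out.append(prev)
    return out

rng = random.Random(0)
uni_seq = generate_unigram(5, rng)
bi_seq = generate_bigram(5, rng)
```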
Model | For notational convenience, we use terms to denote both words (unigrams) and phrases (n-grams).
Phrase Ranking based on Relevance | Topics in most topic models like LDA are usually unigram distributions. |
Phrase Ranking based on Relevance | For each word, a topic is sampled first, then its status as a unigram or bigram is sampled, and finally the word is sampled from a topic-specific unigram or bigram distribution. |
Phrase Ranking based on Relevance | Yet another thread of research post-processes the discovered topical unigrams to form multi-word phrases using likelihood scores (Blei and Lafferty, 2009). |
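The per-word generative process described above (topic first, then unigram-vs-bigram status, then the word) can be sketched as follows. All distributions here are invented toy values, not those of any particular model:

```python
import random

# Invented topic-specific unigram and bigram distributions.
TOPICS = {
    0: {"uni": {"data": 0.6, "model": 0.4},
        "bi":  {"data set": 0.7, "model fit": 0.3}},
    1: {"uni": {"gene": 0.5, "cell": 0.5},
        "bi":  {"gene expression": 1.0}},
}
THETA = {0: 0.5, 1: 0.5}   # document's topic proportions (invented)
P_BIGRAM = 0.3             # prior probability of emitting a bigram

def sample(dist, rng):
    r, acc = rng.random(), 0.0
    for k, p in dist.items():
        acc += p
        if r < acc:
            return k
    return k

def generate_term(rng):
    z = sample(THETA, rng)                                # sample a topic
    status = "bi" if rng.random() < P_BIGRAM else "uni"   # unigram or bigram?
    return z, status, sample(TOPICS[z][status], rng)      # emit the term

rng = random.Random(1)
terms = [generate_term(rng) for _ in range(10)]
```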
Definitions | Given a ciphertext f_1^N, we define the unigram count N_f of f ∈ V_f as
Definitions | Similarly, we define language model matrices S for the unigram and the bigram case. |
Definitions | The unigram language model Sf is defined as |
Introduction | In Section 4 we show that decipherment using a unigram language model corresponds to solving a linear sum assignment problem (LSAP). |
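The correspondence to a linear sum assignment problem can be illustrated on a tiny 1:1 substitution cipher: the objective is to find the mapping a of cipher symbols to plaintext symbols maximizing the sum of N_f · log p(a(f)). The sketch below (cipher string and language model probabilities are invented) brute-forces the assignment over all permutations, which is only feasible for tiny alphabets; a real solver would use an LSAP algorithm such as the Hungarian method.

```python
from itertools import permutations
from math import log

cipher = "xyyxzzzy"                            # invented ciphertext
plain_vocab = ["a", "b", "c"]
unigram_lm = {"a": 0.5, "b": 0.3, "c": 0.2}    # invented plaintext unigram LM

# Unigram counts N_f of the ciphertext.
counts = {}
for f in cipher:
    counts[f] = counts.get(f, 0) + 1
cipher_vocab = sorted(counts)

# Brute-force the assignment maximizing sum_f N_f * log p(a(f)).
best_score, best_map = float("-inf"), None
for perm in permutations(plain_vocab):
    mapping = dict(zip(cipher_vocab, perm))
    score = sum(n * log(unigram_lm[mapping[f]]) for f, n in counts.items())
    if score > best_score:
        best_score, best_map = score, mapping

decipherment = "".join(best_map[f] for f in cipher)
```

By the rearrangement inequality, the optimum pairs the most frequent cipher symbols with the most probable plaintext symbols, which is exactly what the assignment-problem view captures.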
Conclusion | However, OOVs can be considered as n-grams (phrases) instead of unigrams.
Conclusion | In this scenario, we can also look for paraphrases and translations for phrases containing OOVs and add them to the phrase-table as new translations, along with the translations for unigram OOVs.
Experiments & Results 4.1 Experimental Setup | Table 4: Intrinsic results of different types of graphs when using unigram nodes on Europarl. |
Experiments & Results 4.1 Experimental Setup |
  Type        Node      MRR %   RCL %
  Bipartite   unigram   5.2     12.5
  Bipartite   bigram    6.8     15.7
  Tripartite  unigram   5.9     12.6
  Tripartite  bigram    6.9     15.9
  Baseline    bigram    3.9     7.7
Experiments & Results 4.1 Experimental Setup | Table 5: Results on using unigram or bigram nodes. |
Reranking Features | Long-range Unigram.
Reranking Features | in the parse tree: f(L2 → left) = 1 and f(L4 → turn) = 1. Two-level Long-range Unigram.
Reranking Features | Unigram.
Experiments | This is because the weights of unigram through trigram features in a log-linear CRF model are balanced against one another during likelihood maximization.
Experiments | A unigram feature might end up with lower weight because another trigram containing this unigram gets a higher weight. |
Experiments | Then we would have missed this feature if we only used top unigram features. |
Method | Unigram QA Model The QA system uses up to trigram features (Table 1 shows examples of unigram and bigram features). |
Method | We drop this strict constraint (which may need further smoothing) and only use unigram features, not by simply extracting “good” unigram features from the trained model, but by retraining the model with only unigram features. |
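The distinction above (retraining on unigram features only, rather than filtering a model trained with up-to-trigram features) starts from the feature extraction step, which a minimal sketch can make concrete. The extractor below is a generic illustration, not the paper's actual feature template:

```python
def ngram_features(tokens, max_n):
    # Extract all n-gram features for n = 1..max_n.
    feats = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            feats.append(" ".join(tokens[i:i + n]))
    return feats

tokens = ["what", "is", "the", "capital"]
trigram_feats = ngram_features(tokens, 3)   # features for the full model
unigram_feats = ngram_features(tokens, 1)   # features for the retrained unigram model
```

Retraining on `unigram_feats` lets the learner redistribute weight that the full model had assigned to overlapping bigrams and trigrams, which is why simply keeping "good" unigram features from the full model is not equivalent.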
Experiments | Besides unigram and bigram, the most effective textual feature is URL. |
Proposed Features | 3.1.1 Unigrams and Bigrams The most common type of feature for text classification
Proposed Features | feature selection method χ² (Yang and Pedersen, 1997) to select the top 200 unigrams and bigrams as features.
Proposed Features | The top ten unigrams related to deceptive answers are shown in Table 1.
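Chi-square feature selection as described above scores each feature's 2×2 contingency table of presence against the class label and keeps the highest-scoring features. A minimal self-contained sketch (the four labeled documents are invented toy data):

```python
# Toy labeled documents: (text, label), label 1 = deceptive, 0 = honest.
docs = [("free money now", 1), ("win money fast", 1),
        ("meeting at noon", 0), ("lunch at noon", 0)]

def chi2_score(term, docs):
    # 2x2 contingency table of term presence vs class label.
    a = sum(1 for d, y in docs if term in d.split() and y == 1)
    b = sum(1 for d, y in docs if term in d.split() and y == 0)
    c = sum(1 for d, y in docs if term not in d.split() and y == 1)
    d_ = sum(1 for d, y in docs if term not in d.split() and y == 0)
    n = a + b + c + d_
    denom = (a + b) * (c + d_) * (a + c) * (b + d_)
    return 0.0 if denom == 0 else n * (a * d_ - b * c) ** 2 / denom

vocab = sorted({w for d, _ in docs for w in d.split()})
top = sorted(vocab, key=lambda t: chi2_score(t, docs), reverse=True)[:2]
```

The same procedure extends directly to bigrams by scoring adjacent word pairs instead of single words.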
New Sense Indicators | As such, we compute unigram log probabilities (via smoothed relative frequencies) of each word under consideration in the old domain and the new domain. |
New Sense Indicators | However, we do not simply want to capture unusual words, but words that are unlikely in context, so we also need to look at the respective unigram log probabilities: ℓ_old^uni and ℓ_new^uni.
New Sense Indicators | From these four values, we compute corpus-level (and therefore type-based) statistics of the new domain n-gram log probability (ℓ_new^ng), the difference between the n-gram probabilities in each domain (ℓ_new^ng − ℓ_old^ng), the difference between the n-gram and unigram probabilities in the new domain (ℓ_new^ng − ℓ_new^uni), and finally the combined difference: ℓ_new^ng − ℓ_old^ng + ℓ_old^uni − ℓ_new^uni.
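The unigram part of this computation, smoothed relative-frequency log probabilities in each domain and their difference, can be sketched as follows. The two tiny corpora are invented, add-one smoothing stands in for whatever smoothing the system actually uses, and the names `l_old`/`l_new` are illustrative:

```python
from math import log

old_corpus = "the bank of the river".split()       # invented old-domain text
new_corpus = "the bank approved the loan".split()  # invented new-domain text
vocab = set(old_corpus) | set(new_corpus)

def log_prob(word, corpus):
    # Add-one smoothed relative frequency.
    return log((corpus.count(word) + 1) / (len(corpus) + len(vocab)))

word = "loan"
l_old = log_prob(word, old_corpus)
l_new = log_prob(word, new_corpus)
diff = l_new - l_old   # positive => the word is more likely in the new domain
```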
Why Does Unsimplified Data Help? | This is particularly important for unigrams (i.e. |
Why Does Unsimplified Data Help? | Table 3 shows the percentage of unigrams, bigrams and trigrams from the two test sets that are found in the simple and normal training data.
Why Does Unsimplified Data Help? | Even at the unigram level, the normal data contained significantly more of the test set unigrams than the simple data. |
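The coverage percentages compared above amount to a simple set computation over n-gram types. A minimal sketch with invented toy corpora:

```python
def ngrams(tokens, n):
    # Set of n-gram types in a token sequence.
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def coverage(test_tokens, train_tokens, n):
    # Percentage of test-set n-gram types also seen in training.
    test = ngrams(test_tokens, n)
    train = ngrams(train_tokens, n)
    return 100.0 * len(test & train) / len(test)

train = "the cat sat on the mat".split()   # invented training text
test = "the cat sat down".split()          # invented test text
uni_cov = coverage(test, train, 1)         # unigram coverage
bi_cov = coverage(test, train, 2)          # bigram coverage
```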
Cue Discovery for Content Selection | {x1, ..., xm} consists of m unigram features representing the observed vocabulary used in our corpus.
Experimental Results | We use a binary unigram feature space, and we perform 7-fold cross-validation.
Prediction | One challenge of this approach is our underlying unigram feature space - tree-based algorithms are generally poor classifiers for the high-dimensionality, low-information features in a lexical feature space (Han et al., 2001). |
Prediction | splits than would unigrams alone. |
Experimental results | Note that unigrams in the models are never pruned, hence all models assign probabilities over an identical vocabulary and perplexity is comparable across models. |
Marginal distribution constraints | Thus the unigram distribution is constrained with respect to the bigram model, the bigram model with respect to the trigram model, and so forth.
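The constraint at the unigram level says that the unigram distribution should match the marginal the bigram model implies: p(w) = Σ_h p(h) · p(w | h). A minimal numeric sketch with an invented two-word bigram model:

```python
# Invented history probabilities and bigram model over a two-word vocabulary.
P_HIST = {"a": 0.6, "b": 0.4}
P_BIGRAM = {"a": {"a": 0.5, "b": 0.5},
            "b": {"a": 0.75, "b": 0.25}}

def implied_unigram(w):
    # Marginalize the bigram model over histories: p(w) = sum_h p(h) p(w|h).
    return sum(P_HIST[h] * P_BIGRAM[h][w] for h in P_HIST)

p_a = implied_unigram("a")
p_b = implied_unigram("b")
```

In this toy example the implied marginal coincides with the history distribution itself (0.6 and 0.4), i.e. the chosen distributions are already consistent; a constraint algorithm would adjust the unigram parameters until this held.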
Model constraint algorithm | Thus we process each history length in descending order, finishing with the unigram state. |
Model constraint algorithm | This can be seen particularly clearly at the unigram state, which has an arc for every unigram (the size of the vocabulary): for every bigram state (also on the order of the vocabulary size), the naive algorithm must examine every possible arc.
Inference | where n(t) and n(t, t’) are, respectively, unigram and bigram tag counts excluding those containing character w. Conversely, n’(t) and n’(t, t’) are, respectively, unigram and bigram tag counts only including those containing character w. The notation a^(n) denotes the ascending factorial: a(a + 1) · · · (a + n − 1).
Inference | where n(w) is the unigram count of character w, and n(t’) is the unigram count of tag t’, over all character tokens (including w).
Inference | where n(j, k, t) and n(j, k, t, t’) are the numbers of languages currently assigned to cluster k which have more than j occurrences of unigram (t) and bigram (t, t’), respectively.
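The ascending factorial a^(n) = a(a + 1) · · · (a + n − 1) used in the collapsed-sampler expressions above is a small helper worth writing out (with the usual convention a^(0) = 1):

```python
def ascending_factorial(a, n):
    # a^(n) = a * (a + 1) * ... * (a + n - 1); empty product for n = 0.
    result = 1.0
    for i in range(n):
        result *= a + i
    return result

val = ascending_factorial(2.0, 3)   # 2 * 3 * 4
```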
Task A: Polarity Classification | We studied the influence of unigrams, bigrams and a combination of the two, and saw that the best performing feature set consists of the combination of unigrams and bigrams. |
Task A: Polarity Classification | In this paper, we will refer from now on to n-grams as the combination of unigrams and bigrams. |
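The "n-grams" feature set defined above, the union of unigram and bigram features, can be sketched with a toy extractor (whitespace tokenization and lowercasing are simplifying assumptions, not necessarily the paper's preprocessing):

```python
def unigrams_and_bigrams(text):
    # Combined feature set: all unigrams plus all adjacent-word bigrams.
    toks = text.lower().split()
    unis = toks
    bis = [" ".join(toks[i:i + 2]) for i in range(len(toks) - 1)]
    return unis + bis

feats = unigrams_and_bigrams("Great movie overall")
```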
Task B: Valence Prediction | Those include n-grams (unigrams, bigrams, and the combination of the two) and LIWC scores.
Multimodal Sentiment Analysis | We use a bag-of-words representation of the video transcriptions of each utterance to derive unigram counts, which are then used as linguistic features. |
Multimodal Sentiment Analysis | The remaining words represent the unigram features, which are then associated with a value corresponding to the frequency of the unigram inside each utterance transcription. |
Multimodal Sentiment Analysis | These simple weighted unigram features have been successfully used in the past to build sentiment classifiers on text, and in conjunction with Support Vector Machines (SVM) have been shown to lead to state-of-the-art performance (Maas et al., 2011). |
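The frequency-valued unigram features described above reduce to a bag-of-words count vector per utterance transcription. A minimal sketch (the stopword list and example utterance are invented; the resulting vectors would then be fed to an SVM):

```python
from collections import Counter

STOPWORDS = {"the", "a", "is"}   # invented illustrative stopword list

def unigram_features(transcription):
    # Map each remaining unigram to its frequency in the utterance.
    toks = [t for t in transcription.lower().split() if t not in STOPWORDS]
    return Counter(toks)

vec = unigram_features("The movie is really really good")
```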
Experiments | As the backbone of our string-to-dependency system, we train 3-gram models for left and right dependencies and a unigram model for heads, using the target side of the bilingual training data.
Introduction | In this way, we hope to upgrade the unigram formulation of existing reordering models to a higher order formulation. |
Related Work | Our TNO model is closely related to the Unigram Orientation Model (UOM) (Tillman, 2004), which is the de facto reordering model of phrase-based SMT (Koehn et al., 2007). |