Experiments and Discussions | We use R-1 (recall against unigrams), R-2 (recall against bigrams), and R-SU4 (recall against skip-4 bigrams).
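The recall-oriented ROUGE measures above can be sketched as clipped n-gram recall; this is an illustrative reimplementation (with whitespace tokenization), not the official ROUGE toolkit, which additionally handles stemming, stopword removal, and the skip-bigrams needed for R-SU4:

```python
from collections import Counter

def rouge_n_recall(candidate, reference, n=1):
    """N-gram recall of a candidate summary against a reference.

    Counts are clipped: each reference n-gram is matched at most as many
    times as it occurs in the reference. n=1 gives R-1, n=2 gives R-2.
    """
    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    if not ref:
        return 0.0
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / sum(ref.values())

cand = "the cat sat on the mat".split()
ref = "the cat lay on the mat".split()
r1 = rouge_n_recall(cand, ref, n=1)  # 5 of 6 reference unigrams matched
r2 = rouge_n_recall(cand, ref, n=2)  # 3 of 5 reference bigrams matched
```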
Experiments and Discussions | Note that R-2 is a measure of bigram recall, and the sumHLDA of HybHSum2 is built on unigrams rather than bigrams.
Regression Model | (I) nGram Meta-Features (NMF): For each document cluster D, we identify the most frequent (non-stop-word) unigrams, i.e., $v^{freq} = \{w_i\}_{i=1}^{r} \subseteq V$, where $r$ is a model parameter giving the number of most frequent unigram features.
Regression Model | We measure observed unigram probabilities for each $w_i \in v^{freq}$ with $p_D(w_i) = n_D(w_i) / \sum_{j=1}^{|V|} n_D(w_j)$, where $n_D(w_i)$ is the number of times $w_i$ appears in D and $|V|$ is the total number of unigrams.
Regression Model | To characterize this feature, we reuse the $r$ most frequent unigrams, i.e., $w_i \in v^{freq}$.
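The meta-feature computation above can be sketched as follows; the function name and the toy cluster are illustrative, assuming the cluster is given as tokenized documents:

```python
from collections import Counter

def frequent_unigram_probs(cluster_docs, r, stopwords=frozenset()):
    """Observed unigram probabilities p_D(w_i) = n_D(w_i) / sum_j n_D(w_j)
    for the r most frequent non-stop-word unigrams in a document cluster D.
    """
    counts = Counter(w for doc in cluster_docs for w in doc if w not in stopwords)
    total = sum(counts.values())  # denominator: counts of all unigrams in D
    v_freq = [w for w, _ in counts.most_common(r)]
    return {w: counts[w] / total for w in v_freq}

docs = [["budget", "cuts", "hit", "schools"],
        ["schools", "protest", "budget", "cuts"]]
probs = frequent_unigram_probs(docs, r=2)  # {"budget": 0.25, "cuts": 0.25}
```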
Tree-Based Sentence Scoring | * sparse unigram distributions (sim1) at each topic $l$ on $c_{o_m}$: similarity between $p(w_{o_m,l} \mid z_{o_m} = l, c_{o_m}, v_l)$ and $p(w_{s_n,l} \mid z_{s_n} = l, c_{o_m}, v_l)$
Tree-Based Sentence Scoring | — sim1: We define two sparse (discrete) unigram distributions for candidate $o_m$ and summary $s_n$ at each node $l$ on a vocabulary identified with words generated by the topic at that node, $v_l \subset V$. Given $w_{o_m} = \{w_1, \ldots, w_{|o_m|}\}$, let $w_{o_m,l} \subseteq w_{o_m}$ be the set of words in $o_m$ that are generated from topic $z_{o_m}$ at level $l$ on path $c_{o_m}$.
Tree-Based Sentence Scoring | The discrete unigram distribution $p_{o_m,l} = p(w_{o_m,l} \mid z_{o_m} = l, c_{o_m}, v_l)$ represents the probability over all words $v_l$ assigned to topic $z_{o_m}$ at level $l$, obtained by sampling only for words in $w_{o_m,l}$.
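A minimal sketch of such topic-restricted sparse distributions and one possible way to compare them; the cosine similarity used here is illustrative, not necessarily the measure used in the original scoring:

```python
from collections import Counter
import math

def sparse_topic_dist(sentence_words, topic_vocab):
    """Discrete unigram distribution over the topic vocabulary v_l,
    built only from the sentence words that fall in that vocabulary."""
    counts = Counter(w for w in sentence_words if w in topic_vocab)
    total = sum(counts.values())
    return {w: (counts.get(w, 0) / total if total else 0.0) for w in topic_vocab}

def cosine(p, q):
    """Cosine similarity between two distributions over the same vocabulary."""
    dot = sum(p[w] * q[w] for w in p)
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0

v_l = {"market", "stocks", "rally"}                      # topic-node vocabulary
p_cand = sparse_topic_dist(["stocks", "rally", "today"], v_l)
p_summ = sparse_topic_dist(["market", "rally"], v_l)
sim = cosine(p_cand, p_summ)
```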
Experiments | Besides the heuristic baseline, we tried our model-based approach using Unigrams, Bigrams, and Anchored Unigrams, with and without learning the parametric edit distances.
Learning | To find this maximizer for any given parameter setting, we need to find a marginal distribution over the edges connecting any two languages a and d. With this distribution, we calculate the expected “alignment unigrams.” That is, for each pair of phonemes x and y (or the empty phoneme ε), we need to find the quantity:
Message Approximation | In the context of transducers, previous authors have focused on a combination of n-best lists and unigram back-off models (Dreyer and Eisner, 2009), a schematic diagram of which is in Figure 2(d). |
Message Approximation | Figure 2: Various topologies for approximating messages: (a) a unigram model, (b) a bigram model, (c) the anchored unigram model, and (d) the n-best plus backoff model used in Dreyer and Eisner (2009).
Message Approximation | Another is to choose $\tau(w)$ to be a unigram language model over the language in question with a geometric probability over lengths.
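Such a model factors into a geometric length prior times per-symbol unigram probabilities; because the length prior and the symbol distribution each sum to one, the model is properly normalized over all strings. This sketch assumes a character-level model; the function name, the value of gamma, and the toy alphabet are illustrative:

```python
import math

def unigram_geometric_logprob(word, char_logprob, gamma=0.5):
    """log tau(w) for a unigram symbol model with a geometric length prior:

        tau(w) = (1 - gamma) * gamma**len(w) * prod_i p(w[i])
    """
    logp = math.log(1 - gamma) + len(word) * math.log(gamma)
    return logp + sum(char_logprob[c] for c in word)

# Uniform unigram distribution over a four-letter alphabet.
char_logprob = {c: math.log(0.25) for c in "abcd"}
lp = unigram_geometric_logprob("ab", char_logprob, gamma=0.5)
# tau("ab") = 0.5 * 0.5**2 * 0.25 * 0.25 = 0.0078125
```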
Extensions of SemPOS | For the purposes of the combination, we compute BLEU only on unigrams up to fourgrams (denoted BLEU1, ..., BLEU4) but including the brevity penalty as usual.
Extensions of SemPOS | This is also confirmed by the observation that using BLEU alone is rather unreliable for Czech and BLEU-1 (which judges unigrams only) is even worse.
Problems of BLEU | Fortunately, there are relatively few false positives in n-gram based metrics: 6.3% of unigrams and far fewer higher n-grams. |
Problems of BLEU | This amounts to 34% of running unigrams, giving enough space to differ in human judgments and still remain unscored.
Introduction | where we explicitly distinguish the unigram feature function $\phi^1_k$ and the bigram feature function $\phi^2_k$. Comparing the form of the two functions, we can see that our discussion on HMMs can be extended to perceptrons by substituting $\sum_k w_k \phi^1_k(x, y_n)$ and $\sum_k w_k \phi^2_k(x, y_{n-1}, y_n)$ for $\log p(x_n \mid y_n)$ and $\log p(y_n \mid y_{n-1})$.
Introduction | For unigram features, we compute the maximum, $\max_y \sum_k w_k \phi^1_k(x, y)$, as a preprocess in
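Precomputing this per-position maximum can be sketched as below; the scorer interface is a hypothetical stand-in for the dot product of weights with the unigram features fired at a position, and the toy scores are illustrative:

```python
def max_unigram_scores(sentence, tags, unigram_score):
    """Precompute max_y sum_k w_k * phi1_k(x, y) at each position.

    `unigram_score(sentence, pos, tag)` stands in for the weighted sum of
    unigram features; the returned maxima can serve as upper bounds when
    pruning tag candidates during decoding.
    """
    return [max(unigram_score(sentence, i, y) for y in tags)
            for i in range(len(sentence))]

# Toy weighted unigram scores, keyed by (word, tag).
scores = {("the", "DT"): 2.0, ("the", "NN"): 0.1,
          ("dog", "NN"): 1.5, ("dog", "DT"): -1.0}

def score(sent, i, y):
    return scores.get((sent[i], y), 0.0)

maxes = max_unigram_scores(["the", "dog"], ["DT", "NN"], score)  # [2.0, 1.5]
```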
Introduction | In POS tagging, we used unigrams of the current word and its neighboring words, word bigrams, prefixes and suffixes of the current word, capitalization, and tag bigrams.
Conditional Random Fields | Using only unigram features $\{f_{y,x}\}_{(y,x) \in \mathcal{Y} \times \mathcal{X}}$ results in a model equivalent to a simple bag-of-tokens position-by-position logistic regression model.
Conditional Random Fields | The same idea can be used when the set $\{\mu_{y,x_{t+1}}\}_{y \in \mathcal{Y}}$ of unigram features is sparse.
Conditional Random Fields | The features used in Nettalk experiments take the form $f_{y,w}$ (unigram) and $f_{y',y,w}$ (bigram), where $w$ is an n-gram of letters.
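Feature templates of this shape can be sketched as follows; the particular letter n-grams fired here (current letter and preceding letter bigram) are an illustrative choice, not necessarily the exact Nettalk templates:

```python
def crf_features(x, t, y_prev, y):
    """Unigram features f_{y,w} and bigram features f_{y',y,w}, where w is
    an n-gram of letters around position t in the input string x."""
    feats = [("uni", y, x[t])]                 # f_{y,w} with w = current letter
    if t > 0:
        feats.append(("uni", y, x[t - 1] + x[t]))  # f_{y,w} with a letter bigram
    feats.append(("bi", y_prev, y, x[t]))      # f_{y',y,w}
    return feats

feats = crf_features("cat", 1, "A", "B")
# [("uni", "B", "a"), ("uni", "B", "ca"), ("bi", "A", "B", "a")]
```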
Experimental results and discussions 6.1 Baseline experiments | Another is that BC utilizes a rich set of features to characterize a given spoken sentence, while LM is constructed solely on the basis of the lexical (unigram) information.
Experimental setup 5.1 Data | They are, respectively, the ROUGE-1 (unigram) measure, the ROUGE-2 (bigram) measure and the ROUGE-L (longest common subsequence) measure (Lin, 2004).
Proposed Methods | In the LM approach, each sentence in a document can simply be regarded as a probabilistic generative model consisting of a unigram distribution (the so-called “bag-of-words” assumption) for generating the document (Chen et al., 2009).
Proposed Methods | To mitigate this potential defect, a unigram probability estimated from a general collection, which models the general distribution of words in the target language, is often used to smooth the sentence model. |
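The smoothing described above can be sketched as a linear (Jelinek-Mercer-style) interpolation of the sentence's maximum-likelihood unigram model with a background collection model; the smoothing weight and the toy background distribution are illustrative:

```python
from collections import Counter

def smoothed_sentence_model(sentence, background, lam=0.8):
    """Sentence unigram model interpolated with a general-collection model:

        p(w | S) = lam * p_ML(w | S) + (1 - lam) * p(w | C)
    """
    counts = Counter(sentence)
    total = len(sentence)

    def p(w):
        return lam * counts[w] / total + (1 - lam) * background.get(w, 0.0)

    return p

background = {"the": 0.05, "economy": 0.001, "markets": 0.002}
p = smoothed_sentence_model(["the", "economy", "grew"], background, lam=0.8)
# p("markets") > 0 even though "markets" never occurs in the sentence.
```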
Experimental Setup | The mapping of sentence labels to phrase labels was unsupervised: if the phrase came from a sentence labeled (1), and there was a unigram overlap (excluding stop words) between the phrase and any of the original highlights, we marked this phrase with a positive label. |
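The label-mapping rule above can be sketched as follows; the sketch assumes the phrase has already been drawn from a sentence labeled (1), and the function name, toy highlights, and stopword list are illustrative:

```python
def label_phrase(phrase_tokens, highlights, stopwords=frozenset()):
    """Positive label iff the phrase shares at least one non-stop-word
    unigram with any of the original highlights."""
    content = set(phrase_tokens) - stopwords
    return any(content & (set(h) - stopwords) for h in highlights)

highlights = [["markets", "fell", "sharply"]]
stop = frozenset({"the"})
pos = label_phrase(["stocks", "fell"], highlights, stop)  # True: shares "fell"
neg = label_phrase(["the"], highlights, stop)             # False: stop word only
```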
Experimental Setup | Our feature set comprised surface features such as sentence and paragraph position information, POS tags, unigram and bigram overlap with the title, and whether high-scoring tf.idf words were present in the phrase (66 features in total). |
Experimental Setup | We report unigram overlap (ROUGE-1) as a means of assessing informativeness and the longest common subsequence (ROUGE-L) as a means of assessing fluency.