Abstract | The selection is made according to the appropriateness of the alteration to the query context (using a bigram language model), or according to its expected impact on the retrieval effectiveness (using a regression model).
Bigram Expansion Model for Alteration Selection | The query context is modeled by a bigram language model as in (Peng et al. |
Bigram Expansion Model for Alteration Selection | In this work, we used a bigram language model to calculate the probability of each path.
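The path-probability computation described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation; the add-alpha smoothing, the sentence markers, and the function names are assumptions.

```python
import math
from collections import Counter

def train_bigram_lm(corpus, alpha=0.1):
    """Train an add-alpha smoothed bigram LM from tokenized queries."""
    unigrams, bigrams = Counter(), Counter()
    for query in corpus:
        tokens = ["<s>"] + query + ["</s>"]
        unigrams.update(tokens[:-1])
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    vocab = len(unigrams) + 1  # +1 reserves mass for unseen words
    def prob(prev, word):
        return (bigrams[(prev, word)] + alpha) / (unigrams[prev] + alpha * vocab)
    return prob

def path_logprob(prob, query):
    """Score one alteration path (a fully altered query) under the bigram LM."""
    tokens = ["<s>"] + query + ["</s>"]
    return sum(math.log(prob(p, w)) for p, w in zip(tokens[:-1], tokens[1:]))
```

The alteration path with the highest log probability would then be selected as the expansion most coherent with the query context.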
Introduction | The query context is modeled by a bigram language model. |
Introduction | which is the most coherent with the bigram model. |
Introduction | We call this model Bigram Expansion. |
Related Work | 2007), a bigram language model is used to determine the alteration of the head word that best fits the query. |
Related Work | In this paper, one of the proposed methods will also use a bigram language model of the query to determine the appropriate alteration candidates. |
Metric Design Considerations | Similarly, we also match the bigrams and trigrams of the sentence pair and calculate their corresponding Fmean scores. |
Metric Design Considerations | where in our experiments, we set N = 3, representing calculation of unigram, bigram, and trigram scores.
Metric Design Considerations | To determine the number match_bi of bigram matches, a system bigram (l_i^s p_i^s, l_{i+1}^s p_{i+1}^s) matches a reference bigram (l_m^r p_m^r, l_{m+1}^r p_{m+1}^r).
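The clipped n-gram matching and Fmean computation implied above can be sketched as follows. Tokens are taken to be (lemma, POS) pairs so that an n-gram matches only when every pair matches; the balanced harmonic mean used here is an assumption, since the excerpt does not give the exact Fmean weighting.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def fmean(system, reference, n):
    """Clipped n-gram match F-score between system and reference sentences.
    Tokens are (lemma, POS) pairs; an n-gram matches only when all its
    lemma/POS pairs match, mirroring the bigram definition above."""
    sys_counts = Counter(ngrams(system, n))
    ref_counts = Counter(ngrams(reference, n))
    # Clip each n-gram's matches by its count in the reference.
    matches = sum(min(c, ref_counts[g]) for g, c in sys_counts.items())
    if matches == 0:
        return 0.0
    precision = matches / (len(system) - n + 1)
    recall = matches / (len(reference) - n + 1)
    return 2 * precision * recall / (precision + recall)
```

With N = 3, this score would be computed for n = 1, 2, 3 and the results combined.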
Evaluation | The results are shown in figure 1 for the whole-daughters scoring method and in figure 2 for the bigram method.
Evaluation | Figure 2: Bigram ungeneralizability (dev.)
Rule dissimilarity and generalizability | To do this, we can examine the weakest parts of each rule and compare those across the corpus, to see which anomalous patterns emerge; we do this in the Bigram scoring section below. |
Rule dissimilarity and generalizability | Bigram scoring The other method of detecting ad hoc rules calculates reliability scores by focusing specifically on what the classes do not have in common. |
Rule dissimilarity and generalizability | We abstract to bigrams, including added START and END tags, as longer sequences risk missing generalizations; e.g., unary rules would have no comparable rules.
Distributed Clustering | With each word w in one of these sets, all words v preceding w in the corpus are stored with the respective bigram count N(v, w).
Distributed Clustering | While the greedy non-distributed exchange algorithm is guaranteed to converge as each exchange increases the log likelihood of the assumed bigram model, this is not necessarily true for the distributed exchange algorithm. |
Exchange Clustering | Beginning with an initial clustering, the algorithm greedily maximizes the log likelihood of a two-sided class bigram or trigram model as described in Eq. |
Exchange Clustering | With N^pre and N^suc denoting the average number of clusters preceding and succeeding another cluster, B denoting the number of distinct bigrams in the training corpus, and I denoting the number of iterations, the worst-case complexity of the algorithm is in:
Exchange Clustering | When using large corpora with large numbers of bigrams, the number of required updates can increase towards the quadratic upper bound as N^pre and N^suc approach N_C.
Predictive Exchange Clustering | Modifying the exchange algorithm in order to optimize the log likelihood of a predictive class bigram model leads to substantial performance improvements, similar to those previously reported for another type of one-sided class model in (Whittaker and Woodland, 2001).
Predictive Exchange Clustering | We use a predictive class bigram model as given in Eq. |
Predictive Exchange Clustering | Then the following optimization criterion can be derived, with F(C) being the log likelihood function of the predictive class bigram model given a clustering C: |
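The optimization criterion F(C) can be illustrated with the following sketch, which computes the log likelihood of a predictive class bigram model p(w_i | w_{i-1}) = p(c(w_i) | w_{i-1}) · p(w_i | c(w_i)) under a given clustering. Unsmoothed maximum-likelihood estimates are an assumption here.

```python
import math
from collections import Counter

def predictive_loglik(corpus, cluster):
    """Log likelihood F(C) of a predictive class bigram model
    p(w_i | w_{i-1}) = p(c(w_i) | w_{i-1}) * p(w_i | c(w_i))
    for a clustering `cluster` (dict: word -> class id).
    A sketch of the objective with MLE estimates and no smoothing."""
    word_class = Counter()   # N(v, c): word v followed by a word of class c
    word_count = Counter()   # N(w) over predicted positions
    class_count = Counter()  # N(c) over predicted positions
    for prev, word in zip(corpus[:-1], corpus[1:]):
        word_class[(prev, cluster[word])] += 1
        word_count[word] += 1
        class_count[cluster[word]] += 1
    prev_count = Counter(corpus[:-1])
    ll = 0.0
    for (v, c), n in word_class.items():
        ll += n * math.log(n / prev_count[v])       # sum N(v,c) log p(c|v)
    for w, n in word_count.items():
        ll += n * math.log(n / class_count[cluster[w]])  # sum N(w) log p(w|c(w))
    return ll
```

The exchange algorithm would move a word between clusters whenever doing so increases this quantity; since the model is one-sided, an exchange touches fewer counts than in the two-sided case, which is presumably the source of the reported speedup.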
Abstract | We take a multi-pass approach to machine translation decoding when using synchronous context-free grammars as the translation model and n-gram language models: the first pass uses a bigram language model, and the resulting parse forest is used in the second pass to guide search with a trigram language model. |
Abstract | The trigram pass closes most of the performance gap between a bigram decoder and a much slower trigram decoder, but takes time that is insignificant in comparison to the bigram pass. |
Decoding to Maximize BLEU | The outside-pass Algorithm 1 for bigram decoding can be generalized to the trigram case. |
Experiments | Hyperedges and BLEU per pass: Bigram Pass: 167K, 21.77. Trigram Pass (UNI): —, —. Trigram Pass (BO): 167K + 629.7K = 796.7K, 23.56. Trigram Pass (BO+BB): 167K + 2.7K = 169.
Introduction | First, we present a two-pass decoding algorithm, in which the first pass explores states resulting from an integrated bigram language model, and the second pass expands these states into trigram-based |
Introduction | With this heuristic, we achieve the same BLEU scores and model cost as a trigram decoder with essentially the same speed as a bigram decoder. |
Multi-pass LM-Integrated Decoding | More specifically, a bigram decoding pass is executed forward and backward to figure out the probability of each state. |
Multi-pass LM-Integrated Decoding | We take the same view as in speech recognition that a trigram-integrated model is a finer-grained model than a bigram model, and in general we can do an (n−1)-gram decoding as a predictive pass for the following n-gram pass.
Multi-pass LM-Integrated Decoding | If we can afford a bigram decoding pass, the outside cost from a bigram model is conceivably a |
Experiments | Consistent with findings in the literature (Cui et al., 2006; Dave et al., 2003; Gamon and Aue, 2005), on the large corpus of movie review texts, the in-domain-trained system based solely on unigrams had lower accuracy than the similar system trained on bigrams . |
Experiments | But the trigrams fared slightly worse than bigrams . |
Experiments | On sentences, however, we have observed an inverse pattern: unigrams performed better than bigrams and trigrams. |
Factors Affecting System Performance | System runs with unigrams, bigrams, and trigrams as features and with different training set sizes are presented.
Integrating the Corpus-based and Dictionary-based Approaches | In the ensemble of classifiers, they used a combination of nine SVM-based classifiers deployed to learn unigrams, bigrams, and trigrams on three different domains, while the fourth domain was used as an evaluation set.
Evaluation | The remaining 93 unlabeled games are used to train unigram, bigram, and trigram grounded language models.
Evaluation | Only unigrams, bigrams, and trigrams that are not proper names, appear more than three times, and are not composed only of stop words were used.
Evaluation | with traditional unigram, bigram, and trigram language models generated from a combination of the closed-captioning transcripts of all training games and data from the Switchboard corpus (see below).
Linguistic Mapping | Estimating bigram and trigram models can be done by counting word pairs or triples and normalizing the resulting conditional distributions.
Introduction | 0 The extension of the feature representation used by previous works with bigrams and trigrams and an evaluation of the benefit of using longer keywords in hedge classification. |
Methods | For trigrams, bigrams, and unigrams — processed separately — we calculated a new class-conditional probability for each feature c, discarding those observations of c in speculative instances where c was not among the two highest ranked candidates.
Results | These keywords were often used in a mathematical context (referring to probabilities) and thus expressed no speculative meaning, while such uses were not represented in the FlyBase articles (otherwise bigram or trigram features could have captured these non-speculative uses).
Results | One third of the features used by our advanced model were either bigrams or trigrams. |
Results | Our model using just unigram features achieved a BEP(spec) score of 78.68% and an F_{β=1}(spec) score of 80.23%, which means that using bigram and trigram hedge cues here significantly improved the performance (the differences in BEP(spec) and F_{β=1}(spec) scores were 5.23% and 4.97%, respectively).
Learning what to transliterate | From the stat section we collect statistics as to how often every word, bigram, or trigram occurs, and what distribution of name/non-name patterns these n-grams have.
Learning what to transliterate | The name distribution bigram |
Learning what to transliterate | stat corpus bitext, the first word is marked up as a non-name ("0") and the second as a name ("1"), which strongly suggests that in such a bigram context, aljzyre is better translated as island or peninsula, and not transliterated as Al-Jazeera.
Transliterator | The same consonant skeleton indexing process is applied to name bigrams (47,700,548 unique with 167,398,054 skeletons) and trigrams (46,543,712 unique with 165,536,451 skeletons). |
Approach | Bigrams (B) - the text is represented as a bag of bigrams (larger n-grams did not help). |
Approach | Generalized bigrams (Bg) - same as above, but the words are generalized to their WNSS. |
Experiments | The first chosen feature is the translation probability computed between the Bg question and answer representations (bigrams with words generalized to their WNSS tags).
Experiments | This is caused by the fact that the BM25 formula is less forgiving with errors of the NLP processors (due to the high idf scores assigned to bigrams and dependencies), and the WNSS tagger is the least robust component in our pipeline. |
Syllabification with Structured SVMs | For example, the bigram bl frequently occurs within a single English syllable, while the bigram lb generally straddles two syllables. |
Syllabification with Structured SVMs | Thus, in addition to the single-letter features outlined above, we also include in our representation any bigrams, trigrams, four-grams, and five-grams that fit inside our context window.
Syllabification with Structured SVMs | As is apparent from Figure 2, we see a substantial improvement by adding bigrams to our feature set. |
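The n-gram feature extraction within a context window can be sketched as follows. The padding character, the window size, and the offset-based feature naming are assumptions for illustration, not the authors' exact representation.

```python
def window_ngram_features(word, pos, size=2, max_n=5):
    """Extract all n-grams (1..max_n) that fit inside a context window
    around character position `pos`, as (offset, n-gram) features.
    Out-of-word positions are padded with '#' (an assumption)."""
    padded = "#" * size + word + "#" * size
    center = pos + size
    window = padded[center - size : center + size + 1]
    feats = []
    for n in range(1, max_n + 1):
        for i in range(len(window) - n + 1):
            # Offset is relative to the window start, shifted so 0 = center.
            feats.append((i - size, window[i:i + n]))
    return feats
```

For example, around the 'l' of "blue" with a one-character window, the features include (-1, "bl"), letting the learner capture that bl tends to occur inside a single syllable while lb tends to straddle a boundary.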
Experiments | Feature instance counts. Local: Rule 10,851; Word 20,328; WordEdges 454,101; CoLenPar 22; Bigram<> 10,292; Trigram<> 24,677; HeadMod<> 12,047; DistMod<> 16,017. Non-local: ParentRule 18,019; WProj 27,417; Heads 70,013; HeadTree 67,836; Heavy 1,401; NGramTree 67,559; RightBranch 2. Total feature instances: 800,582.
Experiments | We also restricted NGramTree to be on bigrams only. |
Forest Reranking | 2 (d) returns the minimum tree fragment spanning a bigram, in this case “saw” and “the”, and should thus be computed at the smallest common ancestor of the two, which is the VP node in this example.
Conclusion and future work | We then investigated adaptor grammars that incorporate one additional kind of information, and found that modeling collocations provides the greatest improvement in word segmentation accuracy, resulting in a model that seems to capture many of the same interword dependencies as the bigram model of Goldwater et al. |
Word segmentation with adaptor grammars | It is not possible to write an adaptor grammar that directly implements Goldwater’s bigram word segmentation model because an adaptor grammar has one DP per adapted nonterminal (so the number of DPs is fixed in advance) while Goldwater’s bigram model has one DP per word type, and the number of word types is not known in advance. |
Word segmentation with adaptor grammars | This suggests that the collocation word adaptor grammar can capture inter-word dependencies similar to those that improve the performance of Goldwater’s bigram segmentation model. |
Background 2.1 Dependency parsing | The algorithm then repeatedly merges the pair of clusters which causes the smallest decrease in the likelihood of the text corpus, according to a class-based bigram language model defined on the word clusters. |
Conclusions | To begin, recall that the Brown clustering algorithm is based on a bigram language model. |
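The greedy merging step under a two-sided class bigram model p(w_i | w_{i-1}) = p(c_i | c_{i-1}) · p(w_i | c_i) can be illustrated by brute force. Real Brown clustering updates counts incrementally rather than recomputing the full likelihood for every candidate merge, so this sketch is illustrative only.

```python
import math
from collections import Counter

def class_bigram_loglik(corpus, cluster):
    """Log likelihood of a two-sided class bigram model
    p(w_i | w_{i-1}) = p(c_i | c_{i-1}) * p(w_i | c_i),
    with unsmoothed MLE estimates (an assumption)."""
    cc = Counter((cluster[a], cluster[b]) for a, b in zip(corpus, corpus[1:]))
    c_prev = Counter(cluster[w] for w in corpus[:-1])
    wc = Counter(corpus[1:])
    c_next = Counter(cluster[w] for w in corpus[1:])
    ll = sum(n * math.log(n / c_prev[c1]) for (c1, c2), n in cc.items())
    ll += sum(n * math.log(n / c_next[cluster[w]]) for w, n in wc.items())
    return ll

def best_merge(corpus, cluster):
    """One greedy step: merge the pair of classes whose merge causes the
    smallest decrease in corpus likelihood (brute-force over all pairs)."""
    classes = sorted(set(cluster.values()))
    best = None
    for i, a in enumerate(classes):
        for b in classes[i + 1:]:
            merged = {w: (a if c == b else c) for w, c in cluster.items()}
            ll = class_bigram_loglik(corpus, merged)
            if best is None or ll > best[0]:
                best = (ll, merged)
    return best[1]
```

Distributionally interchangeable words (words with similar left and right class contexts) cost the least likelihood to merge, so they end up clustered together first.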
Feature design | (2005a), and consists of indicator functions for combinations of words and parts of speech for the head and modifier of each dependency, as well as certain contextual tokens.1 Our second-order baseline features are the same as those of Carreras (2007) and include indicators for triples of part-of-speech tags for sibling interactions and grandparent interactions, as well as additional bigram features based on pairs of words involved in these higher-order interactions.