System Architecture | To derive word features, our system first automatically collects a list of word unigrams and bigrams from the training data. |
System Architecture | To avoid overfitting, we keep only the word unigrams and bigrams that occur more than twice in the training set. |
System Architecture | These lists of word unigrams and bigrams are then used as a unigram dictionary and a bigram dictionary to generate word-based unigram and bigram features. |
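System Architecture | A minimal sketch of this collection step (the function and its interface are illustrative, not the original code):

```python
from collections import Counter

def collect_ngram_dicts(sentences, min_count=3):
    """Collect word unigrams and bigrams occurring more than twice
    (frequency > 2, i.e. count >= 3) in the training sentences."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    unigram_dict = {u for u, c in unigrams.items() if c >= min_count}
    bigram_dict = {b for b, c in bigrams.items() if c >= min_count}
    return unigram_dict, bigram_dict
```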
Abstract | We consider bigram and trigram templates for generating potentially deterministic constraints. |
Abstract | A bigram constraint includes one contextual word (w_{-1} or w_{+1}) or the corresponding morph feature, and a trigram constraint includes both contextual words or their morph features. |
Abstract | Results (precision / recall / F1): bigram 0.993 / 0.841 / 0.911; trigram 0.996 / 0.608 / 0.755. |
MWE-dedicated Features | We use word unigrams and bigrams in order to capture multiword expressions present in the training section and to extract lexical cues for discovering new MWEs. |
MWE-dedicated Features | For instance, the bigram coup de is often the prefix of compounds such as coup de pied (kick), coup de foudre (love at first sight), coup de main (help). |
MWE-dedicated Features | We use part-of-speech unigrams and bigrams in order to capture MWEs with irregular syntactic structures that might indicate the idiomaticity of a word sequence. |
A Class-based Model of Agreement | The features are indicators for (character, position, label) triples over a five-character window, plus bigram label transition indicators. |
A Class-based Model of Agreement | Bigram transition features g_bt encode local agreement relations. |
A Class-based Model of Agreement | We trained a simple add-1 smoothed bigram language model over gold class sequences in the same treebank training data. |
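A Class-based Model of Agreement | A minimal sketch of such an add-1 (Laplace) smoothed bigram model over label sequences; the class inventory and training interface are assumptions for illustration:

```python
import math
from collections import Counter

def train_add1_bigram_lm(sequences, classes):
    """Add-1 smoothed bigram LM over gold class sequences."""
    uni, bi = Counter(), Counter()
    for seq in sequences:
        padded = ["<s>"] + list(seq)
        uni.update(padded[:-1])          # history counts
        bi.update(zip(padded, padded[1:]))
    vocab_size = len(classes)            # possible next classes
    def log_prob(prev, cur):
        return math.log((bi[(prev, cur)] + 1) / (uni[prev] + vocab_size))
    return log_prob
```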
Capturing Paradigmatic Relations via Word Clustering | The quality is defined based on a class-based bigram language model as follows. |
Capturing Paradigmatic Relations via Word Clustering | The objective function is to maximize the likelihood ∏_i P(w_i | w_1, ..., w_{i-1}) of the training data given a partially class-based bigram model of the form given below. |
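Capturing Paradigmatic Relations via Word Clustering | Assuming the standard Brown et al. (1992) formulation that this passage appears to describe, the model is P(w_i | w_{i-1}) = P(C(w_i) | C(w_{i-1})) · P(w_i | C(w_i)), where C(w) denotes the class assigned to word w. |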
State-of-the-Art | Word bigrams: w_{-2}w_{-1}, w_{-1}w, ww_{+1}, w_{+1}w_{+2}. In order to better handle unknown words, we extract morphological features: character n-gram prefixes and suffixes for n up to 3. |
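State-of-the-Art | A minimal sketch of these templates (the padding symbol and feature keys are illustrative):

```python
def token_features(words, i):
    """Word-bigram templates around position i plus character
    n-gram prefix/suffix features (n up to 3) for unknown words."""
    w = lambda j: words[j] if 0 <= j < len(words) else "<pad>"
    feats = {
        "w-2_w-1": w(i - 2) + "_" + w(i - 1),
        "w-1_w": w(i - 1) + "_" + w(i),
        "w_w+1": w(i) + "_" + w(i + 1),
        "w+1_w+2": w(i + 1) + "_" + w(i + 2),
    }
    for n in range(1, 4):
        feats["prefix_%d" % n] = words[i][:n]
        feats["suffix_%d" % n] = words[i][-n:]
    return feats
```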
State-of-the-Art | Huang et al. (2009) introduced a bigram HMM model with latent variables (Bigram HMM-LA in the table) for Chinese tagging. |
State-of-the-Art | Trigram HMM (Huang et al., 2009): 93.99%; Bigram HMM-LA (Huang et al., 2009): 94.53%; our tagger: 94.69%. |
Conclusion | We have presented a noisy-channel model that simultaneously learns a lexicon, a bigram language model, and a model of phonetic variation, while using only the noisy surface forms as training data. |
Introduction | Previous models with similar goals have learned from an artificial corpus with a small vocabulary (Driesen et al., 2009; Rasanen, 2011) or have modeled variability only in vowels (Feldman et al., 2009); to our knowledge, this paper is the first to use a naturalistic infant-directed corpus while modeling variability in all segments, and to incorporate word-level context (a bigram language model). |
Introduction | Our model is conceptually similar to those used in speech recognition and other applications: we assume the intended tokens are generated from a bigram language model and then distorted by a noisy channel, in particular a log-linear model of phonetic variability. |
Related work | In contrast, our model uses a symbolic representation for sounds, but models variability in all segment types and incorporates a bigram word-level language model. |
Discussion | In the figure, phone bigram TF-IDF is labeled p2; phonetic alignment with dynamic programming is labeled DP. |
Experiments | The TF-IDF features used in the experiments are based on phone bigrams. |
Feature functions | In practice, we only consider n-grams of a certain order (e.g., bigrams). |
Feature functions | Then for the bigram /l iy/, we have TF_{/l iy/} = 1/5 (one out of five bigrams in the entry), and IDF_{/l iy/} = log(2/1) (one word out of two in the dictionary). |
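Feature functions | A minimal sketch of TF-IDF over phone bigrams consistent with the worked example above (the dictionary interface is an assumption):

```python
import math
from collections import Counter

def phone_bigram_tfidf(pron, dictionary):
    """TF-IDF for each phone bigram of `pron` (a phone list), with
    document frequencies over `dictionary` (a list of phone lists,
    one per word); assumes `pron` is itself a dictionary entry."""
    bigrams = list(zip(pron, pron[1:]))
    tf = Counter(bigrams)
    total = len(bigrams)
    df = Counter()
    for entry in dictionary:
        df.update(set(zip(entry, entry[1:])))
    n_words = len(dictionary)
    # e.g. a bigram seen once among five gets TF = 1/5; one found in
    # one of two dictionary words gets IDF = log(2/1)
    return {bg: (count / total) * math.log(n_words / df[bg])
            for bg, count in tf.items()}
```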
Experiments | In particular, we use unigrams of the current word and its neighboring words, word bigrams, prefixes and suffixes of the current word, capitalization, all-number, and punctuation features, as well as tag bigrams, for the POS, CoNLL-2000, and CoNLL-2003 datasets. |
Experiments | For the supertagging dataset, we use the same features for the word inputs, plus unigrams and bigrams of the gold POS inputs. |
Problem formulation | Bigram features are of the form f_k(y_t, y_{t-1}, x_t), involving both the previous and the current labels. |
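Problem formulation | As a concrete, hypothetical instance of such a feature function (the label names and condition are illustrative only):

```python
def f_k(y_t, y_prev, x_t):
    """Example bigram indicator: fires when a determiner label is
    followed by a noun label while the current word is lowercase."""
    return 1.0 if y_prev == "DT" and y_t == "NN" and x_t.islower() else 0.0
```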
Experimental Design | Consecutive Word/Bigram/Trigram: this feature family targets adjacent repetitions of the same word, bigram, or trigram, e.g., 'show me the show me the'. |
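Experimental Design | A minimal sketch of this repetition check (the function name is illustrative):

```python
def has_consecutive_repeat(words, n):
    """True if the same n-gram occurs twice in immediate succession,
    e.g. n=3 flags 'show me the show me the ...'."""
    return any(words[i:i + n] == words[i + n:i + 2 * n]
               for i in range(len(words) - 2 * n + 1))
```

Experimental Design | For example, has_consecutive_repeat("show me the show me the flights".split(), 3) returns True. |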
Problem Formulation | The weight of this rule is the bigram probability of the two records conditioned on their type, multiplied by a normalization factor λ. |
Problem Formulation | Rule (6) defines the expansion of field F to a sequence of (binarized) words W, with a weight equal to the bigram probability of the current word given the previous word, the current record, and field. |
Structure-based Stacking | • Character unigrams: c_k (i - l ≤ k ≤ i + l) |
Structure-based Stacking | • Character bigrams: c_k c_{k+1} (i - l ≤ k < i + l) |
Structure-based Stacking | • Character-label bigrams: c^{ppd}_k c^{ppd}_{k+1} (i - l_{ppd} ≤ k < i + l_{ppd}) |
Structure-based Stacking | • Bigram features: C(s_k)C(s_{k+1}) (i - l_C ≤ k < i + l_C), T_ctb(s_k)T_ctb(s_{k+1}) (i - l_ctb ≤ k < i + l_ctb), T_ppd(s_k)T_ppd(s_{k+1}) (i - l_ppd ≤ k < i + l_ppd) |
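Structure-based Stacking | A minimal sketch of the character unigram/bigram templates above (the window radius l and feature keys are illustrative):

```python
def char_window_features(chars, i, l=2):
    """Character unigrams c_k for i-l <= k <= i+l and
    character bigrams c_k c_{k+1} for i-l <= k < i+l."""
    feats = {}
    for k in range(i - l, i + l + 1):
        if 0 <= k < len(chars):
            feats["c[%d]" % (k - i)] = chars[k]
        if k < i + l and 0 <= k < len(chars) - 1:
            feats["c[%d]c[%d]" % (k - i, k - i + 1)] = chars[k] + chars[k + 1]
    return feats
```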