Discussion and Future Work | In the current formulation of TESLA-CELAB, two n-grams X and Y are either synonyms that match each other completely, or are completely unrelated.
Experiments | Compared to BLEU, TESLA allows more sophisticated weighting of n-grams and measures of word similarity, including synonym relations.
Experiments | The covered n-gram matching rule is then able to award credit to tricky n-grams.
Motivation | For example, between a pair of synonymous phrases, higher-order n-grams can still have no match and will be penalized accordingly, even though the phrases themselves are synonyms.
Motivation | N-grams which cross natural word boundaries and are meaningless by themselves can be particularly tricky.
The Algorithm | Two n-grams are connected if they are identical, or if they are identified as synonyms by Cilin. |
The Algorithm | Notice that all n-grams are put in the same matching problem regardless of n, unlike in translation evaluation metrics designed for European languages. |
The Algorithm | This enables us to designate n-grams with different values of n as synonyms, such as a bigram (n = 2) and a unigram (n = 1).
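A minimal sketch of this single matching pool across n-gram orders: all orders go into one candidate set, and two n-grams are connected if identical or listed as synonyms. The `SYNONYMS` table and the `connected` predicate here are illustrative stand-ins, not the actual Cilin lookup.

```python
def char_ngrams(chars, max_n=4):
    """All n-grams of order 1..max_n over a character sequence,
    pooled together regardless of n."""
    return [tuple(chars[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(chars) - n + 1)]

# Hypothetical synonym table; entries may have different orders,
# e.g. a bigram listed as a synonym of a unigram.
SYNONYMS = {("washing", "machine"): {("washer",)}}

def connected(x, y):
    """Two n-grams are connected if identical or identified as synonyms."""
    return x == y or y in SYNONYMS.get(x, set()) or x in SYNONYMS.get(y, set())
```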
Data and Approach Overview | We represent each utterance u as a vector w_u of N_u word n-grams (segments) w_{uj}, each of which is chosen from a fixed-size vocabulary W of size V. We use entity lists obtained from web sources (explained next) to identify segments in the corpus.
Data and Approach Overview | Web n-Grams (G). |
Experiments | Our vocabulary consists of n-grams and segments (phrases) in utterances that are extracted using web n-grams and entity lists of §3.
MultiLayer Context Model - MCM | * Web n-Gram Context Base Measure: As explained in §3, we use the web n-grams as additional information for calculating the base measures of the Dirichlet topic distributions.
MultiLayer Context Model - MCM | In (1) we assume that entities (E) are more indicative of the domain than other n-grams (G) and should be more dominant in the sampling decisions for domain topics.
MultiLayer Context Model - MCM | During Gibbs sampling, we keep track of the frequency of draws of domain, dialog act and slot indicating n-grams w_j, in the M^D, M^A and M^S matrices, respectively.
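The bookkeeping described here can be sketched as three count matrices, one per layer, incremented as each n-gram is drawn for a topic during sampling. The class and method names are assumptions for illustration, not MCM's actual implementation.

```python
from collections import defaultdict

class TopicCountMatrices:
    """Counts of n-gram draws per layer: M^D (domain), M^A (dialog act),
    M^S (slot). Illustrative sketch of the Gibbs-sampling bookkeeping."""

    def __init__(self):
        self.M = {"D": defaultdict(int),   # domain draws
                  "A": defaultdict(int),   # dialog-act draws
                  "S": defaultdict(int)}   # slot draws

    def draw(self, layer, ngram, topic):
        """Record that `ngram` was drawn for `topic` in the given layer."""
        self.M[layer][(ngram, topic)] += 1

    def count(self, layer, ngram, topic):
        return self.M[layer][(ngram, topic)]
```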
Semantics via Web Features | As the source of Web information, we use the Google n-grams corpus (Brants and Franz, 2006) which contains English n-grams (n = 1 to 5) and their Web frequency counts, derived from nearly 1 trillion word tokens and 95 billion sentences. |
Semantics via Web Features | Using the n-grams corpus (for n = 1 to 5), we collect co-occurrence web counts by allowing a varying number of wildcards between h_1 and h_2 in the query.
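A sketch of the wildcard-query idea: for two head words h_1 and h_2, generate patterns with 0 up to max_n − 2 wildcard slots between them and sum the matching corpus counts. The in-memory `ngram_counts` mapping is an assumed stand-in for the real Google n-grams lookup.

```python
def wildcard_patterns(h1, h2, max_n=5):
    """Query patterns 'h1 * ... * h2' with 0..max_n-2 wildcards."""
    return [tuple([h1] + ["*"] * k + [h2]) for k in range(max_n - 1)]

def cooccurrence_count(h1, h2, ngram_counts, max_n=5):
    """Sum web counts of all n-grams matching any wildcard pattern.
    `ngram_counts` maps n-gram tuples to frequencies; '*' matches one token."""
    total = 0
    for pattern in wildcard_patterns(h1, h2, max_n):
        for gram, c in ngram_counts.items():
            if len(gram) == len(pattern) and all(
                    p == "*" or p == g for p, g in zip(pattern, gram)):
                total += c
    return total
```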
Semantics via Web Features | These clusters are derived from the V2 Google n-grams corpus.
BLEU and PORT | translation hypothesis to compute the counts of the reference n-grams.
Experiments | Both BLEU and PORT perform matching of n-grams up to n = 4. |
Experiments | In all tuning experiments, both BLEU and PORT performed lower-case matching of n-grams up to n = 4.
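The n-gram matching both metrics share can be sketched as clipped (modified-precision) counting over lower-cased tokens, as in BLEU:

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Multiset of n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def clipped_matches(hyp, ref, max_n=4):
    """Clipped n-gram match counts per order, BLEU-style: each hypothesis
    n-gram is credited at most as often as it appears in the reference."""
    hyp = [t.lower() for t in hyp]   # lower-cased matching
    ref = [t.lower() for t in ref]
    matches = {}
    for n in range(1, max_n + 1):
        h, r = ngram_counts(hyp, n), ngram_counts(ref, n)
        matches[n] = sum(min(c, r[g]) for g, c in h.items())
    return matches
```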
Experiments | The BLEU-tuned and Qmean-tuned systems generate similar numbers of matching n-grams, but Qmean-tuned systems produce fewer n-grams (thus, shorter translations). |
Experiments | This is mainly due to the additional minimum support constraint we added which discards many noisy lexical entries from infrequently seen n-grams.
Online Lexicon Learning Algorithm | Given an instruction and the corresponding navigation plan, we first segment the instruction into word tokens and construct n-grams from them.
Online Lexicon Learning Algorithm | From the corresponding navigation plan, we find all connected subgraphs of size less than or equal to m. We then update the co-occurrence counts between all the n-grams w and all the connected subgraphs g.
Online Lexicon Learning Algorithm | We also update the counts of how many examples we have encountered so far and the counts of the n-grams w and subgraphs g.
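The online counting steps above can be sketched as follows, assuming the connected subgraphs are passed in as hashable identifiers; the class and field names are illustrative, not the paper's implementation.

```python
from collections import Counter

def word_ngrams(tokens, max_n):
    """All word n-grams of order 1..max_n from a token list."""
    return [tuple(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]

class CooccurrenceLexicon:
    """Online co-occurrence counts between instruction n-grams and
    connected subgraphs of the navigation plan (illustrative sketch)."""

    def __init__(self, max_n=4):
        self.max_n = max_n
        self.examples = 0            # examples seen so far
        self.ngram_counts = Counter()
        self.graph_counts = Counter()
        self.pair_counts = Counter()

    def update(self, tokens, subgraphs):
        """Update all counts from one (instruction, plan) example."""
        self.examples += 1
        grams = set(word_ngrams(tokens, self.max_n))
        graphs = set(subgraphs)
        self.ngram_counts.update(grams)
        self.graph_counts.update(graphs)
        self.pair_counts.update((w, g) for w in grams for g in graphs)
```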
Evaluation | Table 3: MWE identification with CRF: base denotes the features corresponding to token properties and word n-grams.
MWE-dedicated Features | Word n-grams.
MWE-dedicated Features | POS n-grams.
Experimental Design | Field bigrams/trigrams Analogously to the lexical features mentioned above, we introduce a series of nonlocal features that capture field n-grams, given a specific record.
Related Work | Local and nonlocal information (e.g., word n-grams, long-
Results | The 1-BEST system has some grammaticality issues, which we avoid by defining features over lexical n-grams and repeated words.
Conclusion and Future Work | In this paper, we only tried the Dice coefficient of n-grams and symmetric sentence-level BLEU as similarity measures.
Features and Training | Tl(e,e') is the propagating probability in equation (8), with the similarity measure Sim(e,e') defined as the Dice coefficient over the set of all n-grams in e and those in e'. |
Features and Training | where NGr_n(x) is the set of n-grams in string x, and Dice(A, B) is the Dice coefficient over sets A and B: Dice(A, B) = 2|A ∩ B| / (|A| + |B|).
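This similarity measure can be sketched directly from the definition: collect the n-gram sets of both strings and take the Dice coefficient. Function names here are illustrative.

```python
def ngram_set(tokens, max_n=4):
    """Set of all n-grams of order 1..max_n in a token list."""
    return {tuple(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)}

def dice(a, b):
    """Dice coefficient over sets: 2|A ∩ B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))

def sim(e1, e2, max_n=4):
    """Sim(e, e'): Dice coefficient over the n-gram sets of e and e'."""
    return dice(ngram_set(e1, max_n), ngram_set(e2, max_n))
```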
Introduction | While traditional LMs use word n-grams, where the n − 1 previous words predict the next word, newer models integrate long-span information in making decisions.
Introduction | For example, incorporating long-distance dependencies and syntactic structure can help the LM better predict words by complementing the predictive power of n-grams (Chelba and Jelinek, 2000; Collins et al., 2005; Filimonov and Harper, 2009; Kuo et al., 2009). |
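The traditional word n-gram baseline being complemented here can be sketched as a maximum-likelihood model where the n − 1 previous words predict the next word; this is an illustrative sketch (no smoothing), not any cited paper's implementation.

```python
from collections import Counter, defaultdict

def train_ngram_lm(sentences, n=3):
    """MLE n-gram model: map each (n-1)-word context to next-word counts."""
    ctx_next = defaultdict(Counter)
    for sent in sentences:
        padded = ["<s>"] * (n - 1) + sent + ["</s>"]
        for i in range(n - 1, len(padded)):
            ctx = tuple(padded[i - n + 1:i])
            ctx_next[ctx][padded[i]] += 1
    return ctx_next

def predict(model, context):
    """Most likely next word given the n-1 previous words, or None."""
    counts = model.get(tuple(context))
    return counts.most_common(1)[0][0] if counts else None
```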
Syntactic Language Models | Structured language modeling incorporates syntactic parse trees to identify the head words in a hypothesis for modeling dependencies beyond n-grams.
Experiments | Feature templates such as rule n-grams and rule shapes only work if iterative mixing (algorithm 3) or feature selection (algorithm 4) is used.
Introduction | Such features include rule ids, rule-local n-grams , or types of rule shapes. |
Local Features for Synchronous CFGs | Rule n-grams: These features identify n-grams of consecutive items in a rule. |
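Extracting these features can be sketched as sliding an n-gram window over the rule's item sequence (terminals and nonterminal placeholders alike); the feature-string format is an assumption for illustration.

```python
def rule_ngram_features(rule_items, n=2):
    """Features identifying n-grams of consecutive items in a rule,
    e.g. the right-hand side ['X1', 'of', 'X2'] of an SCFG rule."""
    return ["rule_%dgram=%s" % (n, "_".join(rule_items[i:i + n]))
            for i in range(len(rule_items) - n + 1)]
```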