Abstract | Much of the recent work on dependency parsing has focused on solving the inherent combinatorial problems associated with rich scoring functions.
Abstract | In contrast, we demonstrate that highly expressive scoring functions can be used with substantially simpler inference procedures. |
Introduction | Dependency parsing is commonly cast as a maximization problem over a parameterized scoring function.
Introduction | In this view, the use of more expressive scoring functions leads to more challenging combinatorial problems of finding the maximizing parse. |
Introduction | We depart from this view and instead focus on using highly expressive scoring functions with substantially simpler inference procedures. |
Boosting-style algorithm | The predictor returned by our boosting algorithm is based on a scoring function h: X × Y → R which, as in standard ensemble algorithms such as AdaBoost, is a convex combination of base scoring functions h_t: h = Σ_{t=1}^T α_t h_t, with α_t ≥ 0.
Boosting-style algorithm | The base scoring functions used in our algorithm have the form |
Boosting-style algorithm | Thus, the score assigned to y by the base scoring function h_t is the number of positions at which y matches the prediction of path expert h_t given input x. The predictor is defined as follows in terms of h or the h_t's:
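A minimal sketch of this kind of ensemble score, assuming expert predictions are given as plain label sequences (all function and variable names here are illustrative, not from the paper):

```python
# Sketch: h(x, y) = sum_t alpha_t * h_t(x, y), where each base score h_t(x, y)
# counts the positions at which y matches the prediction of path expert t.
# Expert predictions are precomputed sequences for a fixed input x.

def base_score(expert_prediction, y):
    """Number of positions where y agrees with the expert's prediction."""
    return sum(1 for p, q in zip(expert_prediction, y) if p == q)

def ensemble_score(expert_predictions, alphas, y):
    """Convex combination h(x, y) = sum_t alpha_t * h_t(x, y), alpha_t >= 0."""
    assert all(a >= 0 for a in alphas)
    return sum(a * base_score(pred, y)
               for a, pred in zip(alphas, expert_predictions))

def predict(expert_predictions, alphas, candidates):
    """Return the candidate y maximizing the ensemble score."""
    return max(candidates,
               key=lambda y: ensemble_score(expert_predictions, alphas, y))
```

In practice the argmax over candidates would be computed by dynamic programming over the expert graph rather than by enumeration; the enumeration above only illustrates the scoring.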
Online learning approach | A collection of distributions 1P can also be used to define a deterministic prediction rule based on the scoring function approach. |
Online learning approach | The majority vote scoring function is defined by |
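As a rough illustration of a majority-vote scoring rule induced by a distribution over path experts (the paper's exact definition may differ in its details):

```python
# Sketch: the majority-vote score of y sums, over positions j, the total
# probability mass of experts whose prediction agrees with y at position j.
# The deterministic rule picks the most-voted label at each position.

def majority_vote_score(expert_predictions, probs, y):
    score = 0.0
    for j, label in enumerate(y):
        score += sum(p for pred, p in zip(expert_predictions, probs)
                     if pred[j] == label)
    return score

def majority_vote_predict(expert_predictions, probs, length):
    """Deterministic prediction: most probability-weighted label per position."""
    y = []
    for j in range(length):
        votes = {}
        for pred, p in zip(expert_predictions, probs):
            votes[pred[j]] = votes.get(pred[j], 0.0) + p
        y.append(max(votes, key=votes.get))
    return y
```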
Abstract | Given the agenda graph and n-best hypotheses, the system can predict the next system actions to maximize multilevel score functions.
Agenda Graph | the score function based on current input and discourse structure given the focus stack. |
Greedy Selection with n-best Hypotheses | Therefore, we need to select the hypothesis that maximizes the scoring function among a set of n-best hypotheses of each utterance. |
Greedy Selection with n-best Hypotheses | Secondly, the multilevel score functions are computed for each candidate node C_i given a hypothesis h_i.
Greedy Selection with n-best Hypotheses | Otherwise, the best node which would be pushed onto the focus stack must be selected using multilevel score functions.
Discussion and Future Directions | Table 2: A summarized answer composed of five different portions of text generated with the SH scoring function; the chosen best answer is presented for comparison.
Experiments | At this point a second version of the dataset was created to evaluate the summarization performance under scoring functions (6) and (7); it was generated by manually selecting questions that arouse subjective human interest from the previous 89,814 question-answer pairs.
Experiments | Figure 2: Increase in ROUGE-L, ROUGE-1 and ROUGE-2 performance of the SH system as more measures are taken into consideration in the scoring function, starting from Relevance alone (R) to the complete system (RQNC).
Experiments | In order to determine the influence of the individual measures on the overall performance, we conducted a final experiment on the filtered dataset (the SH scoring function was used).
The summarization framework | 2.5 The concept scoring functions |
The summarization framework | Analogously to what had been done with scoring function (6), the Φ space was augmented with a dimension representing the
The summarization framework | The concept score for the same BE in two separate answers is very likely to differ, because each answer has its own Quality and Coverage values. This only makes the scoring function context-dependent; it does not interfere with the calculation of the Coverage, Relevance and Novelty measures, which are based on information overlap and will regard two BEs with overlapping equivalence classes as the same, regardless of their scores being different.
Related Work | We operationalize these notions using a scoring function that quantifies the compatibility between arbitrary cluster pairs. |
Split-Merge Role Induction | Besides being inefficient, it requires a scoring function with comparable scores for arbitrary pairs of clusters. |
Split-Merge Role Induction | After each completion of the inner loop, the thresholds contained in the scoring function (discussed below) are adjusted and this is repeated until some termination criterion is met (discussed in Section 5.2.3). |
Split-Merge Role Induction | 5.2.2 Scoring Function |
Abstract | We propose a context-sensitive topical PageRank method for keyword ranking and a probabilistic scoring function that considers both relevance and interestingness of keyphrases for keyphrase ranking. |
Experiments | We have proposed a context-sensitive topical PageRank method (cTPR) for the first step of keyword ranking, and a probabilistic scoring function for the third step of keyphrase ranking. |
Method | While a standard method is to simply aggregate the scores of keywords inside a candidate keyphrase as the score for the keyphrase, here we propose a different probabilistic scoring function.
Method | We substitute Equation (8) into Equation (3) and obtain the following scoring function for ranking:
Method | Our preliminary experiments with Equation (9) show that this scoring function usually ranks longer keyphrases higher than shorter ones. |
Introduction | and Imamura (2001) propose some score functions based on lexical similarity and co-occurrence.
Substructure Spaces for BTKs | The baseline system uses many heuristics in searching for optimal solutions with alternative score functions.
Substructure Spaces for BTKs | The baseline method proposes two score functions based on the lexical translation probability. |
Substructure Spaces for BTKs | They also compute the score function by splitting the tree into the internal and external components. |
Collaborative Decoding | 2.2 Generic Collaborative Decoding Model For a given source sentence f, a member model in co-decoding finds the best translation e* among the set of possible candidate translations H(f) based on a scoring function F:
Collaborative Decoding | where Φ_m(f, e) is the score function of the m-th baseline model, and each Ψ_k(e, H_k(f)) is a partial consensus score function with respect to d_k and is defined over e and H_k(f):
Collaborative Decoding | Note that in Equation 2, though the baseline score function Φ_m(f, e) can be computed inside each decoder, the case of Ψ_k(e, H_k(f)) is more complicated.
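A rough sketch of such a combined scoring function, with a toy n-gram consensus feature standing in for the paper's actual consensus measures (all weights and helper functions here are hypothetical):

```python
# Sketch: F(f, e) = Phi_m(f, e) + sum_k lambda_k * Psi_k(e, H_k(f)),
# i.e. a member model's baseline score plus weighted partial consensus
# scores computed against the other members' hypothesis sets H_k(f).

def codecoding_score(baseline_score, consensus_fns, weights, e, hypothesis_sets):
    total = baseline_score(e)
    for lam, psi, h_k in zip(weights, consensus_fns, hypothesis_sets):
        total += lam * psi(e, h_k)
    return total

def ngram_consensus(e, hypotheses, n=2):
    """Toy consensus feature: fraction of e's n-grams found in the other
    model's hypotheses (one common flavor of consensus measure)."""
    grams = {tuple(e[i:i + n]) for i in range(len(e) - n + 1)}
    if not grams:
        return 0.0
    other = set()
    for h in hypotheses:
        other |= {tuple(h[i:i + n]) for i in range(len(h) - n + 1)}
    return len(grams & other) / len(grams)
```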
Integrated Models | As explained in Section 2, both models essentially learn a scoring function s: X → R, where the domain X is different for the two models.
Integrated Models | The graph-based model, MSTParser, learns a scoring function s(i, j, l) ∈ R over labeled dependencies.
Integrated Models | The transition-based model, MaltParser, learns a scoring function s(c, t) ∈ R over configurations and transitions.
Two Models for Dependency Parsing | The simplest parameterization is the arc-factored model, which defines a real-valued score function s(i, j, l) for arcs and further defines the score of a dependency graph as the sum of the
Two Models for Dependency Parsing | Given a real-valued score function s(c, t) (for transition t out of configuration c), parsing can be performed by starting from the initial configuration and taking the optimal transition t* = argmax_{t∈T} s(c, t) out of every configuration c until a terminal configuration is reached.
Two Models for Dependency Parsing | To learn a scoring function on transitions, these systems rely on discriminative learning methods, such as memory-based learning or support vector machines, using a strictly local learning procedure where only single transitions are scored (not complete transition sequences). |
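The greedy transition-based parsing loop described above can be sketched generically (the transition system in the test is a trivial stand-in, not an actual arc-standard or arc-eager system):

```python
# Sketch of greedy transition-based decoding with a score function s(c, t):
# from the initial configuration, repeatedly take the highest-scoring legal
# transition until a terminal configuration is reached.

def greedy_parse(initial_config, legal_transitions, apply_transition,
                 is_terminal, score):
    c = initial_config
    history = []                       # sequence of transitions taken
    while not is_terminal(c):
        t_star = max(legal_transitions(c), key=lambda t: score(c, t))
        history.append(t_star)
        c = apply_transition(c, t_star)
    return c, history
```

Note that each transition is scored strictly locally, exactly as described: complete transition sequences are never compared.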
Introduction | defining a score function f(x, y) and locating the
Introduction | In HMMs, the score function f(x, y) is the joint probability distribution over (x, y). If we assume a one-to-one correspondence between the hidden states and the labels, the score function can be written as:
Introduction | In the perceptron, the score function f(x, y) is given as f(x, y) = w · φ(x, y), where w is the weight vector and φ(x, y) is the feature vector representation of the pair (x, y). By making the first-order Markov assumption, we have
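A linear score of this form can be sketched with a sparse feature map; the emission and first-order transition features below are toy examples, not the actual feature templates:

```python
# Sketch of the perceptron-style score f(x, y) = w . phi(x, y) with a sparse
# feature map over a sequence x and label sequence y.

from collections import Counter

def phi(x, y):
    """Toy feature map: emission features plus first-order Markov
    transition features over adjacent labels."""
    feats = Counter()
    for token, label in zip(x, y):
        feats[("emit", token, label)] += 1
    for prev, curr in zip(y, y[1:]):
        feats[("trans", prev, curr)] += 1
    return feats

def score(w, x, y):
    """Dot product of the weight vector with the (sparse) feature vector."""
    return sum(w.get(f, 0.0) * v for f, v in phi(x, y).items())
```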
Statistical Paraphrase Generation | The PTs used in this work are constructed using different corpora and different score functions (Section 3.5). |
Statistical Paraphrase Generation | Let (s_i, t_i) be a pair of paraphrase units; their paraphrase likelihood is computed using a score function φ_pm(s_i, t_i).
Statistical Paraphrase Generation | Suppose we have K PTs; (s_ki, t_ki) is a pair of paraphrase units from the k-th PT with the score function φ_k(s_ki, t_ki).
Structured Taxonomy Induction | Each factor F has an associated scoring function φ_F, with the probability of a total assignment determined by the product of all these scores:
Structured Taxonomy Induction | We score each edge by extracting a set of features f(x_i, x_j) and weighting them by the (learned) weight vector w. So, the factor scoring function is:
Structured Taxonomy Induction | The scoring function is similar to the one above: |
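A minimal sketch of factor scoring of this kind, assuming each factor's score is exp(w · f) over its extracted features and that the unnormalized probability of a total assignment is the product of factor scores (feature names are illustrative):

```python
# Sketch of factor-graph scoring: phi_F = exp(w . f) for each factor F,
# and an assignment's unnormalized probability is the product of all
# factor scores.

import math

def factor_score(w, features):
    """phi_F = exp(w . f) for one factor's (sparse) feature vector."""
    return math.exp(sum(w.get(name, 0.0) * v for name, v in features.items()))

def assignment_score(w, factor_feature_vectors):
    """Unnormalized probability: product over factors of their scores."""
    p = 1.0
    for feats in factor_feature_vectors:
        p *= factor_score(w, feats)
    return p
```

Working with exp scores per factor means the log of the assignment score is a plain linear function of the summed features, which is why learning reduces to standard weight estimation.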
Compressive Summarization | Here, we follow the latter work, combining a coverage score function g with sentence-level compression score functions h_1, …
Compressive Summarization | For the compression score function, we follow Martins and Smith (2009) and decompose it as a sum of local score functions defined on dependency arcs:
Extractive Summarization | By designing a quality score function g: {0, 1}^N → R, this can be cast as a global optimization problem with a knapsack constraint:
Extractive Summarization | Then, the following quality score function is defined: |
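As an illustration of the knapsack formulation, here is a simple greedy heuristic (score gain per unit length) for a generic quality score g; the paper's actual g and solver may well differ:

```python
# Sketch: maximize a quality score g over selected sentences subject to a
# total-length (knapsack) budget, using a greedy gain-per-length heuristic.

def greedy_knapsack_summary(sentences, lengths, g, budget):
    """sentences: list of ids; lengths: id -> length;
    g: set of ids -> quality score; budget: max total length."""
    selected = set()
    remaining = set(sentences)
    while True:
        best, best_ratio = None, 0.0
        used = sum(lengths[t] for t in selected)
        for s in remaining:
            if used + lengths[s] > budget:
                continue                      # would exceed the budget
            gain = g(selected | {s}) - g(selected)
            ratio = gain / lengths[s]
            if ratio > best_ratio:
                best, best_ratio = s, ratio
        if best is None:
            return selected
        selected.add(best)
        remaining.remove(best)
```

When g is monotone submodular, greedy selection of this kind comes with well-known approximation guarantees, which is one reason such quality functions are popular for summarization.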
Conclusions and Future Work | Since our algorithm requires that the objective function be a sum of word score functions, our proposed method has the restriction that we cannot use an arbitrary monotone submodular function as the objective function for the summary.
Introduction | By formalizing the subtree extraction problem as this new maximization problem, we can treat the constraints regarding the grammaticality of the compressed sentences in a straightforward way and use an arbitrary monotone submodular word score function, including our own word score function (shown later).
Joint Model of Extraction and Compression | The score function is supermodular as a score function of subtree extraction, because the union of two subtrees can have extra word pairs that are not included in either subtree.
Joint Model of Extraction and Compression | Our score function for a summary 8 is as follows: |
Dependency Parser | We present here an algorithm that runs the parser in pseudo-deterministic mode, greedily choosing at each configuration the transition that maximizes some score function.
Dependency Parser | Algorithm 1 takes as input a string w and a scoring function score() defined over parser transitions and parser configurations.
Dependency Parser | The scoring function will be the subject of §4 and is not discussed here. |
Model and Training | We use a linear model for the score function in Algorithm 1, and define score(t, c) = w · φ(t, c).
Abstract | Under this framework, we show how to integrate various indicative metrics such as linguistic motivation and query relevance into the compression process by deriving a novel formulation of a compression scoring function.
Introduction | Our tree-based methods rely on a scoring function that allows for easy and flexible tailoring of sentence compression to the summarization task, ultimately resulting in significant improvements for MDS, while at the same time remaining competitive with existing methods in terms of sentence compression, as discussed next. |
Sentence Compression | postorder) as a sequence of nodes in T, the set L of possible node labels, a scoring function S for evaluating each sentence compression hypothesis, and a beam size N. Specifically, O is a permutation on the set {0, 1, …
Sentence Compression | Thus, the decoder is quite flexible — its learned scoring function allows us to incorporate features salient for sentence compression while its language model guarantees the linguistic quality of the compressed string. |
Abstract | The segmentation for an input sentence is decoded by using a joint scoring function combining the two induced models. |
Introduction | Moreover, in order to better combine the strengths of the two models, the proposed approach uses a joint scoring function in a log-linear combination form for the decoding in the segmentation phase. |
Semi-supervised Learning via Co-regularizing Both Models | 3.4 The Joint Score Function for Decoding |
Semi-supervised Learning via Co-regularizing Both Models | This paper employs log-linear interpolation (Bishop, 2006) to formulate a joint scoring function based on the character-based and word-based models in the decoding:
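A minimal sketch of such a log-linear interpolation for decoding, with hypothetical stand-ins for the two component models:

```python
# Sketch: score(seg) = lambda * log P_char(seg) + (1 - lambda) * log P_word(seg),
# a log-linear interpolation of a character-based and a word-based model,
# used to pick the best segmentation among candidates.

import math

def joint_score(p_char, p_word, seg, lam=0.5):
    return lam * math.log(p_char(seg)) + (1.0 - lam) * math.log(p_word(seg))

def decode(candidates, p_char, p_word, lam=0.5):
    """Pick the candidate segmentation maximizing the joint score."""
    return max(candidates,
               key=lambda seg: joint_score(p_char, p_word, seg, lam))
```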
Introduction | the retrieved questions is formalized using a scoring function.
Problem Formulation | Based on the weight function, we define a scoring function for assigning a score to each question in the corpus Q. |
Problem Formulation | For each token, the scoring function chooses the term from Q having the maximum weight; the weights of the n chosen terms are then summed up to get the score.
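A sketch of this token-wise max-weight scoring, with a hypothetical weight function standing in for the one defined in the paper:

```python
# Sketch: for each query token, take the maximum weight over the candidate
# question's terms, then sum those maxima over all n tokens.

def question_score(query_tokens, question_terms, weight):
    """Sum over tokens of the max weight against any term of the question."""
    return sum(max(weight(tok, term) for term in question_terms)
               for tok in query_tokens)
```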
Introduction | We use this ranking to learn a linear scoring function on pairs of documents given a bilingual query. |
Introduction | these heuristics and our learned pairwise scoring function, we can derive a ranking for new, unseen bilingual queries.
Learning to Rank Using Bilingual Information | Then we learn a linear scoring function for pairs of documents that exploits monolingual information (in both languages) and bilingual information. |
Architecture of BRAINSUP | Each partially lexicalized solution is scored by a battery of scoring functions that compete to generate creative sentences respecting the user specification U, as explained in Section 3.3. |
Architecture of BRAINSUP | Concerning the scoring of partial solutions and complete sentences, we adopt a simple linear combination of scoring functions.
Architecture of BRAINSUP | , f_k] be the vector of scoring functions and w = [w_0, …
Generalized Expectation Criteria | unlabeled data), a model distribution p_λ(y|x), and a score function S:
Generalized Expectation Criteria | In this paper, we use a score function that is the squared difference of the model expectation of G and some target expectation G̃:
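A small sketch of a squared-difference GE score of this kind, with distributions given as explicit dictionaries for illustration (all names are stand-ins):

```python
# Sketch of a generalized-expectation score: the (negative) squared
# difference between the model expectation of a feature G under p(y|x)
# and a target expectation G_target.

def model_expectation(p_y_given_x, g):
    """E_p[G] = sum_y p(y|x) * g(y), for an explicit distribution."""
    return sum(p * g(y) for y, p in p_y_given_x.items())

def ge_score(p_y_given_x, g, g_target):
    """Squared-difference GE score; 0 means the expectations match."""
    diff = model_expectation(p_y_given_x, g) - g_target
    return -diff * diff
```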
Generalized Expectation Criteria | The partial derivative of the KL divergence score function includes the same covariance term as above but substitutes a different multiplicative term: G̃/G_λ.