Index of papers in Proc. ACL that mention
  • randomly sampled
Rehbein, Ines and Ruppenhofer, Josef
Related Work
Schein and Ungar observe that none of the 8 sampling methods investigated in their experiment achieved a significant improvement over the random sampling baseline on type b) errors.
Related Work
In fact, entropy sampling and margin sampling even showed a decrease in performance compared to random sampling.
Related Work
In the first setting, we randomly select new instances from the pool (random sampling; rand).
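To make the strategies compared in these excerpts concrete, here is a minimal sketch of the three selection rules (the random baseline, entropy sampling, and margin sampling), assuming a classifier exposing a predict_proba-style interface; the names model, pool, and batch_size are hypothetical, and this is an illustration, not either paper's implementation.

```python
import numpy as np

def select_batch(model, pool, batch_size, strategy="rand", rng=None):
    """Pick unlabelled instances for annotation under three strategies."""
    rng = rng or np.random.default_rng()
    if strategy == "rand":                      # random-sampling baseline
        return rng.choice(len(pool), size=batch_size, replace=False)
    probs = model.predict_proba(pool)           # shape: (n_instances, n_classes)
    if strategy == "entropy":                   # highest predictive entropy
        scores = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    elif strategy == "margin":                  # smallest gap between top two classes
        part = np.sort(probs, axis=1)
        scores = -(part[:, -1] - part[:, -2])   # negated: small margin = high score
    else:
        raise ValueError(strategy)
    return np.argsort(scores)[-batch_size:]     # indices of the most uncertain items
```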
randomly sampled is mentioned in 9 sentences in this paper.
Dwyer, Kenneth and Kondrak, Grzegorz
Experimental setup
In order to speed up the experiments, a random sample of 2000 words was drawn from the pool and presented to the active learner each time.
Results
The dashed curves in Figure 1 represent the baseline performance with no clustering, no context ordering, random sampling, and ALINE, unless otherwise noted.
Results
For instance, on the Spanish dataset, random sampling reached 97% word accuracy after 1420 words had been annotated, whereas QBB did so with only 510 words — a 64% reduction in labelling effort.
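For reference, the reported saving follows directly from the two annotation counts: (1420 − 510) / 1420 ≈ 0.64, i.e., a 64% reduction.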
Results
It is important to note that empirical comparisons of different active learning techniques have shown that random sampling establishes a very strong baseline.
randomly sampled is mentioned in 7 sentences in this paper.
De Benedictis, Flavio and Faralli, Stefano and Navigli, Roberto
Comparative Evaluation
First, we randomly sampled 100 terms from our gold standard for each domain and each of the three languages.
Comparative Evaluation
Table 9: Number of domain glosses (from a random sample of 100 gold standard terms per domain) retrieved using Google Define and GlossBoot.
Comparative Evaluation
As for the precision of the extracted terms, we randomly sampled 50% of them for each system.
Experimental Setup
To calculate precision we randomly sampled 5% of the retrieved terms and asked two human annotators to manually tag their domain pertinence (with adjudication in case of disagreement; κ = .62, indicating substantial agreement).
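For context, agreement figures like the κ = .62 above are the standard two-annotator Cohen's kappa; a minimal sketch of the textbook computation (not the authors' code) follows. Values in the 0.61–0.80 band are conventionally read, following Landis and Koch, as substantial agreement.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    chance = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - chance) / (1 - chance)
```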
Experimental Setup
Precision was determined on a random sample of 5% of the acquired glosses for each domain and language.
randomly sampled is mentioned in 6 sentences in this paper.
Tamura, Akihiro and Watanabe, Taro and Sumita, Eiichiro
Training
To reduce computation, we employ NCE, which uses randomly sampled sentences from all target language sentences in Q as e−, and calculates the expected values by a beam search with beam width W to truncate alignments with low scores.
Training
where e+ is a target language sentence aligned to f+ in the training data, i.e., (f+, e+) ∈ T, e− is a randomly sampled pseudo-target language sentence with length |e+|, and N denotes the number of pseudo-target language sentences per source sentence f+.
Training
In a simple implementation, each e− is generated by repeating a random sampling from the set of target words (Ve) |e+| times and lining them up sequentially.
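A minimal sketch of that simple implementation, with vocab standing in for the set of target words Ve; all names are hypothetical rather than the authors'.

```python
import random

def pseudo_negatives(vocab, target_len, n_samples, rng=random):
    """Build N pseudo-target sentences e- of length |e+|: each word is
    drawn independently (with replacement) from the target vocabulary,
    and the draws are lined up sequentially."""
    return [[rng.choice(vocab) for _ in range(target_len)]
            for _ in range(n_samples)]
```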
randomly sampled is mentioned in 5 sentences in this paper.
Krishnamurthy, Jayant and Mitchell, Tom
ConceptResolver
The regularization parameter λ is automatically selected on each iteration by searching for the value that maximizes the log-likelihood of a validation set, which is constructed by randomly sampling 25% of L on each iteration.
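As a rough illustration of that selection loop, here is a sketch with scikit-learn's logistic regression standing in for the paper's classifier; (X, y) stands in for the labelled set L, and the candidate grid lambdas is an assumption.

```python
import random
import numpy as np
from sklearn.linear_model import LogisticRegression

def select_lambda(X, y, lambdas=(0.01, 0.1, 1.0, 10.0), val_frac=0.25, rng=None):
    """Choose the regularizer maximizing validation log-likelihood on a
    random 25% split. Assumes NumPy arrays and integer labels 0..k-1."""
    rng = rng or random.Random()
    idx = list(range(len(y)))
    rng.shuffle(idx)
    n_val = int(val_frac * len(idx))
    val, train = idx[:n_val], idx[n_val:]
    best_lam, best_ll = None, -np.inf
    for lam in lambdas:
        clf = LogisticRegression(C=1.0 / lam).fit(X[train], y[train])
        probs = clf.predict_proba(X[val])
        ll = np.sum(np.log(probs[np.arange(len(val)), y[val]] + 1e-12))
        if ll > best_ll:
            best_lam, best_ll = lam, ll
    return best_lam
```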
Evaluation
Resolver precision can be interpreted as the probability that a randomly sampled sense (in a cluster with at least 2 senses) is in a cluster representing its true meaning.
Evaluation
To create this set, we randomly sampled noun phrases from each category and manually matched each noun phrase to one or more real-world entities.
Evaluation
To make this difference concrete, Figure 2 (first page) shows a random sample of 10 concepts from both company and athlete.
Introduction
Figure 2: A random sample of concepts created by ConceptResolver.
randomly sampled is mentioned in 5 sentences in this paper.
Paşca, Marius and Van Durme, Benjamin
Evaluation
The collection of queries is a random sample of fully-anonymized queries in English submitted by Web users in 2006.
Evaluation
To test whether that is the case, a random sample of 200 class labels, out of the 2,614 labels found to be potentially-useful specific concepts, are manually annotated as correct, subjectively correct or incorrect, as shown in Table 2.
Evaluation
Rather than inspecting a random sample of classes, the evaluation validates the results against a reference set of 40 gold-standard classes that were manually assembled as part of previous work (Pasca, 2007).
randomly sampled is mentioned in 4 sentences in this paper.
Volkova, Svitlana and Coppersmith, Glen and Van Durme, Benjamin
Conclusions and Future Work
This may be an effect of ‘sparseness’ of relevant user data, in that users talk about politics very sporadically compared to a random sample of their neighbors.
Identifying Twitter Social Graph
In the Fall of 2012, leading up to the elections, we randomly sampled n = 516 Democratic and m = 515 Republican users.
Identifying Twitter Social Graph
For each such user we collect recent tweets and randomly sample their immediate k = 10 neighbors from follower, friend, user mention, reply, retweet and hashtag social circles.
Identifying Twitter Social Graph
Similar to the candidate-centric graph, for each user we collect recent tweets and randomly sample user social circles in the Fall of 2012.
randomly sampled is mentioned in 4 sentences in this paper.
Hashimoto, Chikara and Torisawa, Kentaro and Kloetzer, Julien and Sano, Motoki and Varga, István and Oh, Jong-Hoon and Kidawara, Yutaka
Experiments
For the test data, we randomly sampled 23,650 examples of (event causality candidate, original sentence) among which 3,645 were positive from 2,451,254 event causality candidates extracted from our web corpus (Section 3.1).
Experiments
Note that, for the diversity of the sampled scenarios, our sampling proceeded as follows: (i) Randomly sample a beginning event phrase from the generated scenarios.
Experiments
(ii) Randomly sample an effect phrase for the beginning event phrase from the scenarios.
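A minimal sketch of steps (i) and (ii), assuming scenarios is a list of (beginning_event, effect) phrase pairs; the representation and names are hypothetical.

```python
import random

def sample_scenario(scenarios, rng=random):
    """Step (i): draw a beginning event phrase uniformly over the distinct
    beginnings (the source of the sample's diversity).
    Step (ii): draw an effect phrase among scenarios sharing that beginning."""
    beginnings = sorted({b for b, _ in scenarios})
    beginning = rng.choice(beginnings)
    effects = [e for b, e in scenarios if b == beginning]
    return beginning, rng.choice(effects)
```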
randomly sampled is mentioned in 4 sentences in this paper.
Mukherjee, Arjun and Liu, Bing
Empirical Evaluation
To estimate the Max-Ent parameters λ, we randomly sampled 500 terms appearing at least 10 times from the held-out data (10 threads in our corpus which were excluded from the evaluation of the tasks in §6.2 and §6.3), labeled them as topical (361) or AD-expressions (139), and used the corresponding features of each term (in the context of the posts where it occurs, §3) to train the Max-Ent model.
Empirical Evaluation
Instead, we randomly sampled 500 pairs (≈ 34% of the population) for evaluation.
Phrase Ranking based on Relevance
To compute coverage, we randomly sampled 500 documents from the corpus and listed the candidate n-grams in the collection of 500 sampled documents.
Phrase Ranking based on Relevance
We then computed the coverage to see how many of the relevant terms in the random sample were also present in top k phrases from the ranked candidate n-grams.
randomly sampled is mentioned in 4 sentences in this paper.
Melamud, Oren and Berant, Jonathan and Dagan, Ido and Goldberger, Jacob and Szpektor, Idan
Discussion and Future Work
We therefore focused on comparing the performance of our two-level scheme with state-of-the-art prior topic-level and word-level models of distributional similarity, over a random sample of inference rule applications.
Experimental Settings
Rule applications were generated by randomly sampling extractions from ReVerb, such as (‘Jack’, ‘agree with’, ‘Jill’), and then sampling possible rules for each, such as ‘agree with → feel sorry for’.
Introduction
In order to promote replicability and equal-term comparison with our results, we based our experiments on publicly available datasets, both for unsupervised learning of the evaluated models and for testing them over a random sample of rule applications.
Results
However, our result suggests that topic-level models might not be robust enough when applied to a random sample of inferences.
randomly sampled is mentioned in 4 sentences in this paper.
Pantel, Patrick and Fuxman, Ariel
Experimental Results
Table 3 lists query-product associations for five randomly sampled products along with their model scores from Pmle and Pintp.
Experimental Results
We created two samples from the TEST dataset: one randomly sampled by taking click weights into account, and the other sampled uniformly at random.
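The two samples differ only in how the draw is made: one weights products by click count, the other ignores the weights. A minimal sketch follows; the with-replacement versus without-replacement choice is an assumption, as the excerpt does not specify it.

```python
import random

def click_weighted_sample(items, click_weights, k, rng=random):
    """Draw k items with probability proportional to click weight
    (with replacement, as random.choices samples)."""
    return rng.choices(items, weights=click_weights, k=k)

def uniform_sample(items, k, rng=random):
    """Draw k items uniformly at random, without replacement."""
    return rng.sample(items, k)
```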
Experimental Results
Table 3: Example query-product association scores for a random sample of five products.
randomly sampled is mentioned in 4 sentences in this paper.
Ritter, Alan and Mausam and Etzioni, Oren
Experiments
For each of the 500 observed tuples in the test-set we generated a pseudo-negative tuple by randomly sampling two noun phrases from the distribution of NPs in both corpora.
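A minimal sketch of that corruption step: if np_pool lists every noun-phrase occurrence from both corpora (repeats included), a uniform draw from it follows the empirical NP distribution. The names are hypothetical.

```python
import random

def pseudo_negative(observed, np_pool, rng=random):
    """Replace both noun-phrase arguments of an observed
    (arg1, relation, arg2) tuple with NPs drawn from the pool."""
    _, rel, _ = observed
    return (rng.choice(np_pool), rel, rng.choice(np_pool))
```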
Experiments
(2007) we randomly sampled 100 inference rules.
Experiments
We randomly sampled 300 of these inferences to hand-label.
randomly sampled is mentioned in 4 sentences in this paper.
Mausam and Soderland, Stephen and Etzioni, Oren and Weld, Daniel and Skinner, Michael and Bilmes, Jeff
Empirical Evaluation
Ideally, we would like to evaluate a random sample of the more than 1,000 languages represented in PANDICTIONARY. However, a high-quality evaluation of translation between two languages requires a person who is fluent in both languages.
Empirical Evaluation
We provided our evaluators with a random sample of translations into their native language.
Empirical Evaluation
To carry out this comparison, we randomly sampled 1,000 senses from English Wiktionary and ran the three algorithms over them.
randomly sampled is mentioned in 4 sentences in this paper.
Druck, Gregory and Pang, Bo
Introduction
In a random sample of recipe reviews from allrecipes.com, we found that 57.8% contain refinements of the original recipe.
Introduction
We created a new recipe data set, and manually labeled a random sample to evaluate our model and several baselines.
Models
In a manually labeled random sample of recipe reviews, we find that refinement segments tend to be clustered together in certain reviews (“bursty”), rather than uniformly distributed across all reviews.
randomly sampled is mentioned in 3 sentences in this paper.
LIU, Xiaohua and ZHANG, Shaodian and WEI, Furu and ZHOU, Ming
Experiments
We use the Twigg SDK to crawl all tweets from April 20th 2010 to April 25th 2010, then drop non-English tweets, leaving about 11,371,389; from these, 15,800 tweets are randomly sampled and labeled by two independent annotators, so that the beginning and end of each named entity are marked with <TYPE> and </TYPE>, respectively.
Task Definition
This is based on an investigation of 12,245 randomly sampled tweets, which are manually labeled.
Task Definition
According to our investigation on 12,245 randomly sampled tweets that are manually labeled, about 46.8% have at least one named entity.
randomly sampled is mentioned in 3 sentences in this paper.
Flati, Tiziano and Navigli, Roberto
Experiment 1: Oxford Lexical Predicates
We show in Table 3 the precision@k calculated over a random sample of 50 lexical predicates. As can be seen, while the quality of the classes is quite high for low values of k, performance gradually degrades as we let k increase.
Experiment 1: Oxford Lexical Predicates
Starting from the lexical predicate items obtained as described in Section 4.2, we selected those items belonging to a random sample of 20 usage notes among those provided by the Oxford dictionary, totaling 3,245 items.
Experiment 1: Oxford Lexical Predicates
For tuning α we used a held-out set of 8 verbs, randomly sampled from the lexical predicates not used in the dataset.
randomly sampled is mentioned in 3 sentences in this paper.
Park, Keun Chan and Jeong, Yoonjae and Myaeng, Sung Hyon
Conclusion and Future Work
The experimental results show that the verb and verb-phrase classification method is reasonably accurate, achieving 91% precision and 78% recall on a manually constructed gold standard of 80 verbs, and 82% accuracy on a random sample of all the WordNet entries.
Experience Detection
We randomly sampled 1,000 sentences and asked three annotators to judge whether or not individual sentences are considered to contain an experience based on our definition.
Lexicon Construction
We randomly sampled 200 items and examined how accurately the classification was done.
randomly sampled is mentioned in 3 sentences in this paper.
Tanigaki, Koichi and Shiba, Mitsuteru and Munaka, Tatsuji and Sagisaka, Yoshinori
Conclusions
Moreover, our smoothing model, though unsupervised, provides reliable supervision when sufficiently many random samples of words are available as nearby words.
Discussion
Generally speaking, statistical reliability increases as the number of random samples increases.
Discussion
Therefore, we can conclude that if sufficiently many random samples of nearby words are provided, our smoothing model is reliable, even though it is trained in an unsupervised fashion.
randomly sampled is mentioned in 3 sentences in this paper.
Flati, Tiziano and Vannella, Daniele and Pasini, Tommaso and Navigli, Roberto
Phase 1: Inducing the Page Taxonomy
Taxonomy quality To evaluate the quality of our page taxonomy we randomly sampled 1,000 Wikipedia pages.
Phase 1: Inducing the Page Taxonomy
It was established by selecting the combination, among all possible permutations, which maximized precision on a tuning set of 100 randomly sampled pages, disjoint from our page dataset.
Phase 3: Category taxonomy refinement
Category taxonomy quality To estimate the quality of the category taxonomy, we randomly sampled 1,000 categories and, for each of them, we manually associated the super-categories which were deemed to be appropriate hypernyms.
randomly sampled is mentioned in 3 sentences in this paper.
McIntosh, Tara and Curran, James R.
Seed diversity
We randomly sample Sgold from two sets of correct terms extracted from the evaluation cache.
Unsupervised bagging
One approach is to use uniform random sampling from restricted sections of Lhand.
Unsupervised bagging
We performed random sampling from the top 100, 200 and 500 terms of Lhand.
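A minimal sketch of drawing such bags; the bag size and the number of bags are hypothetical, since the excerpt fixes only the top-n cutoffs of 100, 200, and 500.

```python
import random

def draw_bag(lhand, top_n, bag_size, rng=random):
    """Uniform random sample, without replacement, restricted to the
    top-n terms of the ranked lexicon Lhand."""
    return rng.sample(lhand[:top_n], bag_size)

# e.g. fifty bags drawn from the top 100 terms:
# bags = [draw_bag(lhand, top_n=100, bag_size=10) for _ in range(50)]
```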
randomly sampled is mentioned in 3 sentences in this paper.
Mitra, Sunny and Mitra, Ritwik and Riedl, Martin and Biemann, Chris and Mukherjee, Animesh and Goyal, Pawan
Conclusions
Through manual evaluation we found that the algorithm could correctly identify 60.4% of birth cases from a set of 48 random samples and 57% of split/join cases from a set of 21 randomly picked samples.
Evaluation framework
We selected 48 random samples of candidate words for birth cases and 21 random samples for split/join cases.
Evaluation framework
A further analysis of the words marked as births in the random samples indicates that there are 22 technology-related words, 2 slang terms, 3 economics-related words, and 2 general words.
randomly sampled is mentioned in 3 sentences in this paper.
van Gompel, Maarten and van den Bosch, Antal
Data preparation
The parallel corpus is randomly sampled into two large and equally-sized parts.
Data preparation
The final test set is created by randomly sampling the desired number of test instances.
Experiments & Results
The final test sets are 5,000 randomly sampled sentence pairs drawn from the 200,000-sentence test split for each language pair.
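A minimal sketch of the preparation pipeline described in these excerpts, under the assumed reading that the two halves are disjoint and the test pairs are drawn without replacement from the held-out half:

```python
import random

def split_and_sample(sentence_pairs, test_size=5000, rng=random):
    """Randomly split a parallel corpus into two equally sized parts,
    then draw the final test set from the second part."""
    pairs = list(sentence_pairs)
    rng.shuffle(pairs)
    half = len(pairs) // 2
    first, second = pairs[:half], pairs[half:]
    return first, rng.sample(second, test_size)
```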
randomly sampled is mentioned in 3 sentences in this paper.
Bhagat, Rahul and Ravichandran, Deepak
Experimental Results
We estimate the quality of paraphrases by annotating a random sample as correct/incorrect and calculating the accuracy.
Experimental Results
We estimate the precision (P) of the extracted instances by annotating a random sample of instances as correct/incorrect.
Experimental Results
We randomly sampled 50 instances of the “acquisition” and “birthplace” relations from the system and the baseline outputs.
randomly sampled is mentioned in 3 sentences in this paper.