Abstract | We study non-neutral stochastic community models, and show that the presence of non-neutral processes is detectable if sample size is large enough and/or the amplitude of the effect is strong enough. |
Discussion | The power of a statistical test generally depends on three factors: first, the sample size; second, the significance level, that is, the threshold p-value used to assess significance; and third, the effect size, which quantifies departures from the null hypothesis. |
Discussion | Our results highlight the fact that the parameter I plays a more complicated role for these models than the sample size in standard power calculations, because the power does not always increase monotonically with I (Fig. |
Discussion | This means that the community size I plays a nonlinear role and is not a straight analogue of the sample size in standard statistical tests, so statistical power does not necessarily increase monotonically with I. |
Introduction | Our power calculation provides an estimate of the smallest sample size that is needed to detect non-neutrality of known intensity, and of the range of strengths of non-neutrality needed to reject neutrality for a given species abundance data set. |
Power calculation for fixed non-neutral model parameters | The strength of non-neutral processes affects the sample size that is required in order to have a good chance of rejecting the neutral hypothesis (see Fig. |
Power calculation for fixed non-neutral model parameters | This appears counterintuitive because statistical power should increase monotonically with sample size. |
Statistical power calculation for fixed non-neutral model parameters | The power of a test will depend on the magnitude of the deviation from the null hypothesis (the so-called effect size) and on the quality of the data at hand, typically the sample size. |
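As a point of comparison for the standard case described above, the following is a minimal sketch of a textbook power calculation for a two-sided one-sample z-test with unit variance. It is not the neutrality test studied in the source; the function name `z_test_power` and the chosen effect size are illustrative assumptions. In this standard setting power rises with both effect size and sample size, which is the behaviour the non-neutral models are reported to depart from.

```python
from statistics import NormalDist


def z_test_power(effect_size: float, n: int, alpha: float = 0.05) -> float:
    """Power of a two-sided one-sample z-test with known unit variance.

    effect_size is the true mean shift in standard-deviation units
    (hypothetical parameter name, not taken from the source).
    """
    std_normal = NormalDist()
    # Critical value for a two-sided test at significance level alpha.
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Under the alternative, the test statistic is shifted by d * sqrt(n).
    shift = effect_size * n ** 0.5
    # Probability of landing in either rejection tail.
    return std_normal.cdf(-z_crit + shift) + std_normal.cdf(-z_crit - shift)


if __name__ == "__main__":
    for n in (10, 50, 200):
        print(n, round(z_test_power(0.2, n), 3))
```

With a zero effect size the function returns the significance level itself, which is a quick sanity check on the formula.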
Accurate prediction of genetic and environmental parameters requires a small, informative gene set | This result is likely due to the high noise level and small sample size for that class, which dilute the discriminatory features between the mid/late exponential and stationary phases. |
Accurate prediction of genetic and environmental parameters requires a small, informative gene set | This is expected, since that class corresponds to samples that either are missing data or represent classes that have low sample sizes and are grouped together. |
Adjustment of batch-effects in the transcriptome compendium | To adjust for non-biological experimental variation across a large number of datasets, each with only a few samples, we used ComBat, which was developed under a Bayesian framework and is known to be robust to outliers in small sample sizes [62]. |
Supporting Information | Classifier performance and sample size. |
Introduction | An estimator that produces estimates that are, on average, closer to the truth for a given sample size is said to be more efficient than other estimators. |
an | We drew 30 independent samples with sample sizes n = 250, 500, 1000, 2000, and 4000 from each model and computed the loss €(C, Z) for each of the five estimators. |
an | With increasing sample sizes, all estimators converged to the ground truth (zero loss) but the estimators with correct structure outperformed the others even for large samples. |
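The notion of efficiency used in these excerpts can be illustrated with a small simulation. This is a sketch under stated assumptions, not the source's experiment: it swaps in two simple location estimators (sample mean and sample median) on standard normal data, with squared error as the loss, in place of the five estimators and the loss used by the authors. The helper name `mean_squared_error`, the sample sizes, and the number of trials are all hypothetical choices.

```python
import random
import statistics


def mean_squared_error(estimator, n, trials=500, true_value=0.0, seed=0):
    """Average squared error of `estimator` over repeated samples of size n."""
    # A fixed seed means every estimator sees the same simulated samples.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(true_value, 1.0) for _ in range(n)]
        total += (estimator(sample) - true_value) ** 2
    return total / trials


if __name__ == "__main__":
    for n in (25, 100, 400):
        mse_mean = mean_squared_error(statistics.fmean, n)
        mse_median = mean_squared_error(statistics.median, n)
        print(f"n={n}: MSE(mean)={mse_mean:.5f}, MSE(median)={mse_median:.5f}")
```

Both errors shrink as n grows, but for normal data the mean stays ahead of the median at every sample size, which is what "more efficient" means here.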
Author Summary | Most proposed methodologies require the collection of new data sets and thus are limited in sample size, making them difficult to validate. |
Discussion | The majority of currently suggested approaches are limited in sample size, and require the time-consuming collection of large new data sets (such as multiple biopsies from single tumours) for validation and proof of concept. |
Introduction | The clinical assessment of intra-tumour heterogeneity also poses a significant challenge, with current experimental approaches requiring multiple biopsies per tumour, leaving them severely limited in sample size [17–19]. |