Analysis of CD data—gene analysis | The results of the gene analyses of the CD data are summarized in Table 2, which shows the number of significant genes at a number of different p-value thresholds. |
Analysis of CD data—gene-set analysis | The comparison of competitive methods is somewhat more complicated, due to the fact that ALIGATOR, INRICH and MAGENTA all use discretization using a p-value cutoff. |
Analysis of CD data—gene-set analysis | For INRICH the results are strongly dependent on the SNP p-value cutoff used, with three significant gene sets at the 0.0001 cutoff but none at the higher ones, further emphasizing the problem of choosing the correct cutoff. |
Analysis of CD data—gene-set analysis | This suggests that the different methods, or methods at different p-value cutoffs, are sensitive to distinctly different kinds of gene set associations. |
Analysis of summary SNP statistics | For the mean $\chi^2$ statistic, a gene p-value is then obtained by using a known approximation of the sampling distribution [20,21]. |
Analysis of summary SNP statistics | For the top $\chi^2$ statistic such an approximation is not available, and therefore an adaptive permutation procedure is used to obtain an empirical gene p-value. |
Analysis of summary SNP statistics | The empirical p-value for a gene is then computed as the proportion of permuted top $\chi^2$ statistics for that gene that are higher than its observed top $\chi^2$ statistic. |
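The empirical p-value described in the row above can be sketched in a few lines of Python. This is only an illustration: the function name and the toy numbers are hypothetical, and the adaptive part of the permutation procedure (choosing how many permutations to run per gene) is not modelled.

```python
def empirical_p_value(observed, permuted_stats):
    """Proportion of permuted statistics that are strictly higher than the observed one."""
    if not permuted_stats:
        raise ValueError("need at least one permuted statistic")
    higher = sum(1 for stat in permuted_stats if stat > observed)
    return higher / len(permuted_stats)


# Hypothetical toy values, not taken from any data set.
permuted = [1.2, 3.4, 0.7, 5.1, 2.2, 4.8, 0.3, 2.9, 3.3, 1.1]
p_gene = empirical_p_value(4.0, permuted)
```

With a finite number of permutations this estimate can be exactly zero, which is one reason permutation schemes often run more permutations for genes with small p-values.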
Gene analysis | The gene analysis in MAGMA is based on a multiple linear principal components regression [18] model, using an F-test to compute the gene p-value. |
Gene-set analysis | To perform the gene-set analysis, for each gene g the gene p-value $p_g$ computed with the gene analysis is converted to a Z-value $z_g = \Phi^{-1}(1 - p_g)$, where $\Phi^{-1}$ is the probit function. |
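As a minimal sketch of the probit conversion in the row above, the Python standard library is sufficient. The function name `p_to_z` is hypothetical. The code uses the symmetry of the normal distribution to evaluate the negative of the probit at $p_g$, which is mathematically equal to the probit at $1 - p_g$ and avoids rounding $1 - p_g$ to 1 when $p_g$ is very small.

```python
from statistics import NormalDist


def p_to_z(p):
    """Convert a p-value to a Z-value: probit(1 - p), computed as -probit(p)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    # By symmetry of the standard normal, probit(1 - p) == -probit(p).
    return -NormalDist().inv_cdf(p)


z_half = p_to_z(0.5)     # a p-value of 0.5 maps to z = 0
z_small = p_to_z(0.025)  # small p-values map to large positive z
```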
[Figure panels; axis label: Time] | By definition, a p-value is the probability of obtaining a test statistic equal to or more extreme than the observed value if the null hypothesis is true; it increases cumulatively as one progresses through a set of rank-ordered test statistics. |
[Figure panels; axis label: Time] | For a dataset generated from this null model, the p-values should be uniformly distributed from 0 to 1, exclusive: the highest Kendall’s $\tau$ out of N tests should have a p-value of $1/(N + 1)$, the second highest test statistic a p-value of $2/(N + 1)$, and the ith highest test statistic a p-value of $i/(N + 1)$ [35]. |
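The expected p-values under the null model, $i/(N + 1)$ for the ith highest statistic, can be listed directly. The helper name below is hypothetical and the snippet only restates the formula in the row above.

```python
def expected_null_p_values(n_tests):
    """Expected p-value of the i-th highest test statistic under the null: i / (N + 1)."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return [i / (n_tests + 1) for i in range(1, n_tests + 1)]


# For N = 4 tests the expected p-values are evenly spaced and never reach 0 or 1.
expected = expected_null_p_values(4)
```

Comparing the sorted observed p-values against this list is one simple way to check whether a set of p-values behaves as the null model predicts.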
[Figure panels; axis label: Time] | JTK_CYCLE computes the Kendall $\tau$ values for all the reference time series against the signal of interest and then performs a selection step for the lowest p-value (i.e., the highest $\tau$), which we refer to here as the “initial” p-value. |
Microarray metadataset | Choosing a Benjamini-Hochberg adjusted p-value cutoff of 0.05 (i.e., 5%), the number of genes and overlap between methods can be seen in Figs. |
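For readers unfamiliar with the adjustment named in the row above, the following is a minimal sketch of the standard Benjamini-Hochberg step-up procedure. The function name and the toy p-values are hypothetical, and this is not the code used for the analysis described in the source.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values; genes with adjusted p < 0.05 would be called significant.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
n_significant = sum(1 for p in adj if p < 0.05)
```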
Overview | However, for large time series the JTK_CYCLE null distribution is approximately normal, allowing for a convenient, fast p-value estimate. |
Simulated data benchmarks | However, this requires recomputing null distributions via MC sampling because the construction procedure introduces correlations between data points, resulting in p-value underestimates if not corrected. |
Biological validation analysis | First we identify the set of GO terms (pathways) that are significantly enriched within the given set of seed genes using Fisher’s exact test (Bonferroni-corrected p-value < 0.5). |
DIAMOnD implementation | (2) and consequently a lower p-value. |
DIAMOnD implementation | Similarly, between two proteins with the same number of connections to seeds $k_s$, the one with lower k will result in a lower p-value. |
DIAMOnD implementation | Finally, we calculate the exact p-value for the remaining nodes. |
Disease-gene associations | We use a genome-wide significance cutoff of p-value $\leq 5 \cdot 10^{-8}$. |
Interaction patterns of disease proteins within the Interactome | We found that only ~1%-5% of the communities detected by the different methods are significantly enriched (p-value < 0.05, Fisher’s exact test) with any set of disease proteins (Fig. |
Interaction patterns of disease proteins within the Interactome | To evaluate whether a certain protein has more connections to seed proteins than expected under this null hypothesis, we calculate the connectivity p-value, i.e. |
Interaction patterns of disease proteins within the Interactome | 1H shows that the connectivity p-values within the sets of known disease proteins are very significantly (p-value < $10^{-241}$, Kolmogorov-Smirnov test) shifted towards smaller values when compared to the distributions expected for randomly scattered proteins. |
The DIAMOnD algorithm | lowest p-value) is added to the set of seed nodes, increasing their number from $s_0 \to s_1 = s_0 + 1$. |
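One iteration of the step described above can be sketched as follows. The equations for the connectivity p-value are not reproduced in these excerpts, so the sketch assumes it is the hypergeometric tail probability of a node with k links having at least $k_s$ seed neighbours, given N nodes and $s_0$ seeds. That form, the function names, and the toy network are all assumptions made for illustration.

```python
from math import comb


def connectivity_p_value(k, k_s, n_nodes, n_seeds):
    """Assumed form: P(X >= k_s) for X ~ Hypergeometric(n_nodes, n_seeds, k)."""
    upper = min(k, n_seeds)
    tail = sum(comb(n_seeds, i) * comb(n_nodes - n_seeds, k - i)
               for i in range(k_s, upper + 1))
    return tail / comb(n_nodes, k)


def diamond_step(neighbors, seeds, n_nodes):
    """Add the non-seed node with the lowest connectivity p-value to the seed set."""
    best_node, best_p = None, None
    for node, nbrs in neighbors.items():
        if node in seeds:
            continue
        p = connectivity_p_value(len(nbrs), len(nbrs & seeds), n_nodes, len(seeds))
        if best_p is None or p < best_p:
            best_node, best_p = node, p
    if best_node is None:
        raise ValueError("no candidate nodes left")
    return seeds | {best_node}, best_node, best_p


# Hypothetical toy network with edges a-c, a-d, b-c, b-e, d-e and seeds {a, b}.
toy = {"a": {"c", "d"}, "b": {"c", "e"}, "c": {"a", "b"},
       "d": {"a", "e"}, "e": {"b", "d"}}
new_seeds, added, p_added = diamond_step(toy, {"a", "b"}, n_nodes=5)
```

In the toy network, node c has both of its links to seeds and is therefore added first. Under the assumed formula, a node with the same number of seed links but fewer links overall gets a lower p-value, matching the behaviour described in the implementation rows.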
Validating disease modules | the number of seed genes $s$ on which the p-value in Eqs. |
Cancer-type-specific domain mutation landscapes across 21 cancer types | We identified ~100 cancer-type-specific significantly mutated domain instances (SMDs) in 21 cancer types (S2 Table; P-value = $10^{-7}$, Fisher’s Exact test, False Discovery Rate (FDR) < 0.05). |
Cancer-type-specific domain mutation landscapes across 21 cancer types | Enrichment for Cancer Census genes was both strong and significant (~12-fold enrichment; P-value = $5 \times 10^{-34}$, Fisher’s Exact test), and suggests that the remaining 54 genes that are not already known to be cancer drivers represent good candidates. |
Cancer-type-specific domain mutation landscapes across 21 cancer types | Of the 94 genes encoding cancer-type-specific SMDs, 24 were found in the Sleeping Beauty dataset (~3-fold enrichment; P-value = $7 \times 10^{-06}$, Fisher’s Exact test). |
Cancer-type-specific positioning of mutations within a given gene | These 52 genes were enriched for evidence of involvement in cancer, with 16 being Cancer Census genes (enrichment factor ~11.9; P-value = $6.7 \times 10^{-13}$, Fisher’s Exact test), and 15 being candidate cancer genes according to the Sleeping Beauty screen (enrichment factor ~4.5; P-value = $1.9 \times 10^{-6}$, Fisher’s Exact test). |
Cancer-type-specific significantly-mutated domain instance analyses | We chose a P-value threshold ($\alpha = 10^{-7}$) yielding a false discovery rate (FDR) of less than 0.05. |
Cancer-type-specific significantly-mutated domain instance analyses | We made a heat map representation of the hierarchical clustering of SMDs in different cancers using the “heatmap.2” function of the R package gplots, based on the -log(P-value) of each cancer-type-specific domain instance. |
Cancer-type-specific significantly-mutated position based mutational hotspot analyses | We calculated the mutational hotspots within each domain instance encoded by a single gene based on Fisher’s Exact test with a P-value cutoff of 0.01 (FDR < 0.05). |
Data | If the null hypothesis of statistical independence of these two variables cannot be rejected in a chi-squared test at a significance level of p = 0.05, the sex ratio is set to zero, SR(x,t) = 0. Lead/lag indicator. |
Data | The p-value for each lead and lag indicator is the probability of obtaining the observed values for $I_{\text{lead}}(d_i, x)$ and $I_{\text{lag}}(d_i, x)$ from the surrogate data. |
Further comorbidities | In the enlarged sample, only one of the 123 comorbidities found using the inpatient sample has a p-value greater than 0.05 (M23); all others remain significant (p < 0.05). |
Results/Discussion | Lead/lag behavior is identified for male and female DM1 and DM2 patients if the null hypothesis that the observed indicator values for $I_{\text{lead}}(d_i, x)$ and $I_{\text{lag}}(d_i, x)$ can be obtained from randomized surrogate data can be rejected with a p-value of p < 0.01. |
Supporting Information | For the age groups with the smallest p-value, the relative risks RR, patient ages, and the corresponding p-values are shown for DM1 and DM2, respectively. |
Supporting Information | Comorbidity data for DM1 patients, the relative risks RR1, the confidence intervals for RR1, if applicable the p-value for the co-occurrence analysis, and the sex ratio for each diagnosis and age group. |
Supporting Information | Comorbidity data for DM2 patients, the relative risks RR2, the confidence intervals for RR2, if applicable the p-value for the co-occurrence analysis, and the sex ratio for each diagnosis and age group. |
Enrichment of particular logic gates among consistent triplets by hyper-geometric test | Given a set of triplets (e.g., the triplets in which RF1 is MYC) and a particular logic gate g, we calculate a hypergeometric enrichment p-value to describe the enrichment of triplets consistent with the gate g as opposed to other gates as follows: The p-value is equal to [...], where $k_g$ is the number of triplets consistent with the gate g in the set, K is the total number of triplets consistent with the gate g, and N is the total number of triplets. |
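A hypergeometric enrichment p-value of the kind described above is commonly computed as an upper-tail probability. The sketch below assumes that form; the source equation is not available in this excerpt, and the parameter `set_size` (the number of triplets in the chosen set) is a name introduced here for illustration.

```python
from math import comb


def hypergeom_enrichment_p(k_g, set_size, K, N):
    """Assumed upper tail: probability of at least k_g gate-consistent triplets
    when set_size triplets are drawn from N, of which K are consistent with gate g."""
    upper = min(set_size, K)
    tail = sum(comb(K, i) * comb(N - K, set_size - i)
               for i in range(k_g, upper + 1))
    return tail / comb(N, set_size)


# Hypothetical toy counts: 10 triplets overall, 4 consistent with the gate,
# and a set of 3 triplets of which 2 are consistent.
p_gate = hypergeom_enrichment_p(k_g=2, set_size=3, K=4, N=10)
```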
Loregic applications for other regulatory features | Out of these, 162 are consistent with the AND gate (with enrichment by hypergeometric test p-value < $1.3 \times 10^{-3}$), and 159 are consistent with “T = RF1” (with enrichment by hypergeometric test p-value < $7.5 \times 10^{-5}$), making them the dominant logic gates for yeast FFL. |
Loregic applications for other regulatory features | From these triplets, 446 match “T = RF2” when RF2 is MYC (hypergeometric test p-value < $2.5 \times 10^{-124}$), and 201 match “T = ~RF1+RF2” when RF1 is a miRNA and RF2 is MYC (hypergeometric test p-value < $4.1 \times 10^{-25}$). |
Validation | For example, in analyzing 871 AND-gate-consistent triplets, we found that deleting either of their TFs gave rise to substantial down-regulation of their target genes, i.e., the logarithm expression fold changes were significantly less than zero (t-test p-value = 0.068). |
Validation | The two most enriched logic gates are “T = RF1” (133 triplets, hypergeometric test p-value < $4.3 \times 10^{-27}$) and “T = RF1+RF2 (OR)” (211 triplets, hypergeometric test p-value < $1.1 \times 10^{-21}$). |
Discussion | The power of a statistical test generally depends on three factors: first, the sample size; second, statistical significance as measured by the threshold p-value used to assess significance; and third, the effect size, which quantifies departures from the null hypothesis. |
Power calculation for fixed non-neutral model parameters | For the LOGS metacommunity, and when the local dynamics are strictly neutral ($\gamma = 0$ for model HL or c = 0 for model PC), the models are equivalent to the SNM, and the power is equal to the threshold p-value for statistical significance (0.05 in our study). |
Testing the neutral null model | To calculate the p-value of our test, we compare the value of a test statistic for the test data set with values of the test statistic for data sets generated by the null model. |
Testing the neutral null model | The p-value for the test is the fraction of neutral data sets whose maximum likelihood is lower than the maximum likelihood for the test data set, i.e. |
Testing the neutral null model | The neutral model is rejected if the p-value is less than the chosen threshold for statistical significance, which we take to be 0.05. |
Insights into the functions of ORFs with peak in 3’UTR | with lowest p-value) are identified for a range of 31 to 86 ORFs per GO term, with a mean value of 62.3 ORFs per GO term. |
Statistical analysis | All enrichment widgets list a term, a count and an associated p-value. |
Statistical analysis | The p-value is the probability that the result occurs by chance; thus a lower p-value indicates greater enrichment without corrections. |
Statistical analysis | The p-value is calculated using the hypergeometric distribution. |
PEACS: Algorithm | To calculate a p-value, a Monte Carlo sampling algorithm was implemented. |
PEACS: Algorithm | The p-value was defined as the rank of the real PEACS score in the null distribution divided by 10,000. |
Results | The empirical p-value was then determined by ranking the PEACS score for the given perturbation relative to the PEACS scores generated by this Monte Carlo procedure. |
Supporting Information | Displayed are the PEACS scores, uncorrected p-value, Bonferroni-corrected p-value, and significance (* = raw p < 0.01; † = Bonferroni-corrected p < 0.05) for genes with at least 3 knockdown conditions with 2-fold or higher knockdown. |
Pairwise covariation | Kendall's tau-b for each such pair is calculated with an accompanying Z-score from which a p-value can be calculated. |
Supporting Information | PR-PR pairs ranked by Fisher exact test p-value calculated from 2013 HIVDB sequences. |
The pooled proportion $P_p = \frac{P_s \times N_s + P_f \times N_f}{N_s + N_f}$ | From these quantities, a Z-score and p-value can be computed assuming a normal distribution using $Z = (P_f - P_s)/SE$. |
The pooled proportion $P_p = \frac{P_s \times N_s + P_f \times N_f}{N_s + N_f}$ | A p-value is computed for each mutation at all 599 positions for which the mutation is detectable in at least 5 patients who failed therapy. |
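The Z-score and p-value computation above resembles a standard two-proportion z-test, sketched here. The standard error formula $SE = \sqrt{P_p (1 - P_p)(1/N_s + 1/N_f)}$ and the two-sided p-value are assumptions, since the excerpt does not define SE or the sidedness of the test. The function name and the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(count_s, n_s, count_f, n_f):
    """Return (Z, two-sided p-value) for proportions in groups s and f."""
    p_s = count_s / n_s
    p_f = count_f / n_f
    pooled = (p_s * n_s + p_f * n_f) / (n_s + n_f)
    # Assumed standard error of the difference under the pooled proportion.
    se = sqrt(pooled * (1.0 - pooled) * (1.0 / n_s + 1.0 / n_f))
    if se == 0.0:
        raise ValueError("standard error is zero; proportions are all 0 or all 1")
    z = (p_f - p_s) / se
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: a mutation seen in 10 of 100 patients in group s
# and in 30 of 100 patients in group f.
z_score, p_val = two_proportion_z_test(10, 100, 30, 100)
```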
Supervised learning: Classification | The coefficients give the relative importance of each feature to the predictor; associated p-values indicate the confidence in those coefficient values (a large p-value indicates an unreliable estimate of the feature contribution). |
Unsupervised learning | Antibody feature:function and feature:feature correlations were computed over the set of 80 vaccinated subjects and assessed using the Pearson correlation coefficient and p-value. |
Unsupervised learning | For each function and each group, the feature with the largest-magnitude feature:function correlation coefficient was identified; each such feature also had the best feature:function p-value within its group, $\leq 0.001$. |