Abstract | This gene-set analysis also uses a regression structure to allow generalization to analysis of continuous properties of genes and simultaneous analysis of multiple gene sets and other gene properties. |
Abstract | The results show that MAGMA has significantly more power than other tools for both the gene and the gene-set analysis, identifying more genes and gene sets associated with Crohn’s Disease while maintaining a correct type 1 error rate. |
Gene-set analysis | As such, using this variable Z, a very simple intercept-only linear regression model can now be formulated for each gene set s of the form $Z_s = \beta_0 \mathbf{1} + \varepsilon_s$, where $Z_s$ is the subvector of Z corresponding to the genes in s. |
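A minimal sketch of the intercept-only model follows. It makes two assumptions that this row does not state. First, the gene-level values are probit-transformed gene p-values, $Z_g = \Phi^{-1}(1 - p_g)$. Second, the hypothesis of interest is $\beta_0 = 0$ against $\beta_0 > 0$. Under ordinary least squares the estimate of $\beta_0$ is the mean of $Z_s$, and its t-statistic is that of a one-sample t-test. The sketch also treats genes as independent and replaces the t distribution with a normal approximation. Neither simplification is taken from the text, and the function names are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def p_to_z(p):
    """Probit transform of a gene p-value (an assumption for this sketch)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return NormalDist().inv_cdf(1.0 - p)

def self_contained_test(z_s):
    """Fit Z_s = beta_0 * 1 + eps_s by OLS and test beta_0 = 0 vs beta_0 > 0.

    Genes are treated as independent; the one-sided p-value uses a normal
    approximation to the t distribution, adequate only for larger gene sets.
    """
    n = len(z_s)
    if n < 2:
        raise ValueError("need at least two genes in the set")
    beta0_hat = mean(z_s)               # OLS intercept equals the sample mean
    se = stdev(z_s) / math.sqrt(n)      # standard error of that mean
    t_stat = beta0_hat / se
    p_value = 1.0 - NormalDist().cdf(t_stat)
    return beta0_hat, t_stat, p_value

# Hypothetical gene p-values for a five-gene set.
z_s = [p_to_z(p) for p in (0.01, 0.20, 0.03, 0.50, 0.002)]
print(self_contained_test(z_s))
```

With only one parameter, fitting the regression and running the one-sample test are the same computation. The regression form matters once further covariates are added.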
Gene-set analysis | with elements $S_{sg}$ is then defined, with $S_{sg} = 1$ for each gene g in gene set s and 0 otherwise. |
Gene-set analysis | The parameter $\beta_s$ in this model reflects the difference in association between genes in the gene set and genes outside the gene set, and consequently testing the null hypothesis $\beta_s = 0$ against the one-sided alternative $\beta_s > 0$ provides a competitive test. |
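The competitive model is not written out in full in this excerpt. Assuming it is $Z = \beta_0 \mathbf{1} + \beta_s S_s + \varepsilon$, fitted over all genes with $S_{sg} = 1$ for genes in s and 0 otherwise, OLS gives a simple result for a single 0/1 regressor: $\hat\beta_s$ is the difference in mean Z between genes inside and outside the set. Its t-test then coincides with a pooled two-sample t-test. The sketch below assumes independent genes and uses a normal approximation for the one-sided p-value, and the function name is hypothetical.

```python
import math
from statistics import NormalDist, mean

def competitive_test(z, in_set):
    """OLS fit of Z = beta_0 + beta_s * S_s + eps; one-sided test of beta_s > 0."""
    if len(z) != len(in_set):
        raise ValueError("z and in_set must have the same length")
    z_in = [zg for zg, s in zip(z, in_set) if s]
    z_out = [zg for zg, s in zip(z, in_set) if not s]
    n1, n0 = len(z_in), len(z_out)
    if n1 < 1 or n0 < 1 or n1 + n0 < 3:
        raise ValueError("need genes both inside and outside the set")
    m1, m0 = mean(z_in), mean(z_out)
    beta_s = m1 - m0                    # OLS slope for a binary regressor
    # Residual sum of squares: each group is fitted by its own mean.
    rss = sum((x - m1) ** 2 for x in z_in) + sum((x - m0) ** 2 for x in z_out)
    sigma2 = rss / (n1 + n0 - 2)        # two parameters estimated
    se = math.sqrt(sigma2 * (1.0 / n1 + 1.0 / n0))
    t_stat = beta_s / se
    p_value = 1.0 - NormalDist().cdf(t_stat)
    return beta_s, t_stat, p_value
```

Genes outside the set serve as the baseline $\beta_0$. A positive $\beta_s$ therefore means stronger association in the set than in the rest of the genome, not merely some association, which is the distinction between competitive and self-contained tests.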
Introduction | Gene-set analysis methods can be subdivided into self-contained and competitive analyses, with the self-contained type testing whether the gene set contains any association at all, and the competitive type testing whether the association in the gene set is greater than in other genes [7]. |
Introduction | However, one concern with most existing methods is that they first summarize associations per marker before aggregating them to genes or gene sets. |
Introduction | This makes it more difficult to determine the properties of the analysis, such as how the significance of a gene set relates to the significance of its constituent genes, or whether the analysis corrects for a polygenic architecture. |
Signalling entropy’s prognostic power in breast cancer can be represented by a small number of genes | We then refined this gene set by fitting a Cox proportional hazards model on 5-year censored data, using all the identified genes as covariates, and removing genes that were not significantly prognostic independently of the other genes in the set. |
Signalling entropy’s prognostic power in breast cancer can be represented by a small number of genes | Criticism of feature selection for prognostic classifiers based on gene sets ranked by correlation with outcome has stemmed from the considerable discordance of such features between data sets [47, 48]. |
Signalling entropy’s prognostic power in breast cancer can be represented by a small number of genes | By using signalling entropy to refine the prognostic gene set, we found that this instability of the gene set between data sets was reduced. |
Supporting Information | Genes utilised in the gene set enrichment analysis to identify gene sets associated with signalling entropy’s prognostic power in breast and lung cancer. |
The prognostic impact of signalling entropy is associated with genes involved in cancer stem cells and treatment resistance | To determine which gene sets were enriched among the genes prognostically related to signalling entropy independently of other variables, we considered, for breast cancer, a list of 320 genes that were prognostic independently of ER status and grade, and that correlated with signalling entropy, again independently of ER status and grade, in both METABRIC datasets. |
The prognostic impact of signalling entropy is associated with genes involved in cancer stem cells and treatment resistance | We performed a gene set enrichment analysis using Fisher’s exact test, comparing each of these gene lists separately against the Molecular Signatures Database [50] (S6 Table shows the top 10 enriched gene sets for both gene lists). |
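For a single MSigDB gene set, a one-sided Fisher’s exact test for over-representation reduces to the upper tail of the hypergeometric distribution. The sketch below uses the usual 2×2 layout: in list or not in list, crossed with in gene set or not in gene set. It also needs a background universe of genes, whose size the text does not specify. The function name and arguments are hypothetical. In a full screen the resulting p-values would still need correcting for the number of gene sets tested.

```python
from math import comb

def fisher_enrichment_p(overlap, list_size, set_size, universe):
    """One-sided Fisher's exact test: P(X >= overlap) under the hypergeometric null.

    overlap   -- genes in both the query list and the gene set
    list_size -- genes in the query list (e.g. the 320 prognostic genes)
    set_size  -- genes in the gene set being tested
    universe  -- background genes (an assumption; not given in the text)
    """
    if list_size > universe or set_size > universe:
        raise ValueError("list and set must fit inside the universe")
    if not 0 <= overlap <= min(list_size, set_size):
        raise ValueError("impossible overlap")
    # Sum exact integer counts of all tables at least as extreme, then divide once.
    numerator = sum(
        comb(set_size, i) * comb(universe - set_size, list_size - i)
        for i in range(overlap, min(list_size, set_size) + 1)
    )
    return numerator / comb(universe, list_size)

# Hypothetical counts for one gene set against a 320-gene list.
print(fisher_enrichment_p(12, 320, 200, 20000))
```

Working with exact integer counts avoids the rounding error that accumulates when many small floating-point probabilities are summed.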
The prognostic impact of signalling entropy is associated with genes involved in cancer stem cells and treatment resistance | We used these gene lists for the enrichment screens, rather than the genes utilised to derive the SE scores, because they were derived from multiple data sets and are therefore more robustly representative of signalling entropy’s prognostic associations. |
Cryptic 3’SS selection is limited to tumors with mutations in HEAT repeat hotspots | To characterize the roles of the genes affected by cryptic 3’SS usage, we performed a gene set enrichment analysis for the 912 genes that contained the 619 proximal and 417 distal cryptic 3’SSs used significantly more often in the SF3B1 mutant samples (SS File). |
Cryptic 3’SS selection is limited to tumors with mutations in HEAT repeat hotspots | The gene set with the second smallest p-value consists of genes up-regulated in chronic myelogenous leukemia, and the gene set with the seventh smallest p-value contains genes up-regulated in aggressive uveal melanoma samples (GSEA [21], $q < 10^{-35}$). |
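The q-values quoted here come from the GSEA tool [21], whose exact procedure is not described in this text. One widely used way to convert a list of enrichment p-values into q-values is the Benjamini-Hochberg step-up adjustment, sketched below as a generic illustration. It is not claimed to be what GSEA computes, and the function name is hypothetical.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so q never increases as p decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q

print(bh_qvalues([0.01, 0.04, 0.03, 0.005]))
```

The running minimum keeps the adjusted values monotone in the raw p-values. Starting it at 1.0 caps every q-value at 1.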
Cryptic 3’SS selection is limited to tumors with mutations in HEAT repeat hotspots | These results may reflect the fact that we are more likely to identify cryptic 3’SSs in highly expressed genes, which may bias such a gene set enrichment analysis. |
Cryptic 3’SSs are used infrequently relative to canonical 3’SSs | To investigate the potential role of NMD, we identified differentially expressed genes between the SF3B1 mutant and wild-type samples in a joint analysis of all three cancers and performed a gene set enrichment analysis. |
Differential gene expression | Gene set enrichment analysis was performed using GSEA [21]. |
Gene set enrichment for genes with cryptic 3’SS usage | Gene set enrichment for genes with cryptic 3’SS usage |
Gene set enrichment for genes with cryptic 3’SS usage | We performed a gene set enrichment analysis using GSEA [21] for the genes that contained cryptic 3’SSs by combining the genes that contained the 619 proximal (S3 File) and the 417 dis |
Accurate prediction of genetic and environmental parameters requires a small, informative gene set | Accurate prediction of genetic and environmental parameters requires a small, informative gene set |
Accurate prediction of genetic and environmental parameters requires a small, informative gene set | In general, however, our results show that the subset of genes needed to achieve high balanced classification accuracy is neither a handful of biomarkers nor a large gene set, with all cases achieving near-optimal performance with 100 to 400 genes. |
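The text reports balanced classification accuracy but does not define it. Assuming the common definition, it is the mean of the per-class recalls, so a classifier cannot score well by always predicting the majority class. A minimal sketch with a hypothetical function name:

```python
from collections import defaultdict

def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall (a common definition; assumed, not from the text)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    total = defaultdict(int)    # samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)

# Always predicting the majority class: plain accuracy 0.8, balanced accuracy 0.5.
print(balanced_accuracy(["a", "a", "a", "a", "b"], ["a"] * 5))
```

This is why balanced accuracy is a fair yardstick when conditions are unevenly represented among samples, as is typical of compendia of expression profiles.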
Accurate prediction of genetic and environmental parameters requires a small, informative gene set | Altogether, the results suggest that multiple environmental and cellular features of an organism can be precisely predicted from a set of individual classifiers by using a small, targeted gene set. |
Supporting Information | The intersection of the feature gene sets obtained when mutual information (MI) and differential expression (DEG) are used for ranking. |
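Comparing two feature rankings can be sketched as follows: take the top k genes under each score and intersect them. The gene names and scores below are hypothetical. Here the DEG score stands for something like an absolute fold change or a −log10 p-value, which is an assumption. The Jaccard index is added as one convenient summary of the overlap and is not taken from the text.

```python
def top_k(scores, k):
    """Return the k genes with the highest scores (ties broken by gene name)."""
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and the number of genes")
    ranked = sorted(scores, key=lambda gene: (-scores[gene], gene))
    return set(ranked[:k])

def feature_set_overlap(mi_scores, deg_scores, k):
    """Intersection and Jaccard index of the top-k genes under two rankings."""
    mi_top = top_k(mi_scores, k)
    deg_top = top_k(deg_scores, k)
    shared = mi_top & deg_top
    jaccard = len(shared) / len(mi_top | deg_top)
    return shared, jaccard

# Hypothetical scores for four genes.
mi = {"g1": 0.9, "g2": 0.8, "g3": 0.1, "g4": 0.5}
deg = {"g1": 3.0, "g2": 0.2, "g3": 2.5, "g4": 1.0}
print(feature_set_overlap(mi, deg, 2))
```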
Application to pathogen infection experiments | Finally, we observed an enrichment for low off-target siRNAs in this pathway when performing a gene set enrichment analysis [32] (see supplementary S9 Fig). |
Discussion | This selection step helps to achieve reasonably unbiased results with our model, but it also limits the gene sets we can analyze. |
Supporting Information | To see whether KEGG pathways are affected differently by off-targeting siRNAs, we performed a gene set enrichment analysis [32] on the siRNA scores (S10 Fig). |