Signalling entropy’s prognostic power in breast cancer can be represented by a small number of genes | We then refined this gene set by fitting a Cox proportional hazards model on 5-year censored data, using all the identified genes as covariates, and deleting genes that were not significantly prognostic independently of the others in the gene set. |
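The refinement step above can be sketched as two pieces: administrative censoring at 5 years, and an iterative backward elimination that drops the least significant gene until every remaining gene is independently prognostic. This is a minimal sketch under stated assumptions: survival times are in months (so the horizon is 60), "deleting genes" is read as iterative elimination at alpha = 0.05, and `fit_pvalues` is a hypothetical callable (for example, a thin wrapper around the `lifelines` `CoxPHFitter`) that fits the multivariable Cox model and returns each gene's p-value. It is not necessarily the authors' exact procedure.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def censor_at(time: float, event: int, horizon: float = 60.0) -> Tuple[float, int]:
    """Administratively censor one patient at `horizon`.

    Assumes times are in months, so 60.0 corresponds to 5 years.
    Follow-up beyond the horizon is truncated and marked as censored (event = 0).
    """
    if time > horizon:
        return horizon, 0
    return time, event


def backward_eliminate(
    genes: Sequence[str],
    fit_pvalues: Callable[[List[str]], Dict[str, float]],
    alpha: float = 0.05,
) -> List[str]:
    """Drop the least significant gene until all remaining genes have p < alpha.

    `fit_pvalues` is a hypothetical callable: it fits a Cox model with the
    given genes as covariates and returns a p-value per gene.
    """
    kept = list(genes)
    while kept:
        pvals = fit_pvalues(kept)
        worst = max(kept, key=lambda g: pvals[g])
        if pvals[worst] < alpha:
            break  # every remaining gene is prognostic independently of the others
        kept.remove(worst)
    return kept
```

Refitting after each removal matters because a gene's p-value in a multivariable model depends on which other covariates are present.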
Signalling entropy’s prognostic power in breast cancer can be represented by a small number of genes | Feature selection for prognostic classifiers that ranks genes by their correlation with outcome has been criticised because the selected features are highly discordant between data sets [47, 48]. |
Signalling entropy’s prognostic power in breast cancer can be represented by a small number of genes | When we used signalling entropy to refine the prognostic gene set, we found that this instability was reduced. |
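One simple way to quantify the between-data-set discordance discussed above is the Jaccard index of the top-ranked genes from two data sets (1.0 means identical lists, 0.0 means no shared genes). The Jaccard index and the helper `top_n_overlap` are illustrative choices of ours, not necessarily the stability measure used in [47, 48] or in this study.

```python
from typing import Iterable, Sequence


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """Jaccard index |A & B| / |A | B| between two gene collections."""
    sa, sb = set(a), set(b)
    union = sa | sb
    if not union:
        return 1.0  # two empty lists are trivially concordant
    return len(sa & sb) / len(union)


def top_n_overlap(ranked_a: Sequence[str], ranked_b: Sequence[str], n: int) -> float:
    """Concordance of the n top-ranked genes from two data sets (hypothetical helper)."""
    return jaccard(ranked_a[:n], ranked_b[:n])
```

A more stable selection procedure should yield higher overlap when the same pipeline is run on independent cohorts.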
Supporting Information | Genes utilised in the gene set enrichment analysis to identify gene sets associated with signalling entropy’s prognostic power in breast and lung cancer. |
The prognostic impact of signalling entropy is associated with genes involved in cancer stem cells and treatment resistance | To determine which gene sets were enriched among the genes prognostically related to signalling entropy independently of other variables, we considered, for breast cancer, a list of 320 genes that, in both METABRIC datasets, were prognostic independently of ER status and grade and were also correlated with signalling entropy independently of ER status and grade. |
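The selection rule above (pass both filters, in every data set) reduces to set intersections once each per-data-set filter has been applied. In this sketch the ER- and grade-adjusted filtering is assumed to have happened upstream; the function name and the data-set keys are hypothetical.

```python
from typing import Dict, Set


def robust_genes(
    prognostic: Dict[str, Set[str]],
    entropy_correlated: Dict[str, Set[str]],
) -> Set[str]:
    """Genes passing both filters in every data set.

    Keys are data-set names (e.g. the two METABRIC cohorts); values are the
    genes passing each filter in that data set, already adjusted for ER
    status and grade upstream (assumption).
    """
    # Within each data set: prognostic AND correlated with signalling entropy.
    per_dataset = [prognostic[d] & entropy_correlated[d] for d in prognostic]
    # Across data sets: keep only genes that pass in all of them.
    return set.intersection(*per_dataset) if per_dataset else set()
```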
The prognostic impact of signalling entropy is associated with genes involved in cancer stem cells and treatment resistance | We performed a gene set enrichment analysis with Fisher’s exact test, comparing each of these gene lists separately against the Molecular Signatures Database [50] (S6 Table shows the top 10 enriched gene sets for both gene lists). |
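For a single gene list and a single MSigDB gene set, a one-sided Fisher's exact test for over-representation is the upper tail of the hypergeometric distribution. The sketch below computes it with exact integer arithmetic; the parameter names and the choice of gene universe are assumptions, and the one-sided alternative is our reading of an enrichment test. `scipy.stats.fisher_exact(table, alternative="greater")` on the corresponding 2×2 table should give the same value and can serve as a cross-check.

```python
from math import comb


def fisher_enrichment_p(overlap: int, list_size: int, set_size: int, universe: int) -> float:
    """One-sided Fisher's exact (hypergeometric upper-tail) p-value.

    Probability that a random list of `list_size` genes drawn from `universe`
    genes shares at least `overlap` genes with a gene set of `set_size` genes.
    """
    max_k = min(list_size, set_size)
    total = comb(universe, list_size)
    # Sum P(X = k) for k = overlap .. max_k; comb returns 0 for impossible draws.
    tail = sum(
        comb(set_size, k) * comb(universe - set_size, list_size - k)
        for k in range(overlap, max_k + 1)
    )
    return tail / total
```

Repeating this test over every gene set in the database produces one p-value per set, which then needs multiple-testing correction before ranking.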
The prognostic impact of signalling entropy is associated with genes involved in cancer stem cells and treatment resistance | We used these gene lists for the enrichment screens, rather than the genes used to derive the SE scores, because they were derived from multiple data sets and are therefore more robustly representative of signalling entropy’s prognostic associations. |
Cryptic 3’SS selection is limited to tumors with mutations in HEAT repeat hotspots | To characterize the roles of the genes affected by cryptic 3’SS usage, we performed a gene set enrichment analysis for the 912 genes that contained the 619 proximal and 417 distal cryptic 3’SSs used significantly more often in the SF3B1 mutant samples (SS File). |
Cryptic 3’SS selection is limited to tumors with mutations in HEAT repeat hotspots | The gene set with the second smallest p-value consists of genes up-regulated in chronic myelogenous leukemia, and the gene set ranked seventh contains genes up-regulated in aggressive uveal melanoma samples (GSEA [21], q < 10^-35). |
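The q-values reported for such overlap screens are false discovery rate estimates computed from the per-gene-set p-values. A common way to obtain them is the Benjamini-Hochberg procedure, sketched below; we assume BH here because the text does not state how the reported q-values were derived.

```python
from typing import List, Sequence


def bh_qvalues(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (FDR q-values), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0  # q-values are capped at 1
    # Walk from the largest p-value down so that q is monotone in p.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q
```

The running minimum guarantees that a gene set with a smaller p-value never receives a larger q-value than one ranked below it.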
Cryptic 3’SS selection is limited to tumors with mutations in HEAT repeat hotspots | These results may partly reflect a detection bias: cryptic 3’SSs are more likely to be identified in highly expressed genes, which can bias this kind of gene set enrichment analysis. |
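The expression bias can be made concrete with a simple model. Assume that each RNA-seq read spanning the junction independently supports the cryptic 3’SS with probability equal to its usage fraction, and that a site is only called when at least `min_reads` reads support it. Both the binomial model and the threshold are hypothetical simplifications, not the detection criteria used in this study.

```python
from math import comb


def detection_probability(depth: int, usage: float, min_reads: int = 2) -> float:
    """P(at least `min_reads` of `depth` junction reads support the cryptic site).

    Each read is modelled as an independent Bernoulli(usage) draw (assumption).
    """
    # Probability of seeing fewer than min_reads supporting reads.
    below = sum(
        comb(depth, k) * usage**k * (1 - usage) ** (depth - k)
        for k in range(min(min_reads, depth + 1))
    )
    return 1.0 - below
```

For a cryptic site used in 5% of transcripts, this model gives roughly a 9% chance of detection at 10 spanning reads but about 96% at 100 reads, so highly expressed genes dominate the list of detectable sites regardless of their biology.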
Cryptic 3’SSs are used infrequently relative to canonical 3’SSs | To investigate the potential role of NMD, we identified differentially expressed genes between the SF3B1 mutant and wild-type samples in a joint analysis of all three cancers and performed a gene set enrichment analysis. |
Differential gene expression | Gene set enrichment analysis was performed using GSEA [21]. |
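GSEA scores a gene set by walking down a list of genes ranked by differential expression, stepping up at genes in the set and down at genes outside it; the enrichment score (ES) is the largest deviation of this running sum from zero. The sketch below implements the weighted running-sum statistic for a pre-ranked list. The permutation-based significance, normalised enrichment score and the specific settings used with GSEA [21] in this study are omitted, and the function name is ours.

```python
from typing import Sequence, Set


def enrichment_score(
    ranked_genes: Sequence[str],
    scores: Sequence[float],
    gene_set: Set[str],
    weight: float = 1.0,
) -> float:
    """Running-sum enrichment score (ES) of `gene_set` in a pre-ranked list.

    ranked_genes: genes sorted from most up- to most down-regulated.
    scores: the ranking metric of each gene, in the same order.
    weight: exponent on |score| for hits (1.0 = weighted; 0.0 = classic KS).
    """
    hits = [g in gene_set for g in ranked_genes]
    n, n_hits = len(ranked_genes), sum(hits)
    if n_hits == 0 or n_hits == n:
        raise ValueError("gene set must partially overlap the ranked list")
    hit_norm = sum(abs(s) ** weight for s, h in zip(scores, hits) if h)
    if hit_norm == 0:
        raise ValueError("all hit genes have a zero ranking metric")
    miss_step = 1.0 / (n - n_hits)
    running = best = 0.0
    for s, h in zip(scores, hits):
        # Hits step up in proportion to their metric; misses step down evenly.
        running += abs(s) ** weight / hit_norm if h else -miss_step
        if abs(running) > abs(best):
            best = running  # ES is the maximum deviation from zero
    return best
```

A positive ES indicates that the set is concentrated among up-regulated genes (here, those higher in SF3B1 mutant samples, if ranked that way); a negative ES indicates concentration among down-regulated genes.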
Gene set enrichment for genes with cryptic 3’SS usage | Gene set enrichment for genes with cryptic 3’SS usage |
Gene set enrichment for genes with cryptic 3’SS usage | We performed a gene set enrichment analysis using GSEA [21] for the genes that contained cryptic 3’SSs by combining the genes that contained the 619 proximal (S3 File) and the 417 dis |
Accurate prediction of genetic and environmental parameters requires a small, informative gene set | Accurate prediction of genetic and environmental parameters requires a small, informative gene set |
Accurate prediction of genetic and environmental parameters requires a small, informative gene set | In general, however, our results show that the subset of genes needed to achieve high balanced classification accuracy is neither a handful of biomarkers nor a large gene set: in all cases, near-optimal performance was achieved with 100 to 400 genes. |
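Balanced accuracy, the metric referred to above, is the mean of the per-class recalls, so a classifier cannot score well simply by predicting the majority class. A minimal sketch (scikit-learn's `balanced_accuracy_score` computes the same quantity):

```python
from collections import defaultdict
from typing import Hashable, Sequence


def balanced_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Mean per-class recall: every class counts equally regardless of its size."""
    correct: defaultdict = defaultdict(int)
    total: defaultdict = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)
```

For example, predicting the majority class for four samples of class 0 and one of class 1 gives a plain accuracy of 0.8 but a balanced accuracy of 0.5.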
Accurate prediction of genetic and environmental parameters requires a small, informative gene set | Altogether, the results suggest that multiple environmental and cellular features of an organism can be accurately predicted by a set of individual classifiers using a small, targeted gene set. |
Supporting Information | The intersection of the feature gene sets obtained when genes are ranked by mutual information (MI) and by differential expression (DEG). |
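The comparison described in this caption can be sketched as: score each gene against the class label by mutual information, score it separately by a differential expression statistic, take the top k genes under each ranking, and intersect the two sets. In this sketch, expression is assumed to be discretised (e.g. into high/low) before computing a plug-in MI estimate, the DEG scores are assumed to be precomputed (e.g. absolute log fold changes), and all function names are hypothetical.

```python
from collections import Counter
from math import log2
from typing import Dict, Hashable, Sequence, Set


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Plug-in mutual information (bits) between two discrete variables."""
    n = len(x)
    pxy = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    return sum(
        (c / n) * log2((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


def top_k(scores: Dict[str, float], k: int) -> Set[str]:
    """The k genes with the highest scores."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


def feature_overlap(mi_scores: Dict[str, float], deg_scores: Dict[str, float], k: int) -> Set[str]:
    """Genes selected under both the MI ranking and the DEG ranking."""
    return top_k(mi_scores, k) & top_k(deg_scores, k)
```

MI can pick up genes whose relationship to the label is not a simple mean shift, which is one reason the two rankings need not select the same genes.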