Categorization of gene expression data | WT samples were identified from experiments that did not undergo genetic or environmental perturbations, across the three platforms (7 for the Affymetrix E. coli Antisense Genome Array, 6 for the Affymetrix E. coli Genome 2.0 Array, and 6 for RNA-Seq). |
Introduction | Indeed, after aggregating all high-throughput transcriptional data currently available for E. coli, the most well-studied model microbe, we are still limited to a few thousand microarray or RNA-Seq experiments that cover more than 30 strains, a dozen different media and a multitude of other genetic (knockouts, over-expressions, re-wirings) or environmental (carbon limitation, chemicals, abiotic factors) perturbations. |
Introduction | microarrays vs. RNA-Seq), in different labs and under different environmental conditions, appropriate normalization schemes are both of paramount importance and of added complexity. |
Introduction | To achieve this, we have extended, normalized and annotated a compendium that was compiled recently [29] to incorporate all published high-quality Affymetrix microarray and RNA-Seq datasets in E. coli (2258 samples in total, Fig. |
Methods | We downloaded 83 RNA-Seq E. coli transcriptional profiles from 17 different GEO entries [30], covering 8 strains and LB and MOPS media, with wild-type (WT), gene knockout (KO), double-KO and environmentally perturbed samples. |
Methods | The resulting RNA-Seq dataset was composed of 64 samples of 4725 genes. |
Methods | We integrated the RNA-Seq dataset (64 samples) into the E. coli Microarray Compendium (EcoMAC), which consists of 2198 microarrays of 4189 genes for which raw files were downloaded and normalized by the RMA (robust multichip average) method [29]. |
Abstract | Next, using human ENCODE ChIP-Seq and TCGA RNA-Seq data, we are able to demonstrate how Loregic characterizes complex circuits involving both proximally and distally regulating transcription factors (TFs) and also miRNAs. |
Discussion | Given the multitude of high-quality expression (e.g., RNA-seq, small RNA-seq) and regulation (e.g., ChIP-seq, CLIP-seq, DNase-seq) datasets available, Loregic can be further used to study cooperation among other regulatory elements such as splicing factors and long non-coding RNAs, or RF cooperation during other biological processes such as embryonic development for the model organisms in the modENCODE project [44]. |
Gene expression, transcription factor and miRNA datasets | In the study of gene expression in human leukemia, we obtained RNA-seq RPKM expressions from The Cancer Genome Atlas Data Portal [51] for 19,798 protein-coding genes and 705 miRNAs across 197 and 188 AML samples, respectively. |
Introduction | In this study, we use data derived from ChIP-Seq and RNA-Seq experiments to predict the cooperative patterns between RFs as they co-regulate the expression of target genes. |
Introduction | On a genome-wide scale ChIP-Seq provides regulatory information about wiring between RFs and targets, while RNA-Seq provides gene expression data; by combining these two data types we are able to go beyond the regulatory activities of individual RFs and investigate the relationships between higher order RF groups. |
Read alignment | RNA-seq reads were aligned to the human genome (hg19) using STAR 2.3.0e (--alignSJDBoverhangMin 1 --seedSearchStartLmax 12 --alignSplicedMateMapLminOverLmate 0.08 --outFilterScoreMinOverLread 0.08 --outFilterMatchNminOverLread 0.08 --outFilterMultimapNmax 100 --outFilterIntronMotifs RemoveNoncanonicalUnannotated --outSJfilterOverhangMin 6 6 6 6) and a splice junction database consisting of junctions from Gencode, UCSC knownGene, AceView, lincRNAs, and H-InV [19,35–39]. |
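The alignment options listed above can be collected into a single command line. The following is a minimal sketch that only assembles and prints such a command; it does not run the aligner. The index directory and FASTQ file names are hypothetical placeholders, the use of two read files (paired-end input) is an assumption, and the splice junction database is left out because it would typically be supplied when the genome index is built. Flag spellings follow the STAR manual.

```python
import shlex

# Hypothetical placeholders: the source gives neither the index location
# nor the read file names, and does not say whether reads were paired.
GENOME_DIR = "star_index_hg19"
READS = ["sample_R1.fastq", "sample_R2.fastq"]

# Alignment options and values as reported in the text.
STAR_OPTIONS = [
    ("--alignSJDBoverhangMin", ["1"]),
    ("--seedSearchStartLmax", ["12"]),
    ("--alignSplicedMateMapLminOverLmate", ["0.08"]),
    ("--outFilterScoreMinOverLread", ["0.08"]),
    ("--outFilterMatchNminOverLread", ["0.08"]),
    ("--outFilterMultimapNmax", ["100"]),
    ("--outFilterIntronMotifs", ["RemoveNoncanonicalUnannotated"]),
    ("--outSJfilterOverhangMin", ["6", "6", "6", "6"]),
]


def build_star_command(genome_dir, reads, options=STAR_OPTIONS):
    """Return the STAR invocation as an argument list (nothing is executed)."""
    cmd = ["STAR", "--genomeDir", genome_dir, "--readFilesIn", *reads]
    for flag, values in options:
        cmd.append(flag)
        cmd.extend(values)
    return cmd


command = build_star_command(GENOME_DIR, READS)
print(shlex.join(command))
```

Keeping the options in a list of (flag, values) pairs makes the multi-valued `--outSJfilterOverhangMin` setting explicit and lets the same list be reused for every sample.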
Supporting Information | Number of uniquely mapped RNA-seq reads from STAR alignment. |
Supporting Information | Grey bars indicate frequency of SF3B1 mutant allele in RNA-seq data. |
Supporting Information | SF3B1 mutated samples have columns for frequency of SF3B1 mutation in RNA-seq data, mutation type, codon change and whether the mutation is in the HEAT 5–9 repeats. |
Signalling Entropy | We note that not all proteins in the PIN have a corresponding probe in the microarray or sequence in the RNA-seq data; consequently, the PIN we consider is the maximally connected component of the original PIN after the removal of missing proteins. |
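The filtering step described above can be illustrated as follows. This is a minimal sketch, assuming the PIN is given as a list of undirected protein pairs and the expression data as a set of measured protein identifiers; the function name and the toy identifiers (`P1` to `P6`) are hypothetical, and "maximally connected component" is read here as the largest connected component by node count.

```python
from collections import defaultdict, deque


def largest_component_after_filter(edges, measured):
    """Return the nodes of the largest connected component of the network
    restricted to proteins that appear in `measured`."""
    measured = set(measured)
    adjacency = defaultdict(set)
    for a, b in edges:
        # Keep an interaction only if both partners have expression data.
        if a in measured and b in measured and a != b:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen = set()
    best = set()
    for start in list(adjacency):
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in component:
                    component.add(neighbour)
                    queue.append(neighbour)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


# Toy network: P3 has no expression measurement, so removing it splits
# the chain into {P1, P2} and {P4, P5, P6}.
edges = [("P1", "P2"), ("P2", "P3"), ("P3", "P4"),
         ("P4", "P5"), ("P5", "P6"), ("P4", "P6")]
measured = {"P1", "P2", "P4", "P5", "P6"}
component = largest_component_after_filter(edges, measured)
print(sorted(component))
```

Proteins left without any interaction partner after filtering are dropped, since they cannot belong to a component larger than themselves.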
Signalling entropy is prognostic in stage I lung adenocarcinoma | To evaluate the clinical associations of our measure we first computed signalling entropy for each microarray sample in The Director’s Challenge dataset profiling 398 tumours [42], and for the 455 lung adenocarcinoma RNA-seq tumour samples downloaded from The Cancer Genome Atlas (TCGA) database (http://cancergenome.nih.gov/). |
Signalling entropy is prognostic in stage I lung adenocarcinoma | It is of particular note that signalling entropy is significantly prognostic whether computed from microarray or RNA-seq data sets; this result attests to the biological relevance of our measure, which is not masked by experimental technique. |