Abstract | Robust methods for identifying patterns of expression in genome-wide data are important for generating hypotheses regarding gene function. |
Author Summary | Understanding how such rhythms couple to biological processes requires statistical methods that can identify cycling time series in typical genome-wide data. |
Conclusions | In this paper, we compare methods for detecting rhythmic time series in genome-wide expression data. |
Discussion | These approaches are general and can be applied to detecting periodic behavior in a wide range of contexts, but we focus on time series representative of genome-wide expression data. |
Discussion | [28] recently reviewed a number of earlier studies of rhythm detection methods and selected four algorithms for comparison (de Lichtenberg, Lomb-Scargle, JTK_CYCLE, and persistent homology) based on their mathematical properties and applicability to genome-wide expression data. |
Discussion | By contrast, here we focus on discovering rhythmic time series that represent only a fraction of a genome-wide dataset. |
Introduction | Despite the decreasing cost of measuring transcript levels, profiling time series genome-wide continues to present formidable challenges: tissue-specific samples are difficult to collect, and, in contrast to imaging, measuring transcript levels is destructive in nature, requiring separate samples for each time point. |
Simulated data benchmarks | We use it to further assess the importance of considering asymmetric waveforms, and we explore how multiple hypothesis correction impacts the results when the true positives represent a relatively small fraction of the simulated time series, as we expect to be the case in genome-wide studies. |
Simulated data benchmarks | Furthermore, we focus on genome-wide experiments where the experimental design is such that there is no meaningful difference between data collected over multiple periods and data collected at the same sampling rate in replicate over a single period. |
Simulated data benchmarks | This composition was chosen to reflect a genome-wide dataset. |
Abstract | We were able to impute the genomes of 1,317 South Dakota Hutterites, who had genome-wide genotypes for ~300,000 common single nucleotide variants (SNVs), from 98 whole genome sequences. |
Author Summary | To overcome this limitation and design cost-efficient studies, we developed a two-step method: sequencing of relatively few members of a well-characterized founder population followed by pedigree-based whole genome imputation of many other individuals with genome-wide genotype data. |
Framework Genome-Wide Genotypes | Framework Genome-Wide Genotypes |
Introduction | To address the limitations of LD- and pedigree-based imputation methods, we developed PRIMAL (PedigRee IMputation ALgorithm), a fast phasing and imputation algorithm, to assign genotypes at 7 million bi-allelic variants that were discovered in the whole genome sequences of 98 Hutterites to an additional set of 1,317 Hutterites who had genome-wide genotypes for ~300,000 common single nucleotide variants (SNVs). |