Index of papers in PLOS Comp. Biol. that mention
  • sequencing data
William F. Flynn, Max W. Chang, Zhiqiang Tan, Glenn Oliveira, Jinyun Yuan, Jason F. Okulicz, Bruce E. Torbett, Ronald M. Levy
Abstract
To examine covariation of mutations between two different sites using deep sequencing data, we developed an approach to estimate the tight bounds on the two-site bivariate probabilities in each viral sample, and the mutual information between pairs of positions based on all the bounds.
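The excerpt above does not spell out the bounding procedure, so the following is only a minimal sketch of the general idea, assuming the classical Fréchet bounds on a joint probability given two single-site frequencies. The function names and the example frequencies are hypothetical and are not taken from the paper.

```python
import math


def frechet_bounds(p_a, p_b):
    """Bounds on P(A and B) when only the marginals P(A) and P(B) are known.

    Assumption: these are the classical Frechet bounds, used here for
    illustration; the paper's own estimator may differ.
    """
    lower = max(0.0, p_a + p_b - 1.0)
    upper = min(p_a, p_b)
    return lower, upper


def mutual_information(p_ab, p_a, p_b):
    """Mutual information (in nats) between two binary sites.

    p_ab is the joint frequency of both mutations; p_a and p_b are the
    single-site mutation frequencies.
    """
    joint = {
        (1, 1): p_ab,
        (1, 0): p_a - p_ab,
        (0, 1): p_b - p_ab,
        (0, 0): 1.0 - p_a - p_b + p_ab,
    }
    marg_a = {1: p_a, 0: 1.0 - p_a}
    marg_b = {1: p_b, 0: 1.0 - p_b}
    mi = 0.0
    for (a, b), p in joint.items():
        # Cells with zero (or rounding-level negative) mass contribute nothing.
        if p > 0.0:
            mi += p * math.log(p / (marg_a[a] * marg_b[b]))
    return mi


# Hypothetical single-site mutation frequencies from one sample.
p_a, p_b = 0.7, 0.6
lower, upper = frechet_bounds(p_a, p_b)
mi_at_lower = mutual_information(lower, p_a, p_b)
mi_at_upper = mutual_information(upper, p_a, p_b)
mi_independent = mutual_information(p_a * p_b, p_a, p_b)
```

When the interval between the lower and upper bound is narrow, the single-site frequencies alone constrain the joint frequency tightly, which is the situation in which pairwise information can be recovered without read linkage.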
Conventional sequence data
Variation in the deep sequencing data was compared to protease sequence variation in the Stanford HIV Database and Gag/Gag-Pol sequence variation in the Los Alamos National
Conventional sequence data
Entries with available nucleotide data were translated using IUPAC standard protein codes and, if any ambiguities existed in the translated sequence or nucleotide data was unavailable for that sequence, corresponding protein sequence data was used to fill in any ambiguities in the translated sequence.
Correlation analysis in protease using bound estimates captures known pair correlations
Findings in these publications potentially serve as a benchmark that can be used to estimate how well we are able to recover information about correlated mutations from protease and gag deep sequencing data using the bounding procedure.
Covariation of mutations in Gag-protease proteins
Identifying pairs of correlated mutations from deep sequencing data is not as straightforward as when given conventional multiple sequence alignments.
Deep sequencing data
Discussion
Previous to this study, there was no direct method to extract two-site frequency counts from viral deep sequencing data of gag and pol given the absence of sequence linkage due to the short sequencing reads.
Discussion
Nevertheless, the procedure we have developed for identifying covariation from deep sequencing data with short reads used on single site mutation frequencies and bounds on joint marginals serves as a good starting point upon which future studies may expand datasets containing many deep sequenced samples.
Mutations in protease and gag
In Fig 2, the observed variation in the deep sequencing data (top) is shown above with the variation present in 2378 drug-naive gag sequences from the Los Alamos HIV sequence database (bottom) (http://www.
Mutations in protease and gag
We identified considerably more mutations from our deep sequencing data as compared with LANL data in the following regions of gag: in matrix both near the matrix/capsid (MA/CA) cleavage site and scattered throughout the central portion of MA; in p2 and nucleocapsid (NC) on either side of the p2/NC cleavage site; and throughout the first half of p6.
sequencing data is mentioned in 17 sentences in this paper.
Gabriel R. A. Margarido, David Heckerman
Abstract
The method can be used for whole genome shotgun (WGS) sequencing data.
Acknowledgments
The switchgrass sequence data were produced by the US Department of Energy Joint Genome Institute http://www.jgi.doe.gov/ in collaboration with the user community.
Simulations
We simulated coverage levels varying from 10X per copy, which is typically less than optimal, to a coverage of 75X per haploid copy, which is higher than the usually employed datasets, although currently practicable given the continuously decreasing costs of next-generation sequencing data.
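As a rough illustration of what these coverage levels imply for a simulation, the sketch below converts a per-copy coverage into a read count using the standard relation depth = reads × read length / genome size. The function name, the ploidy multiplier, and all numeric values are assumptions for illustration only; the excerpt does not state how the simulated reads were generated.

```python
import math


def reads_needed(per_copy_coverage, ploidy, haploid_genome_bp, read_length_bp):
    """Number of reads giving the requested coverage per haploid copy.

    Assumption: total depth is per-copy coverage times ploidy, and reads
    are of uniform length; both are simplifications for illustration.
    """
    total_bases = per_copy_coverage * ploidy * haploid_genome_bp
    return math.ceil(total_bases / read_length_bp)


# Hypothetical example: 1 Mb haploid genome, tetraploid, 100 bp reads.
low = reads_needed(10, 4, 1_000_000, 100)
high = reads_needed(75, 4, 1_000_000, 100)
```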
Switchgrass Dataset
Next, we downloaded from the NCBI Sequence Read Archive whole genome shotgun reads from the same genotype, obtained through the Illumina HiSeq 2000 platform, in a total of 106.4 Gb of sequence data, and aligned all read pairs against the reference genome.
Wheat Dataset
To investigate the effectiveness of ConPADE in that situation, as a validation procedure, we initially applied it to sequence data from the large arm of chromosome 5D—that is, chromosome 5 from the subgenome D. This data contains 236.8 Mb of sequence, with a contig L50 of 2,647 bp, and is expected to cover roughly half of the complete long arm of chromosome 5D.
sequencing data is mentioned in 5 sentences in this paper.
Stuart Aitken, Shigeyuki Magi, Ahmad M. N. Alhendi, Masayoshi Itoh, Hideya Kawaji, Timo Lassmann, Carsten O. Daub, Erik Arner, Piero Carninci, Alistair R. R. Forrest, Yoshihide Hayashizaki, Levon M. Khachigian, Mariko Okada-Hatakeyama, Colin A. Semple, the FANTOM Consortium
Discovery of non-coding RNA genes active in the immediate-early response
Of the four established ID-miRs for which we have CAGE data for the precursor, only two (hsa-mir-320a and hsa-mir-155) satisfied the expression criterion in the small RNA sequencing data.
peak category.
Mature miRNA expression in MCF7 cells in response to HRG in the small RNA sequencing data.
peak category.
Eleven ID-miRs were present in the small RNA sequencing data with expression above the minimum threshold.
sequencing data is mentioned in 3 sentences in this paper.
Christopher DeBoever, Emanuela M. Ghia, Peter J. Shepard, Laura Rassenti, Christian L. Barrett, Kristen Jepsen, Catriona H. M. Jamieson, Dennis Carson, Thomas J. Kipps, Kelly A. Frazer
Abstract
Using transcriptome sequencing data from chronic lymphocytic leukemia, breast cancer and uveal melanoma tumor samples, we show that hundreds of cryptic 3’ splice sites (3’SSs) are used in cancers with SF3B1 mutations.
Code, data, and reproducibility
Sequencing data is available through dbGaP (phs000767).
Sample selection
RNA sequencing data was downloaded from CGHub [32].
sequencing data is mentioned in 3 sentences in this paper.