Abstract | Measures of allele and haplotype diversity, which are fundamental properties in population genetics, often follow heavy-tailed distributions. |
Abstract | We have therefore developed a power-law-based estimator of allele and haplotype diversity that accommodates heavy tails, using the concepts of regular variation and occupancy distributions. |
Abstract | Application of our estimator to 6.59 million donors in the Be The Match Registry revealed that haplotypes follow a heavy-tailed distribution across all ethnicities: for example, 44.65% of the European American haplotypes are represented by only one individual. |
Author Summary | The distributions of haplotypes and species tend to be heavy-tailed. |
Author Summary | Accurate measures of diversity are difficult to achieve given that a limited number of common haplotypes represent the majority of the population, whereas the major contributor to haplotype diversity comes from unique haplotypes that are “rare” and present in only a fraction of the population. |
Author Summary | Here we use a power-law methodology that accommodates heavy tails to estimate both the population coverage by ethnicity in the US and the genetic diversity of alleles and haplotypes. |
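To make the notion of an occupancy distribution concrete, the following minimal Python sketch tabulates how many distinct haplotypes are observed exactly k times and reports the share observed only once. It is an illustration under stated assumptions, not the power-law estimator described above: the function names and the toy counts are hypothetical and are not taken from the registry data.

```python
from collections import Counter


def occupancy_counts(haplotype_counts):
    """Map k -> number of distinct haplotypes observed exactly k times."""
    return Counter(haplotype_counts.values())


def singleton_fraction(haplotype_counts):
    """Fraction of distinct haplotypes carried by exactly one individual."""
    occ = occupancy_counts(haplotype_counts)
    distinct = sum(occ.values())
    return occ.get(1, 0) / distinct if distinct else 0.0


# Hypothetical toy sample: haplotype label -> number of carriers.
toy_counts = {
    "hapA": 50, "hapB": 20, "hapC": 3,
    "hapD": 1, "hapE": 1, "hapF": 1, "hapG": 1,
}

occ = occupancy_counts(toy_counts)
frac = singleton_fraction(toy_counts)
print(dict(occ))
print(f"singleton fraction: {frac:.4f}")
```

In this toy sample, two common haplotypes account for most carriers, while more than half of the distinct haplotypes are singletons, which is the qualitative pattern a heavy tail produces.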
Abstract | Unfortunately, such collapse is often not ideal, as keeping contigs separate can lead both to improved assembly and to insights about how haplotypes influence phenotype. |
Discussion | In the cancer framework, a single haplotype is usually expected to be present in multiple copies. |
Introduction | insights about how haplotypes influence phenotype. |
Introduction | Our goal is to identify the number of potentially collapsed haplotypes in any given contig, affording information for subsequent efforts aimed at properly separating distinct genomic segments. |
Switchgrass Dataset | A closer look at the reads aligned against a region containing some of the variants in that contig provides a picture of how the alleles are organized in haplotypes (S7 Fig). |
Switchgrass Dataset | Interestingly, in this case, most minor alleles are linked to each other in the same reads, forming a single haplotype. |
Switchgrass Dataset | This haplotype is present in a roughly 1:5 ratio relative to the underlying reference sequence. |
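A ratio of this kind can be read off by counting reads that carry the linked minor alleles against reads that do not. The sketch below is a hypothetical illustration only: the read representation (a dictionary from variant position to observed base), the function name, and the toy data are assumptions, and this is not the procedure used to produce the figure cited above.

```python
def haplotype_read_counts(reads, minor_alleles):
    """Count reads supporting the minor-allele haplotype versus all others.

    reads: list of dicts mapping variant position -> observed base.
    minor_alleles: dict mapping variant position -> minor-allele base.
    A read counts as 'minor' only if it shows the minor allele at every
    variant site it covers; reads covering no variant site are skipped.
    Reads with a mix of alleles are counted as 'other' (a simplification).
    """
    minor = other = 0
    for read in reads:
        covered = [pos for pos in read if pos in minor_alleles]
        if not covered:
            continue
        if all(read[pos] == minor_alleles[pos] for pos in covered):
            minor += 1
        else:
            other += 1
    return minor, other


# Hypothetical toy data: two linked variant sites.
minor_alleles = {101: "T", 140: "G"}
reads = (
    [{101: "T", 140: "G"}] * 2     # reads carrying both minor alleles
    + [{101: "C", 140: "A"}] * 10  # reads matching the reference
)

minor, other = haplotype_read_counts(reads, minor_alleles)
print(f"minor:reference = {minor}:{other}")
```

With these toy reads the counts are 2 and 10, that is, a 1:5 ratio between the minor-allele haplotype and the reference sequence.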