Discussion | Our concern is based upon the mechanics of the EM algorithm, which trims the tail of the haplotype frequency distribution numerous times during the estimation process to generate a parsimonious and tractable candidate set; without trimming, the number of potential haplotypes greatly exceeds the number of donors. |
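The trimming step described above can be illustrated with a minimal sketch. It assumes one simple rule: drop every haplotype whose estimated frequency falls below a cutoff, then renormalize the remainder. The source does not state the actual trimming rule, so the function name `trim_tail`, the cutoff `min_freq`, and the toy frequencies are all hypothetical.

```python
def trim_tail(freq, min_freq):
    """Drop haplotypes below min_freq and renormalize the rest.

    Assumption: a fixed frequency cutoff; the source does not specify
    how the tail is actually trimmed.
    """
    kept = {h: f for h, f in freq.items() if f >= min_freq}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("cutoff removed every haplotype")
    # Renormalize so the retained frequencies sum to 1 again.
    return {h: f / total for h, f in kept.items()}


# Toy frequencies (hypothetical): haplotype D sits in the tail.
toy_freq = {"A": 0.60, "B": 0.30, "C": 0.09, "D": 0.01}
trimmed = trim_tail(toy_freq, min_freq=0.05)
```

Applying such a rule after each round of estimation shrinks the candidate set, but it also discards rare haplotypes whose true frequency may be nonzero, which is the concern raised in the text.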
Haplotype and Allele Frequency Data | Six-locus high-resolution HLA A~C~B~DRBX~DRB1~DQB1 (where DRBX = DRB3/4/5) haplotype frequencies were estimated across 21 race groups using an EM algorithm and 6.59 million donor HLA typings from the Be The Match Registry; complete details regarding the data and estimation are provided by Gragert [11]. |
Haplotype and Allele Frequency Data | In brief, an EM algorithm was used to resolve uncertainties in allele and chromosomal phase for a mixed-resolution set of donor HLA typings (serology, sequence-specific oligonucleotide/primer, and sequence-based typing). |
Haplotype and Allele Frequency Data | The only notable change from the Gragert methodology was that, in the last iteration of the EM algorithm, a winner-take-all approach was applied in which each donor contributed 1 unit of probability mass to their most likely pair of haplotypes. This is in contrast to each donor distributing 1 unit of probability mass across the range of haplotype pairs consistent with their HLA typing, in proportion to each pair's conditional likelihood. |
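The two ways of allocating a donor's unit of probability mass can be contrasted in a small sketch. It is an illustration under stated assumptions, not the registry's implementation: each donor is represented by a hypothetical list of haplotype pairs consistent with their typing, pair likelihoods are assumed to follow Hardy-Weinberg proportions, and the names `pair_weights` and `em_step` and the toy donors are invented for the example.

```python
from collections import defaultdict


def pair_weights(pairs, freq):
    """Conditional likelihood of each candidate pair (Hardy-Weinberg assumed)."""
    raw = [freq[a] * freq[b] * (1 if a == b else 2) for a, b in pairs]
    total = sum(raw)
    if total == 0:
        # No pair has support under the current frequencies: fall back to uniform.
        return [1.0 / len(pairs)] * len(pairs)
    return [w / total for w in raw]


def em_step(donors, freq, winner_take_all=False):
    """One EM iteration; each donor contributes exactly 1 unit of probability mass."""
    counts = defaultdict(float)
    for pairs in donors:
        w = pair_weights(pairs, freq)
        if winner_take_all:
            # Give the whole unit to the single most likely pair.
            best = max(range(len(pairs)), key=w.__getitem__)
            w = [1.0 if i == best else 0.0 for i in range(len(pairs))]
        for (a, b), wi in zip(pairs, w):
            counts[a] += wi
            counts[b] += wi
    n_chromosomes = 2 * len(donors)
    return {h: counts[h] / n_chromosomes for h in freq}


# Toy data (hypothetical): the second donor's typing is ambiguous.
donors = [
    [("H1", "H2")],
    [("H1", "H2"), ("H3", "H4")],
    [("H1", "H1")],
]
freq = {h: 0.25 for h in ("H1", "H2", "H3", "H4")}
for _ in range(5):
    freq = em_step(donors, freq)              # proportional allocation
final = em_step(donors, freq, winner_take_all=True)  # last iteration only
```

In this toy case the winner-take-all pass assigns the ambiguous donor entirely to the H1/H2 pair, so H3 and H4 end with a frequency of exactly zero rather than a small positive value.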
Introduction | Given a proper candidate set of haplotypes, EM algorithms work well to estimate the distribution of this defined population, which becomes the reference data for computing accurate donor/patient match predictions. |
Introduction | Currently, this exhaustive set of haplotypes is heavily trimmed prior to estimation with the EM algorithm, but the trimming strategies are not quantitatively informed. Variation in Diversity Among Populations: ancestral migration patterns have resulted in different patterns of allele and haplotype diversity among ethnic groups, implying that sampling depth requirements are likely to vary by population. |
Methodology Validation | To further validate the methodology in a realistic regime, we used 7.8 million samples from the Be The Match Registry, from which the EM algorithm estimated 88,621 haplotypes for the European American population, and used a subsample of half a million haplotypes to estimate the parameters of the distribution (Xmax, Xmin and a). |
Implementation | For realistic data sets of 300 features and 200 cells per knockdown, NEMix estimation took nine minutes on average for S-gene networks, with an average of 13 iteration steps until convergence of the EM algorithm. |
Introduction | For inference of the hidden pathway state, we developed an EM algorithm [21]. |
NEMix inference | For this task we have developed an EM algorithm. |
Network inference under unknown pathway activity | For the knockdown probability of the hidden variable, p0, we implemented an EM algorithm, which jointly estimates p0 from each cell’s observation and the connections of observations to signaling genes, θ. |
Simulation study | For NEMs and sc-NEMs, we used maximum likelihood estimation to infer θ; in NEMix, it is estimated within an EM algorithm. |
Supporting Information | The EM algorithm was used to assign cells into the GFP+ mode (colored). |
Supporting Information | Statistics from the EM algorithm were used to extract the mean (circle) and major axes (lines) of the GFP+ mode. |
Supporting Information | The EM algorithm was used to assign cells to the GFP+ subpopulation (colored). |
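Assigning cells to a GFP+ mode with EM can be sketched as fitting a two-component Gaussian mixture and labelling each cell by its more probable component. This is a deliberately simplified, one-dimensional illustration: the source extracts a mean and major axes, which suggests a two-dimensional fit, and it does not state the mixture model used. The function names, the initialization, and the toy intensities below are assumptions.

```python
import math


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(xs, n_iter=50):
    """EM for a 1-D mixture of two Gaussians (component 1 = brighter mode)."""
    # Assumed initialization: means at the data extremes, equal weights.
    mu = [min(xs), max(xs)]
    spread = (max(xs) - min(xs)) / 4.0 or 1.0
    sigma = [spread, spread]
    weight = [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: responsibility of each component for each cell.
        resp = []
        for x in xs:
            p = [weight[k] * normal_pdf(x, mu[k], sigma[k]) for k in range(2)]
            s = p[0] + p[1]
            resp.append([p[0] / s, p[1] / s] if s > 0 else [0.5, 0.5])
        # M-step: re-estimate weight, mean, and spread of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # component lost all support; keep old parameters
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sigma[k] = max(math.sqrt(var), 1e-6)  # floor avoids a zero spread
            weight[k] = nk / len(xs)
    return mu, sigma, weight, resp


# Toy log-intensities (hypothetical): five dim cells, five bright cells.
intensities = [0.8, 0.9, 1.0, 1.1, 1.2, 4.8, 4.9, 5.0, 5.1, 5.2]
mu, sigma, weight, resp = fit_two_gaussians(intensities)
gfp_positive = [r[1] > 0.5 for r in resp]  # True where the bright mode wins
```

In two dimensions the same scheme would use a covariance matrix per component, and the major axes of a mode would follow from the eigenvectors of its covariance.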