Perturbation-Expression Analysis Identifies RUNX1 as a Regulator of Human Mammary Stem Cell Differentiation
Ethan S. Sokol, Sandhya Sanduja, Dexter X. Jin, Daniel H. Miller, Robert A. Mathis, Piyush B. Gupta

Abstract

The search for genes that regulate stem cell self-renewal and differentiation has been hindered by a paucity of markers that uniquely label stem cells and early progenitors. To circumvent this difficulty we have developed a method that identifies cell-state regulators without requiring any markers of differentiation, termed Perturbation-Expression Analysis of Cell States (PEACS). We have applied this marker-free approach to screen for transcription factors that regulate mammary stem cell differentiation in a 3D model of tissue morphogenesis and identified RUNX1 as a stem cell regulator. Inhibition of RUNX1 expanded bipotent stem cells and blocked their differentiation into ductal and lobular tissue rudiments. Reactivation of RUNX1 allowed exit from the bipotent state and subsequent differentiation and mammary morphogenesis. Collectively, our findings show that RUNX1 is required for mammary stem cells to exit a bipotent state, and provide a new method for discovering cell-state regulators when markers are not available.

Author Summary

The discovery of stem cell regulators is a major goal of biological research, but progress is often limited by a lack of definitive markers capable of distinguishing stem cells from early progenitors. Even in cases where markers have been identified, they often only enrich for certain cell states and do not uniquely identify states. While useful in some contexts, such enriching markers are ineffective tools for discovering genes that regulate the transition of cells between states. We present a method for identifying these cell state regulatory genes without the need for predetermined markers, termed Perturbation-Expression Analysis of Cell States (PEACS). PEACS uses a novel computational approach to analyze gene expression data from perturbed cellular populations, and can be applied broadly to identify regulators of stem and progenitor cell self-renewal or differentiation. Application of PEACS to mammary stem cells resulted in the identification of RUNX1 as a key regulator of exit from the bipotent state.

Introduction

This unique regenerative ability can be recapitulated in culture models, where single stem cells, but not differentiated cells, form tissue rudiments in three-dimensional extracellular matrices. These tissue rudiments, or organoids, exhibit many of the topological, functional and phenotypic traits of the corresponding tissue. For example, mammary stem cells form ducts and lobules in collagen matrices that resemble structures present in the breast [1–3], while colon stem cells form mini-crypts in Matrigel that resemble analogous structures in the small intestine [4].

In systems with well-defined markers of stem, progenitor and differentiated states, this can be accomplished by inhibiting candidate genes and assessing the resulting effects on cell state proportions [5]. However, for many tissues markers of stem cells and early progenitors are not available, and even in cases where such markers are available they often only enrich for states of interest. This lack of defining markers has complicated efforts to screen for cell-state regulators, because changes in the number of cells expressing an enriching marker may not quantitatively reflect changes in the stem or progenitor cell types of interest.

Application of PEACS to mammary stem cells led to the discovery of a novel role for RUNX1 in exit from the bipotent state. We anticipate that PEACS will be useful in the many contexts where defining markers are not available, and have implemented the algorithm as a software tool available to the scientific community.

Results

Perturbation-Expression Analysis of Cell States (PEACS)

First, populations of stem cells propagated in culture are heterogeneous, and invariably include early progenitors and other more differentiated cell types. While typically considered a drawback of maintaining stem cells in culture, this heterogeneity is essential for the computational analysis underlying PEACS. Second, experimental conditions that perturb transitions between stem and progenitor states will also perturb the relative proportions of stem and progenitor cells in a heterogeneous population of cells. For example, inhibiting a gene required for stem cell self-renewal will reduce the proportion of stem cells in a heterogeneous population, with a concomitant relative increase in progenitors or other more differentiated cell types.

However, without knowing either the cell state proportions or the gene-expression vectors of the individual states, it may appear that there is insufficient information to make such an inference. The solution lies in a third key observation: the gene-expression profiles (vectors) of heterogeneous populations of cells are weighted linear combinations of the expression profiles (vectors) of the component states within the population, with the weights in this linear combination corresponding to cell-state proportions. In other words, the gene-expression signal of the population is a linear mixture of component signals, the latter of which are unknown. The key is to deconvolute this signal (Fig 1).
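The mixing model described above can be illustrated with a minimal sketch. The state profiles, the number of genes, and the mixing ratios below are hypothetical values chosen for illustration (they are not data from this study), `population_profile` is a hypothetical helper name, and the sketch assumes expression is measured on a linear scale, where mixing is additive.

```python
import numpy as np

# Hypothetical expression profiles: 3 cell states (rows) x 4 genes (columns).
# Assumption: values are on a linear scale, so mixed signals add.
state_profiles = np.array([
    [8.0, 1.0, 2.0, 5.0],   # state A
    [2.0, 6.0, 2.0, 5.0],   # state B
    [2.0, 1.0, 7.0, 5.0],   # state C
])

def population_profile(proportions, profiles):
    """Return the bulk profile of a population as a proportion-weighted sum."""
    w = np.asarray(proportions, dtype=float)
    w = w / w.sum()              # ratios such as 1:2:2 become fractions
    return w @ profiles

control = population_profile([1, 1, 1], state_profiles)
perturbed = population_profile([0, 1, 1], state_profiles)  # state A depleted
shift = perturbed - control      # change in the bulk profile caused by the perturbation
```

In this toy case the genes specific to state A drop in the bulk profile when state A is depleted, while a gene expressed equally in all states (the last column) does not move. Deconvolution aims to recover such shifts without knowing `state_profiles` in advance.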

The most commonly used algorithm to infer linear components, SVD/PCA, iteratively minimizes the reconstruction error of a mixed signal, under the constraint that the component newly identified in a given iteration be orthogonal to all of the previously identified components. Given the immense success of SVD/PCA in solving many problems across diverse fields, we decided to assess its effectiveness for our problem. A second algorithm, NMF, reconstructs mixed signals by identifying components which have only nonnegative loadings. Some researchers have found this nonnegative constraint to be appealing, since negative loadings of genes can be difficult to interpret biologically; for this reason we also included this method for comparison. A third algorithm, ICA, does not require that the constituent components be orthogonal to one another, and instead identifies components by maximizing their independence in a statistical sense. ICA has proven useful for deconstructing mixed signals (e.g., audio) into their constituent parts.

Experimentally defining cell-state proportions would make it possible to assess, for each algorithm, how well it identified changes in cell-state proportions across experimental conditions. To generate such idealized experimental conditions we mixed three different breast cancer cell lines (T47D, SUM159, MDA-MB-231) in defined proportions (for example 1:1:1, 1:2:2, 1:1:0), with 10 mixtures in total. In this idealized experiment the three cancer lines represented different “cell states” that were mixed in defined proportions to create heterogeneous populations (Fig 2A; T47D = State A, MDA-MB-231 = State B, SUM159 = State C). We isolated total mRNA from these heterogeneous populations and profiled the expression of 17 differentiation-related genes and GAPDH, thereby generating a gene-expression profile for each heterogeneous population (S1 Table). Lastly, we applied SVD, NMF and ICA to the gene expression matrix to assess the relative performance of these algorithms in identifying changes in cell-state proportions.

SVD/PCA successfully identified components that closely correlated with the proportions of the cell states in our idealized experiment: the first component exhibited a strong negative correlation with the fraction of cells within the population in State A (r² = 0.92), while the second component correlated with the fraction of cells in State B (r² = 0.47). Additionally, the replicates for each perturbation clustered closely together in the space spanned by these first two components identified by the SVD/PCA algorithm (Fig 2B right). Moreover, the first two SVD components together explained ~90% of the variation in the gene-expression data (as can be seen in the Scree plot in S1A Fig), which is consistent with the two degrees of freedom inherent in the design of this idealized experiment. In contrast to SVD/PCA, the two components identified by NMF both correlated strongly with the fraction of cells in State A (r² = 0.92 and 0.92, respectively), with component 1 correlating negatively with the proportion of cells in State A, and component 2 correlating positively with the proportion of cells in State A (Fig 2C, S1E Fig); neither NMF component 1 nor 2 was correlated with states B or C (all r² < 0.43; S1E Fig). For this analysis the NMF factorization was performed with parameter k = 2, because the two components together explained over 95% of the variance in the gene-expression data (S1B Fig). As was the case for the SVD/PCA algorithm, the replicates for each perturbation clustered closely in the space spanned by the two components identified by NMF; this strongly suggested that the components identified by the algorithm reflected biological signal rather than experimental noise. Unlike the SVD/PCA and NMF algorithms, the first two ICA components did not correlate with the fraction of cells in any of states A, B or C (all pairwise r² < 0.13, Fig 2D, S1F Fig). Moreover, in almost all cases the various replicates for a given perturbation did not cluster together in the space spanned by the first two components identified by ICA (Fig 2D right). Collectively these observations indicated that both the SVD/PCA and NMF algorithms effectively identified components that correlated strongly with cell-state proportions, while ICA failed to do so. Moreover, these observations showed that only the SVD/PCA components spanned the 2 degrees of freedom inherent in this idealized experiment, which, by design, involved cellular populations that were mixtures of exactly 3 cell states.
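The logic of this idealized experiment can be sketched with simulated, noise-free mixtures. Everything below is an illustrative assumption: the pure-state profiles and mixing ratios are invented, each gene is mean-centered before the decomposition (the study's own preprocessing is described in the Methods), and only the SVD branch of the comparison is shown.

```python
import numpy as np

# Hypothetical pure-state profiles (3 states x 5 genes); not data from the paper.
states = np.array([
    [9.0, 8.0, 1.0, 4.0, 5.0],   # state A
    [2.0, 1.0, 6.0, 7.0, 5.0],   # state B
    [2.0, 1.0, 6.0, 3.0, 5.0],   # state C
])

# Hypothetical mixing ratios (A:B:C) for ten heterogeneous populations.
ratios = np.array([
    [1, 1, 1], [1, 2, 2], [2, 1, 1], [1, 1, 0], [1, 0, 1],
    [0, 1, 1], [2, 2, 1], [2, 1, 2], [1, 2, 1], [1, 1, 2],
], dtype=float)
fractions = ratios / ratios.sum(axis=1, keepdims=True)

M = fractions @ states                 # populations x genes (linear mixing)
M_centered = M - M.mean(axis=0)        # mean-center each gene (assumed preprocessing)

U, S, Vt = np.linalg.svd(M_centered, full_matrices=False)
scores = U * S                         # component scores, one row per population
explained = S**2 / np.sum(S**2)        # variance fraction per component (scree plot values)

def r_squared(x, y):
    """Squared Pearson correlation between two 1-D arrays."""
    return np.corrcoef(x, y)[0, 1] ** 2

r2_A = r_squared(scores[:, 0], fractions[:, 0])   # component 1 vs. fraction of state A
```

Because the three fractions sum to one, noise-free mixtures of three states vary along only two independent directions, so `explained` is concentrated in the first two components. In this toy setup state A differs most from the other two states, so the first component tracks its fraction closely.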

We could directly compare gene loadings in the various components with gene expression in the various states because the gene-expression profiles of the pure states were known in our idealized experimental conditions (Fig 2E). This comparison revealed that genes with the highest loadings in SVD component 1 were uniquely expressed or repressed in state A; this was consistent with the observation that this component tracked with the fraction of cells in state A. Similarly, NMF components 1 and 2, both of which also tracked with state A, identified a very similar set of genes uniquely expressed by state A (Fig 2E). A key difference, however, was that unlike SVD component 1, which included positive and negative loadings corresponding respectively to genes down or up in state A, both of the NMF components had only positive gene loadings, with NMF component 1 having positive gene loadings for the genes down in state A, and NMF component 2 having positive loadings for the genes up in state A (Fig 2E). In contrast, SVD component 2 identified the only two genes that were strongly differentially expressed between states B and C (HOXA5, FOXO1; Fig 2E); these two genes, HOXA5 and FOXO1, were respectively down and up in state B relative to state C, and were expressed near median levels in state A. Thus, the highest loadings of SVD1 in this idealized experiment marked genes differentially expressed between luminal and basal cells, including the established luminal markers GATA3 and STAT5A. More generally, these findings suggested that the highest loadings in the SVD component vectors may serve to identify markers of specific cell states in contexts where such markers are not known.

For this purpose the Euclidean metric, which corresponds to the natural notion of ‘distance’ in 1, 2 and 3-dimensional space, was attractive for several reasons. First, we expect distances in SVD space to scale linearly with the extent of the change in cell state proportions. Consistent with this, analysis of the SVD1 vs. SVD2 replicate plot for the idealized experiment (Fig 2B right panel) revealed that small perturbations in cell state proportions (e.g. 1:1:1 to 1:2:2) resulted in small distances in component space, whereas large changes in cell state proportions (e.g. 1:1:1 to 0:1:1) resulted in large distances in SVD component space. Second, the Euclidean metric makes it straightforward to quantify how noise in the various dimensions impacts the reliability of multidimensional distance estimates. We therefore used the Euclidean metric to compare distances between samples in the space spanned by the first k SVD components, where k was chosen using the standard approach of looking for an ‘elbow’ in the corresponding Scree plot. To account for biological variability across replicates (or different shRNAs targeting the same gene), we defined the PEACS score as the Euclidean distance divided by the standard error of the mean for each set of replicates (Fig 3).
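A distance-over-standard-error score of this kind can be sketched as follows. This is one possible reading rather than the published implementation: the sketch assumes the distance is measured from each replicate to a reference point in component space (here the origin), and that the standard error is taken over those per-replicate distances. The function name `peacs_like_score` and the example coordinates are hypothetical.

```python
import numpy as np

def peacs_like_score(replicate_coords, reference):
    """Mean distance of replicates from a reference point, divided by its standard error.

    replicate_coords: (n_replicates, k) coordinates in SVD component space.
    reference: (k,) coordinates of the comparison point (an assumption of this sketch).
    """
    reps = np.asarray(replicate_coords, dtype=float)
    ref = np.asarray(reference, dtype=float)
    distances = np.linalg.norm(reps - ref, axis=1)          # one Euclidean distance per replicate
    sem = distances.std(ddof=1) / np.sqrt(len(distances))   # standard error of the mean distance
    return distances.mean() / sem

origin = np.zeros(3)
consistent = [[4, 0, 0], [5, 0, 0], [6, 0, 0]]   # replicate distances 4, 5, 6
variable = [[1, 0, 0], [5, 0, 0], [9, 0, 0]]     # replicate distances 1, 5, 9

score_consistent = peacs_like_score(consistent, origin)
score_variable = peacs_like_score(variable, origin)
```

Both hypothetical perturbations sit at the same mean distance from the reference, but the one with tightly clustered replicates receives the higher score, which is the intended effect of scaling by the standard error.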

Empirical p-values for PEACS scores were determined by Monte Carlo sampling: for a given perturbation with n replicates, a null distribution was obtained by randomly sampling n expression profiles from the experimental data, calculating a PEACS score, and iterating this process 10,000 times to generate a PEACS score null distribution. The empirical p-value was then determined by ranking the PEACS score for the given perturbation relative to the PEACS scores generated by this Monte Carlo procedure.
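The resampling procedure described above might look like the following sketch. The scoring function, the simulated coordinates, and the helper names (`score`, `empirical_p_value`) are assumptions made for illustration; the p-value is taken as the fraction of null scores at least as large as the observed score, which is one way to express a rank within the null distribution.

```python
import numpy as np

def score(coords):
    """Mean distance from the origin divided by its standard error (assumed score)."""
    d = np.linalg.norm(coords, axis=1)
    return d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))

def empirical_p_value(observed, all_coords, n_replicates, n_iter=10_000, seed=0):
    """Fraction of label-free random draws whose score is at least `observed`."""
    rng = np.random.default_rng(seed)
    null = np.empty(n_iter)
    for i in range(n_iter):
        # Draw n profiles at random, ignoring which perturbation they came from.
        idx = rng.choice(len(all_coords), size=n_replicates, replace=False)
        null[i] = score(all_coords[idx])
    return float(np.mean(null >= observed)), null

# Hypothetical data: 200 profiles scattered around the origin of a 3-D component space.
rng = np.random.default_rng(1)
all_coords = rng.normal(size=(200, 3))
perturbation = np.array([[9.5, 0.0, 0.0], [10.0, 0.0, 0.0], [10.5, 0.0, 0.0]])

p_value, null = empirical_p_value(score(perturbation), all_coords, n_replicates=3)
```

With 10,000 iterations the smallest nonzero p-value this procedure can report is 1/10,000, so the iteration count sets the resolution available before any multiple-testing correction.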

Application of PEACS to a Mammary Stem Cell Model

When seeded into a three-dimensional collagen matrix, MCF10A cells form ductal, lobular, and ductal-lobular tissue rudiments (Fig 4A–4C). These tissue rudiments are monoclonal, indicating that they arise from single stem cells, and are morphologically similar to structures present in the human mammary gland (Fig 4C; S2 and S3 Fig; S1 Movie).

We next inhibited these factors with 3–5 shRNAs targeting each TF, with two biological replicates per shRNA, resulting in a total of 240 genetically perturbed lines. For each genetically perturbed line, we then profiled the expression of all 39 factors and housekeeping genes using high-throughput qRT-PCR. These experiments generated a large data matrix with rows corresponding to gene expression values, and columns corresponding to shRNA perturbations.

Application of the PEACS algorithm to this filtered data matrix produced a score that quantified the extent to which TF inhibition affected cell-state proportions. Based on this PEACS score, most genetic perturbations had small effects on cell state proportions, which were comparable to the effects of hairpins that did not successfully knock down their targeted genes (Fig 5A, S3 Table). When inhibited, several genes caused large, reproducible changes in cell state proportions, which could be seen when the perturbations were plotted in 3D SVD component space or as PEACS scores (Fig 5A and 5B). We used the first three SVD components for this analysis because the elbow of the Scree plot occurred at three dimensions (S1C Fig).

Identification of the glucocorticoid receptor (NR3C1), the highest-scoring factor, was significant because of its established role in regulating mammary ductal differentiation and lactation [7]. TCF3, the third-highest scoring factor, was recently reported to be a mammary stem cell regulator [8]. RUNX1, which was the second-highest scoring factor, is mutated in a subset of breast cancers but has not been previously implicated as a regulator of mammary stem cell biology [9–11]. Since the other hits identified by PEACS were established regulators of mammary stem cells or differentiation, we suspected that RUNX1 might also play a role in one or both of these processes, and therefore decided to further explore its function.

We therefore investigated the loadings of SVD component 1 to identify the genes that have the highest contribution to this component (Fig 5D). The highest loadings of SVD component 1 were ETS1, HIF1A, HOXA5, NFYA, RUNX1, YY1, and RB1. As expected, these genes were significantly decreased in the RUNX1 knockdown condition compared to perturbation conditions that did not change SVD component 1 (Fig 5D). While we do not know what the state corresponding to SVD component 1 is, these markers may be useful for future studies investigating mammary lineages.

RUNX1 Is Required for Mammary Stem Cells to Differentiate

RUNX1-inhibited cells formed spheres that did not hollow (Fig 6D), indicating that they were not mature lobules, and rarely formed ducts or ductal-lobular rudiments (71% reduction relative to control); the rare ducts that did form were shorter in length (25% reduction) and did not exhibit the branched morphology seen in wild type structures (Fig 6A, 6C). As a control, cells that were either mock-infected or expressed a control shRNA were not affected in their ability to form tissue rudiments. These results indicated that RUNX1 is required for mammary cells to differentiate into ducts and mature lobules. To assess if the phenotype caused by RUNX1 inhibition was reversible, we generated an MCF10A line in which RUNX1 could be reversibly inhibited by a doxycycline (dox)-inducible shRNA (Fig 7B). When cultured in collagen in the presence of dox, these MCF10A cells formed solid spheres and few ducts, recapitulating the phenotype observed above when RUNX1 was constitutively inhibited by shRNAs. When RUNX1 was reexpressed by withdrawing dox, the spheres rapidly sprouted ducts and began to hollow, often within 12–24 hours (Fig 7A). This finding indicated that the RUNX1-inhibited spheres were still capable of forming both ducts and lobules upon RUNX1 reexpression, raising the possibility that these spheres might consist of bipotent cells reversibly arrested in their differentiation.

Inhibition of RUNX1 Traps MCF10A Mammary Stem Cells in a Bipotent State

Parental MCF10A cells largely lose this ability upon differentiating in collagen (Fig 7C). We seeded cells with dox to form RUNX1-inhibited spheres, harvested and dissociated the spheres by treatment with collagenase and trypsin, and then reseeded single cells into collagen with or without dox. Cells reseeded in dox again gave rise to solid spheres. However, those reseeded without dox formed lobules and ducts that matured into complex ductal-lobular structures (Fig 7C), doing so with efficiency comparable to that of parental MCF10A cells maintained in 2D culture. These observations strongly suggested that parental MCF10A cells dissociated from tissue rudiments lost the ability to reseed tissue rudiments because they had differentiated and lost stem and progenitor activity; in contrast, cells within RUNX1-inhibited MCF10A spheres maintained their ability to reseed tissue rudiments because they did not differentiate in collagen and remained bipotent.

Primary Human Mammary Stem Cells Require RUNX1 to Differentiate

To this end we isolated primary human breast epithelial cells from reduction mammoplasty tissue samples, modulated RUNX1 expression, and assessed stem and progenitor cells using colony forming assays (Fig 8A) [12–14]. In these assays the majority of stem and progenitor cells form colonies containing differentiated luminal or basal cells. However, a fraction of bipotent stem cells proliferate but do not differentiate; these form micro-colonies of 2–16 cells that remain uncommitted and co-express both luminal and basal markers. Inhibiting RUNX1 expression caused a 2-fold increase in the number of stem cell micro-colonies, suggesting that this transcription factor was required for primary human breast stem cells to differentiate in culture (Fig 8B). Consistent with this interpretation, inhibiting RUNX1 expression reduced the number of differentiated colonies by nearly 90%, while its over-expression led to a 300% increase in differentiated colonies.

For this experiment we first infected primary cells with the dox-inducible shRUNX1 lentivirus, and plated cells with dox to assay for colony-forming ability. After micro-colonies of stem cells had formed (7 days after plating), we removed the dox so that RUNX1 would be reexpressed. We found that reexpressing RUNX1 caused the stem cell micro-colonies to differentiate within 48–96 hours, and resulted in the formation of heterovalent colonies that included both bipotent stem cells and lineage-committed basal and luminal cells (Fig 8C). These heterovalent colonies were never observed in colony-forming assays with control primary cells, or in assays with primary cells in which RUNX1 had been stably inhibited. Collectively, these findings indicate that RUNX1 inhibition enables primary breast stem cells to expand in an uncommitted state while retaining the functional ability to differentiate in culture.

Discussion

We validated PEACS by applying it to a mammary stem cell model with shRNAs as a source of perturbations. In this context, the method identified several established regulators (e.g., NR3C1 and TCF3) of mammary stem cell biology, as well as a novel gene, RUNX1, which had not previously been implicated as a mammary stem cell regulator. Follow-up studies revealed that inhibiting RUNX1 prevented mammary stem cells from differentiating, indicating that this gene is required for stem cells to exit a bipotent state. Although our study focused on shRNA perturbations, there is every reason to believe that PEACS would be equally effective for gene over-expression or chemical perturbations.

PEACS differs from these in three important ways. First, the goal of PEACS is to specifically identify perturbations that influence how cells transition between differentiation states; we are not aware of other methods that do this. Second, the method does not require any markers of stem, progenitor or differentiated states. Third, our method analyzes bulk populations of cells to identify changes in cell state ratios, rather than analyzing large numbers of single cells. We anticipate that this marker-free approach will be particularly useful in the many contexts where stem, progenitor, and differentiated cells have been identified functionally, but where markers that distinguish these states are not yet available. It is worth emphasizing that, although markers that enrich for stem and progenitor states have been identified in many systems, few systems offer markers that sort stem or progenitor cells to purity; this latter ability is essential if these markers are to be used to identify genes that regulate state transitions. In cases where such markers are in fact available, or when they are used to define states de facto without consideration of the underlying biology, we have previously shown that a Markov model can be used to quantify the rates of transition between states, and predict the equilibrium proportions of cell states [20].

Stem cells have a strong tendency to differentiate when propagated in culture, even under conditions that are intended to maintain them in an undifferentiated state. This problem has been observed with human ES cells, HSCs, and many other stem cell types. We have shown that primary human mammary stem cells can be expanded in a bipotent state by transiently inhibiting RUNX1; moreover, these cells spontaneously differentiate once RUNX1 expression is re-established. A chemical compound that inhibits RUNX1 could therefore be used to propagate mammary stem cells in culture. It will be of interest to examine if inhibiting RUNX1 can also prevent other types of stem cells from exiting a bi- or multipotent state. In support of this possibility, a dominant-negative RUNX1 translocation has been found in a subset of leukemias, and expression of this protein blocks the differentiation of leukemic cells and promotes the self-renewal of hematopoietic stem cells [21,22]. Additionally, a RUNX1 ortholog, Runt, has been shown in planaria to be required for neoblast stem cells to differentiate at wound sites [23]. Taken together, these observations suggest the intriguing possibility that this function of RUNX1/Runt is conserved across species and cell types.

Methods

Ethics Statement

Exemption status for human research was obtained from the Committee on the use of Humans as Experimental Subjects (COUHES) at MIT, based on de-identification of the samples. All patient samples are de-identified prior to distribution for research use. The data collected and stored is limited to basic demographic data, specimen handling information (e.g., related to chain of custody), specimen quality data, and histopathologic data. At no time is any patient identifier provided to any researcher.

Primary Cells, Cell Lines, Tissue Culture, and Lentiviral Production

Primary tissues were obtained with consent in compliance with laws and institutional guidelines, as approved by the Institutional Review Board of Maine Medical Center. Organoids were aliquoted in 1:1 DMEM/Ham's F12 media supplemented with 5% calf serum, 10 ng/mL insulin, 10 µg/mL epidermal growth factor, 10 µg/mL hydrocortisone, and 10% DMSO and stored in liquid nitrogen. Doxycycline (dox), where applicable, was used at a concentration of 4 µg/ml.

Constitutive shRNA plasmids in a pLKO.1 vector were obtained from the Broad Institute RNAi consortium (https://www.broadinstitute.org/rnai/trc3), and inducible hairpins (dox “ON”; pTRIPZ vector) were obtained from Thermo Scientific. Overexpression constructs were obtained through Gateway cloning of the appropriate ORF into the pLenti6.2-cch-3XFLAG-V5 construct.

PEACS: Perturbations

One day after infection, cells were selected with media containing 5 µg/ml puromycin. Two days later RNA was collected with the Qiagen RNeasy 96 Biorobot 8000 kit and cDNA synthesized with the iScript cDNA synthesis kit (BioRad 170-8890).

PEACS: Expression Profiling by qPCR

The 39 TFs profiled were selected by profiling gene expression in MCF10A cells and selecting all TFs implicated in differentiation that were confirmed to be expressed by qPCR. The cDNA was pre-amplified for 14 cycles with a mix of 41 primer sets (39 TFs, BTub, and GAPDH) and mastermix, then treated with ExoI. Prior to analysis with PEACS, the data matrix with the Fluidigm CT values was normalized to GAPDH and median normalized by gene such that the median CT value for each gene was 0. For the idealized experiment, gene expression was profiled using standard qPCR and the 17 genes profiled were randomly selected transcription factors expressed by MCF10A cells and implicated in differentiation.
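The two normalization steps described above can be sketched as follows. The function name `normalize_ct`, the matrix orientation (samples in rows, genes in columns), and the example CT values are assumptions for illustration only.

```python
import numpy as np

def normalize_ct(ct, gapdh_ct):
    """Normalize CT values to GAPDH, then center each gene at a median of 0.

    ct: (samples, genes) raw CT values.
    gapdh_ct: (samples,) GAPDH CT value measured in each sample.
    """
    ct = np.asarray(ct, dtype=float)
    gapdh_ct = np.asarray(gapdh_ct, dtype=float)
    delta = ct - gapdh_ct[:, None]             # per-sample normalization to GAPDH
    return delta - np.median(delta, axis=0)    # per-gene median becomes 0

# Hypothetical CT values for 3 samples and 2 genes.
raw = [[20.0, 25.0],
       [22.0, 24.0],
       [21.0, 29.0]]
gapdh = [18.0, 19.0, 17.0]

normalized = normalize_ct(raw, gapdh)
```

Centering each gene at its median means that a typical, unperturbed sample sits near the origin, so distances in the downstream component space reflect deviations from the bulk of the perturbations.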

PEACS: Algorithm

Let M be a data matrix of perturbation-expression values with rows corresponding to perturbations and columns corresponding to the genes whose expression was profiled. We used the reduced singular value decomposition to transform M, so that

$$M = U \Sigma V^{T} = \sum_{i=1}^{n} \sigma_i u_i v_i^{T} \qquad \text{(Eq 1)}$$

where $u_1, \ldots, u_n$ and $v_1, \ldots, v_n$ are respectively the left and right singular vectors corresponding to the singular values $\sigma_i$ of the reduced singular value decomposition of M. We have assumed here that n < m, i.e., the number of perturbations exceeds the number of genes whose expression is profiled. Because the dimensions of $u_i$ and $v_i$ are (m×1) and (n×1), respectively, and the $\sigma_i$ are scalars, M can be viewed as a weighted sum of the rank-one matrices $u_i v_i^{T}$. We then use the first k singular values and vectors to reconstruct a low-rank approximation of M:

$$M \approx \sum_{i=1}^{k} \sigma_i u_i v_i^{T} \qquad \text{(Eq 2)}$$

The value k is chosen using a Scree plot, as described in the main text. In the case of k = 3:

$$M \approx \sigma_1 u_1 v_1^{T} + \sigma_2 u_2 v_2^{T} + \sigma_3 u_3 v_3^{T} \qquad \text{(Eq 3)}$$

The gene-expression vector for the perturbation p can therefore be approximated by the following weighted sum of the first 3 right singular vectors:

$$M_p \approx \sigma_1 u_{1p} v_1^{T} + \sigma_2 u_{2p} v_2^{T} + \sigma_3 u_{3p} v_3^{T} \qquad \text{(Eq 4)}$$

where $M_p$ is row p of M and $u_{ip}$ is the p-th entry of $u_i$. The coordinates of perturbation p in SVD-space are the coefficients of $v_1, \ldots, v_k$. Again for k = 3:

$$(\sigma_1 u_{1p},\ \sigma_2 u_{2p},\ \sigma_3 u_{3p}) \qquad \text{(Eq 5)}$$

These coordinates in SVD-space are plotted as ‘Component scores’ in Figs 2B, 3, and 5A. Finally, to determine the PEACS score, we first calculate the Euclidean distance between the [...] (Eq 6). The PEACS score is then calculated by dividing the distance in (Eq 6) by the standard error across replicates for a given perturbation.
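The decomposition and projection steps in this section can be checked numerically with a short sketch. The matrix below is random, hypothetical data with perturbations in rows and genes in columns; it stands in for a normalized perturbation-expression matrix and is not taken from the study. The published MATLAB code may organize these steps differently.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(12, 5))          # hypothetical: 12 perturbations x 5 genes

# Reduced SVD: M = U diag(S) Vt, with singular values S in descending order.
U, S, Vt = np.linalg.svd(M, full_matrices=False)

k = 3
coords = U[:, :k] * S[:k]             # row p holds (sigma_1 u_1p, ..., sigma_k u_kp)
M_k = coords @ Vt[:k]                 # rank-k approximation of M

# Residual of the approximation, measured with the Frobenius norm.
residual = np.linalg.norm(M - M_k)
```

Two properties are worth noting. The coordinates can equivalently be obtained by projecting each row of M onto the first k right singular vectors (`M @ Vt[:k].T`), and the residual of the rank-k approximation is determined by the discarded singular values, which is why a scree plot of the singular values is informative when choosing k.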

For each set of n perturbations, a null distribution of PEACS scores was obtained by sampling n random perturbations 10,000 times without regard to perturbation labels. The p-value was defined as the rank of the real PEACS score in the null distribution divided by 10,000. The PEACS code for MATLAB is available as a supplemental file (S2 Text) and on our lab website at: http://guptalab.wi.mit.edu/.

Collagen Culture

7.5×10³ MCF10A cells were resuspended in 0.2 ml of collagen solution (1.25 mg/ml rat tail collagen I in PBS, brought to pH 7.3 with 0.1 N NaOH) and plated on a single chamber of a 4-chamber slide. Collagen was polymerized for 2 hours at 37°C, after which the collagen pads were detached and cultured in 1 ml of MCF10A medium.

Reseeding Assay

The structures were collected by centrifugation (500 RPM, 5 min), resuspended in 0.25% trypsin, and incubated for 20–25 minutes at 37°C. Cells were counted in trypan blue, spun down (500 RPM, 5 min), and resuspended in MCF10A media; 7500 living cells were reseeded into a new collagen pad.

Immunofluorescence

Pads were permeabilized using 0.1% Triton X-100, incubated with blocking solution (PBST with 10% goat serum and 3% BSA) for 1 hr at room temperature, and stained with the appropriate primary antibody in blocking buffer for 1–2 hours at room temperature or overnight at 4°C. The samples were washed with PBS and incubated with an Alexa Fluor-labeled secondary antibody. Samples were then washed and stained with 1 µg/ml DAPI. Images of phalloidin-AF594 and DAPI-stained collagen structures were analyzed by image segmentation software (CellProfiler [25]), with an analysis pipeline that differentially detected lobules and ducts based on size, area and form factor adjustments.

Colony Assay

Primary human organoids were thawed and plated on a 10 cm dish in 10 ml of RMFC (DMEM + 10% Calf Serum) media for 1–2 hours. The non-adherent fraction, fibroblast-reduced organoids, was collected, spun 10 minutes at 233 × g, resuspended in cold PBS and passed 10 times through an 18-gauge needle. The organoids were once again pelleted 5 minutes at 335 × g, resuspended in 2 ml of 0.05% trypsin, and incubated 10 minutes at 37°C. We then added 8 ml of RMFC media and 0.5 mg of DNaseI (Roche 10104159001). The cell suspension was passed through a 40 µm filter and the cells counted. Thirty thousand cells were plated per well of a 6-well plate in MEGM, and assayed for cytokeratin expression after 7–11 days, using CK8/18 antibody (Vector VP-C407) and CK14 antibody (Thermo 9020-P). Some plates were visualized using IHC, while others were visualized using IF with AF488/AF555 conjugated secondary antibodies.

Immunohistochemistry/Western Blot

The plates were incubated overnight at 4°C with 1:750 CK8/18 antibody. The plates were incubated with 1:200 αMouse-IgG-HRP (Vector BA-2000) for 30 minutes, and stained with DAB according to the manufacturer’s protocol (Vector ABC elite PK-6100; Vector ImmPACT DAB SK-4105). Excess avidin/biotin was blocked with the Vector Avidin/Biotin blocking kit SP-2001. Plates were re-blocked for 1 hour in PBS + 1% BSA and 2% goat serum, then incubated for 1 hour at room temperature with 1:750 CK14 antibody in PBS + 1% BSA, then incubated at room temperature with αRabbit-IgG-HRP (Vector BA-1000) for one hour. The plates were then stained with VIP according to the manufacturer’s protocol (Vector ABC elite PK-6100; Vector ImmPACT VIP SK-4605), washed with water and stored dry. Western blots were performed with standard procedures. RUNX1 was blotted with 1:1000 Ab23980 (Abcam).

Supporting Information

S1 Text. Document containing the description of how the MCF10A organoids were analyzed [...]

S1 Table. Genes probed by qPCR in the idealized experiment referenced in Fig 2. (DOCX)

S2 Table. Developmental transcription factors expressed in MCF10A cells that were targeted with shRNAs. (DOCX)

S3 Table. PEACS output from MCF10A perturbations. Displayed are the PEACS scores, uncorrected p-value, Bonferroni-corrected p-value, and significance (* = raw p<0.01; † = Bonferroni-corrected p<0.05) for genes with at least 3 knockdown conditions with 2-fold or higher knockdown. The negative control sets were generated by taking three random sets of 5 hairpins where the targeted gene was not successfully knocked down. P-values were obtained through Monte Carlo resampling of the PEACS scores, as described in the text.

S1 Fig. Scree plots and explained variance plots were used to decide on dimensions for SVD and NMF, respectively. These results are displayed as scatter plots where (A) the x-axis contains the SVD number and the y-axis denotes the variance explained by each SVD in the ideal experiment or (B) the x-axis contains the rank used by the NMF algorithm and the y-axis shows the fraction explained by all the components of the factorization in the ideal experiment. Similarly, (C) a scree plot of the SVD results from the MCF10A experiment was plotted to decide on dimensionality, where axes are as noted in (A). The results of first and second dimensions of (D) SVD, (E) NMF, and (F) ICA deconvolution were plotted

S2 Fig. MCF10A tissue rudiments express mammary gland markers. Day 8 collagen cultures were stained for basal marker (CK14) and luminal markers (CK8/18, MUC1 and CSN2). Nuclei were stained with DAPI. Scale bar, 20 µm.

MCF10A cells infected with a pool of red, green, and blue viruses were seeded into collagen matrix. The structures were visualized in the red, green, and blue channels (overlay shown) at 2 (A) and 6 days (B), revealing monoclonal lobules and monoclonal ducts with occasional fusions. Images were acquired at

Acknowledgments

We acknowledge the Biomicrocenter at MIT for assistance with the Biomark/Fluidigm system, Tom DiCesare for graphical assistance, and Wendy Salmon for microscopy assistance.

Author Contributions

Performed the experiments: ESS SS RAM. Analyzed the data: ESS SS DXJ DHM RAM PBG. Contributed reagents/materials/analysis tools: ESS SS DXJ RAM PBG. Wrote the paper: ESS SS DXJ DHM RAM

Topics

stem cells

Appears in 40 sentences as: Stem Cell (1) stem cell (14) Stem Cells (3) Stem cells (1) stem cells (30)
In Perturbation-Expression Analysis Identifies RUNX1 as a Regulator of Human Mammary Stem Cell Differentiation
  1. The search for genes that regulate stem cell self-renewal and differentiation has been hindered by a paucity of markers that uniquely label stem cells and early progenitors.
    Page 1, “Abstract”
  2. We have applied this marker-free approach to screen for transcription factors that regulate mammary stem cell differentiation in a 3D model of tissue morphogenesis and identified RUNX1 as a stem cell regulator.
    Page 1, “Abstract”
  3. Inhibition of RUNX1 expanded bipotent stem cells and blocked their differentiation into ductal and lobular tissue rudiments.
    Page 1, “Abstract”
  4. Collectively, our findings show that RUNX1 is required for mammary stem cells to exit a bipotent state, and provide a new method for discovering cell-state regulators when markers are not available.
    Page 1, “Abstract”
  5. The discovery of stem cell regulators is a major goal of biological research, but progress is often limited by a lack of definitive markers capable of distinguishing stem cells from early progenitors.
    Page 1, “Author Summary”
  6. Application of PEACS to mammary stem cells resulted in the identification of RUNX1 as a key regulator of exit from the bipotent state.
    Page 2, “Author Summary”
  7. Adult stem cells are functionally defined based on their ability to regenerate tissues.
    Page 2, “Introduction”
  8. This unique regenerative ability can be recapitulated in culture models, where single stem cells, but not differentiated cells, form tissue rudiments in three-dimensional extracellular matrices.
    Page 2, “Introduction”
  9. For example, mammary stem cells form ducts and lobules in collagen matrices that resemble structures present in the breast [1–3], while colon stem cells form mini-crypts in Matrigel that resemble analogous structures in the small intestine [4].
    Page 2, “Introduction”
  10. Given their potential for regenerative medicine, there is significant interest in identifying genes that regulate self-renewal or differentiation of stem cells.
    Page 2, “Introduction”
  11. However, for many tissues markers of stem cells and early progenitors are not available, and even in cases where such markers are available they often only enrich for states of interest.
    Page 2, “Introduction”

ICA

Appears in 10 sentences as: ICA (10)
In Perturbation-Expression Analysis Identifies RUNX1 as a Regulator of Human Mammary Stem Cell Differentiation
  1. A third algorithm, ICA, does not require that the constituent components be orthogonal to one another, and instead identifies components by maximizing their independence in a statistical sense.
    Page 3, “Results”
  2. ICA has proven useful for deconstructing mixed signals (e.g., audio) into their constituent parts.
    Page 3, “Results”
  3. Although our goal in developing PEACS was to apply it in settings where neither the state expression vectors nor cell-state proportions are known, to assess the effectiveness of the algorithms described above (SVD, NMF, ICA) we needed an idealized context in which cell-state proportions could be experimentally defined.
    Page 4, “Results”
  4. Lastly, we applied SVD, NMF and ICA to the gene expression matrix to assess the relative performance of these algorithms in identifying changes in cell-state proportions.
    Page 4, “Results”
  5. The results of the SVD, NMF and ICA analyses are presented in Fig 2B–2D.
    Page 4, “Results”
  6. Unlike the SVD/PCA and NMF algorithms, the first two ICA components did not correlate with the fraction of cells in any of states A, B or C (all pairwise r² < 0.13, Fig 2D, S1F Fig).
    Page 4, “Results”
  7. Moreover, in almost all cases the various replicates for a given perturbation did not cluster together in the space spanned by the first two components identified by ICA (Fig 2D right).
    Page 4, “Results”
  8. Collectively these observations indicated that both the SVD/PCA and NMF algorithms effectively identified components that correlated strongly with cell-state proportions, while ICA failed to do so.
    Page 4, “Results”
  9. SVD, NMF, and ICA results.
    Page 18, “Supporting Information”
  10. The results of first and second dimensions of (D) SVD, (E) NMF, and (F) ICA deconvolution were plotted S2 Fig.
    Page 19, “Supporting Information”

gene expression

Appears in 7 sentences as: Gene Expression (1) gene expression (6)
In Perturbation-Expression Analysis Identifies RUNX1 as a Regulator of Human Mammary Stem Cell Differentiation
  1. PEACS uses a novel computational approach to analyze gene expression data from perturbed cellular populations, and can be applied broadly to identify regulators of stem and progenitor cell self-renewal or differentiation.
    Page 1, “Author Summary”
  2. Lastly, we applied SVD, NMF and ICA to the gene expression matrix to assess the relative performance of these algorithms in identifying changes in cell-state proportions.
    Page 4, “Results”
  3. We could directly compare gene loadings in the various components with gene expression in the various states because the gene-expression profiles of the pure states were known in our idealized experimental conditions (Fig 2E).
    Page 4, “Results”
  4. These experiments generated a large data matrix with rows corresponding to gene expression values, and columns corresponding to shRNA perturbations.
    Page 8, “Application of PEACS to a Mammary Stem Cell Model”
  5. Microfluidic qPCR was carried out according to the manufacturer’s protocol (Protocol 37: Fast Gene Expression Analysis Using EvaGreen on the BioMark or BioMark HD System).
    Page 15, “PEACS: Expression Profiling by qPCR”
  6. For the idealized experiment, gene expression was profiled using standard qPCR and the 17 genes profiled were randomly selected transcription factors expressed by MCF10A cells and implicated in differentiation.
    Page 15, “PEACS: Expression Profiling by qPCR”
  7. Thus the gene expression data for each perturbation p is mapped into the space spanned by linear combinations of the first k gene-expression SVD eigenvectors v_1, ..., v_k.
    Page 16, “PEACS: Algorithm”

transcription factors

Appears in 6 sentences as: transcription factor (2) transcription factors (4)
In Perturbation-Expression Analysis Identifies RUNX1 as a Regulator of Human Mammary Stem Cell Differentiation
  1. We have applied this marker-free approach to screen for transcription factors that regulate mammary stem cell differentiation in a 3D model of tissue morphogenesis and identified RUNX1 as a stem cell regulator.
    Page 1, “Abstract”
  2. As a first step, we used gene-expression profiling to identify 39 developmentally implicated transcription factors (TFs) expressed in MCF10A cells (S2 Table).
    Page 8, “Application of PEACS to a Mammary Stem Cell Model”
  3. Inhibiting RUNX1 expression caused a 2-fold increase in the number of stem cell micro-colonies, suggesting that this transcription factor was required for primary human breast stem cells to differentiate in culture (Fig 8B).
    Page 12, “Primary Human Mammary Stem Cells Require RUNX1 to Differentiate”
  4. MCF10A cells were seeded onto a 96 well plate at a density of 7500 cells per well and infected the next day with hairpin lentivirus targeting an expressed developmental transcription factor.
    Page 15, “PEACS: Perturbations”
  5. For the idealized experiment, gene expression was profiled using standard qPCR and the 17 genes profiled were randomly selected transcription factors expressed by MCF10A cells and implicated in differentiation.
    Page 15, “PEACS: Expression Profiling by qPCR”
  6. Developmental transcription factors expressed in MCF10A cells that were targeted with shRNAs.
    Page 18, “Supporting Information”

experimental conditions

Appears in 4 sentences as: experimental conditions (4)
In Perturbation-Expression Analysis Identifies RUNX1 as a Regulator of Human Mammary Stem Cell Differentiation
  1. Second, experimental conditions that perturb transitions between stem and progenitor states will also perturb the relative proportions of stem and progenitor cells in a heterogeneous population of cells.
    Page 2, “Results”
  2. Experimentally defining cell-state proportions would make it possible to assess, for each algorithm, how well it identified changes in cell-state proportions across experimental conditions.
    Page 4, “Results”
  3. To generate such idealized experimental conditions we mixed three different breast cancer cell lines (T47D, SUM159, MDA-MB-231) in defined proportions (for example 1:1:1, 1:2:2, 1:1:0), with 10 mixtures in total.
    Page 4, “Results”
  4. We could directly compare gene loadings in the various components with gene expression in the various states because the gene-expression profiles of the pure states were known in our idealized experimental conditions (Fig 2E).
    Page 4, “Results”

Monte Carlo

Appears in 4 sentences as: Monte Carlo (4)
In Perturbation-Expression Analysis Identifies RUNX1 as a Regulator of Human Mammary Stem Cell Differentiation
  1. Empirical p-values for PEACS scores were determined by Monte Carlo sampling: for a given perturbation with n replicates, a null distribution was obtained by randomly sampling n expression profiles from the experimental data, calculating a PEACS score, and iterating this process 10,000 times to generate a PEACS score null distribution.
    Page 6, “Results”
  2. The empirical p-value was then determined by ranking the PEACS score for the given perturbation relative to the PEACS scores generated by this Monte Carlo procedure.
    Page 6, “Results”
  3. To calculate a p-value, a Monte Carlo sampling algorithm was implemented.
    Page 17, “PEACS: Algorithm”
  4. P-values were obtained through Monte Carlo resampling of the PEACS scores, as described in the text.
    Page 18, “Supporting Information”
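
The resampling procedure quoted in the sentences above can be sketched roughly as follows. This is only an illustration under stated assumptions: the PEACS score itself is not defined in these excerpts, so `toy_score` is a hypothetical stand-in, and `empirical_p_value`, the sampling without replacement, and the simulated data are likewise illustrative choices rather than the authors' implementation.

```python
import numpy as np

def toy_score(profiles):
    """Hypothetical stand-in for the PEACS score (not the authors' formula):
    the length of the mean profile of a group of replicates (rows)."""
    return float(np.linalg.norm(profiles.mean(axis=0)))

def empirical_p_value(observed, all_profiles, n_replicates,
                      score_fn=toy_score, n_iter=10_000, seed=0):
    """Rank an observed score against a null built by random resampling."""
    rng = np.random.default_rng(seed)
    null_scores = np.empty(n_iter)
    for i in range(n_iter):
        # Draw n profiles at random from the whole experiment
        # (without replacement; this detail is an assumption).
        idx = rng.choice(all_profiles.shape[0], size=n_replicates, replace=False)
        null_scores[i] = score_fn(all_profiles[idx])
    # Rank = number of null scores at least as large as the observed score.
    rank = int(np.sum(null_scores >= observed))
    return rank / n_iter

# Simulated data: 60 profiles in a 5-dimensional component space.
data_rng = np.random.default_rng(1)
profiles = data_rng.normal(size=(60, 5))
shifted = profiles[:3] + 3.0  # three replicates of a strongly shifted perturbation

p_shifted = empirical_p_value(toy_score(shifted), profiles, n_replicates=3, n_iter=2000)
p_floor = empirical_p_value(0.0, profiles, n_replicates=3, n_iter=2000)
```

With a finite number of iterations the smallest nonzero p-value this scheme can report is 1/n_iter, which is one reason a large iteration count such as 10,000 is useful.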

p-Value

Appears in 4 sentences as: p-Value (4) p-value (1)
In Perturbation-Expression Analysis Identifies RUNX1 as a Regulator of Human Mammary Stem Cell Differentiation
  1. The empirical p-value was then determined by ranking the PEACS score for the given perturbation relative to the PEACS scores generated by this Monte Carlo procedure.
    Page 6, “Results”
  2. To calculate a p-value, a Monte Carlo sampling algorithm was implemented.
    Page 17, “PEACS: Algorithm”
  3. The p-value was defined as the rank of the real PEACS score in the null distribution divided by 10,000.
    Page 17, “PEACS: Algorithm”
  4. Displayed are the PEACS scores, uncorrected p-value, Bonferroni-corrected p-value, and significance (* = raw p<0.01; † = Bonferroni-corrected p<0.05) for genes with at least 3 knockdown conditions with 2-fold or higher knockdown.
    Page 18, “Supporting Information”
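
As a small worked illustration of the two significance thresholds named in the table legend (raw p<0.01 and Bonferroni-corrected p<0.05), the sketch below applies a standard Bonferroni correction to a list of raw p-values. The function names and the example p-values are made up for illustration, and taking the number of tests to be the number of genes in the list is an assumption.

```python
def bonferroni(p_values):
    """Multiply each raw p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def significance(raw_p, corrected_p):
    """Return (raw p < 0.01, Bonferroni-corrected p < 0.05) as two booleans."""
    return (raw_p < 0.01, corrected_p < 0.05)

# Made-up raw p-values for six hypothetical genes.
raw = [0.0004, 0.009, 0.2, 0.5, 0.03, 0.7]
corrected = bonferroni(raw)
flags = [significance(r, c) for r, c in zip(raw, corrected)]
```

In this toy list the second gene passes the raw threshold but not the corrected one, which is the distinction the two legend marks are meant to convey.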

differentially expressed

Appears in 3 sentences as: differentially expressed (3)
In Perturbation-Expression Analysis Identifies RUNX1 as a Regulator of Human Mammary Stem Cell Differentiation
  1. One potential explanation for why the SVD and NMF components tracked cell-state proportions is that the components were identifying genes differentially expressed between cell states.
    Page 4, “Results”
  2. In contrast, SVD component 2 identified the only two genes that were strongly differentially expressed between states B and C (HOXA5, FOXO1; Fig 2E); these two genes, HOXA5 and FOXO1, were respectively down and up in state B relative to state C, and were expressed near median levels in state A.
    Page 6, “Results”
  3. Thus, the highest loadings of SVD1 in this idealized experiment marked genes differentially expressed between luminal and basal cells, including the established luminal markers GATA3 and STAT5A.
    Page 6, “Results”

expression profiles

Appears in 3 sentences as: expression profiles (2) Expression Profiling (1)
In Perturbation-Expression Analysis Identifies RUNX1 as a Regulator of Human Mammary Stem Cell Differentiation
  1. The solution lies in a third key observation: the gene-expression profiles (vectors) of heterogeneous populations of cells are weighted linear combinations of the expression profiles (vectors) of the component states within the population, with the weights in this linear combination corresponding to cell-state proportions.
    Page 2, “Results”
  2. Empirical p-values for PEACS scores were determined by Monte Carlo sampling: for a given perturbation with n replicates, a null distribution was obtained by randomly sampling n expression profiles from the experimental data, calculating a PEACS score, and iterating this process 10,000 times to generate a PEACS score null distribution.
    Page 6, “Results”
  3. PEACS: Expression Profiling by qPCR
    Page 15, “PEACS: Expression Profiling by qPCR”

linear combination

Appears in 3 sentences as: linear combination (2) linear combinations (2)
In Perturbation-Expression Analysis Identifies RUNX1 as a Regulator of Human Mammary Stem Cell Differentiation
  1. The solution lies in a third key observation: the gene-expression profiles (vectors) of heterogeneous populations of cells are weighted linear combinations of the expression profiles (vectors) of the component states within the population, with the weights in this linear combination corresponding to cell-state proportions.
    Page 2, “Results”
  2. Several computational algorithms have been designed precisely for this purpose, namely to infer the constituent components of mixed signals, under the assumption that the mixed signal is a weighted linear combination of constituent components.
    Page 3, “Results”
  3. Thus the gene expression data for each perturbation p is mapped into the space spanned by linear combinations of the first k gene-expression SVD eigenvectors v_1, ..., v_k.
    Page 16, “PEACS: Algorithm”
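
The linear-mixture idea in the sentences above can be illustrated with a small noise-free simulation. This is a minimal sketch, not the authors' pipeline: the pure-state profiles are random numbers, only the first three mixing ratios (1:1:1, 1:2:2, 1:1:0) come from the text and the other seven are invented, and the mean-centring step and variable names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical expression profiles for three pure states (rows) over 17 genes.
states = rng.normal(size=(3, 17))

# Ten mixtures; the first three ratios are from the text, the rest are invented.
ratios = np.array([[1, 1, 1], [1, 2, 2], [1, 1, 0], [2, 1, 1], [0, 1, 1],
                   [1, 0, 1], [2, 2, 1], [1, 0, 0], [0, 1, 0], [0, 0, 1]],
                  dtype=float)
weights = ratios / ratios.sum(axis=1, keepdims=True)  # proportions sum to 1

# Each bulk profile is the proportion-weighted sum of the pure-state profiles.
bulk = weights @ states  # shape: (10 mixtures, 17 genes)

# SVD of the mean-centred data; rows of vt are component vectors in gene space.
centred = bulk - bulk.mean(axis=0)
u, s, vt = np.linalg.svd(centred, full_matrices=False)

# Map every mixture into the space spanned by the first k components.
k = 2
coords = centred @ vt[:k].T
```

Because the three proportions in each mixture sum to one, the centred noise-free data have only two degrees of freedom, so two components reproduce them exactly; with real measurements, noise would spread some variance into the later components.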

single cells

Appears in 3 sentences as: single cells (3)
In Perturbation-Expression Analysis Identifies RUNX1 as a Regulator of Human Mammary Stem Cell Differentiation
  1. To directly examine this possibility we assessed whether single cells from RUNX1-inhibited spheres could form tissue rudiments when seeded into collagen.
    Page 10, “Inhibition of RUNX1 Traps MCF10A Mammary Stem Cells in a Bipotent State”
  2. We seeded cells with dox to form RUNX1-inhibited spheres, harvested and dissociated the spheres by treatment with collagenase and trypsin, and then reseeded single cells into collagen with or without dox.
    Page 10, “Inhibition of RUNX1 Traps MCF10A Mammary Stem Cells in a Bipotent State”
  3. Third, our method analyzes bulk populations of cells to identify changes in cell state ratios, rather than analyzing large numbers of single cells.
    Page 12, “Discussion”
