Observing cells undergoing DSBR in microfluidic wells
The present work builds on a cellular assay for studying DSBR in yeast cells. The assay relies on a bipartite overlapping GFP gene, inserted in a yeast chromosome whose two halves are separated by an intervening 100-bp sequence that contains (CGG)33, (GAA)33, (CTG)33 trinucleotide repeats or a non-repeated sequence (NR) [12]. The different conditions will be hereafter referred to as CGG, GAA, CTG, or NR strains, respectively. A DSB is made within this intervening sequence by either Streptococcus pyogenes Cas9 or Francisella novicida Cpf1 endonucleases [22, 23] (Fig. 1a). Cas9 is a class 2 type II endonuclease, whereas Cpf1 is a class II type V enzyme [24]. They use different PAMs and different gRNAs and exhibit very different structures and biochemical properties. The endonucleases and gRNAs are carried by different plasmids in modified yeast cells, with the endonuclease being under the control of a galactose-regulatable promoter [25]. Endonuclease expression is induced by switching cells from glucose to a galactose-containing medium. This change produces a metabolic switch, slowing down cell division while switching metabolism to galactose utilization [26].
Once the DSB is induced, a series of events takes place, as shown in Fig. 1a, yellow box. DSB resection—following the break—generates two single-stranded DNA ends whose overlapping halves may anneal with each other, thus reconstituting a functional GFP gene. Once the GFP gene is reassembled (i.e., completed DSBR), downstream processes are carried out (as shown in Fig. 1a, blue box), including transcription, mRNA export, and translation, until GFP is expressed and the cell becomes green. Due to checkpoint activation following DSB [27], the cell cycle is transiently halted, so that cells cannot divide with a broken chromosome. This assay is functional and has already shown different efficacies of endonucleases on trinucleotide repeats depending on the stability of secondary structures formed by the gRNA [12].
In vivo observations of single yeast cells undergoing DSBR were envisioned to understand different cell aspects: how likely individual cells were to break and repair, and how these steps integrated within the broader cell cycle. This experiment was enabled by the use of a microfluidic device (Fig. 1b): by confining single cells within microfluidic wells, it was possible to observe individual cell divisions and DSBR completion using time-lapse microscopy (Fig. 1c, d). Moreover, by tracking the progeny of each cell, it was possible to link the emergence of these population dynamics with the scale of individual cellular events.
Microfluidic devices have been proposed before for studying individual yeast cells [28]. For example, Jo et al. [29] developed one for analyzing the replicative lifespan of single cells, while Charlebois and collaborators [30] used individual cell traps for observing the expression of a reporter gene on cells upon changes of temperature. In this study, a microfluidic device with similar geometry to the one presented by Amselem et al. [19] was adapted to observe the yeast cells undergoing DSBR in real time. It consisted of a long and wide chamber (6 × 14 mm) of height h = 30 μm, with one inlet and one outlet. The chamber floor was patterned with a two-dimensional array of 1032 cubic wells of l = 100 μm edge length. Space between the wells was set to d = 120 μm (Fig. 1b).
A typical experiment started by suspending yeast cells at a concentration of 5 cells per nanoliter in a galactose-containing medium (at time \(t_0\)), in order to express the endonuclease. This cell suspension was then rapidly introduced into the microfluidic chip, where the individual cells sedimented into the wells. The well occupancy did not have a homogeneous distribution; wells typically contained from 0 to 5 cells. Only populations that started with a single cell were selected for our analysis in order to monitor the lineage of individual cells. The growth of populations and their GFP expression over time was tracked by time-lapse microscopy (Fig. 1c). For each well, the total number of cells (\(N_{\text {tot}}^{\text {local}}\)) and GFP+ cells (\(N_{\text {GFP}}^{\text {local}}\)) were counted at each time point, yielding a single growth curve per well, as described in the “Methods” section (Fig. 1d). Measurements were collected on samples ranging from 80 to 150 wells in each microfluidic experiment. We define \(t_{\text {lag}}\) the moment at which cells start dividing after \(t_0\) and \(t_{\text {GFP}}\) when they start expressing GFP (Fig. 1d).
The time evolution of DSBR dynamics was studied for 8 different combinations, i.e., two endonucleases (Cas9 and Cpf1) and four target sequences (NR, CGG, GAA, and CTG), using the above analysis pipeline (Fig. 2a and Additional Movie). Generally, individual cells started dividing (\(t_\text{lag}\)) and expressing GFP (\(t_{GFP}\)) a few hours after galactose induction (\(t_0\)). The data for all the conditions are shown in Fig. 2b, where a variety of dynamics is observed for the different target-endonuclease combinations. Here, the access to the absolute number of cells allowed us to point out some important differences between the different conditions, as observed by the bold curves for the mean behavior in Fig. 2b. To be noted that even in identical experimental conditions, cells started dividing at different \(t_{\text {lag}}\), started expressing GFP at different \(t_{\text {GFP}}\), and formed populations of different sizes at \(t=24\) h, as can be seen in any subplot of Fig. 2b.
Strikingly, two cases (GAA-Cas9, CTG-Cas9) showed a strong slowing down of the exponential growth, while the cell numbers in most other cases grew exponentially. This slowing down might indicate a loss of fitness that is associated with the DSBR. Another observation concerned the delay between the growth of the population size and the detection of GFP+ cells. This time difference (\(t_\text{GFP} - t_\text{lag}\)) was in the range of 4–6 h for most conditions except for the condition CTG-Cpf1, where it was above 15 h, indicating different dynamics between the cell cycle and the DSBR process for these conditions. In the case of CGG-Cpf1, only one GFP+ cell was detected during the course of the experiment.
Compared with the diversity of DSBR efficacy that has been described previously [12], the current measurements highlight the variability of timing in the break and repair processes. This dynamic viewpoint motivated the development of a time-dependent ordinary differential equation (ODE) model, as described next.
ODE model built on molecular measurements provides rates of break and repair
Successful DSB induction and repair are the result of a series of molecular steps. In order to identify the relevant time scales in the process, we utilized a model which assumes that, upon induction, an initial population of “modified” cells (containing a specific microsatellite or a non-repeated sequence) has a constant per capita division rate \(\alpha\), while cells switch into a non-growing, broken state, at a rate \(\beta\). The “broken” cells can then become repaired at a rate \(\rho\) and once again begin to grow at a rate \(\alpha\) (Fig. 3a). All rates in the model can be understood as per unit time probabilities, e.g., \(\beta dt\) is the chance for a modified cell to under the broken state in a time interval dt. The fact that the division rate for modified and repaired cells is the same is consistent with population observations that the repeat does not hinder yeast cell replication [12]. This model can be written in terms of a Master equation, as described in detail in the Section 5.
We may also write these processes in the form of three coupled equations to describe these dynamics after a lag time \(\tau\). Before the lag time, we assume the cells undergo no growth, and therefore no DSBR. Letting m, b, and g be the number of “modified,” “broken” (with broken DNA after DSB), and “repaired” (GFP+) cells, we have the system of linear ODEs for the averages:
$$\begin{aligned} \frac{d}{dt}\langle m \rangle= & {} (\alpha - \beta ) \langle m\rangle ,\end{aligned}$$
(1)
$$\begin{aligned} \frac{d}{dt}\langle b \rangle= & {} \beta \langle m\rangle - \rho \langle b \rangle ,\end{aligned}$$
(2)
$$\begin{aligned} \frac{d}{dt}\langle g \rangle= & {} \alpha \langle g \rangle + \rho \langle b \rangle . \end{aligned}$$
(3)
Importantly, due to the linearity of the model, the average \(\langle \cdot \rangle\) can be understood either as an ensemble average over many experiments each consisting of a small number of cells or as the large population size limit of a single experiment.
Upon DSBR induction, since all events happen at different moments for different cells in the culture, a sample of the cell population should contain a mixture of the different states: intact cells, cells displaying a broken chromosome, and cells harboring a repaired chromosome. The dynamics of each of these sub-populations can be quantified by molecular analysis on cells sampled at different times in a growing culture, as shown in Fig. 3b. To that end, cells were collected every 2 h after galactose induction and whole genomic DNA was extracted (see Section 5). Hybridization with a probe specific for the GFP locus revealed three different types of signals on a Southern blot: a 3544-bp band corresponding to uncut DNA, a 2912-bp band representing the DSB, and a 3162-bp band representing the repaired and functional GFP gene (Fig. 3b). Values of the relative abundance of broken and repaired chromosomes are shown in Additional file 1: Fig. S1 and were taken from Poggi et al. [12], except for NR-Cpf1 which was redone here. The fraction of cells that are in the broken state remains low over the 12 h that the measurement is done, since it is a transient state. In contrast, the fraction of cells that have completed DSBR increases over time for almost all conditions, with the notable exception of CGG-Cpf1 and CTG-Cpf1. Note that the NR-Cpf1 case starts already with a comparatively large number of cells that have completed DSBR (40\(\%\) in comparison to less than 20\(\%\) for other cases). This is probably due to the leakiness of the Gal promoter that has a more pronounced effect in this strain background [25].
Using the Southern blot measurements (Fig. 3b), we performed Bayesian inference (see the review article [31]) of the parameters \(\alpha , \beta\), and \(\rho\), which yielded a posterior distribution \(P(\varvec{\theta }|\mathbf{X})\). The posterior distribution is defined as the distribution of the model parameters \(\varvec{\theta }\) conditioned on the observed data \(\mathbf{X}\):
$$\begin{aligned} P(\varvec{\theta }|\mathbf{X}) \propto P(\mathbf{X}|\varvec{\theta })P(\varvec{\theta }). \end{aligned}$$
(4)
Here, the likelihood function \(P(\mathbf{X}|\varvec{\theta })\) gives the distribution of the data given our parameters, where the data consist of the observed population fractions, \(\mathbf{X} = (m/N,b/N,g/N)\) where N is the total number of cells. For each measurement, we assume that the observed fraction is true fraction plus some Gaussian error. The predicted fraction is obtained by solving Eqs. (1), (2), and (3). We further assume that the measurement errors are uncorrelated between different cell states and times. This assumption is, strictly speaking, false, since even if the measurements of m and b are uncorrelated, the measurement errors in their fractions would be correlated. However, numerical experiments with simulated data revealed that the results were robust to this assumption (see Section 5.4 and Figs. S2 and S3).
The distribution \(P(\varvec{\theta })\) represents our priors on both the parameters of the ODE model, as well as the measurement error and lag time (\(\tau\)). With the exception of \(\tau\), we place so-called weakly informative priors on all parameters, that is, priors that only constrain the parameters to a physically reasonable range, rather than incorporating specific information from previous experiments. The same priors are used for \(\beta\) and \(\rho\), as not to favor either breaking or repair as the limiting process. In the case of \(\tau\), the prior is chosen to have a narrow distribution around the known value of the lag. The priors are described in detail in the Section 5.4.
The posterior distributions for the NR-Cas9 condition are shown in Fig. 3b. Comparing the posterior distribution to the prior indicates how much new information about the parameters is obtained from the data. In the case of \(\beta\) and \(\rho\), it can be seen that the data strongly constrain parameter values for many experiments, as evidenced by the fact that the posterior is much narrower than the prior. The value of \(\alpha\) however is less-well determined. This may be expected since the measurements provide ratios of the number of cell types, and not absolute numbers. This selectivity on \(\beta\) and \(\rho\) is reproducible for all cases, as shown in Additional file 1: Figs. S4 and S5 for all experimental conditions.
Next, the model predictions for the population fractions in the broken or repaired states can be compared with the experimental measurements, as shown in Fig. 3c for the NR-Cas9 case. Good agreement is found for most cases (Fig. S6), where the ODE model gives good predictions of the trends and resolves the time scales of broken and repaired fractions. These observations show that the low time-resolution molecular data are sufficient to estimate the parameters of the proposed ODE model, thus predicting the dynamic behavior of DSB induction and repair.
The posterior values of the breaking rate \(\beta\) and the repair rate \(\rho\) are displayed in Fig. S5, for the eight different conditions. From these data, it emerges that the breaking step is rate-limiting for most cases, with the repair happening at a higher rate for all cases. Besides the two very inefficient conditions (CGG-Cpf1 and CTG-Cpf1), three conditions could be described as efficient (NR-Cas9, CGG-Cas9, and NR-Cpf1), with mean values of \(\beta >0.1\) 1/h and a final fraction of repaired cells above 0.6 (Additional file 1: Figs. S5a and S6). The three remaining conditions had intermediate breaking rates \(\beta \simeq 0.09\) and a final fraction of repaired cells not exceeding 0.4. In contrast with the breaking rates, which did not show a strong difference between the two nucleases, the repair rates \(\rho\) were always faster to repair in the case of Cas9 with respect to Cpf1 (Additional file 1: Fig. S5b).
Global behavior
It is informative to begin by comparing the global behavior in the microfluidic device with the bulk measurements, before studying the lineages of individual cells. This is done by summing the time evolution in each of the individual wells and defining the global measures \(N^\text {global}_\text{tot}\) and \(N_\text {GFP}^\text {global}\), for the total number of cells and the total number of GFP+ cells in each microfluidic experiment. From these two numbers, a global fraction \(R^\text {global}=N_\text {GFP}^\text {global}/N^\text {global}_\text{tot}\) can be computed. This global fraction can be compared with the predictions of the ODE model using the parameter values obtained from the Bayesian fit of the molecular data described above.
The dynamics of \(R^\text {global}\), obtained by pooling the different cell positions on a single chip, can then be compared with the predictions of the ODE model. The comparison for all eight experimental conditions is shown in Fig. 4, where the black dots show the experimental measurements while the group of cyan lines show the predictions from the ODE model. In this figure, the measurements previously obtained by flow cytometry [12] are indicated with yellow circles, showing mostly a good agreement with the microfluidic and numerical results. Although the values of the parameters \(\alpha , \beta , \rho\), and \(\tau\) are obtained from the fitting molecular data from a very different setting, the simulated time evolution of the emergence of GFP+ cells matches the microfluidics experiments in most cases. A table containing the root mean square error (RMSE) between the simulated and experimental data is shown in Additional file 1: Table S1, where a higher RMSE value indicates a larger difference between experimental and simulated data.
In these comparisons, two conditions stand out as matching poorly with the ODE model, as can be seen in the Additional file 1: Table S1. The first concerns the NR-Cpf1 case, which grows faster in the experiments compared with the simulations. This is likely due to a leaky induction of Cpf1, which results in DSB induction before the switch to galactose media at \(t_0\). This mismatch between the beginning of the metabolic switch and the break and repair leads to a reduced delay between \(t_\text{lag}\) and \(t_\text{GFP}\) compared with other conditions, as observed by the early onset of the green curves in Fig. 2b. From a modeling point of view, this complexity would add an additional time scale that is not accounted for in the equations. The other case with a poor fit between the microfluidic experiments and the ODE model is CTG-Cas9. This case corresponds to a condition that has a reduced fitness at later times, as evidenced by the slowed growth of the population numbers. In this case (and also on GAA-Cas9 which poorly matches with the flow cytometry results), some individual cells show an abnormal growth in cell size and atypical shapes mostly correlated with being GFP+, as seen in Fig. 2a and in the Additional Movie. The relation between these morphological changes and their impact on the growth of the populations will be studied in detail below where we study the temporal evolution in individual wells.
DSBR dynamics at the single-lineage scale
The above description treats the microfluidic device as a single population. Further insight can be obtained by looking at the dynamics of the progeny of each one of the yeast cells, which shows individual transition events from the initial state (modified, GFP−) to the repaired state (GFP+). By the same token, studying the individual curves gives access to the heterogeneity that exists between different cells within a single experiment.
Typical measurements from three conditions are shown in Fig. 5. By looking at a few individual traces in the case of NR-Cas9 (A.a–e), two situations are dominant: In some wells, the initial cell divides without any of its daughters becoming GFP+ (Fig. 5A.a). The cell proliferation in locations where the repair does not take place tends to slow down after a few initial divisions, as shown by the slower increase of the blue dots. In other wells, the cells turn green some time after the initial division. The time delay between the initial division and the first detection of a GFP+ cell in each well (\(t_{\text {GFP}}\)-\(t_{\text {lag}}\)) is well-distributed around a mean at 5.6 h, as shown in Fig. 5A.b. This delay is consistent with the time required for the cells to translate the new gene and express sufficient GFP molecules to make it detectable. The number of cells turning green after the first detection of a GFP+ cell increases rapidly until it covers all cells within the particular well. This typical behavior is summarized by plotting a few representative curves of the local fraction \(R^{\text {local}}=(N^\text {local}_\text{GFP}/N^\text {local}_\text{tot})\), as shown in Fig. 5A.c. Here, again some lineages remain with a value of \(R^\text {local}=0\) until the end of the experiment but when \(R^\text {local}\) becomes positive it rapidly rises to a value near 1.
Taken together, the measurements of Fig. 5A.a–c indicate that DSB and DSBR take place very early in the lineage tree, possibly in the mother cell or its very first daughters, which explains the low value of the delay and the rapid increase in the number of GFP+ cells. As a result of these dynamics, the distribution of values of \(R^\text {local}\) at the end of the experiment (\(t=24\) h) is strongly bimodal. The statistics are dominated by the extreme values of \(R^\text {local}=0\) and \(R^\text {local}\simeq 1\) (Fig. 5A.d). The intermediate values of \(R^{\text {local}}\) correspond to curves that are in the transition between zero and one at the end of the experiment.
The experimental measurements can be compared with values computed from the stochastic version of the ODE model (see Section 5.5 ), using the same parameter values obtained from the Bayesian fitting in Section 2.2. A sample of the simulated trajectories is shown in Additional file 1: Fig. S7, while the distribution of final values of \(R^{\text {local}}\) is shown in Fig. 5A.e. These simulations reproduce well the tendency of the case NR-Cas9 towards \(R^\text {local}=1\), as seen by the peak in the histogram. Nevertheless, the simulations fail to reproduce the peak at \(R^{\text {local}}\).
The discrepancy between the model and experiments is due to the biological origin of the peak at \(R^{\text {local}}=0\), which corresponds either to cells totally escaping DSB or to broken cells unable to repair the DSB and therefore maintaining cell cycle arrest. This behavior does not correspond to different values of the parameters (\(\alpha , \beta , \rho\)) but rather to some dynamics that are not included in the theoretical model. Although the unbroken/unrepaired trajectories correspond to about \(30\%\) of the wells in the NR-Cas9 case, these positions contribute a small number to the total sum of cells in the experiment since these cells only go through a few division cycles. As a result, they are difficult to observe in the population-scale measurements, which explains the good agreement between the ODE model and global measurements in Fig. 4.
When the same analysis is made for CTG-Cpf1, very different dynamics and statistics are observed (in Fig. 5B.a–e). While the growth of individual lineages from single cells is generally similar to the previous case, the GFP+ cells appear less frequently and much later during the experiment (Fig. 5B.a, b). Indeed, the delay between the first division and the first GFP+ event, when it does occur, is broadly distributed between 5 and 20 h (Fig. 5B.c, Additional file 1: Fig. S7). Moreover, the traces of \(R^{\text {local}}\) do not rise sharply after the first GFP+ cells. Instead, in both experiments and in simulations, they show a much more gradual increase and only reach a small value at the end of the experiment (Fig. 5B.d, e). In this case, the computed growth curves and histogram of final values of \(R^{\text {local}}\) are in good agreement with the experimental measurements (Fig. 5B.e). These observations indicate that DSB and DSBR take place in cells long after the first division. As such, these events only affect a fraction of the progeny of the initial cell, which explains the slow rise of \(R^{\text {local}}\), while most of the lineage tree maintains an unbroken microsatellite.
Finally, a third type of behavior is observed when considering the CTG-Cas9 condition, as shown in Fig. 5C.a–e. Here, the GFP+ cells appear early after the first division (mean time delay is 6 h) but the increase in the number of GFP+ cells is irregular (Fig. 5C.b, c, Additional file 1: Fig. S7). However, this condition corresponds to more complex biological processes, since GFP+ cells display reduced fitness and division arrest after becoming GFP+ (Fig. 5C.a and Additional Movie). If this arrest occurs after the complete population is repaired, it leads to a value of \(R^\text {local}=1\) but on a static population of cells. In other cases, only some of the cells are repaired and slow down their divisions, which leads to a value of \(R^{\text {local}}\) that initially increases before decreasing again (Fig. 5C.c, Additional file 1: Fig. S7). These dynamics yield a large variety of outcomes for the final value of \(R^{\text {local}}\), which covers the whole range between zero and one (Fig. 5C.d, e).
In this last example, the comparison between experimental measurements and simulations from the stochastic model shows good agreement but care must be taken when comparing these two distributions. The peak at \(R^{\text {local}}=0\) is missing for the same reasons as in the NR-Cas9 case above. Moreover, cell cycle arrest of cells that become large is another particularity that is not included in the equations. As such, the model is missing two major specificities of the experiment. Contrary to the two examples discussed previously, the disagreement between the model and the experimental ingredients leads to a poor match in the global ratio (Fig. 4).
Relating the dynamics of individual lineages with the global population behavior
The information shown for three cases in Fig. 5 can be summarized for all conditions by plotting the time dynamics of cell populations as heat maps, as shown in Fig. 6. For each case, three quantities are represented by the color scheme: the number of cells over time (\(N_{\text {tot}}^{\text {local}}\)), the number of GFP+ cells over time (\(N_{\text {GFP}}^{\text {local}}\)), and the value of \(R^{\text {local}}\). The heat maps are constructed as explained graphically in Additional file 1: Fig. S8: each row represents the time evolution from a single well, with the wells ranked according to the total number of cells at \(t=24\) h. Therefore, rows near the top of the graphs represent small final colonies, while rows near the bottom correspond to the largest colonies at the end of the experiment.
Analysis of these heat maps allows us to classify the behavior of DSB and DSBR according to three typical cases.
-
1.
High-efficacy error-free repair, normal cell growth. The four conditions labeled with the star in Fig. 6a follow a high-efficacy situation. These conditions display first a strong correlation between the moment of the first division and the size at \(t=24\) h, as shown by the sideways slant of the pink border describing \(N_{\text {tot}}^{\text {local}}\). The second observation is the relatively narrow delay between the first division and the first GFP+ cell, as seen by the small distance between the black line and the left edge of the pink region in the middle heat map. This small delay indicates that the first repair takes place when there are only a few cells in the well. Finally, \(R^{\text {local}}\simeq 1\) for the bottom part of the heat maps, indicating that the largest individual populations are also the best repaired. This type of behavior is observed in four conditions: NR-Cas9, NR-Cpf1, CGG-Cas9, and GAA-Cpf1, and their progeny trees would resemble the ones illustrated in Fig. 6b, first panel.
-
2.
Low-efficacy error-free repair, normal cell growth. The two conditions labeled with a circle in Fig. 6a follow a scenario that is consistent with a late breaking of the microsatellite. In both of these conditions, the cell division begins in a similar fashion to the high-efficacy cases described above, with a strong correlation between the first division event and the final size of the colony, as seen in the shape of the \(N_{\text {tot}}^{\text {local}}\) heat maps. However, the time for the first GFP+ detection is very long compared with the high-efficacy cases. This long delay is an indication that the break and repair events happen after several cell divisions, as shown schematically in the middle panel of each condition. It is possible that the repair step is also poorly performed by the cells, although it is not possible to confirm this from the current experiments. As a result of this long delay, the values of \(R^{\text {local}}\) all remain small at \(t=24\) h, in line with the low value of \(R^{\text {global}}\) (Fig. 4). CGG-Cpf1 and CTG-Cpf1 show this type of behavior, and their progeny tree would be similar to the one illustrated in Fig. 6b, second panel.
-
3.
Error-prone repair, impaired cell growth. Different dynamics are evidenced by the analysis on the final two conditions, marked with the square in Fig. 6a. The appearance of GFP+ cells here is followed by a loss of fitness, marked by the slowing down or stopping of cell division. A consequence of this behavior is the broad distribution of wells that reach \(R^{\text {local}}\simeq 1\), both for small and large final colony sizes. In contrast with the previous cases, the well with a high value of \(R^{\text {local}}\) is distributed throughout the whole range of colony sizes. This is also the only condition for which the value of \(R^{\text {local}}\) is not monotonically increasing but sometimes decreases.