CRISPR technology allows researchers to edit genomes by altering DNA sequences and thereby modifying gene function. Its many potential applications include correcting genetic defects, treating and preventing the spread of diseases, and improving crops.
Genome editing tools, such as the CRISPR-Cas9 technology, can be engineered to make extremely well-defined alterations to the intended target on a chromosome where a particular gene or functional element is located.
However, CRISPR editing can also introduce unintended changes elsewhere in the genome; these are known as off-target activity. When several sites in the genome are targeted, off-target activity can lead to translocations (unusual rearrangements of chromosomes) as well as other unintended genomic modifications.
Current measurement assays and data analysis methods for quantifying off-target activity do not provide statistical evaluation, are not sensitive enough to separate signal from noise in experiments with low editing rates, and require cumbersome additional efforts to detect translocations.
In the May 24th issue of the journal Nature Communications, a multidisciplinary team of researchers from the Interdisciplinary Center Herzliya and Bar-Ilan University report the development of a new software tool to detect, evaluate and quantify off-target editing activity, including adverse translocation events that can cause cancer.
Known as CRISPECTOR, the tool analyzes next generation sequencing data obtained from CRISPR-Cas9 experiments, and applies statistical modeling to determine and quantify editing activity. CRISPECTOR accurately measures off-target activity at every interrogated locus.
It also achieves lower false-negative rates at sites with weak, yet significant, off-target activity. Importantly, one of the novel features of CRISPECTOR is its ability to detect adverse translocation events occurring in an editing experiment.
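To make the idea of separating true editing from background noise concrete, the sketch below compares the fraction of edited reads at one locus in a treated sample against a mock (untreated) sample. It is not CRISPECTOR's model, which the paper describes in its own terms; it swaps in a plain one-sided two-proportion z-test, and the function name `editing_activity` and its inputs are hypothetical.

```python
import math


def editing_activity(treated_edited, treated_total, mock_edited, mock_total):
    """Background-subtracted editing rate and a one-sided p-value.

    Illustrative stand-in only (not CRISPECTOR's statistics): compares the
    fraction of reads carrying edits in the treated sample with the mock
    sample using a pooled two-proportion z-test.
    """
    p_treated = treated_edited / treated_total
    p_mock = mock_edited / mock_total
    # Pooled proportion under the null hypothesis of no treatment effect
    pooled = (treated_edited + mock_edited) / (treated_total + mock_total)
    se = math.sqrt(pooled * (1 - pooled) * (1 / treated_total + 1 / mock_total))
    z = (p_treated - p_mock) / se if se > 0 else 0.0
    # One-sided p-value: P(Z >= z) for a standard normal Z
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    # Editing cannot be negative, so clamp the background-subtracted rate at 0
    return max(p_treated - p_mock, 0.0), p_value
```

With 100,000 reads per sample, 150 versus 50 edited reads gives a background-subtracted rate of 0.1% and a very small p-value, whereas 2 versus 1 edited reads out of 1,000 cannot be distinguished from noise. The normal approximation used here becomes unreliable at very low counts, which is one reason dedicated tools rely on more careful models.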
“In genome editing, especially for clinical applications, it is critical to identify low level off-target activity and adverse translocation events. Even a small number of cells with carcinogenic potential, when transplanted into a patient in the context of gene therapy, can have detrimental consequences in terms of cancer pathogenesis.
As part of treatment protocols, it is therefore important to detect these potential events in advance,” says Dr. Ayal Hendel, of Bar-Ilan University’s Mina and Everard Goodman Faculty of Life Sciences.
Dr. Hendel led the study, together with Prof. Zohar Yakhini of the Arazi School of Computer Science at Interdisciplinary Center (IDC) Herzliya. “CRISPECTOR provides an effective method to characterize and quantify potential CRISPR-induced errors, thereby significantly improving the safety of future clinical use of genome editing.”
Hendel’s team used CRISPR-Cas9 technology to edit genes in stem cells relevant to disorders of the blood and the immune system. In the process of analyzing the data, they became aware of the shortcomings of the existing tools for quantifying off-target activity and of gaps that should be bridged to improve applicability.
This experience led to the collaboration with Prof. Yakhini’s leading computational biology and bioinformatics group.
Prof. Zohar Yakhini, of IDC Herzliya and the Technion, says, “In experiments utilizing deep-sequencing techniques that have significant levels of background noise, low levels of true off-target activity can get lost under the noise. The need for a measurement approach and related data analysis that are capable of seeing beyond the noise, as well as of detecting adverse translocation events occurring in an editing experiment, is evident to genome editing scientists and practitioners. CRISPECTOR is a tool that can sift through the background noise to identify and quantify true off-target signals.
Moreover, using statistical modeling and careful analysis of the data, CRISPECTOR can also identify a wider spectrum of genomic aberrations.”
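A translocation between two cut sites leaves a junction where sequence flanking one cut is fused directly to sequence flanking the other. The minimal sketch below, which is not how CRISPECTOR itself works, builds short probes for every such fusion and counts the reads that contain one. The names `junction_probes` and `count_translocation_reads` are hypothetical, and the sketch assumes exact, blunt joins in a single orientation, ignoring indels at the junction, inverted fusions, and sequencing errors.

```python
def junction_probes(sites, k=15):
    """Build k-mer probes spanning every possible fusion between cut sites.

    sites maps a site name to (left_flank, right_flank): the reference
    sequence immediately 5' and 3' of the expected cut position.
    """
    probes = {}
    for a, (a_left, _) in sites.items():
        for b, (_, b_right) in sites.items():
            if a != b:
                # Left arm of site a joined directly to the right arm of site b
                probes[a_left[-k:] + b_right[:k]] = (a, b)
    return probes


def count_translocation_reads(reads, sites, k=15):
    """Count reads containing a fusion probe, keyed by (site_a, site_b)."""
    probes = junction_probes(sites, k)
    counts = {}
    for read in reads:
        for probe, pair in probes.items():
            if probe in read:
                counts[pair] = counts.get(pair, 0) + 1
                break  # assign each read to at most one junction
    return counts
```

Reads spanning an intact (unfused) cut site contain no probe and are not counted, so the counts reflect only candidate fusion events.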
Genome Editing: A Transformative Technology
The application of genome editing is transforming agriculture, biomedical research, and healthcare. The many proposed purposes include the generation of more productive or robust crops and farm animals, animal hosts for the production of tissues for graft purposes and therapies that use ex vivo or somatic tissue engineering [1., 2., 3.].
The promise of applicability is turning into reality, as illustrated by the first nonrandomized Phase I clinical trial in which the use of clustered regularly interspaced short palindromic repeats (CRISPR)-engineered T cells was recently found to be safe [4.].
To date, >20 Phase I/II human clinical trials are underway for a broad range of diseases, including cancers, β-thalassemia, sickle cell disease, and Duchenne muscular dystrophy (summarized and discussed in [1.,5.]).
Genome editing is generally based on either zinc finger nucleases [6.], transcription activator-like effector nucleases [7.], or the CRISPR/CRISPR-associated (Cas) system (see Glossary) [8.]. These nucleases act by inducing a double-strand break at a specific DNA sequence, and a genetic alteration results as the break is repaired.
In the clinic, the initial applications aim for deletions of genomic DNA intervals and do not yet involve precision at the nucleotide level; thus, these can be executed through the sole delivery of a genome-editing nuclease. However, for more precise editing, such as the generation of point mutations or more intricate changes, or even the accurate deletion of a genomic segment, single- or double-stranded DNA templates are also delivered together with the nucleases to direct repair toward a defined sequence by homology-directed repair (HDR) [9., 10., 11.] or nonhomologous end-joining [12.].
Base editors [13.] and prime editors [14.] are alternative strategies for more precise editing. Overall, the range of genome editing tools is ever increasing and their transformative potential across a range of fields of application is immense.
Genome Editing: A Disruptive but Still Erratic Technology
The safety of genome-editing technologies is just as critical as their efficiency for their successful application in health or agriculture. Common to all fields of application are the risks associated with undesired genetic changes that can be triggered by genome editing.
The potential for unwanted off-target nuclease activity was recognized early in the application of the CRISPR/Cas9 system as a genome-editing tool [15.]. The frequency of such events and the attached risks were the subjects of much debate [16., 17., 18.].
The general consensus is that, with careful molecular design, off-target events are rare and generally can be segregated away from the allele of interest in genome edited animals, unless the off-target region is in linkage disequilibrium with the target site; however, they are potentially more pernicious in cultured cells or in a somatic delivery system [19., 20., 21., 22., 23.]. In those cases, it is particularly essential that off-target events are captured [24.].
However, on-target effects and ectopic insertions of donor template are less predictable and have often been underestimated.
Nevertheless, on-target effects of genome-editing enzymes are also now better documented. These can take many forms: single nucleotide variations, indels, large and/or complex genomic rearrangements, segmental duplications, chromosomal translocation, terminal chromosomal truncation up to several megabases, or loss of one or both arms of a chromosome [25., 26., 27., 28., 29., 30., 31.] (Figure 1), and some mutagenic events are not compatible with efficiently populating a cell lineage in vivo [30.].
These effects are intimately linked to the kinetics of the enzyme’s interaction with the DNA and to the DNA repair pathways [32.,33.]. This results in repeated cutting at the target site, which can lead to the alteration of a larger-than-expected segment, from ~1–2 kb up to 50 kb, in ~15–20% of the target DNA in somatic cells [25.,26.,29.]. In early embryos, this additionally translates into mosaicism of the mutated alleles [34., 35., 36.].
Similarly, repair with a template can result in a variety of sequence changes even when the insertion of the repair template is on target (Figure 1). It can yield unpredictable and sometimes complex events at the target site, such as partial insertion of the template, sequence duplications, inversions or rearrangements of the template in combination with endogenous sequences [35.,37., 38., 39.], as well as ectopic insertions of the repair template.
The delivery of a template for repair or vectors for nuclease expression can also yield ectopic insertions that could affect the safety of the approach [40.,41.]. The recent example of genome editing of the POLLED allele in cattle [42.], in which the full outcome had not been identified from the first analysis of the edited cows [43.], illustrates the difficulties involved in thoroughly identifying ectopic insertions of the repair template.
Additional studies show that pervasive insertion of the donor template across the genome can remain undetected with conventional methods, such as PCR and Sanger sequencing [39.]. These examples raise questions about the prevalence and type of on-target modifications and ectopic insertions of the donor DNA following codelivery with nucleases. They also highlight the importance of using appropriate assays to evaluate the correctness of the resulting genome-editing event(s) (see Box 1).
Unknown Unintended Consequences of Genome Editing
As genome editing is increasingly used, further unexpected and potentially negative outcomes of its application are still being uncovered. For example, the perdurance and transmission of DNA double-strand breaks (DSBs) is a previously overlooked phenomenon that forms a molecular basis for several mixed alleles arising from a single cutting event [31.,44.].
Also, the occurrence of potentially extensive gene conversion [45.] of edited alleles went unnoticed until recently. Gene conversion is the transfer of DNA from one genomic location to another by homologous recombination. Examples include the transfer of DNA from the delta-hemoglobin gene to the beta-hemoglobin gene in dividing or nondividing cells, and the use of the paternal allele as a repair template in embryonic cells [46., 47., 48.].
Gene conversion could result in a partial or full repair of the allele; it has been hypothesized that this mechanism could be harnessed for precision editing, with the homologous allele serving as an internal repair template, but this is still disputed [47.]. In particular, because conversion tracts may extend well beyond the targeted region, the resulting loss of heterozygosity represents an additional risk for clinical application [31.,49.].
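Loss of heterozygosity can be screened for by following single-nucleotide polymorphisms (SNPs) that were heterozygous before editing. The sketch below assumes a clonal diploid sample and alternate-allele read fractions measured at SNPs around the cut site; the function names and the 0.2/0.8 thresholds are illustrative assumptions, not values from the source.

```python
def flag_loh(snps, het_low=0.2, het_high=0.8):
    """Return positions of SNPs that switched from heterozygous to homozygous.

    snps: iterable of (position, alt_fraction_before, alt_fraction_after),
    where each fraction is the share of reads carrying the alternate allele.
    The thresholds are illustrative, not validated cut-offs.
    """
    flagged = []
    for position, before, after in snps:
        was_heterozygous = het_low <= before <= het_high
        now_homozygous = after < het_low or after > het_high
        if was_heterozygous and now_homozygous:
            flagged.append(position)
    return flagged


def tract_span(flagged_positions):
    """Minimal extent (first, last) covered by flagged SNPs, or None."""
    if not flagged_positions:
        return None
    return min(flagged_positions), max(flagged_positions)
```

For SNPs at positions 100, 500, 2000, and 9000 with before/after fractions of 0.50/0.48, 0.52/0.03, 0.49/0.97, and 0.01/0.00, only positions 500 and 2000 are flagged, giving a minimal span of 500 to 2000. The true tract may extend as far as the nearest unflagged heterozygous SNP, and in a heterogeneous cell pool, conversion in a subset of cells shifts allele fractions only partially, which this simple threshold rule would miss.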
Finally, and importantly, it is now becoming increasingly apparent that genomic segments can be inadvertently altered at comparatively large distances from the cutting site [29.,50.]. The frequency of such outcomes and the genetic range susceptible to alteration following genome-editing intervention remain to be fully appraised.
All of these poorly understood consequences pertain to changes to the DNA sequence. Other potential unknown consequences of on- and off-target effects could have an entirely different molecular basis, such as deregulation of the chromatin environment or of the 3D organization of the nucleus, which could alter genome stability or gene expression. The incidence of such potential consequences is as yet largely unexplored.
The Context of the Genome Editing Application Changes the Question
In all instances, the challenge is to fully apprehend the editing outcomes that may have adverse consequences. This is likely to require the application of a suite of molecular techniques to interrogate the different artefactual features that can be encountered in genome editing. These features can be diverse in scale (single base to megabase) and may involve additional template insertions. All of these outcomes can occur at the targeted site or ectopically.
Adding further complexity, the extent and nature of lesions vary considerably depending on whether the editing occurs in nondividing somatic, dividing somatic, or germinal (early embryonic) cells. In the case of euploid clonal cell populations or the progeny of founder animals, there are only two alleles for each autosomal locus and, therefore, a maximum of two variants may need to be identified. By contrast, animals born from genome editing of early embryos are generally mosaic [34.,35.,37.].
Modification of pools of cultured cells yields heterogeneous cell populations [18.], and tissues edited by somatic delivery [51.] represent an even greater degree of genetic complexity. In all these examples, multiple and potentially diverse allelic variants are represented at different frequencies. It is critical to elaborate a clear strategy that takes these different degrees of genetic complexity into account to uncover, fully characterize, or, ideally, prevent these unwanted events.
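The contrast between clonal and complex material can be illustrated on amplicon-sequencing allele counts from a single autosomal locus. This is a minimal sketch: the function name, the 2% noise floor, and the ±0.1 tolerance are hypothetical choices, and real samples (for example, aneuploid lines or amplicons with PCR bias) need more careful handling.

```python
def allele_profile(allele_counts, min_fraction=0.02, tolerance=0.1):
    """Summarize alleles at one autosomal locus and test for a clonal diploid pattern.

    allele_counts maps an allele label (or sequence) to its read count.
    Alleles below min_fraction are treated as noise. A euploid clonal
    sample should show one allele near 100% or two alleles near 50%.
    """
    total = sum(allele_counts.values())
    fractions = {allele: count / total
                 for allele, count in allele_counts.items()
                 if count / total >= min_fraction}
    values = sorted(fractions.values(), reverse=True)
    homozygous = len(values) == 1 and values[0] >= 1 - tolerance
    heterozygous = (len(values) == 2
                    and all(abs(v - 0.5) <= tolerance for v in values))
    if homozygous or heterozygous:
        status = "compatible with a clonal diploid sample"
    else:
        status = "mosaic or heterogeneous"
    return fractions, status
```

A sample with reads split roughly evenly between a wild-type and a 4-bp deletion allele passes the clonal check, whereas four alleles at 40%, 30%, 20%, and 10% are reported as mosaic or heterogeneous, the situation expected for edited founders or cell pools.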
The stakes are high because the impact of incomplete characterization of editing effects is potentially important in all areas of application: in the laboratory, the risk is irreproducible or artefactual research. Therefore, information obtained with genetically edited founder animals (likely to be mosaic) must be interpreted on the basis of the intrinsic genetic complexity of these animals, and the genetic content of progeny must be extensively revalidated. Equally, interpretation of data obtained with edited cultured cell pools (where repair may result in many different alleles at various frequencies) requires an understanding of the genetic composition of these complex cell populations, such as mosaicism [25.]. When editing is used for the production of agricultural products (plants or animals), it remains unclear whether uncontrolled outcomes may pose a risk to the users. Such variability may prevent licensing for commercialization by regulators or negatively affect the confidence of consumers in the safety of those products [43.,52.]. For use in the clinic, in tissue engineering or by somatic delivery, the degree of variability of genomic outcome that may be acceptable in terms of safety remains to be appreciated and may not represent an insurmountable obstacle [4.]. However, the range of edited sequences that can result from a given therapeutic intervention must still be thoroughly understood to evaluate the associated benefit–risk balance [24.]. Finally, an inability to fully validate the consequences of CRISPR/Cas activity throughout the embryo represents a practical barrier to germline editing in the clinic [53.].
In all applications of genome editing (whether in biomedical research, agricultural production, or the clinic) a thorough evaluation of these outcomes is necessary for a realistic appraisal of the benefit–risk ratio associated with the use of the technology. The required level of investigation depends on the specific application, but all demand the ability to anticipate the whole range of potential (wanted and unwanted) consequences of each genome-editing intervention.
Strategy for Validation
Capturing the variability of genome-editing outcomes requires increasing investment in resources as attention extends away from the target site: (i) as a minimum, the target site and chromosomally linked potential off-target sites should be amplified and sequenced; (ii) the number of copies of deleted segments or of the donor template should be quantified to capture on-target duplications and ectopic integrations (this is also essential for all applications); (iii) more elaborate assays that inform on potential larger-scale chromosomal rearrangements are desirable, because an increasing number of examples have been reported in which additional sequence changes were found away from the cutting site; and (iv) where required, analysis should be extended to the whole genome to predict or capture potential off-target sites.
Equally, a pragmatic approach to the interrogation of genome editing takes into account the likely genetic complexity of the edited material (whether all cells have identical genomes or constitute a genetically diverse population) and the context of application (see Box 1). For example, mosaic founders of small laboratory animal lines will be bred, allowing most unwanted edits to segregate away in the next generation. At the founder stage, it is therefore only essential to search for the presence of the allele of interest and linked off-target effects. Definitive characterization of the model can await transmission of the allele of interest to the subsequent generation.
By contrast, the whole gamut of mutations arising from CRISPR/Cas activity is to be considered when this technique is used in large livestock (because associated financial and welfare costs are high and timelines extended by long gestations) or for somatic treatment in the clinic [24.,43.]. Therefore, depending on the genome-editing application, the strategy for validation will either aim to identify the presence of a specific variant, seek to capture complexity, or definitively ascertain an entire genetic make-up. In summary, the genotyping strategy will take into account the ability of each molecular assay to cope with the genetic complexity of the material and will customize effort for the context of utilization.
Factors that Should Be Assessed when Validating Genome-Editing Outcomes
1. What is/are the new genetic modification(s) on target? What is the length of the interval potentially altered by the intervention?
2. Are there sequence changes in potential off-target sites? Are these sites physically linked to the target locus?
3. What is the number of copies of the mobilized segments (deleted or introduced as template)?
4. Can the purpose of the application cope with potential unwanted changes in the genome?
Capturing the Variability of Genome Editing Outcome on Target
Appraising the on-target outcome of CRISPR/Cas activity was initially perceived as a straightforward exercise and was therefore performed with a simple set of standard molecular biology protocols: Surveyor assays, or PCR amplification and Sanger sequencing, in some instances with prior cloning of the PCR product [9.] (see Table 1).
Although such approaches are generally sufficient to detect the presence of the desired mutation [36.], they do not support capturing the entire range of sequences that arises from the CRISPR/Cas mutagenesis effect on target sites in materials of complex genetic make-up. For example, larger deletions that include the sequences annealed by at least one of the PCR primers used for genotyping are not detected [26.].
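The primer-site problem can be shown with simple interval arithmetic. In this sketch (a hypothetical function using half-open coordinates on one reference strand), a deletion that removes either primer-binding site yields no product from that allele, so a heterozygous sample would appear to carry only its other allele.

```python
def amplicon_outcome(fwd_primer, rev_primer, deletion):
    """Predict what PCR genotyping reports for one deletion allele.

    fwd_primer, rev_primer and deletion are (start, end) half-open
    coordinates on the same reference; the forward primer lies 5' of
    the reverse primer.
    """
    del_start, del_end = deletion

    def overlaps(site):
        return site[0] < del_end and del_start < site[1]

    if overlaps(fwd_primer) or overlaps(rev_primer):
        # A primer-binding site is gone: this allele gives no product,
        # so only the other allele is amplified and the deletion is missed.
        return "no product (allele drops out)"
    if fwd_primer[1] <= del_start and del_end <= rev_primer[0]:
        size = (rev_primer[1] - fwd_primer[0]) - (del_end - del_start)
        return f"shortened product of {size} bp"
    return "product unaffected"
```

With a 500-bp amplicon, a 30-bp deletion inside it gives a 470-bp product, but a deletion running from position 100 to 3,000 removes the reverse primer site and silently drops out of the assay.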
Equally, low frequency events may be overlooked, or relevant cell lineages may be inaccessible for sampling or under-represented in samples; for example, a variant may not be detected within the somatic cells of a founder animal (ear biopsy) but may be identified in their progeny [37.].
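The sampling side of this problem follows from basic probability. Assuming each sampled molecule (a subcloned PCR product or a sequencing read, for instance) independently carries the allele with probability f, the chance of seeing it at least once in n samples is 1 − (1 − f)^n. The function names below are hypothetical; the sketch ignores PCR and sequencing errors, and it cannot help when the relevant cell lineage was never sampled in the first place.

```python
import math


def detection_probability(allele_fraction, n_sampled):
    """Chance of observing an allele at least once among n independent samples."""
    return 1 - (1 - allele_fraction) ** n_sampled


def molecules_needed(allele_fraction, confidence=0.95):
    """Smallest n giving at least `confidence` chance of one observation."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - allele_fraction))
```

For an allele present in 5% of molecules, sequencing 10 subclones gives only about a 40% chance of seeing it, and 59 subclones are needed to reach 95% confidence.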
Table 1. Methods for Analyzing Loci Targeted by Genome Editing

| Method | Advantages | Limitations | Refs |
|---|---|---|---|
| PCR and Sanger sequencing | Easy to implement | Can be difficult to interpret and may overlook some alleles | [9.] |
| PCR, subcloning, and sequencing | Easy to implement; appropriate to disentangle mosaic cell populations | Work intensive and may overlook some alleles | [9.] |
| Surveyor assay (or T7 endonuclease I assay) | Easy to implement | Does not provide sequence granularity | [9.] |
| Southern blot | Interrogates a large genomic segment | Does not provide sequence granularity | [41.,54.] |
| FISH | Interrogates a large genomic segment | Does not provide sequence granularity | [55.] |
| Fiber-FISH | Interrogates a large genomic segment | Does not provide sequence granularity | [28.] |
| dPCR or qPCR | Detects duplications | Does not provide sequence granularity | [27.,39.,62.] |
| TLA | Interrogates a large genomic segment | Expensive to implement | [56.] |
| PCR and short-read NGS | Appropriate to analyze many genome-editing samples | Expensive to implement unless large numbers of samples are analyzed | [57.] |
| PCR and long-read NGS | Appropriate to disentangle mosaic cell populations; interrogates a large genomic segment | Expensive to implement unless large numbers of samples are analyzed | [26.,58.] |
Southern blot analysis appraises a wider genetic interval and can identify genomic changes away from the immediate vicinity of the targeted sequence [41.,54.]. Cytogenetic methods and fiber fluorescence in situ hybridization (fiber-FISH) [28.,55.] support the survey of an even broader region and can identify unwanted insertions or deletions of genetic material as well as large-scale sequence rearrangements.
Given the variability of the outcome and the length of the modified segments, a fuller examination requires more elaborate and expensive assays, such as targeted locus amplification (TLA) [56.] or, in the case of material with a complex genetic make-up, high-throughput short-read [57.] or long-read [26.,58.] sequencing. For the latter two, targeted sequencing can rely on the isolation of the loci of interest by simple PCR [26.,57.,58.], but this limits the size of the interval that can be interrogated. Other approaches for larger template isolation are emerging to lift this constraint (e.g., biotin-labeled probes [59.] and Cas9-aided capture [60.,61.]) (see Table 1).
Scanning the Genome for Wider Consequences of CRISPR/Cas Activity
Genome-editing nucleases are powerful tools to introduce sequence changes at a target locus, but they can also lead to changes at other, similar sequences genome-wide. Counting the copies of a deleted segment can reveal whether an unexpected rearrangement has occurred instead of the simple removal of the interval of interest [27.,36.]. Equally, counting the copies of the DNA template (single- or double-stranded) will identify additional integrations.
Digital PCR (dPCR) generally is a straightforward assay for this, but standard quantitative PCR (qPCR) can also be used [62.]. On a larger genomic scale, comparative genomic hybridization (CGH) arrays and FISH enable a whole genome to be surveyed and can identify large sequence alterations away from the nuclease cutting site [29.].
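Digital PCR counts are usually converted to copy numbers with a Poisson correction, because a positive partition may hold more than one template molecule. The sketch below assumes a duplex assay in which target and reference are read from the same partitions and the reference locus is present at two copies per diploid genome; the function names are hypothetical, and instrument software typically reports these values directly.

```python
import math


def copies_per_partition(positive, total):
    """Poisson-corrected mean copies per partition: lambda = -ln(1 - p)."""
    if positive >= total:
        raise ValueError("all partitions positive: sample too concentrated")
    return -math.log(1 - positive / total)


def copy_number(target_positive, reference_positive, total, reference_copies=2):
    """Target copies per genome, scaled by a reference locus of known copy number."""
    ratio = (copies_per_partition(target_positive, total)
             / copies_per_partition(reference_positive, total))
    return ratio * reference_copies
```

If half of 20,000 partitions are positive for the reference and three quarters for a donor-template amplicon, the template is estimated at about four copies per diploid genome; an excess over the copy number expected for the intended edit points to additional ectopic integrations.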
Whole-genome sequencing allows for a broad and unbiased capture of the genome-editing outcome [63.] but such an approach is expensive and inadequate for complex genetic materials, such as heterogeneous cultured cell populations, or when genome editing nucleases are used with somatic delivery.
The complexity of this question demanded the development of bespoke analysis approaches for more effective identification of off-target effects. Many solutions have evolved, based on sequencing of captured susceptible sites (e.g., GUIDE-seq [64.], CIRCLE-seq [65.], LAM–HTGTS [66.], UDiTaS™ [67.], Digenome-seq [68.], and CHANGE–seq [69.]). An alternative method, DISCOVER-seq [70.], captures off-target CRISPR/Cas activity ‘red-handed’ by using ChIP-seq to detect the DSB repair complex MRN bound to genomic DNA. Methods are continuously evolving, in particular to address the remaining challenges of capturing rarer events in samples of high genetic complexity and of eliminating bias toward particular types of sequence modification.
No Single Assay Captures All the Potential Outcomes of Genome Editing
Crucially, no single technology is able to capture all of the unexpected sequence changes that can result from genome editing. Targeted sequencing using Sanger or next-generation methods [26.,58.,71.] affords validation of the targeted locus to the single-base level, but reports only on sequence variation at loci chosen as relevant and on the integrity of an interval of limited size; nor do these techniques identify additional sequence changes elsewhere.
Droplet digital PCR [27.,39.] or even Southern blot analysis [41.] help to identify unexpected copy numbers of given sequences, but do not report on the exact sequences. Neither of these techniques unravels the complexity of nonclonal materials that contain many genetic identities. Technologies based on the visualization of chromosome segments with fluorescent probes permit the survey of large regions but generate data of low resolution.
All strategies to identify distal or off-target activity also have sensitivity limitations and biases. Sanger sequencing can be applied to many off-target sites, but the loci for analysis must be predicted. Protocols based on Sanger or short-read sequencing do not readily identify structural variations [72.].
By contrast, FISH cytogenetic analysis and the elegant variation of DNA combing allow for the documentation of large structural variation at the expense of the granularity of sequencing information. Methods for capturing potential off-target sites [64., 65., 66., 67., 68., 69.] may not reveal all events, whereas DISCOVER-seq technology [70.] captures events contemporary to the assay, but not those that occurred earlier in the time course of the intervention.
Finally, even whole-genome sequencing technologies, when they can be afforded, will have their own biases that leave remaining ambiguities in terms of nuclease activity consequences: short read-based sequencing is likely to miss structural rearrangements, whereas long-read sequencing can lack accuracy if ample coverage is not obtained. With either modality, achieving close to a full genome sequence requires heavy investment, even with genetically homogeneous material.
Nevertheless, characterization of potentially complex genome editing events is essential to establish the reliability of genome edited materials and their suitability for their intended use. Defining the appropriate validation strategy will determine the best possible combination of assays in terms of their scope and available resources, and requires anticipation of potential outcomes, the genetic complexity of the edited material, and the essential quality criteria for a given application. Understanding the frequency of the different variants that result from each genome-editing application will also underpin the development of refined strategies for a more exact outcome in future attempts [58.].
Preventing the Damage
Whilst it is important to identify unwanted events, their prevention seems even more desirable. To alleviate the risk of unwanted outcomes following CRISPR/Cas9 editing, many strategies have been proposed. Early in the development of the technology, predicting the mutagenesis pattern of the guide RNA through its computational design was a major focus for precision editing. Guide efficiency, as well as the prediction of mutagenesis effects and their potential off-target effects, is now better understood. In addition, noncanonical single guide RNA (sgRNA), Cas9 variants selected for enhanced specificity [73.], and nickases [74.] were used in initial strategies to achieve accurate interventions, in many cases to the detriment of efficiency.
An alternative approach aims to focus activity on the desired target by increasing protospacer adjacent motif (PAM) selectivity, thereby diminishing activity at some other nontarget sites that harbor alternative PAM variants [75.]. In addition, temporally controlling Cas9 activity, for example by delivering it as a ribonucleoprotein or tagging it with a ubiquitin-proteasome degradation signal, could help to restrict extensive DNA cutting [76.]. The introduction of spatial control by expressing Cas9 in a specific cell type or targeting its delivery could also reduce the risk of DNA damage [24.,77.]. Finally, competition with inactive ribonucleoproteins (RNPs) targeting off-target sites has been proposed as a means to focus genome editing onto the target site [78.].
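Computational off-target prediction starts from a sequence search like the one sketched below: it scans both strands for sites resembling a 20-nt protospacer that lie next to an NGG PAM, the canonical PAM of SpCas9, allowing a set number of mismatches. This is a minimal illustration with hypothetical function names; real design tools also consider DNA or RNA bulges, noncanonical PAMs such as NAG, and position-dependent mismatch weights, and they score candidates rather than merely listing them.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_candidate_sites(genome, protospacer, max_mismatches=3):
    """List protospacer-like sites followed by an NGG PAM on either strand.

    Returns (strand, start, mismatches) tuples, where start is the leftmost
    forward-strand coordinate of the protospacer plus PAM. Bulges and
    non-NGG PAMs are ignored.
    """
    n = len(protospacer)
    hits = []
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for i in range(len(seq) - n - 2):
            pam = seq[i + n:i + n + 3]
            if pam[1:] != "GG":
                continue  # not an NGG PAM
            mismatches = sum(a != b for a, b in zip(seq[i:i + n], protospacer))
            if mismatches <= max_mismatches:
                # Map reverse-strand coordinates back onto the forward strand
                start = i if strand == "+" else len(seq) - (i + n + 3)
                hits.append((strand, start, mismatches))
    return hits
```

A site with two mismatches to the guide is reported with the default threshold of three but disappears when `max_mismatches` is set to one, which shows how the chosen tolerance directly determines which loci are nominated for follow-up sequencing.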
Where a DNA template is used, tipping the balance in favor of HDR against nonhomologous end-joining and other repair events is beneficial to ensuring quality. This may be achieved by codelivery of HDR effectors [79.], by pharmacological intervention using small-molecule compounds [80.] (although this may reduce cell viability), or by directing Cas protein expression to specific cell-cycle phases [81.,82.].
The choice of the repair template is also of primary importance to reduce or eliminate the prevalence of ectopic insertions. For example, circular DNA repair templates [11.] or templates tethered to the CRISPR complex [11.,83.] could result in a higher proportion of on-target integrations compared with double-stranded and single-stranded linear DNA donors.
Equally, delivery of the template at a lower concentration would result in a lower copy number being ectopically inserted across the genome [84.], although this may affect overall efficiency of the genome-editing attempt. All of these techniques have been shown to enhance the frequency, or proportion, of desired outcomes, but none of them guarantees it. Thorough monitoring of outcomes remains essential in all cases and for all applications.
New generations of genome editing tools, such as base editing [13.] and prime editing [14.], represent further progress towards controlling genome editing outcomes, but are not yet capable of absolute precision editing.
Reference link: https://www.cell.com/trends/genetics/fulltext/S0168-9525(20)30247-X
More information: Ido Amit et al, CRISPECTOR provides accurate estimation of genome editing translocation and off-target activity from comparative NGS data, Nature Communications (2021). DOI: 10.1038/s41467-021-22417-4