How AI can accelerate R&D for cell and gene therapies – McKinsey

09 Sep 2023

Novel modalities carry huge potential.¹ Within oncology, for example, cell therapy is expected to become the third-largest segment across all modalities (behind antibodies and small molecules) by 2030, with 35 percent CAGR in sales over 2021–30 (Exhibit 1). Gene- and RNA-based therapies, on the other hand, are unlikely to play a major role in the short to medium term, although there are currently more than one hundred such assets in Phase I–III studies.
Bringing novel cell and gene therapy (CGT) modalities to patients successfully remains challenging. Notable headwinds include the complexity and heterogeneity of the solution space, manufacturing and supply chain challenges (especially for personalized therapies), and the difficulty of matching therapies to suitable patient endotypes. Moreover, while AI applications are taking off in the wider biopharmaceutical R&D context, companies are only starting to explore how to apply AI to CGT.
This article is a collaborative effort by Mayank Bhandari, Amelia Chang, Thomas Devenyns, Alex Devereson, Alberto Loche, and Lieven Van der Veken, representing views from McKinsey’s Life Sciences Practice.
There is significant untapped opportunity in the industry to scale AI within the CGT value chain. Biotechnology companies enabled by machine learning (ML) that focus on novel modalities are still rare. Moderna is perhaps the most mature example, with a strongly articulated ten-year vision to have digital and analytics at its core to boost its mRNA platform.²
In the past three to five years, additional earlier-stage companies—including Modulus Therapeutics, Outpace Bio, and Serotiny in the cell therapy space; Dyno Therapeutics and Patch Biosciences in the gene therapy and adeno-associated virus (AAV) space; and Anima Biotech in mRNA-based therapeutics—have started to emerge. While the fairly limited scale of CGT over the next ten years could slow the acceleration of AI-driven companies that focus purely on these modalities, the upside may be significant, given the recent wider acceleration of AI in biopharma R&D.
Applying AI to R&D for novel therapeutic modalities brings three principal challenges:
Despite these challenges, using AI in R&D could further accelerate CGT innovation. The field is maturing rapidly and has started to receive an influx of talent and venture funding, with further proof points for its applicability and scalability expected soon. What, then, are the relevant use cases?
Let’s explore three different novel pharma modalities: mRNA-based therapeutics and vaccines, viral therapeutics (such as AAV gene therapy), and ex vivo therapeutics, focusing on chimeric antigen receptor (CAR) T cells. AI can facilitate development of a novel therapy at several stages of the R&D value chain, including target identification, payload design optimization, translational and clinical development, and end-to-end (E2E) digitization (see sidebar, “Summary of major AI use cases across the cell and gene therapy value chain”).
Looking along the length of the cell and gene therapy (CGT) value chain, from target identification to clinical development, multiple AI use cases are available. While several use cases are general to all modalities, others are confined to one or more of the following specific areas: mRNA-based therapeutics, viral therapeutics, and ex vivo therapeutics (such as chimeric antigen receptor [CAR] T cells).
Applying AI to R&D for CGT begins with target identification. Here, the biggest challenge centers on selecting the appropriate target to optimize the probability of therapeutic success. Given the heavily personalized nature of most CGT and significant resource investment downstream, it is critical to have robust algorithms that enhance both speed and accuracy at this stage. AI and ML models can be used in various ways.
For viral therapeutics that aim to edit the genome, algorithms to predict CRISPR target sites can help identify genomic sites with genetic sequences or epigenetic features that permit increased efficiency of editing with minimal off-target activity. Older algorithms are hard coded to predict sites based on a set of known binding rules. Newer models based on ML and deep learning are trained on real-world experimental data and outperform older models.³
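To make the contrast concrete, here is a minimal sketch of the older, rule-based approach. It assumes SpCas9 with an NGG PAM, scans only the forward strand, and uses an illustrative 40 to 70 percent GC window and a three-mismatch cutoff as a crude off-target proxy. All of these thresholds are assumptions; the ML models described above would replace such hand-set rules with scores learned from experimental data.

```python
from dataclasses import dataclass
from typing import List

PROTOSPACER_LEN = 20  # typical SpCas9 guide length (assumption for this sketch)


@dataclass
class Candidate:
    position: int         # 0-based start of the protospacer on the forward strand
    protospacer: str
    gc_fraction: float
    off_target_hits: int  # other PAM-adjacent sites in the input within max_mismatches


def gc_fraction(seq: str) -> float:
    return sum(base in "GC" for base in seq) / len(seq)


def mismatches(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def find_sites(genome: str, gc_min: float = 0.4, gc_max: float = 0.7,
               max_mismatches: int = 3) -> List[Candidate]:
    """Hard-coded rules: a 20-nt protospacer followed by an NGG PAM, a GC-content
    filter, and a naive count of near-identical PAM-adjacent sites elsewhere.
    Only the forward strand is scanned; real tools also scan the reverse complement."""
    genome = genome.upper()
    # start positions whose protospacer is immediately followed by N-G-G
    starts = [
        i for i in range(len(genome) - PROTOSPACER_LEN - 2)
        if genome[i + PROTOSPACER_LEN + 1:i + PROTOSPACER_LEN + 3] == "GG"
    ]
    spacers = {i: genome[i:i + PROTOSPACER_LEN] for i in starts}
    candidates = []
    for i, spacer in spacers.items():
        gc = gc_fraction(spacer)
        if not gc_min <= gc <= gc_max:
            continue
        hits = sum(1 for j, other in spacers.items()
                   if j != i and mismatches(spacer, other) <= max_mismatches)
        candidates.append(Candidate(i, spacer, gc, hits))
    # fewest near-matches first, then GC content closest to 50 percent
    candidates.sort(key=lambda c: (c.off_target_hits, abs(c.gc_fraction - 0.5)))
    return candidates
```

For example, `find_sites("AAAA" + "GACGTACGTTAGCATGCACG" + "TGG" + "AAAAAA")` returns a single candidate at position 4 with no off-target hits. Appending a second copy that differs at one base (and is followed by its own PAM) gives both sites one hit each.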
For therapies that aim to harness the immune system to target specific cancer cells or pathogens (such as mRNA-based vaccines or CAR T-cell therapies), AI and ML can be used to predict tumor epitopes that could be bound by the therapeutic molecule. For CAR T-cell therapies, for example, AI and ML can facilitate the identification of appropriate antigens and binding sites, thereby enabling the design of CARs with improved on-target activity and minimal off-target cytotoxicity.⁴
⁴ For example, the ML framework CIBERSORTx can infer cell-type-specific gene expression profiles without physically isolating cells from the tumor, and can link phenotypic states to distinct driver mutations and to tumor responses to immune checkpoint blockade. For more, see Aaron M. Newman et al., “Determining cell type abundance and expression from bulk tissues with digital cytometry,” Nature Biotechnology, July 2019, Volume 37.
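The digital cytometry approach behind CIBERSORTx treats a bulk tumor expression profile as a mixture of reference cell-type profiles and solves for the mixing fractions. The sketch below illustrates only that core idea, using non-negative least squares solved by projected gradient descent. This is a deliberately simplified stand-in: CIBERSORTx builds on support vector regression and also imputes cell-type-specific expression, neither of which is shown, and the signature matrix in the example is a toy assumption.

```python
from typing import List


def deconvolve(bulk: List[float], signature: List[List[float]],
               iterations: int = 5000) -> List[float]:
    """Estimate cell-type fractions x >= 0 minimizing ||S x - b||^2.

    bulk: expression of G genes in the mixed sample (b).
    signature: G x K matrix S; column k is the reference profile of cell type k.
    Returns the fractions rescaled to sum to 1.
    """
    n_genes, n_types = len(signature), len(signature[0])
    # 1 / ||S||_F^2 never exceeds 1 / (largest eigenvalue of S^T S), so the step is safe
    step = 1.0 / sum(v * v for row in signature for v in row)
    x = [1.0 / n_types] * n_types
    for _ in range(iterations):
        residual = [sum(signature[g][k] * x[k] for k in range(n_types)) - bulk[g]
                    for g in range(n_genes)]
        # gradient of 0.5 * ||S x - b||^2 is S^T (S x - b)
        grad = [sum(signature[g][k] * residual[g] for g in range(n_genes))
                for k in range(n_types)]
        # gradient step, then project onto the non-negative orthant
        x = [max(0.0, x[k] - step * grad[k]) for k in range(n_types)]
    total = sum(x)
    return [v / total for v in x] if total > 0 else x
```

With a toy signature `[[10, 0], [0, 10], [5, 5]]` (three genes, two cell types) and a bulk profile of `[3, 7, 5]`, `deconvolve` converges to approximately `[0.3, 0.7]`.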
Algorithms that predict protein structure (such as AlphaFold, whose predictions are available through the AlphaFold Protein Structure Database) can be used to model how patient-specific mutations affect protein structure and thus CAR binding. Newer functional foundation models (such as ProteinBERT) go beyond structure to estimate functional properties of interest directly.⁵ Once a set of possible candidates has been identified, AI and ML can facilitate mass in silico screening of thousands of CAR constructs to identify candidates with high tumor-specific binding affinity and a concomitant ability to activate the immune system.
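Screening for two properties at once (binding and immune activation) is a multi-objective selection problem. A minimal sketch, assuming a model has already produced hypothetical `affinity` and `activation` scores for each construct: apply a hard affinity cutoff, keep only Pareto-optimal constructs (those that no other construct matches or beats on both scores), and rank the survivors.

```python
from typing import List, NamedTuple


class Construct(NamedTuple):
    name: str
    affinity: float    # hypothetical predicted tumor-specific binding score (higher is better)
    activation: float  # hypothetical predicted immune-activation score (higher is better)


def dominates(a: Construct, b: Construct) -> bool:
    """a dominates b if it is at least as good on both scores and strictly better on one."""
    return (a.affinity >= b.affinity and a.activation >= b.activation
            and (a.affinity > b.affinity or a.activation > b.activation))


def pareto_front(candidates: List[Construct]) -> List[Construct]:
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


def screen(candidates: List[Construct], min_affinity: float, top_k: int) -> List[Construct]:
    """Hard affinity cutoff, then keep the Pareto-optimal set, ranked by affinity."""
    passing = [c for c in candidates if c.affinity >= min_affinity]
    return sorted(pareto_front(passing), key=lambda c: c.affinity, reverse=True)[:top_k]
```

Given constructs A (0.9, 0.2), B (0.7, 0.8), C (0.6, 0.5), and D (0.3, 0.95), `screen(candidates, min_affinity=0.5, top_k=5)` keeps A and B: C is dominated by B, and D fails the affinity cutoff. Keeping the Pareto front, rather than collapsing both scores into one weighted sum, avoids committing early to a particular trade-off between the two objectives.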
Similar techniques are relevant for constructing personalized mRNA- or DNA-based cancer vaccines. They identify the antigens of an individual’s tumor that could elicit the desired immune response (for example, through epitope prediction). Spatial transcriptomics (visualizing gene expression at different tumor locations at single-cell resolution) brings a spatial dimension to these efforts, helping researchers understand interactions among cell subtypes and find novel targets for cancer therapy discovery.
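As one concrete step in that workflow, the candidate neoepitopes for a point mutation are the short peptides that overlap the mutated residue. The sketch below enumerates them, assuming 9-mers (a common length for HLA class I ligands) and a single amino-acid substitution. Scoring these peptides for HLA binding would require a separate epitope-prediction model, which is not shown.

```python
from typing import List, Tuple


def mutant_peptides(wild_type: str, position: int, alt: str,
                    k: int = 9) -> List[Tuple[str, str]]:
    """Return (mutant, wild-type) k-mer pairs that contain a single substitution.

    position is the 0-based index of the substituted residue; alt is the new residue.
    Keeping the wild-type partner lets a downstream model compare self vs. non-self.
    """
    if not 0 <= position < len(wild_type):
        raise ValueError("position is outside the protein sequence")
    mutant = wild_type[:position] + alt + wild_type[position + 1:]
    # a window starting at s covers the mutation when s <= position <= s + k - 1,
    # and it must also fit inside the protein: s + k <= len(wild_type)
    first = max(0, position - k + 1)
    last = min(position, len(wild_type) - k)
    return [(mutant[s:s + k], wild_type[s:s + k]) for s in range(first, last + 1)]
```

For a 20-residue protein with a substitution at index 10, this yields nine overlapping 9-mers; a substitution at the first residue yields only one.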
After the identification of an appropriate lead target, the next stage involves optimizing payload design. Here, the challenge is to modulate the functional activity and tissue specificity of the therapeutic molecule while minimizing unwanted effects (such as activation of the immune system). AI and ML models can be used to screen high numbers of candidates rapidly and select designs that fulfill the desired criteria, similar to their use in target identification.
To be most effective, the models should be part of an AI-enabled closed-loop research system, with initial primary screening results automatically fed into an ML pipeline. This pipeline learns how the assay responds to each payload based on its computational features and then suggests the next batch of optimized payload candidates for experimentation. The resulting experimental data are in turn automatically fed back to continue the learning, closing the loop.
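The loop can be sketched in a few dozen lines. Everything here is a toy assumption: designs are short nucleotide strings, the "assay" is any Python function standing in for a wet-lab readout, the surrogate model is a k-nearest-neighbor average over Hamming distance (a deliberately simple stand-in for a real ML model), and each round proposes single-point mutants of the best design measured so far.

```python
from typing import Callable, Dict, List

ALPHABET = "ACGT"


def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def single_mutants(seq: str) -> List[str]:
    return [seq[:i] + b + seq[i + 1:]
            for i in range(len(seq)) for b in ALPHABET if b != seq[i]]


def predict(seq: str, data: Dict[str, float], k: int = 3) -> float:
    """Surrogate model: mean measured value of the k nearest measured designs."""
    nearest = sorted(data, key=lambda s: hamming(s, seq))[:k]
    return sum(data[s] for s in nearest) / len(nearest)


def closed_loop(seed: List[str], assay: Callable[[str], float],
                rounds: int, batch: int) -> Dict[str, float]:
    data = {s: assay(s) for s in seed}          # primary screen
    for _ in range(rounds):
        best = max(data, key=data.get)
        pool = [s for s in single_mutants(best) if s not in data]
        # rank untested designs by predicted value (ties broken alphabetically)
        pool.sort(key=lambda s: (-predict(s, data), s))
        for s in pool[:batch]:
            data[s] = assay(s)                  # new results feed the next round
    return data
```

With a toy assay that simply counts G residues, `closed_loop(["AAAA"], lambda s: s.count("G"), rounds=4, batch=3)` measures 13 designs and reaches the optimum, GGGG, in the fourth round. In a real system the surrogate would be retrained on each new batch and the proposal step would balance exploitation against exploring uncertain regions of the design space.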
For the closed loop to work, at least three elements should be in place:
Exhibit 2 illustrates how different computational and ML components could work within a closed loop for CGT lead optimization. Starting from the actual payload design (DNA, RNA, or protein), it is important to be able to explore the allowed design space computationally through in silico mutations. From there, molecular structure can be computationally inferred and a whole range of payload properties predicted. Finally, payload function can be measured through the relevant assays, such as genome-editing activity assays, transcriptomics, protein expression, or tissue specificity. The results can then be linked back to the original sequence, structure, and properties to understand (via ML) what drives function and to suggest new payload designs to test.
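The first and last steps of that loop, exploring the allowed design space and linking measurements back to sequence-derived properties, can be sketched as follows. The assumptions are illustrative: an RNA payload, a fixed set of mutable positions, GC content as a stand-in for a computationally predicted property, and Pearson correlation as the simplest possible check of whether a property tracks function. A real system would use richer features and ML models.

```python
from itertools import combinations, product
from math import sqrt
from typing import Iterable, Iterator, List


def variants(seq: str, allowed_positions: Iterable[int], max_mutations: int = 2,
             alphabet: str = "ACGU") -> Iterator[str]:
    """Yield every variant with 1..max_mutations substitutions, touching only
    allowed positions (e.g., leaving a required motif untouched)."""
    positions = sorted(allowed_positions)
    for n in range(1, max_mutations + 1):
        for sites in combinations(positions, n):
            options = [[b for b in alphabet if b != seq[p]] for p in sites]
            for subs in product(*options):
                chars = list(seq)
                for p, b in zip(sites, subs):
                    chars[p] = b
                yield "".join(chars)


def gc_content(seq: str) -> float:
    """Stand-in for a predicted payload property."""
    return sum(b in "GC" for b in seq) / len(seq)


def pearson(xs: List[float], ys: List[float]) -> float:
    """Correlation between a property and measured function (requires non-constant inputs)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

For the RNA sequence `AUGC` with positions 1 and 2 mutable, `variants` yields 15 designs (6 single and 9 double substitutions). Pairing each design's `gc_content` with its measured assay value and calling `pearson` gives a first, crude signal of whether that property drives function.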
Delivery vehicle design could similarly be part of an AI-enabled closed-loop research system. For instance, AI and ML could be used in vehicle design to increase AAV capsids’⁶ tissue specificity, load capacity, and stability:
A similar concept applies to lipid nanoparticles, although the backbone is chemistry based and exploring the relevant design space is exponentially harder.
The development of chemistry, manufacturing, and controls (CMC) processes for these novel pharma modalities might be particularly well suited to an in silico process development approach, given the modalities’ platform-like nature and the relative independence of each molecule design. This approach encompasses the virtual design of production methods and equipment (instead of extensive lab optimization and screening experiments) to optimize production processes using a digital twin. The digital twin is built using a mechanistic model of each process step and complemented by statistical models based on previous process runs to reduce development costs, enable rapid scale-up and minimal tech transfer, and accelerate time to market.
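The hybrid idea, a mechanistic core corrected by statistics from earlier runs, can be sketched for a single process step. All assumptions are illustrative: the step is cell expansion modeled as simple exponential growth, and the statistical layer is an ordinary least-squares fit of the log ratio between observed and mechanistically predicted yield against one hypothetical process covariate (for example, seeding density). Real digital twins use far richer mechanistic models and more covariates.

```python
from math import exp, log
from typing import List, NamedTuple, Tuple


class PriorRun(NamedTuple):
    n0: float         # cells at the start of the step
    days: float
    covariate: float  # hypothetical process parameter, e.g., seeding density
    observed: float   # cells measured at the end of the step


def mechanistic_yield(n0: float, mu_per_day: float, days: float) -> float:
    """Mechanistic layer: unconstrained exponential growth (no lag phase, nutrient
    limitation, or cell death, all of which a real model would add)."""
    return n0 * exp(mu_per_day * days)


def fit_residual_model(runs: List[PriorRun], mu_per_day: float) -> Tuple[float, float]:
    """Statistical layer: OLS fit of log(observed / mechanistic) = a + b * covariate."""
    xs = [r.covariate for r in runs]
    ys = [log(r.observed / mechanistic_yield(r.n0, mu_per_day, r.days)) for r in runs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def twin_predict(n0: float, days: float, covariate: float, mu_per_day: float,
                 residual: Tuple[float, float]) -> float:
    """Digital-twin estimate: mechanistic prediction times the learned correction."""
    a, b = residual
    return mechanistic_yield(n0, mu_per_day, days) * exp(a + b * covariate)
```

Once fitted on earlier runs, `twin_predict` can be queried for new settings in silico before a run is scheduled in the lab. Keeping the mechanistic and statistical layers separate means the correction term stays small and interpretable when the mechanistic model is roughly right.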
During the translational and clinical development stage, AI and ML can help bring CGT to the clinic by minimizing safety risk in clinical trials and increasing the overall probability of success. Preclinically, this starts with finding translational biomarkers indicative of future trial success, as well as ways to simulate patient heterogeneity through more complex preclinical assays. Although using AI to optimize trial design is not specific to novel modalities, it may be particularly important for them, given their typically small patient populations, long treatment processes, and potential for severe adverse events.
AI and ML algorithms can help identify the right patients, estimate optimal dosing, and predict severe adverse events based on patient profiles and real-world data on responses to similar treatments. Models can be trained to screen patient records for comorbidities and to use genetic profiles to identify the patient subgroups that will respond best to the therapy. Enabling this type of precision medicine requires building large, integrated clinicogenomic databases for the disease areas of interest.
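At its simplest, the screening and subgroup analysis described above amounts to filtering records by exclusion criteria and comparing response rates across genetic subgroups. The sketch below assumes hypothetical record fields and uses a Laplace-smoothed rate, (responders + 1) / (patients + 2), so that tiny subgroups, common in CGT, do not produce overconfident 0 or 100 percent estimates. A production model would also adjust for confounders and quantify uncertainty.

```python
from collections import defaultdict
from typing import FrozenSet, Iterable, List, NamedTuple, Tuple


class Patient(NamedTuple):
    patient_id: str
    comorbidities: FrozenSet[str]
    marker: str      # hypothetical genetic subgroup label
    responded: bool


def subgroup_response(patients: Iterable[Patient],
                      exclusions: FrozenSet[str]) -> List[Tuple[str, float]]:
    """Drop patients with any excluded comorbidity, then rank genetic subgroups
    by Laplace-smoothed response rate, (responders + 1) / (patients + 2)."""
    counts = defaultdict(lambda: [0, 0])  # marker -> [responders, eligible patients]
    for p in patients:
        if p.comorbidities & exclusions:
            continue
        counts[p.marker][0] += int(p.responded)
        counts[p.marker][1] += 1
    rates = {m: (r + 1) / (n + 2) for m, (r, n) in counts.items()}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)
```

For example, two eligible responders out of two in subgroup A score 0.75, while one responder out of two in subgroup B scores 0.5.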
Finally, digitization across the entire E2E chain can add value, for example by linking data from preclinical studies to trials, CMC readouts, and manufacturing batch records, which allows a therapeutic design to be traced from its inception onward. It can also facilitate long-term tracking and certification of patient outcomes, which are important for establishing patient, healthcare provider, and payer confidence.
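Tracing a design end to end depends mostly on shared identifiers. A minimal sketch, assuming hypothetical tables in which every preclinical and CMC record carries a `design_id`, every manufacturing batch records the `design_id` it was made from, and every trial dosing record carries a `batch_id`:

```python
from typing import Dict, List


def trace_design(design_id: str, preclinical: List[dict], cmc: List[dict],
                 batches: List[dict], doses: List[dict]) -> Dict[str, list]:
    """Collect every record linked to one design across the value chain."""
    batch_ids = {b["batch_id"] for b in batches if b["design_id"] == design_id}
    return {
        "preclinical": [r for r in preclinical if r["design_id"] == design_id],
        "cmc": [r for r in cmc if r["design_id"] == design_id],
        "batches": sorted(batch_ids),
        # trial records link to the design only indirectly, through the batch
        "doses": [d for d in doses if d["batch_id"] in batch_ids],
    }
```

The design choice that matters is the indirect join: dosing records never mention the design, so without a reliable batch-to-design link the trace breaks.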
Long-term follow-up may also become important as innovative payment models arise to address CGT-specific payer challenges. In addition, detailed tracking of the E2E supply process can improve patient safety and outcomes. This is particularly important for personalized CAR T-cell therapy, where maintaining a clear chain of identity and custody ensures that each patient receives their own modified cells.⁸
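One way to make such a record tamper-evident is a hash-linked log, in which each custody event stores a digest of the previous event. The sketch below assumes illustrative step names and a simple verification rule (every event, and the final recipient, must match the patient whose cells were collected); it does not describe any specific vendor's system.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import List

GENESIS = "GENESIS"


@dataclass
class CustodyEvent:
    patient_id: str
    step: str        # illustrative, e.g., "apheresis", "manufacturing", "infusion"
    custodian: str
    prev_digest: str
    digest: str = ""

    def compute_digest(self) -> str:
        payload = json.dumps([self.patient_id, self.step, self.custodian, self.prev_digest])
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class CustodyLog:
    def __init__(self, patient_id: str) -> None:
        self.patient_id = patient_id
        self.events: List[CustodyEvent] = []

    def record(self, step: str, custodian: str) -> CustodyEvent:
        prev = self.events[-1].digest if self.events else GENESIS
        event = CustodyEvent(self.patient_id, step, custodian, prev)
        event.digest = event.compute_digest()
        self.events.append(event)
        return event

    def verify(self, recipient_id: str) -> bool:
        """True only if every link is intact and all events, plus the recipient,
        refer to the patient whose cells were collected."""
        prev = GENESIS
        for event in self.events:
            if (event.patient_id != self.patient_id or event.prev_digest != prev
                    or event.digest != event.compute_digest()):
                return False
            prev = event.digest
        return recipient_id == self.patient_id
```

Editing any recorded event after the fact changes its recomputed digest, so `verify` returns False, as it also does when the recipient's ID differs from the donor's.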
The CGT AI opportunity is predicated on operating within an industrialized framework that allows for scalability, adaptability, and sustainability. This includes an experimental data generation engine that is both well oiled and tightly embedded in a closed loop to cope with long and expensive manufacturing timelines. Data across the value chain (for example, between research and CMC) need to be easily linkable, as fields are much more interconnected and interdependent than for classic modalities, with potentially significant variations on a batch-by-batch basis. It also includes designing E2E ML operations (MLOps) solutions that are integrated into the research system and driven by user experience. Finally, data science, engineering, chemistry, functional biology, and disease expertise need to come together to tackle challenges at the edge of scientific understanding.
Companies are putting these enablers in place in different ways, each with upsides and downsides. Broadly, they are pursuing three main approaches (externalization of capabilities, selective partnership, and internalization of capabilities) across a spectrum of collaboration with biotech start-ups, each involving a different risk profile, different talent considerations, and a different breadth of capabilities. Of course, a few companies mix archetypes, depending on the modality or therapeutic area.
Some biopharma companies active in CGT opt to externalize capabilities in applying AI and ML to their R&D processes. Given that these technologies are at an early stage, one advantage of this approach is that it derisks and compartmentalizes the effort. Companies source the technology from a partner with the right expertise and talent, against a well-defined scope and milestones, to sharpen focus and move more rapidly. This is especially relevant for novel modalities, which have an unproven track record and greater inherent drug discovery risk.
However, this approach builds no internal AI and ML capabilities and carries a risk that the biotech start-up learns and benefits more from the partnership than the other way around, including potential loss of intellectual property. In short, while outsourcing AI capabilities could be a straightforward short-term strategy to minimize a company’s risk, or an option for modalities outside its core focus, it poses a real risk of losing scientific edge within the company’s core R&D engine over the long term.
Other biopharma companies use a selective-partnering approach with a clear path toward internalization of capabilities. The approach’s advantages are similar to those of the externalization archetype: it offers a way to tap quickly into the best available expertise and talent while derisking and maintaining focus. Moreover, there is a clear (albeit longer) path toward internalizing these capabilities and the talent supporting them. However, it likely also means limited incentive to be at the forefront of innovation and, internally, a lack of focus on company-wide assets and capabilities.
A third group works to develop and internalize the capabilities needed to set up AI-enabled closed-loop research systems for novel modalities. If done right, this archetype builds a broad base of digital, data, and analytics capabilities that can power a company-wide R&D transformation. The focus would typically be on transversal, generalizable technologies applicable across many teams, such as automated image segmentation and labeling and protein-structure prediction. This industrialized internal backbone could then allow the company to plug in cutting-edge external technologies.
Disadvantages typically include an overreliance on internal expertise that slows the pace of innovation, slow buildup of necessary and scarce talent, conflicts with existing R&D priorities, endless proofs of concept that never bring the solution to users at scale, and a tendency toward long, costly parallel transformation programs. One way to overcome these disadvantages is to apply a methodology based on quarterly value releases. It starts from a specific business or scientific need for which there is conviction that a digital or analytics solution could deliver value. It aims to bring horizontal building blocks together vertically across teams (such as blueprint, data, analytics, tech, and change management groups) and rigorously deliver value to end users in short 90-day cycles. End users are involved along the way to define the need and cocreate the solution.
Opportunities for applying AI are coming of age now—with growing examples of impact—at a tipping point supported by an explosion of biological data, increasing computational power, next-generation in vitro models, wet-lab automation, and strong initial clinical proof points. Moreover, the next five years will be critical to prove the sustainability of CGT as broadly applicable therapeutic modalities.
For oncology alone, more than 500 assets based on complex modalities are currently in preclinical and clinical development, and as many as 80 could get to market by 2030. Embedding digital and analytics in R&D is crucial to making this a success and to capturing value for patients. AI and advanced analytics are poised to become vital enablers for boosting the return on R&D spending in the CGT value chain by increasing speed, reducing clinical failures, cutting costs across the R&D value chain, and enabling sustainable tech platforms.
Mayank Bhandari is a consultant in McKinsey’s London office, where Alex Devereson is a partner; Amelia Chang is a consultant in the Boston office; Thomas Devenyns is an associate partner in the Geneva office; Alberto Loche is an associate partner in the Zurich office; and Lieven Van der Veken is a senior partner in the Lyon office.
