A recent Scientific Reports study provides next-generation sequencing (NGS) guidelines for antibody discovery.
Study: Insights into next-generation sequencing guided antibody selection strategies.
One of the most common technologies for generating antibody lead candidates is in vitro display, in which appropriate libraries are used to select antibodies with the properties required for therapeutic applications. During a selection campaign, selective pressure is applied, typically by controlling the target concentration.
A recent study indicated that an antibody library combined with sequential in vitro phage and yeast display can identify drug-like leads with ideal binding affinities and developability properties. This library yielded 31 anti-severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) antibodies in less than one month. Some of these antibodies neutralized live virus and showed high affinities and outstanding biophysical properties.
Although colony picking is an effective way to select therapeutic antibody candidates in a relatively short period, it is inherently biased towards the more abundant clones in the selection output. Even high-throughput picking campaigns rarely get past clonal dominance.
NGS has revealed a nonlinear relationship between observed diversity and sequencing depth: in selection campaigns, progressively more sequencing reads are required for each marginal gain in diversity.
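The diminishing return of diversity with sequencing depth can be illustrated with a small simulation. The sketch below is not taken from the study: it assumes a hypothetical population of 2,000 clones whose frequencies follow a simple 1/rank power law, draws simulated reads from it, and counts how many unique clones have been seen at increasing depths. The function name, population size, and depths are all illustrative assumptions.

```python
import random


def rarefaction_curve(n_clones=2000, n_reads=16000,
                      depths=(1000, 2000, 4000, 8000, 16000), seed=0):
    """Return {depth: unique clones seen in the first `depth` simulated reads}."""
    rng = random.Random(seed)
    # Assumption: clone frequencies follow a 1/rank power law (not study data).
    weights = [1.0 / rank for rank in range(1, n_clones + 1)]
    reads = rng.choices(range(n_clones), weights=weights, k=n_reads)

    checkpoints = set(depths)
    seen = set()
    curve = {}
    for i, clone in enumerate(reads, start=1):
        seen.add(clone)
        if i in checkpoints:
            curve[i] = len(seen)
    return curve


if __name__ == "__main__":
    for depth, unique in rarefaction_curve().items():
        print(f"{depth:>6} reads: {unique:>4} unique clones "
              f"({unique / depth:.3f} new clones per read on average)")
```

Under these assumptions, each doubling of the read count adds fewer new clones per read than the previous one, which is the pattern the paragraph above describes.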
It is important to understand the degree to which increased diversity is genuine and not a consequence of sequencing error. Likewise, it is imperative to consider whether NGS heuristics, computational tools, and machine learning (ML) can differentiate between functional and artifactual clones.
Long-read sequencing overcomes the traditional limitations of early NGS platforms, which were based on short reads of single domains or complementarity-determining regions (CDRs). ML has also been used in antibody discovery and molecular engineering to predict antigen binders from in silico libraries, elucidate vital functional representations of B-cell receptors (BCRs), and identify molecular structures associated with developability properties.
ML applications broadly follow two approaches: supervised and unsupervised. Supervised algorithms are trained to minimize the loss between predicted and observed labels or values, which enables accurate prediction of experimental data.
About the study
The current study evaluated whether ML methods and heuristics can be applied to NGS datasets from in vitro discovery campaigns to aid lead prioritization in antibody discovery. It also addressed several key questions regarding the use of NGS in discovery campaigns.
All questions were examined in the context of SARS-CoV-2. The main objective of the study was to determine broad principles applicable to all selection campaigns.
A total of three selection campaigns were conducted using the single chain variable fragment (scFv) Gen3 semi-synthetic library platform against the S1 monomer and receptor binding domain (RBD) of SARS-CoV-2. Antibodies were selected against biotinylated proteins through two rounds of scFv phage display, followed by yeast display. Binding populations for all three targets were selected and subjected to NGS using 5′ and 3′ in-line NGS barcodes.
Two antigen concentrations were analyzed for each of the three target antigens. Furthermore, random colonies from the 1 nanomolar (nM) sorted populations for the three targets were sequenced using Sanger sequencing.
The researchers explored whether NGS-identified clone abundance was associated with the random screening findings. The selection outputs followed a power law; if NGS-derived frequency is taken as ground truth, the randomly picked Sanger clones should fall in the upper frequency range of the NGS population. NGS clone abundance was indeed strongly associated with the results of random screening.
Incorporating NGS into the discovery campaign enabled the isolation of over 30 antibodies with affinities below 100 pM. A greater epitope diversity was observed among those identified by NGS, thus highlighting the importance of NGS during the discovery campaign in identifying antibody properties linked to binding affinity and epitope diversity.
The number of reads required to obtain a desired level of antibody diversity was estimated. For example, recovering 1,000 unique heavy chain CDR3s (HCDR3s) upon repeat selection for the three targets at 10 nM and 1 nM would require 215,000 to 402,000 sequence reads for each target population.
Relative abundance and fold enrichment could be used to distinguish between binder and non-binder antibodies. When antibodies were analyzed following a single selective step from 10 nM to 1 nM, a weak to moderate association was found between affinity, abundance, and enrichment.
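The two metrics can be computed directly from per-clone read counts. The sketch below uses commonly used definitions, which are an assumption here rather than the study's stated formulas: relative abundance is a clone's share of all reads in a population, and fold enrichment is the ratio of its relative abundance after a selection step to that before it. The HCDR3 strings, read counts, and the pseudocount are made up for illustration.

```python
def relative_abundance(counts):
    """Fraction of all reads in a population that belong to each clone."""
    total = sum(counts.values())
    return {clone: n / total for clone, n in counts.items()}


def fold_enrichment(before, after, pseudocount=1):
    """Relative abundance after a selection step divided by that before it.

    A pseudocount (an illustrative choice) is added to every clone so that
    clones missing from one population do not cause division by zero.
    """
    clones = set(before) | set(after)
    before_total = sum(before.values()) + pseudocount * len(clones)
    after_total = sum(after.values()) + pseudocount * len(clones)
    return {
        clone: ((after.get(clone, 0) + pseudocount) / after_total)
        / ((before.get(clone, 0) + pseudocount) / before_total)
        for clone in clones
    }


# Hypothetical read counts per HCDR3 at the two antigen concentrations.
reads_10nm = {"ARDYW": 500, "ARGSY": 300, "AKDLL": 200}
reads_1nm = {"ARDYW": 800, "ARGSY": 150, "AKDLL": 50}

if __name__ == "__main__":
    abundance = relative_abundance(reads_1nm)
    enrichment = fold_enrichment(reads_10nm, reads_1nm)
    for clone in sorted(enrichment, key=enrichment.get, reverse=True):
        print(f"{clone}: abundance={abundance[clone]:.2f}, "
              f"enrichment={enrichment[clone]:.2f}")
```

In this toy data, a clone that gains share under the more stringent step has an enrichment above 1, while clones that lose share fall below 1.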
The current study presents a simple approach to identifying antibodies in the population by selecting the top clone for every cluster. This enables the segregation of binders according to paratope diversity and minimizes the number of aberrant sequences. ML algorithms were used to classify antibodies as binders or non-binders and enhance correlations to affinities.
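Selecting the top clone for every cluster can be sketched in a few lines. The example below assumes each clone has already been assigned a cluster label and a read count; the record layout, sequences, and counts are hypothetical and are not taken from the study.

```python
def top_clone_per_cluster(clones):
    """Return {cluster_id: sequence} for the most abundant clone in each cluster.

    `clones` is an iterable of (sequence, cluster_id, read_count) tuples.
    On a tie, the clone encountered first is kept.
    """
    best = {}
    for sequence, cluster_id, read_count in clones:
        if cluster_id not in best or read_count > best[cluster_id][1]:
            best[cluster_id] = (sequence, read_count)
    return {cluster_id: seq for cluster_id, (seq, _) in best.items()}


# Hypothetical clones: (HCDR3 sequence, cluster label, NGS read count).
example_clones = [
    ("ARDYW", "c1", 800),
    ("ARDFW", "c1", 120),
    ("ARGSY", "c2", 150),
    ("AKGSY", "c2", 400),
    ("AKDLL", "c3", 50),
]

if __name__ == "__main__":
    for cluster_id, sequence in sorted(top_clone_per_cluster(example_clones).items()):
        print(cluster_id, sequence)
```

Taking one representative per cluster keeps the shortlist small while still covering each cluster, and low-count variants within a cluster are left out.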
For an unbiased clustering approach, AbScan, a tool within the AbXtract module, was developed based on amino acid chemical properties.
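To give a sense of what clustering on chemical properties can mean, the sketch below maps each residue to a coarse property class and groups sequences that share the same class pattern. This is not AbScan's algorithm, which the article does not describe; the six-class grouping is a common textbook simplification, and all names and sequences are illustrative assumptions.

```python
from collections import defaultdict

# Assumed coarse property classes (a textbook-style grouping, not AbScan's).
PROPERTY_CLASSES = {
    "h": "AVLIM",  # aliphatic / hydrophobic
    "r": "FWY",    # aromatic
    "p": "STNQ",   # polar, uncharged
    "+": "KRH",    # positively charged
    "-": "DE",     # negatively charged
    "s": "CGP",    # special cases
}
AA_TO_CLASS = {aa: label for label, aas in PROPERTY_CLASSES.items() for aa in aas}


def signature(sequence):
    """Translate a sequence of the 20 standard residues into class labels."""
    return "".join(AA_TO_CLASS[aa] for aa in sequence.upper())


def cluster_by_signature(sequences):
    """Group sequences whose property signatures are identical."""
    clusters = defaultdict(list)
    for sequence in sequences:
        clusters[signature(sequence)].append(sequence)
    return dict(clusters)


if __name__ == "__main__":
    # Hypothetical HCDR3 fragments.
    for sig, members in cluster_by_signature(["ARDYW", "VKEFW", "GRDYW"]).items():
        print(sig, members)
```

Two sequences that differ at every position can still share a signature when each substitution stays within one property class. A practical tool would likely tolerate partial mismatches and length differences, which this sketch does not attempt.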
The current study highlights the benefits of NGS in antibody discovery. NGS data is beneficial for assigning selected antibodies to HCDR3 clusters, as this approach offers additional epitopic and paratopic diversity. Thus, NGS data can provide important insights and recommendations for effective therapeutic antibody discovery.
- Erasmus, M. F., Ferrara, F., D’Angelo, S., et al. (2023) Insights into next generation sequencing guided antibody selection strategies. Scientific Reports 13(1);1-16. doi:10.1038/s41598-023-45538-w