An interview with Mingjie Xie, conducted by James Ives, MPsych
Please give an overview of next-generation protein sequencing (NGPS) and how it can be used to decipher antibody proteins.
Next-generation protein sequencing, or NGPS, is a technology for deriving full protein sequences directly from protein samples using mass spectrometry. Its goal is to accurately determine the primary sequence of the protein in a given sample.
Sequencing antibody proteins is one of the best applications of NGPS technology. Antibodies are a special type of protein generated by our immune system to protect us against foreign agents.
The immune system works in a fascinating way. It can quickly respond to an antigen by generating antibodies that bind specifically to it. This is achieved in part through the so-called V(D)J recombination mechanism, which produces 'random' sequences in certain areas of the antibody proteins known as the complementarity-determining regions, or CDRs.
This randomness makes each antibody unique and means that no sequence in any database can be trusted for these regions. For this reason, NGPS should be used to determine the heavy and light chain sequences.
The general procedure for our REmAb™ antibody protein sequencing consists of four major steps.
Step one, we digest the antibody protein using multiple enzymes. We carefully choose a set of enzymes with very different cleavage rules to increase the diversity of digestion sites.
Step two, we run the samples on a high-mass-accuracy instrument, in our case a Thermo Q Exactive, and generate hundreds of thousands of tandem mass spectra.
Step three, we perform de novo peptide sequencing using our own Novor engine to convert the spectra into peptide sequences.
Finally, in step four, we assemble those peptide sequences into the full protein sequence.
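To make steps one and four concrete, here is a minimal Python sketch. It is not the REmAb™ pipeline: it skips the mass spectrometry and de novo steps entirely, assumes every digested peptide has already been sequenced perfectly, uses hypothetical, simplified cleavage rules for three enzymes, and joins the peptides with a naive greedy overlap assembler. It illustrates why enzymes with different cutting rules help: their peptides span each other's cleavage sites, so they can be stitched together.

```python
import re

# Hypothetical, simplified cleavage rules (real enzyme specificity, missed
# cleavages and nonspecific cuts are ignored). Each regex matches the
# zero-width position where the enzyme cuts.
CLEAVAGE_RULES = {
    "trypsin": r"(?<=[KR])(?!P)",        # after K or R, unless followed by P
    "chymotrypsin": r"(?<=[FWY])(?!P)",  # after F, W or Y, unless followed by P
    "asp_n": r"(?=D)",                   # before D
}


def digest(protein: str, rule: str) -> list[str]:
    """Complete in-silico digestion: split the protein at every cleavage site."""
    return [p for p in re.split(rule, protein) if p]


def overlap(left: str, right: str, min_len: int) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for k in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def assemble(peptides: list[str], min_len: int = 4) -> list[str]:
    """Greedy assembly: repeatedly merge the pair of contigs with the longest overlap."""
    unique = set(peptides)
    # Peptides contained in a longer peptide add no new sequence, so drop them.
    contigs = [p for p in unique if not any(p != q and p in q for q in unique)]
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlap is long enough; leave the contigs separate
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


protein = "EVQLVESGGGLVKPGGSLRLSCAASGFTFSDYAMS"  # toy heavy-chain fragment
all_peptides = [p for rule in CLEAVAGE_RULES.values() for p in digest(protein, rule)]
print(assemble(all_peptides))  # ['EVQLVESGGGLVKPGGSLRLSCAASGFTFSDYAMS']
print(assemble(digest(protein, CLEAVAGE_RULES["trypsin"])))  # two unjoinable contigs
```

With trypsin alone the two peptides share no sequence, so the assembler cannot tell how they connect; adding the Asp-N peptide, which spans the trypsin cleavage site, resolves the full fragment. Real assembly must also cope with sequencing errors and conflicting overlaps, which this sketch ignores.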
What is the current gold standard for determining antibody protein sequences, and how does it compare to NGPS?
The answer is: it depends. Currently, if you have access to the cell line expressing the antibody, the go-to method is DNA/RNA sequencing, which derives the antibody protein sequences indirectly.
If you don't have access to the cell line, next-generation protein sequencing is the only way to obtain accurate antibody sequence information.
What are the key advantages of next generation protein sequencing? What disadvantages are there?
When the cell line is accessible, both the DNA sequencing and the NGPS technologies can be used to determine the antibody sequences.
Proteins are the functional molecules in cells. Next-generation protein sequencing (NGPS) uses mass spectrometry to measure proteins or polypeptides directly. Because of this direct measurement, it can detect unexpected variants and post-translational modifications such as glycosylation. These protein-level changes cannot be detected by DNA sequencing.
In antibody sequencing, because of the 'randomness' of the variable region, especially the CDR sequences, DNA sequencing may not work at all for some antibodies.
As an emerging technology, NGPS still has considerably lower throughput than DNA sequencing. Its cost has dropped significantly in the past two years, but it is still seen as a barrier to wider adoption when the cell line is accessible.
What services does Rapid Novor offer for antibody protein sequencing?
We offer our REmAb™ sequencing service in three different packages.
The REmAb™ basic package gives you accurate full-length heavy and light chain sequences. There is one caveat: Leucine and Isoleucine have identical masses, so in the basic package they are only inferred. Clients are therefore encouraged to express multiple forms of the antibody that cover all Leucine/Isoleucine positions in the CDRs to ensure successful binding.
The second package is REmAb™ with WILD™ technology. WILD™ stands for W-ion Isoleucine/Leucine Determination. This technology uses the most advanced mass spectrometers on the market and can accurately distinguish Leucine from Isoleucine based on the w-ions observed in the experiments.
With the WILD™ technology, w-ions are used to determine each and every Leucine/Isoleucine position in the variable region of the heavy and light chain sequences, leaving no need to express multiple forms in most cases.
The third package is an all-in-one sequencing plus expression service. Clients can use it as a one-stop shop: they receive recombinant antibody proteins that work just like the original ones.
Please give a brief overview of W-ion Isoleucine/Leucine Determination (WILD™). How does it make antibody protein sequencing more specific and accurate?
W-ion Isoleucine/Leucine Determination or WILD™ is the first commercially available service to accurately distinguish the two isomeric amino acids in a high-throughput manner using mass spectrometry.
Although the mechanism that uses w-ions to tell the two amino acids apart was described in the scientific literature 30 years ago, it only became somewhat practical in recent years with advances in instrumentation, and several papers have since been published with different experimental protocols to address the problem.
Commercial development of this technology has been slow in part because real applications had little need for it. For example, database searching is one of the most commonly used methods to identify proteins.
Most non-antibody proteins are highly conserved in terms of their primary sequences; they don't change or mutate very often. For this reason, trusting the amino acids in a sequence database makes sense.
But sequencing antibody proteins is a totally different story because of the hypervariable CDRs. Peptide de novo sequencing is the ultimate method for figuring out the sequences in the CDRs, with only one issue left: the isomeric pair Leucine and Isoleucine.
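Why Leucine and Isoleucine remain the "one issue left" can be seen in a toy version of de novo sequencing. This is a minimal sketch, not the Novor engine: it assumes a complete, noise-free ladder of singly charged b-ions and simply names the residue whose standard monoisotopic mass matches each gap between consecutive ions. A high-accuracy measurement separates Glutamine from Lysine (about 0.036 Da apart), but no tolerance can separate Leucine from Isoleucine, whose masses are identical.

```python
# Standard monoisotopic residue masses (Da) of the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.00728  # mass of a proton (Da)


def b_ion_ladder(peptide: str) -> list[float]:
    """m/z of the singly charged b-ions b1..bn (cumulative residue mass + proton)."""
    ladder, total = [], PROTON
    for aa in peptide:
        total += RESIDUE_MASS[aa]
        ladder.append(total)
    return ladder


def read_ladder(b_ions: list[float], tol: float = 0.01) -> list[str]:
    """Toy de novo step: name the residue(s) matching each gap between b-ions."""
    calls, previous = [], PROTON
    for mz in sorted(b_ions):
        gap = mz - previous
        matches = sorted(aa for aa, m in RESIDUE_MASS.items() if abs(gap - m) <= tol)
        calls.append("/".join(matches) if matches else "?")
        previous = mz
    return calls


print(read_ladder(b_ion_ladder("SAIQK")))  # ['S', 'A', 'I/L', 'Q', 'K']
```

Every position is called uniquely except the third, which can only be reported as "I/L". Real spectra are noisy and incomplete, which is where most of the difficulty of de novo sequencing lies.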
Our WILD™ technology provides the last piece of the puzzle. It is based on the same mechanism described in the literature, with the experimental and data analysis protocols tuned to be robust and high-throughput.
With WILD™ we can now achieve 100% sequence accuracy without any caveats.
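The w-ion principle from the literature can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the WILD™ protocol: it assumes electron-based fragmentation has produced a singly charged z•-ion whose N-terminal residue is the ambiguous position, and that this z•-ion fragments further into a w-ion by losing part of the side chain. Leucine loses an isopropyl radical (C3H7•, about 43.055 Da) and Isoleucine an ethyl radical (C2H5•, about 29.039 Da), so the observed w-ion tells them apart. The peak values below are hypothetical.

```python
C, H = 12.0, 1.00782503  # monoisotopic atomic masses of carbon and hydrogen (Da)

# Side-chain radicals lost when a z•-ion whose N-terminal residue is Leu or Ile
# fragments into a w-ion: isopropyl (C3H7•) for Leu, ethyl (C2H5•) for Ile.
SIDE_CHAIN_LOSS = {"L": 3 * C + 7 * H, "I": 2 * C + 5 * H}  # ~43.0548, ~29.0391 Da


def call_leu_ile(z_mz: float, peaks: list[float], tol: float = 0.005) -> str:
    """Call Leu or Ile from which diagnostic w-ion accompanies a z•-ion.

    Assumes singly charged ions and that the ambiguous residue is the
    N-terminal residue of the z•-ion at `z_mz`.
    """
    found = [aa for aa, loss in SIDE_CHAIN_LOSS.items()
             if any(abs(p - (z_mz - loss)) <= tol for p in peaks)]
    return found[0] if len(found) == 1 else "ambiguous"


# Hypothetical peak lists around a hypothetical z•-ion at m/z 500.3000
print(call_leu_ile(500.3000, [471.2609, 612.3300]))  # prints I (loss of ~29.039)
print(call_leu_ile(500.3000, [457.2452]))            # prints L (loss of ~43.055)
print(call_leu_ile(500.3000, []))                    # prints ambiguous
```

Returning "ambiguous" when neither or both diagnostic peaks appear keeps the sketch conservative; a production method would also need scoring, charge-state handling and checks against interfering peaks.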
Why is it important to have accurate determination of isomeric amino acids like Isoleucine and Leucine?
Although Isoleucine and Leucine have the same molecular weight, they have different structures, and this difference in structure can cause differences in biological activity. In the context of antibodies, Isoleucine and Leucine in the CDRs may significantly affect the antibodies' binding affinity and specificity.
Prior to the arrival of WILD™, expressing multiple forms of the antibody was generally required to make sure the real one was included. This is not always economical or even practical. The number of forms that must be expressed grows exponentially with the number of Isoleucine/Leucine positions in the six CDRs: n positions require 2^n forms.
In some unfortunate cases, the cost of expressing every possible combination can be prohibitive. In a recent project we completed, there were 9 Isoleucine/Leucine positions in the CDRs. Without WILD™, the customer would have had to express 512 (2^9) forms to cover all possibilities.
WILD™ brought this number down to 1. This not only significantly reduced the expression cost, but also saved them large amounts of time and effort so that they could focus on their research.
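The 2^n growth is easy to check by enumerating the candidate forms directly. This short sketch treats each Leucine/Isoleucine position as an independent two-way choice; the 19-residue string below is a hypothetical concatenation of CDR sequences with 9 such positions, not the customer's antibody.

```python
from itertools import product


def leu_ile_variants(seq: str) -> list[str]:
    """Every sequence obtained by independently choosing L or I at each L/I position."""
    positions = [i for i, aa in enumerate(seq) if aa in "LI"]
    variants = []
    for choice in product("LI", repeat=len(positions)):
        residues = list(seq)
        for pos, aa in zip(positions, choice):
            residues[pos] = aa
        variants.append("".join(residues))
    return variants


print(leu_ile_variants("SLDI"))  # ['SLDL', 'SLDI', 'SIDL', 'SIDI']
# Hypothetical concatenated CDR sequence with 9 L/I positions: 2**9 forms
print(len(leu_ile_variants("ARLGIYLDVLSLAIWLTIL")))  # 512
```

Each additional ambiguous position doubles the list, which is why resolving every position experimentally reduces the expression work to a single form.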
How accurate is antibody protein sequencing? Does it rely on homology sequences?
This is a great question. Antibody protein sequencing, when done correctly, is very accurate. This has been confirmed by many customers. The recombinant antibodies behave exactly the same as the original ones.
I am glad you brought up the homology sequence question. This is actually one of the most frequently asked questions from our customers. The answer is NO. We do NOT rely on homology sequences. This is in fact a key differentiator for our REmAb™ sequencing technology.
Relying on homology sequences in the sequencing process is dangerous and may introduce errors, especially in and around the CDRs. As mentioned earlier, the random nature of the CDR sequences makes all database sequences untrustworthy there. We have seen this many times when 'fixing' sequences initially determined by others.
Many subtle differences in a sequence, such as a swap of a few amino acids, may not affect protein coverage in peptide mapping at all, so high coverage alone does not prove that a sequence is correct.
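One reason such swaps can slip through: a peptide's mass depends only on its amino acid composition, not on the order of the residues. The sketch below, assuming only precursor masses are compared (real peptide mapping often uses fragment spectra as well, which can catch some swaps), shows two different sequences with the same monoisotopic mass. The peptide is a toy example.

```python
import math

# Standard monoisotopic residue masses (Da) of the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once for the peptide's free N- and C-termini


def peptide_mass(seq: str) -> float:
    """Monoisotopic neutral peptide mass: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


original = "GFTFSDYAMS"
swapped = "GFTFSDYMAS"  # A and M swapped: a different sequence
print(peptide_mass(original), peptide_mass(swapped))
print(math.isclose(peptide_mass(original), peptide_mass(swapped)))  # True
```

`math.isclose` is used instead of `==` because summing the same floats in a different order can differ in the last bits.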
To ensure sequencing accuracy, we have developed a quality score for each amino acid in the antibody protein sequences on our REmAb™ platform. The score reflects the factual evidence observed in the data.
It tells our scientists in real time whether any part of the antibody sequences may contain errors or, in other cases, is less certain because of insufficient signal in the data. We have also adopted several best practices in our sequencing process to further ensure quality:
1. Examine individual amino acids
2. Pay attention to the CDRs, especially the heavy chain CDR3
3. Distinguish Leucine and Isoleucine using the w-ion method
What applications are there for NGPS? What further applications are there with added WILD™ specificity?
As an emerging technology, next-generation protein sequencing still has largely unexplored applications. We have been investing heavily in marketing to make more scientists aware of the technology, and we expect them to invent new ways of using it.
We have rescued many useful antibodies for our customers where the antibodies were initially generated many years ago and the cell lines were lost or no longer traceable.
We have helped customers sequence antibodies that are difficult to sequence using DNA sequencing techniques. We have also helped customers confirm the sequences they already have by blindly and independently sequencing the antibody proteins.
The WILD™ technology removes the last caveat of next-generation protein sequencing. It eliminates uncertainty in isomeric amino acid assignments and makes most applications more practical and economical.
What does the future hold for NGPS? What advancements do you hope to see in the near future?
The advancement of the next generation protein sequencing will be in two dimensions: the throughput and the complexity.
On the throughput dimension, automation of experiments and data analysis will play an important role. We are investing in robotics and automated liquid handling to standardize our mass spec experiments. We are also developing automated sequencing algorithms that can reduce data analysis time from days to minutes.
On the complexity dimension, the ultimate goal is to sequence polyclonal antibodies from blood. Achieving this goal will require significant investment in the research and development of experiments and algorithms.
Next-generation protein sequencing, particularly our REmAb™ technology, is groundbreaking. This is the first time in the world that scientists can reliably and routinely sequence any given protein.
This is analogous to the invention of genome sequencing technology, which enabled a whole industry and fundamentally changed the way people conduct biology and health research. Now we are in a similar situation, where high-throughput protein sequencing has become possible. We are well positioned to bring this change to the scientific community.
Where can readers find more information?
www.rapidnovor.com is a great resource for information related to next-generation protein sequencing, particularly its antibody applications.
About Mingjie Xie
Mr. Mingjie Xie, MSc, MBA, is the co-founder and CEO of Rapid Novor Inc. He is a computer scientist by training and received his MSc degree in bioinformatics from Western University.
He received his MBA degree from the Richard Ivey School of Business to pursue his interests in business. Prior to co-founding Rapid Novor Inc, Mingjie was the COO of a bioinformatics software company.