An interview with Prof. Neil Kelleher, conducted by Alina Shrourou, BSc
It has been announced that you will be doing a talk as part of the “Structural Mass Spectrometry and Top Down Proteomics of Proteoforms and Their Complexes” symposium at Pittcon 2018. Please can you outline the project you are working on and will be discussing during your talk?
I’m Neil Kelleher and I am a Professor at Northwestern University. I'm giving an award talk at Pittcon 2018 in Orlando, Florida, as a recipient of Pittcon’s “Advances in Measurement Science Lectureship Award”. The award recognizes my commitment to helping form a community called the Consortium for Top-Down Proteomics.
Our mission is to advance the measurement of human proteins with greater precision, bringing the world the benefits of absolute molecular specificity when interrogating proteins at the molecular level. The full expression of this vision is to sequence the human proteome, and that's what the Cell-Based Human Proteome Project is about. It brings us into a really interesting, open, and provocative conversation about science and technology.
The Cell-Based Human Proteome Project is something I proposed in 2012 and have been advancing since then with support from the Consortium for Top-Down Proteomics and the Paul G. Allen Frontiers Program. The proposal was to map 250,000 proteoforms in 4,000 different cell types.
We have already determined the human genome. What is the importance of understanding proteins at the same level?
The genome is the blueprint. Now, 20 years later, we know there are about 20,000 human genes. They create millions of different molecules in all the different cell types.
When you start talking about disease mechanisms, the precision with which we understand the biology driving disease depends on the precision with which we analyze the proteins involved. So too does our ability to detect and treat diverse disease types and sub-types.
Biologists would agree that proteins are the mediators of much of what we call a disease phenotype. For example, consider cancer cells growing in someone's organs. That outward phenotype reflects the genes and oncogenes driving the cancer and what type of cancer it is, and it determines how to defeat it and shrink the tumor. All of these things involve proteins, which make up the specific disease phenotype of a particular individual. One must fully understand the proteins in order to understand and treat the disease in an attempt to save that person.
What is a proteoform and how are proteoforms used in the field of proteomics?
A proteoform is the exact molecular composition of a protein molecule, and it is the unit of currency that the proteomic community is beginning to measure and share. It can be composed of several sources of variation that make biology so enigmatic and difficult to pin down.
Proteins vary due to a number of processes that can occur to them, including polymorphisms, mutations, alternative splicing, isoforms and post-translational modifications. One simple way to describe all of this variation is the term "proteoform". Everybody in proteomics is picking up the word – it’s very much becoming less of just a “word” and more of a “movement” in proteomics.
How can a protein be measured to determine its proteoforms?
The measurement strategy goes right to the heart of measurement science, and we are advocating for a new platform to improve proteomics: call it “Proteomics 2.0”.
We embrace the idea that instead of inferring proteoforms from the bottom up, we measure them directly, using top-down mass spectrometry. This means weighing the whole protein first, mapping which proteoforms exist directly at the proteoform level, and only then breaking them apart.
The measurement approach you use really shapes how you view protein diversity. The field has largely been driven by bottom-up proteomics, so it has been enlightening to use a new approach to investigate what's happening in the regulation of protein-based biology.
How is science being held back by not knowing all forms of a protein?
Basic biomedical research is all about increasing precision about the molecules of life. When you form hypotheses about mechanistic aspects of cell biology, knowing the proteins precisely matters. A reference catalogue of human proteoforms would, in the long run, be as enabling as the human genome has been, from genetics to clinical medicine.
The other thing you would get is more measurements per dollar. If you have the reference list of known proteoforms and cell types, you can expand the types of reagents that you could create and buy. Economies of scale, with rising volume and falling cost, would definitely follow this kind of project.
Overall, it's about elevating the efficiency of biomedical research now and finding higher value protein-based markers of disease. Current processes for biomedical research, such as drug development, are highly inefficient, and would become more efficient if we knew all of the possible proteoforms.
Please outline the information you will be presenting in your session titled “Mass Spectrometry” at Pittcon 2018.
I will review the Cell-Based Human Proteome Project, framing it for those who don't know about it and highlighting the vision of "A billion proteoforms at $1 each". I will then go deeper into the nature of the project and the motivations for it. I will also discuss the kind of measurement science used in the project, top-down proteomics, with the art and science of measuring proteoforms becoming a major focal point.
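The headline figure follows from the proposal described earlier, if the 250,000 proteoforms are read as a per-cell-type target across the 4,000 cell types (our reading; the interview does not state it explicitly):

```latex
\underbrace{250{,}000}_{\text{proteoforms per cell type}} \times \underbrace{4{,}000}_{\text{cell types}} = 10^{9}\ \text{proteoforms},
\qquad 10^{9}\ \text{proteoforms} \times \$1 = \$10^{9}
```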
Something that has happened this year that has greatly de-risked the Cell-Based Human Proteome Project and made it much more feasible is the Human Cell Atlas, an effort funded by the Chan Zuckerberg Initiative whose mission is to define and categorize all the types of human cells.
With this, I feel like the tide might be turning in favor of the idea that we should map and sequence all the different proteins in all the different cell types. The problem was that we didn't have a map of what all the cell types were, but that's now being addressed. This will be my story to tell at Pittcon 2018.
At Pittcon 2018, all the leading vendors of equipment for proteomics research will be present. That includes not only mass spectrometry and chromatography, but also antibodies and other complementary technologies that the project would stimulate.
How is mass spectrometry used to weigh proteins in the Cell-Based Human Proteome Project?
It's hard to envision another technology besides mass spectrometry that can precisely measure the atomic composition of protein molecules before they are known. To directly map and sequence proteoforms, you have to analyze the whole protein, which is only possible using whole-protein, or top-down, mass spectrometry.
However, it is important to note that it is the style of performing mass spec and the way that samples are handled which makes it possible to sequence proteoforms. This method is called top-down mass spectrometry or top-down proteomics. That's the only way we currently have to discover proteoforms. Once they're known, you have all sorts of single-cell, single-molecule technology that could be used.
Please outline the top-down strategy for analyzing proteins.
It's in the name – first you weigh the whole protein at what we call the MS1 level, then from there you measure its components, at the MS2 level.
I sometimes think of the word protein as fiction - if you have 10 different proteoforms that make it up, then what do you mean by the protein? That's why we want to catalogue those proteoforms precisely, so we know exactly what that protein is.
For a given protein, imagine spreading its components out. Say there are 10 signals, each with a different mass or atomic composition. You would then isolate one of them in the gas phase, inside the mass spectrometer. In addition, you can separate them in the condensed phase using chromatography or electrophoresis before mass spectrometry. There's room for a lot of innovation there.
Once you have the proteoform in pure form, even if only for a microsecond inside a mass spectrometer, you then fragment it into all of the pieces that serve as a fingerprint: the MS2 level. Hundreds of fragment ions can be produced from a proteoform, and that allows you to identify which of the 20,300 human genes produced that proteoform and exactly what it is.
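To make the MS1/MS2 logic concrete, here is a minimal Python sketch under heavily simplifying assumptions: a toy peptide ("PEPTIDE") stands in for an intact protein, masses are monoisotopic and singly charged, modifications are unlocalized mass shifts from a tiny hypothetical table, and the function names `proteoform_mass`, `match_precursor` and `fragment_ladders` are our own. It is not ProSight or any published top-down workflow; real intact-protein spectra carry many charge states and isotopic envelopes that must be deconvolved before any matching.

```python
# Toy sketch of top-down logic: MS1 intact-mass matching, then an MS2-style fragment ladder.
# Assumptions: monoisotopic residue masses, singly charged ions, PTMs as unlocalized mass shifts.

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # added once for the free N- and C-termini
PROTON = 1.007276   # charge carrier for singly protonated ions

# Hypothetical, deliberately tiny PTM table (monoisotopic mass shifts in Da).
PTM_SHIFT = {"phospho": 79.966331, "acetyl": 42.010565, "oxidation": 15.994915}


def proteoform_mass(sequence, ptms=()):
    """Neutral monoisotopic mass of a whole proteoform: the MS1 'weighing' step."""
    residues = sum(RESIDUE_MASS[aa] for aa in sequence)
    return residues + WATER + sum(PTM_SHIFT[p] for p in ptms)


def match_precursor(observed_mass, candidates, tol_ppm=10.0):
    """Names of candidate proteoforms whose theoretical mass lies within tol_ppm."""
    hits = []
    for name, (sequence, ptms) in candidates.items():
        theoretical = proteoform_mass(sequence, ptms)
        error_ppm = abs(observed_mass - theoretical) / theoretical * 1e6
        if error_ppm <= tol_ppm:
            hits.append(name)
    return hits


def fragment_ladders(sequence):
    """Singly charged b- and y-ion m/z values for an unmodified sequence (MS2 fingerprint)."""
    b_ions, y_ions = [], []
    running = 0.0
    for aa in sequence[:-1]:              # b1 .. b(n-1): N-terminal fragments
        running += RESIDUE_MASS[aa]
        b_ions.append(running + PROTON)
    running = 0.0
    for aa in reversed(sequence[1:]):     # y1 .. y(n-1): C-terminal fragments
        running += RESIDUE_MASS[aa]
        y_ions.append(running + WATER + PROTON)
    return b_ions, y_ions


if __name__ == "__main__":
    candidates = {
        "unmodified": ("PEPTIDE", ()),
        "phosphorylated": ("PEPTIDE", ("phospho",)),
    }
    print(match_precursor(879.3263, candidates))   # MS1: which proteoform was weighed?
    b_ions, y_ions = fragment_ladders("PEPTIDE")   # MS2: fragments that confirm the sequence
    print([round(m, 3) for m in b_ions])
```

The precursor match mirrors the MS1 step (which of the candidate proteoforms was weighed?), and the b/y ladder mirrors MS2, where fragment masses confirm the sequence. Localizing a modification would mean adding its shift to every fragment that contains the modified residue, which this sketch leaves out for brevity.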
There are two modes of top-down: the denatured mode and the native mode. The newer native mode allows good coverage of very high-mass proteins and even whole protein complexes. Most top-down proteomics right now is done in denatured mode, but I will argue at Pittcon that native mode has a lot of upsides, and that we should be developing more technologies for top-down proteomics discovery in native mode.
How many proteoforms have been discovered so far? Please provide an example of how these have advanced science.
Everyone thinks there are more proteoforms than there actually are. It is easy to think that because of how many modifications are possible on proteins, the variety in mass, the way the number of combinations scales exponentially, and more. You can create all these potential proteoforms in a computer, but how many does biology actually make? What I'm trying to do is catalogue them and show that they can be mapped, even for high-mass, highly complicated proteins, and that they exist in a limited number of proteoforms.
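To see why the theoretical count grows so quickly, consider a deliberately simple model (an illustrative assumption, not a calculation from the interview): a protein with $k$ modifiable sites, where site $i$ can exist in $m_i$ states. Counting only site occupancy, and ignoring sequence variants, splicing and processing:

```latex
N_{\text{theoretical}} = \prod_{i=1}^{k} m_i,
\qquad m_i = 2 \ \text{for all } i \;\Rightarrow\; N_{\text{theoretical}} = 2^{k}
% e.g. k = 10 gives 2^{10} = 1{,}024 combinations; k = 20 gives 2^{20} = 1{,}048{,}576
```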
There are a growing number of cases, but one particular example comes from heart disease and the protein ApoC-III, which was found to have four proteoforms. One of these was glycosylated and correlated very closely with people's HDL-C levels (the good cholesterol). From here, we can investigate further whether specific proteoforms indicate risk, for example someone’s risk of heart attack. Although more research is needed to take the next step, the basic assertion that proteoform mapping and sequencing will lead to deep functional insight is proving to be true.
It is in microbiology, where bacteria are easy to grow and experiment with, that top-down proteoform mapping has excelled in creating clarity. In one case, 25 proteoforms were mapped in a bacterium, and only a few of them carried a certain post-translational modification, meaning that those proteoforms were located in the bacterial membrane.
In another example, researchers investigated a bacterium that causes meningitis. The authors mapped 20 to 30 forms of proteins called pilins, which carried an unusual post-translational modification. The more of that modification they carried, the more infectious, pathogenic and virulent the organism was. This gives us a huge insight into disease and therefore into developing treatments.
What does the future hold for protein-based drugs?
Protein-based drugs are one of the major markets driving interest in top-down methods. For example, researchers mapped 138 proteoforms of a protein-based drug that treats multiple sclerosis as it aged on the shelf.
During drug development, it seems inefficient and even illogical to me to digest the drug into hundreds of pieces before doing your analysis, as in the bottom-up approach. I understand that was the only way we could do things previously, but this method raises many uncertainties, because oxidation or deamidation can occur during sample handling as well as in the drug itself. That makes you ask, "Was it my method or was it the drug?"
For these reasons, I think that top-down has huge potential for protein-based drugs. If you want to know the precise molecular composition of a protein, the way to do it is top-down.
What is making researchers reluctant to adopt this method?
In the past five years, the industry and the state of the technology have really changed. Previously, you had to build a custom solution for each individual research goal; now suitable commercial solutions are readily available.
Currently, the main reluctance towards top-down stems from the ease of bottom-up, as this is the method people are used to performing. However, when only a minority of people are doing top-down, others don't pay attention to it, which makes it harder to reduce it to practice and make it easy enough for other people to adopt.
Sometimes, I find advances in technology are like a see-saw. All the weight is on one side, but once you can see the other side lifting, change comes more swiftly. At some point there will be a critical mass on the other side of the see-saw, and things will shift rapidly in favour of top-down methods.
It is part of my goal to boost awareness of top-down proteomics and the benefits it brings. I want to tell people that if you have individual drugs or individual proteins that you want to characterize, that is achievable through top-down proteomics.
How do you see the top-down approach developing the world of proteomics?
You have two sides to this. One extreme says that by 2030 all proteomics will be top-down. On the other side, you have people who don’t think top-down will ever be capable of full, deep proteomics discovery, and so bottom-up will remain the dominant approach.
I fall in the middle. The see-saw is currently at an equilibrium for me. Bottom-up is useful if you just want to profile the proteome. But if you want to get into regulatory switches, and you really want to be precise about which proteoform you're dealing with, top-down has to be involved.
If you do bottom-up proteomics but already know which proteoforms are there because you have a reference list, you can make much better use of the bottom-up data. That’s why, with the Cell-Based Human Proteome Project, I see a much more complementary role than some do. However, top-down has yet to reach the moment where its value is widely recognized and therefore naturally elevated in the proteomics community. That will take a bit more time, no doubt.
Where can readers find more information?
We published a paper as a consortium in 2013, and that paper now has over 300 citations. It can be found here: https://www.nature.com/articles/nmeth.2369
You can also find more information on the Consortium for Top-Down Proteomics website: http://www.topdownproteomics.org/
For more information on the Human Proteome Project, you can also visit my website: http://www.kelleher.northwestern.edu/human-proteome-project/
About Neil L. Kelleher
Neil L. Kelleher, PhD is the Walter and Mary Glass Professor of Molecular Biosciences and Professor of Chemistry in the Weinberg College of Arts and Sciences. He also is director of the Proteomics Center of Excellence and a member of the Robert H. Lurie Comprehensive Cancer Center of Northwestern University. His research is focused in the areas of top-down proteomics, natural products discovery and cancer biology.
Dr Kelleher has been successful in driving both technology development and applications of very high-performance mass spectrometry. He has over 300 publications, with an h-index of 60. One example of his impact on technology is ProSight software, now used by over 1000 labs around the world.
Dr. Kelleher’s research has focused on combining proteomics and metabolomics in innovative ways to provide a deterministic platform for feeding compounds from the natural world into pharmaceutical pipelines. Over the past decade, he has led projects that discovered over two dozen new natural products and their biosynthetic gene clusters.
Recently, Kelleher has worked with other co-founders of Microbial Pharmaceuticals to establish metabologenomics, a new approach to studying natural products. He has also managed the launch of ProSight, the leading search engine for top-down proteomics data analysis.
His outstanding contributions to the fields of proteomics and natural products chemistry have been recognized by multiple awards, including the Biemann Medal from the American Society for Mass Spectrometry, the Pfizer Award in Enzyme Chemistry from the American Chemical Society, the Presidential Early Career Award in Science and Engineering, the Camille Dreyfus Teacher-Scholar Award, a Sloan Fellowship, a Packard Fellowship, and an NSF CAREER award.