The Proteoform Puzzle: Unlocking the Next Frontier

Thought Leaders: Lloyd M. Smith, Professor of Chemistry, University of Wisconsin-Madison

In this interview, Lloyd M. Smith, recipient of the 2025 Ralph N. Adams Award in Bioanalytical Chemistry, discusses proteoforms, an area of research he sees as worthy of the next Human Genome Project.

When did you first become interested in science, and what was your journey to where you are today? 

I grew up in Berkeley, surrounded by science from a young age—my mother was a mathematician, and my father a physicist. With both parents in academia, I was immersed in a scientific environment early on. Still, like many kids, I didn’t feel a strong connection to any one field at the time.

It wasn’t until college that I started gravitating toward science. I noticed that the courses I found most interesting always seemed to be in that realm. One thing I’ve always appreciated about science is its grounding in evidence. In the humanities, debates can go on endlessly, but in science, there's often a definitive answer—that clarity really appealed to me.

When it came time to choose a major, I landed on biochemistry. I was enjoying chemistry and found it engaging, so it felt like a natural fit. Later, somewhat unexpectedly, I realized I liked physics, a subject I’d initially avoided, perhaps because it was my father’s field. That led to an interesting situation: I was a biochemistry major who genuinely enjoyed physics.

At the same time, I was already doing research in the chemistry department, so I ended up with an interdisciplinary foundation. When I applied to graduate school, I chose biophysics to bring those threads together. I joined the biophysics program at Stanford, though I once again found myself working out of the chemistry department.

For my postdoc, I initially planned to focus on cell biology, a field I’d become interested in through earlier exposure. But plans shifted, and I ended up working on the development of an automated DNA sequencer, a project that turned out to be incredibly rewarding. It brought together many of the skills I had picked up along the way: synthetic chemistry as an undergrad, along with fluorescence, optics, lasers, and electronics from grad school. All of it came into play and was crucial to the project's success.

That project eventually opened the door to my first academic position, but the path there wasn’t easy. I spent two years on the job market. The first year was especially tough: neither I nor the hiring committees were quite sure how to define my expertise. I worked with DNA, so I figured I was a biochemist. But while biochemistry departments invited me to interview, none offered a position.

Eventually, people started suggesting analytical chemistry, a field I hadn’t seriously considered. My only experience with it had been an undergrad class I didn’t find particularly memorable. But during my job search, the analytical chemistry community, especially at the University of Wisconsin, was incredibly open and welcoming. They saw that I was tackling complex biological problems with a strong physical sciences background, and they appreciated that perspective. It turned out to be an excellent match, and that’s how I ended up as an analytical chemist at Wisconsin.

When did proteomics and proteoforms become part of your career? 

I spent about 10 to 15 years focused on DNA sequencing, which was a great fit at the time. My postdoc work had already established me in the field, so securing funding was relatively smooth. It was a fascinating area to work in, but over time—within the electrophoresis framework—I started to feel like I’d explored the most interesting and engaging aspects.

I also noticed a shift in my mindset. When people proposed new ideas, I often found myself thinking, “I’ve already considered that—it won’t work.” That kind of reaction was a red flag for me. It signaled that I was becoming stagnant and that it was time for something new.

Around that point, I became interested in mass spectrometry, particularly as a potential alternative to electrophoresis in DNA sequencing. The idea of replacing electrophoresis with mass spectrometry was exciting, and that transition opened up a whole new set of challenges and learning opportunities. While we ultimately didn’t solve the DNA sequencing problem with mass spec, the process gave me a strong technical foundation in the field.

Much like my experience with DNA sequencing, my enthusiasm for using mass spectrometry in that specific context eventually started to wane. But before stepping away, I realized that many of the techniques I’d developed could be applied to proteomics. That led us to start working in the proteomics space, moving from MALDI to electrospray ionization.

This shift was particularly exciting because we ended up developing a charge reduction approach that caused electrospray ionization spectra to resemble those generated by MALDI—a surprising and intriguing result. That discovery drew us deeper into proteomics and eventually into traditional bottom-up approaches and the broader field.

Since then, I’ve been on an ongoing learning curve, diving deeper into proteomics and proteoforms and continuing to explore how mass spectrometry can uncover new biological insights.

There are two main approaches in proteomics: bottom-up and top-down. Can you explain the differences between the two and why top-down might be more beneficial when studying proteoforms? 

Bottom-up proteomics is the standard approach; probably more than 95% of the field uses it. It is a well-developed and robust technique.

In bottom-up proteomics, you take a protein or a mixture of proteins, digest them into peptides using an enzyme, and then analyze and identify those peptides using liquid chromatography and mass spectrometry. This method is powerful, widely used, and allows researchers to identify and quantify peptides in complex mixtures.
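The digestion step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the interview does not name the enzyme, so the example assumes trypsin with its commonly used rule (cleave after K or R unless the next residue is P), and the sequence is a made-up string rather than a real protein.

```python
import re

def tryptic_digest(sequence: str) -> list[str]:
    """In-silico digest: cleave after K or R, except when followed by P.

    Assumption: trypsin is the enzyme; missed cleavages are ignored.
    """
    # Zero-width split points: preceded by K/R and not followed by P.
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)
    # A cleavage site at the very end of the sequence yields an empty string.
    return [p for p in pieces if p]

# Hypothetical sequence, for illustration only.
protein = "MKTAYIAKQRPLSDEGK"
peptides = tryptic_digest(protein)
print(peptides)
```

Once the protein is cut this way, each peptide is analyzed on its own, so nothing in the peptide list records which intact molecule a given peptide came from.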

However, bottom-up proteomics does not provide information at the proteoform level. A proteoform refers to the intact protein, including any modifications or variations that distinguish it from other forms of the same protein. To obtain that level of detail, you need top-down proteomics.

Top-down proteomics follows the same general workflow but analyzes the entire protein without breaking it down into peptides. This approach is much more challenging than working with peptides, but the data it provides is incredibly valuable.

There is still a lot of room for development in the field, which makes it an exciting area to explore. More importantly, I believe that understanding proteoforms, knowing exactly what molecules you are working with, is essential for truly comprehending biological systems.

Can you explain what a proteoform family is and its biological significance? 

Let me start by explaining where the concept of a proteoform family came from. I had been exploring the idea of analyzing entire proteoforms by measuring their intact masses, without fragmenting them into smaller pieces.

The advantage of this approach is its simplicity and speed: you’re just measuring a single mass. The tradeoff, of course, is that you lose detailed molecular information about what exactly that mass represents. We’re still working to understand in which contexts this method is most effective.

Our first tests applied this approach to the yeast proteome. However, when we analyzed the data, we found we weren’t getting as many confident identifications as we had hoped. That’s when Mike Shortreed, in the lab, came up with a key insight. Looking at the data, he noticed that some unidentified masses were offset from known proteoforms by amounts corresponding to known post-translational modifications (PTMs).

If we had a proteoform with a confirmed identity and another molecule with a mass shifted by, say, the mass of a phosphorylation, we could reasonably infer that the second molecule was a modified version of the same protein. We began calling these Experimental-Theoretical (ET) pairs—a known proteoform paired with a related one predicted based on a theoretical mass shift.
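A minimal sketch of the ET-pair idea follows. It is not the lab's actual implementation: the function name, the tolerance, the protein names, and all masses except the PTM shifts are hypothetical. The PTM shifts are standard monoisotopic values, and the sketch considers at most one PTM per match.

```python
# Monoisotopic mass shifts (Da) for a few common PTMs.
PTM_SHIFTS = {
    "phosphorylation": 79.96633,
    "acetylation": 42.01057,
    "methylation": 14.01565,
}

def find_et_pairs(experimental, theoretical, tol_da=0.02):
    """Pair each experimental mass with a theoretical proteoform mass,
    either unmodified or shifted by exactly one known PTM.

    tol_da is an assumed absolute tolerance, not a value from the interview.
    """
    pairs = []
    for exp in experimental:
        for name, theo in theoretical.items():
            if abs(exp - theo) <= tol_da:
                pairs.append((exp, name, "unmodified"))
            for ptm, shift in PTM_SHIFTS.items():
                if abs(exp - (theo + shift)) <= tol_da:
                    pairs.append((exp, name, ptm))
    return pairs

# Hypothetical theoretical and observed intact masses (Da).
theoretical = {"protein_A": 10000.000, "protein_B": 15250.500}
experimental = [10000.005, 10079.970, 15292.512, 12345.678]

for pair in find_et_pairs(experimental, theoretical):
    print(pair)
```

With these made-up numbers, the second mass is explained as a phosphorylated form of protein_A and the third as an acetylated form of protein_B, while the last mass stays unexplained.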

Mike pushed this idea even further. He realized that even if we didn’t have a theoretical match for a proteoform, we could still detect relationships between experimental observations by looking at known PTM mass shifts.

These became our Experimental-Experimental (EE) pairs—molecules connected purely by observed mass differences. Using Cytoscape, a network visualization tool, we assembled these relationships into clusters we called proteoform families.
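The step from EE pairs to families can be sketched as a connected-components problem. The interview describes using Cytoscape for this; the sketch below swaps in a plain union-find in Python purely for illustration. The observed masses, the tolerance, and the function names are hypothetical, and only single-PTM mass differences are considered.

```python
from itertools import combinations

# Monoisotopic mass shifts (Da) for a few common PTMs.
PTM_SHIFTS = {
    "phosphorylation": 79.96633,
    "acetylation": 42.01057,
    "methylation": 14.01565,
}

def find_ee_pairs(masses, tol_da=0.02):
    """Link two observed masses when their difference matches one PTM shift."""
    pairs = []
    for low, high in combinations(sorted(masses), 2):
        delta = high - low
        for ptm, shift in PTM_SHIFTS.items():
            if abs(delta - shift) <= tol_da:
                pairs.append((low, high, ptm))
    return pairs

def build_families(masses, pairs):
    """Group masses into families: connected components of the pair graph."""
    parent = {m: m for m in masses}

    def find(m):
        # Walk up to the root, shortening the path along the way.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for low, high, _ptm in pairs:
        parent[find(low)] = find(high)

    families = {}
    for m in masses:
        families.setdefault(find(m), []).append(m)
    return sorted(sorted(members) for members in families.values())

# Hypothetical observed intact masses (Da); none needs a theoretical match.
observed = [10000.005, 10079.970, 10159.938, 15250.498, 15292.512, 12345.678]
ee_pairs = find_ee_pairs(observed)
for family in build_families(observed, ee_pairs):
    print(family)
```

With these numbers the sketch yields three families: a chain of three masses linked by two phosphorylation-sized steps, a pair linked by an acetylation-sized step, and one mass with no relatives. A real analysis would also have to handle combined shifts and coincidental mass matches, which this sketch ignores.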

This approach significantly expanded the number of proteoforms we could connect and interpret. And conceptually, I’ve come to really appreciate it. It offers a more gene-centric view of proteomics. Traditionally, we say each gene makes a protein—but the definition of a “protein” is a bit fuzzy. Instead, we can think of each gene giving rise to a set of proteoforms—like a family of related molecules. Just as a family has parents, children, and cousins, a gene produces various forms of a protein through processes like alternative splicing or post-translational modification.

This framework helps simplify how we think about biological complexity. I like to envision around 20,000 proteoform families—one for each gene in the human genome. Each family contains the different proteoforms derived from that gene.

If we want to truly understand biological systems, we need to measure how these families and their members respond to different conditions, environments, or perturbations.

Some members of these proteoform families have implications for diseases, including heart disease and COVID-19. Could you share some examples of how proteoforms are involved in these conditions? 

A couple of examples come to mind. One is in cardiac biology. My colleague Ying Ge, who also works in top-down proteomics, has studied cardiac troponins, specifically troponin I. She’s shown that in diseased hearts compared to healthy ones, there are distinct differences in the phosphorylation states of these proteoforms.

That’s just scratching the surface, though. In biology—and science more broadly—there’s always the ongoing question of correlation versus causation.

One way to frame this is through the lens of biomarkers. If a specific phosphorylated proteoform can be consistently detected in blood and reliably indicates the presence of heart disease, it could serve as a diagnostic marker. However, proving clinical utility takes time and rigorous validation.

The other possibility is that these proteoform differences are not just correlated with disease but actually causative. If that’s the case, then understanding the mechanisms that drive those changes could open up opportunities for intervention, perhaps even with small-molecule therapeutics.

Another compelling example came up during the COVID-19 pandemic. While working from home, I started exploring new research directions and found that COVID provided a striking case for the relevance of proteoforms. There’s an enzyme involved in the innate immune response that plays a role in fighting off viral infections. Genetic variations in the population result in different proteoforms of this enzyme.

One of these proteoforms includes a membrane-spanning domain, which allows it to anchor into the membrane and function effectively. The other, shorter proteoform, lacks this domain and fails to localize properly. As a result, individuals who express only the truncated form essentially lack this arm of the immune response, which can lead to more severe outcomes from COVID-19.

What’s especially interesting is how different scientific communities interpret this phenomenon. A geneticist might focus on it purely as a variant in the genome without emphasizing the proteoform implications. A bottom-up proteomics researcher might describe it as a post-translational event—perhaps a truncation. In the top-down or proteoform-centric view, we see it as a distinct proteoform, with functional consequences tied to its structural differences.

These interpretations aren’t in conflict—they’re just different perspectives on the same underlying biology, shaped by the lens of each discipline.

You have been involved in a proposal for the Human Proteoform Project. Can you tell me more about that and what it aims to achieve? 

I was heavily involved in the Human Genome Project because the instrument I developed during my postdoc ended up being the key tool used in the sequencing efforts. That project supported a lot of my research, and I served on several committees that helped oversee its progress. Even at the time, it felt like a well-organized initiative, and in hindsight, it’s clear how effectively it was structured and executed.

One of the main reasons for its success, in my view, was its foundation on multiple pillars, one of the most important of which was technological development. When the genome project began, the early sequencing instruments were quite basic. But as funding ramped up and commercial interest grew, we saw major leaps in performance.

The National Human Genome Research Institute (NHGRI) played a central role in this by specifically funding technology-focused projects, which spurred rapid innovation in sequencing techniques.

Alongside that, there was a strong execution pillar: actually sequencing the genome. What made this so effective was the interplay between development and implementation. New technologies were stress-tested in real sequencing environments, and the practical challenges of large-scale genome sequencing helped push the technology forward.

That model, pairing technological innovation with ambitious, large-scale execution, is exactly what we’re hoping to bring to the Human Proteoform Project. The goal is to generate the same kind of excitement and momentum around proteoforms that the genome project achieved for DNA. We want to see government agencies and funders support this effort on a similar scale.

Right now, mass spectrometry is the primary tool for proteoform analysis, and continued, incremental improvement is essential. However, we also need to encourage more radical thinking.

A great example from the genome world is nanopore sequencing. I remember being on review panels for some of the earliest nanopore proposals—at the time, they seemed highly speculative. It took about two decades for the concept to mature into the robust, widely adopted technology it is today. But now, nanopore sequencing has dramatically changed how certain genomic analyses are done.

That’s the mindset we need for proteoforms: creating space for bold, high-risk ideas that may take time but could eventually reshape the field. If we invest in many early-stage projects and accept that not all will succeed, we give ourselves a chance to discover game-changing tools and approaches that could redefine how we study the proteome.

What work is your lab currently doing to contribute to the development of these new technologies? 

Most of our efforts to improve proteoform analysis right now are focused on the data analysis side. If you think about the typical workflow, we’re still operating squarely within the mass spectrometry framework. While I find nanopore sequencing fascinating, I feel like I’m a bit late to that game—many groups are already deeply invested in that space. I haven’t yet come up with a new technology for proteoform-level analysis that sits outside of mass spectrometry.

So, within the mass spec world, I tend to think of the process in three parts: before the mass spectrometer, the instrument itself, and after the mass spectrometer.

Before using the instrument, you’ll need sample preparation and separation techniques. There’s definitely room for improvement there, but most of the progress tends to be incremental. As for the instrument itself, these machines are incredibly sophisticated. Companies like Thermo Fisher and Bruker have teams of brilliant engineers who are constantly pushing the boundaries of what the hardware can do.

But after the mass spectrometer? That’s where things get really interesting. The raw data that comes from these instruments is highly complex, and there’s still a huge amount of valuable information hidden in it. Extracting and interpreting that information is where a significant portion of my group—about a third to half—is focused.

It’s a particularly exciting time to be working in this space, especially with the emergence of AI. If you think back, the Human Genome Project was powered in large part by advances in computing. In the 1980s, bioinformatics was still in its infancy compared to where it is now. I see the rise of AI as a similar inflection point. We’re already seeing its potential, but I believe we’re only beginning to understand how transformative it could be.

AI has the potential to unlock entirely new ways of analyzing proteoform data, which makes this moment so promising for the field.

You mentioned trying to get the government’s attention for funding. What role does the private sector play in this field? 

The private sector has shown strong interest in this space. Companies have correctly recognized that bottom-up proteomics already represents a large, well-established market, and they’re actively looking for ways to either take over or disrupt that space with new technologies.

A lot of these efforts are clearly inspired by what happened in DNA sequencing. Early on, the first human genome was sequenced using electrophoresis-based methods, but what really accelerated the field was the transition to next-generation sequencing (NGS). That leap involved innovations like combining array-based platforms with fluorescence-based sequencing—millions of sequencing reactions happening simultaneously on a chip, with high-resolution imaging capturing the results.

So now, it’s natural for people to ask: Can we do something similar for proteins? It’s not a far-fetched idea at all, and several groups are working toward that goal. Ed Marcotte was one of the first researchers I saw exploring this space, though there may have been others before him. Since then, a number of companies have entered the scene with similar concepts—trying to apply array-based, high-throughput strategies to proteomics.

The challenge for me, though, is that these technologies—at least in their current form—don’t capture proteoforms. They often focus on detecting peptides or protein presence but not the full molecular complexity of intact proteoforms, including post-translational modifications and sequence variants. And for those of us focused on understanding proteins at the proteoform level, that’s a critical gap.

What do you think the next 10 years will look like for proteoforms? 

There’s still a lot of room to grow with mass spectrometry. When it comes to top-down proteomics, I believe we could realistically improve its capabilities by a factor of 10 over the next decade.

Plenty of incremental advances—like better separation techniques—could help push us in that direction. In the near term, I expect mass spectrometry to remain the dominant tool for proteoform research.

That said, I don’t believe mass spectrometry is the endgame. It reminds me a bit of electrophoresis-based sequencing—reliable and highly effective in its time but eventually replaced by newer, more scalable technologies. I find nanopores particularly interesting in this context. I’m not sure if they’ll be able to capture all post-translational modifications—there are just so many, and they’re so diverse—but I do think nanopore-based approaches are going to have a significant impact on protein analysis.

I often think about how the Human Genome Project unfolded. The first full genome sequence gave us a foundational reference, and then the field began to shift—from discovery mode to scoring mode. The focus moved toward identifying and quantifying what we already knew existed and doing it faster and more efficiently.

I think proteoform research will follow a similar path. Mass spectrometry, with continued innovation and support, could provide that foundational proteoform map. Once that’s in place, other technologies—like nanopores or array-based systems—could step in to make proteoform analysis far more scalable and accessible.

It’s about building the groundwork now so that future tools can stand on it.

What does the future hold for you? 

There are a few areas I really enjoy working in right now. First, I’m very interested in the proteoform space—I want to keep pushing forward in this area. Second, I’m looking into dehydroamino acids, which we’ve discovered in Alzheimer’s disease. I want to follow up on this.

Third, I’m getting more interested in epitranscriptomics. I think many of the tools we’ve developed for proteoforms—our software, separation techniques, and mass spectrometry approaches—can also be applied to RNA. And that’s an important, largely unexplored area with a lot of unknowns.

About Lloyd M. Smith

Professor Smith is recognized for his impact across a spectrum of analytical methods. With Leroy Hood, he conceived and developed automated DNA sequencing. He has been a leader in developing biomolecular array technology for lectins, DNA, and RNA, both for assays and for technical uses such as DNA computing and RNA-mediated gene assembly. In the area of mass spectrometry, he has been innovative in protein analysis, coining the term proteoform and developing advances in ionization, including a method to reduce charge states. He has also commercialized several of his innovations and made software such as the search engine MetaMorpheus available to other researchers.

About Pittcon

Pittcon is the world’s largest annual premier conference and exposition on laboratory science. Pittcon attracts more than 16,000 attendees from industry, academia and government from over 90 countries worldwide.

Its mission is to sponsor and sustain educational and charitable activities for the advancement and benefit of scientific endeavor.

Pittcon’s target audience is not just “analytical chemists,” but all laboratory scientists — anyone who identifies, quantifies, analyzes or tests the chemical or biological properties of compounds or molecules, or who manages these laboratory scientists.

Having grown beyond its roots in analytical chemistry and spectroscopy, Pittcon has evolved into an event that now also serves a diverse constituency encompassing life sciences, pharmaceutical discovery and QA, food safety, environmental, bioterrorism and cannabis/psychedelics. 


Citations

Please use one of the following formats to cite this article in your essay, paper or report:

  • APA

    Pittcon. (2025, July 10). The Proteoform Puzzle: Unlocking the Next Frontier. News-Medical. Retrieved on July 10, 2025 from https://www.news-medical.net/news/20250710/The-Proteoform-Puzzle-Unlocking-the-Next-Frontier.aspx.

  • MLA

    Pittcon. "The Proteoform Puzzle: Unlocking the Next Frontier". News-Medical. 10 July 2025. <https://www.news-medical.net/news/20250710/The-Proteoform-Puzzle-Unlocking-the-Next-Frontier.aspx>.

  • Chicago

    Pittcon. "The Proteoform Puzzle: Unlocking the Next Frontier". News-Medical. https://www.news-medical.net/news/20250710/The-Proteoform-Puzzle-Unlocking-the-Next-Frontier.aspx. (accessed July 10, 2025).

  • Harvard

    Pittcon. 2025. The Proteoform Puzzle: Unlocking the Next Frontier. News-Medical, viewed 10 July 2025, https://www.news-medical.net/news/20250710/The-Proteoform-Puzzle-Unlocking-the-Next-Frontier.aspx.
