What Are Proteoforms and Why Do They Matter?
How Are Proteoforms Generated and Regulated in Cells?
How Are Proteoforms Studied Using Proteomics Technologies?
What Is the Biological and Clinical Significance of Proteoform Diversity?
What Challenges and Future Directions Exist in Proteoform Research?
References
Further Reading
Proteoforms represent the diverse molecular forms of proteins produced from a single gene through genetic variation, alternative splicing, and post-translational modifications. Understanding proteoform diversity reveals how protein structure and modification states shape cellular function, disease mechanisms, and biomarker discovery.
For decades, scientists assumed that each gene produces just one stable protein. Modern proteomic investigations suggest otherwise: proteins exist in multiple forms, which researchers collectively term ‘proteoforms’. A proteoform is a specific molecular form of a protein produced from a single gene that differs from other forms in amino acid sequence, post-translational modifications, or both.1,6
Proteoforms differ from genes, transcripts, and protein isoforms. While genes provide the blueprint for protein synthesis, messenger ribonucleic acid (mRNA) transcripts serve as intermediate molecules during protein synthesis. Differences in amino acid sequences give rise to protein isoforms. However, the term proteoform is broader and encompasses all distinct molecular forms of a protein arising from a single gene, including sequence variants, splice variants, and post-translational modifications.2
Proteoforms can arise through several biological mechanisms. Before translation, cells can generate different RNA transcripts through alternative splicing. In this process, different gene segments (exons) can be combined in several ways to produce protein variants. During translation, the use of alternative translation start sites can generate variants with structural alterations. After translation, a large protein molecule can be cleaved into multiple smaller fragments, undergo chemical modifications, or both. Post-translational modifications such as phosphorylation, glycosylation, ubiquitination, and acetylation can substantially alter protein structure, localization, stability, and biological activity.1,2,3
Scientists have identified nearly 20,000 protein-coding genes in the human genome. The number of proteoforms is far larger: estimates suggest that human cells may contain hundreds of thousands to several million proteoforms generated through combinations of sequence variants, splice isoforms, and post-translational modifications.3,4
Changes during transcription, during translation, and after translation can each give rise to multiple variants of a single protein. At the transcript level, alternative transcription initiation sites and alternative splicing can modify the amino acid sequence at the N-terminus. These changes can influence protein localization and function.3,5
During translation, mechanisms such as alternative translation initiation (ATI), ribosomal frameshifting, and stop-codon read-through can alter the terminal sequences and, in some cases, extend protein length. Following translation, enzymes may cleave proteins or add functional groups such as phosphoryl, acetyl, or glycosyl groups to generate new variants. These post-translational modifications may occur individually or in combination, creating large populations of closely related proteoforms from the same gene product.2,3
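To give a sense of how quickly combinations of modifications multiply, the following minimal Python sketch enumerates every modification state of a made-up protein. The three site names are hypothetical and chosen only for illustration; the mass shifts are the standard monoisotopic values for phosphorylation and acetylation. Real proteins have many more candidate sites, and not every combination necessarily occurs in cells.

```python
from itertools import product

# Hypothetical modification sites (names are illustrative assumptions).
# Each site maps its possible states to a monoisotopic mass shift in daltons.
SITES = {
    "Ser_a": {"unmodified": 0.0, "phospho": 79.96633},
    "Lys_b": {"unmodified": 0.0, "acetyl": 42.01057},
    "Lys_c": {"unmodified": 0.0, "acetyl": 42.01057},
}

def enumerate_proteoforms(sites):
    """Return every combination of site states with its total mass shift."""
    names = list(sites)
    combos = []
    for states in product(*(sites[name].items() for name in names)):
        # states is one (state_name, mass_shift) pair per site
        label = {name: state for name, (state, _) in zip(names, states)}
        shift = round(sum(mass for _, mass in states), 5)
        combos.append((label, shift))
    return combos

proteoforms = enumerate_proteoforms(SITES)
print(f"{len(proteoforms)} candidate proteoforms from {len(SITES)} sites")
for label, shift in proteoforms:
    print(label, f"+{shift} Da")
```

With two states at each of three sites, the sketch yields 2 × 2 × 2 = 8 candidate proteoforms. Every additional two-state site doubles that count, which is one reason proteoform estimates run so far above the number of genes.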
Cells can also increase or decrease the abundance of specific proteoforms under stress. For example, during hypoxia or heat shock, interactions between proteoforms can activate protective mechanisms that maintain membrane stability. Because proteoforms integrate genetic variation, transcript variation, and protein modification states, many researchers consider them the most functionally relevant units of the proteome.1,7
Scientists primarily identify proteins using mass spectrometry (MS), following either a bottom-up or a top-down approach. In the bottom-up method, investigators cleave proteins into peptides with proteolytic enzymes before analysis. This technique allows them to quantify many proteins simultaneously. However, closely related proteoforms can be difficult to detect, because digestion breaks the link between each peptide and the intact molecule it came from. As a result, this peptide-based strategy can obscure the relationship between multiple modifications occurring on the same intact protein molecule.4-6
In such cases, scientists prefer the top-down approach, in which researchers analyze intact proteins without first digesting them, so sequences and modifications stay together. Top-down proteomics enables direct characterization of intact proteoforms and simultaneously identifies sequence variants and post-translational modifications within the same molecule. It preserves more proteoform-level information than the bottom-up approach but has lower throughput and remains technically challenging for complex samples.4-6
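The contrast between the two approaches can be illustrated with a toy model in Python. The two samples, the two sites, and the assumption that each site ends up on a separate peptide after digestion are all hypothetical; the sketch only demonstrates the loss of linkage described above and is not a model of any real workflow.

```python
from collections import Counter

# Each proteoform is written as (state at site 1, state at site 2).
# Assumption: after digestion, the two sites sit on different peptides.
sample_a = [("phospho", "none"), ("none", "acetyl")]    # singly modified forms
sample_b = [("phospho", "acetyl"), ("none", "none")]    # doubly modified + bare form

def bottom_up_view(sample):
    """Peptide-level evidence: each site is counted on its own, linkage is lost."""
    peptide_1 = Counter(proteoform[0] for proteoform in sample)
    peptide_2 = Counter(proteoform[1] for proteoform in sample)
    return peptide_1, peptide_2

def top_down_view(sample):
    """Intact-level evidence: each whole combination is counted as one unit."""
    return Counter(sample)

same_peptides = bottom_up_view(sample_a) == bottom_up_view(sample_b)
same_intact = top_down_view(sample_a) == top_down_view(sample_b)
print("peptide-level evidence identical:", same_peptides)
print("intact-level evidence identical:", same_intact)
```

In this toy case both samples show each site modified in half of the molecules, so the peptide-level view cannot tell them apart, while the intact-level view can.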
A particular challenge in proteoform detection is distinguishing similar variants that may differ by a single mutation or a minor post-translational modification. Moreover, proteoforms are not equally abundant. While some proteoforms may be present in large quantities, others may be present in trace amounts. To address these concerns, researchers have begun using high-resolution instruments, such as Fourier transform ion cyclotron resonance (FT-ICR) mass spectrometers, and improved protein separation techniques, such as two-dimensional gel electrophoresis–liquid chromatography–MS (2DE-LC/MS). These technologies improve on traditional analyses in their ability to distinguish between structurally similar proteins.1,4,6
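A short calculation shows why high mass resolution matters for closely related variants. The Python sketch below compares two modifications that share the same nominal mass, acetylation and trimethylation, on a hypothetical 20 kDa protein. This pair is chosen here as an illustration and is not drawn from the studies cited above; the protein mass is an assumption, and the calculation ignores charge states and isotopic distributions, so it should be read as a rough estimate only.

```python
# Monoisotopic mass shifts in daltons for two modifications
# with the same nominal mass of 42 Da.
ACETYL = 42.010565       # C2H2O
TRIMETHYL = 42.046950    # C3H6

def ppm_difference(mass_a, mass_b):
    """Relative difference between two masses, in parts per million."""
    return abs(mass_a - mass_b) / min(mass_a, mass_b) * 1e6

def compare_proteoforms(base_mass):
    """Compare the acetylated and trimethylated forms of one protein."""
    acetylated = base_mass + ACETYL
    trimethylated = base_mass + TRIMETHYL
    delta = trimethylated - acetylated
    return delta, ppm_difference(acetylated, trimethylated)

# Hypothetical unmodified protein mass of 20 kDa (illustrative assumption).
delta, ppm = compare_proteoforms(20_000.0)
print(f"mass difference: {delta:.4f} Da ({ppm:.2f} ppm)")
```

Under these assumptions the two proteoforms differ by about 0.036 Da, which is less than 2 ppm of the total mass, so an instrument with lower mass accuracy would report them as the same species.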
Researchers rarely find proteins in a single form. The diverse variants, or proteoforms, can interact with cellular components in different ways, allowing cells to adapt their activity in response to environmental signals. In this manner, proteoforms can regulate cell activity and behavior. A well-known example is the tumor suppressor protein p53, for which scientists have identified several proteoforms. The specific post-translational modifications on p53 proteoforms influence a cell’s fate: whether a cell undergoes repair, cell-cycle arrest, or apoptosis depends on the proteoforms it contains.1,2,7
In cancer, tumor-specific splice variants of proteins can influence treatment responses. As an example, alterations in the v-erb-b2 erythroblastic leukemia viral oncogene homolog 2 (ERBB2) gene can generate proteoforms of the human epidermal growth factor receptor 2 (HER2). These proteoforms can alter how cells respond to breast cancer treatment. Similarly, researchers have found that a variant of the cluster of differentiation 19 (CD19) antigen may confer resistance to chimeric antigen receptor (CAR)-based T cell therapy in acute lymphoblastic leukemia.1,2
Studying proteins at the proteoform level may also improve disease detection. Traditional diagnostic tests are designed to measure the total amount of a protein in a given sample. Proteomic tools extend this capability by detecting specific protein variants that may increase disease risk. For example, scientists have found certain glycosylated proteoforms of prostate-specific antigen (PSA) that are more specific indicators of prostate cancer than the total PSA concentration.1,2
In neurodegenerative diseases such as Alzheimer’s disease, proteoforms of amyloid-β and tau proteins differ in their neurotoxicity. Likewise, in Parkinson’s disease, α-synuclein exists in different forms, and certain variants can increase its accumulation within the brain. These findings suggest that disease states can change both the structure and abundance of proteoforms in living tissues. Therefore, proteoform-level analysis may help identify new biomarkers and therapeutic targets. Well-known examples of clinically relevant proteoforms include glycated hemoglobin (HbA1c), used to monitor long-term glucose control, and hemoglobin S (HbS), the variant associated with sickle cell anemia.1,2,7
Despite rapid advancements, several technical and analytical challenges persist. A major limitation is the characterization of large, highly modified proteins (>30–70 kDa). Higher-resolution techniques are required, as highly abundant variants may mask low-level ones. Also, overlapping fragmentation spectra can complicate the assignment of post-translational modifications.1,4
Another challenge is interpreting complex MS datasets using computational bioinformatics tools. These tools must improve discrimination between true proteoforms and experimental artifacts, while controlling false discovery rates. It is also essential to improve the standardization of proteoform databases, nomenclature, and annotations to enable consistent interpretation across studies and facilitate data sharing among researchers.1,4
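One widely used way to estimate false discovery rates in MS database searches is the target-decoy strategy, in which spectra are also searched against deliberately incorrect (decoy) sequences. The cited sources do not specify which method proteoform tools use, so the Python sketch below is an assumption-laden illustration of the general idea: the function name, the score values, and the simple decoys-divided-by-targets estimate are all chosen here for clarity, and ties between scores are not handled.

```python
def fdr_score_cutoff(hits, max_fdr=0.01):
    """Return the lowest score cutoff whose estimated FDR stays within max_fdr.

    hits: iterable of (score, is_decoy) pairs, higher scores being better.
    The FDR at a cutoff is estimated as decoy hits / target hits at or above it.
    Returns None when no cutoff satisfies the limit.
    """
    ranked = sorted(hits, key=lambda hit: hit[0], reverse=True)
    targets = decoys = 0
    cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            cutoff = score  # everything scoring at least this much passes
    return cutoff

# Hypothetical identifications: (score, is_decoy)
example_hits = [(10, False), (9, False), (8, False), (7, True), (6, False)]
strict_cutoff = fdr_score_cutoff(example_hits, max_fdr=0.10)
loose_cutoff = fdr_score_cutoff(example_hits, max_fdr=0.25)
print(strict_cutoff, loose_cutoff)
```

A stricter limit keeps only the top-scoring identifications, while a looser limit admits more candidates at the cost of more expected false positives.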
Technological advancements are making proteoform research increasingly feasible. For example, capillary electrophoresis and multidimensional chromatography allow researchers to investigate the diverse proteoforms of a protein with greater precision. At the same time, researchers are increasingly combining proteomics with genomics and transcriptomics. When applied together, these methods can characterize various aspects of proteins, enhancing our understanding of how genetic variations shape proteoform signatures in biological systems. In parallel, artificial intelligence–based structural prediction tools such as AlphaFold 3 and D-I-TASSER are being developed to accelerate proteoform discovery and clinical translation, with the aim of bringing precision medicine closer to reality.1,4
References
1. Fang, Z., Zhang, Y., Feng, X., Li, N., Chen, L., & Zhan, X. (2025). Proteoformics: Current status and future perspectives. Journal of Proteomics, 321, 105524. DOI:10.1016/j.jprot.2025.105524, https://www.sciencedirect.com/science/article/pii/S1874391925001514
2. Forgrave, L. M., Wang, M., Yang, D., & DeMarco, M. L. (2021). Proteoforms and their expanding role in laboratory medicine. Practical Laboratory Medicine, 28, e00260. DOI:10.1016/j.plabm.2021.e00260, https://www.sciencedirect.com/science/article/pii/S2352551721000603
3. Smith, L. M. (2022). Proteoforms and Proteoform Families: Past, Present, and Future. Methods in Molecular Biology (Clifton, N.J.), 2500, 1. DOI:10.1007/978-1-0716-2325-1_1, https://link.springer.com/protocol/10.1007/978-1-0716-2325-1_1
4. Po, A., & Eyers, C. E. (2023). Top-Down Proteomics and the Challenges of True Proteoform Characterization. Journal of Proteome Research, 22(12), 3663. DOI:10.1021/acs.jproteome.3c00416, https://pubs.acs.org/doi/10.1021/acs.jproteome.3c00416
5. Su, T., Hollas, M. A., Fellers, R. T., & Kelleher, N. L. (2023). Identification of Splice Variants and Isoforms in Transcriptomics and Proteomics. Annual Review of Biomedical Data Science, 6, 357. DOI:10.1146/annurev-biodatasci-020722-044021, https://www.annualreviews.org/content/journals/10.1146/annurev-biodatasci-020722-044021
6. Schaffer, L. V. et al. (2019). Identification and Quantification of Proteoforms by Mass Spectrometry. Proteomics, 19(10), e1800361. DOI:10.1002/pmic.201800361, https://analyticalsciencejournals.onlinelibrary.wiley.com/doi/10.1002/pmic.201800361
7. Smith, L. M. et al. (2021). The Human Proteoform Project: Defining the human proteome. Science Advances, 7(46), eabk0734. DOI:10.1126/sciadv.abk0734, https://www.science.org/doi/10.1126/sciadv.abk0734
Last Updated: Mar 9, 2026