By Shelley Farrar, MSc, BSc
Proteomics is the large-scale study of proteins expressed by an organism or biological system. It is used to investigate how proteins are expressed and modified, and what function they have in a particular biological pathway. Advances in the field have allowed proteomes to be studied in depth, with high-throughput technologies producing long lists of proteins. New technologies are being developed to increase the precision of the methods that identify proteins within a sample. The latest advances in bioinformatics have also been applied to proteomics data, allowing the output to be analyzed.
The development of data-independent acquisition (DIA) of protein lists
Liquid chromatography-mass spectrometry (LC-MS) is a technique that has been used over the past 15 years to detect and quantify lists of proteins. It combines the physical separation abilities of liquid chromatography with the capacity of mass spectrometry to identify individual components.
Image: Pipetting samples into chromatography vials for liquid chromatography mass spectrometry.
Whilst this high-throughput method has been the foundation of proteomics studies, the technology has not reached the level of precision necessary to identify all proteins within a biological sample. Nevertheless, data-independent acquisition (DIA) methods may soon reach this goal.
All advanced liquid chromatography-mass spectrometry methods work by:
- Ionizing the peptides in the mass spectrometer
- Completing a first scan (MS1), in which the abundance of the ions and their mass-to-charge ratios are measured
- Completing a second scan (MS2), in which the detected ions are fragmented so that the abundance and mass-to-charge ratios of the fragments can be recorded
The data-independent acquisition method differs by isolating, fragmenting and analyzing all of the precursor ions within a given mass range together in a single MS2 scan. More precise quantification of peptides is achieved by repeatedly selecting peptides within a specific set of mass ranges, rather than isolating individual peptides.
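The selection by mass range described above can be illustrated with a minimal sketch. It assumes fixed-width, non-overlapping isolation windows and a plain list of precursor mass-to-charge (m/z) values; the function names, the 400 to 500 m/z range and the 25 m/z width are hypothetical choices for illustration, not the settings of any particular instrument or software package.

```python
from bisect import bisect_right


def make_windows(mz_start, mz_end, width):
    """Return fixed-width (low, high) isolation windows covering [mz_start, mz_end)."""
    windows = []
    low = mz_start
    while low < mz_end:
        high = min(low + width, mz_end)
        windows.append((low, high))
        low = high
    return windows


def assign_to_windows(precursor_mzs, windows):
    """Group precursor m/z values by the window that contains them.

    Every precursor falling in a window would be fragmented together in
    one MS2 scan; values outside all windows are ignored.
    """
    lower_bounds = [low for low, _ in windows]
    groups = {window: [] for window in windows}
    for mz in precursor_mzs:
        idx = bisect_right(lower_bounds, mz) - 1
        if idx >= 0 and mz < windows[idx][1]:
            groups[windows[idx]].append(mz)
    return groups


# Hypothetical example values, chosen only to show the grouping.
example_windows = make_windows(400.0, 500.0, 25.0)
example_groups = assign_to_windows([401.2, 424.9, 425.0, 499.9, 512.3], example_windows)
for window, members in example_groups.items():
    print(window, members)
```

Because several precursors share a window, their fragments end up in the same MS2 spectrum, which is one way to picture why abundant peptides can mask scarce ones sampled at the same time.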
However, detection of low-abundance peptides is difficult because of the dominating abundance of other peptides sampled at the same time. Though data-independent acquisition methods have the potential for increased breadth and precision in their output, further developments in algorithms and software are required to analyze the data produced efficiently.
Advances in proteomics interpretation with IEA annotations
Additional methods of bioinformatics analysis are required to interpret the long list of proteins produced. Protein functions can be predicted through the functional annotation of proteomics data. Gene Ontology (GO) annotations are a commonly applied tool for classifying genes and proteins via a standardized vocabulary that describes their roles in biological systems. GO annotations were originally revised by curators so that obsolete annotations could be removed and a greater depth of biological knowledge could be reflected. Now more than 95% of GO annotations are assigned by computational methods; these electronic annotations are referred to as IEA (Inferred from Electronic Annotation) annotations.
An attempt to quantitatively evaluate IEA annotations and compare their reliability with that of annotations made by curators has recently been completed. Experimental annotations were used to test the reliability, coverage and specificity of the IEA annotations, which were found to have improved over time in comparison to the curator annotations. Furthermore, the experimental annotations may provide a method of establishing which IEA annotations can be considered the most reliable within a study.
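As a rough illustration of how electronic annotations might be scored against experimental ones, the sketch below treats each annotation as a (protein, term) pair. The definitions are simplifying assumptions made here rather than the exact metrics of the study described above: reliability is taken as the fraction of electronic annotations confirmed experimentally, and coverage as the fraction of experimental annotations that were also predicted electronically. The protein and term labels are invented placeholders.

```python
def reliability(electronic, experimental):
    """Fraction of electronic annotations that are confirmed experimentally."""
    if not electronic:
        return 0.0
    return len(electronic & experimental) / len(electronic)


def coverage(electronic, experimental):
    """Fraction of experimental annotations that were predicted electronically."""
    if not experimental:
        return 0.0
    return len(electronic & experimental) / len(experimental)


# Placeholder (protein, term) pairs; not real identifiers.
electronic_annotations = {
    ("protA", "term1"),
    ("protA", "term2"),
    ("protB", "term1"),
    ("protC", "term3"),
}
experimental_annotations = {
    ("protA", "term1"),
    ("protB", "term1"),
    ("protB", "term4"),
}

print("reliability:", reliability(electronic_annotations, experimental_annotations))
print("coverage:", coverage(electronic_annotations, experimental_annotations))
```

A fuller treatment would also need to account for the GO hierarchy, since a more general term can be partly correct when a more specific one is confirmed; this sketch only counts exact matches.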
Advances in determining biological networks with the use of new software
The final step in proteomics interpretation is to determine the biological process reflected in the data. The protein lists formed can be analyzed for abundances that indicate a certain biological pathway. Various software programmes have been developed to aid in the visualization of biological processes. In addition, open-source software platforms allow proteomics data visualization to be integrated into apps to which the community of proteomics researchers contributes.
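One common way to ask whether a protein list points towards a pathway is an over-representation test. This is a swapped-in illustration rather than a method named in this article: it uses a one-sided hypergeometric test and assumes that only presence or absence in the list is considered, not measured abundance. The function name and the numbers in the example are hypothetical.

```python
from math import comb


def hypergeom_pvalue(population, pathway_size, sample_size, overlap):
    """Probability of seeing at least `overlap` pathway proteins by chance.

    population   -- number of proteins that could have been detected
    pathway_size -- number of those proteins that belong to the pathway
    sample_size  -- number of proteins in the list of interest
    overlap      -- number of listed proteins that belong to the pathway
    """
    max_overlap = min(pathway_size, sample_size)
    total = comb(population, sample_size)
    tail = sum(
        comb(pathway_size, k) * comb(population - pathway_size, sample_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical numbers: 2000 detectable proteins, 40 in the pathway,
# 100 proteins in the list of interest, 8 of which are pathway members.
print(hypergeom_pvalue(2000, 40, 100, 8))
```

A small p-value suggests the pathway is over-represented in the list. When many pathways are tested, the p-values would normally need a multiple-testing correction.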
Newly developed applications can predict gene or protein interactions and form a network from the data. A further advance in this software means that datasets from diverse organisms can be used, constructing networks that represent the interactions between organisms. In future, this may make it possible to analyze proteomics data from organisms without a complete set of genome annotations by utilizing the data from a closely related organism.
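The idea of borrowing data from a closely related organism can be sketched as follows. This is only one possible approach, assumed here for illustration: known interactions in a well-annotated reference organism are carried over to a target organism through a one-to-one mapping between related proteins. The mapping, the identifiers and the function names are all hypothetical.

```python
from collections import defaultdict


def transfer_interactions(reference_edges, ortholog_map):
    """Carry reference interactions over to the target organism.

    An interaction is transferred only when both partners have a mapped
    counterpart in the target organism.
    """
    transferred = set()
    for a, b in reference_edges:
        if a in ortholog_map and b in ortholog_map:
            target_a, target_b = ortholog_map[a], ortholog_map[b]
            if target_a != target_b:
                transferred.add(frozenset((target_a, target_b)))
    return transferred


def build_adjacency(edges):
    """Turn a set of undirected edges into a protein -> neighbours mapping."""
    adjacency = defaultdict(set)
    for edge in edges:
        a, b = tuple(edge)
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency)


# Placeholder identifiers for a reference and a target organism.
reference_edges = [("refA", "refB"), ("refB", "refC"), ("refC", "refD")]
ortholog_map = {"refA": "tgt1", "refB": "tgt2", "refC": "tgt3"}

network = build_adjacency(transfer_interactions(reference_edges, ortholog_map))
print(network)
```

Here the interaction involving `refD` is dropped because it has no counterpart in the target organism, which shows how gaps in the mapping limit the network that can be inferred.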