Although genomic data provides a significant amount of information regarding the molecular machinery of life cycles and cellular processes, details of gene expression and gene function are reflected more so by the presence (or absence) of ribonucleic acid (RNA) and proteins.
Consequently, modern studies of systems-biology depend on four pivotal “omics” approaches: genomics for studying deoxyribonucleic acid (DNA), transcriptomics for studying RNA, proteomics for studying proteins, and metabolomics for studying metabolites or small molecules.
Rost9 | Shutterstock
Metaproteomics was originally defined as exhaustive characterization of the complete protein aggregates found in environmental microbiota at a certain point in time.
By applying metaproteomics to a wide array of microbial consortia during the past decade, researchers have gained an insight into the key functional traits of different environmental microorganisms.
Background and rationale of metaproteomics
Structure and function of microbial community
The analysis of metaproteome datasets provides information about the structure, function, and dynamics of microbial communities, which is crucial for improved understanding of microbial recruiting, nutrient resource competition, metabolic activity, and defense systems distribution across the community.
Initial successes with microbial isolates resulted in an increased interest to extend and adapt the methodology for more complex samples. This information is paramount for characterizing host/microbe interactions, such as bacterial/human interfaces (with the eminent example of the human gut microflora).
Prerequisites for analysis
Technological prerequisites for proteomic analyses include the capacity to deal with complex mixtures, high-throughput processing, broad dynamic range, very sensitive protein/peptide detection, precise mass measurements, and propensity to structurally distinguish peptide sequences. Mass spectrometry has become the dominant platform for basically all proteomic measurements.
Experimental approach for complex samples
Proteomic analyses and measurements are done using several approaches based on mass spectrometry – all of which focus on the unequivocal identification of the assortment of proteins or peptides present in a given sample.
Successful metaproteome measurement hinges on three elements: efficient extraction of proteins from an environmental sample, separation of proteins or peptides prior to their detection, and finally, high-throughput clear-cut identification of proteins and peptides. Two factors are necessary for any extensive proteome analysis: effective separation of peptides/proteins, followed by unequivocal detection.
There are two fundamental types of proteomic measurement strategies that combine liquid chromatography with mass spectrometry: top-down and bottom-up.
The top-down protocol is conceptually simple: whole proteins are separated by means of liquid chromatography (exploiting charge and/or hydrophobicity), and then directly analyzed directly by (tandem) mass spectrometry. Conversely, bottom-up (or shotgun) proteomics interrogates the samples with additional processing and analysis steps that vastly expand the potential for deep proteomic measurements.
The shotgun technique first uses trypsin to digest proteins to peptides, followed by chromatographic separation and subsequent analysis by mass spectrometry or tandem mass spectrometry. The resulting fragmentation generates a type of barcode that uniquely characterizes a peptide.
The true power of bottom-up approach in metaproteomics is evidenced by both cultured and uncultured microbial isolates, and in more recently, by complex studies of environmental microbial communities that have established metagenomes.
The role of bioinformatics
The caliber of metaproteomic data is intricately linked to the quality of the analysis. Hence, employing high-throughput and multidimensional measurements needed in proteomic research clearly requires robust and often novel bioinformatic approaches for converting raw spectral data in peptide sequence details, identifying the proteins corresponding to each peptide spectrum.
A predicted protein database built from metagenomic information is indispensable for proper assignment of peptide sequence data (as inferred from mass spectrometry-derived fragmentation patterns) to the corresponding proteins.
De novo algorithms (also known as de novo sequencing) may also be used to ascertain the sequence of a certain peptide directly from the information supplied in its tandem mass spectra.
The field of metaproteome bioinformatics is composed of an array of computation operations, such as protein database inquiry and raw mass spectra filtering, data mining, graphical representation and data mining.
As microbes are overwhelmingly present on earth (but often ignored as a result of their microscopic size), metaproteomics has great potential to unravel the information of microbial ecosystems in humans and different ecological niches.