In this interview, Dr. Hannes Röst describes how his lab is using diaPASEF to monitor the human proteome throughout a person's lifetime, and how machine learning could be used to analyze the data.
What are the main advantages of data-independent acquisition modes compared to data-dependent acquisition modes?
The main advantage is reproducibility. Data-independent acquisition allows you to reproducibly measure individual analyte signals across very large cohorts. This is in contrast to data-dependent acquisition, where you use a stochastic method to sample peptides in your analyte pool. That stochastic selection prevents you from sampling the same set of peptides in every single sample.
With data-independent acquisition, you're getting a quantitative answer for a specific analyte in every single sample, and you don't have this stochastic element that introduces missing values in your quantitative data matrix that you often observe with data-dependent acquisition.
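The effect on the data matrix can be illustrated with a toy simulation. This is only a sketch under loud assumptions: real data-dependent acquisition picks precursors by intensity rather than uniformly at random, and the function names, cohort size, and peptide counts below are hypothetical values chosen for illustration, not figures from the interview.

```python
import random


def simulate_dda(n_samples, n_peptides, n_picked, seed=0):
    """Toy DDA model: each run quantifies a random subset of peptides.

    Uniform random picking is an assumed stand-in for stochastic
    precursor selection; it is not how a real instrument chooses.
    """
    rng = random.Random(seed)
    matrix = []
    for _ in range(n_samples):
        picked = set(rng.sample(range(n_peptides), n_picked))
        matrix.append([p in picked for p in range(n_peptides)])
    return matrix


def simulate_dia(n_samples, n_peptides):
    """Toy DIA model: every peptide is measured in every run."""
    return [[True] * n_peptides for _ in range(n_samples)]


def missing_fraction(matrix):
    """Fraction of empty cells in the sample-by-peptide matrix."""
    cells = [cell for row in matrix for cell in row]
    return 1 - sum(cells) / len(cells)


def complete_peptides(matrix):
    """Number of peptides quantified in every single sample."""
    return sum(all(column) for column in zip(*matrix))


# Hypothetical cohort: 20 samples, 100 peptides, 70 picked per DDA run.
dda = simulate_dda(n_samples=20, n_peptides=100, n_picked=70)
dia = simulate_dia(n_samples=20, n_peptides=100)
print("DDA missing fraction:", round(missing_fraction(dda), 2))
print("DDA peptides seen in all samples:", complete_peptides(dda))
print("DIA peptides seen in all samples:", complete_peptides(dia))
```

Even though each simulated DDA run covers most peptides, very few peptides are covered in all runs, which is the missing-value problem described above.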
What is the timsTOF Pro, and how are you using it to analyze complex biological samples?
The timsTOF Pro is a novel type of instrumentation that is based on a traditional Q-TOF architecture, but also has a trapped ion mobility device in front of the spectrometer. This allows us to do two things: first, it allows us to accumulate ions for a specific amount of time before we send them to analysis, and second, it allows us to separate ions by ion mobility.
This means that the ions get focused into a very narrow band, which boosts sensitivity and provides additional selectivity, because the ions will be separated by their collisional cross-section.
The timsTOF Pro uses the Parallel Accumulation Serial Fragmentation (PASEF) acquisition method, which exploits the trapped ion mobility to get up to a 10-fold increase in sequencing speed. It works by moving the position of the quadrupole in step with the ions as they are released from the ion mobility device.
This increase in sequencing speed is crucial because it allows us to go much deeper into complex proteomes and get quantitative answers for very complex samples in a short amount of time.
Your research group focuses on applications of computational mass spectrometry. Please can you tell us more about the Röst lab and the projects that your team is working on?
We are currently working on two main pillars in mass spectrometry. The first pillar is the development of new technology, because we realized that the current technology in mass spectrometry will need improvement before we can address the large scale questions and the type of cohort sizes that we want to address.
We spend a fair amount of time developing new software and new experimental methods in order to increase the reproducibility and the scalability of mass spectrometric methods. At the same time, we take these methods and apply them on the second pillar of our research program, which is focused on personalized medicine.
Here, we are trying to longitudinally track individual patients with a very dense sampling setup, so that we can see how the molecular profiles in patients’ biofluids change over their lifetime, and during time periods of health and disease. With this, we hope to understand what drives the transition from a healthy to a diseased state.
What is diaPASEF and how are you applying this technique to personalized medicine?
diaPASEF is a new method which incorporates the ion mobility component into data-independent acquisition (DIA). We are currently using diaPASEF to analyze large patient cohorts or large perturbations, which is required for any sort of systems biology or personalized medicine approach.
If we are able to analyze very large experimental perturbations and quantify analytes in every single perturbation, we can understand how biological systems work, how they react to perturbations, and how signals are processed.
On the other hand, if we apply this to individual patients and patient cohorts we can trace the molecular profile of a single patient over long periods of time, and we can understand how their molecular profile changes over time as their environment and lifestyle changes. Eventually, we will be able to observe the transition from a healthy profile to a disease profile, and diagnose this much earlier.
Why is it important to study these patient cohorts longitudinally?
The longitudinal aspect is key because it allows us to compare individual patients to their past selves. We want to move away from the paradigm of comparing individual patients to a population average, because that population average may not be as informative as comparing your current, diseased self to your past healthy self.
You may have a level of an analyte that is twice as high as it was 10 years ago, which is a bad sign because it means you may be developing a certain disease. This level of the analyte could still be within the population variance, and would therefore be missed by your current doctor.
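A small worked example makes the point concrete. It is a sketch only: the analyte levels, the helper name `z_score`, and the use of a simple z-score are all assumptions made for illustration, not the lab's actual method or real patient data.

```python
from statistics import mean, stdev


def z_score(value, reference):
    """How many standard deviations `value` lies from the reference mean."""
    return (value - mean(reference)) / stdev(reference)


# Hypothetical analyte levels (arbitrary units), invented for this example.
population = [4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0]  # many different people
personal_history = [4.8, 5.0, 5.2, 5.0, 5.0]  # one person's past samples
current = 10.0  # the same person's latest measurement

pop_z = z_score(current, population)
self_z = z_score(current, personal_history)
fold_change = current / mean(personal_history)

print(f"z-score vs population:       {pop_z:.1f}")
print(f"z-score vs personal history: {self_z:.1f}")
print(f"fold change vs own baseline: {fold_change:.1f}")
```

With these invented numbers, the current value sits exactly at the population mean, so a population-based reference range raises no flag, while the same value is double the person's own baseline and far outside their historical variation.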
How can machine learning enhance the fragmentation patterns obtained with diaPASEF, and do you believe that we can do the same with collisional cross-section values?
Yes. Machine learning is a technique that entered proteomics a few years ago. People have been applying deep learning in proteomics and metabolomics with the goal of skipping some of the very tedious and labor-intensive steps that we currently have to do for data-independent acquisition.
One of these steps is the generation of a spectral or assay library. We now have the first indications that it may be possible to completely predict these spectral libraries, which would allow us to mine DIA data much more efficiently and directly, without relying on prior experimental measurements that usually have to be done by data-dependent acquisition.
This allows us to break the tedious link where we always need to do data-dependent acquisition before we can do data-independent acquisition, so we can go directly into the data-independent acquisition data.
One of the next frontiers will be predicting collisional cross-section values directly, because this is required to mine the diaPASEF data. From our collaborations with the Max Planck Institute, we have seen that this is possible and we will soon start moving in that direction.
What are the main challenges in computational mass spectrometry, and how can we overcome those issues?
I think one of the main challenges of computational mass spectrometry at the moment, specifically related to data-independent acquisition, is the sheer volume of data. This has exploded again with the timsTOF Pro, where we have a thousand times more data than we ever had with traditional Q-TOF instruments. This is because each individual TOF scan is now split up into a thousand individual TOF pushes, all of which we can analyze individually.
This is a big challenge in terms of scaling our algorithms, but also in interpreting that massive amount of data. Currently, we are recording much more data than we can interpret, so we want to move towards leveraging the full amount of data and not only a small subset.
I think the second challenge with data-independent acquisition is that, because the method acquires so much data, we need a way to deconvolute this data and assign individual precursor traces back to their fragment ion traces. This is something that was not possible before we added ion mobility as a separation technique.
Is the data a thousand times larger because you're using the mass spectrometer in the TOF?
It’s bigger because in all previous approaches we used the individual TOF pushes to average them into one single spectrum, whereas now each TOF push is associated with an ion mobility. We used to take about a thousand individual measurements and then average them into a single measurement, which allowed us to compress the data.
Now, each individual TOF push is associated with an ion mobility, which means we need to keep that information and we cannot simply merge the data anymore. This means that we have a lot more information, but also substantial challenges in the data analysis.
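The arithmetic behind the thousand-fold increase can be sketched in a few lines. The figure of about a thousand pushes comes from the answer above; everything else (the function names, the four m/z bins, the synthetic intensities) is a hypothetical illustration, and real instrument files are sparse and compressed, so actual file sizes will not scale this simply.

```python
def averaged_spectrum(pushes):
    """Collapse all TOF pushes into one spectrum by averaging each m/z bin."""
    n_pushes = len(pushes)
    return [sum(bin_values) / n_pushes for bin_values in zip(*pushes)]


def stored_values(n_pushes, n_bins, keep_mobility):
    """Intensity values kept per scan, with or without the mobility axis."""
    return n_pushes * n_bins if keep_mobility else n_bins


# Synthetic data: 1000 pushes, each with 4 m/z bins (bin count is arbitrary).
pushes = [[push + mz_bin for mz_bin in range(4)] for push in range(1000)]

# Earlier approach: average the pushes into a single spectrum.
print(averaged_spectrum(pushes))

# With ion mobility, each push must be kept separately.
merged = stored_values(n_pushes=1000, n_bins=4, keep_mobility=False)
resolved = stored_values(n_pushes=1000, n_bins=4, keep_mobility=True)
print("values kept when merged:  ", merged)
print("values kept with mobility:", resolved)
print("ratio:", resolved // merged)
```

Averaging discards the push index, which is exactly the axis that now carries the ion mobility information, so the compression step is no longer available.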
Why do you feel it is important to make the programs that you develop open access?
In our research team, we put a lot of emphasis on developing open-source software. One of the main reasons for that is transparency. We want people to be able to understand and reproduce exactly what happens with their data, which is only possible with open-source software.
Only with open-source software can you go and read the code that manipulates your data, so that you can understand how you got from a raw data file to a specific quantitative output value.
The second reason is to grow the data-independent acquisition community around the software that we are producing. Making the software open and available to developers allows us to create a community of developers who take our algorithms and build something on top of them.
Instead of working in our individual silos and trying to protect our research, we are trying to make it as open as possible and available to the community so that other people can use it and build upon it.
How do you think the field of computational mass spectrometry will evolve in the next decade?
At present, we have several challenges to solve in computational mass spectrometry. One of the challenges is the interpretation of the individual data sets needed to obtain the full amount of information recorded by a mass spectrometric instrument.
I also think we need to go into the second dimension of data acquisition, which is the number of samples. We need to be able to do that type of analysis reproducibly, so that the quantitative values that we find for analytes in a single run can be found for every single patient. We still need completely new algorithms to make this work and to overcome batch effects.
Where can readers find more information?
- F. Meier, A. Brunner, M. Frank, A. Ha, E. Voytik, S. Kaspar-Schoenefeld, M. Lubeck, O. Raether, R. Aebersold, B. C. Collins, H. L. Röst, M. Mann. (2019). Parallel accumulation – serial fragmentation combined with data-independent acquisition (diaPASEF): Bottom-up proteomics with near optimal ion usage. bioRxiv. DOI: 10.1101/656207.
- Discover the timsTOF Pro
About Hannes Röst
Hannes Röst is the lead investigator of the Röst Lab at the University of Toronto. Röst completed his Ph.D. in the lab of Professor Ruedi Aebersold at ETH Zurich, where he developed novel computational methods to analyze mass spectrometry-based proteomics data. This work allowed researchers to increase the throughput of targeted proteomics experiments by up to 100-fold and increase the number of samples that could be analyzed in a single study.
In his current role, Röst and his team develop novel mass spectrometric methods to obtain highly quantitative proteomics and metabolomics data matrices and use these quantitative data to address questions in systems biology and personalized medicine.