Bioinformatics is a growing field as the volume of data being produced and analyzed increases. Next-generation sequencing techniques are known for generating large amounts of data, but they also answer many questions. Cloud computing and cloud-based technologies can help make these enormous quantities of complex genetic, protein, and other life sciences data manageable.
Cloud Computing and Applying Analysis
Among the main problems in modern research is the size of datasets. Surveys of gene regulation, proteome studies, and so-called brain atlas projects all demand substantial computational resources, in both storage capacity and processing power.
Cloud computing, the foundation on which cloud-based technologies run, allows users to carry out computationally intensive tasks on virtual computers, often over the internet. These computers can be used for storing data or for running cloud-based technologies such as an analysis program.
To handle large amounts of information, cloud computing divides a task into smaller subtasks that can be carried out on several processors in parallel, which reduces processing time. Central to cloud computing is the virtual machine, which can have the software needed for an analysis packaged within it.
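The idea of dividing a task into subtasks and combining their results can be illustrated with a minimal Python sketch. It is not taken from any particular cloud platform: the GC-content task, the function names, and the toy sequence are all assumptions chosen for illustration, and a local process pool stands in for the several processors a cloud service would provide.

```python
from concurrent.futures import Executor, ProcessPoolExecutor


def split_into_chunks(sequence: str, n_chunks: int) -> list[str]:
    """Divide one sequence into roughly equal, contiguous subtasks."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    size = max(1, -(-len(sequence) // n_chunks))  # ceiling division
    return [sequence[i:i + size] for i in range(0, len(sequence), size)]


def count_gc(chunk: str) -> int:
    """Subtask: count the G and C bases in one chunk."""
    return sum(1 for base in chunk.upper() if base in "GC")


def gc_fraction(sequence: str, executor: Executor, n_chunks: int = 4) -> float:
    """Run the subtasks on the given executor and combine the partial counts."""
    if not sequence:
        return 0.0
    chunks = split_into_chunks(sequence, n_chunks)
    partial_counts = executor.map(count_gc, chunks)
    return sum(partial_counts) / len(sequence)


if __name__ == "__main__":
    toy_sequence = "ATGCGC" * 1000  # made-up data, for illustration only
    with ProcessPoolExecutor(max_workers=4) as pool:
        print(gc_fraction(toy_sequence, pool))
```

Because each chunk is counted independently, the subtasks can run on separate processors and only the small partial counts need to be brought back together. Passing the executor in as an argument keeps the splitting and combining logic separate from where the work actually runs.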
Cloud-based technologies can generally be subdivided into four categories: Data as a Service (DaaS), Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS). DaaS provides on-demand access to data from public datasets, such as GenBank. SaaS comprises software solutions and tools for analyzing datasets. Analysis is done over the internet, which removes the need to install programs and makes updates easier to manage. SaaS tools cover sequence mapping, alignment, and expression and sequence analysis, as well as other genetic applications.
PaaS allows researchers to develop cloud applications. Because it uses cloud computing, computing resources scale automatically with demand, which makes development easier since resource requirements do not need to be known beforehand. IaaS aims to offer full infrastructure over the internet, including hardware and software. This flexible approach allows users to pay only for the computing power or software they need on virtual machines.
Benefits of Cloud-Based Technologies for Research
Although high-throughput techniques have been in use for some time, cloud computing and cloud-based technologies make them more accessible. They are more flexible, as the desired computational processing power is essentially rented without the need to install and manage any of the infrastructure. By removing the need to finance local computational infrastructure, its maintenance, and the personnel to run it, cloud-based technologies are highly economical.
An often overlooked benefit of cloud-based solutions is the ease of sharing data. Data can be transferred and shared quickly between collaborating researchers. In addition, because cloud computers are available whenever they are needed, users can begin analyses immediately.
Applications of Cloud-Based Technologies in Research
In recent decades, bioinformatics has focused increasingly on how to analyze genome data efficiently and correctly. Cloud computing-based applications typically deal with high-throughput sequence analysis. CloudBLAST was among the first cloud-based technologies to address such sequence analysis problems, and more technologies have been launched since then.
Cloud-based technologies target a range of genomic applications, such as sequence alignment (e.g. CloudCoffee), short read mapping (e.g. CloudAligner), SNP identification (e.g. Crossbow), genome annotation, and RNA differential expression analysis.
In drug research, understanding molecular and protein interactions is important. One high-performance computing approach to studying entire proteomes, their receptor sites, and their interactions with ligands is the cloud computing-based application Cloud-PLBS. It makes use of two existing frameworks, SMAP and Hadoop, to store the data across several virtual computers and to process it in parallel using a map procedure. This allows the numerous protein-ligand binding sites in an experiment to be compared.
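The general shape of a map procedure over binding-site comparisons can be sketched in a few lines of Python. This is an illustration only and does not reproduce Cloud-PLBS, SMAP, or the Hadoop API: the Jaccard score is a toy stand-in for a real binding-site comparison, and the protein names and residue labels are made up.

```python
from itertools import combinations


def jaccard_similarity(site_a: frozenset, site_b: frozenset) -> float:
    """Toy stand-in for a binding-site comparison (NOT the SMAP algorithm)."""
    union = site_a | site_b
    return len(site_a & site_b) / len(union) if union else 0.0


def map_compare(pair):
    """Map step: one independent comparison job, emitted as (key, value)."""
    (name_a, site_a), (name_b, site_b) = pair
    return (name_a, name_b), jaccard_similarity(site_a, site_b)


# Hypothetical binding sites described by residue labels (made-up data).
sites = {
    "protA": frozenset({"HIS57", "ASP102", "SER195"}),
    "protB": frozenset({"HIS57", "SER195", "GLY193"}),
    "protC": frozenset({"LYS33", "GLU51"}),
}

# Every pair of sites is an independent job, so a distributed framework
# could hand each one to a different virtual machine.
jobs = combinations(sorted(sites.items(), key=lambda item: item[0]), 2)
scores = dict(map(map_compare, jobs))
```

Here the built-in `map` runs the jobs one after another on a single machine. The point of the pattern is that no job depends on another, which is what would let a framework spread them across many virtual computers.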