Drug development is a costly and often fruitless process. Many candidate drugs never reach clinical trials, and many of those that do are not approved by the FDA or equivalent regulatory bodies.
Machine learning plays an active part in sorting through the immense amounts of biochemical data generated by high-throughput sequencing techniques, making drug discovery more efficient.
Machine learning is the process by which a computer learns from data without being explicitly programmed for the task. The term was coined in 1959, and machine learning has since become a practical reality. Computers are given an algorithm that they apply to analyze and learn from data.
The computer then makes a decision or prediction about new, relevant data. Many machine learning systems use neural networks, which are algorithms loosely modeled on the human brain: they take inputs, process them and produce an output.
At its core, much of machine learning can be thought of as finding a line of best fit in several dimensions in order to reach the optimal solution.
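To make the "line of best fit in several dimensions" idea concrete, here is a minimal sketch in plain Python. It assumes a linear model fitted by gradient descent on the mean squared error; the function name fit_linear, the learning rate and the toy data are illustrative choices, not taken from any particular drug discovery pipeline.

```python
def fit_linear(X, y, lr=0.05, epochs=5000):
    """Fit y ~ w.x + b by batch gradient descent on mean squared error."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        # Residual of the current fit on every sample.
        errs = [sum(wj * xj for wj, xj in zip(w, x)) + b - t
                for x, t in zip(X, y)]
        # One gradient step for each weight and for the intercept.
        for j in range(d):
            w[j] -= lr * 2 / n * sum(e * x[j] for e, x in zip(errs, X))
        b -= lr * 2 / n * sum(errs)
    return w, b


# Toy data generated from y = 2*x1 - 3*x2 + 1 (an assumption for the demo).
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1], [1, 2]]
y = [2 * x1 - 3 * x2 + 1 for x1, x2 in X]
w, b = fit_linear(X, y)
print([round(v, 3) for v in w], round(b, 3))
```

With two input features the "line" is really a plane; with more features it becomes a hyperplane, but the fitting procedure is the same.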
There are several types of machine learning, including supervised learning, unsupervised learning and reinforcement learning.
Supervised machine learning is when the data fed to the computer includes the answer to the problem for each example. From this, the algorithm can learn and make predictions on new data sets.
Unsupervised learning uses data that has no labeled output. No answer accompanies the data fed to the algorithm, but the algorithm can still determine which parts of the data are most similar to each other.
Reinforcement learning is more circular. An action takes place in an environment, which returns a reward and a representation of the new state; both feed back into the choice of the next action.
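As a rough illustration of supervised learning, the sketch below uses a one-nearest-neighbor classifier, which is only one of many possible supervised methods. The feature values and the "active"/"inactive" labels are invented for the example.

```python
def predict_1nn(train, query):
    """Return the label of the training example closest to `query`."""
    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    _, label = min(train, key=lambda example: dist2(example[0], query))
    return label


# Each training example pairs a feature vector with its known answer.
train = [
    ((0.1, 0.2), "inactive"),
    ((0.2, 0.1), "inactive"),
    ((0.9, 0.8), "active"),
    ((0.8, 0.9), "active"),
]
print(predict_1nn(train, (0.85, 0.70)))
```

The labeled examples play the role of the "answers" supplied with the data; the prediction for a new point is borrowed from the most similar known example.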
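A small sketch of the unsupervised case, assuming k-means clustering as the grouping method (the text does not name a specific algorithm). The points and the choice of starting centers are made up for illustration.

```python
def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to `point`."""
    return min(range(len(centers)), key=lambda i: dist2(point, centers[i]))


def kmeans(points, centers, iters=10):
    """Group unlabeled points around `centers` by repeated reassignment."""
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else old
            for g, old in zip(groups, centers)
        ]
    return centers, [nearest(p, centers) for p in points]


points = [(1.0, 1.1), (0.9, 1.0), (1.1, 0.9), (5.0, 5.1), (4.9, 5.0), (5.1, 4.9)]
centers, labels = kmeans(points, centers=[points[0], points[-1]])
print(labels)
```

No answers are supplied here; the algorithm only reports which points it considers similar to each other.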
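The action, reward and state loop can be sketched with tabular Q-learning on a toy problem. Everything here is an assumption made for illustration: a five-cell corridor, a reward of 1 for reaching the last cell, and the parameter values shown.

```python
import random


def train_corridor(n_states=5, episodes=200, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    """Learn action values for walking right along a corridor to a goal cell."""
    rng = random.Random(seed)
    moves = (-1, +1)                              # action 0 = left, action 1 = right
    q = [[0.0, 0.0] for _ in range(n_states)]     # q[state][action]
    goal = n_states - 1
    for _ in range(episodes):
        state = 0
        while state != goal:
            if rng.random() < eps:
                action = rng.randrange(2)         # explore
            else:
                best = max(q[state])              # exploit, breaking ties at random
                action = rng.choice([a for a in (0, 1) if q[state][a] == best])
            # The environment responds with a new state and a reward.
            nxt = min(max(state + moves[action], 0), goal)
            reward = 1.0 if nxt == goal else 0.0
            # The reward and new state feed back into the value of the action.
            q[state][action] += alpha * (reward + gamma * max(q[nxt]) - q[state][action])
            state = nxt
    return q


q = train_corridor()
policy = ["right" if right > left else "left" for left, right in q[:-1]]
print(policy)
```

After training, the learned values favor moving right in every non-goal cell, which is the shortest route to the reward.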
Perturbagens are small-molecule compounds, inhibitory RNAs or other agents that disrupt intracellular processes. Machine-vision methods can be used to analyze their effects: features are calculated from cell images and recognized by a machine learning algorithm.
Pattern-unmixing methods attempt to account for continuous relocation events inside a cell. Using machine learning, this is done by estimating the proportion that is present in each of the subcellular locations.
Once this has been analyzed, generative models can be learned and used to create new cellular models. New images are synthesized based on the images used to train the model. This has been applied to HeLa cells in both two and three dimensions, allowing the study of perturbagen effects and of cellular changes caused by diseases and drugs.
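The idea of estimating a proportion per location can be illustrated with a deliberately simplified sketch. It assumes only two locations, each with a known "pure" feature profile, and a mixed measurement that is a linear blend of the two; real pattern-unmixing methods are more involved, and the profiles below are invented.

```python
def unmix_two(mixed, pure_a, pure_b):
    """Estimate f in [0, 1] such that mixed ~ f * pure_a + (1 - f) * pure_b.

    Least-squares projection of the mixed profile onto the line that
    joins the two pure profiles.
    """
    diff = [a - b for a, b in zip(pure_a, pure_b)]
    num = sum((m - b) * d for m, b, d in zip(mixed, pure_b, diff))
    den = sum(d * d for d in diff)
    return min(1.0, max(0.0, num / den))


# Hypothetical image-feature profiles for two subcellular locations.
pure_a = [0.8, 0.1, 0.1]
pure_b = [0.1, 0.2, 0.7]
# Simulate a cell in which 30% of the signal sits in location A.
mixed = [0.3 * a + 0.7 * b for a, b in zip(pure_a, pure_b)]
fraction_a = unmix_two(mixed, pure_a, pure_b)
print(round(fraction_a, 3))
```

Tracking how this estimated fraction changes over time or under treatment is one way such a method could describe relocation between compartments.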
Active learning methods
An active machine learning system has a method for building a predictive model from available data and a method to utilize that model to decide future data collection. The system can, therefore, choose data points to be collected and added to the existing data.
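The two parts of an active learning system, a predictive model and a rule for choosing the next data point, can be sketched as follows. This is a toy under stated assumptions: the "experiment" is a hypothetical function run_experiment that reports a compound as active at dose level 62 or above, the model is a single threshold, and the selection rule is to test the untested dose closest to the current threshold estimate.

```python
def run_experiment(dose):
    """Hypothetical stand-in for a lab assay: True if the dose is 'active'."""
    return dose >= 62


def active_learn(pool, budget):
    """Locate the activity boundary using as few experiments as possible."""
    # Start with the two extreme doses (assumed inactive and active).
    labeled = {pool[0]: run_experiment(pool[0]), pool[-1]: run_experiment(pool[-1])}
    for _ in range(budget):
        lo = max(d for d, active in labeled.items() if not active)
        hi = min(d for d, active in labeled.items() if active)
        if hi - lo <= 1:
            break                                  # boundary pinned down
        threshold = (lo + hi) / 2                  # current predictive model
        candidates = [d for d in pool if d not in labeled]
        # Choose the experiment the model is least certain about.
        query = min(candidates, key=lambda d: abs(d - threshold))
        labeled[query] = run_experiment(query)
    lo = max(d for d, active in labeled.items() if not active)
    hi = min(d for d, active in labeled.items() if active)
    return lo, hi, len(labeled)


lo, hi, n_experiments = active_learn(list(range(101)), budget=10)
print(lo, hi, n_experiments)
```

In this toy the boundary is located after nine experiments rather than the 101 needed to test every dose, which is the kind of saving active learning aims for.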
One of the main problems in drug discovery is determining what a compound affects, so that it acts on the intended target without altering others. Currently, researchers rely on known signal pathways, insight and intuition.
Machine learning can help streamline the selection of experiments by building statistical models of three-dimensional space, so that the most relevant experiments can be chosen more carefully.
Applications of machine learning in drug discovery
Several companies are collaborating to reduce the money and effort spent on unsuccessful drug development endeavors. Pharmaceutical giant Pfizer has employed an IBM Watson system that uses machine learning to aid its search for immuno-oncology drugs.
Similarly, Genentech, a member of the Roche Group and another pharmaceutical giant, is collaborating with GNS Healthcare to apply machine learning to biomedical data. This collaboration will also focus primarily on immuno-oncology, using GNS Healthcare’s causal machine learning platform. The aim is to discover and validate potential novel drug candidates.
Genentech and GNS also plan to investigate genetic response markers, with the hope of developing targeted therapies. This principle, generally referred to as personalized medicine, is widely seen as the future of medicine. It is believed that machine learning can help determine which genes and genetic markers interact with a potential treatment.
Machine learning and potential causes for concern
While machine learning is a remarkable technological advance, it also raises concerns. Ultimately, machine learning relies on a valid and representative dataset on which to build the model. This makes it susceptible to the so-called “garbage in, garbage out” problem, where biased inputs produce unrepresentative outputs.
Furthermore, exactly how a trained machine learning model arrives at its outputs is not fully understood. While a computer scientist can tweak the parameters of the neural networks inside the machine learning system, they cannot fully explain the behavior of the model.
Therefore, the machine learning system can give a prediction but not a reason for that prediction. This can force an unwelcome leap of faith when deciding on critical signal pathways and experiments in drug development.