Improving virtual screening with a heterogeneous consensus model

This article and associated images are based on a poster originally authored by Lin Chen, Vincent Blay, Pedro J. Ballester and Douglas R. Houston and presented at ELRIG Drug Discovery 2025 in affiliation with University of Edinburgh, University of California at Santa Cruz and Imperial College London.

This poster is being hosted on this website in its raw form, without modifications. It has not undergone peer review but has been reviewed to meet AZoNetwork's editorial quality standards. The information contained is for informational purposes only and should not be considered validated by independent peer assessment. 

Introduction

The discovery of effective therapeutics remains a complex, costly, and time-consuming endeavor. A central bottleneck in early-stage drug discovery is identifying suitable hit compounds with moderate affinity for known biological targets. The research team presents SCORCH2, a machine-learning framework that uses protein-ligand interaction features to improve virtual screening performance.

Figure 1. A) Molecular interaction visualization and feature combination. Crystal structure of protein with PDB ID: 1AFK, highlighting interactions with the ligand PAP (3′-Phosphate-Adenosine-5′-Diphosphate). Fuchsia spheres represent atoms in closest contact with the ligand (cutoff within 2.5 Å). Black arrows indicate hydrogen bonds, red dashed lines represent salt bridges, and blue dashed lines denote π–π stacking interactions. SCORCH2 features are primarily generated from three methods: BINANA, ECIF, and RDKit. While BINANA and ECIF extract conformation-sensitive features, RDKit provides features that remain independent of conformational changes. B) SCORCH2 structure and model training. SCORCH2 uses a simplified architecture consisting of two XGBoost models. Each model is trained with different data, and Optuna is used for optimal parameter search with appropriate training strategies. C) SCORCH2 inference workflow. For the complex data, these models operate independently, and the final result is given by a weighted consensus. Image Credit: Image courtesy of Lin Chen et al., in partnership with ELRIG (UK) Ltd.

Methods

SCORCH2 uses two distinct XGBoost [1] models trained on separate datasets for heterogeneous consensus scoring. Hyperparameter optimization was conducted using Optuna [2], with the objective of maximizing the area under the precision-recall curve (AUCPR), a metric suited to imbalanced datasets.
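To make the optimization target concrete, the following minimal sketch computes average precision, one standard estimator of AUCPR, in plain Python. The function name and the toy labels and scores are illustrative assumptions and are not taken from the SCORCH2 code or data.

```python
def average_precision(labels, scores):
    """Average precision: mean of the precision values at the rank of each active.

    labels: 1 for an active, 0 for a decoy.
    scores: model outputs, where higher means "more likely active".
    """
    n_actives = sum(labels)
    if n_actives == 0:
        raise ValueError("at least one active is required")
    # Rank compounds from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this recall step
    return precision_sum / n_actives


# Toy example (made-up values): 2 actives among 6 compounds.
labels = [1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1]
ap = average_precision(labels, scores)
```

Unlike ROC-based metrics, the value expected from a random ranking is roughly the fraction of actives in the set, so improvements remain visible when decoys heavily outnumber actives.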

For inference, the team used a weighted consensus derived from the maximum-scoring pose. They also introduced the concept of a Knowledge Pattern (KP) and showed that incorporating diverse KPs may enhance overall performance.
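One plausible reading of this inference step can be sketched as follows: each model's score for a compound is taken from its best-scoring docked pose, and the two scores are blended with a weight. The function name, the weight of 0.7, and the per-pose values are hypothetical; the poster does not state the actual weights or the exact order of the max and weighting operations.

```python
def consensus_score(pose_scores_a, pose_scores_b, weight_a=0.5):
    """Blend two models' scores for one compound.

    Assumption: each model contributes its maximum per-pose score, and the
    two maxima are combined with weights weight_a and (1 - weight_a).
    """
    if not pose_scores_a or not pose_scores_b:
        raise ValueError("each model needs at least one pose score")
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    best_a = max(pose_scores_a)  # best pose according to model A
    best_b = max(pose_scores_b)  # best pose according to model B
    return weight_a * best_a + (1.0 - weight_a) * best_b


# Hypothetical per-pose scores from the two models for two compounds.
compounds = {
    "ligand_1": ([0.20, 0.80, 0.50], [0.60, 0.40, 0.30]),
    "ligand_2": ([0.10, 0.30, 0.25], [0.20, 0.50, 0.45]),
}

# Rank compounds by consensus score, highest first.
ranking = sorted(
    compounds,
    key=lambda name: consensus_score(*compounds[name], weight_a=0.7),
    reverse=True,
)
```

Taking the maximum over poses means a compound is judged by its most favourable predicted binding mode, which is a common convention when rescoring docking output.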

Results

SCORCH2 outperforms previous docking and rescoring methods on the full DEKOIS 2.0 benchmark. It also exhibits strong robustness on a subset with unseen targets, highlighting its generalization capability.

Figure 2. Comparative analysis of virtual screening performance using the complete DEKOIS 2.0 benchmark against other methods (panels A–C), as well as on a subset of unseen targets (panels D–F). The Enrichment Factor (EF) measures how much better the model prioritizes active compounds than random selection would. Image Credit: Image courtesy of Lin Chen et al., in partnership with ELRIG (UK) Ltd.
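The enrichment factor can be illustrated with a short sketch. It uses the common definition, the hit rate among the top-ranked fraction divided by the hit rate of the whole library; the function name, the rounding of the cutoff, and the toy data are assumptions for illustration, and the poster does not specify which fractions were evaluated.

```python
import math


def enrichment_factor(labels, scores, fraction=0.01):
    """EF at a given fraction: hit rate in the top slice / overall hit rate."""
    n_total = len(labels)
    n_actives = sum(labels)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty set with at least one active")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must lie in (0, 1]")
    # Size of the top slice; rounding up is a choice made for this sketch.
    n_top = max(1, math.ceil(fraction * n_total))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits_top = sum(label for _, label in ranked[:n_top])
    return (hits_top / n_top) / (n_actives / n_total)


# Toy library (made-up): 20 compounds, 4 actives, already scored.
labels = [1, 0, 0, 1, 0] + [0] * 5 + [1, 0, 0, 0, 0] + [1, 0, 0, 0, 0]
scores = list(range(20, 0, -1))  # first compound has the highest score
ef_25 = enrichment_factor(labels, scores, fraction=0.25)
```

Here the top 25% (5 compounds) contains 2 of the 4 actives, so the hit rate is 0.4 against a library-wide rate of 0.2. An EF of 1 corresponds to random selection, and larger values indicate better early enrichment.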

Model explainability

SHAP-based analysis of two validated spleen tyrosine kinase (SYK) ligands demonstrates the model’s interpretability and its ability to prioritize critical protein-ligand interactions.

Figure 3. Analysis of molecular interactions and feature importance for two SYK inhibitors using 2D interaction diagrams and SHAP waterfall plots. (A; B) Analysis for CHEMBL3265032: 2D interaction diagram (A) and SHAP waterfall plot (B). (C; D) Analysis for CHEMBL3264995: 2D interaction diagram (C) and SHAP waterfall plot (D). In the SHAP waterfall plots, the color of each bar indicates the value of the corresponding feature (blue for lower values and red for higher values). Features are ranked by their contribution to the model’s prediction, with the most important feature at the top. Image Credit: Image courtesy of Lin Chen et al., in partnership with ELRIG (UK) Ltd.
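SHAP attributions are based on Shapley values, which split a prediction into additive per-feature contributions. The sketch below computes exact Shapley values by brute force for a tiny hypothetical scoring function and then orders the features as a waterfall plot would. It is not the tree-specific algorithm used by the SHAP library, and the feature names, the toy model, and the all-zero baseline are invented for illustration.

```python
from itertools import permutations


def shapley_values(model, instance, baseline):
    """Exact Shapley values: average marginal contribution over all orderings.

    Features start at their baseline values and are switched to the instance
    values one at a time. Only feasible for a handful of features.
    """
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous = model(current)
        for name in order:
            current[name] = instance[name]
            value = model(current)
            totals[name] += value - previous  # marginal contribution
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_model(x):
    """Hypothetical score with one interaction term (not SCORCH2)."""
    return (0.5 * x["hbond_count"]
            + 0.25 * x["pi_stacking"] * x["hbond_count"]
            - 0.125 * x["rotatable_bonds"])


instance = {"hbond_count": 4.0, "pi_stacking": 1.0, "rotatable_bonds": 8.0}
baseline = {"hbond_count": 0.0, "pi_stacking": 0.0, "rotatable_bonds": 0.0}

phi = shapley_values(toy_model, instance, baseline)
# Waterfall-style ordering: largest absolute contribution first.
waterfall_order = sorted(phi, key=lambda name: abs(phi[name]), reverse=True)
```

The contributions sum to the difference between the prediction for the instance and the prediction for the baseline, which is the property that lets a waterfall plot walk from a reference value to the final score one feature at a time.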

Conclusion

SCORCH2 underscores the importance of modeling general molecular interactions in virtual screening and provides an effective tool for streamlined prediction. Its strong performance demonstrates that tree-based methods remain competitive in specific scenarios.

Acknowledgement

This research was supported by a Medical Research Council programme grant (MR/Y013131/1) and Rosetrees Trust. L.C. was supported by the Darwin Trust, University of Edinburgh.

References

  1. Chen, T. and Guestrin, C. (2016). XGBoost: a Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD ’16, 1(1), pp.785–794. https://doi.org/10.1145/2939672.2939785.
  2. Akiba, T., et al. (2019). Optuna: A Next-generation Hyperparameter Optimization Framework. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. https://doi.org/10.1145/3292500.3330701.
  3. Bauer, M.R., et al. (2013). Evaluation and Optimization of Virtual Screening Workflows with DEKOIS 2.0 – A Public Library of Challenging Docking Benchmark Sets. Journal of Chemical Information and Modeling, 53(6), pp.1447–1462. https://doi.org/10.1021/ci400115b.
  4. Lundberg, S.M., et al. (2020). From local explanations to global understanding with explainable AI for trees. Nature Machine Intelligence, 2(1), pp.56–67. https://doi.org/10.1038/s42256-019-0138-9.

About the University of Edinburgh

Founded in 1583, the University of Edinburgh is one of the oldest universities in the English-speaking world and a major public research university in Scotland.

The university is a member of the Russell Group and other major international research alliances, reflecting its commitment to high-impact research, innovation, and academic excellence.

Among its strengths are the sciences (primarily biological and medical research), humanities, and interdisciplinary fields, with research output that consistently ranks amongst the UK’s top tier.

About ELRIG (UK) Ltd.

The European Laboratory Research & Innovation Group (ELRIG) is a leading European not-for-profit organization that exists to provide outstanding scientific content to the life science community. The foundation of the organization is based on the use and application of automation, robotics and instrumentation in life science laboratories, but over time, we have evolved to respond to the needs of biopharma by developing scientific programmes that focus on cutting-edge research areas that have the potential to revolutionize drug discovery.

ELRIG comprises a global community of over 12,000 life science professionals. Participating in our events, whether at one of our scientific conferences or one of our networking meetings, enables any member of our community to exchange information, within disciplines and across academic and biopharmaceutical organizations, on an open-access basis, as all our events are free of charge to attend.

Our values

Our values are to always ensure the highest quality of content, to make that content readily accessible to all, and to always be an inclusive organization serving a diverse scientific network. In addition, ELRIG will always be a volunteer-led organization, run by and for the life sciences community, on a not-for-profit basis.

Our purpose

ELRIG is a company whose purpose is to bring the life science and drug discovery communities together to learn, share, connect, innovate and collaborate, on an open access basis. We achieve this through the provision of world class conferences, networking events, webinars and digital content.


Last Updated: Nov 7, 2025
