New chemistry database helps speed up global drug discovery

Developing new medicines can require thousands of chemistry experiments to identify the right recipe for a safe, effective and ideally affordable drug.

The process is slow and labor-intensive, and many of the reactions depend on hard-to-source metals that act as essential catalysts.

While artificial intelligence is helping speed up the process of drug discovery, it can only learn from the data available, and when it comes to chemical reactions, the large, high-quality datasets needed to train powerful AI tools aren’t there.

That’s where Tim Cernak and his team at the University of Michigan College of Pharmacy come in.

They created an open-access database of more than 50,000 carefully designed chemistry experiments, testing thousands of combinations of ingredients and conditions to better understand the reactions that form carbon-nitrogen bonds—essential building blocks of many medicines.

The database is the largest body of chemical reaction data to date, and it is just the start of what could grow into a much larger library of reaction conditions to feed AI systems, Cernak said.

“Building the platform that could pull this off has taken over a decade, but it’s still just scratching the surface.”

Tim Cernak, Associate Professor of Medicinal Chemistry, College of Pharmacy, University of Michigan

The data is freely available to scientists through the Open Reaction Database, a site for sharing reaction data.

“We are excited about the discoveries that other scientists can make within this new dataset,” Cernak said. “There’s so much data to mine.”

Giving researchers and AI systems access to more reaction data can help identify promising ways to make medicines more quickly and efficiently. It can also help scientists find alternatives to expensive or hard-to-source catalysts based on precious metals used in drug manufacturing.

“The latest drugs in the pipeline are raising the bar of sophistication for chemical synthesis. At the same time, supply chains for precious metals and other critical reaction components are being exposed as risks,” Cernak said. “Big data drops like this one are going to be needed to build the predictive models that can make better drugs faster.”

In addition to sharing all the data, the study, published in the Journal of the American Chemical Society, compares how different catalysts, specifically palladium, nickel and copper, perform under similar conditions. This is important because palladium is the go-to catalyst for many reactions used in drug synthesis, but the supply of palladium is controlled by just a few countries.

However, the study found that certain reactions performed equally well with nickel, and some even with copper catalysts, which can be sourced all over the planet. The database allows researchers to more easily and quickly compare catalysts and reactions.

“One key takeaway was that large, systematically designed reaction datasets can uncover patterns that are difficult to see from traditional scope studies alone,” Cernak said. “For example, I never would have predicted that the highly reactive intermediate molecules called arynes could form at such low temperatures, but it was hard to ignore when we saw it hundreds of times. This is exciting as a possibility to synthesize drugs without precious metal catalysts.”

Journal reference:

Das, J., et al. (2026). A 50,688-Reaction Data Set Reveals General Ligands and Mechanistic Diversity in C–N Couplings. Journal of the American Chemical Society. DOI: 10.1021/jacs.6c05959. https://pubs.acs.org/doi/10.1021/jacs.6c05959
