As artificial intelligence (AI) becomes more common in health care, from managing records to assisting with medication decisions, researchers at the Icahn School of Medicine at Mount Sinai are asking an important question: How well does AI hold up when the workload gets intense at health system scale?

A new study, published in the March 9 online issue of npj Health Systems [https://doi.org/10.1038/s44401-026-00077-0], suggests that the answer depends less on the AI itself and more on how it's designed.

The investigators found that health care AI systems work far better when tasks are distributed among multiple specialized AI "agents" (software systems that can perform complex tasks, learn, and adapt) rather than assigned to a single, all-purpose agent. This multi-agent approach kept performance steady even as demands increased, while dramatically reducing computing costs and delays, say the investigators.

"For health care organizations, our findings point to a smarter way to use AI. By assigning different tasks, such as finding patient information, extracting data, or checking medication doses, to specialized AI agents, systems can run faster and more reliably while keeping costs under control. Ultimately, this kind of design could help health care teams spend less time on administrative work and more time focusing on patients," says Girish N. Nadkarni, MD, MPH, senior study author, Barbara T. Murphy Chair of the Windreich Department of Artificial Intelligence and Human Health, Director of the Hasso Plattner Institute for Digital Health, Irene and Dr. Arthur M. Fishberg Professor of Medicine at the Icahn School of Medicine, and Chief AI Officer of the Mount Sinai Health System.

As part of the study, the researchers compared two approaches to clinical AI: a single system responsible for handling many different clinical tasks, and a coordinated network of specialized AI agents overseen by a central "orchestrator." Using state-of-the-art language models, the team evaluated performance across common clinical functions, including information retrieval, data extraction, and medication dosing calculations, under simulated real-world conditions involving up to 80 simultaneous tasks.
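The article does not describe how the study's orchestrator was implemented. The following is only a minimal, hypothetical sketch of the general orchestrator pattern it describes: a central router sends each incoming task to an agent specialized for that task type and records every step. All names (`Task`, `Orchestrator`, `retrieval_agent`, `extraction_agent`, `dosing_agent`) are illustrative assumptions. The "agents" are plain stub functions standing in for the language-model-backed agents used in the study, and the dosing stub performs bare arithmetic, not clinical logic.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Task:
    # Hypothetical task record; field names are illustrative, not from the study.
    task_id: int
    kind: str  # e.g. "retrieval", "extraction", or "dosing"
    payload: dict


@dataclass
class Orchestrator:
    # Routes each task kind to one specialized agent (plain function stubs here).
    agents: dict[str, Callable[[dict], object]]
    # Audit trail: (task_id, agent used, output), one entry per task.
    log: list[tuple[int, str, object]] = field(default_factory=list)

    def _dispatch(self, task: Task) -> tuple[int, str, object]:
        agent = self.agents.get(task.kind)
        if agent is None:
            # Unknown task types are recorded rather than silently dropped.
            return (task.task_id, "unroutable", None)
        return (task.task_id, task.kind, agent(task.payload))

    def run(self, tasks: list[Task], workers: int = 8) -> dict[int, object]:
        # Tasks are handled concurrently; pool.map yields results in input order.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            steps = list(pool.map(self._dispatch, tasks))
        self.log.extend(steps)
        return {task_id: output for task_id, _, output in steps}


# Hypothetical stub "specialized agents"; in a real system each might wrap a language model.
def retrieval_agent(p: dict) -> str:
    return p["records"].get(p["patient_id"], "not found")


def extraction_agent(p: dict) -> list[str]:
    # Pull values from note lines that start with "Allergy:".
    return [line.split(":", 1)[1].strip()
            for line in p["note"].splitlines() if line.startswith("Allergy:")]


def dosing_agent(p: dict) -> float:
    # Weight-based arithmetic only: dose (mg) = mg/kg x body weight (kg).
    return p["mg_per_kg"] * p["weight_kg"]


if __name__ == "__main__":
    orch = Orchestrator({"retrieval": retrieval_agent,
                         "extraction": extraction_agent,
                         "dosing": dosing_agent})
    tasks = [
        Task(1, "retrieval", {"patient_id": "A1", "records": {"A1": "Ward 4"}}),
        Task(2, "extraction", {"note": "Allergy: penicillin\nBP: 120/80"}),
        Task(3, "dosing", {"mg_per_kg": 15.0, "weight_kg": 20.0}),
        Task(4, "billing", {}),
    ]
    print(orch.run(tasks))
    for entry in orch.log:
        print(entry)
```

In this sketch, each agent only ever sees one kind of task, and the log records which agent handled each request and what it returned. That is the kind of step-by-step traceability the researchers contrast with a single agent that handles everything.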

"What we found is that AI systems behave a lot like people," says study lead author Eyal Klang, MD, formerly with the Icahn School of Medicine. "When you ask one system to do too many different things at once, performance suffers. But when one orchestrator agent divides the work among specialized agents, the system stays accurate, responsive, and far more efficient, even under heavy demand."

The coordinated multi-agent system maintained higher accuracy than a single-agent design while using far fewer computing resources, up to 65 times fewer. The study simulated real-world clinical "traffic," in which many types of tasks arrive at once and compete for attention, the investigators say.

"Our findings show that smart coordination is not just a technical preference," Dr. Klang says. "It can make the difference between an AI system that continues to function smoothly and one that begins to break down when it is exposed to the pressures of real clinical workloads."

Next, the research team plans to test these coordinated AI systems directly in clinical settings, using real-time patient data. If successful, this approach could help shape how hospitals and health systems scale AI in the future, helping them handle peak workloads without sacrificing quality or safety.

The researchers emphasize that the gains are not automatic: even sophisticated AI can fall short when systems are poorly designed or implemented. "Health care does not operate one task at a time," Dr. Nadkarni says. "Hospitals face constant, overlapping demands, especially during busy periods. Our findings show that the future of health care AI is not a single super-intelligent system, but a coordinated team of focused agents that work together to scale safely, control costs, and support real clinical operations."

"When a single agent handles everything, you can't trace where it went wrong. With the orchestrator, every step is logged: which tool was called, what it returned, and how the answer was assembled. At 80 simultaneous tasks, the single agent dropped to 16 percent accuracy while burning 65 times more compute, and you'd have no way to figure out why. That kind of transparency isn't optional in medicine," says second author Mahmud Omar, MD, a visiting researcher in the Windreich Department. "This matters more now than ever: agentic AI is no longer a research concept. Tools like OpenAI's operator mode, Claude's Cowork, and similar platforms are putting autonomous agents directly in the hands of clinicians and patients. As that adoption accelerates, the architecture behind these systems has to be auditable from the start."

The paper is titled "Orchestrated multi agents sustain accuracy under clinical‑scale workloads compared to a single agent."

The study's authors, as listed in the journal, are Eyal Klang, Mahmud Omar, Ganesh Raut, Reem Agbareia, Prem Timsina, Robert Freeman, Lisa Stump, Alexander Charney, Benjamin S. Glicksberg, and Girish N. Nadkarni.

This work was supported in part by the Clinical and Translational Science Awards (CTSA) grant UL1TR004419 from the National Center for Advancing Translational Sciences. Research reported in this publication was also supported by the Office of Research Infrastructure of the National Institutes of Health under award numbers S10OD026880 and S10OD030463.