Artificial intelligence (AI) tools designed to automatically document patient visits show promise in reducing the time physicians spend on paperwork and may improve their work experience, according to a new randomized clinical trial conducted at UCLA Health.
The study, published in the New England Journal of Medicine AI, examined two commercially available AI scribe applications, Microsoft DAX and Nabla, in real-world clinical practice. Among 238 physicians across 14 specialties and 72,000 patient encounters, researchers found that Nabla users reduced their documentation time by nearly 10% compared with the usual-care control group, while both tools showed potential benefits for physician burnout and work-related stress.
"Documentation burden has become a major contributor to physician burnout, with doctors often spending two hours on paperwork for every hour of patient care. This is the first randomized trial to rigorously evaluate whether AI scribes deliver on their promise to help address this problem."
Dr. Paul Lukac, lead author, Chief AI Officer at UCLA Health
How the study worked
The research team randomly assigned physicians to use one of two AI scribe tools or continue their usual documentation practices during a two-month period from November 2024 to January 2025. The AI scribes record patient conversations and automatically generate draft clinical notes, which physicians then review and edit.
Physicians using Nabla saw their average time spent writing each note decrease by an estimated 41 seconds (from 4 minutes 30 seconds to 3 minutes 49 seconds), compared with 18 seconds (from 4 minutes 22 seconds to 4 minutes 4 seconds) in the control arm. Relative to the control arm, the Nabla arm's reduction in time per note was 9.5% greater, a statistically significant result. Those using DAX showed a smaller decrease that did not reach statistical significance compared with the control group.
Both AI tools also showed modest improvements in validated measures of physician burnout, cognitive workload, and work exhaustion, though these findings require confirmation in larger studies. For example, physicians in the Nabla and DAX arms experienced an approximately 7% improvement in their burnout scores compared with those in the control arm.
Balancing benefits and concerns
The research also revealed important limitations. Physicians reported that the AI-generated notes "occasionally" contained clinically significant inaccuracies, most commonly omissions of information or pronoun errors. One mild patient safety event was reported during the study.
"This technology requires active physician oversight, not passive acceptance," said senior author Dr. John N. Mafi, a UCLA Health internist. "Our trial revealed that while AI scribes deliver measurable benefits, they occasionally generate clinically significant inaccuracies. Physicians must remain vigilant in reviewing AI-generated documentation. The path forward requires embracing innovation while maintaining medicine's fundamental commitment to patient safety through rigorous evaluation and ongoing monitoring."
Survey responses indicated that physicians found both tools easy to learn and use and felt they enabled better engagement with patients. Patients were generally receptive to the technology, with fewer than 10% declining its use.
Significance for healthcare
Physician burnout affects nearly half of U.S. doctors and contributes to workforce shortages, increased medical errors, and billions of dollars in costs to health systems. Electronic health records, while improving many aspects of care, have added significant documentation demands.
The study provides timely evidence as healthcare systems nationwide rapidly adopt AI scribes, often without rigorous evaluation of their effectiveness or safety.
"By embedding a randomized trial within routine practice, we've provided the kind of high-quality, real-world evidence that should guide decisions about implementing AI in healthcare," Lukac said. "This approach can serve as a model for responsibly evaluating other AI tools as they emerge."
The researchers note that their findings may not apply broadly to all practice settings, as the study was conducted at a single academic medical center during a relatively short time period. They call for longer-term studies across multiple institutions to confirm these findings and to measure the downstream effects on important health outcomes, such as the quality, costs, and experience of care.
Journal reference:
Lukac, P. J., et al. (2025). Ambient AI Scribes in Clinical Practice: A Randomized Trial. NEJM AI. doi: 10.1056/aioa2501000. https://ai.nejm.org/doi/10.1056/AIoa2501000