Researchers identify hidden self-harm histories using machine learning

Important mental health history is often present in medical records but hard to find, especially when it is missing from the diagnosis codes that clinicians, researchers and health systems use to search and count conditions.

A new study led by researchers at The University of New Mexico School of Medicine analyzed electronic health records for more than 1.3 million patients served by the Veterans Health Administration (VHA). Highlighting a common gap in how health systems track self-harm, the researchers found that diagnosis codes captured only about one-fourth of clinically documented self-harm history.

"For research and planning, if we only count what is easy to see in diagnosis codes, we may substantially underestimate the need for mental health services. Better measurement can help health systems plan better, help researchers study care more accurately and eventually help clinicians know when a patient may need a closer look."

Christophe Lambert, PhD, professor and interim chief of the Division of Translational Informatics in the UNM School of Medicine's Department of Internal Medicine, and the study's corresponding author

The study, published in the Journal of Medical Internet Research, used a novel machine learning method previously developed by members of the research team. Following expert chart review and statistical calibration, the researchers estimated that documented self-harm was present in about 7.9% of those patients seen by VHA clinicians – more than four times the 1.85% visible through diagnosis codes alone. The gap matters because missed history can affect clinical awareness, research findings and planning for mental health services.
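The comparison between the two figures can be checked with a few lines of arithmetic. The sketch below uses only the two percentages reported above; the variable names and the per-100,000 framing are illustrative choices, not taken from the paper.

```python
# Figures reported in the article; both are percentages of the study cohort.
estimated_documented_pct = 7.9   # calibrated estimate after expert chart review
coded_pct = 1.85                 # visible through diagnosis codes alone

# How many times larger the estimate is than the coded rate.
ratio = estimated_documented_pct / coded_pct

# Share of the estimated documented history that diagnosis codes capture.
captured_share = coded_pct / estimated_documented_pct

# The same gap expressed per 100,000 patients (an illustrative rescaling).
uncoded_per_100k = (estimated_documented_pct - coded_pct) / 100 * 100_000

print(f"estimated / coded: {ratio:.2f}x")
print(f"share captured by codes: {captured_share:.1%}")
print(f"uncoded per 100,000 patients: {uncoded_per_100k:.0f}")
```

The ratio works out to roughly 4.27 and the captured share to roughly 23.4%, in line with the "more than four times" and "about one-fourth" descriptions.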

Problem lists – the notations providers compile of their patients' health conditions – showed another visibility gap. They are meant to flag important conditions for clinical teams, but in real-world care they are not always complete or consistently maintained. Among veterans with a diagnosis code for self-harm, only 22.6% had self-harm or a history of self-harm listed on their VHA problem list. That means even when self-harm appeared in diagnosis codes, it was often missing from one of the record's most visible summary fields.

Past self-harm matters clinically because it is one of the most important predictors of future self-harm and suicide risk. It can also shape how care is delivered, including how clinicians think about depression, PTSD, bipolar disorder, substance use, traumatic brain injury and other conditions that might occur alongside self-harm.

The authors note that VHA already uses specialized suicide and overdose reporting tools and does not rely only on diagnosis codes or problem lists to monitor suicide risk. This study looked at a different but related question: How much past self-harm history is visible in the parts of the record that researchers, care teams and health systems can most easily quantify and review at scale?

"This is a systems-level visibility problem," Lambert said. "The record can be enormous. In our chart review, some patient records had more than 500,000 lines of notes. No clinician can be expected to read all of that during a normal visit."

The study did not try to predict future self-harm or determine with certainty whether any one patient had self-harmed. Instead, the team tested whether a computer model could use patterns in structured electronic health record data to estimate the probability that self-harm history was present but missing from diagnosis codes. The team then compared those probabilities with expert review of clinical notes.

To do that, the team used a method called PULSNAR (Positive Unlabeled Learning Selected Not At Random), which was built for messy real-world health data. Most machine learning methods need clear examples of both "yes" and "no" cases. But in medical records, a missing diagnosis code does not prove that a patient never had the condition.

PULSNAR works with that uncertainty. It learns from patients who do have a code, then estimates how many similar patients might be present among those without a code. Its key advantage is that it does not assume coded cases are random and allows for the fact that some cases are more likely to be coded than others.
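The labeling problem described above can be illustrated with a toy simulation. This is not PULSNAR and not the study's code. It only shows why "no code" cannot be read as "no condition" and why coded cases may not be a random sample of all cases. Every number in it (prevalence, the "severe"/"mild" split, the coding rates) is invented for illustration.

```python
import random

random.seed(0)  # fixed seed so the toy run is repeatable

# All values below are hypothetical, not taken from the study.
N_PATIENTS = 100_000
TRUE_PREVALENCE = 0.10                     # invented share with the condition
P_CODED = {"severe": 0.60, "mild": 0.10}   # invented coding rates among true cases

patients = []
for _ in range(N_PATIENTS):
    has_condition = random.random() < TRUE_PREVALENCE
    presentation = random.choice(["severe", "mild"]) if has_condition else None
    # Only true cases can receive a code, and coding depends on presentation,
    # so coded cases are NOT a random sample of all true cases.
    coded = has_condition and random.random() < P_CODED[presentation]
    patients.append((has_condition, presentation, coded))

true_cases = [p for p in patients if p[0]]
coded_cases = [p for p in patients if p[2]]           # the "positive" set
hidden_cases = [p for p in true_cases if not p[2]]    # positives inside "unlabeled"


def coded_rate(kind):
    """Fraction of true cases with this presentation that received a code."""
    group = [p for p in true_cases if p[1] == kind]
    return sum(p[2] for p in group) / len(group)


def severe_share(cases):
    """Fraction of the given cases whose presentation is 'severe'."""
    return sum(p[1] == "severe" for p in cases) / len(cases)


print(f"true cases:               {len(true_cases)}")
print(f"coded cases:              {len(coded_cases)}")
print(f"uncoded true cases:       {len(hidden_cases)}")
print(f"coded rate, severe:       {coded_rate('severe'):.2f}")
print(f"coded rate, mild:         {coded_rate('mild'):.2f}")
print(f"severe share, coded:      {severe_share(coded_cases):.2f}")
print(f"severe share, all cases:  {severe_share(true_cases):.2f}")
```

In this toy setup the coded group is dominated by "severe" cases even though the two presentations are equally common among all true cases. A method that assumed coded cases were a random sample would therefore learn a skewed picture of who is missing, which is the kind of selection effect the article says PULSNAR is designed to allow for.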

"Medical records can make self-harm hard to see in more than one way," said Praveen Kumar, PhD, the study's first author. "Sometimes the history is in a clinician's note but not in the diagnosis codes. Other times, the record may contain risk factors, injuries, poisonings, or behaviors that are consistent with self-harm, even though the record alone does not prove what happened or why.

"Our method can help flag both patterns for review. This study could verify the first pattern, because the evidence was already in the notes. The second pattern may be just as important, but confirming it would require talking with patients or using information beyond the medical record."

The research team included experts from the UNM Health Sciences Center, the Raymond G. Murphy Veterans Affairs (VA) Medical Center, Vanderbilt University Medical Center, the VA Tennessee Valley Healthcare System, the VA Office of Mental Health, Greer Black Company, and the UNM Department of Economics. The team brought together expertise in medical informatics, computer science, psychiatry, biomedical informatics, economics, statistics and health services research.

The self-harm study is part of a broader research program using positive-and-unlabeled learning to find conditions that may be under-recorded in standard medical data, the investigators said. The team has already published a related study using this approach to detect under-coded opioid use disorder, and ongoing work is extending it to other conditions where the medical record may not show the full picture, including unrecognized PTSD, depression, bipolar disorder and sleep disorders.

The method could complement broader VHA mental health and suicide-prevention efforts by adding a scalable way to measure conditions that may be under-recorded or hard to see in standard medical data. The investigators emphasized that the method is still a research tool and is not ready to be used by itself in clinical care, although with further development, it could help health systems better estimate under-recorded mental health conditions, find documented history that is not clearly visible, and identify records that may warrant closer review.

"Self-harm history matters too much to stay buried in records that are not practical to review line by line during routine care," Lambert said. "Our work is about helping researchers and health systems find documented history and clinically relevant patterns in the data, so care teams can have a more complete picture of the people they serve."

Source:

University of New Mexico Health Sciences Center

Journal reference:

Kumar, P., et al. (2026). Detecting Uncoded Self-Harm in Veterans’ Electronic Health Records Using Positive and Unlabeled Learning: Retrospective Cohort Study. Journal of Medical Internet Research. DOI: 10.2196/89071. https://www.jmir.org/2026/1/e89071
