An automated process based on computer algorithms that can read text from medical examiners' death certificates can substantially speed up data collection of overdose deaths – which in turn can ensure a more rapid public health response time than the system currently used, new UCLA research finds.
The analysis, to be published Aug. 8 in the peer-reviewed JAMA Network Open, used tools from artificial intelligence to rapidly identify substances that caused overdose deaths.
"The overdose crisis in America is the number one cause of death in young adults, but we don't know the actual number of overdose deaths until months after the fact. We also don't know the number of overdoses in our communities, as rapidly released data is only available at the state level, at best. We need systems that get this data out fast and at a local level so public health can respond. Machine learning and natural language processing can help bridge this gap."
Dr. David Goodman-Meza, Study Lead Author and Assistant Professor, Medicine, Division of Infectious Diseases, David Geffen School of Medicine, University of California - Los Angeles Health Sciences
As it now stands, overdose data recording involves several steps, beginning with medical examiners and coroners, who determine a cause of death and record suspected drug overdoses on death certificates, including the drugs that caused the death. The certificates, which include unstructured text, are then sent to local jurisdictions or the Centers for Disease Control and Prevention (CDC), which code them according to the International Statistical Classification of Diseases and Related Health Problems, Tenth Edition (ICD-10). This coding process is time-consuming because it may be done manually. As a result, there is a substantial lag between the date of death and the reporting of those deaths, which slows the release of surveillance data and, in turn, the public health response.
Further complicating matters, different drugs with different uses and effects are aggregated under the same code in this system – for instance, buprenorphine, a partial opioid agonist used to treat opioid use disorder, and the synthetic opioid fentanyl are listed under the same ICD-10 code.
For this study, the researchers used natural language processing (NLP) and machine learning to analyze nearly 35,500 death records for all of 2020 from Connecticut and from nine U.S. counties: Cook (Illinois); Jefferson (Alabama); Johnson, Denton, Tarrant and Parker (Texas); Milwaukee (Wisconsin); and Los Angeles and San Diego (California). They examined how combining NLP, which uses computer algorithms to understand text, with machine learning can automate the deciphering of large amounts of data with precision and accuracy.
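To make the idea of reading substances out of free-text death records concrete, here is a minimal sketch. It is a simple keyword lookup, not the trained machine learning models the study describes, and the term lists, the function name `tag_substances`, and the example sentence are all hypothetical.

```python
import re

# Hypothetical keyword lists for illustration only; the study's models
# learn these associations from labeled records instead of a fixed lookup.
SUBSTANCE_TERMS = {
    "fentanyl": ["fentanyl"],
    "alcohol": ["ethanol", "alcohol"],
    "cocaine": ["cocaine"],
    "methamphetamine": ["methamphetamine"],
    "heroin": ["heroin"],
}


def tag_substances(cause_of_death: str) -> set[str]:
    """Return every substance label whose terms appear as whole words in the text."""
    text = cause_of_death.lower()
    found = set()
    for label, terms in SUBSTANCE_TERMS.items():
        # \b anchors keep a term from matching inside a longer word.
        if any(re.search(r"\b" + re.escape(term) + r"\b", text) for term in terms):
            found.add(label)
    return found


# Made-up example text, not taken from any real death certificate.
example = "Acute combined toxicity of fentanyl, cocaine and ethanol"
print(sorted(tag_substances(example)))
```

A record can receive several labels at once, which matches how a single overdose death may involve more than one substance. A fixed lookup like this would miss misspellings and unfamiliar drug names, which is one reason a learned model is attractive for this task.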
They found that of the 8,738 overdose deaths recorded that year, the most common specific substances were fentanyl (4,758, 54%), alcohol (2,866, 33%), cocaine (2,247, 26%), methamphetamine (1,876, 21%), heroin (1,613, 18%), prescription opioids (1,197, 14%), and any benzodiazepine (1,076, 12%). Of these, only the classification for benzodiazepines was suboptimal under this method; the others were perfect or near perfect.
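The percentages above can be checked directly from the reported counts. The short sketch below recomputes each share of the 8,738 overdose deaths and shows that the per-substance counts add up to more than the total, meaning the categories overlap and many deaths involved more than one substance. The variable names are illustrative.

```python
TOTAL_OVERDOSE_DEATHS = 8738

# Counts as reported in the article.
counts = {
    "fentanyl": 4758,
    "alcohol": 2866,
    "cocaine": 2247,
    "methamphetamine": 1876,
    "heroin": 1613,
    "prescription opioids": 1197,
    "any benzodiazepine": 1076,
}

# Share of all overdose deaths involving each substance, as a whole percent.
shares = {name: round(100 * n / TOTAL_OVERDOSE_DEATHS) for name, n in counts.items()}

for name, pct in shares.items():
    print(f"{name}: {counts[name]} ({pct}%)")

# The counts sum to more than the number of deaths, so categories overlap.
print(sum(counts.values()), ">", TOTAL_OVERDOSE_DEATHS)
```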
The CDC's most recent preliminary overdose data were released no sooner than four months after the deaths occurred, Goodman-Meza said.
"If these algorithms are embedded within medical examiner's offices, the time could be reduced to as early as toxicology testing is completed, which could be about three weeks after the death," he said.
The remaining overdose deaths involved other substances such as amphetamines, antidepressants, antipsychotics, antihistamines, anticonvulsants, barbiturates, muscle relaxants, and hallucinogens.
The researchers note some limitations to the study, the main one being that the system was not tested on less common substances, such as anticonvulsants or designer drugs, so it is unknown whether it would work for these. Also, because the models must be trained on a large volume of data to make predictions, the system may be unable to detect emerging trends.
But rapid and accurate data are needed to develop and implement interventions to curb overdoses, the researchers write, and "NLP tools such as these should be integrated in data surveillance workflows to increase rapid dissemination of data to the public, researchers, and policy makers."
Study co-authors in addition to Goodman-Meza are Chelsea Shover, Dr. Jesus Medina, Dr. Amber Tang, Steven Shoptaw, and Alex Bui of UCLA.
Goodman-Meza, D., et al. (2022) Development and Validation of Machine Learning Models Using Natural Language Processing to Classify Substances Involved in Overdose Deaths. JAMA Network Open. doi.org/10.1001/jamanetworkopen.2022.25593.