Using machine learning to predict C. difficile infection risk

Researchers have developed a series of machine learning models that can predict a patient’s risk of infection with Clostridium difficile, a gastrointestinal pathogen responsible for thousands of healthcare-acquired infections (HAIs) each year.


Clostridium difficile (C. difficile), a gut-infecting bacterium, is responsible for the deaths of around 30,000 Americans each year. Mainstream antibiotics are largely ineffective against the aggressive bacterium and can even eliminate the beneficial bacteria that help protect against it.

The machine learning models were created by researchers from the Massachusetts Institute of Technology (MIT), the University of Michigan (U-M), and Massachusetts General Hospital (MGH). They are custom-made for individual institutions and can help predict early which patients are at risk of becoming infected with C. difficile.

"Despite substantial efforts to prevent C. difficile infection and to institute early treatment upon diagnosis, rates of infection continue to increase. We need better tools to identify the highest risk patients so that we can target both prevention and treatment interventions to reduce further transmission and improve patient outcomes."

Dr Erica Shenoy, Harvard Medical School.

The researchers took a “big data” approach, analyzing electronic health records (EHRs) to predict C. difficile risk during patients’ hospital stays. They used these data to create institution-specific models that accommodate the different EHR systems, patient populations and other factors unique to each institution.

Dr Jenna Wiens, from the University of Michigan, commented: "When data are simply pooled into a one-size-fits-all model, institutional differences in patient populations, hospital layouts, testing and treatment protocols, or even in the way staff interact with the EHR can lead to differences in the underlying data distributions and ultimately to poor performance of such a model. To mitigate these issues, we take a hospital-specific approach, training a model tailored to each institution."
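The hospital-specific approach described in the quote can be illustrated with a minimal Python sketch. It is not the researchers' code: the article does not say which model type was used, so the tiny logistic regression here is an assumption, and the function names, the single feature and the toy records are all hypothetical. The point is only that each institution gets its own model, fitted to its own data, rather than one pooled model.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(rows, labels, epochs=500, lr=0.1):
    """Fit a small logistic regression with batch gradient descent (illustrative only)."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias

def predict_risk(model, x):
    """Return the predicted probability of infection for one feature vector."""
    weights, bias = model
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train_per_hospital(records):
    """Group (hospital, features, label) records by hospital and fit one model each."""
    grouped = {}
    for hospital, features, label in records:
        rows, labels = grouped.setdefault(hospital, ([], []))
        rows.append(features)
        labels.append(label)
    return {h: train_logistic(rows, labels) for h, (rows, labels) in grouped.items()}

# Hypothetical toy data: the same feature points in opposite directions
# at the two hospitals, mimicking different underlying data distributions.
records = [
    ("hospital_1", [1.0], 1), ("hospital_1", [1.0], 1),
    ("hospital_1", [0.0], 0), ("hospital_1", [0.0], 0),
    ("hospital_2", [1.0], 0), ("hospital_2", [1.0], 0),
    ("hospital_2", [0.0], 1), ("hospital_2", [0.0], 1),
]
models = train_per_hospital(records)
```

In this toy setup, a pooled model would see the feature cancel out across the two hospitals, while the two separate models each learn the local relationship.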

The team analyzed de-identified data from the EHRs of just under 257,000 patients admitted to either MGH or Michigan Medicine over periods of two and six years, respectively.

The data included each patient's likelihood of exposure to C. difficile, details of their admission and daily hospitalization, and their individual medical history and demographics. The models produced a daily risk score for each patient, and a patient was classed as high risk when the score exceeded a set threshold.
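As a rough illustration of the daily scoring step, the sketch below flags each patient on the first day their score exceeds a threshold. The record layout, the function name and the threshold value of 0.10 are assumptions made for the example; the article does not report the threshold the researchers used.

```python
from dataclasses import dataclass

@dataclass
class DailyScore:
    patient_id: str
    day: int      # day of hospitalization, starting at 1
    risk: float   # model output between 0 and 1

def first_high_risk_day(scores, threshold):
    """Map each flagged patient to the first day their risk exceeded the threshold."""
    flagged = {}
    for s in sorted(scores, key=lambda s: (s.patient_id, s.day)):
        if s.risk > threshold and s.patient_id not in flagged:
            flagged[s.patient_id] = s.day
    return flagged

# Hypothetical scores for two patients.
scores = [
    DailyScore("A", 1, 0.02), DailyScore("A", 2, 0.07), DailyScore("A", 3, 0.15),
    DailyScore("B", 1, 0.01), DailyScore("B", 2, 0.03),
]
flagged = first_high_risk_day(scores, threshold=0.10)
print(flagged)  # {'A': 3}
```

Patient B never crosses the threshold and so does not appear in the result. In practice the threshold would be chosen by each institution to balance missed cases against false alarms.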

For half of the infected patients, the models were able to accurately predict patient risk five days before diagnostic samples were taken, which would allow early intervention using antimicrobial drugs.
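The lead-time figure can be understood as a simple calculation over infected patients: the number of days between the first high-risk flag and the day the diagnostic sample was taken. The sketch below shows one way to compute it; the function names and the example days are hypothetical and are not the study's data.

```python
def lead_times(first_flag_day, sample_day):
    """Days between the first high-risk flag and the diagnostic sample, per flagged patient."""
    return {
        pid: sample_day[pid] - first_flag_day[pid]
        for pid in sample_day
        if pid in first_flag_day
    }

def fraction_flagged_early(first_flag_day, sample_day, min_lead_days=5):
    """Share of infected patients flagged at least min_lead_days before sampling."""
    if not sample_day:
        return 0.0
    leads = lead_times(first_flag_day, sample_day)
    early = sum(1 for days in leads.values() if days >= min_lead_days)
    return early / len(sample_day)

# Hypothetical infected patients: day the sample was taken and day first flagged.
sample_day = {"A": 10, "B": 8, "C": 6, "D": 12}
first_flag_day = {"A": 3, "B": 6, "C": 1}   # patient D was never flagged

print(lead_times(first_flag_day, sample_day))              # {'A': 7, 'B': 2, 'C': 5}
print(fraction_flagged_early(first_flag_day, sample_day))  # 0.5
```

Patients who were never flagged count against the fraction, since they are infected cases the model did not identify in advance.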

If further studies are successful, this risk prediction scoring may contribute to increased early screening for C. difficile. Earlier diagnosis and treatment can lessen the severity of the illness, while confirmed cases can be isolated to avoid spreading the disease.

The algorithm code is freely available for others to review and adapt to their own institutions. Shenoy commented that facilities wishing to apply such algorithms should assemble suitable local subject-matter experts and confirm the effectiveness of the models in their own institutions.

"This represents a potentially significant advance in our ability to identify and ultimately act to prevent infection with C. difficile. The ability to identify patients at greatest risk could allow us to focus expensive and potentially limited prevention methods on those who would gain the greatest potential benefit."

Dr Vincent Young, University of Michigan

Source:

https://www.eurekalert.org/pub_releases/2018-03/mgh-mlm032218.php
