Machine learning models to identify the simplest way to screen for lung cancer have been developed by researchers from UCL and the University of Cambridge, bringing personalised screening one step closer.
The model was found to be as good as or better than the best available risk models at predicting an individual's risk of developing lung cancer within five years, and it did so using just a third of the information those models need. The findings are published in PLOS Medicine.
Lung cancer is the most common cause of cancer death worldwide, with poor survival in the absence of early detection. It is estimated that there were 1.8 million lung cancer deaths globally in 2020.
Screening for lung cancer among those at high risk could reduce lung cancer-specific mortality by 20-24% among those screened, but the ideal way to determine whether someone is at high risk remains uncertain, and existing approaches are resource-intensive.
The UK is currently planning a national screening programme for lung cancer, which will include people aged 55-74 who have ever smoked, using a risk model based on 17 questions. This information is complex and time-consuming to gather and will require a 50-100 person-strong call centre to collect the data from one million people.
In this study, researchers from UCL and the University of Cambridge used data from the UK Biobank and US National Lung Screening Trial to develop models to simplify the prediction of a person getting lung cancer within the next five years.
The team used the datasets to experiment with over 60 different machine learning pipelines to see which were the most effective at predicting lung cancer risk using just three variables - age, how many years the individual smoked for, and the average number of cigarettes per day.
From these, they selected four model pipelines and combined them into an 'ensemble' that was able to predict lung cancer risk with the same or improved accuracy compared to the best available models currently in use. Importantly, they were able to achieve this accuracy using only a third of the variables, greatly simplifying the process of gathering the data required.
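To make the idea of an ensemble concrete, here is a minimal sketch of how four separate risk models that each take the same three inputs (age, years smoked, cigarettes per day) might be combined into a single five-year risk estimate. It is an illustration only and not the study's method: the study's pipelines were machine learning models and may have been combined differently. The four sets of coefficients, the simple averaging rule, the 1.5% eligibility threshold and all function names below are assumptions invented for this example, and none of them has any clinical meaning.

```python
import math
from statistics import mean

# HYPOTHETICAL coefficients, invented purely for illustration (not from the study).
# Each tuple: (intercept, per year of age, per year smoked, per cigarette per day).
TOY_MODELS = [
    (-9.0, 0.050, 0.040, 0.020),
    (-8.5, 0.045, 0.045, 0.015),
    (-9.5, 0.055, 0.035, 0.025),
    (-9.0, 0.050, 0.042, 0.018),
]


def logistic_risk(coeffs, age, years_smoked, cigs_per_day):
    """Return one toy model's predicted probability (between 0 and 1)."""
    b0, b_age, b_years, b_cigs = coeffs
    z = b0 + b_age * age + b_years * years_smoked + b_cigs * cigs_per_day
    return 1.0 / (1.0 + math.exp(-z))


def ensemble_risk(age, years_smoked, cigs_per_day, models=TOY_MODELS):
    """Combine the member models by averaging their predicted probabilities."""
    return mean([logistic_risk(m, age, years_smoked, cigs_per_day) for m in models])


def is_eligible(age, years_smoked, cigs_per_day, threshold=0.015):
    """Flag someone for screening if ensemble risk meets an ASSUMED threshold."""
    return ensemble_risk(age, years_smoked, cigs_per_day) >= threshold


if __name__ == "__main__":
    # Two made-up individuals: a long-term heavier smoker and a brief light smoker.
    for person in [(70, 40, 20), (55, 5, 5)]:
        print(person, round(ensemble_risk(*person), 4), is_eligible(*person))
```

Averaging predicted probabilities is only one of several ways to combine models. It is used here because it is the simplest to read, and because the combined estimate always lies between the lowest and highest individual predictions.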
Dr Tom Callender (UCL Medicine), first author of the study, said: "Screening for cancer and other diseases saves lives and we are increasingly able to personalise this process. But such personalised screening and disease prevention programmes present important logistical challenges at scale. Our study shows that artificial intelligence can be used to accurately predict lung cancer risk using just three pieces of information that would be easy to gather during routine GP appointments, online or via apps. This approach has the potential to greatly simplify population level screening for lung cancer and help to make it a reality."
The models used in the study were externally validated in the US Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial and benchmarked against models that are either in use or have performed strongly in previous analyses. The authors believe the same approach could be viable for simplifying the screening process for other diseases, such as type-2 diabetes and cardiovascular disease.
"This research is a prime example of how machine learning tools such as AutoPrognosis, combined with innovative clinical researchers, can make a real impact in healthcare at a population level. While AutoPrognosis has already been applied for risk prediction and prognosis in numerous diseases, this is the first time it has been used to determine the minimal information needed to screen patients. I think this is the future of preventive medicine and I'm optimistic that the same approach could be applied to screening for other diseases."
Professor Mihaela van der Schaar, Study Author, University of Cambridge
The authors hope the findings will be used to make any national lung cancer screening programme quicker, easier and cheaper to implement, while still achieving the primary aim of reducing lung cancer mortality.
Professor Sam Janes (UCL Medicine), senior author of the study, said: "It's great news that the UK is working towards a national screening programme for lung cancer, which remains the leading cause of cancer-related deaths in this country as it does across the world. But as we've seen in the US, whose screening programme uptake is just eight per cent, there are hurdles to overcome. For any national screening programme to work, it will need to be feasible to run and succeed in getting people to participate. Our findings are good news on both counts."
This work was supported by Wellcome, the National Science Foundation, the Medical Research Council and Cancer Research UK.
Callender, T., et al. (2023). Assessing eligibility for lung cancer screening using parsimonious ensemble machine learning models: A development and validation study. PLOS Medicine. doi.org/10.1371/journal.pmed.1004287.