Can ChatGPT aid in patient education for benign prostate enlargement?

By Vijay Kumar Malesu. Reviewed by Lily Ramsey, LLM. Jun 17 2024

In a recent study published in Prostate Cancer and Prostatic Diseases, researchers evaluated the accuracy and quality of responses from Chat Generative Pre-trained Transformer (ChatGPT) to questions about male lower urinary tract symptoms (LUTS) suggestive of benign prostate enlargement (BPE), comparing them with established urological references.

Study: Can ChatGPT provide high-quality patient information on male lower urinary tract symptoms suggestive of benign prostate enlargement? Image Credit: Miha Creative/Shutterstock.com

Background

As patients increasingly seek medical guidance online, major urological associations such as the European Association of Urology (EAU) and the American Urological Association (AUA) provide high-quality resources. However, newer technologies such as artificial intelligence (AI) are gaining popularity due to their efficiency.

ChatGPT, with over 1.5 million monthly visits, offers a user-friendly, conversational interface. A recent survey showed that 20% of urologists used ChatGPT clinically, with 56% recognizing its potential in decision-making.

Studies on ChatGPT's urological accuracy show mixed results. Further research is needed to comprehensively evaluate the effectiveness and reliability of AI tools like ChatGPT in delivering accurate and high-quality medical information.

About the study

The present study examined EAU and AUA patient information websites to identify key topics on BPE, formulating 88 related questions.

These questions covered definitions, symptoms, diagnostics, risks, management, and treatment options. Each question was independently submitted to ChatGPT, and the responses were recorded for comparison with the reference materials.

Two examiners classified ChatGPT's responses as true negative (TN), false negative (FN), true positive (TP), or false positive (FP). Discrepancies were resolved by consensus or consultation with a senior specialist.

Precision, recall, and F1 score were calculated from these classifications, with the F1 score (the harmonic mean of precision and recall) serving as the summary measure of accuracy.
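
As a minimal sketch of how these metrics follow from the classification counts, the Python snippet below computes precision, recall, and F1. The function name and the example counts are hypothetical and are not taken from the study.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from confusion counts.

    True negatives are not needed for any of the three metrics.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts for illustration only (not the study's data).
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

Precision falls when false positives rise, and recall falls when false negatives rise. F1 balances the two in a single number.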

General quality scores (GQS) were assigned using a 5-point Likert scale, assessing the truthfulness, relevancy, structure, and language of ChatGPT's responses. Scores ranged from 1 (false or misleading) to 5 (extremely accurate and relevant). The mean GQS from the two examiners was used as the final score for each question.

Examiner agreement on GQS was measured using the intraclass correlation coefficient (ICC), and differences between examiners were assessed with the Wilcoxon signed-rank test, with a p-value of less than 0.05 considered significant. Analyses were conducted using SAS version 9.4.
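
To illustrate what an agreement coefficient of this kind measures, here is a rough pure-Python sketch. The article does not say which ICC form the authors used, so the choice of ICC(2,1) (two-way random effects, absolute agreement, single rater) is an assumption. The function name and the example ratings are likewise hypothetical. The study itself used SAS, and the Wilcoxon test is not covered here.

```python
def icc_2_1(ratings: list[list[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` has one row per rated item and one column per rater.
    Assumes at least two items, two raters, and some variation in scores.
    """
    n = len(ratings)        # number of items (e.g. questions)
    k = len(ratings[0])     # number of raters (e.g. examiners)
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical GQS pairs (examiner A, examiner B); not the study's data.
example = [[5, 5], [4, 5], [3, 3], [4, 4], [2, 3], [5, 4]]
print(f"ICC(2,1) = {icc_2_1(example):.2f}")
```

Values close to 1 mean the raters score the same items almost identically, while values near 0 mean the scores agree little more than chance would predict.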

Study results

ChatGPT addressed 88 questions across eight categories related to BPE. Notably, 71.6% of the questions (63 out of 88) focused on BPE management, including conventional surgical interventions (27 questions), minimally invasive surgical therapies (MIST, 21 questions), and pharmacotherapy (15 questions).

ChatGPT generated responses to all 88 questions, totaling 22,946 words and 1,430 sentences. In contrast, the EAU website contained 4,914 words and 200 sentences, while the AUA patient guide had 3,472 words and 238 sentences. The AI-generated responses were almost three times as long as the two reference sources combined.
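
The length comparison can be checked with simple arithmetic. The sketch below uses only the word counts reported above, and the variable names are illustrative.

```python
# Word counts as reported in the article.
chatgpt_words = 22_946
eau_words = 4_914
aua_words = 3_472

reference_words = eau_words + aua_words  # combined reference material
ratio = chatgpt_words / reference_words

print(f"combined reference words: {reference_words}")
print(f"ChatGPT / reference ratio: {ratio:.2f}")
```

The ratio comes out at roughly 2.7, which matches the "almost three times" description when the two reference sources are taken together.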

Across the question categories, F1 scores ranged from 0.67 to 1.0, precision from 0.5 to 1.0, recall from 0.9 to 1.0, and GQS from 3.5 to 5.

Overall, ChatGPT achieved an F1 score of 0.79, a precision of 0.66, and a recall of 0.97. Each examiner's GQS had a median of 4, with individual scores ranging from 1 to 5.
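
Because F1 is the harmonic mean of precision and recall, the overall figures can be cross-checked against one another. A small sketch using the rounded values quoted above (the helper name is illustrative):

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


overall_f1 = f1_from(precision=0.66, recall=0.97)
print(round(overall_f1, 2))
```

With these rounded inputs the result is about 0.79, consistent with the reported overall F1. The gap between high recall and lower precision suggests that ChatGPT rarely omitted reference information but more often added statements not supported by the references.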

There was no statistically significant difference between the two examiners' overall quality scores (p = 0.72), and agreement between them was good, with an ICC of 0.86.

Conclusions

To summarize, ChatGPT addressed all 88 queries, with performance metrics at or above 0.5 throughout and a median GQS of 4, indicating high-quality responses. However, ChatGPT's responses were often excessively lengthy.

Accuracy varied by topic: ChatGPT performed best on BPE concepts and less well on minimally invasive surgical therapies. The high level of agreement between examiners on the quality of the responses underscores the reliability of the evaluation process.

As AI continues to evolve, it holds promise for enhancing patient education and support, but ongoing assessment and improvement are essential to maximize its utility in clinical settings.

Journal reference:

Puerto Nino, A.K., Garcia Perez, V., Secco, S. et al. (2024) Can ChatGPT provide high-quality patient information on male lower urinary tract symptoms suggestive of benign prostate enlargement? Prostate Cancer Prostatic Dis. doi: https://doi.org/10.1038/s41391-024-00847-7. https://www.nature.com/articles/s41391-024-00847-7

Written by

Vijay Kumar Malesu

Vijay holds a Ph.D. in Biotechnology and possesses a deep passion for microbiology. His academic journey has allowed him to delve deeper into understanding the intricate world of microorganisms. Through his research and studies, he has gained expertise in various aspects of microbiology, which includes microbial genetics, microbial physiology, and microbial ecology. Vijay has six years of scientific research experience at renowned research institutes such as the Indian Council for Agricultural Research and KIIT University. He has worked on diverse projects in microbiology, biopolymers, and drug delivery. His contributions to these areas have provided him with a comprehensive understanding of the subject matter and the ability to tackle complex research challenges.
