Can ChatGPT aid in patient education for benign prostate enlargement?

In a recent study published in Prostate Cancer and Prostatic Diseases, researchers evaluated the accuracy and quality of responses from Chat Generative Pre-trained Transformer (ChatGPT) to questions about male lower urinary tract symptoms (LUTS) suggestive of benign prostate enlargement (BPE), comparing them with established urological references.

Study: Can ChatGPT provide high-quality patient information on male lower urinary tract symptoms suggestive of benign prostate enlargement? Image Credit: Miha Creative/


As patients increasingly seek online medical guidance, major urological associations such as the European Association of Urology (EAU) and the American Urological Association (AUA) provide high-quality resources. However, modern technologies such as artificial intelligence (AI) are gaining popularity due to their efficiency.

ChatGPT, with over 1.5 million monthly visits, offers a user-friendly, conversational interface. A recent survey showed that 20% of urologists used ChatGPT clinically, with 56% recognizing its potential in decision-making.

Studies on ChatGPT's urological accuracy show mixed results. Further research is needed to comprehensively evaluate the effectiveness and reliability of AI tools like ChatGPT in delivering accurate and high-quality medical information.

About the study 

The present study examined EAU and AUA patient information websites to identify key topics on BPE, formulating 88 related questions.

These questions covered definitions, symptoms, diagnostics, risks, management, and treatment options. Each question was independently submitted to ChatGPT, and the responses were recorded for comparison with the reference materials.

Two examiners classified ChatGPT's responses as true negative (TN), false negative (FN), true positive (TP), or false positive (FP). Discrepancies were resolved by consensus or consultation with a senior specialist.

Performance metrics, including precision, recall, and the F1 score, were calculated to assess accuracy. The F1 score, the harmonic mean of precision and recall, served as the summary measure because it balances the two in a single number.
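As a minimal sketch of how these metrics follow from the examiners' classifications, the function below computes precision, recall, and F1 from TP, FP, and FN counts. The function name and the example counts are hypothetical and are not taken from the study. TN counts do not enter any of the three metrics.

```python
def classification_metrics(tp: int, fp: int, fn: int) -> dict[str, float]:
    """Return precision, recall, and F1 computed from confusion counts."""
    # Precision: share of positive classifications that were correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of actual positives that were captured.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for illustration only; not data from the study.
metrics = classification_metrics(tp=8, fp=2, fn=0)
print(metrics)
```

A response set with few missed items but several unsupported statements would show the pattern reported later in the article: high recall with lower precision.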

General quality scores (GQS) were assigned using a 5-point Likert scale, assessing the truthfulness, relevancy, structure, and language of ChatGPT's responses. Scores ranged from 1 (false or misleading) to 5 (extremely accurate and relevant). The mean GQS from the two examiners was used as the final score for each question.

Examiner agreement on the GQS was measured using the intraclass correlation coefficient (ICC), and differences were assessed with the Wilcoxon signed-rank test, with a p-value of less than 0.05 considered significant. Analyses were conducted using SAS version 9.4.
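The article does not say which form of the ICC was used, and the study's analyses were run in SAS. Purely as an illustration, the sketch below assumes the two-way random-effects, absolute-agreement, single-rater form, often written ICC(2,1), and computes it in plain Python. The function name and the scores are hypothetical.

```python
from statistics import mean


def icc_2_1(ratings: list[list[float]]) -> float:
    """ICC(2,1) for a table with one row per question and one column per rater."""
    n = len(ratings)       # number of rated questions
    k = len(ratings[0])    # number of raters
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    # Mean squares for questions, raters, and residual error.
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical GQS values from two examiners for six questions.
scores = [[4, 4], [5, 4], [3, 3], [4, 5], [5, 5], [2, 2]]
print(icc_2_1(scores))
```

Values close to 1 indicate that the two examiners scored the same responses similarly. A different ICC form would give a somewhat different number for the same data.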

Study results 

ChatGPT addressed 88 questions across eight categories related to BPE. Notably, 71.6% of the questions (63 out of 88) focused on BPE management, including conventional surgical interventions (27 questions), minimally invasive surgical therapies (MIST, 21 questions), and pharmacotherapy (15 questions).

ChatGPT generated responses to all 88 questions, totaling 22,946 words and 1,430 sentences. In contrast, the EAU website contained 4,914 words and 200 sentences, while the AUA patient guide had 3,472 words and 238 sentences. The AI-generated responses were almost three times longer than the source materials.
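The length comparison can be checked with simple arithmetic. The sketch below uses only the counts quoted above and assumes that "source materials" means the EAU and AUA texts combined, which the article does not state explicitly.

```python
# Counts as reported in the article.
chatgpt_words, chatgpt_sentences = 22946, 1430
eau_words, eau_sentences = 4914, 200
aua_words, aua_sentences = 3472, 238

# Assumption: the comparison is against both reference texts combined.
reference_words = eau_words + aua_words
reference_sentences = eau_sentences + aua_sentences

word_ratio = chatgpt_words / reference_words
sentence_ratio = chatgpt_sentences / reference_sentences

print(f"word ratio: {word_ratio:.2f}")
print(f"sentence ratio: {sentence_ratio:.2f}")
```

Under that assumption, the word ratio falls a little below three and the sentence ratio a little above it, consistent with the "almost three times" description.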

The performance metrics of ChatGPT's responses varied by topic, with F1 scores ranging from 0.67 to 1.0, precision scores from 0.5 to 1.0, and recall from 0.9 to 1.0.

By topic, the GQS ranged from 3.5 to 5. Overall, ChatGPT achieved an F1 score of 0.79, a precision score of 0.66, and a recall score of 0.97. Across individual questions, the GQS from both examiners had a median of 4, with a range of 1 to 5.
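The overall figures are internally consistent: F1 is the harmonic mean of precision and recall, so the reported precision and recall should reproduce the reported F1. A short check, using only the values quoted above (the helper name is hypothetical):

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Overall values as reported in the article.
reported_precision = 0.66
reported_recall = 0.97

f1 = f1_from_precision_recall(reported_precision, reported_recall)
print(round(f1, 2))  # 0.79, matching the reported overall F1
```

The gap between recall and precision suggests that ChatGPT rarely omitted information found in the references but more often added statements the references did not support.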

There was no statistically significant difference between the overall quality scores assigned by the two examiners (p = 0.72). Agreement between them was good, with an ICC of 0.86.


To summarize, ChatGPT addressed all 88 queries, with performance metrics at or above 0.5 in every category and a median GQS of 4, indicating high-quality responses. However, ChatGPT's responses were often excessively lengthy.

Accuracy varied by topic: ChatGPT performed best on general BPE concepts and less well on minimally invasive surgical therapies. The high level of agreement between examiners on the quality of the responses supports the reliability of the evaluation process.

As AI continues to evolve, it holds promise for enhancing patient education and support, but ongoing assessment and improvement are essential to maximize its utility in clinical settings.


Written by

Vijay Kumar Malesu

Vijay holds a Ph.D. in Biotechnology and possesses a deep passion for microbiology. His academic journey has allowed him to delve deeper into understanding the intricate world of microorganisms. Through his research and studies, he has gained expertise in various aspects of microbiology, which includes microbial genetics, microbial physiology, and microbial ecology. Vijay has six years of scientific research experience at renowned research institutes such as the Indian Council for Agricultural Research and KIIT University. He has worked on diverse projects in microbiology, biopolymers, and drug delivery. His contributions to these areas have provided him with a comprehensive understanding of the subject matter and the ability to tackle complex research challenges.    


Please use one of the following formats to cite this article in your essay, paper or report:

  • APA

    Kumar Malesu, Vijay. (2024, June 17). Can ChatGPT aid in patient education for benign prostate enlargement?. News-Medical. Retrieved on July 25, 2024 from

  • MLA

    Kumar Malesu, Vijay. "Can ChatGPT aid in patient education for benign prostate enlargement?". News-Medical. 25 July 2024. <>.

  • Chicago

    Kumar Malesu, Vijay. "Can ChatGPT aid in patient education for benign prostate enlargement?". News-Medical. (accessed July 25, 2024).

  • Harvard

    Kumar Malesu, Vijay. 2024. Can ChatGPT aid in patient education for benign prostate enlargement?. News-Medical, viewed 25 July 2024,

