A new international study reveals that while ChatGPT may deliver clearer, more engaging answers on PCOS than traditional resources, its role is best seen as a powerful support tool rather than a replacement for evidence-based care.
Study: Assessing ChatGPT vs. evidence-based online responses for polycystic ovary syndrome self-management and education: an international cross-sectional blinded survey of healthcare professionals.
A recent study published in Frontiers in Digital Health, involving 43 healthcare professionals from diverse backgrounds, compared ChatGPT responses to 12 frequently asked questions on PCOS with evidence-based answers drawn from a patient-facing webpage on AskPCOS, which served as the evidence-based comparator in this study.
Rising use of ChatGPT for PCOS health questions
Polycystic ovary syndrome (PCOS) is a widespread condition affecting millions of women in their reproductive years. Spanning endocrine, metabolic, and reproductive health, it causes significant distress. This may be exacerbated by limited knowledge of how and why it occurs and by its variable clinical presentation.
Large language models such as ChatGPT are increasingly used to provide personalized answers to health-related questions. Prior research suggests that ChatGPT shows promise for PCOS education.
Clinicians assess ChatGPT against guideline-based PCOS information
This international, blinded survey study is among the first of its kind to compare AI-generated and evidence-based PCOS information among a large, diverse group of healthcare professionals. A total of 43 clinicians evaluated responses to 12 frequently asked questions covering PCOS causes, symptoms, and diagnosis, treatment and management, and emotional and medical support.
The responses, one generated by ChatGPT and the other sourced from an evidence-based patient resource, were assessed for accuracy and clarity using a standardized Likert scale, with participants blinded to their origin. In parallel, readability was evaluated using multiple established reading indices.
Both the initial ChatGPT and evidence-based responses were found to have relatively high reading complexity, as measured by these metrics, suggesting they may be difficult for general audiences to understand. To address this, ChatGPT responses were further simplified using follow-up prompts and then re-evaluated for readability.
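The study does not specify which tooling produced its readability metrics, but the Flesch Reading Ease score is one of the most widely used such indices. As an illustrative sketch only, the formula can be computed with a naive vowel-group syllable heuristic (a simplifying assumption; published indices use more careful syllable counting):

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: each run of consecutive vowels counts as one syllable.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_reading_ease(text: str) -> float:
    # Flesch Reading Ease: higher scores mean easier text.
    # 206.835 - 1.015 * (words/sentences) - 84.6 * (syllables/words)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))
```

Short sentences of one-syllable words score near the top of the scale, while dense clinical phrasing of the kind both resources initially produced scores far lower, which is what the study's re-prompting for simplified ChatGPT answers was designed to address.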
Clinicians rate ChatGPT answers higher across most questions
The ChatGPT answers consistently scored higher overall than the evidence-based answers, by an average of 0.8 points on the Likert scale (from 0, a harmful response, to 4, an excellent response requiring no clarification), with statistically significant differences observed for 11 of the 12 questions. However, some questions showed substantial overlap between scores, while others were more widely separated.
Several score distributions showed two peaks, especially for ChatGPT responses to some questions, suggesting variability in grading among respondents. Seven questions showed fair agreement among respondents, while the rest showed poor agreement. The higher scores for ChatGPT answers occurred regardless of the healthcare provider's role or years of practice.
Importantly, higher scores do not necessarily indicate that the answers are correct, evidence-based, or up to date, since ChatGPT answers may not always reflect the most current evidence. Rather than replacing sites like AskPCOS, therefore, these findings indicate that PCOS-related information could be made more accessible, personalized, and reader-friendly using such models. This is especially important since patients find it difficult to access PCOS information in an integrated format.
ChatGPT also uses an empathetic tone that promotes engagement and interaction by patients with PCOS. It can also repeatedly simplify or rephrase any part of a response, improving patient understanding. This could empower patients and support better engagement with care recommendations, while potentially reducing reliance on less reliable information sources, such as social media. However, such outcomes were not directly measured in this study.
Readability was similar between the first set of ChatGPT responses and the evidence-based responses. This suggests that both were somewhat difficult for general readers to understand. The simplified ChatGPT answers were significantly more readable.
ChatGPT may represent a valuable supplementary tool for patient education in populations with low health literacy, a known risk factor for poorer healthcare use and outcomes.
Strengths and limitations
The sample of respondents was not fully representative of all PCOS-related care professionals. Although larger than in several previous studies, it remains small. Respondents might be more familiar with or interested in PCOS, or more accustomed to AI-generated text, thereby introducing bias into the evaluations. There is also the possibility that some respondents inferred which answers were AI-generated based on style or structure, potentially influencing scoring.
The ChatGPT version used in the study may since have been updated, changing its response style. Its knowledge base may also have changed or expanded since the study, altering the responses it would generate today.
Future directions
Patient-centered evaluations of LLM responses to PCOS would improve current understanding of their utility when combined with professional assessments. The accuracy and readability of responses across different languages and literacy levels remain to be explored.
The authors conclude that online resources for PCOS could benefit from LLMs' ability to improve readability through simplification and personalization of their PCOS-related content.