OpenAI is an artificial intelligence (AI) research and development company that recently developed ChatGPT, a chatbot built on a large language model (LLM). Although earlier LLMs can perform varied natural language processing (NLP) tasks, ChatGPT works differently: it is an AI chatbot that can interact in a human-like conversation.
Just five days after its release, ChatGPT had over one million users. The majority of users tried ChatGPT to answer complex questions or generate short text. Detecting plagiarism in text generated by ChatGPT is likely to be harder than in manually written text.
A recent Frontiers in Public Health journal study focused on the evolution of LLMs. It also evaluated how ChatGPT could impact future research and public health. This study aimed to promote a debate on ChatGPT’s function in medical research, considering the concept of “AI-driven infodemic.”
Perspective: ChatGPT and the rise of large language models: the new AI-driven infodemic threat in public health. Image Credit: Mila Supinskaya Glashchenko / Shutterstock
Evolution of LLMs
LLMs have grown exponentially over the last five years, which has enabled them to execute varied tasks. Prior to 2017, however, most NLP models were trained for one particular task. This drawback was overcome through the development of a self-attention network architecture known as the Transformer. In 2018, this concept was used to develop two revolutionary models, namely, Generative Pretrained Transformer (GPT) and Bidirectional Encoder Representations from Transformers (BERT).
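The self-attention idea mentioned above can be illustrated with a minimal sketch. It is not taken from the study and is heavily simplified: it assumes a single attention head, uses toy vectors, and reuses each token vector as its own query, key, and value instead of applying the learned projections a real Transformer would use. The function names are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Simplified scaled dot-product self-attention.

    `tokens` is a list of equal-length vectors. Assumption: each vector
    serves as query, key, and value (no learned projection matrices).
    """
    dim = len(tokens[0])
    outputs = []
    for query in tokens:
        # Score this token against every token in the sequence.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        weights = softmax(scores)
        # Each output is a weighted mix of all token vectors.
        mixed = [
            sum(w * value[i] for w, value in zip(weights, tokens))
            for i in range(dim)
        ]
        outputs.append(mixed)
    return outputs


toy_tokens = [[1.0, 0.0], [0.0, 1.0]]
attended = self_attention(toy_tokens)
```

Each output vector blends information from every position in the sequence, which is the property that lets one architecture serve many different language tasks.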
To achieve generalization capabilities in BERT and GPT, a combination of supervised fine-tuning and unsupervised pre-training was used. This approach enabled the application of pre-trained language representations to execute downstream tasks.
GPT models evolved rapidly, and many versions were launched. Each improved version was trained on more text and contains more parameters. For instance, the third version of GPT (GPT-3) is 100 times bigger than GPT-2 and includes 175 billion parameters. Although GPT-3 can generate text covering a wide range of domains, it often produces biased text that contains untrue facts. Many LLMs, including GPT-3, replicate biases because they were designed to predict the next text element based on data available on the internet. The main challenge was therefore to design LLMs that align with human values and ethical principles.
To address this problem, OpenAI developed InstructGPT, a model with 1.3 billion parameters trained using reinforcement learning from human feedback (RLHF); ChatGPT is a sibling model built with the same approach. The initial ChatGPT, whose training data extended only to 2021, generated incorrect text because it could not check facts, a weakness that integrating GPT-4 into ChatGPT was intended to reduce. Even if the latest ChatGPT generates more reliable text, one should account for all limitations of this tool, particularly when applying it in medical research.
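Why next-element prediction reproduces whatever the training text contains can be seen in a toy sketch. This is only an illustration under strong assumptions: it uses a simple word-pair (bigram) counter, not the neural network behind GPT models, and the three-sentence corpus and function names are invented for the example.

```python
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count, for every word, which words follow it in the corpus."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Invented corpus: the unsupported claim appears more often than the
# cautious one, so the model repeats the unsupported claim.
toy_corpus = [
    "the remedy is unproven",
    "the remedy is miraculous",
    "the remedy is miraculous",
]
model = train_bigram(toy_corpus)
prediction = predict_next(model, "is")
```

The predictor has no notion of truth. It returns the continuation it saw most often, so a skewed corpus yields skewed output.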
Assessing Threats of ChatGPT in Public Health Considering the AI-driven Infodemic
Researchers can use ChatGPT to help create scientific articles. For instance, the tool can suggest relevant titles for research articles, write drafts, and express complicated scientific concepts in simple and grammatically correct English. The high interest in ChatGPT in the scientific community can be gauged from the rapid increase in the number of research articles on this tool.
Many authors have already used ChatGPT to write a part of their scientific articles. This underscores the fact that this tool has already been included in research processes, even before addressing ethical concerns and establishing standard rules for its application.
LLMs can be tricked into producing text on controversial topics or content that spreads misinformation. Because LLMs can produce text similar to that composed by humans, this ability can be misused to create fake news articles and fabricated or misleading content without readers realizing that the content was produced by AI.
Recently, some authors have underscored the need for LLM detectors that can identify fake news. Current detectors built for GPT-2 do not reliably identify AI-written text when it is generated by ChatGPT. Detectors must be improved continually to keep pace with the rapid advancement of LLMs and to curb malicious use.
Due to the lack of accurate detectors, some precautionary measures must be followed. For instance, the International Conference on Machine Learning (ICML) for 2023 prohibited the use of LLMs in submitted drafts. However, no tools are available to verify compliance with this rule.
Many scientific journals have updated their author guidelines; for example, Springer Nature journals added that LLMs cannot be listed as authors and that their use must be mentioned in the methods or acknowledgments sections. Elsevier has implemented similar guidelines.
ChatGPT can be misused to generate fake scientific abstracts, articles, and bibliographies. Here, the digital object identifier (DOI) system could be used to accurately detect fake references. Scientists pointed out that years of research are required to validate a finding in medicine before it can be used clinically. Therefore, fake information generated by AI tools can endanger people’s safety.
The coronavirus disease 2019 (COVID-19) pandemic has profoundly affected health research. This is primarily due to the rapid dissemination of information from preprint servers via social media, which influenced individuals’ health choices. COVID-19 information was mostly circulated through social media, which resulted in a phenomenon known as an infodemic. It was observed that an infodemic could significantly influence medical decision-making in preventive or treatment strategies. The authors foresee significant public health threats in the future due to the generation of AI-driven infodemics.
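A first screening step for DOI-based reference checks might look like the sketch below. It is an assumption-laden illustration, not a method described in the study: the pattern is a simplified approximation of DOI syntax, the function names are hypothetical, and the sample DOI is a made-up placeholder. A well-formed DOI can still be fabricated, so a real check would also have to look each identifier up at the public resolver (https://doi.org/), which this sketch does not do.

```python
import re
from urllib.parse import quote

# Simplified DOI shape (an assumption): "10.", a 4-9 digit registrant
# code, a slash, then a non-empty suffix without whitespace.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def looks_like_doi(text):
    """Return True if `text` has the general shape of a DOI."""
    return bool(DOI_PATTERN.match(text.strip()))


def resolver_url(doi):
    """Build the doi.org URL where the identifier could be looked up."""
    return "https://doi.org/" + quote(doi.strip(), safe="/")


def screen_references(dois):
    """Map each candidate DOI to whether it passes the syntax check."""
    return {doi: looks_like_doi(doi) for doi in dois}


# Placeholder values for illustration only, not real references.
candidates = ["10.1234/example.5678", "doi-unknown"]
screened = screen_references(candidates)
```

Entries that fail the syntax check can be flagged immediately, while those that pass still need to be resolved and compared against the cited title and authors.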