By measuring subtle changes in voice quality, AI could help doctors detect dangerous vocal fold lesions before symptoms worsen.
Study: Voice as a biomarker: exploratory analysis for benign and malignant vocal fold lesions.
An exploratory study reveals that subtle changes in voice patterns, especially variability in harmonic-to-noise ratio, could serve as early warning signs of vocal fold lesions, paving the way for future AI-powered screening tools.
A new study led by Oregon Health and Science University and Portland State University researchers identified distinct vocal features that may serve as potential biomarkers for early detection of benign and malignant vocal fold lesions. The study is published in the journal Frontiers in Digital Health.
Background
Alterations in voice pitch, loudness, and quality characterize vocal disorders. Various factors can potentially trigger these disorders, including vocal fold pathology, neurologic conditions, or functional voice use patterns.
Individuals with voice disorders often experience poor quality of life, low self-esteem, work-related difficulties, and social isolation. These effects are particularly pronounced among individuals whose professional roles depend heavily on voice communication.
Both benign and malignant vocal fold lesions (laryngeal cancer) are associated with voice disorders. While benign lesions substantially affect voice quality and cause morbidity, malignant lesions are often life-threatening if left untreated.
Dysphonia (a condition characterized by abnormal voice) is one of the first symptoms of vocal fold lesions. Diagnosing these lesions requires visualizing the larynx and assessing the lesion's morphology through video endoscopy. The larynx is an anatomical structure in the neck where the vocal folds are located.
Recent advancements in artificial intelligence (AI) technologies have facilitated human voice analysis for early detection of a variety of health conditions, including laryngeal pathology, neurological and psychological disorders, head and neck cancers, and diabetes.
The use of voice as a digital biomarker provides a promising platform for non-invasive detection and screening of these potentially life-threatening conditions. The Voice to AI project, as part of the National Institutes of Health (NIH) Bridge to Artificial Intelligence (Bridge2AI) consortium, aims to analyze voice as a biomarker of health for use in clinical care.
In the current study, researchers analyzed the Bridge2AI-Voice dataset to identify specific acoustic features that effectively distinguish laryngeal cancer and benign vocal fold lesions from other vocal pathologies and healthy voice function. Acoustic features refer to measurable voice properties, including pitch, loudness, and quality.
The study
The dataset analyzed in the study includes 12,523 recordings of 306 participants collected across five sites in North America. Acoustic analyses focused on Rainbow Passage recordings (180 recordings from 176 participants) with features pre-extracted using openSMILE software. The main aim of the study was the identification of acoustic features that can distinguish the voices of participants with vocal fold lesions from those without any vocal disorders, as well as distinguish the voices of participants with lesions from those with other vocal disorders.
The participants were categorized into two groups based on lesion type and vocal disorder diagnosis. The first group included participants with laryngeal cancer, benign lesions, or no voice disorder, and the second group included participants with laryngeal cancer or benign lesions without other voice disorders, as well as those with other vocal disorders (spasmodic dysphonia or vocal fold paralysis). Transgender participants were excluded from sex-stratified analyses because prior voice-altering care could not be verified.
Four acoustic features were extracted from the participants' voice recordings for comparative analysis: fundamental frequency, jitter, shimmer, and harmonic-to-noise ratio (HNR), along with the variability (standard deviation) of HNR. Fundamental frequency refers to the frequency at which the vocal folds vibrate; jitter is a measure of cycle-to-cycle fluctuations in fundamental frequency; shimmer is a measure of cycle-to-cycle fluctuations in the amplitude of the sound wave; and HNR is the ratio of the periodic to the aperiodic component in a speech signal.
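To make these definitions concrete, here is a minimal Python sketch of how such measures are commonly computed. It is an illustration only: the study used features pre-extracted with openSMILE, whose exact definitions may differ. The function names, the "local" (consecutive-cycle) form of jitter and shimmer, the autocorrelation-style HNR formula, and the sample values are all assumptions made for this example, not details taken from the study.

```python
import math

def local_perturbation(values):
    """Mean absolute difference between consecutive values, divided by the mean value."""
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))

def hnr_db(periodic_fraction):
    """HNR in decibels, given the fraction of signal energy that is periodic (0 < r < 1)."""
    return 10 * math.log10(periodic_fraction / (1 - periodic_fraction))

# Hypothetical measurements for five consecutive glottal cycles (not study data).
periods_ms = [8.0, 8.1, 7.9, 8.0, 8.2]       # duration of each cycle in milliseconds
amplitudes = [0.50, 0.52, 0.49, 0.51, 0.50]  # peak amplitude of each cycle

f0_hz = 1000 / (sum(periods_ms) / len(periods_ms))  # fundamental frequency
jitter = local_perturbation(periods_ms)             # relative period fluctuation
shimmer = local_perturbation(amplitudes)            # relative amplitude fluctuation

print(f"F0: {f0_hz:.1f} Hz, jitter: {jitter:.2%}, shimmer: {shimmer:.2%}")
print(f"HNR at 99% periodic energy: {hnr_db(0.99):.1f} dB")
```

In this formulation, a voice whose energy is split evenly between periodic and aperiodic components has an HNR of 0 dB, and the value rises as the voice becomes more regular.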
Key findings
The analysis of acoustic features revealed that participants with benign lesions have significantly different mean HNR and fundamental frequency compared to those without any voice disorder, and significantly different HNR variability (SD) compared to laryngeal cancer. HNR variability (SD) was not significantly different between benign lesions and no voice disorder. Mean HNR and fundamental frequency did not differ significantly between benign lesions and laryngeal cancer.
The sex-stratified comparison showed similar differences among cisgender men: in mean HNR and HNR variability versus no voice disorder, and in HNR variability versus laryngeal cancer. No such differences were found among female participants, which might be due to the smaller sample size.
No significant differences were found for jitter or shimmer in any comparison, and no acoustic feature significantly distinguished lesion groups from other vocal disorders in the second analysis group.
Study significance
The study identifies harmonic-to-noise ratio variability (standard deviation) as a promising voice-related biomarker for early detection and monitoring of vocal fold lesions. The periodic component of this ratio arises from regular glottal pulses during phonation, and the aperiodic component is the noise produced by turbulence as air flows through the glottis (the opening between the vocal folds).
Both the mean and the standard deviation of the harmonic-to-noise ratio were measured in the study, as the researchers believed that this variability would help measure consistency in vocal production. The observed differences in standard deviation between benign and malignant lesion groups suggest that this feature may serve as a useful marker for monitoring lesion progression and detecting laryngeal cancer at an early stage.
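The distinction between the mean and the standard deviation can be illustrated with a short Python sketch. The frame-level HNR values below are invented for the example and are not data from the study. The helper name `hnr_summary` is hypothetical, and the use of the sample standard deviation is an assumption, since the article does not specify how the variability was computed.

```python
import statistics

def hnr_summary(frame_hnr_db):
    """Return (mean, sample standard deviation) of frame-level HNR values in dB."""
    return statistics.mean(frame_hnr_db), statistics.stdev(frame_hnr_db)

# Two hypothetical recordings with the same average HNR but different consistency.
steady_voice = [18.0, 18.5, 17.5, 18.0, 18.0]
unsteady_voice = [12.0, 24.0, 15.0, 21.0, 18.0]

steady_mean, steady_sd = hnr_summary(steady_voice)
unsteady_mean, unsteady_sd = hnr_summary(unsteady_voice)

print(f"steady:   mean {steady_mean:.1f} dB, SD {steady_sd:.2f} dB")
print(f"unsteady: mean {unsteady_mean:.1f} dB, SD {unsteady_sd:.2f} dB")
```

Both recordings average 18 dB, so the mean alone cannot separate them. Only the standard deviation captures that the second voice fluctuates far more from frame to frame.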
However, the study could not detect significant differences in the harmonic-to-noise ratio and its variability between participants with benign or malignant lesions and those with other vocal disorders. This indicates that distinguishing lesions from other vocal pathologies may be more challenging.
Notably, the study could not detect significant differences in the harmonic-to-noise ratio and its variability among female participants. This highlights the need for analyzing additional acoustic features in order to consider voice as a promising early indicator of vocal fold lesions.
The authors emphasize that these are exploratory findings and do not constitute a validated screening test. They call for larger, more diverse cohorts and for additional acoustic features to be assessed, particularly in women, before integration into clinical tools.
Overall, the study findings highlight the future potential of validated AI-based voice screening tools to identify individuals with subtle voice changes who may not otherwise seek care, especially in primary care or telehealth settings. Such tools could prompt earlier referrals to voice specialists, help prioritize urgent cases, and reduce diagnostic delays.