Machine-based lip-reading system performance compared with human lip-readers

A new study by the University of East Anglia (UEA) suggests computers are now better at lip-reading than humans.

The peer-reviewed findings will be presented for the first time at the eighth International Conference on Auditory-Visual Speech Processing (AVSP) 2009, held at the University of East Anglia from September 10-13.

A research team from the School of Computing Sciences at UEA compared the performance of a machine-based lip-reading system with that of 19 human lip-readers. They found that the automated system significantly outperformed the human lip-readers - scoring a recognition rate of 80 per cent, compared with only 32 per cent for human viewers on the same task.

Furthermore, they found that machines are able to exploit very simple features that represent only the shape of the face, whereas human lip-readers require full video of people speaking.

The study also showed that the dynamics and the full appearance of speech gestures are very important, in contrast to the traditional approach to lip-reading training, in which viewers are taught to spot key lip-shapes from static (often drawn) images.

Using a new video-based training system, viewers with very limited training significantly improved their ability to lip-read monosyllabic words, which in itself is a very difficult task. It is hoped this research might lead to novel methods of lip-reading training for the deaf and hard of hearing.

"This pilot study is the first time an automated lip-reading system has been benchmarked against human lip-readers and the results are perhaps surprising," said the study's lead author Sarah Hilder.

"With just four hours of training it helped them improve their lip-reading skills markedly. We hope this research will represent a real technological advance for the deaf community."

Agnes Hoctor, campaigns manager at the RNID, said: "This research confirms how difficult the vital skill of lip-reading is to learn and why RNID is campaigning for people who are deaf or hard of hearing to have improved access to classes. We would welcome the development of video-based or online training resources to supplement the teaching of lip-reading. Hearing loss affects 55 per cent of people over 60 so, with the ageing population, demand to learn lip-reading is only going to increase."

The AVSP conference is being held in the UK for the first time since its inception in 1998. The University of East Anglia will host cutting edge researchers including psychologists, engineers, scientists and linguists from as far afield as Australia, Canada and Japan.

As part of the conference, delegates will take part in a Visual Speech Synthesis Challenge in which a number of visual speech synthesizers, or 'talking heads', will battle it out to determine the most intelligible and visually appealing system.

AVSP runs as a satellite conference to Interspeech 2009, which will be held in Brighton. Topics under discussion will include: machine recognition of audiovisual speech; the role of gestures accompanying speech; modeling, synthesis and recognition of facial gestures; and speech synthesis.

Keynote speakers will be Dr Peter Bull of the University of York, who will be exploring 'The Myth of Body Language', and Prof Louis Goldstein of the University of Southern California, whose presentation is entitled 'Articulatory Phonology and Audio-Visual Speech'.

'Comparison of human and machine-based lip-reading' by Sarah Hilder, Richard Harvey and Barry-John Theobald is published in the Proceedings of the International Conference on Auditory-Visual Speech Processing (AVSP) 2009 on Thursday September 10 2009.

The research will be presented on Saturday September 12 at the International Conference on Auditory-Visual Speech Processing (AVSP) 2009 at the University of East Anglia.
