A novel visual speech representation and HMM classification for visual speech recognition

Yu, Dahai, Ghita, Ovidiu, Sutherland, Alistair and Whelan, Paul F. ORCID: 0000-0001-9230-7656 (2009) A novel visual speech representation and HMM classification for visual speech recognition. In: The 3rd Pacific-Rim symposium on image and video technology (PSIVT2009), 13-16 Jan 2009, Tokyo, Japan.


Abstract

This paper presents the development of a novel visual speech recognition (VSR) system based on Hidden Markov Models (HMM) and a new representation that extends the standard viseme concept, referred to in this paper as the Visual Speech Unit (VSU). Visemes have been regarded as the smallest visual speech elements in the visual domain and have been widely applied to model visual speech, but they are problematic when applied to continuous visual speech recognition. To circumvent the problems associated with standard visemes, we propose a new visual speech representation that includes not only the data associated with the articulation of the visemes but also the transitory information between consecutive visemes. To evaluate the appropriateness of the proposed visual speech representation, an extensive set of experiments has been conducted to compare the performance of the visual speech units with that offered by the standard MPEG-4 visemes. The experimental results indicate that the developed VSR application achieved up to 90% correct recognition when applied to the identification of 60 classes of VSUs, while the recognition rate for the standard set of MPEG-4 visemes was only in the range 62-72%.

Metadata

Item Type:	Conference or Workshop Item (Paper)
Event Type:	Other
Refereed:	Yes
Additional Information:	Winner of Best Paper Award
Uncontrolled Keywords:	computer vision; image analysis; Visual Speech Recognition; Visual Speech Unit; Viseme; EMPCA; HMM; Dynamic Time Warping
Subjects:	UNSPECIFIED
DCU Faculties and Centres:	DCU Faculties and Schools > Faculty of Engineering and Computing > School of Electronic Engineering
Published in:	Advances in image and video technology, proceedings. Lecture notes in computer science 5414.
Copyright Information:	© 2009 Springer. The original publication is available at www.springerlink.com
Use License:	This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
ID Code:	18623
Deposited On:	14 Aug 2013 10:21 by Mark Sweeney. Last Modified 11 Jan 2019 15:56

Documents

Full text available as:

PDF - Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader
655kB

DORAS | DCU Research Repository
