Full Text Available

Improving visual speech synthesis using Decision Tree Models

Thesis (MEng)--Stellenbosch University, 2016.

Bibliographic Details
Main Author:	Rademan, Christiaan Frans
Other Authors:	Niesler, T. R.
Format:	Thesis
Language:	en_ZA
Published:	Stellenbosch : Stellenbosch University 2016
Subjects:	UCTD; Speech synthesis; Speech processing systems; Computer animation; Speech processing systems > Digital techniques; Decision trees

_version_	1867613759886852096
access_status_str	Open Access
author	Rademan, Christiaan Frans
author2	Niesler, T. R.
author_browse	Niesler, T. R. Rademan, Christiaan Frans
author_facet	Niesler, T. R. Rademan, Christiaan Frans
author_sort	Rademan, Christiaan Frans
collection	Thesis
dc_rights_str_mv	Stellenbosch University
description	Thesis (MEng)--Stellenbosch University, 2016.
format	Thesis
id	oai:scholar.sun.ac.za:10019.1/98728
institution	Stellenbosch University (South Africa)
language	en_ZA
last_indexed	2026-06-10T12:41:15.521Z
license_str	Other — see source repository
provenance_str_mv	Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository
publishDate	2016
publishDateRange	2016
publishDateSort	2016
publisher	Stellenbosch : Stellenbosch University
publisherStr	Stellenbosch : Stellenbosch University
record_format	dspace
source_str	SUNScholar — Stellenbosch University Repository
spelling	oai:scholar.sun.ac.za:10019.1/98728 Improving visual speech synthesis using Decision Tree Models Rademan, Christiaan Frans Niesler, T. R. Stellenbosch University. Faculty of Engineering. Dept. of Electrical and Electronic Engineering. UCTD Speech synthesis Speech processing systems Computer animation Speech processing systems -- Digital techniques Decision trees Thesis (MEng)--Stellenbosch University, 2016. ENGLISH ABSTRACT: Visual speech synthesis is essential for believable virtual character interaction. Traditionally, animation artists recreate the oral motions expected from speech utterances. In response, we present decision tree-based clustering techniques which are employed in automating visual speech animation. This is achieved using a small dataset of phonetically annotated audiovisual speech. Our work focuses on extending existing tree-based clustering algorithms by improving on the modelling of coarticulation effects. This is accomplished by capturing the motion of natural speech segments, referred to as dynamic visemes, and conserving their parameters during clustering and speech synthesis. Dynamic visemes are defined as the trajectories of oral features segmented by triphone boundaries. By applying simple search and concatenation criteria, our visual speech synthesis system uses decision trees to better predict which dynamic visemes to use. Experimentation guided all design decisions, suggesting which oral features were of greatest importance, identifying an appropriate dynamic viseme length and finding an effective interpolation method for conserving coarticulation. We evaluate the performance of our visual speech synthesis models by computing squared error differences between synthesised and measured feature trajectories. Perceptual tests also asked participants to compare virtual characters animated by model outputs. Both measured and perceptual tests show that our approaches lead to a clear improvement over a comparable baseline. 
Through our research, we aimed to make speech synthesis more accessible. Therefore, the conversational agents are based on the freely available MakeHuman and Blender software components. The customised oral feature motion capture system is also easily reproduced and requires only consumer-grade recording equipment. AFRIKAANSE OPSOMMING: Visuele spraaksintese is noodsaaklik om geloofwaardige interaksie met virtuele karakters moontlik te maak. In die verlede het animasiekunstenaars mondbewegings vanaf werklike spraak nageboots. In hierdie studie bied ons tegnieke aan wat gebaseer is op saambondeling met behulp van besluitnemingsbome. Hierdie tegnieke word gebruik om die animasie van visuele spraak te outomatiseer, en maak gebruik van ’n klein datastel van foneties geannoteerde oudiovisuele spraak. Ons werk fokus op die uitbrei van bestaande besluitnemingsboom-saambondelingsalgoritmes, deur die modellering van koartikulasie-effekte te verbeter. Dit word moontlik gemaak deur eers die beweging van natuurlike spraaksegmente (viseme) vas te vang, en dan hul parameters te bewaar tydens die saambondeling en spraaksintese. Dinamiese viseme word gedefinieer as die trajekte van mondeienskappe, gesegmenteer deur trifoongrense. Deur eenvoudige soek- en saamvoegingskriteria toe te pas, kan ons visuele spraaksintesestelsel van besluitnemingsbome gebruik maak om beter te voorspel watter viseme aangewend moet word. Alle ontwerpsbesluite is deur eksperimentering gelei, om bv. die mondeienskappe van grootste belang te identifiseer, om ’n gepaste viseemlengte vas te stel, en om ’n effektiewe interpolasiemetode te vind wat koartikulasie bewaar. Ons evalueer die werksverrigting van ons visuele spraaksintesemodel deur die kwadraatfout tussen die gesintetiseerde en gemete eienskaptrajekte te bereken. Tydens perseptuele toetse is deelnemers gevra om die geloofwaardigheid van virtuele karakters, aangedryf deur die modeluittrees, te beoordeel. 
Beide gemete en perseptuele toetse het aangedui dat die voorgestelde tegnieke ’n duidelike verbetering bo ’n geskikte basislynmeting toon. Die doel van hierdie navorsing is om spraaksintese meer toeganklik te maak. Om hierdie rede is die gespreksagente gebou op die vrylik beskikbare MakeHuman- en Blendersagtewarekomponente. Die pasgemaakte mondeienskap-bewegingsaftaster is ook eenvoudig om te herproduseer, en benodig slegs verbruikersgraad-opneemtoerusting. 2016-03-09T14:54:21Z 2016-03-09T14:54:21Z 2016-03 Thesis http://hdl.handle.net/10019.1/98728 en_ZA Stellenbosch University xiii, 88 pages : illustrations application/pdf Stellenbosch : Stellenbosch University
spellingShingle	UCTD Speech synthesis Speech processing systems Computer animation Speech processing systems -- Digital techniques Decision trees Rademan, Christiaan Frans Improving visual speech synthesis using Decision Tree Models
title	Improving visual speech synthesis using Decision Tree Models
title_full	Improving visual speech synthesis using Decision Tree Models
title_fullStr	Improving visual speech synthesis using Decision Tree Models
title_full_unstemmed	Improving visual speech synthesis using Decision Tree Models
title_short	Improving visual speech synthesis using Decision Tree Models
title_sort	improving visual speech synthesis using decision tree models
topic	UCTD Speech synthesis Speech processing systems Computer animation Speech processing systems -- Digital techniques Decision trees
url	http://hdl.handle.net/10019.1/98728
work_keys_str_mv	AT rademanchristiaanfrans improvingvisualspeechsynthesisusingdecisiontreemodels

Similar Items