Improving visual speech synthesis using Decision Tree Models

Thesis (MEng)--Stellenbosch University, 2016.

Bibliographic Details
Main Author: Rademan, Christiaan Frans
Other Authors: Niesler, T. R.
Format: Thesis
Language: en_ZA
Published: Stellenbosch : Stellenbosch University 2016
Subjects: UCTD; Speech synthesis; Speech processing systems; Computer animation; Speech processing systems -- Digital techniques; Decision trees
access_status_str Open Access
author Rademan, Christiaan Frans
author2 Niesler, T. R.
collection Thesis
dc_rights_str_mv Stellenbosch University
description Thesis (MEng)--Stellenbosch University, 2016.
format Thesis
id oai:scholar.sun.ac.za:10019.1/98728
institution Stellenbosch University (South Africa)
language en_ZA
last_indexed 2026-06-10T12:41:15.521Z
license_str Other — see source repository
provenance_str_mv Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository
publishDate 2016
publisher Stellenbosch : Stellenbosch University
record_format dspace
source_str SUNScholar — Stellenbosch University Repository
spelling oai:scholar.sun.ac.za:10019.1/98728 Improving visual speech synthesis using Decision Tree Models Rademan, Christiaan Frans Niesler, T. R. Stellenbosch University. Faculty of Engineering. Dept. of Electrical and Electronic Engineering. UCTD Speech synthesis Speech processing systems Computer animation Speech processing systems -- Digital techniques Decision trees Thesis (MEng)--Stellenbosch University, 2016. ENGLISH ABSTRACT: Visual speech synthesis is essential for believable virtual character interaction. Traditionally, animation artists recreate the oral motions expected from speech utterances. In response, we present decision tree-based clustering techniques which are employed in automating visual speech animation. This is achieved using a small dataset of phonetically annotated audiovisual speech. Our work focuses on extending existing tree-based clustering algorithms by improving on the modelling of coarticulation effects. This is accomplished by capturing the motion of natural speech segments, referred to as dynamic visemes, and conserving their parameters during clustering and speech synthesis. Dynamic visemes are defined as the trajectories of oral features segmented by triphone boundaries. By applying simple search and concatenation criteria, our visual speech synthesis system uses decision trees to better predict which dynamic visemes to use. Experimentation guided all design decisions, suggesting which oral features were of greatest importance, identifying an appropriate dynamic viseme length and finding an effective interpolation method for conserving coarticulation. We evaluate the performance of our visual speech synthesis models by computing squared error differences between synthesised and measured feature trajectories. Perceptual tests also asked participants to compare virtual characters animated by model outputs. Both measured and perceptual tests show that our approaches lead to a clear improvement over a comparable baseline.
Through our research, we intended to make speech synthesis more accessible. Therefore, the conversational agents are based on the freely available MakeHuman and Blender software components. The customised oral feature motion capture system is also easily reproduced and requires only consumer-grade recording equipment. AFRIKAANSE OPSOMMING: Visuele spraaksintese is noodsaaklik om geloofwaardige interaksie met virtuele karakters moontlik te maak. In die verlede het animasiekunstenaars mondbewegings vanaf werklike spraak nageboots. In hierdie studie bied ons tegnieke aan wat gebaseer is op saambondeling met behulp van besluitnemingsbome. Hierdie tegnieke word gebruik om die animasie van visuele spraak te outomatiseer, en maak gebruik van ’n klein datastel van foneties geannoteerde oudiovisuele spraak. Ons werk fokus op die uitbrei van bestaande besluitnemingsboom-saambondelingsalgoritmes, deur die modellering van koartikulasie-effekte te verbeter. Dit word moontlik gemaak deur eers die beweging van natuurlike spraaksegmente (viseme) vas te vang, en dan hul parameters te bewaar tydens die saambondeling en spraaksintese. Dinamiese viseme word gedefinieer as die trajekte van mondeienskappe, gesegmenteer deur trifoongrense. Deur eenvoudige soek- en saamvoegingskriteria toe te pas, kan ons visuele spraaksintesestelsel van besluitnemingsbome gebruik maak om beter te voorspel watter viseme aangewend moet word. Alle ontwerpsbesluite is deur eksperimentering gelei, om bv. die mondeienskappe van grootste belang te identifiseer, om ’n gepaste viseemlengte vas te stel, en om ’n effektiewe interpolasiemetode te vind wat koartikulasie bewaar. Ons evalueer die werksverrigting van ons visuele spraaksintesemodel deur die kwadraatfout tussen die gesintetiseerde en gemete eienskaptrajekte te bereken. Tydens perseptuele toetse is deelnemers gevra om die geloofwaardigheid van virtuele karakters, aangedryf deur die modeluittrees, te beoordeel.
Beide gemete en perseptuele toetse het aangedui dat die voorgestelde tegnieke ’n duidelike verbetering bo ’n geskikte basislynmeting toon. Die doel van hierdie navorsing is om spraaksintese meer toeganklik te maak. Om hierdie rede is die gespreksagente gebou op die vrylik beskikbare MakeHuman- en Blendersagtewarekomponente. Die pasgemaakte mondeienskap-bewegingsaftaster is ook eenvoudig om te herproduseer, en benodig slegs verbruikersgraad-opneemtoerusting. 2016-03-09T14:54:21Z 2016-03-09T14:54:21Z 2016-03 Thesis http://hdl.handle.net/10019.1/98728 en_ZA Stellenbosch University xiii, 88 pages : illustrations application/pdf Stellenbosch : Stellenbosch University
title Improving visual speech synthesis using Decision Tree Models
topic UCTD
Speech synthesis
Speech processing systems
Computer animation
Speech processing systems -- Digital techniques
Decision trees
url http://hdl.handle.net/10019.1/98728