Full Text Available

Latent code manipulation for text-to-image and video synthesis: evaluating generative networks

Masiya, E. 2025. Latent Code Manipulation for Text-to-Image and Video Synthesis: Evaluating Generative Networks. Unpublished masters thesis. Stellenbosch: Stellenbosch University [online]. Available: https://scholar.sun.ac.za/items/c8269505-2925-45f9-bb22-359d6ac30e7a

Bibliographic Details
Main Author:	Masiya, Elvis
Other Authors:	Ngxande, M.
Format:	Thesis
Language:	English
Published:	Stellenbosch : Stellenbosch University 2025
Subjects:	Distributed artificial intelligence Generative adversarial networks (Computer networks) Human-computer interaction Computer vision Natural language processing (Computer science) UCTD

_version_	1867614052711137280
access_status_str	Open Access
author	Masiya, Elvis
author2	Ngxande, M.
author_browse	Masiya, Elvis Ngxande, M.
author_facet	Ngxande, M. Masiya, Elvis
author_sort	Masiya, Elvis
collection	Thesis
dc_rights_str_mv	Stellenbosch University
description	Masiya, E. 2025. Latent Code Manipulation for Text-to-Image and Video Synthesis: Evaluating Generative Networks. Unpublished masters thesis. Stellenbosch: Stellenbosch University [online]. Available: https://scholar.sun.ac.za/items/c8269505-2925-45f9-bb22-359d6ac30e7a
format	Thesis
id	oai:scholar.sun.ac.za:10019.1/132632
institution	Stellenbosch University (South Africa)
language	English
last_indexed	2026-06-10T12:45:54.519Z
license_str	Other — see source repository
provenance_str_mv	Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository
publishDate	2025
publishDateRange	2025
publishDateSort	2025
publisher	Stellenbosch : Stellenbosch University
publisherStr	Stellenbosch : Stellenbosch University
record_format	dspace
source_str	SUNScholar — Stellenbosch University Repository
spelling	oai:scholar.sun.ac.za:10019.1/132632 Latent code manipulation for text-to-image and video synthesis: evaluating generative networks Masiya, Elvis Ngxande, M. Stellenbosch University. Faculty of Science. Dept. of Computer Science. Distributed artificial intelligence Generative adversarial networks (Computer networks) Human-computer interaction Computer vision Natural language processing (Computer science) UCTD Masiya, E. 2025. Latent Code Manipulation for Text-to-Image and Video Synthesis: Evaluating Generative Networks. Unpublished masters thesis. Stellenbosch: Stellenbosch University [online]. Available: https://scholar.sun.ac.za/items/c8269505-2925-45f9-bb22-359d6ac30e7a Thesis (MSc)--Stellenbosch University, 2025. ENGLISH ABSTRACT: Text-to-image and text-to-video generation are increasingly important in artificial intelligence, enabling the creation of visual content from textual descriptions. However, existing techniques such as Generative Adversarial Networks (GANs) often face limitations in generating high-fidelity and contextually accurate images and videos. This research explores the application of diffusion models, specifically the state-of-the-art Stable Diffusion XL (SDXL), to overcome these challenges. A systematic comparative analysis is performed between diffusion models and GANs, using StyleGAN2 as the representative GAN model. The evaluation focuses on the ability of these models to adapt to variations in textual input and accurately represent contextual information. Experiments involve extensive benchmarking on synthetic datasets designed to test various aspects of visual generation quality, including fidelity, diversity, and contextual relevance. The results indicate that diffusion models significantly outperform GANs in generating higher-quality images and videos with enhanced contextual accuracy. 
Diffusion models demonstrate superior adaptability to complex textual inputs and produce visuals that more accurately reflect the intended content. Nevertheless, challenges related to computational efficiency and scalability are identified, suggesting areas for further optimisation. These findings underscore the potential of diffusion models to advance the field of creative automation and improve human-computer interaction. The research contributes to the broader domains of computer vision and natural language processing by providing insights into the practical applications and limitations of diffusion models for visual generation technologies. It establishes a foundation for future work aimed at addressing existing challenges and fully realising the capabilities of text-to-image and text-to-video synthesis. AFRIKAANSE OPSOMMING: Teks-na-beeld en teks-na-video-generering speel ’n toenemend belangrike rol in kunsmatige intelligensie, aangesien dit die skepping van visuele inhoud vanaf teksbeskrywings moontlik maak. Bestaande tegnieke, soos Generatiewe Adversariese Netwerke (GANs), het egter dikwels beperkings ten opsigte van die generering van hoëtrou- en kontekstueel akkurate beelde en video’s. Hierdie navorsing ondersoek die toepassing van diffusiemodelle, spesifiek die toonaangewende Stable Diffusion XL (SDXL), om hierdie uitdagings te oorkom. ’n Sistematiese vergelykende analise word uitgevoer tussen diffusiemodelle en GANs, met StyleGAN2 as die verteenwoordigende GAN-model. Die evaluering fokus op die vermoë van hierdie modelle om aan te pas by variasies in teksinvoer en om kontekstuele inligting akkuraat weer te gee. Eksperimente sluit uitgebreide benchmarking in op sintetiese datastelle wat ontwerp is om verskeie aspekte van visuele genereringskwaliteit te toets, insluitend getrouheid, diversiteit en kontekstuele relevansie. 
Die resultate toon dat diffusiemodelle GANs beduidend oortref in die generering van hoër kwaliteit beelde en video’s met verbeterde kontekstuele akkuraatheid. Diffusiemodelle demonstreer uitstekende aanpasbaarheid by komplekse teksinsette en produseer visuele inhoud wat die beoogde inhoud meer akkuraat weerspieël. Nietemin word uitdagings met betrekking tot berekeningseffektiwiteit en skaalbaarheid geïdentifiseer, wat gebiede vir verdere optimalisering voorstel. Hierdie bevindinge beklemtoon die potensiaal van diffusiemodelle om die veld van kreatiewe outomatisering te bevorder en mens-rekenaarinteraksie te verbeter. Die navorsing dra by tot die breër domeine van rekenaarvisie en natuurlike taalverwerking deur insigte te bied oor die praktiese toepassings en beperkings van diffusiemodelle vir visuele genereringstegnologieë. Dit vestig ’n grondslag vir toekomstige werk wat gemik is op die aanspreek van bestaande uitdagings en die volle benutting van die vermoëns van teks-na-beeld- en teks-na-video-sintese. Masters 2025-06-12T07:05:19Z 2025-06-12T07:05:19Z 2025-03 Thesis https://scholar.sun.ac.za/handle/10019.1/132632 en Stellenbosch University xxi, 145 pages : illustrations application/pdf Stellenbosch : Stellenbosch University
spellingShingle	Distributed artificial intelligence Generative adversarial networks (Computer networks) Human-computer interaction Computer vision Natural language processing (Computer science) UCTD Masiya, Elvis Latent code manipulation for text-to-image and video synthesis: evaluating generative networks
title	Latent code manipulation for text-to-image and video synthesis: evaluating generative networks
title_full	Latent code manipulation for text-to-image and video synthesis: evaluating generative networks
title_fullStr	Latent code manipulation for text-to-image and video synthesis: evaluating generative networks
title_full_unstemmed	Latent code manipulation for text-to-image and video synthesis: evaluating generative networks
title_short	Latent code manipulation for text-to-image and video synthesis: evaluating generative networks
title_sort	latent code manipulation for text to image and video synthesis evaluating generative networks
topic	Distributed artificial intelligence Generative adversarial networks (Computer networks) Human-computer interaction Computer vision Natural language processing (Computer science) UCTD
url	https://scholar.sun.ac.za/handle/10019.1/132632
work_keys_str_mv	AT masiyaelvis latentcodemanipulationfortexttoimageandvideosynthesisevaluatinggenerativenetworks

Similar Items