Latent code manipulation for text-to-image and video synthesis: evaluating generative networks

Masiya, E. 2025. Latent Code Manipulation for Text-to-Image and Video Synthesis: Evaluating Generative Networks. Unpublished masters thesis. Stellenbosch: Stellenbosch University [online]. Available: https://scholar.sun.ac.za/items/c8269505-2925-45f9-bb22-359d6ac30e7a

Bibliographic Details
Main Author: Masiya, Elvis
Other Authors: Ngxande, M.
Format: Thesis
Language: English
Published: Stellenbosch : Stellenbosch University 2025
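Subjects: Distributed artificial intelligence; Generative adversarial networks (Computer networks); Human-computer interaction; Computer vision; Natural language processing (Computer science); UCTD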
_version_ 1867614052711137280
access_status_str Open Access
author Masiya, Elvis
author2 Ngxande, M.
author_browse Masiya, Elvis
Ngxande, M.
author_facet Ngxande, M.
Masiya, Elvis
author_sort Masiya, Elvis
collection Thesis
dc_rights_str_mv Stellenbosch University
description Masiya, E. 2025. Latent Code Manipulation for Text-to-Image and Video Synthesis: Evaluating Generative Networks. Unpublished masters thesis. Stellenbosch: Stellenbosch University [online]. Available: https://scholar.sun.ac.za/items/c8269505-2925-45f9-bb22-359d6ac30e7a
format Thesis
id oai:scholar.sun.ac.za:10019.1/132632
institution Stellenbosch University (South Africa)
language English
last_indexed 2026-06-10T12:45:54.519Z
license_str Other — see source repository
provenance_str_mv Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository
publishDate 2025
publishDateRange 2025
publishDateSort 2025
publisher Stellenbosch : Stellenbosch University
publisherStr Stellenbosch : Stellenbosch University
record_format dspace
source_str SUNScholar — Stellenbosch University Repository
spelling oai:scholar.sun.ac.za:10019.1/132632 Latent code manipulation for text-to-image and video synthesis: evaluating generative networks Masiya, Elvis Ngxande, M. Stellenbosch University. Faculty of Science. Dept. of Computer Science. Distributed artificial intelligence Generative adversarial networks (Computer networks) Human-computer interaction Computer vision Natural language processing (Computer science) UCTD Masiya, E. 2025. Latent Code Manipulation for Text-to-Image and Video Synthesis: Evaluating Generative Networks. Unpublished masters thesis. Stellenbosch: Stellenbosch University [online]. Available: https://scholar.sun.ac.za/items/c8269505-2925-45f9-bb22-359d6ac30e7a Thesis (MSc)--Stellenbosch University, 2025. ENGLISH ABSTRACT: Text-to-image and text-to-video generation are increasingly important in artificial intelligence, enabling the creation of visual content from textual descriptions. However, existing techniques such as Generative Adversarial Networks (GANs) often face limitations in generating high-fidelity and contextually accurate images and videos. This research explores the application of diffusion models, specifically the state-of-the-art Stable Diffusion XL (SDXL), to overcome these challenges. A systematic comparative analysis is performed between diffusion models and GANs, using StyleGAN2 as the representative GAN model. The evaluation focuses on the ability of these models to adapt to variations in textual input and accurately represent contextual information. Experiments involve extensive benchmarking on synthetic datasets designed to test various aspects of visual generation quality, including fidelity, diversity, and contextual relevance. The results indicate that diffusion models significantly outperform GANs in generating higher-quality images and videos with enhanced contextual accuracy.
Diffusion models demonstrate superior adaptability to complex textual inputs and produce visuals that more accurately reflect the intended content. Nevertheless, challenges related to computational efficiency and scalability are identified, suggesting areas for further optimisation. These findings underscore the potential of diffusion models to advance the field of creative automation and improve human-computer interaction. The research contributes to the broader domains of computer vision and natural language processing by providing insights into the practical applications and limitations of diffusion models for visual generation technologies. It establishes a foundation for future work aimed at addressing existing challenges and fully realising the capabilities of text-to-image and text-to-video synthesis. AFRIKAANSE OPSOMMING: Teks-na-beeld en teks-na-video-generering speel ’n toenemend belangrike rol in kunsmatige intelligensie, aangesien dit die skepping van visuele inhoud vanaf teksbeskrywings moontlik maak. Bestaande tegnieke, soos Generatiewe Adversariese Netwerke (GANs), het egter dikwels beperkings ten opsigte van die generering van hoëtrou- en kontekstueel akkurate beelde en video’s. Hierdie navorsing ondersoek die toepassing van diffusiemodelle, spesifiek die toonaangewende Stable Diffusion XL (SDXL), om hierdie uitdagings te oorkom. ’n Sistematiese vergelykende analise word uitgevoer tussen diffusiemodelle en GANs, met StyleGAN2 as die verteenwoordigende GAN-model. Die evaluering fokus op die vermoë van hierdie modelle om aan te pas by variasies in teksinvoer en om kontekstuele inligting akkuraat weer te gee. Eksperimente sluit uitgebreide benchmarking in op sintetiese datastelle wat ontwerp is om verskeie aspekte van visuele genereringskwaliteit te toets, insluitend getrouheid, diversiteit en kontekstuele relevansie.
Die resultate toon dat diffusiemodelle GANs beduidend oortref in die generering van hoër kwaliteit beelde en video’s met verbeterde kontekstuele akkuraatheid. Diffusiemodelle demonstreer uitstekende aanpasbaarheid by komplekse teksinsette en produseer visuele inhoud wat die beoogde inhoud meer akkuraat weerspieël. Nietemin word uitdagings met betrekking tot berekeningseffektiwiteit en skaalbaarheid geïdentifiseer, wat gebiede vir verdere optimalisering voorstel. Hierdie bevindinge beklemtoon die potensiaal van diffusiemodelle om die veld van kreatiewe outomatisering te bevorder en mens-rekenaarinteraksie te verbeter. Die navorsing dra by tot die breër domeine van rekenaarvisie en natuurlike taalverwerking deur insigte te bied oor die praktiese toepassings en beperkings van diffusiemodelle vir visuele genereringstegnologieë. Dit vestig ’n grondslag vir toekomstige werk wat gemik is op die aanspreek van bestaande uitdagings en die volle benutting van die vermoëns van teks-na-beeld- en teks-na-video-sintese. Masters 2025-06-12T07:05:19Z 2025-06-12T07:05:19Z 2025-03 Thesis https://scholar.sun.ac.za/handle/10019.1/132632 en Stellenbosch University xxi, 145 pages : illustrations application/pdf Stellenbosch : Stellenbosch University
spellingShingle Distributed artificial intelligence
Generative adversarial networks (Computer networks)
Human-computer interaction
Computer vision
Natural language processing (Computer science)
UCTD
Masiya, Elvis
Latent code manipulation for text-to-image and video synthesis: evaluating generative networks
title Latent code manipulation for text-to-image and video synthesis: evaluating generative networks
title_full Latent code manipulation for text-to-image and video synthesis: evaluating generative networks
title_fullStr Latent code manipulation for text-to-image and video synthesis: evaluating generative networks
title_full_unstemmed Latent code manipulation for text-to-image and video synthesis: evaluating generative networks
title_short Latent code manipulation for text-to-image and video synthesis: evaluating generative networks
title_sort latent code manipulation for text to image and video synthesis evaluating generative networks
topic Distributed artificial intelligence
Generative adversarial networks (Computer networks)
Human-computer interaction
Computer vision
Natural language processing (Computer science)
UCTD
url https://scholar.sun.ac.za/handle/10019.1/132632
work_keys_str_mv AT masiyaelvis latentcodemanipulationfortexttoimageandvideosynthesisevaluatinggenerativenetworks