Masiya, E. 2025. Latent Code Manipulation for Text-to-Image and Video Synthesis: Evaluating Generative Networks. Unpublished masters thesis. Stellenbosch: Stellenbosch University [online]. Available: https://scholar.sun.ac.za/items/c8269505-2925-45f9-bb22-359d6ac30e7a
| Main Author: | Masiya, Elvis |
|---|---|
| Other Authors: | Ngxande, M. |
| Format: | Thesis |
| Language: | English |
| Published: | Stellenbosch : Stellenbosch University, 2025 |
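| Subjects: | Distributed artificial intelligence; Generative adversarial networks (Computer networks); Human-computer interaction; Computer vision; Natural language processing (Computer science); UCTD |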
| _version_ | 1867614052711137280 |
|---|---|
| access_status_str | Open Access |
| author | Masiya, Elvis |
| author2 | Ngxande, M. |
| author_browse | Masiya, Elvis Ngxande, M. |
| author_facet | Ngxande, M. Masiya, Elvis |
| author_sort | Masiya, Elvis |
| collection | Thesis |
| dc_rights_str_mv | Stellenbosch University |
| description | Masiya, E. 2025. Latent Code Manipulation for Text-to-Image and Video Synthesis: Evaluating Generative Networks. Unpublished masters thesis. Stellenbosch: Stellenbosch University [online]. Available: https://scholar.sun.ac.za/items/c8269505-2925-45f9-bb22-359d6ac30e7a |
| format | Thesis |
| id | oai:scholar.sun.ac.za:10019.1/132632 |
| institution | Stellenbosch University (South Africa) |
| language | English |
| last_indexed | 2026-06-10T12:45:54.519Z |
| license_str | Other — see source repository |
| provenance_str_mv | Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository |
| publishDate | 2025 |
| publishDateRange | 2025 |
| publishDateSort | 2025 |
| publisher | Stellenbosch : Stellenbosch University |
| publisherStr | Stellenbosch : Stellenbosch University |
| record_format | dspace |
| source_str | SUNScholar — Stellenbosch University Repository |
| spelling | oai:scholar.sun.ac.za:10019.1/132632 Latent code manipulation for text-to-image and video synthesis: evaluating generative networks Masiya, Elvis Ngxande, M. Stellenbosch University. Faculty of Science. Dept. of Computer Science. Distributed artificial intelligence Generative adversarial networks (Computer networks) Human-computer interaction Computer vision Natural language processing (Computer science) UCTD Masiya, E. 2025. Latent Code Manipulation for Text-to-Image and Video Synthesis: Evaluating Generative Networks. Unpublished masters thesis. Stellenbosch: Stellenbosch University [online]. Available: https://scholar.sun.ac.za/items/c8269505-2925-45f9-bb22-359d6ac30e7a Thesis (MSc)--Stellenbosch University, 2025. ENGLISH ABSTRACT: Text-to-image and text-to-video generation are increasingly important in artificial intelligence, enabling the creation of visual content from textual descriptions. However, existing techniques such as Generative Adversarial Networks (GANs) often face limitations in generating high-fidelity and contextually accurate images and videos. This research explores the application of diffusion models, specifically the state-of-the-art Stable Diffusion XL (SDXL), to overcome these challenges. A systematic comparative analysis is performed between diffusion models and GANs, using StyleGAN2 as the representative GAN model. The evaluation focuses on the ability of these models to adapt to variations in textual input and accurately represent contextual information. Experiments involve extensive benchmarking on synthetic datasets designed to test various aspects of visual generation quality, including fidelity, diversity, and contextual relevance. The results indicate that diffusion models significantly outperform GANs in generating higher-quality images and videos with enhanced contextual accuracy.
Diffusion models demonstrate superior adaptability to complex textual inputs and produce visuals that more accurately reflect the intended content. Nevertheless, challenges related to computational efficiency and scalability are identified, suggesting areas for further optimisation. These findings underscore the potential of diffusion models to advance the field of creative automation and improve human-computer interaction. The research contributes to the broader domains of computer vision and natural language processing by providing insights into the practical applications and limitations of diffusion models for visual generation technologies. It establishes a foundation for future work aimed at addressing existing challenges and fully realising the capabilities of text-to-image and text-to-video synthesis. AFRIKAANSE OPSOMMING: Teks-na-beeld en teks-na-video-generering speel ’n toenemend belangrike rol in kunsmatige intelligensie, aangesien dit die skepping van visuele inhoud vanaf teksbeskrywings moontlik maak. Bestaande tegnieke, soos Generatiewe Adversariese Netwerke (GANs), het egter dikwels beperkings ten opsigte van die generering van hoëtrou- en kontekstueel akkurate beelde en video’s. Hierdie navorsing ondersoek die toepassing van diffusiemodelle, spesifiek die toonaangewende Stable Diffusion XL (SDXL), om hierdie uitdagings te oorkom. ’n Sistematiese vergelykende analise word uitgevoer tussen diffusiemodelle en GANs, met StyleGAN2 as die verteenwoordigende GAN-model. Die evaluering fokus op die vermoë van hierdie modelle om aan te pas by variasies in teksinvoer en om kontekstuele inligting akkuraat weer te gee. Eksperimente sluit uitgebreide benchmarking in op sintetiese datastelle wat ontwerp is om verskeie aspekte van visuele genereringskwaliteit te toets, insluitend getrouheid, diversiteit en kontekstuele relevansie.
Die resultate toon dat diffusiemodelle GANs beduidend oortref in die generering van hoër kwaliteit beelde en video’s met verbeterde kontekstuele akkuraatheid. Diffusiemodelle demonstreer uitstekende aanpasbaarheid by komplekse teksinsette en produseer visuele inhoud wat die beoogde inhoud meer akkuraat weerspieël. Nietemin word uitdagings met betrekking tot berekeningseffektiwiteit en skaalbaarheid geïdentifiseer, wat gebiede vir verdere optimalisering voorstel. Hierdie bevindinge beklemtoon die potensiaal van diffusiemodelle om die veld van kreatiewe outomatisering te bevorder en mens-rekenaarinteraksie te verbeter. Die navorsing dra by tot die breër domeine van rekenaarvisie en natuurlike taalverwerking deur insigte te bied oor die praktiese toepassings en beperkings van diffusiemodelle vir visuele genereringstegnologieë. Dit vestig ’n grondslag vir toekomstige werk wat gemik is op die aanspreek van bestaande uitdagings en die volle benutting van die vermoëns van teks-na-beeld- en teks-na-video-sintese. Masters 2025-06-12T07:05:19Z 2025-06-12T07:05:19Z 2025-03 Thesis https://scholar.sun.ac.za/handle/10019.1/132632 en Stellenbosch University xxi, 145 pages : illustrations application/pdf Stellenbosch : Stellenbosch University |
| spellingShingle | Distributed artificial intelligence Generative adversarial networks (Computer networks) Human-computer interaction Computer vision Natural language processing (Computer science) UCTD Masiya, Elvis Latent code manipulation for text-to-image and video synthesis: evaluating generative networks |
| title | Latent code manipulation for text-to-image and video synthesis: evaluating generative networks |
| title_full | Latent code manipulation for text-to-image and video synthesis: evaluating generative networks |
| title_fullStr | Latent code manipulation for text-to-image and video synthesis: evaluating generative networks |
| title_full_unstemmed | Latent code manipulation for text-to-image and video synthesis: evaluating generative networks |
| title_short | Latent code manipulation for text-to-image and video synthesis: evaluating generative networks |
| title_sort | latent code manipulation for text to image and video synthesis evaluating generative networks |
| topic | Distributed artificial intelligence Generative adversarial networks (Computer networks) Human-computer interaction Computer vision Natural language processing (Computer science) UCTD |
| url | https://scholar.sun.ac.za/handle/10019.1/132632 |
| work_keys_str_mv | AT masiyaelvis latentcodemanipulationfortexttoimageandvideosynthesisevaluatinggenerativenetworks |