A computer vision framework towards automated scene understanding & analysis

de Greeff, S. 2025. A computer vision framework towards automated scene understanding & analysis. Unpublished masters thesis. Stellenbosch: Stellenbosch University [online]. Available: https://scholar.sun.ac.za/items/8523bc56-ade0-43cb-9a6c-6b55d015c216

Bibliographic Details
Main Author: De Greeff, Sarah-lee
Other Authors: Grobler, Jacomine
Format: Thesis
Published: Stellenbosch : Stellenbosch University 2025
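Subjects: Computer vision -- Industrial applications; Pattern recognition systems; Tracking (Engineering); Artificial intelligence -- Data processing; UCTD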
access_status_str Open Access
collection Thesis
dc_rights_str_mv Stellenbosch University
id oai:scholar.sun.ac.za:10019.1/132130
institution Stellenbosch University (South Africa)
last_indexed 2026-06-10T12:44:09.875Z
license_str Other — see source repository
provenance_str_mv Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository
publishDate 2025
publisher Stellenbosch : Stellenbosch University
record_format dspace
source_str SUNScholar — Stellenbosch University Repository
Contributors: De Greeff, Sarah-lee; Grobler, Jacomine; Samuels, J. A.
Department: Stellenbosch University. Faculty of Engineering. Dept. of Industrial Engineering.
Degree: Thesis (MEng)--Stellenbosch University, 2025.

ENGLISH ABSTRACT: Recent advances in artificial intelligence and the increased capability of computer hardware have significantly advanced the field of computer vision, a field of study that enables computers to “see” and extract meaningful information from visual inputs, similar to human perception. A prominent application area within computer vision is scene understanding. Various powerful approaches to scene understanding employ computer vision tasks to extract semantic information about scenes, allowing computers to understand relationships between objects and their environments. Such tasks include object detection, recognition, tracking, pose estimation, and contextual reasoning. Most computer vision algorithms are deep-learning-based approaches but differ significantly in architecture. The computer vision tasks investigated in this thesis use architectures consisting of a backbone, neck, and head, as well as alternative transformer architectures. Although computer vision applications are diverse, there remain fields that have not yet fully benefited from these developments.
One such field is energy auditing, a process undertaken to evaluate and improve the energy management of buildings. In this thesis, a proof-of-concept framework is developed that extracts information about the appliances present in a given building scene or environment by employing object detection and object tracking. The objective of the proposed framework is to train various object detection models and recommend the best-performing model for further implementation, in conjunction with object tracking models, to analyse video footage of environments that need to be audited. The framework facilitates the processing of raw data, the training of object detection models on the proposed data, and the deployment of the trained model on unseen video footage. A structured literature review is conducted to investigate the pertinent literature on computer vision applications within the energy auditing domain. The fundamentals of deep learning, computer vision, and energy auditing are also explored. The proposed framework is first applied to a subset of a widely accepted benchmark dataset to verify that it functions correctly. Subsequently, to further assess the framework’s performance and applicability, it is applied to a novel case study dataset provided by an industry partner, containing images of appliances common in an educational institution. The framework facilitates hyperparameter tuning to determine the best parameters for each model being trained. The best-performing model, RT-DETR, is then used to detect and track appliances of interest, providing the number of appliances present. This information is essential for computing the environment’s energy consumption. Furthermore, the result is used to compute the total energy consumption of the space captured in six different videos.
The results are validated by a subject matter expert, demonstrating the practical utility of the framework in real-world energy auditing applications.

AFRIKAANS ABSTRACT (translated into English): It is well known that recent progress in the field of artificial intelligence and the increased capacity of computer hardware have considerably advanced the field of computer vision, a field of study that enables computers to “see” and to extract meaningful information from visual inputs, similar to human perception. A prominent application area within the domain of computer vision is scene understanding. Several powerful approaches to scene understanding use computer vision tasks to extract semantic information about scenes, which enables computers to understand relationships between objects and their environments. Such computer vision tasks include object detection, recognition, tracking, pose estimation and contextual reasoning. Most computer vision algorithms are deep-learning-based approaches, but differ considerably in architecture. The computer vision tasks investigated in this thesis use architectures consisting of a backbone, neck and head, as well as alternative transformer architectures. Although computer vision applications are diverse, there remain fields that have not yet fully benefited from these developments. One such field is energy auditing, a process undertaken to evaluate and improve the energy management of buildings. In this thesis a proof-of-concept framework is developed that can extract information about appliances present in a given building scene or environment by using object detection and object tracking tasks.
The aim of the proposed framework is to train several object detection models and to recommend the best-performing model for further implementation, together with object tracking models, to analyse video footage of environments that must be audited. The framework facilitates the processing of raw data, the training of object detection models on the proposed data, and the deployment of the trained model on unseen video footage. A structured literature review is carried out in this thesis to investigate the relevant literature on computer vision applications within the energy auditing domain. The fundamentals of deep learning, computer vision and energy auditing are also explored. The proposed framework is first applied to a subset of a generally accepted benchmark dataset to verify that it works correctly. Subsequently, to further assess the framework’s performance and applicability, it is applied to a new case study dataset provided by an industry partner, containing images of appliances that commonly occur in an educational institution. The framework facilitates hyperparameter tuning to determine the best parameters for each model being trained. The best-performing model is then used to detect and track appliances of interest, providing information about the number of appliances present. The information obtained by the models is essential for calculating the environment’s energy consumption. Furthermore, the result is used to calculate the total energy consumption of the space, with the results validated by a subject matter expert, which demonstrates the practical usefulness of the framework in real energy auditing applications.
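The abstract describes counting tracked appliances and using that count to compute energy consumption. The sketch below illustrates one way such a step could look; it is not the thesis's implementation. The input row format, the `min_frames` noise filter, the class names, the power ratings and the usage hours are all hypothetical values chosen for illustration.

```python
from collections import defaultdict


def count_appliances(rows, min_frames=2):
    """Count distinct track IDs per class.

    rows: iterable of (frame_index, track_id, class_label) tuples, a
    hypothetical tracker output format assumed for this sketch.
    Tracks seen in fewer than min_frames frames are ignored as likely noise.
    """
    frames_per_track = defaultdict(set)
    label_of_track = {}
    for frame, track_id, label in rows:
        frames_per_track[track_id].add(frame)
        # Keep the first label seen for a track (an assumption of this sketch).
        label_of_track.setdefault(track_id, label)

    counts = defaultdict(int)
    for track_id, frames in frames_per_track.items():
        if len(frames) >= min_frames:
            counts[label_of_track[track_id]] += 1
    return dict(counts)


def energy_kwh(counts, rated_watts, hours_per_day):
    """Daily energy in kWh: sum of count * rated power (W) * hours / 1000."""
    return sum(
        n * rated_watts[label] * hours_per_day[label] / 1000.0
        for label, n in counts.items()
    )


# Illustrative data only: three persistent tracks and one single-frame track.
rows = [
    (0, 1, "monitor"), (1, 1, "monitor"), (2, 1, "monitor"),
    (0, 2, "monitor"), (1, 2, "monitor"),
    (1, 3, "projector"), (2, 3, "projector"),
    (2, 4, "monitor"),  # seen in one frame only, filtered out
]
counts = count_appliances(rows)
daily = energy_kwh(
    counts,
    rated_watts={"monitor": 30.0, "projector": 250.0},   # assumed ratings
    hours_per_day={"monitor": 8.0, "projector": 4.0},    # assumed usage
)
print(counts, round(daily, 2))
```

Counting unique track IDs rather than per-frame detections avoids counting the same appliance once for every frame in which it appears, which is presumably why the abstract pairs detection with tracking.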
Masters; 2025-05-27T08:09:07Z; 2025-03; Thesis; https://scholar.sun.ac.za/handle/10019.1/132130; Stellenbosch University; xxii, 166 pages : illustrations; application/pdf; Stellenbosch : Stellenbosch University