Full Text Available

A computer vision framework towards automated scene understanding & analysis

It is well known that recent advancements in the domain of artificial intelligence and the increased capability of computer hardware have significantly advanced the field of computer vision – a field of study which enables computers to “see” and extract meaningful information from visual inputs, sim...

Bibliographic Details
Main Author:	Sarah-lee de Greeff
Format:	Thesis
Language:	English
Published:	2025

_version_	1867613936560373760
access_status_str	Open Access
author	Sarah-lee de Greeff
author_browse	Sarah-lee de Greeff
author_facet	Sarah-lee de Greeff
author_sort	Sarah-lee de Greeff
collection	Thesis
description	It is well known that recent advancements in the domain of artificial intelligence and the increased capability of computer hardware have significantly advanced the field of computer vision – a field of study which enables computers to “see” and extract meaningful information from visual inputs, similar to human perception. A prominent application area within the domain of computer vision is scene understanding. Various powerful approaches towards scene understanding employ computer vision tasks to extrapolate semantic information about scenes, allowing computers to understand relationships between objects and their environments. Such computer vision tasks include object detection, recognition, tracking, pose estimation, and contextual reasoning. Most computer vision algorithms are deep-learning-based approaches but differ significantly in architecture. The computer vision tasks investigated in this thesis utilise architectures consisting of backbone, neck, and head components, as well as alternative transformer architectures. Although computer vision applications are diverse, there remain fields that have not yet fully benefited from these developments. One such field is energy auditing – a process undertaken to evaluate and improve the energy management of buildings. In this thesis, a proof-of-concept framework is developed, capable of extracting information regarding appliances present in a given building scene or environment by employing object detection and object tracking tasks. The objective of the proposed framework is to train various object detection models and recommend the best-performing model for further implementation, in conjunction with object tracking models, to analyse video footage of environments needing to be audited. The framework facilitates the processing of raw data, training of object detection models with respect to the proposed data, and the deployment of the trained model with respect to unseen video footage.
A structured literature review is conducted in this thesis to investigate the pertinent literature related to computer vision applications within the energy auditing domain. The fundamentals of deep learning, computer vision and energy auditing are also explored. The proposed framework is first applied to a subset of a publicly accepted benchmark dataset to verify its correct functioning. Subsequently, to further assess the framework’s performance and applicability, it is applied to a novel case study dataset provided by an industry partner, containing images of appliances common in an educational institution. The framework facilitates hyperparameter tuning to determine the best parameters for each model being trained. The best-performing model, RTDeTR, is then utilised to detect and track appliances of interest, providing information regarding the number of appliances present. The information obtained by the models is essential for the environment’s energy consumption computation.
format	Thesis
id	oai:scholar.sun.ac.za:10019.1/131922
institution	Stellenbosch University (South Africa)
language	English
last_indexed	2026-06-10T12:44:04.029Z
license_str	Not specified — see source repository
provenance_str_mv	Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository
publishDate	2025
publishDateRange	2025
publishDateSort	2025
record_format	dspace
source_str	SUNScholar — Stellenbosch University Repository
spelling	oai:scholar.sun.ac.za:10019.1/131922 A computer vision framework towards automated scene understanding & analysis Sarah-lee de Greeff It is well-known that recent advancements in the domain of artificial intelligence and the increased capability of computer hardware have significantly advanced the field of computer vision – a field of study which enables computers to “see” and extract meaningful information from visual inputs, similar to human perception. A prominent application area within the domain of computer vision is scene understanding. Various powerful approaches towards scene understanding employ computer vision tasks to extrapolate semantic information about scenes, allowing computers to understand relationships between objects and their environments. Such computer vision tasks include object detection, recognition, tracking, pose estimation, and contextual reasoning. Most computer vision algorithms are deep learning based approaches but differ significantly in architecture. The computer vision tasks investigated in this thesis utilise architectures consisting of backbone, neck, and head architecture as well as alternative transformer architectures. Although computer vision applications are diverse, there remain fields that have not yet fully benefited from these developments. One such field is energy auditing – a process undertaken to evaluate and improve the energy management of buildings. In this thesis, a proof-of-concept framework is developed, capable of extracting information regarding appliances present in a given building scene or environment by employing object detection and object tracking tasks. The objective of the proposed framework is to train various object detection models and recommend the best-performing model for further implementation, in conjunction with object tracking models, to analyse video footage of environments needing to be audited. 
The framework facilitates the processing of raw data, training of object detection models with respect to the proposed data, and the deployment of the trained model with respect to unseen video footage. A structured literature review is conducted in this thesis to investigate the pertinent literature related to computer vision applications within the energy auditing domain. The fundamentals of deep learning, computer vision and energy auditing are also explored. The proposed framework is first applied to a subset of a publicly accepted benchmark dataset to verify its correct functioning. Subsequently, to further assess the framework’s performance and applicability, it is applied to a novel case study dataset provided by an industry partner, containing images of appliances common in an educational institution. The framework facilitates hyperparameter tuning to determine the best parameters for each model being trained. The best-performing model, RTDeTR, is then utilised to detect and track appliances of interest, providing information regarding the number of appliances present. The information attained by the models is essential for the environment’s energy consumption computation. 2025-04-23T14:25:09Z 2025-04-23T14:25:09Z 2025-03 Thesis https://scholar.sun.ac.za/handle/10019.1/131922 en application/pdf
spellingShingle	Sarah-lee de Greeff A computer vision framework towards automated scene understanding & analysis
title	A computer vision framework towards automated scene understanding & analysis
title_full	A computer vision framework towards automated scene understanding & analysis
title_fullStr	A computer vision framework towards automated scene understanding & analysis
title_full_unstemmed	A computer vision framework towards automated scene understanding & analysis
title_short	A computer vision framework towards automated scene understanding & analysis
title_sort	computer vision framework towards automated scene understanding analysis
url	https://scholar.sun.ac.za/handle/10019.1/131922
work_keys_str_mv	AT sarahleedegreeff acomputervisionframeworktowardsautomatedsceneunderstandinganalysis AT sarahleedegreeff computervisionframeworktowardsautomatedsceneunderstandinganalysis
