A computer vision framework towards automated scene understanding & analysis

Bibliographic Details
Main Author: Sarah-lee de Greeff
Format: Thesis
Language: English
Published: 2025
Access status: Open Access
Collection: Thesis
Description:
Recent advances in artificial intelligence, together with the increased capability of computer hardware, have significantly advanced the field of computer vision: the study of how computers can “see” and extract meaningful information from visual inputs, much as humans do. A prominent application area within computer vision is scene understanding. Many approaches to scene understanding combine computer vision tasks to infer semantic information about a scene, allowing computers to model the relationships between objects and their environment. These tasks include object detection, recognition, tracking, pose estimation, and contextual reasoning. Most current computer vision algorithms are based on deep learning but differ significantly in architecture. The tasks investigated in this thesis use models built from a backbone, a neck, and a head, as well as alternative transformer-based architectures.

Although computer vision applications are diverse, some fields have not yet fully benefited from these developments. One such field is energy auditing, the process of evaluating and improving the energy management of buildings. In this thesis, a proof-of-concept framework is developed that extracts information about the appliances present in a building scene or environment by means of object detection and object tracking. The framework trains various object detection models and recommends the best-performing one for deployment, in conjunction with object tracking models, to analyse video footage of the environments to be audited. It supports the processing of raw data, the training of object detection models on the proposed data, and the deployment of the trained model on unseen video footage.

A structured literature review is conducted to survey the literature on computer vision applications in the energy auditing domain, and the fundamentals of deep learning, computer vision, and energy auditing are also explored. The framework is first applied to a subset of a widely accepted public benchmark dataset to verify that it functions correctly. To further assess its performance and applicability, it is then applied to a novel case study dataset, provided by an industry partner, containing images of appliances common in an educational institution. The framework supports hyperparameter tuning to determine the best parameters for each model being trained. The best-performing model, RT-DETR, is then used to detect and track the appliances of interest and to report how many of each are present. This information is an essential input to computing the environment’s energy consumption.
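The counting step described above can be illustrated with a minimal sketch. It assumes a tracker that gives each appliance a persistent track ID across video frames; the `TrackedDetection` record, the `count_appliances` helper, the class labels, and the `min_confidence` and `min_frames` thresholds are hypothetical illustrations, not the thesis's implementation or any library's API. The point it shows is why tracking matters for counting: counting distinct track IDs, rather than summing per-frame detections, avoids counting an appliance once for every frame in which it is visible.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TrackedDetection:
    # Hypothetical record: one detection in one video frame, after a tracker
    # has assigned it a persistent track ID. Real tracker outputs differ.
    frame: int
    track_id: int
    label: str
    confidence: float


def count_appliances(detections, min_confidence=0.5, min_frames=3):
    """Count distinct appliances per class from tracked detections.

    Each track is counted once, under its most frequent label, provided it
    appears in at least `min_frames` frames with confidence >= `min_confidence`.
    Both thresholds are illustrative assumptions.
    """
    labels_per_track = defaultdict(Counter)  # track_id -> label frequencies
    frames_per_track = defaultdict(set)      # track_id -> frames it appears in
    for det in detections:
        if det.confidence < min_confidence:
            continue  # ignore low-confidence detections
        labels_per_track[det.track_id][det.label] += 1
        frames_per_track[det.track_id].add(det.frame)

    counts = Counter()
    for track_id, frames in frames_per_track.items():
        if len(frames) < min_frames:
            continue  # too short-lived: likely a spurious track
        majority_label, _ = labels_per_track[track_id].most_common(1)[0]
        counts[majority_label] += 1
    return dict(counts)


# Toy input with made-up labels: a monitor seen in 5 frames, a kettle in 4,
# and a one-frame "monitor" blip that the min_frames filter discards.
demo = (
    [TrackedDetection(f, 1, "monitor", 0.9) for f in range(5)]
    + [TrackedDetection(f, 2, "kettle", 0.8) for f in range(2, 6)]
    + [TrackedDetection(4, 3, "monitor", 0.7)]
)
print(count_appliances(demo))  # {'monitor': 1, 'kettle': 1}
```

Summing raw detections over the same toy input would instead give ten. Taking the majority label per track guards against a tracker briefly switching an object's class, and the minimum-frame filter is one simple, assumed way of discarding short-lived spurious tracks.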
Identifier: oai:scholar.sun.ac.za:10019.1/131922
Institution: Stellenbosch University (South Africa)
Last indexed: 2026-06-10T12:44:04.029Z
License: Not specified; see source repository
Provenance: Harvested via OAI-PMH from SUNScholar (Stellenbosch University Repository)
Record format: DSpace
Source: SUNScholar (Stellenbosch University Repository)
Dates: 2025-04-23T14:25:09Z; 2025-03
File format: application/pdf
URL: https://scholar.sun.ac.za/handle/10019.1/131922