| Main Author: | Sarah-lee de Greeff |
|---|---|
| Format: | Thesis |
| Language: | English |
| Published: | 2025 |
| _version_ | 1867613936560373760 |
|---|---|
| access_status_str | Open Access |
| author | Sarah-lee de Greeff |
| collection | Thesis |
| description | It is well-known that recent advancements in the domain of artificial intelligence and the increased
capability of computer hardware have significantly advanced the field of computer vision
– a field of study which enables computers to “see” and extract meaningful information from
visual inputs, similar to human perception.
A prominent application area within the domain of computer vision is scene understanding.
Many effective approaches to scene understanding use computer vision tasks to infer
semantic information about scenes, allowing computers to understand relationships
between objects and their environments. Such computer vision tasks include object detection,
recognition, tracking, pose estimation, and contextual reasoning. Most computer vision
algorithms are deep-learning-based approaches, but they differ significantly in architecture. The computer
vision tasks investigated in this thesis use either architectures composed of a backbone, neck,
and head, or alternative transformer-based architectures.
Although computer vision applications are diverse, there remain fields that have not yet fully
benefited from these developments. One such field is energy auditing – a process undertaken to
evaluate and improve the energy management of buildings.
In this thesis, a proof-of-concept framework is developed, capable of extracting information
regarding appliances present in a given building scene or environment by employing object
detection and object tracking tasks. The framework trains various object detection models and
recommends the best-performing model for further use. That model is then combined with object
tracking models to analyse video footage of environments that need to be audited. The framework
supports the processing of raw data, the training of object detection models on the processed
data, and the deployment of the trained model on unseen video footage.
A structured literature review is conducted in this thesis to investigate the pertinent literature
related to computer vision applications within the energy auditing domain. The fundamentals of
deep learning, computer vision and energy auditing are also explored. The proposed framework is
first applied to a subset of a widely used public benchmark dataset to verify that it functions correctly.
Subsequently, to further assess the framework’s performance and applicability, it is applied to
a novel case study dataset provided by an industry partner, containing images of appliances
common in an educational institution. The framework supports hyperparameter tuning to
determine the best hyperparameters for each model being trained. The best-performing model, RTDeTR,
is then used to detect and track appliances of interest, providing the number of appliances
present. This information is essential for computing the environment’s energy consumption. |
| format | Thesis |
| id | oai:scholar.sun.ac.za:10019.1/131922 |
| institution | Stellenbosch University (South Africa) |
| language | English |
| last_indexed | 2026-06-10T12:44:04.029Z |
| license_str | Not specified — see source repository |
| provenance_str_mv | Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository |
| publishDate | 2025 |
| record_format | dspace |
| source_str | SUNScholar — Stellenbosch University Repository |
| spelling | oai:scholar.sun.ac.za:10019.1/131922 A computer vision framework towards automated scene understanding & analysis Sarah-lee de Greeff 2025-04-23T14:25:09Z 2025-04-23T14:25:09Z 2025-03 Thesis https://scholar.sun.ac.za/handle/10019.1/131922 en application/pdf |
| title | A computer vision framework towards automated scene understanding & analysis |
| url | https://scholar.sun.ac.za/handle/10019.1/131922 |
| work_keys_str_mv | AT sarahleedegreeff acomputervisionframeworktowardsautomatedsceneunderstandinganalysis AT sarahleedegreeff computervisionframeworktowardsautomatedsceneunderstandinganalysis |
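The counting step described in the abstract can be sketched in Python: detect appliances in each frame, link detections across frames, then count the distinct tracks. This is a minimal, hypothetical illustration and not the thesis's implementation. The following are all assumptions:

- the input format, a list of frames where each frame is a list of `(class_name, box)` pairs;
- the function names `iou` and `count_appliances`;
- the 0.5 IoU threshold.

A simple greedy IoU matcher stands in for whatever tracker the thesis pairs with RTDeTR.

```python
from collections import defaultdict

# Hypothetical box format (assumption): (x1, y1, x2, y2) in pixel coordinates.
Box = tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_appliances(frames, iou_threshold=0.5):
    """Greedy IoU tracking sketch (hypothetical helper).

    Each detection is linked to the best-overlapping, not yet claimed track of
    the same class from the previous frame; otherwise a new track is started.
    Returns the number of distinct tracks per class.
    """
    next_id = 0
    prev = []          # (track_id, class_name, box) tuples from the previous frame
    track_class = {}   # track_id -> class_name
    for detections in frames:
        current, used = [], set()
        for cls, box in detections:
            best_id, best_score = None, iou_threshold
            for tid, pcls, pbox in prev:
                if pcls != cls or tid in used:
                    continue
                score = iou(box, pbox)
                if score >= best_score:
                    best_id, best_score = tid, score
            if best_id is None:  # no match above the threshold: new appliance
                best_id = next_id
                next_id += 1
                track_class[best_id] = cls
            used.add(best_id)
            current.append((best_id, cls, box))
        prev = current
    counts = defaultdict(int)
    for cls in track_class.values():
        counts[cls] += 1
    return dict(counts)


# Illustrative (made-up) detections for three consecutive frames.
frames = [
    [("monitor", (10, 10, 50, 40)), ("heater", (100, 100, 140, 160))],
    [("monitor", (12, 11, 52, 41)), ("heater", (101, 99, 141, 159))],
    [("monitor", (13, 12, 53, 42)), ("monitor", (200, 20, 240, 50))],
]
print(count_appliances(frames))  # {'monitor': 2, 'heater': 1}
```

This matcher only compares consecutive frames, so an appliance that is missed for a single frame is counted again. Dedicated multi-object trackers keep lost tracks alive for several frames for exactly this reason.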