Fast presenter tracking for 4K lecture videos using computationally inexpensive algorithms

Lecture recording has become an essential tool for educational institutions to enhance the student learning experience and offer online courses for remote learning programs. High-resolution 4K cameras have gained popularity in these systems due to their affordability and clarity of written content on...

Bibliographic Details
Main Author: Fitzhenry, Charles
Other Authors: Marais, Patrick
Format: Thesis
Language: English
Published: Department of Computer Science 2023
Subjects: lecture recording; educational institutions; online courses; remote learning programs
_version_ 1867613295206203392
access_status_str Open Access
author Fitzhenry, Charles
author2 Marais, Patrick
author_browse Fitzhenry, Charles
Marais, Patrick
author_facet Marais, Patrick
Fitzhenry, Charles
author_sort Fitzhenry, Charles
collection Thesis
description Lecture recording has become an essential tool for educational institutions to enhance the student learning experience and offer online courses for remote learning programs. High-resolution 4K cameras have gained popularity in these systems due to their affordability and clarity of written content on boards/screens. Unfortunately, at 4K resolution, a typical 45-minute lecture video easily exceeds 2GB. Many video files of this size place a financial burden on institutions and students, especially in developing countries where financial resources are limited. Institutions require costly high-end equipment to capture, store and distribute this ever-increasing collection of videos. Students require a fast internet connection with a large data quota for off-campus viewing, which can be too expensive for many, especially if they use mobile data. This project designs and implements a low-cost presenter and writing detection front-end that can integrate with an external Virtual Cinematographer (VC). Gesture detection was also explored; however, the frame differencing approach used for presenter detection was not sufficiently robust for gesture detection. Our front-end is carefully designed to run on commodity computers without requiring expensive Graphics Processing Units (GPU) or servers. An external VC can use our contextual information to segment a smaller cropping window from the 4K frame, only containing the presenter and relevant boards, drastically reducing the file size of the resultant videos while preserving writing clarity. The software developed as part of this project will be available as open source. Our results show that the front-end module is fit for purpose and sufficiently robust across several challenging lecture venue types. On average, a 2-minute video clip is processed by the front-end in under 60 seconds (or approximately half of the input video duration).
The majority (89%) of this time is used for reading and decoding frames from storage. Additionally, our low-cost presenter detection achieves an overall F1-Score of 0.76, while our writing detection achieves an overall F1-Score of 0.55. We also demonstrate a mean reduction of 81.3% in file size from the original 4K video to a cropped 720p video when using our front-end in a full pipeline with an external VC.
format Thesis
id oai:open.uct.ac.za:11427/37949
institution University of Cape Town (South Africa)
language eng
last_indexed 2026-06-10T12:33:51.607Z
license_str Not specified — see source repository
provenance_str_mv Harvested via OAI-PMH from UCTD — University of Cape Town Open Access Repository
publishDate 2023
publishDateRange 2023
publishDateSort 2023
publisher Department of Computer Science
publisherStr Department of Computer Science
record_format dspace
source_str UCTD — University of Cape Town Open Access Repository
spelling oai:open.uct.ac.za:11427/37949 Fast presenter tracking for 4K lecture videos using computationally inexpensive algorithms Fitzhenry, Charles Marais, Patrick Marquard, Stephen lecture recording educational institutions online courses remote learning programs Lecture recording has become an essential tool for educational institutions to enhance the student learning experience and offer online courses for remote learning programs. High-resolution 4K cameras have gained popularity in these systems due to their affordability and clarity of written content on boards/screens. Unfortunately, at 4K resolution, a typical 45-minute lecture video easily exceeds 2GB. Many video files of this size place a financial burden on institutions and students, especially in developing countries where financial resources are limited. Institutions require costly high-end equipment to capture, store and distribute this ever-increasing collection of videos. Students require a fast internet connection with a large data quota for off-campus viewing, which can be too expensive for many, especially if they use mobile data. This project designs and implements a low-cost presenter and writing detection front-end that can integrate with an external Virtual Cinematographer (VC). Gesture detection was also explored; however, the frame differencing approach used for presenter detection was not sufficiently robust for gesture detection. Our front-end is carefully designed to run on commodity computers without requiring expensive Graphics Processing Units (GPU) or servers. An external VC can use our contextual information to segment a smaller cropping window from the 4K frame, only containing the presenter and relevant boards, drastically reducing the file size of the resultant videos while preserving writing clarity. The software developed as part of this project will be available as open source.
Our results show that the front-end module is fit for purpose and sufficiently robust across several challenging lecture venue types. On average, a 2-minute video clip is processed by the front-end in under 60 seconds (or approximately half of the input video duration). The majority (89%) of this time is used for reading and decoding frames from storage. Additionally, our low-cost presenter detection achieves an overall F1-Score of 0.76, while our writing detection achieves an overall F1-Score of 0.55. We also demonstrate a mean reduction of 81.3% in file size from the original 4K video to a cropped 720p video when using our front-end in a full pipeline with an external VC. 2023-06-10T20:23:04Z 2023-06-10T20:23:04Z 2023 2023-06-10T19:34:45Z Master Thesis Masters MSc http://hdl.handle.net/11427/37949 eng application/pdf Department of Computer Science Faculty of Science
spellingShingle lecture recording
educational institutions
online courses
remote learning programs
Fitzhenry, Charles
Fast presenter tracking for 4K lecture videos using computationally inexpensive algorithms
thesis_degree_str Master's
title Fast presenter tracking for 4K lecture videos using computationally inexpensive algorithms
title_full Fast presenter tracking for 4K lecture videos using computationally inexpensive algorithms
title_fullStr Fast presenter tracking for 4K lecture videos using computationally inexpensive algorithms
title_full_unstemmed Fast presenter tracking for 4K lecture videos using computationally inexpensive algorithms
title_short Fast presenter tracking for 4K lecture videos using computationally inexpensive algorithms
title_sort fast presenter tracking for 4k lecture videos using computationally inexpensive algorithms
topic lecture recording
educational institutions
online courses
remote learning programs
url http://hdl.handle.net/11427/37949
work_keys_str_mv AT fitzhenrycharles fastpresentertrackingfor4klecturevideosusingcomputationallyinexpensivealgorithms