Fast presenter tracking for 4K lecture videos using computationally inexpensive algorithms

Lecture recording has become an essential tool for educational institutions to enhance the student learning experience and offer online courses for remote learning programs. High-resolution 4K cameras have gained popularity in these systems due to their affordability and clarity of written content on...

Bibliographic Details
Main Author:	Fitzhenry, Charles
Other Authors:	Marais, Patrick
Format:	Thesis
Language:	English
Published:	Department of Computer Science 2023
Subjects:	lecture recording educational institutions online courses remote learning programs

_version_	1867613295206203392
access_status_str	Open Access
author	Fitzhenry, Charles
author2	Marais, Patrick
author_browse	Fitzhenry, Charles Marais, Patrick
author_facet	Marais, Patrick Fitzhenry, Charles
author_sort	Fitzhenry, Charles
collection	Thesis
description	Lecture recording has become an essential tool for educational institutions to enhance the student learning experience and offer online courses for remote learning programs. High-resolution 4K cameras have gained popularity in these systems due to their affordability and clarity of written content on boards/screens. Unfortunately, at 4K resolution, a typical 45-minute lecture video easily exceeds 2GB. Many video files of this size place a financial burden on institutions and students, especially in developing countries where financial resources are limited. Institutions require costly high-end equipment to capture, store and distribute this ever-increasing collection of videos. Students require a fast internet connection with a large data quota for off-campus viewing, which can be too expensive for many, especially if they use mobile data. This project designs and implements a low-cost presenter and writing detection front-end that can integrate with an external Virtual Cinematographer (VC). Gesture detection was also explored; however, the frame differencing approach used for presenter detection was not sufficiently robust for gesture detection. Our front-end is carefully designed to run on commodity computers without requiring expensive Graphics Processing Units (GPU) or servers. An external VC can use our contextual information to segment a smaller cropping window from the 4K frame, only containing the presenter and relevant boards, drastically reducing the file size of the resultant videos while preserving writing clarity. The software developed as part of this project will be available as open source. Our results show that the front-end module is fit for purpose and sufficiently robust across several challenging lecture venue types. On average, a 2-minute video clip is processed by the front-end in under 60 seconds (or approximately half of the input video duration).
The majority (89%) of this time is used for reading and decoding frames from storage. Additionally, our low-cost presenter detection achieves an overall F1-Score of 0.76, while our writing detection achieves an overall F1-Score of 0.55. We also demonstrate a mean reduction of 81.3% in file size from the original 4K video to a cropped 720p video when using our front-end in a full pipeline with an external VC.
format	Thesis
id	oai:open.uct.ac.za:11427/37949
institution	University of Cape Town (South Africa)
language	eng
last_indexed	2026-06-10T12:33:51.607Z
license_str	Not specified — see source repository
provenance_str_mv	Harvested via OAI-PMH from UCTD — University of Cape Town Open Access Repository
publishDate	2023
publishDateRange	2023
publishDateSort	2023
publisher	Department of Computer Science
publisherStr	Department of Computer Science
record_format	dspace
source_str	UCTD — University of Cape Town Open Access Repository
spelling	oai:open.uct.ac.za:11427/37949 Fast presenter tracking for 4K lecture videos using computationally inexpensive algorithms Fitzhenry, Charles Marais, Patrick Marquard, Stephen lecture recording educational institutions online courses remote learning programs Lecture recording has become an essential tool for educational institutions to enhance the student learning experience and offer online courses for remote learning programs. High-resolution 4K cameras have gained popularity in these systems due to their affordability and clarity of written content on boards/screens. Unfortunately, at 4K resolution, a typical 45-minute lecture video easily exceeds 2GB. Many video files of this size place a financial burden on institutions and students, especially in developing countries where financial resources are limited. Institutions require costly high-end equipment to capture, store and distribute this ever-increasing collection of videos. Students require a fast internet connection with a large data quota for off-campus viewing, which can be too expensive for many, especially if they use mobile data. This project designs and implements a low-cost presenter and writing detection front-end that can integrate with an external Virtual Cinematographer (VC). Gesture detection was also explored; however, the frame differencing approach used for presenter detection was not sufficiently robust for gesture detection. Our front-end is carefully designed to run on commodity computers without requiring expensive Graphics Processing Units (GPU) or servers. An external VC can use our contextual information to segment a smaller cropping window from the 4K frame, only containing the presenter and relevant boards, drastically reducing the file size of the resultant videos while preserving writing clarity. The software developed as part of this project will be available as open source.
Our results show that the front-end module is fit for purpose and sufficiently robust across several challenging lecture venue types. On average, a 2-minute video clip is processed by the front-end in under 60 seconds (or approximately half of the input video duration). The majority (89%) of this time is used for reading and decoding frames from storage. Additionally, our low-cost presenter detection achieves an overall F1-Score of 0.76, while our writing detection achieves an overall F1-Score of 0.55. We also demonstrate a mean reduction of 81.3% in file size from the original 4K video to a cropped 720p video when using our front-end in a full pipeline with an external VC. 2023-06-10T20:23:04Z 2023-06-10T20:23:04Z 2023 2023-06-10T19:34:45Z Master Thesis Masters MSc http://hdl.handle.net/11427/37949 eng application/pdf Department of Computer Science Faculty of Science
spellingShingle	lecture recording educational institutions online courses remote learning programs Fitzhenry, Charles Fast presenter tracking for 4K lecture videos using computationally inexpensive algorithms
thesis_degree_str	Master's
title	Fast presenter tracking for 4K lecture videos using computationally inexpensive algorithms
title_full	Fast presenter tracking for 4K lecture videos using computationally inexpensive algorithms
title_fullStr	Fast presenter tracking for 4K lecture videos using computationally inexpensive algorithms
title_full_unstemmed	Fast presenter tracking for 4K lecture videos using computationally inexpensive algorithms
title_short	Fast presenter tracking for 4K lecture videos using computationally inexpensive algorithms
title_sort	fast presenter tracking for 4k lecture videos using computationally inexpensive algorithms
topic	lecture recording educational institutions online courses remote learning programs
url	http://hdl.handle.net/11427/37949
work_keys_str_mv	AT fitzhenrycharles fastpresentertrackingfor4klecturevideosusingcomputationallyinexpensivealgorithms

Similar Items