Video classification using deep learning

Thesis (MSc)--Stellenbosch University, 2020.

Bibliographic Details
Main Author: Newman, Gregory
Other Authors: Brink, Willie
Format: Thesis
Language: en_ZA
Published: Stellenbosch : Stellenbosch University, 2020
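Subjects: Videos -- Classification; Machine learning; Neural networks (Computer Science) -- Scalability; Computer vision; Deep learning; UCTD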
access_status_str Open Access
author Newman, Gregory
author2 Brink, Willie
collection Thesis
dc_rights_str_mv Stellenbosch University.
description Thesis (MSc)--Stellenbosch University, 2020.
format Thesis
id oai:scholar.sun.ac.za:10019.1/108279
institution Stellenbosch University (South Africa)
language en_ZA
last_indexed 2026-06-10T12:41:45.229Z
license_str Other — see source repository
provenance_str_mv Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository
publishDate 2020
publisher Stellenbosch : Stellenbosch University.
record_format dspace
source_str SUNScholar — Stellenbosch University Repository
spelling oai:scholar.sun.ac.za:10019.1/108279
Video classification using deep learning
Newman, Gregory
Brink, Willie
Herbst, B. M.
Stellenbosch University. Faculty of Science. Department of Mathematical Sciences (Applied Mathematics).
Videos -- Classification
Machine learning
Neural networks (Computer Science) -- Scalability
Computer vision
Deep learning
UCTD
Thesis (MSc)--Stellenbosch University, 2020.
ENGLISH ABSTRACT: To help analyse, classify, and monitor video data, we need scalable algorithms that can handle video sequences of various lengths. Existing approaches tend to be both computationally expensive and restricted to classifying sequences of a fixed length, making them ill-suited for real-world use. For video classification we explore using convolutional neural networks (CNNs) to learn the spatial features relevant to each frame of a video, and several transfer learning approaches that leverage the InceptionV3 architecture with weights pretrained on ImageNet. With Grad-CAM we show that CNN models alone rely primarily on detecting class-specific objects within images, and perform poorly on classes whose spatial features resemble those of other classes. To learn the temporal features of a video and to accommodate variable-length sequences, we train LSTM and GRU networks. We show that without downsampling the frames the parameter space of the networks explodes, quickly making training computationally infeasible, but that downsampling techniques cause too much information loss. We also find comparable performance between the two types of recurrent network, despite the GRU network having fewer parameters. We go on to propose an architecture that uses InceptionV3, with pretrained weights, to learn representations of the frames that are then used to train a GRU network. After experimenting with different transfer learning approaches, we show that we can achieve a top-5 classification accuracy of 91.8% on the UCF-101 test set, which is 6.2% less than the state of the art, while using half as many parameters and an architecture that can accommodate variable-length inputs.
AFRIKAANS SUMMARY: To improve the analysis, classification, and monitoring of videos of variable length, we need algorithms that can scale. Existing approaches are typically computationally intensive and limited to classifying videos of fixed length, which makes them unsuitable for real-world use. We investigate the use of convolutional neural networks for video classification, to learn the spatial features of each video frame. We also examine several transfer learning approaches to take advantage of the InceptionV3 architecture's weights, pretrained on ImageNet. We use Grad-CAM to show that convolutional models on their own focus mainly on detecting class-specific objects in images, and perform poorly on classes whose spatial features are similar to those of other classes. LSTM and GRU networks are trained to learn time-dependent features and to accommodate the variable lengths of the videos. We show that without reducing the images, the parameter space of the networks explodes, quickly making practical training impossible. The reduction techniques, however, cause too much data loss. We find comparable performance between the two types of recurrent network, despite the GRU network having fewer parameters. We then propose an architecture that uses InceptionV3 with pretrained weights to learn representations of the frames, and then uses those representations to train the GRU network. Experiments with different transfer learning techniques show that we can achieve a top-5 accuracy of 91.8% on the UCF-101 test set. This accuracy is 6.2% less than the current best method, but requires about half as many parameters and can handle videos of variable length.
Masters
2020-02-26T11:38:26Z
2020-04-28T12:29:42Z
2020-03
Thesis
http://hdl.handle.net/10019.1/108279
en_ZA
Stellenbosch University.
vi, 51 pages : illustrations
application/pdf
Stellenbosch : Stellenbosch University.
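The abstract's point that feeding raw frames to a recurrent network makes the parameter count explode, and that a GRU has fewer parameters than an LSTM, can be checked with simple arithmetic. The sketch below assumes the standard gate structure (four weight blocks for an LSTM, three for a GRU) with one bias vector per block, a hypothetical hidden size of 256, InceptionV3's default 299x299 RGB input, and its 2048-dimensional globally pooled feature vector. None of these sizes are taken from the thesis, and library implementations can differ slightly in how many bias vectors they use.

```python
"""Back-of-the-envelope recurrent-layer parameter counts (illustrative sketch)."""


def lstm_params(input_dim: int, hidden: int) -> int:
    # Four gate blocks (input, forget, cell candidate, output), each with an
    # input weight matrix, a recurrent weight matrix and one bias vector.
    return 4 * (hidden * (input_dim + hidden) + hidden)


def gru_params(input_dim: int, hidden: int) -> int:
    # Three gate blocks (update, reset, candidate), one bias vector each
    # (assumption: single-bias formulation).
    return 3 * (hidden * (input_dim + hidden) + hidden)


HIDDEN = 256  # hypothetical hidden size, not taken from the thesis

INPUTS = {
    "raw 299x299 RGB frame": 299 * 299 * 3,   # flattened pixels
    "downsampled 64x64 grey frame": 64 * 64,  # hypothetical downsampling
    "InceptionV3 pooled features": 2048,      # global-average-pooled output
}

if __name__ == "__main__":
    for name, dim in INPUTS.items():
        print(
            f"{name:30s} d={dim:>7d}  "
            f"LSTM={lstm_params(dim, HIDDEN):>12,d}  "
            f"GRU={gru_params(dim, HIDDEN):>12,d}"
        )
```

Under these assumptions the GRU always has three quarters of the LSTM's parameters, and replacing raw pixels with pooled InceptionV3 features shrinks either layer by roughly two orders of magnitude (about 275 million versus 2.4 million parameters for the LSTM).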
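The recurrent layer is what lets a model of this kind accept clips of any length: it folds an arbitrary number of frame vectors into one fixed-size hidden state before classification. The following pure-Python sketch shows this with a single GRU layer (the Cho et al. formulation, with the reset gate applied to the previous state before the recurrent product) and a softmax read-out. It is not the thesis's implementation: the weights are random and untrained, and the small dimensions and the `TinyGRU` class are hypothetical. In an architecture like the one proposed, each input vector would instead be the InceptionV3 representation of a frame.

```python
"""Minimal pure-Python GRU forward pass over variable-length sequences.

Illustrative sketch only: random, untrained weights; hypothetical dimensions.
"""
import math
import random


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def add(*vs):
    # Element-wise sum of equally sized vectors.
    return [sum(xs) for xs in zip(*vs)]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class TinyGRU:
    """Hypothetical single-layer GRU classifier."""

    def __init__(self, input_dim, hidden, n_classes, seed=0):
        rng = random.Random(seed)
        self.hidden = hidden
        # Input weights W, recurrent weights U and bias b for the
        # update (z), reset (r) and candidate (n) blocks.
        self.W = {g: rand_matrix(hidden, input_dim, rng) for g in "zrn"}
        self.U = {g: rand_matrix(hidden, hidden, rng) for g in "zrn"}
        self.b = {g: [0.0] * hidden for g in "zrn"}
        # Linear read-out from the final hidden state to class scores.
        self.V = rand_matrix(n_classes, hidden, rng)

    def step(self, x, h):
        z = sigmoid(add(matvec(self.W["z"], x), matvec(self.U["z"], h), self.b["z"]))
        r = sigmoid(add(matvec(self.W["r"], x), matvec(self.U["r"], h), self.b["r"]))
        rh = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(v) for v in add(matvec(self.W["n"], x), matvec(self.U["n"], rh), self.b["n"])]
        # Interpolate between the previous state and the candidate state.
        return [zi * hi + (1.0 - zi) * ni for zi, ni, hi in zip(z, n, h)]

    def classify(self, frames):
        # frames: list of per-frame feature vectors, any length (even zero).
        h = [0.0] * self.hidden
        for x in frames:
            h = self.step(x, h)
        scores = matvec(self.V, h)
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]


if __name__ == "__main__":
    rng = random.Random(1)
    model = TinyGRU(input_dim=8, hidden=4, n_classes=5)
    for length in (3, 7, 20):
        clip = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(length)]
        probs = model.classify(clip)
        print(length, len(probs), round(sum(probs), 6))
```

The same weights serve clips of 3, 7 or 20 frames, and the output is always one probability per class; no padding to a fixed length is needed because the loop simply runs for as many steps as there are frames.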
title Video classification using deep learning
topic Videos -- Classification
Machine learning
Neural networks (Computer Science) -- Scalability
Computer vision
Deep learning
UCTD
url http://hdl.handle.net/10019.1/108279