Full Text Available

Video classification using deep learning

Thesis (MSc)--Stellenbosch University, 2020.

Bibliographic Details
Main Author:	Newman, Gregory
Other Authors:	Brink, Willie
Format:	Thesis
Language:	en_ZA
Published:	Stellenbosch : Stellenbosch University. 2020
Subjects:	Videos > Classification; Machine learning; Neural networks (Computer Science) > Scalability; Computer vision; Deep learning; UCTD

_version_	1867613791081988096
access_status_str	Open Access
author	Newman, Gregory
author2	Brink, Willie
author_browse	Brink, Willie Newman, Gregory
author_facet	Brink, Willie Newman, Gregory
author_sort	Newman, Gregory
collection	Thesis
dc_rights_str_mv	Stellenbosch University.
description	Thesis (MSc)--Stellenbosch University, 2020.
format	Thesis
id	oai:scholar.sun.ac.za:10019.1/108279
institution	Stellenbosch University (South Africa)
language	en_ZA
last_indexed	2026-06-10T12:41:45.229Z
license_str	Other — see source repository
provenance_str_mv	Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository
publishDate	2020
publishDateRange	2020
publishDateSort	2020
publisher	Stellenbosch : Stellenbosch University.
publisherStr	Stellenbosch : Stellenbosch University.
record_format	dspace
source_str	SUNScholar — Stellenbosch University Repository
spelling	oai:scholar.sun.ac.za:10019.1/108279 Video classification using deep learning Newman, Gregory Brink, Willie Herbst, B. M. Stellenbosch University. Faculty of Science. Department of Mathematical Sciences (Applied Mathematics). Videos -- Classification Machine learning Neural networks (Computer Science) -- Scalability Computer vision Deep learning UCTD Thesis (MSc)--Stellenbosch University, 2020.
ENGLISH ABSTRACT: To help analyse, classify, and monitor video data we need scalable algorithms that can handle video sequences of various lengths. Existing approaches tend to be both computationally expensive and restricted to classifying sequences of a fixed length, making them ill-suited for real-world use. For video classification we explore using convolutional neural networks (CNNs) to learn the spatial features relevant to each frame of a video, and several transfer learning approaches to leverage the InceptionV3 architecture with weights pretrained on ImageNet. With Grad-CAM we show that CNN models alone primarily rely on detecting class-specific objects within images, and perform poorly on classes that have similar spatial features to other classes. To learn the temporal features of a video and to accommodate variable-length sequences, we train LSTM and GRU networks. We show that without downsampling the frames the parameter space of the networks explodes, quickly becoming computationally infeasible to train over, but that downsampling techniques cause too much information loss. We also find comparable performance between the two types of recurrent networks, despite the GRU network having fewer parameters. We go on to propose an architecture that uses InceptionV3, with pretrained weights, to learn representations of the frames to be used when training a GRU network. After experimenting with different transfer learning approaches we show that we can achieve a top-5 classification accuracy of 91.8% on the UCF-101 test set, which is 6.2% less than the state-of-the-art while having half as many parameters and an architecture that can accommodate variable-length inputs.
AFRIKAANS ABSTRACT (translated into English): To improve the analysis, classification, and monitoring of videos of variable length, we need algorithms that can scale. Existing approaches are typically computationally intensive and limited to classifying videos of fixed length, which makes them unsuitable for real-world use. We investigate the use of convolutional neural networks for video classification, to learn the spatial features of each video frame. We also consider several transfer learning approaches to take advantage of the InceptionV3 architecture's weights pretrained on ImageNet. We use Grad-CAM to show that convolutional models on their own focus mainly on detecting class-specific objects in images, and fare poorly on classes whose spatial features are similar to those of other classes. LSTM and GRU networks are trained to learn time-dependent features and to accommodate the variable lengths of the videos. We show that without downsampling the frames, the parameter space of the networks explodes and practical training quickly becomes impossible. The downsampling techniques, however, cause too much data loss. We find comparable performance between the two types of recurrent networks, despite the GRU network having fewer parameters. We then propose an architecture that uses InceptionV3 with pretrained weights to learn representations of the frames, and then uses those representations to train the GRU network. Experiments with different transfer learning techniques show that we can achieve a top-5 accuracy of 91.8% on the UCF-101 test set. This accuracy is 6.2% lower than the current best method, but the model needs about half as many parameters and can handle videos of variable length.
Masters 2020-02-26T11:38:26Z 2020-04-28T12:29:42Z 2020-02-26T11:38:26Z 2020-04-28T12:29:42Z 2020-03 Thesis http://hdl.handle.net/10019.1/108279 en_ZA Stellenbosch University. vi, 51 pages : illustrations application/pdf Stellenbosch : Stellenbosch University.
spellingShingle	Videos -- Classification Machine learning Neural networks (Computer Science) -- Scalability Computer vision Deep learning UCTD Newman, Gregory Video classification using deep learning
title	Video classification using deep learning
title_full	Video classification using deep learning
title_fullStr	Video classification using deep learning
title_full_unstemmed	Video classification using deep learning
title_short	Video classification using deep learning
title_sort	video classification using deep learning
topic	Videos -- Classification Machine learning Neural networks (Computer Science) -- Scalability Computer vision Deep learning UCTD
url	http://hdl.handle.net/10019.1/108279
work_keys_str_mv	AT newmangregory videoclassificationusingdeeplearning
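
The abstract's two remarks about parameter counts (that recurrent networks fed raw frames become infeasibly large, and that a GRU has fewer parameters than an LSTM) can be illustrated with a rough calculation. The sketch below is not taken from the thesis: it uses the textbook gate formulas, and the frame size (320x240 RGB), the hidden size (256) and all function names are assumptions chosen for illustration. The 2048-dimensional feature size is that of InceptionV3 after global average pooling.

```python
"""Back-of-the-envelope parameter counts for recurrent layers (illustrative only)."""


def gate_params(input_dim: int, hidden_dim: int) -> int:
    """One gate block: input weights + recurrent weights + one bias vector."""
    return hidden_dim * (input_dim + hidden_dim) + hidden_dim


def gru_params(input_dim: int, hidden_dim: int) -> int:
    """Textbook GRU: update gate, reset gate and candidate state (3 blocks)."""
    return 3 * gate_params(input_dim, hidden_dim)


def lstm_params(input_dim: int, hidden_dim: int) -> int:
    """Textbook LSTM: input, forget and output gates plus cell candidate (4 blocks)."""
    return 4 * gate_params(input_dim, hidden_dim)


# Assumed values for illustration; none of these are reported in the record above.
RAW_FRAME_DIM = 240 * 320 * 3  # one flattened 320x240 RGB frame
CNN_FEATURE_DIM = 2048         # InceptionV3 features after global average pooling
HIDDEN_DIM = 256               # hypothetical recurrent state size

if __name__ == "__main__":
    for label, dim in [("raw frames", RAW_FRAME_DIM), ("CNN features", CNN_FEATURE_DIM)]:
        gru = gru_params(dim, HIDDEN_DIM)
        lstm = lstm_params(dim, HIDDEN_DIM)
        print(f"{label:>12}: GRU {gru:>12,}  LSTM {lstm:>12,}")
```

Under these assumptions the recurrent layer is roughly a hundred times larger when fed raw frames than when fed CNN features, and the GRU always has three quarters of the LSTM's parameters. The counts do not depend on the number of frames, which is why the same recurrent layer can process videos of any length. Library implementations may count biases slightly differently.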

Similar Items