Thesis (MSc)--Stellenbosch University, 2020.
| Main Author: | Newman, Gregory |
|---|---|
| Other Authors: | Brink, Willie |
| Format: | Thesis |
| Language: | en_ZA |
| Published: | Stellenbosch : Stellenbosch University. 2020 |
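| Subjects: | Videos -- Classification; Machine learning; Neural networks (Computer Science) -- Scalability; Computer vision; Deep learning; UCTD |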
| _version_ | 1867613791081988096 |
|---|---|
| access_status_str | Open Access |
| author | Newman, Gregory |
| author2 | Brink, Willie |
| author_browse | Brink, Willie Newman, Gregory |
| author_facet | Brink, Willie Newman, Gregory |
| author_sort | Newman, Gregory |
| collection | Thesis |
| dc_rights_str_mv | Stellenbosch University. |
| description | Thesis (MSc)--Stellenbosch University, 2020. |
| format | Thesis |
| id | oai:scholar.sun.ac.za:10019.1/108279 |
| institution | Stellenbosch University (South Africa) |
| language | en_ZA |
| last_indexed | 2026-06-10T12:41:45.229Z |
| license_str | Other — see source repository |
| provenance_str_mv | Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository |
| publishDate | 2020 |
| publishDateRange | 2020 |
| publishDateSort | 2020 |
| publisher | Stellenbosch : Stellenbosch University. |
| publisherStr | Stellenbosch : Stellenbosch University. |
| record_format | dspace |
| source_str | SUNScholar — Stellenbosch University Repository |
| spelling | oai:scholar.sun.ac.za:10019.1/108279 Video classification using deep learning Newman, Gregory Brink, Willie Herbst, B. M. Stellenbosch University. Faculty of Science. Department of Mathematical Sciences (Applied Mathematics). Videos -- Classification Machine learning Neural networks (Computer Science) -- Scalability Computer vision Deep learning UCTD Thesis (MSc)--Stellenbosch University, 2020. ENGLISH ABSTRACT: To help analyse, classify, and monitor video data we need scalable algorithms that can handle video sequences of various lengths. Existing approaches tend to be both computationally expensive and restricted to classifying sequences of a fixed length, making them ill-suited for real-world use. For video classification we explore using convolutional neural networks to learn the spatial features relevant to each frame of a video, and several transfer learning approaches to leverage the InceptionV3 architecture with weights pretrained on ImageNet. With Grad-CAM we show that CNN models alone primarily rely on detecting class-specific objects within images, and perform poorly on classes that have similar spatial features to other classes. To learn the temporal features of a video and to accommodate variable-length sequences, we train LSTM and GRU networks. We show that without downsampling the frames the parameter space of the networks explodes, quickly becoming computationally infeasible to train over, but that downsampling techniques cause too much information loss. We also find comparable performance between the two types of recurrent networks, despite the GRU network having fewer parameters. We go on to propose an architecture that uses InceptionV3, with pretrained weights, to learn representations of the frames to be used when training a GRU network. After experimenting with different transfer learning approaches we show that we can achieve a top-5 classification accuracy of 91.8% on the UCF-101 test set, which is 6.2% less than the state-of-the-art while having half as many parameters and an architecture that can accommodate variable-length inputs. AFRIKAANS ABSTRACT (translated): To improve the analysis, classification and monitoring of videos of variable length, we need algorithms that can scale. Existing approaches are typically computationally intensive and limited to classifying videos of fixed length, which makes them unsuitable for real-world use. We investigate the use of convolutional neural networks for video classification, to learn spatial features of each video frame. We also look at several transfer learning approaches, to take advantage of the InceptionV3 architecture's weights pretrained on ImageNet. We use Grad-CAM to show that convolutional models on their own focus mainly on detecting class-specific objects in images, and perform poorly on classes whose spatial features are similar to those of other classes. LSTM and GRU networks are trained to learn time-dependent features and to accommodate the variable lengths of the videos. We show that without downsampling the images the parameter space of the networks explodes, and practical training quickly becomes impossible. The downsampling techniques, however, cause too much data loss. We find comparable performance between the two types of recurrent networks, despite the GRU network having fewer parameters. We then propose an architecture that uses InceptionV3 with pretrained weights to learn representations of the frames, and then uses those representations to train the GRU network. Experimentation with different transfer learning techniques shows that we can achieve a top-5 accuracy of 91.8% on the UCF-101 test set. This accuracy is 6.2% lower than the current best method, but requires about half as many parameters and can handle videos of variable length. Masters 2020-02-26T11:38:26Z 2020-04-28T12:29:42Z 2020-02-26T11:38:26Z 2020-04-28T12:29:42Z 2020-03 Thesis http://hdl.handle.net/10019.1/108279 en_ZA Stellenbosch University. vi, 51 pages : illustrations application/pdf Stellenbosch : Stellenbosch University. |
| spellingShingle | Videos -- Classification Machine learning Neural networks (Computer Science) -- Scalability Computer vision Deep learning UCTD Newman, Gregory Video classification using deep learning |
| title | Video classification using deep learning |
| title_full | Video classification using deep learning |
| title_fullStr | Video classification using deep learning |
| title_full_unstemmed | Video classification using deep learning |
| title_short | Video classification using deep learning |
| title_sort | video classification using deep learning |
| topic | Videos -- Classification Machine learning Neural networks (Computer Science) -- Scalability Computer vision Deep learning UCTD |
| url | http://hdl.handle.net/10019.1/108279 |
| work_keys_str_mv | AT newmangregory videoclassificationusingdeeplearning |
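
Note on the abstract: the abstract makes two claims about recurrent layers. Feeding raw frames to them makes the parameter count explode, and a GRU has fewer parameters than an LSTM. The sketch below is not taken from the thesis. It counts parameters using the textbook formulas (three gate blocks for a GRU, four for an LSTM); library implementations may add extra bias terms. The hidden size of 256 and the 320x240 RGB frame size are illustrative assumptions. The 2048 figure is the length of InceptionV3's globally pooled feature vector.

```python
def recurrent_params(input_dim, hidden_dim, gate_blocks):
    """Count weights and biases in a recurrent layer.

    Each gate block holds an input matrix (hidden x input),
    a recurrent matrix (hidden x hidden) and a bias vector (hidden).
    """
    per_block = hidden_dim * (input_dim + hidden_dim) + hidden_dim
    return gate_blocks * per_block


def gru_params(input_dim, hidden_dim):
    # Textbook GRU: update gate, reset gate, candidate state.
    return recurrent_params(input_dim, hidden_dim, 3)


def lstm_params(input_dim, hidden_dim):
    # Textbook LSTM: input, forget and output gates plus candidate cell.
    return recurrent_params(input_dim, hidden_dim, 4)


HIDDEN = 256                 # hypothetical hidden size, not from the thesis
RAW_FRAME = 320 * 240 * 3    # assumed frame size: flattened 320x240 RGB
INCEPTION_FEATURES = 2048    # InceptionV3 pooled feature vector length

for label, dim in [("raw frame", RAW_FRAME),
                   ("InceptionV3 features", INCEPTION_FEATURES)]:
    print(f"{label:>22}: GRU {gru_params(dim, HIDDEN):>12,}"
          f"  LSTM {lstm_params(dim, HIDDEN):>12,}")
```

Under these assumptions, replacing flattened frames with pooled features shrinks the recurrent layer by roughly two orders of magnitude. The GRU always has three quarters of the LSTM's parameters at the same sizes.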