Multitask learning and data distribution search in visual relationship recognition

Thesis (MSc)--Stellenbosch University, 2020.

Bibliographic Details
Main Author: Josias, Shane
Other Authors: Brink, Willie
Format: Thesis
Language: en_ZA
Published: Stellenbosch : Stellenbosch University. 2020
_version_ 1867613835253252096
access_status_str Open Access
author Josias, Shane
author2 Brink, Willie
collection Thesis
dc_rights_str_mv Stellenbosch University.
description Thesis (MSc)--Stellenbosch University, 2020.
format Thesis
id oai:scholar.sun.ac.za:10019.1/108109
institution Stellenbosch University (South Africa)
language en_ZA
last_indexed 2026-06-10T12:42:26.594Z
license_str Other — see source repository
provenance_str_mv Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository
publishDate 2020
publisher Stellenbosch : Stellenbosch University.
record_format dspace
source_str SUNScholar — Stellenbosch University Repository
spelling oai:scholar.sun.ac.za:10019.1/108109
Multitask learning and data distribution search in visual relationship recognition
Josias, Shane
Brink, Willie
Stellenbosch University. Faculty of Science. Department of Mathematical Sciences (Applied Mathematics).
Machine learning
Neural networks (Computer science)
Computer vision
Computer multitasking
Visual relationship recognition
Electronic data processing -- Batch processing
UCTD
Thesis (MSc)--Stellenbosch University, 2020.
ENGLISH ABSTRACT: An image can be described by the objects within it, as well as the interactions between those objects. A pair of object labels together with an interaction label can be assembled into what is known as a visual relationship, represented as a triplet of the form (subject, predicate, object). Recognising visual relationships in a given image is a challenging task, owing to the combinatorially large number of possible relationship triplets, which leads to a so-called extreme classification problem, as well as the very long tail typically found in the distribution of those possible triplets. We investigate the efficacy of four strategies that could potentially address these issues. Firstly, instead of predicting the full triplet, we opt to predict each element separately. Secondly, we investigate the use of shared network parameters to perform these separate predictions in a basic multitask setting. Thirdly, we extend the multitask setting by including an online ranking loss that acts on a trio of samples (an anchor, a positive sample, and a negative sample). Semi-hard negative mining is used to select negative samples. Finally, we consider a class-selective batch construction strategy to expose the network to more of the many rare classes during mini-batch training. We view semi-hard negative mining and class-selective batch construction as training data distribution search, in the sense that they both attempt to carefully select training samples in order to improve model performance.
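The ranking loss with semi-hard negative mining mentioned in the abstract can be illustrated with a minimal sketch. This record does not give the thesis's exact formulation, so the code below assumes the common triplet-loss setup: Euclidean distances between embeddings, and a negative counted as semi-hard when it lies farther from the anchor than the positive but still within the margin. The function name `semi_hard_triplet_loss` and the `margin` parameter are hypothetical, chosen for illustration only.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def semi_hard_triplet_loss(embeddings, labels, margin=0.2):
    """Mean triplet loss over one batch (illustrative assumption, not the thesis's code).

    For every (anchor, positive) pair with the same label, pick the closest
    negative that is semi-hard: d(a, p) < d(a, n) < d(a, p) + margin.
    Pairs without a semi-hard negative are skipped.
    """
    losses = []
    n = len(embeddings)
    for a in range(n):
        for p in range(n):
            if a == p or labels[a] != labels[p]:
                continue
            d_ap = euclidean(embeddings[a], embeddings[p])
            # Collect distances to all semi-hard negatives for this pair.
            candidates = []
            for k in range(n):
                if labels[k] == labels[a]:
                    continue
                d_an = euclidean(embeddings[a], embeddings[k])
                if d_ap < d_an < d_ap + margin:
                    candidates.append(d_an)
            if candidates:
                # Hinge term is positive by construction for semi-hard negatives.
                losses.append(d_ap - min(candidates) + margin)
    return sum(losses) / len(losses) if losses else 0.0


# Toy batch: four 1-D embeddings, two classes.
toy_embeddings = [[0.0], [1.0], [1.5], [5.0]]
toy_labels = [0, 0, 1, 1]
toy_loss = semi_hard_triplet_loss(toy_embeddings, toy_labels, margin=1.0)
```

Because negatives are mined only from the current batch, a small batch may contain few or no semi-hard negatives for a given pair, which is consistent with the abstract's remark that limited batch sizes may have hurt the ranking-loss models.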
In addition to the aforementioned strategies, we also introduce a means of evaluating model behaviour in visual relationship recognition. This evaluation motivates the use of semantics. Our experiments demonstrate that batch construction can improve performance on the long tail, possibly at the expense of accuracy on the small number of dominating classes. We also find that a basic multitask model neither improves nor impedes performance in any significant way, but that its smaller size may be beneficial. Moreover, multitask models trained with a ranking loss yield a decrease in performance, possibly due to limited batch sizes.
AFRIKAANS SUMMARY: An image can be described by the objects in it, as well as the interactions between those objects. Two object labels together with an interaction label are known as a visual relationship, represented by a triplet of the form (subject, predicate, object). Recognising visual relationships in a given image is a challenging task, owing to the combinatorially large number of possible relationship triplets, which leads to a so-called extreme classification problem, as well as a very long tail that typically occurs in the distribution of those possible triplets. We investigate the effectiveness of four strategies for addressing these problems. Firstly, instead of predicting the full triplet, we choose to predict each element separately. Secondly, we investigate the use of shared network parameters to perform these separate predictions in a basic multitask setting. Thirdly, we extend the multitask setting by including an online ranking loss function, defined on a trio of data points (an anchor, a positive example, and a negative example). Semi-hard negative mining is used to select negative examples. Finally, we consider a class-selective batch construction strategy to expose the network to more of the rare classes during mini-batch training. We regard semi-hard negative mining and class-selective batch construction as forms of data distribution search. Both attempt to select training data points carefully in order to improve the model's performance.
In addition to the above strategies, we also propose a way to evaluate model behaviour in the recognition of visual relationships. This evaluation motivates the use of semantics. Our experiments demonstrate that batch construction can improve performance on the long tail, possibly at the cost of accuracy on the small number of dominant classes. We also find that a basic multitask model neither improves nor impedes performance in a significant way, but that its smaller model size can be advantageous. Moreover, multitask models trained with a ranking loss function lead to lower performance, possibly as a result of limited batch sizes.
Masters
2020-02-19T13:20:04Z 2020-04-28T12:19:42Z 2020-02-19T13:20:04Z 2020-04-28T12:19:42Z
2020-03
Thesis
http://hdl.handle.net/10019.1/108109
en_ZA
Stellenbosch University.
vi, 60 pages : illustrations
application/pdf
Stellenbosch : Stellenbosch University.
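The class-selective batch construction described in the abstract can be sketched as follows. The record does not specify the thesis's actual sampling rule, so this is only one plausible reading: choose the classes for a mini-batch first, uniformly at random, and then draw a fixed number of samples from each chosen class, so that rare classes appear far more often than under uniform sampling over examples. The function name `class_selective_batch` and its parameters are hypothetical.

```python
import random
from collections import defaultdict


def class_selective_batch(labels, classes_per_batch, samples_per_class, rng=None):
    """Return dataset indices for one mini-batch (illustrative assumption only).

    Classes are sampled uniformly, independent of their frequency, and each
    chosen class contributes `samples_per_class` indices. Classes with too
    few examples are sampled with replacement.
    """
    rng = rng or random.Random()
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Sorting makes the draw reproducible for a seeded generator.
    all_classes = sorted(by_class)
    chosen = rng.sample(all_classes, min(classes_per_batch, len(all_classes)))

    batch = []
    for cls in chosen:
        pool = by_class[cls]
        if len(pool) >= samples_per_class:
            batch.extend(rng.sample(pool, samples_per_class))
        else:
            batch.extend(rng.choices(pool, k=samples_per_class))
    return batch


# Toy long-tailed label set: one dominant predicate and two rare ones.
toy_labels = ["on"] * 8 + ["riding"] + ["holding"]
toy_batch = class_selective_batch(toy_labels, classes_per_batch=3,
                                  samples_per_class=2, rng=random.Random(0))
```

In the toy label set, "on" makes up 80% of the examples but only a third of the batch, which mirrors the trade-off noted in the abstract: better coverage of the long tail, possibly at the expense of accuracy on the dominant classes.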
title Multitask learning and data distribution search in visual relationship recognition
topic Machine learning
Neural networks (Computer science)
Computer vision
Computer multitasking
Visual relationship recognition
Electronic data processing -- Batch processing
UCTD
url http://hdl.handle.net/10019.1/108109