Automated payment fraud detection using logistic regression and support vector machines

Thesis (MComm)--Stellenbosch University, 2021.

Bibliographic Details
Main Author: Thetard, Heinrich Mathias
Other Authors: Nel, J. H.
Format: Thesis
Language: en_ZA
Published: Stellenbosch : Stellenbosch University 2021
_version_ 1867613895768670208
access_status_str Open Access
author Thetard, Heinrich Mathias
author2 Nel, J. H.
author_browse Nel, J. H.
Thetard, Heinrich Mathias
author_facet Nel, J. H.
Thetard, Heinrich Mathias
author_sort Thetard, Heinrich Mathias
collection Thesis
dc_rights_str_mv Stellenbosch University
description Thesis (MComm)--Stellenbosch University, 2021.
format Thesis
id oai:scholar.sun.ac.za:10019.1/110024
institution Stellenbosch University (South Africa)
language en_ZA
last_indexed 2026-06-10T12:43:25.190Z
license_str Other — see source repository
provenance_str_mv Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository
publishDate 2021
publishDateRange 2021
publishDateSort 2021
publisher Stellenbosch : Stellenbosch University
publisherStr Stellenbosch : Stellenbosch University
record_format dspace
source_str SUNScholar — Stellenbosch University Repository
spelling oai:scholar.sun.ac.za:10019.1/110024 Automated payment fraud detection using logistic regression and support vector machines Thetard, Heinrich Mathias Nel, J. H. Stellenbosch University. Faculty of Economic and Management Science. Dept. of Logistics. Logistic regression analysis Machine learning Support vector machines SVMs (Algorithms) Automated tellers ATMs (Banking) Remote sensing Contingency tables -- Computer programs Commercial crimes Banks and banking -- Security measures Bank fraud UCTD Thesis (MComm)--Stellenbosch University, 2021. ENGLISH ABSTRACT: The financial technology sector is a fast moving environment. There are many innovations in the automation and efficiency spheres where human intervention is required less and processing speed is rapidly increasing. In the payments space this is evident as payments are processed faster each year with the vast majority of these transactions driven automatically. This has opened up a platform for fraudsters to operate on. The use of Machine Learning (ML) in fraud detection has grown in popularity. Two methods, logistic regression (LR) and support vector machines (SVMs), are used to identify fraud and are investigated in this thesis. LR is less complex as compared to SVMs, but SVMs have unique situations where it will outperform any other ML model [31]. Either method is assessed based on application conditions and measured based on a certain set of confusion matrix based metrics. The two methods are applied to a data set from a bank which participates in the automated payment environment. It was evident that the sample proportions selected had a major impact on the model performance especially with regards to sensitivity and specificity. This was an exercise of fraud identification where sensitivity is the most important. This may not be the case for all data sets and environments as the cost to investigate false positives may be higher than the actual cost of fraud prevented. 
Condition testing and post model application diagnostics were applied in this research. It was evident principal component analysis (PCA) feature selection was inferior to stepwise feature selection. The relatively poor performance of the PCA feature selection models is due to a loss of information when variables are removed when choosing the components. When considering the odds ratios for LR, there were several variables that were protective factors and others that were risk factors. These factors either increased or decreased the odds of a case being fraudulent. It was found that when a debit order (DO) was associated with an older person it was more likely to be fraudulent than when the DO was associated with a younger person. It was also found that if a DO had a value of R99 or R45 then the odds of the case being fraudulent would increase several-fold. LR models produced equivalent results to the more complex SVM models with a much better run time. From a practical point of view, this means that LR is preferred on larger data sets. AFRIKAANSE OPSOMMING: Die finansiële tegnologie sektor is ’n vinnig bewegende omgewing. Daar is baie innovasies op die gebied van outomatisering en doeltreffendheid, waar menslike ingryping minder nodig is en die spoed van verwerking vinnig toeneem. In die betalingsruimte blyk dit dat betalings elke jaar vinniger verwerk word, met die oorgrote meerderheid van die betalingstransaksies wat outomaties verwerk word. Dit het ’n platform vir bedrieërs geskep. Gevolglik neem die gewildheid van die gebruik van masjienleer (ML) in die opsporing van bedrog steeds toe. Twee metodes, logistieke regressie (LR) en ondersteunings vektormasjiene (SVMs), word gebruik om bedrog te identifiseer en word in hierdie tesis ondersoek. LR is minder kompleks in vergelyking met SVMs, maar SVMs het unieke situasies waar dit beter sal presteer as enige ander ML-model. 
Elk van hierdie metodes word beoordeel op grond van toepassingsvoorwaardes en die prestasie word gemeet aan die hand van ’n sekere stel maatstawwe wat op die verwarringsmatriks gebaseer is. Die twee metodes word op ’n datastel van ’n bank wat aan die outomatiese betalingsomgewing deelneem, toegepas. Dit was duidelik dat die geselekteerde steekproefverhoudings ’n groot invloed op die modelprestasie, sensitiwiteit en spesifisiteit gehad het. In hierdie studie is die identifikasie van bedrog die oogmerk, en daarom is die meting van sensitiwiteit die belangrikste. Dit is miskien nie die geval vir alle datastelle en omgewings nie, aangesien die koste om vals positiewe gevalle te ondersoek, hoër kan wees as wat die werklike koste van die voorkoming van bedrog is. Die toetsing van voorwaardes en ontleding van postmodel diagnostieke is in hierdie navorsing toegepas. Dit was duidelik dat hoofkomponentanalise (PCA) ondergeskik presteer het in vergelyking met stapsgewyse seleksiemetodes. Die relatief swak prestasie van die PCA seleksiemodelle is te wyte aan die verlies van inligting wanneer veranderlikes geelimineer word in die keuse van die komponente. By die oorweging van die kansverhoudings vir LR was daar verskillende veranderlikes wat beskermende faktore was en ander wat risikofaktore was. Hierdie faktore het die kans op gevalle van bedrog verhoog of verminder. Daar is gevind dat wanneer ’n debietorder (DO) met ’n ouer persoon geassosieer word, dit meer waarskynlik as bedrog geklassifiseer word as wanneer die DO met ’n jonger persoon geassosieer word. Dit is ook gevind dat as ’n DO ’n waarde van R99 en R45 het, die kans dat dit ’n bedrogsaak sal wees, meer sal vergroot. LR-modelle lewer gelykstaande resultate aan die meer ingewikkelde SVM-modelle met ’n baie beter tydsduur. Uit ’n praktiese oogpunt beteken dit dat LR modelle verkies sal word vir groter datastelle. 
Masters 2021-03-06T16:40:04Z 2021-04-21T14:36:56Z 2021-03-06T16:40:04Z 2021-04-21T14:36:56Z 2021-03 Thesis http://hdl.handle.net/10019.1/110024 en_ZA Stellenbosch University 125 pages application/pdf Stellenbosch : Stellenbosch University
spellingShingle Logistic regression analysis
Machine learning
Support vector machines
SVMs (Algorithms)
Automated tellers
ATMs (Banking)
Remote sensing
Contingency tables -- Computer programs
Commercial crimes
Banks and banking -- Security measures
Bank fraud
UCTD
Thetard, Heinrich Mathias
Automated payment fraud detection using logistic regression and support vector machines
title Automated payment fraud detection using logistic regression and support vector machines
title_full Automated payment fraud detection using logistic regression and support vector machines
title_fullStr Automated payment fraud detection using logistic regression and support vector machines
title_full_unstemmed Automated payment fraud detection using logistic regression and support vector machines
title_short Automated payment fraud detection using logistic regression and support vector machines
title_sort automated payment fraud detection using logistic regression and support vector machines
topic Logistic regression analysis
Machine learning
Support vector machines
SVMs (Algorithms)
Automated tellers
ATMs (Banking)
Remote sensing
Contingency tables -- Computer programs
Commercial crimes
Banks and banking -- Security measures
Bank fraud
UCTD
url http://hdl.handle.net/10019.1/110024
work_keys_str_mv AT thetardheinrichmathias automatedpaymentfrauddetectionusinglogisticregressionandsupportvectormachines