Thesis (MComm)--Stellenbosch University, 2021.
| Main Author: | Thetard, Heinrich Mathias |
|---|---|
| Other Authors: | Nel, J. H. |
| Format: | Thesis |
| Language: | en_ZA |
| Published: | Stellenbosch : Stellenbosch University, 2021 |
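| Subjects: | Logistic regression analysis; Machine learning; Support vector machines; SVMs (Algorithms); Automated tellers; ATMs (Banking); Remote sensing; Contingency tables -- Computer programs; Commercial crimes; Banks and banking -- Security measures; Bank fraud; UCTD |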
| _version_ | 1867613895768670208 |
|---|---|
| access_status_str | Open Access |
| author | Thetard, Heinrich Mathias |
| author2 | Nel, J. H. |
| collection | Thesis |
| dc_rights_str_mv | Stellenbosch University |
| description | Thesis (MComm)--Stellenbosch University, 2021. |
| format | Thesis |
| id | oai:scholar.sun.ac.za:10019.1/110024 |
| institution | Stellenbosch University (South Africa) |
| language | en_ZA |
| last_indexed | 2026-06-10T12:43:25.190Z |
| license_str | Other — see source repository |
| provenance_str_mv | Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository |
| publishDate | 2021 |
| publisher | Stellenbosch : Stellenbosch University |
| record_format | dspace |
| source_str | SUNScholar — Stellenbosch University Repository |
| spelling | oai:scholar.sun.ac.za:10019.1/110024 Automated payment fraud detection using logistic regression and support vector machines Thetard, Heinrich Mathias Nel, J. H. Stellenbosch University. Faculty of Economic and Management Science. Dept. of Logistics. Logistic regression analysis Machine learning Support vector machines SVMs (Algorithms) Automated tellers ATMs (Banking) Remote sensing Contingency tables -- Computer programs Commercial crimes Banks and banking -- Security measures Bank fraud UCTD Thesis (MComm)--Stellenbosch University, 2021. ENGLISH ABSTRACT: The financial technology sector is a fast-moving environment. There are many innovations in the automation and efficiency spheres, where human intervention is required less and processing speed is rapidly increasing. In the payments space this is evident, as payments are processed faster each year and the vast majority of these transactions are driven automatically. This has opened up a platform for fraudsters to operate on. The use of machine learning (ML) in fraud detection has grown in popularity. Two methods used to identify fraud, logistic regression (LR) and support vector machines (SVMs), are investigated in this thesis. LR is less complex than SVMs, but there are specific situations in which SVMs will outperform any other ML model [31]. Each method is assessed against its application conditions and measured using a set of confusion-matrix-based metrics. The two methods are applied to a data set from a bank that participates in the automated payment environment. It was evident that the sample proportions selected had a major impact on model performance, especially with regard to sensitivity and specificity. This was an exercise in fraud identification, where sensitivity is the most important metric. This may not be the case for all data sets and environments, as the cost of investigating false positives may be higher than the actual cost of the fraud prevented. Condition testing and post-model application diagnostics were applied in this research. It was evident that principal component analysis (PCA) feature selection was inferior to stepwise feature selection. The relatively poor performance of the PCA feature selection models is due to a loss of information when variables are removed in choosing the components. When considering the odds ratios for LR, several variables were protective factors and others were risk factors; these factors respectively decreased or increased the odds of a case being fraudulent. It was found that a debit order (DO) associated with an older person was more likely to be fraudulent than a DO associated with a younger person. It was also found that if a DO had a value of R99 or R45, the odds of the case being fraudulent increased several-fold. LR models produced results equivalent to those of the more complex SVM models with a much better run time. From a practical point of view, this means that LR is preferred on larger data sets. AFRIKAANS SUMMARY (AFRIKAANSE OPSOMMING): The financial technology sector is a fast-moving environment. There are many innovations in the field of automation and efficiency, where human intervention is less necessary and processing speed is increasing rapidly. In the payments space it is evident that payments are processed faster each year, with the vast majority of payment transactions processed automatically. This has created a platform for fraudsters. Consequently, the popularity of machine learning (ML) in fraud detection continues to grow. Two methods, logistic regression (LR) and support vector machines (SVMs), are used to identify fraud and are investigated in this thesis. LR is less complex than SVMs, but there are unique situations in which SVMs will outperform any other ML model. Each of these methods is assessed on the basis of application conditions, and performance is measured using a certain set of metrics based on the confusion matrix. The two methods are applied to a data set from a bank that participates in the automated payment environment. It was clear that the selected sample proportions had a large influence on model performance, sensitivity and specificity. In this study the aim is the identification of fraud, and the measurement of sensitivity is therefore the most important. This may not be the case for all data sets and environments, since the cost of investigating false positive cases may be higher than the actual cost of the fraud prevented. Condition testing and analysis of post-model diagnostics were applied in this research. It was clear that principal component analysis (PCA) performed worse than stepwise selection methods. The relatively poor performance of the PCA selection models is due to the loss of information when variables are eliminated in choosing the components. When considering the odds ratios for LR, various variables were protective factors and others were risk factors. These factors increased or decreased the odds of a case being fraud. It was found that when a debit order (DO) is associated with an older person, it is more likely to be classified as fraud than when the DO is associated with a younger person. It was also found that if a DO has a value of R99 and R45, the odds that it will be a fraud case increase further. LR models deliver results equivalent to those of the more complicated SVM models with a much better run time. From a practical point of view, this means that LR models will be preferred for larger data sets. Masters 2021-03-06T16:40:04Z 2021-04-21T14:36:56Z 2021-03-06T16:40:04Z 2021-04-21T14:36:56Z 2021-03 Thesis http://hdl.handle.net/10019.1/110024 en_ZA Stellenbosch University 125 pages application/pdf Stellenbosch : Stellenbosch University |
| title | Automated payment fraud detection using logistic regression and support vector machines |
| topic | Logistic regression analysis Machine learning Support vector machines SVMs (Algorithms) Automated tellers ATMs (Banking) Remote sensing Contingency tables -- Computer programs Commercial crimes Banks and banking -- Security measures Bank fraud UCTD |
| url | http://hdl.handle.net/10019.1/110024 |
| work_keys_str_mv | AT thetardheinrichmathias automatedpaymentfrauddetectionusinglogisticregressionandsupportvectormachines |
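
The abstract in this record evaluates models by sensitivity and specificity and interprets logistic regression through odds ratios. As a minimal sketch of what those quantities mean (not the thesis's own code; the function names and all counts below are hypothetical and chosen only for illustration), they could be computed in Python as follows:

```python
import math


def sensitivity(tp: int, fn: int) -> float:
    """Share of actual fraud cases that the model flags (true positive rate)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Share of legitimate cases that the model leaves unflagged (true negative rate)."""
    return tn / (tn + fp)


def odds_ratio(coefficient: float) -> float:
    """Odds ratio implied by a logistic regression coefficient.

    A value above 1 marks a risk factor (odds of fraud increase);
    a value below 1 marks a protective factor (odds of fraud decrease).
    """
    return math.exp(coefficient)


# Hypothetical confusion-matrix counts, not taken from the thesis.
tp, fn, tn, fp = 80, 20, 900, 100

print("sensitivity:", sensitivity(tp, fn))
print("specificity:", specificity(tn, fp))
print("odds ratio for coefficient 0.7:", round(odds_ratio(0.7), 2))
```

Raising sensitivity usually produces more false positives, which is the trade-off the abstract points to when it notes that investigating false positives can cost more than the fraud prevented.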