Albertyn, C. 2025. A comparative study of gradient boosting algorithms and transformer neural networks for classification problems in the tabular data domain. Unpublished master's thesis. Stellenbosch: Stellenbosch University [online]. Available: https://scholar.sun.ac.za/items/1426022c-fbf0-44eb-ab7...
| Main Author: | Albertyn, Carla |
|---|---|
| Other Authors: | Grobler, J. |
| Format: | Thesis |
| Published: | Stellenbosch : Stellenbosch University, 2025 |
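| Subjects: | Machine learning; Neural networks (Computer science); Boosting (Algorithms); Pattern recognition systems; UCTD |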
| _version_ | 1867613763021045760 |
|---|---|
| access_status_str | Open Access |
| author | Albertyn, Carla |
| author2 | Grobler, J. |
| author_browse | Albertyn, Carla Grobler, J. |
| author_facet | Grobler, J. Albertyn, Carla |
| author_sort | Albertyn, Carla |
| collection | Thesis |
| dc_rights_str_mv | Stellenbosch University |
| description | Albertyn, C. 2025. A comparative study of gradient boosting algorithms and transformer neural networks for classification problems in the tabular data domain. Unpublished master's thesis. Stellenbosch: Stellenbosch University [online]. Available: https://scholar.sun.ac.za/items/1426022c-fbf0-44eb-ab74-43df924264d6 |
| format | Thesis |
| id | oai:scholar.sun.ac.za:10019.1/132040 |
| institution | Stellenbosch University (South Africa) |
| last_indexed | 2026-06-10T12:41:18.607Z |
| license_str | Other — see source repository |
| provenance_str_mv | Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository |
| publishDate | 2025 |
| publishDateRange | 2025 |
| publishDateSort | 2025 |
| publisher | Stellenbosch : Stellenbosch University |
| publisherStr | Stellenbosch : Stellenbosch University |
| record_format | dspace |
| source_str | SUNScholar — Stellenbosch University Repository |
| spelling | oai:scholar.sun.ac.za:10019.1/132040 A comparative study of gradient boosting algorithms and transformer neural networks for classification problems in the tabular data domain Albertyn, Carla Grobler, J. Stellenbosch University. Faculty of Engineering. Dept. of Industrial Engineering. Machine learning Neural networks (Computer science) Boosting (Algorithms) Pattern recognition systems UCTD Albertyn, C. 2025. A comparative study of gradient boosting algorithms and transformer neural networks for classification problems in the tabular data domain. Unpublished master's thesis. Stellenbosch: Stellenbosch University [online]. Available: https://scholar.sun.ac.za/items/1426022c-fbf0-44eb-ab74-43df924264d6 Thesis (MEng)--Stellenbosch University, 2025. ENGLISH ABSTRACT: The adoption of machine learning and artificial intelligence in various industries, such as advertising, financial services, and the retail industry, has experienced substantial growth, driven by the desire to increase business revenue and improve the client experience. Machine learning algorithms have the potential to accelerate decision-making processes in businesses, and deliver personalised experiences by predicting relevant recommendations tailored to individual client needs. The need for highly accurate algorithms that can be implemented with ease and at affordable costs is ever-rising. Gradient boosting algorithms have achieved state-of-the-art performance in classification and regression problems in the tabular domain. High prediction accuracy, ease of implementation, and fast computation are some of the most reported advantages of gradient boosting approaches. In recent years, deep learning models, particularly neural networks, have demonstrated significant success in domains with unstructured data, such as image and natural language processing.
However, when applied to structured tabular data, these models often underperform compared to traditional machine learning techniques like decision trees, random forests, and gradient boosting machines. Recent research on transformer neural networks, however, has yielded promising results, suggesting that this novel architecture may challenge the established supremacy of gradient boosting models in the field of supervised learning. In this thesis, a generic framework for classification model development is proposed, with a focus on facilitating the model development process in a manner such that good performance may be achieved irrespective of the problem domain. This classification modelling framework is applied in a comparative study to establish the feasibility of using transformer neural networks as a superior alternative to gradient boosting algorithms to solve classification problems based on tabular data. In particular, the algorithmic performance of XGBoost is compared with the performance of FT-Transformer and TabTransformer. The framework is verified in an application on 12 benchmark data sets from various domains. Additionally, the framework is validated on three real-world case studies in the South African banking sector to illustrate the practical applicability of the framework. During this process, the framework is shown to produce simplified, stable models that generalise well to unseen data. Moreover, the application of the framework to the benchmark data and real-world case studies shows that TabTransformer was outperformed by both XGBoost and FT-Transformer. FT-Transformer produced competitive results, but XGBoost outperformed FT-Transformer and TabTransformer overall and is significantly faster and less costly in terms of hyper-parameter tuning. 
AFRIKAANSE OPSOMMING: Die aanvaarding van masjienleer en kunsmatige intelligensie in verskeie bedrywe, soos advertensies, finansiële dienste en die kleinhandelbedryf, het aansienlike groei beleef, gedryf deur die begeerte om wins te verhoog en die kliëntervaring te verbeter. Masjienleeralgoritmes het die potensiaal om besluitnemingsprosesse in besighede te versnel, en gepersonaliseerde ervarings te lewer deur relevante aanbevelings te voorspel wat aangepas is vir individuele kliëntbehoeftes. Die behoefte aan hoogs akkurate algoritmes wat met gemak en teen bekostigbare koste geïmplementeer kan word, neem steeds toe. Gradiëntversterkende algoritmes het die top-prestasie in klassifikasie- en regressieprobleme in die tabulêre data domein behaal. Hoë akkuraatheid, gemak van implementering, en vinnige berekening is van die mees gedokumenteerde voordele van gradiëntversterkende benaderings. In onlangse jare het diepleermodelle, veral neurale netwerke, beduidende sukses getoon in domeine met ongestruktureerde data, soos beeld- en natuurlike taalverwerking. Wanneer dit egter op gestruktureerde tabulêre data toegepas word, onderpresteer hierdie modelle dikwels in vergelyking met tradisionele masjienleertegnieke soos besluitnemingsbome, en gradiëntversterkingsmasjiene. Onlangse navorsing oor transformator neurale netwerke het egter belowende resultate opgelewer, wat daarop dui dat hierdie nuwe argitektuur die gevestigde oppergesag van gradiëntversterkende modelle in die veld van klassifikasie- en regressiemodelle kan uitdaag. In hierdie tesis word ’n generiese raamwerk vir klassifikasiemodelontwikkeling voorgestel, met ’n fokus op die fasilitering van die modelontwikkelingsproses op ’n wyse dat goeie prestasie behaal kan word, ongeag die probleemdomein.
Hierdie klassifikasie-modelleringsraamwerk word in ’n vergelykende studie toegepas om die haalbaarheid van die gebruik van transformatorneurale netwerke as ’n beter alternatief vir gradiëntversterkende algoritmes vas te stel om klassifikasieprobleme gebaseer op tabelêre data op te los. In die besonder word die algoritmiese resultate van XGBoost vergelyk met die resultate van FT-Transformer en TabTransformer. Die raamwerk word geverifiëer in ’n toepassing op 12 maatstafdatastelle van verskeie domeine. Daarbenewens word die raamwerk bekragtig op drie werklike gevallestudies in die Suid-Afrikaanse banksektor om die praktiese toepaslikheid van die raamwerk te illustreer. Tydens hierdie proses word getoon dat die raamwerk vereenvoudigde, stabiele modelle produseer wat goed veralgemeen na nuwe data. Boonop toon die toepassing van die raamwerk op die maatstafdata en gevallestudies dat TabTransformer uitgestof is deur beide XGBoost en FT-Transformer. FT-Transformer het mededingende resultate gelewer, maar XGBoost het oor die algemeen die beste resultate gelewer en is aansienlik vinniger en goedkoper in terme van hiperparameter-instelling. Masters 2025-05-20T13:52:57Z 2025-05-20T13:52:57Z 2025-03 Thesis https://scholar.sun.ac.za/handle/10019.1/132040 Stellenbosch University xxiv, 101 pages application/pdf Stellenbosch : Stellenbosch University |
| spellingShingle | Machine learning Neural networks (Computer science) Boosting (Algorithms) Pattern recognition systems UCTD Albertyn, Carla A comparative study of gradient boosting algorithms and transformer neural networks for classification problems in the tabular data domain |
| title | A comparative study of gradient boosting algorithms and transformer neural networks for classification problems in the tabular data domain |
| title_full | A comparative study of gradient boosting algorithms and transformer neural networks for classification problems in the tabular data domain |
| title_fullStr | A comparative study of gradient boosting algorithms and transformer neural networks for classification problems in the tabular data domain |
| title_full_unstemmed | A comparative study of gradient boosting algorithms and transformer neural networks for classification problems in the tabular data domain |
| title_short | A comparative study of gradient boosting algorithms and transformer neural networks for classification problems in the tabular data domain |
| title_sort | comparative study of gradient boosting algorithms and transformer neural networks for classification problems in the tabular data domain |
| topic | Machine learning Neural networks (Computer science) Boosting (Algorithms) Pattern recognition systems UCTD |
| url | https://scholar.sun.ac.za/handle/10019.1/132040 |
| work_keys_str_mv | AT albertyncarla acomparativestudyofgradientboostingalgorithmsandtransformerneuralnetworksforclassificationproblemsinthetabulardatadomain AT albertyncarla comparativestudyofgradientboostingalgorithmsandtransformerneuralnetworksforclassificationproblemsinthetabulardatadomain |
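
The `id` and `url` fields in the record above appear to share the same handle suffix (`10019.1/132040`). As a minimal sketch, assuming the identifier always has the layout `oai:<host>:<handle>` and that the repository serves records under `/handle/`, the URL could be derived from the OAI identifier as follows. The function name `oai_to_handle_url` is hypothetical and is not part of any repository or harvesting API.

```python
def oai_to_handle_url(oai_id: str) -> str:
    """Build a handle URL from an OAI identifier.

    Assumption: the identifier looks like 'oai:<host>:<handle>' and the
    repository exposes records at https://<host>/handle/<handle>.
    """
    parts = oai_id.split(":", 2)  # split into scheme, host, handle
    if len(parts) != 3 or parts[0] != "oai" or not parts[1] or not parts[2]:
        raise ValueError(f"not an OAI identifier: {oai_id!r}")
    _, host, handle = parts
    return f"https://{host}/handle/{handle}"


# Identifier copied from the record's `id` field.
record_id = "oai:scholar.sun.ac.za:10019.1/132040"
handle_url = oai_to_handle_url(record_id)
print(handle_url)
```

For this record the derived value matches the `url` field; other repositories may use a different path layout, so the mapping should be checked before it is relied on.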