Insights into the South African research landscape through mining theses and dissertations using transformer-based language models

Thesis (MSc)--Stellenbosch University, 2026.

Bibliographic Details
Main Author: Khanyi, Masana Hlengiwe Michelle
Other Authors: Dunaiski, Marcel
Format: Thesis
Language: English
Published: Stellenbosch : Stellenbosch University 2026
Access status: Open Access
Rights: Stellenbosch University
Identifier: oai:scholar.sun.ac.za:10019.1/136187
Institution: Stellenbosch University (South Africa)
Last indexed: 2026-06-10T12:45:40.774Z
License: Other (see source repository)
Provenance: Harvested via OAI-PMH from SUNScholar (Stellenbosch University Repository)
Additional contributor: Van Lill, Milandre
Department: Stellenbosch University. Faculty of Science. Dept. of Computer Science.
Degree: Masters
Date issued: 2026-03
Date accessioned and available: 2026-04-24T11:54:47Z
Extent: 110 pages : ill.
File format: application/pdf
Handle: https://scholar.sun.ac.za/handle/10019.1/136187
Citation: Khanyi, M. H. M. 2026. Insights into the South African research landscape through mining theses and dissertations using transformer-based language models. Unpublished masters thesis. Stellenbosch: Stellenbosch University [online]. Available: https://scholar.sun.ac.za/items/c1718131-6742-4e0f-9980-1b72fc7cd181

Abstract:
Postgraduate research plays a critical role in the development of national research capacity and advanced knowledge production. In South Africa, electronic theses and dissertations (ETDs) constitute a substantial yet underutilised body of scholarly output for analysing postgraduate training, knowledge production, and scholarly influence. Despite their significance, ETDs are rarely incorporated into large-scale scientometric analyses due to fragmented institutional repositories, limited standardisation, and weak integration with global bibliographic infrastructures. This study develops a methodology for harvesting, enriching, and analysing ETDs using a combination of metadata integration, full-text mining, and citation analysis. Institutional ETD metadata are integrated with OpenAlex, an open-access scholarly knowledge graph, using a stemming-assisted title matching approach and persistent identifier mapping. In addition, the study applies automated PDF mining techniques to extract reference lists and citation contexts directly from ETD full texts, enabling fine-grained analysis of citation behaviour beyond aggregate citation counts. The enriched dataset supports citation network construction, concept mapping, supervisor–student linkage, and longitudinal analysis of postgraduate research output.

Empirical analyses focus on South Africa’s research-intensive universities and examine institutional productivity, temporal growth patterns, language use, retention dynamics, and citation characteristics of postgraduate research between 2000 and 2024. The results reveal a concentration of postgraduate research output among a small number of institutions, sustained growth prior to 2020, and a marked decline thereafter. This post-2020 downturn is likely influenced by economic pressures, funding constraints, and the disruptive effects of the COVID-19 pandemic, with implications for future doctoral production and national policy targets. By exploring ETD metadata integration, full-text citation mining, and open bibliographic enrichment, this study extends traditional publication-based scientometrics and demonstrates the value of ETDs as instruments for monitoring postgraduate research training capacity and informing evidence-based higher education policy in South Africa.
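The abstract mentions linking institutional ETD metadata to OpenAlex through stemming-assisted title matching and persistent identifier mapping, but this record does not describe the actual procedure. What follows is therefore only a minimal Python sketch of the general idea, under several assumptions. Records are plain dictionaries with hypothetical `title` and optional `doi` keys. A crude suffix-stripping function stands in for a real stemmer such as Porter's. The stop-word list and the 0.9 Jaccard threshold are arbitrary illustrative choices.

```python
from __future__ import annotations

import re
import unicodedata

# Hypothetical stop-word list, kept short for illustration.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "into", "through", "using"}

# Crude suffix stripping that stands in for a real stemmer (e.g. Porter's).
SUFFIXES = ("ations", "ation", "ings", "ing", "ies", "es", "s", "ed")


def crude_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def title_tokens(title: str) -> frozenset[str]:
    """Fold accents, lowercase, split on non-alphanumerics, drop stop words, stem."""
    ascii_text = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    words = re.findall(r"[a-z0-9]+", ascii_text.lower())
    return frozenset(crude_stem(w) for w in words if w not in STOPWORDS)


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Set overlap in [0, 1]; two empty titles are treated as a non-match."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def normalise_doi(doi: str) -> str:
    """Lowercase a DOI and drop a resolver prefix such as https://doi.org/."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            return doi[len(prefix):]
    return doi


def is_match(etd: dict, work: dict, threshold: float = 0.9) -> bool:
    """Match on DOI when both records have one, otherwise on stemmed-title overlap."""
    etd_doi, work_doi = etd.get("doi"), work.get("doi")
    if etd_doi and work_doi:
        return normalise_doi(etd_doi) == normalise_doi(work_doi)
    return jaccard(title_tokens(etd["title"]), title_tokens(work["title"])) >= threshold
```

For example, `is_match` treats "Transformer-Based Language Models" and "transformer based language models." as the same title, because hyphens and punctuation are discarded and "dissertations"/"dissertation" reduce to the same stem. The identifier comparison takes precedence because a shared DOI is stronger evidence than title overlap; the title score is only a fallback when one side lacks an identifier.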