Thesis (MEng)--Stellenbosch University, 2025.
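| Main Author: | De Wet, Lize |
|---|---|
| Other Authors: | Basson, Anton |
| Format: | Thesis |
| Language: | English |
| Published: | Stellenbosch : Stellenbosch University, 2025 |
| Subjects: | Automatic speech recognition; Medical transcription; Medical informatics; Medical records -- Data processing |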
| Field | Value |
|---|---|
| access_status_str | Open Access |
| author | De Wet, Lize |
| author2 | Basson, Anton |
| collection | Thesis |
| dc_rights_str_mv | Stellenbosch University |
| description | Thesis (MEng)--Stellenbosch University, 2025. |
| format | Thesis |
| id | oai:scholar.sun.ac.za:10019.1/134592 |
| institution | Stellenbosch University (South Africa) |
| language | English |
| last_indexed | 2026-06-10T12:41:39.515Z |
| license_str | Other — see source repository |
| provenance_str_mv | Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository |
| publishDate | 2025 |
| publisher | Stellenbosch : Stellenbosch University |
| record_format | dspace |
| source_str | SUNScholar — Stellenbosch University Repository |
| contributor | Basson, Anton; Grobler, J.; Van Schalkwyk, T. |
| department | Stellenbosch University. Faculty of Engineering. Dept. of Mechanical and Mechatronic Engineering. |
| citation | De Wet, L. 2025. Evaluation of Selected Language Models for the Generation of Patient Assessment Summaries from Uncorrected Transcriptions. Unpublished master's thesis. Stellenbosch: Stellenbosch University [online]. Available: https://scholar.sun.ac.za/items/235216e6-984e-4589-a63f-64b2b77074c8 |
| abstract | The administrative burden of manual clinical documentation is a significant contributor to burnout among healthcare professionals. While automated solutions that generate documentation from doctor-patient conversations exist, their practical adoption is often limited by high costs and a reliance on error-free, corrected transcripts. This research addresses the challenge of using cost-effective, open-source technologies, which must contend with the imperfect output of automatic speech recognition (ASR) systems, particularly their difficulty in accurately transcribing specialised medical terminology. The objective of this research is to evaluate the effectiveness of open-source language models in automatically generating patient assessment summaries from uncorrected ASR transcriptions. The study systematically investigates the impact of transcription errors on the quality of the generated summaries by comparing two distinct model paradigms: smaller, fine-tuned transformer-based models, namely the Bidirectional and Auto-Regressive Transformer (BART) and the Longformer Encoder-Decoder (LED), and large language models (LLMs) adapted using in-context learning (ICL). To facilitate this investigation, a new dataset variant was created from the Ambient Clinical Intelligence Benchmark (ACI-BENCH) corpus, containing dialogue transcriptions with synthetically generated ASR errors representative of open-source tools. The results demonstrate that LLMs with sufficient context windows, specifically a Vicuna-7B-16k model adapted with few-shot ICL, outperform fine-tuned transformer-based models, particularly in preserving clinical meaning as measured by the MEDCON metric. The experiments confirmed that, while model performance degrades when using uncorrected transcriptions, both transformer-based models and LLMs show resilience, producing structured summaries without catastrophic failure. While the performance decreases were often not statistically significant, the findings still highlight the persistent challenge of preserving medical accuracy with imperfect input. Furthermore, ICL was shown to be a highly data-efficient and effective adaptation strategy, surpassing the performance of fine-tuning on the limited training data available. This study concludes that it is feasible to use open-source language models to generate useful drafts of clinical notes from imperfect ASR transcriptions, although these drafts still contain errors and require review and correction by healthcare professionals. This finding has significant practical implications, demonstrating a viable pathway for healthcare organisations to develop cost-effective, automated documentation tools that can reduce administrative workload without requiring expensive commercial systems or extensive manual data correction. |
| Afrikaans summary | No summary available. |
| degree | Masters |
| date added | 2025-12-15T13:48:46Z |
| date issued | 2025-12 |
| extent | ix, 115 pages : illustrations |
| file format | application/pdf |
| title | Evaluation of selected language models for the generation of patient assessment summaries from uncorrected transcriptions |
| topic | Automatic speech recognition Medical transcription Medical informatics Medical records -- Data processing |
| url | https://scholar.sun.ac.za/handle/10019.1/134592 |
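The abstract reports that a Vicuna-7B-16k model adapted with few-shot in-context learning outperformed the fine-tuned BART and LED models, and it stresses that the LLM needs a sufficient context window. The following is a minimal, hypothetical sketch of how few-shot demonstrations might be packed into a fixed context budget; it is not the thesis's implementation. The helper names (`Example`, `build_few_shot_prompt`), the prompt wording, and the whitespace-based token count (a crude stand-in for the model's real tokenizer) are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """Hypothetical demonstration pair: a dialogue transcript and its reference summary."""
    transcript: str
    summary: str


def approx_tokens(text: str) -> int:
    # Assumption: whitespace-separated words approximate tokens.
    # A real system would count tokens with the model's own tokenizer.
    return len(text.split())


def build_few_shot_prompt(
    examples: list[Example],
    query_transcript: str,
    context_budget: int,
    reserve_for_output: int,
    instruction: str = "Write a patient assessment summary for the following doctor-patient dialogue.",
) -> str:
    """Assemble instruction + as many demonstrations as fit + the query transcript."""

    def block(transcript: str, summary: str = "") -> str:
        # The query block leaves the summary empty for the model to complete.
        return f"Dialogue:\n{transcript}\nSummary:\n{summary}"

    query_part = block(query_transcript)
    # Budget always spent: instruction, query, and room for the generated summary.
    used = approx_tokens(instruction) + approx_tokens(query_part) + reserve_for_output
    if used > context_budget:
        raise ValueError("query alone does not fit in the context budget")

    shots: list[str] = []
    for ex in examples:
        part = block(ex.transcript, ex.summary)
        cost = approx_tokens(part)
        if used + cost > context_budget:
            break  # stop adding demonstrations once the window would overflow
        shots.append(part)
        used += cost

    return "\n\n".join([instruction, *shots, query_part])
```

Because each demonstration includes a whole dialogue plus its summary, long consultation transcripts can exhaust a short window after very few examples. This is one plausible reason why a long-context variant matters for few-shot summarisation of this kind.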