Full Text Available

Evaluation of selected language models for the generation of patient assessment summaries from uncorrected transcriptions

Thesis (MEng)--Stellenbosch University, 2025.

Bibliographic Details
Main Author:	De Wet, Lize
Other Authors:	Basson, Anton
Format:	Thesis
Language:	English
Published:	Stellenbosch : Stellenbosch University, 2025
Subjects:	Automatic speech recognition; Medical transcription; Medical informatics; Medical records > Data processing

_version_	1867613785008635904
access_status_str	Open Access
author	De Wet, Lize
author2	Basson, Anton
author_browse	Basson, Anton De Wet, Lize
author_facet	Basson, Anton De Wet, Lize
author_sort	De Wet, Lize
collection	Thesis
dc_rights_str_mv	Stellenbosch University
description	Thesis (MEng)--Stellenbosch University, 2025.
format	Thesis
id	oai:scholar.sun.ac.za:10019.1/134592
institution	Stellenbosch University (South Africa)
language	English
last_indexed	2026-06-10T12:41:39.515Z
license_str	Other — see source repository
provenance_str_mv	Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository
publishDate	2025
publishDateRange	2025
publishDateSort	2025
publisher	Stellenbosch : Stellenbosch University
publisherStr	Stellenbosch : Stellenbosch University
record_format	dspace
source_str	SUNScholar — Stellenbosch University Repository
spelling	oai:scholar.sun.ac.za:10019.1/134592 Evaluation of selected language models for the generation of patient assessment summaries from uncorrected transcriptions De Wet, Lize Basson, Anton Grobler, J. Van Schalkwyk, T. Stellenbosch University. Faculty of Engineering. Dept. of Mechanical and Mechatronic Engineering. Automatic speech recognition Medical transcription Medical informatics Medical records -- Data processing Thesis (MEng)--Stellenbosch University, 2025. De Wet, L. 2025. Evaluation of Selected Language Models for the Generation of Patient Assessment Summaries from Uncorrected Transcriptions. Unpublished masters thesis. Stellenbosch: Stellenbosch University [online]. Available: https://scholar.sun.ac.za/items/235216e6-984e-4589-a63f-64b2b77074c8 ENGLISH ABSTRACT: The administrative burden of manual clinical documentation is a significant contributor to burnout among healthcare professionals. While automated solutions that generate documentation from doctor-patient conversations exist, their practical adoption is often limited by high costs and a reliance on error-free, corrected transcripts. This research addresses the challenge of using cost-effective, open-source technologies, which must contend with the imperfect output of automatic speech recognition (ASR) systems, particularly their difficulty in accurately transcribing specialised medical terminology. The objective of this research is to evaluate the effectiveness of open-source language models in automatically generating patient assessment summaries from uncorrected ASR transcriptions. The study systematically investigates the impact of transcription errors on the quality of the generated summaries by comparing two distinct model paradigms: smaller, fine-tuned transformer-based models, namely the Bidirectional and Auto-Regressive Transformer (BART) and the Longformer Encoder-Decoder (LED), and large language models (LLMs) adapted using in-context learning (ICL). 
To facilitate this investigation, a new dataset variant was created from the Ambient Clinical Intelligence Benchmark (ACI-BENCH) corpus, containing dialogue transcriptions with synthetically generated ASR errors representative of open-source tools. The results demonstrate that LLMs with sufficient context windows, specifically a Vicuna-7B-16k model adapted with few-shot ICL, outperform fine-tuned transformer-based models, particularly in preserving clinical meaning as measured by the MEDCON metric. The experiments confirmed that, while model performance degrades when using uncorrected transcriptions, both the fine-tuned transformer-based models and the LLMs show resilience, producing structured summaries without catastrophic failure. While the performance decreases were often not statistically significant, the findings still highlight the persistent challenge of preserving medical accuracy with imperfect input. Furthermore, ICL was shown to be a highly data-efficient and effective adaptation strategy, surpassing the performance of fine-tuning on the limited training data available. This study concludes that it is feasible to use open-source language models to generate useful drafts of clinical notes from imperfect ASR transcriptions, although these drafts still contain errors and require review and correction by healthcare professionals. This finding has significant practical implications, demonstrating a viable pathway for healthcare organisations to develop cost-effective, automated documentation tools that can reduce administrative workload without requiring expensive commercial systems or extensive manual data correction. AFRIKAANS SUMMARY: No summary available. Masters 2025-12-15T13:48:46Z 2025-12-15T13:48:46Z 2025-12 Thesis https://scholar.sun.ac.za/handle/10019.1/134592 en Stellenbosch University ix, 115 pages : illustrations application/pdf Stellenbosch : Stellenbosch University
spellingShingle	Automatic speech recognition Medical transcription Medical informatics Medical records -- Data processing De Wet, Lize Evaluation of selected language models for the generation of patient assessment summaries from uncorrected transcriptions
title	Evaluation of selected language models for the generation of patient assessment summaries from uncorrected transcriptions
title_full	Evaluation of selected language models for the generation of patient assessment summaries from uncorrected transcriptions
title_fullStr	Evaluation of selected language models for the generation of patient assessment summaries from uncorrected transcriptions
title_full_unstemmed	Evaluation of selected language models for the generation of patient assessment summaries from uncorrected transcriptions
title_short	Evaluation of selected language models for the generation of patient assessment summaries from uncorrected transcriptions
title_sort	evaluation of selected language models for the generation of patient assessment summaries from uncorrected transcriptions
topic	Automatic speech recognition Medical transcription Medical informatics Medical records -- Data processing
url	https://scholar.sun.ac.za/handle/10019.1/134592
work_keys_str_mv	AT dewetlize evaluationofselectedlanguagemodelsforthegenerationofpatientassessmentsummariesfromuncorrectedtranscriptions

Similar Items