Automatic assignment of diagnosis codes to free-form text medical notes

Thesis (MSc)--Stellenbosch University, 2021.

Bibliographic Details
Main Author: Strydom, Stefan
Other Authors: Van der Merwe, Brink
Format: Thesis
Language: en_ZA
Published: Stellenbosch : Stellenbosch University 2021
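Subjects: Clinical auto-coding systems; Machine learning; Diagnosis related groups -- Automation; Medical codes -- Automatic control; UCTD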
access_status_str Open Access
author Strydom, Stefan
author2 Van der Merwe, Brink
collection Thesis
dc_rights_str_mv Stellenbosch University
description Thesis (MSc)--Stellenbosch University, 2021.
format Thesis
id oai:scholar.sun.ac.za:10019.1/123654
institution Stellenbosch University (South Africa)
language en_ZA
last_indexed 2026-06-10T12:43:36.390Z
license_str Other — see source repository
provenance_str_mv Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository
publishDate 2021
publisher Stellenbosch : Stellenbosch University
record_format dspace
source_str SUNScholar — Stellenbosch University Repository
department Stellenbosch University. Faculty of Science. Dept. of Mathematical Sciences. Division Computer Science.
degree Masters
physical_description xii, 102 pages
file_format application/pdf
dates 2021-09-05T13:16:11Z; 2021-12-22T14:14:14Z; 2021-12
abstract ENGLISH ABSTRACT: Clinical coding is the process of describing and categorising healthcare episodes according to standardised ontologies. The coded data have important downstream applications, including population morbidity studies, health systems planning and reimbursement. Clinical codes are generally assigned by specialist human coders based on information contained in free-form clinical notes. This process is expensive, time-consuming and subject to human error, and it burdens scarce clinical human resources with administrative roles. An accurate automatic coding system can alleviate these problems. Clinical coding is a challenging task for machine learning systems: the source texts are often long, have a highly specialised vocabulary and contain non-standard clinician shorthand, and the code sets can contain tens of thousands of codes. We review previous work on clinical auto-coding systems and perform an empirical analysis of widely used and current state-of-the-art machine learning approaches to the problem. We propose a novel attention mechanism that takes the text descriptions of clinical codes into account. We also construct a small pre-trained transformer model that achieves state-of-the-art performance on the MIMIC-II and MIMIC-III ICD-9 auto-coding tasks. To the best of our knowledge, it is the first successful application of a pre-trained transformer model to this task.

AFRIKAANS SUMMARY (translated from Afrikaans): Clinical coding is the process of describing and categorising healthcare episodes according to standardised ontologies. The coded data have important practical applications, including studies of the disease burden in the population, health system planning and fair remuneration of medical practitioners. Clinical codes are usually assigned by clinically trained persons on the basis of information contained in free-text clinical notes. This process is expensive, time-consuming and subject to human error, and it burdens scarce clinical human resources with administrative roles. An accurate automatic coding system can help alleviate these problems. Clinical coding is a challenging task for machine learning systems. The clinical text is often long, has a specialised vocabulary and contains non-standard clinical shorthand, and the code sets can contain tens of thousands of codes. We examine previous work on clinical auto-coding systems and perform an empirical analysis of the most common and best-in-class machine learning approaches to the problem. We propose a new attention mechanism that takes the text descriptions of clinical codes into account during classification. We also construct a small pre-trained transformer model that surpasses current benchmarks for the MIMIC-II and MIMIC-III ICD-9 auto-coding tasks. To the best of our knowledge, this is the first successful application of a pre-trained transformer model to this task.
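The abstract mentions a novel attention mechanism that takes the text descriptions of clinical codes into account, but the record does not say how it works. As a rough illustration only, the numpy sketch below shows one generic way such a mechanism can be built: each code's description embedding serves as an attention query over the encoded tokens of a note, giving one note representation per code, which is then scored with an independent sigmoid because a single note can carry many codes. This is not the thesis's method. The function name `description_label_attention`, the per-code output parameters `out_weights` and `out_bias`, and the assumption that token states and description embeddings are already computed in the same d-dimensional space are all hypothetical.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    z = x - x.max(axis=axis, keepdims=True)  # shift so the largest entry is 0
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def description_label_attention(
    token_states: np.ndarray,   # (T, d): encoded tokens of one clinical note
    code_desc_emb: np.ndarray,  # (L, d): one embedding per code, from its text description
    out_weights: np.ndarray,    # (L, d): hypothetical per-code output weights
    out_bias: np.ndarray,       # (L,):   hypothetical per-code output bias
) -> tuple[np.ndarray, np.ndarray]:
    """Return per-code probabilities, shape (L,), and attention weights, shape (L, T).

    Hypothetical sketch: not the mechanism proposed in the thesis.
    """
    # Each code description acts as a query against every token of the note.
    scores = code_desc_emb @ token_states.T  # (L, T)
    attn = softmax(scores, axis=-1)  # each row is a distribution over tokens
    # One note representation per code: attention-weighted average of tokens.
    per_code_repr = attn @ token_states  # (L, d)
    # Independent sigmoid per code, since a note can be assigned many codes.
    logits = np.sum(per_code_repr * out_weights, axis=-1) + out_bias  # (L,)
    return sigmoid(logits), attn


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, d, L = 12, 8, 5  # toy sizes: 12 tokens, 8-dim states, 5 codes
    probs, attn = description_label_attention(
        rng.normal(size=(T, d)),
        rng.normal(size=(L, d)),
        rng.normal(size=(L, d)),
        np.zeros(L),
    )
    print("per-code probabilities:", np.round(probs, 3))
    print("codes predicted at threshold 0.5:", np.flatnonzero(probs >= 0.5))
```

Returning the attention matrix alongside the probabilities makes it possible to inspect which tokens of a note each code attended to; the 0.5 decision threshold in the demo is an arbitrary choice for illustration.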
title Automatic assignment of diagnosis codes to free-form text medical notes
topic Clinical auto-coding systems
Machine learning
Diagnosis related groups -- Automation
Medical codes -- Automatic control
UCTD
url http://hdl.handle.net/10019.1/123654