A framework for evaluating clinical data using graph-based link prediction

Thesis (MEng)--Stellenbosch University, 2024.

Bibliographic Details
Main Author: Parshotam, Himil
Other Authors: Nel, Stephan
Format: Thesis
Language: en_ZA
Published: Stellenbosch : Stellenbosch University 2024
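Subjects: Medical records -- Data processing; Graphic methods; Link prediction; Knowledge graph; System analysis; Machine learning; UCTD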
_version_ 1867614063736913920
access_status_str Open Access
author Parshotam, Himil
author2 Nel, Stephan
author_browse Nel, Stephan
Parshotam, Himil
author_facet Nel, Stephan
Parshotam, Himil
author_sort Parshotam, Himil
collection Thesis
description Thesis (MEng)--Stellenbosch University, 2024.
format Thesis
id oai:scholar.sun.ac.za:10019.1/130664
institution Stellenbosch University (South Africa)
language en_ZA
last_indexed 2026-06-10T12:46:04.365Z
license_str Not specified — see source repository
provenance_str_mv Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository
publishDate 2024
publishDateRange 2024
publishDateSort 2024
publisher Stellenbosch : Stellenbosch University
publisherStr Stellenbosch : Stellenbosch University
record_format dspace
source_str SUNScholar — Stellenbosch University Repository
spelling oai:scholar.sun.ac.za:10019.1/130664 A framework for evaluating clinical data using graph-based link prediction Parshotam, Himil Nel, Stephan Stellenbosch University. Faculty of Engineering. Dept. of Industrial Engineering. Medical records -- Data processing Graphic methods Link prediction Knowledge graph System analysis Machine learning UCTD Thesis (MEng)--Stellenbosch University, 2024. ENGLISH ABSTRACT: Clinical data repositories, the most prevalent of which are electronic health records, typically comprise a broad range of information pertaining to various fields within the medical domain. Clinical data fields encapsulate distinct notions and are typically expressed in various formats indicative of the specialised nature of the clinical domain. Data range from complex medical imaging data employed for diagnosis, therapy planning, intraoperative navigation, and postoperative monitoring, to clinical summaries derived from information collected during doctor-patient consultations, such as conditions, medications, and patient demographics. These data sources may be characterised by a notable degree of interconnectedness due to complex interactions and relationships that are embedded in respect of the various medical concepts. Actionable insight may be derived from clinical data if abstracted and analysed by means of an appropriate modelling approach. One such approach is a so-called knowledge graph, which represents an effective approach towards abstracting complex, interconnected data via a specialised data structure in which information is expressed by means of a mathematical construct known as a graph. Various algorithmic techniques may be applied to knowledge graphs in order to identify and leverage inferential relationships in a systematic manner; this may then form the basis of clinical decision support. Link prediction represents a popular approach towards inferring insight from graph-based data.
The application of link prediction techniques from the realms of network analysis and machine learning warrants consideration in a clinical context due to their notable algorithmic utility in respect of extracting actionable insight from data; their utility can be further enhanced when applied to clinical data that are abstracted by means of knowledge graphs. There are, however, various complexities involved with constructing a knowledge graph and subsequently deriving insight therefrom. In this thesis, a generic framework is proposed for constructing and analysing a knowledge graph derived from clinical data. The proposed framework conceptually represents a unified pipeline (or architecture) that may be employed towards transforming raw clinical data into an appropriate knowledge graph representation, and subsequently performing graph-based analysis, i.e. link prediction. The proposed framework comprises three functional components, each of which addresses an integral step in the overarching process. Two main types of use cases may be realised by means of the framework's implementation, the first of which relates to the prediction of medical conditions in order to identify misdiagnosed conditions, while the second use case pertains to the prediction (i.e. suggestion) of medication for prescription purposes. Due to the challenges associated with access to real-world clinical data (attributable to privacy and confidentiality), reputable synthetic data are considered in order to demonstrate the methodological utility of the proposed framework. More specifically, two different data sets are considered, each of which differs in respect of the underlying clinical context and in terms of complexity. Three computerised instantiations are carried out, each of which varies in respect of the data sets that are subjected to the modelling pipeline and/or the clinical use case under consideration.
An algorithmic verification study is carried out prior to these instantiations in order to verify the functional correctness of the sixteen link prediction algorithms considered. The three main paradigms of link prediction algorithms considered are common neighbour-based algorithms, machine learning classifiers, and graph neural networks. Hyperparameter tuning is also carried out in order to identify suitable algorithmic configurations in respect of the different link prediction algorithms. During each computerised instantiation, algorithmic performance is evaluated both quantitatively (including statistical analyses) and qualitatively by means of appropriate visualisations and contextual reflections. The algorithmic output is also further contextualised in respect of practical insight pertaining to the aim of the respective use cases. Furthermore, the framework is validated by means of a subject matter expert, during which the methodological utility of the proposed framework and its applicability to real-world operations are corroborated. AFRIKAANS SUMMARY (translated from Afrikaans): Clinical data repositories, of which electronic health records are the most common, typically contain a wide range of information relating to various aspects within the medical field. Data fields encapsulate different concepts and are typically expressed in various formats, indicative of the specialised nature of the clinical field. Data range from complex medical imaging data used for diagnosis, therapy planning, intraoperative navigation, and postoperative monitoring, to clinical summaries derived from information collected during doctor-patient consultations, such as conditions, medication, and patient demographics. These data sources may be characterised by a notable degree of interconnectedness owing to complex interactions and relationships that are embedded in respect of the various medical concepts.
Practically actionable insight may be derived from clinical data if it is expressed and analysed by means of an appropriate modelling approach. One such approach is a so-called knowledge graph, which represents an effective approach to modelling complex, interconnected data by means of a specialised data structure in which information is expressed by means of a mathematical concept known as a graph. Various algorithmic techniques may be applied to knowledge graphs in order to identify and exploit inferential relationships in a systematic manner; this may then form the basis of clinical decision support. Link prediction represents a popular approach to deriving insight from graph-based data. The application of link prediction techniques from the fields of network analysis and machine learning warrants consideration in a clinical context owing to their notable algorithmic utility in extracting actionable insight from data; their utility can be further enhanced when applied to clinical data that are modelled by means of knowledge graphs. There are, however, various complex considerations involved in constructing a knowledge graph and obtaining insight from it. In this thesis, a generic framework is proposed for the development and analysis of a knowledge graph derived from clinical data. The proposed framework conceptually represents a unified pipeline (or architecture) that may be used to transform raw clinical data into an appropriate knowledge graph representation, and thereafter to perform graph-based analysis, i.e. link prediction. The proposed framework consists of three functional components, each of which addresses an integral step in the overarching process.
Two main types of case studies may be realised by means of the framework's implementation, the first of which relates to the prediction of medical conditions in order to identify misdiagnosed conditions, while the second case study focuses on the prediction of medication for prescription purposes. Owing to the challenges associated with access to real clinical data (attributable to privacy and confidentiality), reliable synthetic data are considered in order to demonstrate the methodological utility of the proposed framework. More specifically, two different data sets are considered, each of which differs in respect of the underlying clinical context and in terms of complexity. Three computerised instantiations are carried out, each of which differs in respect of the data sets that are subjected to the modelling pipeline and/or the clinical use case under consideration. An algorithmic verification study is carried out prior to these instantiations in order to verify the functional correctness of the sixteen link prediction algorithms considered. The three main paradigms of link prediction algorithms considered are common neighbour-based algorithms, machine learning classifiers, and graph neural networks. Hyperparameter tuning is also carried out in order to identify suitable algorithmic configurations in respect of the different link prediction algorithms. During each computerised instantiation, algorithmic performance is evaluated both quantitatively (including statistical analyses) and qualitatively by means of appropriate visualisations and contextual reflections. The algorithmic output is also further contextualised in respect of practical insight relating to the aim of the respective case studies.
Furthermore, the framework is validated by means of a subject matter expert, during which the methodological usefulness of the proposed framework and its applicability to real-world operations are corroborated. Masters 2024-02-09T14:43:35Z 2024-04-27T01:50:40Z 2024-02-09T14:43:35Z 2024-04-27T01:50:40Z 2024-02 Thesis https://scholar.sun.ac.za/handle/10019.1/130664 en_ZA en_ZA xxiv, 182 pages : illustrations application/pdf Stellenbosch : Stellenbosch University
spellingShingle Medical records -- Data processing
Graphic methods
Link prediction
Knowledge graph
System analysis
Machine learning
UCTD
Parshotam, Himil
A framework for evaluating clinical data using graph-based link prediction
title A framework for evaluating clinical data using graph-based link prediction
title_full A framework for evaluating clinical data using graph-based link prediction
title_fullStr A framework for evaluating clinical data using graph-based link prediction
title_full_unstemmed A framework for evaluating clinical data using graph-based link prediction
title_short A framework for evaluating clinical data using graph-based link prediction
title_sort framework for evaluating clinical data using graph based link prediction
topic Medical records -- Data processing
Graphic methods
Link prediction
Knowledge graph
System analysis
Machine learning
UCTD
url https://scholar.sun.ac.za/handle/10019.1/130664
work_keys_str_mv AT parshotamhimil aframeworkforevaluatingclinicaldatausinggraphbasedlinkprediction
AT parshotamhimil frameworkforevaluatingclinicaldatausinggraphbasedlinkprediction
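
Illustrative note: the abstract names common neighbour-based algorithms as one of the three link prediction paradigms considered, but this record does not list the sixteen algorithms themselves. The sketch below is therefore only an assumption of what such a score may look like. It uses the common-neighbour count and the Jaccard coefficient, two standard neighbourhood measures, on a tiny hypothetical knowledge graph. All node names, edges, and function names are invented for illustration and are not taken from the thesis or its data sets.

```python
from collections import defaultdict

# Hypothetical toy knowledge graph: every node and edge is invented for illustration.
EDGES = [
    ("patient_1", "diabetes"),
    ("patient_1", "metformin"),
    ("patient_2", "diabetes"),
    ("patient_2", "hypertension"),
    ("patient_3", "metformin"),
    ("patient_3", "hypertension"),
    ("diabetes", "metformin"),
    ("hypertension", "amlodipine"),
]


def build_adjacency(edges):
    """Return an undirected adjacency map: node -> set of neighbouring nodes."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return dict(adjacency)


def common_neighbours(adjacency, u, v):
    """Count the nodes adjacent to both u and v."""
    return len(adjacency.get(u, set()) & adjacency.get(v, set()))


def jaccard(adjacency, u, v):
    """Shared neighbours divided by all neighbours of u or v (0.0 if there are none)."""
    union = adjacency.get(u, set()) | adjacency.get(v, set())
    if not union:
        return 0.0
    return common_neighbours(adjacency, u, v) / len(union)


def rank_candidates(adjacency, node, candidates):
    """Score candidates not yet linked to node, highest Jaccard score first."""
    unlinked = [c for c in candidates if c not in adjacency.get(node, set())]
    scored = [(c, jaccard(adjacency, node, c)) for c in unlinked]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ADJACENCY = build_adjacency(EDGES)
# Candidate links for patient_3; metformin is skipped because that edge already exists.
ranking = rank_candidates(ADJACENCY, "patient_3", ["diabetes", "amlodipine", "metformin"])
print(ranking)
```

A higher score marks a candidate edge as more plausible under this heuristic. In the two use cases the abstract describes, the candidate targets would be condition or medication nodes. Machine learning classifiers and graph neural networks, the other two paradigms named, would replace the hand-written score with a learned one.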