Thesis (MSc)--Stellenbosch University, 2024.
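| Main Author: | Kitching, Chene |
|---|---|
| Other Authors: | Moller, Marlo |
| Format: | Thesis |
| Language: | English |
| Published: | Stellenbosch : Stellenbosch University, 2025 |
| Subjects: | Genomics -- Variation; Phylogeny -- Molecular aspects; Rare diseases -- Diagnosis; Genomics -- Data processing -- Mathematical models; Machine learning -- Data processing -- Mathematical models; Computational learning theory; UCTD |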
| _version_ | 1867613896319172608 |
|---|---|
| access_status_str | Open Access |
| author | Kitching, Chene |
| author2 | Moller, Marlo |
| author_browse | Kitching, Chene Moller, Marlo |
| author_facet | Moller, Marlo Kitching, Chene |
| author_sort | Kitching, Chene |
| collection | Thesis |
| dc_rights_str_mv | Stellenbosch University |
| description | Thesis (MSc)--Stellenbosch University, 2024. |
| format | Thesis |
| id | oai:scholar.sun.ac.za:10019.1/131715 |
| institution | Stellenbosch University (South Africa) |
| language | English |
| last_indexed | 2026-06-10T12:43:25.190Z |
| license_str | Other — see source repository |
| provenance_str_mv | Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository |
| publishDate | 2025 |
| publishDateRange | 2025 |
| publishDateSort | 2025 |
| publisher | Stellenbosch : Stellenbosch University |
| publisherStr | Stellenbosch : Stellenbosch University |
| record_format | dspace |
| source_str | SUNScholar — Stellenbosch University Repository |
| spelling | oai:scholar.sun.ac.za:10019.1/131715 Development of an open-source variant prioritisation tool Kitching, Chene Moller, Marlo Petersen, Desiree Van der Spuy, Gian Tromp, Gerard Stellenbosch University. Faculty of Science. Centre for Bioinformatics & Computational Biology. Genomics -- Variation Phylogeny -- Molecular aspects Rare diseases -- Diagnosis Genomics -- Data processing -- Mathematical models Machine learning -- Data processing -- Mathematical models Computational learning theory UCTD Thesis (MSc)--Stellenbosch University, 2024. ENGLISH ABSTRACT: Interpreting genomic variants is critical for diagnosing and assessing the risk of genetic diseases, particularly rare genetic disorders. Many such conditions are caused by variants that are rarely found in the general population. Accurate variant interpretation and prioritisation are essential for narrowing down the list of potentially pathogenic variants. In 2015, the American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP) introduced guidelines for genetic variant classification. Although the guidelines offer a standardised approach, they are limited to qualitative interpretations, complicating the comparison and prioritisation of variants within the same subject. We developed an open-source variant prioritisation tool designed to prioritise rare disease variants using both data-driven and guideline-based approaches. Multiple machine learning models, including random forest, were trained and evaluated using a curated dataset of well-classified rare disease variants and features derived from the ACMG/AMP criteria. The area under the receiver operating characteristic curve (AUC-ROC) was used to evaluate model performance. Among the evaluated models, the random forest model showed superior performance in predicting variant pathogenicity, achieving an AUC of 0.915. This model was integrated into a Nextflow pipeline to automate and parallelise the filtering, annotation, and prioritisation steps involved in variant interpretation. Singularity containers were employed to ensure the pipeline could be executed across different platforms, enhancing its accessibility and reproducibility. The developed tool, VIPR, represents a significant advancement in the prioritisation of rare disease variants by combining guideline-based criteria with data-driven machine learning approaches. The high AUC achieved by the random forest model underscores its effectiveness in variant pathogenicity prediction. Integration into a Nextflow pipeline and use of Singularity containers ensure a robust and scalable solution for variant interpretation workflows. This tool addresses the limitations of qualitative-only interpretation methods, facilitating more accurate and efficient prioritisation of genetic variants in clinical and research settings. AFRIKAANS SUMMARY: The interpretation of genomic variants is of critical importance for the diagnosis and risk assessment of genetic diseases, especially rare genetic disorders. Many such conditions are caused by variants that are rarely observed in the general population. Accurate interpretation and prioritisation of variants are essential to narrow down the list of potentially disease-causing variants. In 2015, the American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP) published guidelines for the classification of variants. These guidelines provide a standardised framework but are limited to qualitative interpretations, which complicates the comparison and prioritisation of variants within the same patient. We developed an open-source tool designed to prioritise rare disease-causing variants using both data-driven and guideline-based approaches. Several machine learning models, including random forest, were trained and evaluated using a curated dataset of well-classified rare disease-causing variants and features derived from the ACMG/AMP criteria. Model performance was assessed using the area under the receiver operating characteristic curve (AUC-ROC). Among the evaluated models, the random forest model showed excellent performance in predicting pathogenic variants, achieving an AUC of 0.915. This model was integrated into a Nextflow pipeline to automate and parallelise the filtering, annotation and prioritisation steps involved in variant interpretation. Singularity containers were used to ensure that the pipeline can be executed across different platforms, improving its accessibility and reproducibility. The developed tool, VIPR, represents a significant advancement in the prioritisation of rare disease-causing variants by combining guideline-based criteria with data-driven machine learning approaches. The high AUC achieved by the random forest model underscores its effectiveness in predicting pathogenic variants. Integration into a Nextflow pipeline and the use of Singularity containers ensure a robust and scalable solution for variant interpretation workflows. This tool addresses the limitations of qualitative-only interpretation methods, facilitating more accurate and efficient prioritisation of genetic variants in clinical and research settings. Masters 2025-02-20T07:50:52Z 2025-02-20T07:50:52Z 2024-12 Thesis https://scholar.sun.ac.za/handle/10019.1/131715 en Stellenbosch University xiii, 91 pages : illustrations application/pdf Stellenbosch : Stellenbosch University |
| spellingShingle | Genomics -- Variation Phylogeny -- Molecular aspects Rare diseases -- Diagnosis Genomics -- Data processing -- Mathematical models Machine learning -- Data processing -- Mathematical models Computational learning theory UCTD Kitching, Chene Development of an open-source variant prioritisation tool |
| title | Development of an open-source variant prioritisation tool |
| title_full | Development of an open-source variant prioritisation tool |
| title_fullStr | Development of an open-source variant prioritisation tool |
| title_full_unstemmed | Development of an open-source variant prioritisation tool |
| title_short | Development of an open-source variant prioritisation tool |
| title_sort | development of an open source variant prioritisation tool |
| topic | Genomics -- Variation; Phylogeny -- Molecular aspects; Rare diseases -- Diagnosis; Genomics -- Data processing -- Mathematical models; Machine learning -- Data processing -- Mathematical models; Computational learning theory; UCTD |
| url | https://scholar.sun.ac.za/handle/10019.1/131715 |
| work_keys_str_mv | AT kitchingchene developmentofanopensourcevariantprioritisationtool |
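The abstract reports model performance as AUC-ROC (0.915 for the random forest) over features derived from the ACMG/AMP criteria. The Python sketch below is not VIPR's code, and this record does not describe the thesis's actual feature set. It only illustrates, under stated assumptions, two ideas: a hypothetical `encode_criteria` helper that turns the 28 ACMG/AMP (2015) evidence codes into one 0/1 feature each, and a pairwise `auc_roc` function showing what the AUC measures: the probability that a randomly chosen pathogenic variant receives a higher score than a randomly chosen benign one.

```python
from itertools import product

# The 28 evidence codes of the 2015 ACMG/AMP guidelines:
# pathogenic (PVS, PS, PM, PP) followed by benign (BA, BS, BP).
ACMG_CRITERIA: list[str] = (
    ["PVS1"]
    + [f"PS{i}" for i in range(1, 5)]
    + [f"PM{i}" for i in range(1, 7)]
    + [f"PP{i}" for i in range(1, 6)]
    + ["BA1"]
    + [f"BS{i}" for i in range(1, 5)]
    + [f"BP{i}" for i in range(1, 8)]
)


def encode_criteria(met: set[str]) -> list[int]:
    """Hypothetical encoding: one 0/1 feature per ACMG/AMP criterion,
    set to 1 when that criterion is met for the variant."""
    unknown = met - set(ACMG_CRITERIA)
    if unknown:
        raise ValueError(f"unknown ACMG/AMP criteria: {sorted(unknown)}")
    return [1 if code in met else 0 for code in ACMG_CRITERIA]


def auc_roc(labels: list[int], scores: list[float]) -> float:
    """AUC-ROC computed pairwise: the fraction of (pathogenic, benign)
    pairs in which the pathogenic variant (label 1) gets the higher
    score, counting ties as 0.5 (equivalent to the Mann-Whitney U form)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one pathogenic and one benign label")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy, made-up labels and model scores (not data from the thesis).
    labels = [1, 1, 1, 0, 0, 0]
    scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
    print(auc_roc(labels, scores))  # 8 of 9 pairs ranked correctly
    print(sum(encode_criteria({"PVS1", "PM2"})))  # 2 criteria met
```

With these toy scores, 8 of the 9 pathogenic/benign pairs are ordered correctly, giving an AUC of about 0.889; an AUC of 0.915 can be read the same way, as the share of such pairs the model ranks correctly. The pairwise loop is chosen for clarity and costs O(n_pathogenic × n_benign); for large datasets a rank-based formula or scikit-learn's `roc_auc_score` is the usual choice.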