Thesis (MSc)--Stellenbosch University, 2025.
| Main Author: | Madula, Lusanda Indiphile |
|---|---|
| Other Authors: | Maasdorp, Elizna |
| Format: | Thesis |
| Language: | English |
| Published: | Stellenbosch : Stellenbosch University, 2025 |
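| Subjects: | Parkinson’s disease -- Molecular aspects; Computational biology; DNA microarrays; Mutation detection |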
| _version_ | 1867613949775577088 |
|---|---|
| access_status_str | Open Access |
| author | Madula, Lusanda Indiphile |
| author2 | Maasdorp, Elizna |
| author_browse | Maasdorp, Elizna Madula, Lusanda Indiphile |
| author_facet | Maasdorp, Elizna Madula, Lusanda Indiphile |
| author_sort | Madula, Lusanda Indiphile |
| collection | Thesis |
| dc_rights_str_mv | Stellenbosch University |
| description | Thesis (MSc)--Stellenbosch University, 2025. |
| format | Thesis |
| id | oai:scholar.sun.ac.za:10019.1/134688 |
| institution | Stellenbosch University (South Africa) |
| language | English |
| last_indexed | 2026-06-10T12:44:16.501Z |
| license_str | Other — see source repository |
| provenance_str_mv | Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository |
| publishDate | 2025 |
| publishDateRange | 2025 |
| publishDateSort | 2025 |
| publisher | Stellenbosch : Stellenbosch University |
| publisherStr | Stellenbosch : Stellenbosch University |
| record_format | dspace |
| source_str | SUNScholar — Stellenbosch University Repository |
| spelling | oai:scholar.sun.ac.za:10019.1/134688 Computational workflow to identify pathogenic variants for Parkinson’s disease using the neurobooster array Madula, Lusanda Indiphile Maasdorp, Elizna Bardien, Soraya Sparks, Anel Step, Kathryn Stellenbosch University. Faculty of Medicine and Health Sciences. Dept. of Biomedical Sciences. Division of Molecular Biology and Human Genetics. Parkinson’s disease -- Molecular aspects Computational biology DNA microarrays Mutation detection Thesis (MSc)--Stellenbosch University, 2025. Madula, L. I. 2025. Computational workflow to identify pathogenic variants for Parkinson’s disease using the NeuroBooster Array. Unpublished masters thesis. Stellenbosch: Stellenbosch University [online]. Available: https://scholar.sun.ac.za/items/898df746-931f-4d15-8732-7e10b145234b ENGLISH ABSTRACT: Neurological disorders are now the leading cause of global disease burden, with Parkinson’s disease (PD) ranked as the 11th most significant contributor to disability and premature mortality worldwide. PD is a common neurodegenerative disorder characterized by progressive motor and non-motor symptoms that severely impair quality of life. While current treatments can alleviate symptoms, no cure exists. The etiology of PD is complex, involving an interplay between aging, environmental exposures, and genetic factors. More than 20 genes have been linked to PD, with seven of these (GBA, LRRK2, PRKN, PINK1, SNCA, DJ-1, and VPS35) being unequivocally implicated in the development of PD. Despite extensive research in European populations, genetic studies of PD in Sub-Saharan African populations remain limited. To address this knowledge gap, the South African Parkinson’s Disease Research Group, in collaboration with a consortium known as the Global Parkinson’s Genetics Program (GP2), conducted a comprehensive genetic investigation of PD in South African individuals with PD. 
The present study aimed to develop and apply an integrative computational workflow to identify pathogenic PD variants using genotyping data from the NeuroBooster Array (NBA). Two datasets were analyzed: (i) a positive control group of 19 monogenic PD cases with known pathogenic variants (i.e. solved cases) to develop and validate the workflow, and (ii) a collection of 645 unsolved PD cases. The computational workflow, implemented via Bash scripting and R, integrated quality control steps, variant annotation (via ANNOVAR), filtering, and prioritization using multiple in-silico pathogenicity prediction tools, to produce a list of candidate variants for further study. In the solved cases (i.e. positive controls), the computational workflow correctly identified four known pathogenic variants: p.G2019S and p.R1441C in LRRK2, and p.G430D and p.R275W in PRKN, in 11 individuals. In the unsolved cases, 17 distinct putative pathogenic variants (p.T408M, p.N409S, p.L119P, p.R793M, p.R1325Q, p.A339T, p.A359T, p.A494V, p.S34L, p.D31N, p.G361S, p.V1044A, p.R334C, p.R402C, p.G430D, p.P437L, and p.I707T) in seven genes were identified in 48 individuals. Due to the known limitations of array-based genotyping, all candidate variants were validated using Sanger sequencing, the gold standard for variant confirmation. All of the variants except the p.G430D variant in PRKN (individual lab ID: 65.26) were confirmed by Sanger sequencing, which suggests a false-positive call of this one variant by the NBA genotyping. Co-segregation analyses of four variants in families with available DNA revealed incomplete segregation patterns, likely reflecting PD’s age-related penetrance. Furthermore, the putative pathogenic variants identified in our analysis were cross-referenced against a collection of non-PD South African controls, with only six of these variants detected in controls, underscoring the pathogenic potential of our findings. 
Recognizing the limited number of variants uncovered by the initial four in-silico prediction tools (CADD, SIFT, FATHMM, PolyPhen), we pursued additional analysis employing alternative meta-predictors, specifically MetaLR and MetaSVM, which integrate multiple scoring systems to provide a more comprehensive assessment of variant pathogenicity. As a result, of the original 17 candidate variants, six variants (p.T408M, p.N409S, p.V1044A, p.R402C, p.G430D, and p.I707T) were prioritized in 25 individuals, which represents a subset of the previously identified group of 48. This outcome attests to the robustness and precision of our computational workflow and highlights the increased stringency and discriminative power of the meta-predictor tools in the identification of truly pathogenic variants. In conclusion, this study demonstrates the utility and robustness of our newly developed computational workflow for detecting PD-associated variants using NBA genotyping data. It underscores the importance of validating array-based findings using gold-standard methods like Sanger sequencing and highlights the value of expanding PD genetics research in underrepresented populations. These findings provide a crucial foundation for future studies into the genetic architecture of PD in Africa and contribute to global efforts toward equitable precision medicine in neurodegenerative diseases. AFRIKAANSE OPSOMMING: Neurologiese siektes is nou die voorloper oorsaak in die globale las van siekte, met Parkinson siekte (PS) elfde in die rangorde van siektes wat mees betekenisvol bydra tot gestremdheid en voortydige sterftes wêreldwyd. PS is ‘n algemene neuro-degeneratiewe siekte gekenmerk deur progressiewe motor en nie-motor simptome wat kwaliteit van lewe erg benadeel. Huidiglik is genesende middels nie beskikbaar nie, hoewel daar behandeling vir simptoom verligting is. Die oorsaak van PS is kompleks en betrek ‘n wisselwerking tussen veroudering, die omgewing, en genetiese faktore. 
Meer as 20 gene is reeds aan PS geskakel, met sewe van hierdie (GBA, LRRK2, PRKN, PINK1, SNCA, DJ-1, en VPS35) deurslaggewend betrokke in die ontwikkeling van die siekte. Ten spyte van uitgebreide navorsing in Europese populasies, bly genetiese PS studies in populasies uit Sub-Sahara Afrika, beperk. Om hierdie kennisgaping aan te spreek, het die Suid-Afrikaanse Parkinson Siekte Navorsingsgroep, in samewerking met die “Global Parkinson’s Genetics Program (GP2)”, ‘n omvattende genetiese ondersoek van PS in Suid-Afrikaanse individue met PS, aangepak. Die studie het ten doel gehad om ‘n geïntegreerde berekeningspyplyn te ontwikkel om patogeniese PS variante te identifiseer met genotipering data van die “NeuroBooster Array (NBA)”. Twee datastelle is geanaliseer: (i) ‘n positiewe kontrole groep van 19 monogeniese PS gevalle, bekend met patogene variante (dus die opgelosde gevalle), om die pyplyn te ontwikkel en te bevestig, en (ii) ‘n versameling van 645 onopgelosde gevalle. Die berekeningspyplyn, geïmplementeer met Bash kode en R, integreer die kwaliteitskontrole stappe, variantannotasie (met ANNOVAR), filtrering en prioritisering, met verskeie in-silico patogenisiteit voorspellers, om ‘n lys van kandidaat variante te skep, vir verdere studie. Die berekeningspyplyn het vier bekende patogeniese variante (p.G2019S en p.R1441C in LRRK2, en p.G430D en p.R275W in PRKN) korrek identifiseer in 11 individue wat opgelosde gevalle (die positiewe kontroles) was. In die onopgelosde gevalle, is 17 unieke, veronderstelde patogeniese variante (p.T408M, p.N409S, p.L119P, p.R793M, p.R1325Q, p.A339T, p.A359T, p.A494V, p.S34L, p.D31N, p.G361S, p.V1044A, p.R334C, p.R402C, p.G430D, p.P437L, en p.I707T), in sewe gene geïdentifiseer, vir 48 individue. As gevolg van die bekende beperkinge met hierdie tipe genotipering, is al die kandidaat variante ook bevestig met Sanger volgordebepaling, die goue standaard vir variant bevestiging. 
Al die variante, behalwe die p.G430D variant in PRKN (Individu laboratorium ID: 65.26), is met Sanger volgordebepaling bevestig. Dit stel voor dat dit ‘n valspositiewe roep van hierdie variant in die NBA genotipering was. Ko-segregasie analise van vier variante in families met beskikbare DNS het onvolledige segregasie patrone gewys, waarskynlik die gevolg van PS se ouderdomsverwante penetrasie. Ons het verder die veronderstelde patogeniese variante wat in ons analise geïdentifiseer is, kruisverwys teen ‘n versameling nie-PS Suid-Afrikaanse kontroles, met slegs ses van hierdie variante opgemerk in die kontroles, wat die patogeniese potensiaal van ons bevindinge onderstreep. In herkenning van die beperkte hoeveelheid variante wat ons gevind het met die oorspronklike vier in-silico voorspellers (CADD, SIFT, FATHMM, PolyPhen), het ons verdere analise met alternatiewe meta-voorspellers uitgevoer, spesifiek met MetaLR en MetaSVM, wat verskeie puntsisteme integreer om ‘n meer omvattende evaluasie van variant patogenesiteit te gee. As gevolg hiervan is ses variante (p.T408M, p.N409S, p.V1044A, p.R402C, p.G430D, en p.I707T) uit die oorspronklike 17 kandidate in 25 individue prioritiseer, wat ‘n substel is van die voorheen geïdentifiseerde groep van 48. Hierdie resultate getuig van die presisie en robuuste natuur van ons berekeningspyplyn en lig die beter onderskeidingsvermoë van die metavoorspellers uit, vir die identifisering van werklike patogeniese variante. Ter afsluiting, hierdie studie demonstreer die bruikbaarheid en robuustheid van ‘n nuut ontwikkelde berekeningspyplyn vir die ontdek van PS-geassosieerde variante met NBA genotipiese data. Dit onderstreep die belangrikheid van die bevestiging van hierdie tipe data met goue standaard metodes soos Sanger volgordebepaling en wys die waarde daarvan om PS genetiese navorsing uit te brei na onderverteenwoordigde populasies. 
Hierdie bevindinge voorsien ‘n noodsaaklike fondasie vir toekomstige studies van die genetiese argitektuur van PS in Afrika en dra by tot die globale pogings om gelyke presisie medisyne in neurodegeneratiewe siektes ‘n werklikheid te maak. Masters 2025-12-23T12:06:43Z 2025-12-23T12:06:43Z 2025-12 Thesis https://scholar.sun.ac.za/handle/10019.1/134688 en Stellenbosch University xx, 155 pages : illustrations application/pdf Stellenbosch : Stellenbosch University |
| spellingShingle | Parkinson’s disease -- Molecular aspects Computational biology DNA microarrays Mutation detection Madula, Lusanda Indiphile Computational workflow to identify pathogenic variants for Parkinson’s disease using the neurobooster array |
| title | Computational workflow to identify pathogenic variants for Parkinson’s disease using the neurobooster array |
| title_full | Computational workflow to identify pathogenic variants for Parkinson’s disease using the neurobooster array |
| title_fullStr | Computational workflow to identify pathogenic variants for Parkinson’s disease using the neurobooster array |
| title_full_unstemmed | Computational workflow to identify pathogenic variants for Parkinson’s disease using the neurobooster array |
| title_short | Computational workflow to identify pathogenic variants for Parkinson’s disease using the neurobooster array |
| title_sort | computational workflow to identify pathogenic variants for parkinson s disease using the neurobooster array |
| topic | Parkinson’s disease -- Molecular aspects Computational biology DNA microarrays Mutation detection |
| url | https://scholar.sun.ac.za/handle/10019.1/134688 |
| work_keys_str_mv | AT madulalusandaindiphile computationalworkflowtoidentifypathogenicvariantsforparkinsonsdiseaseusingtheneuroboosterarray |
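
The abstract describes a two-stage prioritization: candidates are first selected with four in-silico tools (CADD, SIFT, FATHMM, PolyPhen) and then narrowed with the meta-predictors MetaLR and MetaSVM. The record does not give the cutoffs or the combination rule, so the following is only a minimal Python sketch of that idea (the thesis itself used Bash and R). The field names, the CADD cutoff of 20, the "D"/"P"/"T"/"B" prediction codes, and the rule that every tool must agree are all assumptions made for illustration, and the variants are toy placeholders, not results from the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """One annotated missense variant (hypothetical field names)."""
    gene: str
    protein_change: str
    cadd_phred: float   # scaled CADD score
    sift_pred: str      # assumed codes: "D" deleterious, "T" tolerated
    polyphen_pred: str  # assumed codes: "D"/"P" damaging, "B" benign
    fathmm_pred: str    # assumed codes: "D" damaging, "T" tolerated
    metasvm_pred: str   # assumed codes: "D" deleterious, "T" tolerated
    metalr_pred: str    # assumed codes: "D" deleterious, "T" tolerated


def passes_initial_filter(v: Variant, cadd_cutoff: float = 20.0) -> bool:
    """Stage 1 (assumed rule): all four individual tools flag the variant."""
    return (
        v.cadd_phred >= cadd_cutoff
        and v.sift_pred == "D"
        and v.polyphen_pred in {"D", "P"}
        and v.fathmm_pred == "D"
    )


def passes_meta_filter(v: Variant) -> bool:
    """Stage 2 (assumed rule): both meta-predictors flag the variant."""
    return v.metasvm_pred == "D" and v.metalr_pred == "D"


def prioritize(variants: list[Variant]) -> tuple[list[Variant], list[Variant]]:
    """Return (stage-1 candidates, stage-2 prioritized subset)."""
    candidates = [v for v in variants if passes_initial_filter(v)]
    prioritized = [v for v in candidates if passes_meta_filter(v)]
    return candidates, prioritized


# Toy input with placeholder genes and scores (not data from the thesis).
toy_variants = [
    Variant("GENE_A", "p.A1B", 25.0, "D", "D", "D", "D", "D"),
    Variant("GENE_B", "p.C2D", 22.0, "D", "P", "D", "T", "D"),
    Variant("GENE_C", "p.E3F", 10.0, "T", "B", "T", "T", "T"),
]

candidates, prioritized = prioritize(toy_variants)
print([v.protein_change for v in candidates])
print([v.protein_change for v in prioritized])
```

Because the second stage only ever filters the output of the first, the prioritized list is by construction a subset of the candidate list, which mirrors the relationship the abstract reports between the 6 prioritized and 17 candidate variants.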