Exploring new methodologies to identify disease-associated variants in African populations through the integration of patient genotype data and clinical phenotypes derived from routine health data: A case study for Type 2 Diabetes Mellitus in patients in the Western Cape Province, South Africa

Bibliographic Details
Main Author: Tamuhla, Tsaone
Other Authors: Tiffin, Nicola
Format: Thesis
Language: English
Published: Department of Integrative Biomedical Sciences (IBMS) 2023
Subjects: Type 2 Diabetes Mellitus
_version_ 1867613560714035200
access_status_str Open Access
author Tamuhla, Tsaone
author2 Tiffin, Nicola
author_browse Tamuhla, Tsaone
Tiffin, Nicola
author_facet Tiffin, Nicola
Tamuhla, Tsaone
author_sort Tamuhla, Tsaone
collection Thesis
description Thesis Title: Exploring new methodologies to identify disease-associated variants in African populations through the integration of patient genotype data and clinical phenotypes derived from routine health data: A case study for Type 2 Diabetes Mellitus patients in the Western Cape Province, South Africa.
Abstract
Introduction: Knowledge of the genetic drivers of disease in African populations is poor, largely because of the limited human genomic data available from sub-Saharan Africa. Although the cost of generating human genomic data has fallen significantly, it remains a barrier to generating large-scale African genomic data. This project is therefore a proof-of-concept pilot study that demonstrates the implementation of a cost-effective, scalable genotyped virtual cohort that can address population-level genomic questions.
Methods: We optimised a tiered informed consent process that is suitable for the cohort study design and adapted it to conducting human genomic research in the African context. We used an existing dataset to explore statistical methods for modelling longitudinal routine health data into a standardised phenotype for genome-wide association studies (GWAS). We then conducted a feasibility study in which we piloted the tiered informed consent process, DNA collection by buccal swab, and DNA extraction from buccal swabs and peripheral blood samples. DNA samples were genotyped for approximately 2.2 million variants on the Infinium™ H3Africa Consortium Array V2. Genotyping quality control (QC) was done in PLINK 1.9 and genome-wide imputation on the Sanger Imputation Service. We demonstrated successful variant calling, provided aggregate statistics for known aetiological variants for type 2 diabetes and severe COVID-19, and demonstrated the feasibility of running nested case-control GWAS with these data.
Results: We demonstrate the use of routine health data to provide complex phenotypes to link to genotype data for both non-communicable diseases (diabetes) and infectious diseases (tuberculosis, HIV and COVID-19). A total of 459 participants consented to providing a DNA sample and access to their routine health data and were included in the feasibility study. Of these, 343 DNA samples passed quality control, together with 1,782,023 genotyped variants, and were available for further analysis. While most of the cohort population clustered with the 1000 Genomes African population, principal component analysis showed extensive population admixture. Using the routine health data of participants in the cohort, we identified 63 cases of severe COVID-19 and 280 controls for the COVID-19 analysis, and 93 cases and 250 controls for the type 2 diabetes analysis. While the sample sizes were insufficient for a GWAS, we were able to evaluate known type 2 diabetes mellitus and COVID-19 variants in the study population.
Conclusion: We have described how we conceptualised and implemented a genotyped virtual population cohort in a resource-constrained environment, and we are confident that this design and implementation are appropriate for scaling up the cohort to a size where novel health discoveries can be made through nested case-control studies. In the interim, we demonstrate the analysis and validation of aetiological variants identified in other studies and populations.
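The figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and is not taken from the thesis: it confirms that each case/control split sums to the 343 samples that passed QC, and it computes a Bonferroni-style significance threshold for the 1,782,023 genotyped variants. The choice of Bonferroni correction and of a 0.05 family-wise error rate are assumptions made here for illustration; the abstract does not state which threshold the author used. All function and variable names are hypothetical.

```python
# Illustrative arithmetic on the numbers reported in the abstract.
# Assumptions (not from the source): Bonferroni correction and ALPHA = 0.05.

CONSENTED = 459                 # participants who consented
SAMPLES_PASSING_QC = 343        # DNA samples that passed quality control
VARIANTS_PASSING_QC = 1_782_023 # genotyped variants that passed quality control

# (cases, controls) per analysis, as reported in the abstract
ANALYSES = {
    "severe COVID-19": (63, 280),
    "type 2 diabetes": (93, 250),
}

ALPHA = 0.05  # assumed family-wise error rate


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value threshold when correcting for n_tests comparisons."""
    return alpha / n_tests


def splits_are_consistent() -> bool:
    """True if every case/control split sums to the QC-passing sample count."""
    return all(
        cases + controls == SAMPLES_PASSING_QC
        for cases, controls in ANALYSES.values()
    )


def qc_retention() -> float:
    """Fraction of consented participants whose sample passed QC."""
    return SAMPLES_PASSING_QC / CONSENTED


if __name__ == "__main__":
    print("Splits consistent:", splits_are_consistent())
    print(f"Sample retention after QC: {qc_retention():.1%}")
    threshold = bonferroni_threshold(ALPHA, VARIANTS_PASSING_QC)
    print(f"Bonferroni threshold: {threshold:.2e}")
```

Under these assumptions the per-variant threshold comes out close to the conventional genome-wide significance level of 5e-8, which gives some intuition for why the abstract describes a few hundred samples as insufficient for a GWAS.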
format Thesis
id oai:open.uct.ac.za:11427/38543
institution University of Cape Town (South Africa)
language eng
last_indexed 2026-06-10T12:38:05.825Z
license_str Not specified — see source repository
provenance_str_mv Harvested via OAI-PMH from UCTD — University of Cape Town Open Access Repository
publishDate 2023
publishDateRange 2023
publishDateSort 2023
publisher Department of Integrative Biomedical Sciences (IBMS)
publisherStr Department of Integrative Biomedical Sciences (IBMS)
record_format dspace
source_str UCTD — University of Cape Town Open Access Repository
spelling oai:open.uct.ac.za:11427/38543 Exploring new methodologies to identify disease-associated variants in African populations through the integration of patient genotype data and clinical phenotypes derived from routine health data: A case study for Type 2 Diabetes Mellitus in patients in the Western Cape Province, South Africa Tamuhla, Tsaone Tiffin, Nicola Mulder, Nicola J Type 2 Diabetes Mellitus
2023-09-12T09:01:09Z 2023-09-12T09:01:09Z 2023 2023-09-12T08:48:41Z Doctoral Thesis Doctoral PhD http://hdl.handle.net/11427/38543 eng application/pdf Department of Integrative Biomedical Sciences (IBMS) Faculty of Health Sciences
spellingShingle Type 2 Diabetes Mellitus
Tamuhla, Tsaone
Exploring new methodologies to identify disease-associated variants in African populations through the integration of patient genotype data and clinical phenotypes derived from routine health data: A case study for Type 2 Diabetes Mellitus in patients in the Western Cape Province, South Africa
thesis_degree_str Doctoral
title Exploring new methodologies to identify disease-associated variants in African populations through the integration of patient genotype data and clinical phenotypes derived from routine health data: A case study for Type 2 Diabetes Mellitus in patients in the Western Cape Province, South Africa
topic Type 2 Diabetes Mellitus
url http://hdl.handle.net/11427/38543