Thesis (MSc)--Stellenbosch University, 2024.
| Main Author: | Madzime, Ruvarashe Joylyne |
|---|---|
| Other Authors: | Tromp, Gerard |
| Format: | Thesis |
| Language: | en_ZA |
| Published: | Stellenbosch : Stellenbosch University, 2024 |
| Subjects: | Metagenomics; High-throughput nucleotide sequencing; Microbial genetics; Nucleotide sequence |
| _version_ | 1867614046076796928 |
|---|---|
| access_status_str | Open Access |
| author | Madzime, Ruvarashe Joylyne |
| author2 | Tromp, Gerard |
| author_browse | Madzime, Ruvarashe Joylyne Tromp, Gerard |
| author_facet | Tromp, Gerard Madzime, Ruvarashe Joylyne |
| author_sort | Madzime, Ruvarashe Joylyne |
| collection | Thesis |
| dc_rights_str_mv | Stellenbosch University |
| description | Thesis (MSc)--Stellenbosch University, 2024. |
| format | Thesis |
| id | oai:scholar.sun.ac.za:10019.1/130655 |
| institution | Stellenbosch University (South Africa) |
| language | en_ZA en_ZA |
| last_indexed | 2026-06-10T12:45:48.703Z |
| license_str | Other — see source repository |
| provenance_str_mv | Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository |
| publishDate | 2024 |
| publishDateRange | 2024 |
| publishDateSort | 2024 |
| publisher | Stellenbosch : Stellenbosch University |
| publisherStr | Stellenbosch : Stellenbosch University |
| record_format | dspace |
| source_str | SUNScholar — Stellenbosch University Repository |
| spelling | oai:scholar.sun.ac.za:10019.1/130655 Development of containerized pipelines for the reproducible analysis of amplicon-, shotgun metagenomic- and metatranscriptomic data. Madzime, Ruvarashe Joylyne Tromp, Gerard Sanko, Tomasz Stellenbosch University. Faculty of Medicine and Health Sciences. Dept. of Biomedical Sciences. Molecular Biology and Human Genetics. Metagenomics High-throughput nucleotide sequencing Microbial genetics Nucleotide sequence Thesis (MSc)--Stellenbosch University, 2024. ENGLISH ABSTRACT: Advances in next generation sequencing technologies have enabled the investigation of microbial genetic material directly from a biological specimen without the need for culturing. This has propelled the field of metagenomics and set techniques like amplicon-, shotgun metagenomic- and metatranscriptomic sequencing at the forefront of investigating complex microbial communities. Data from these techniques is very large, over gigabytes (GB) in size, and often needs to be analysed on high performance clusters and servers. These computational requirements introduce a problem of variation in compute environments, which leads to irreproducibility. The data are high-dimensional and compositional, and there are specific algorithms that address these qualities of the data. However, the software algorithms are updated regularly, providing multiple versions of the same software algorithm, and this too leads to irreproducibility, therefore affecting the integrity of science. This project addresses computational irreproducibility through the development of three independent computational pipelines, designed to be used on Unix/Linux-based clusters and servers. I developed a pipeline implementing QIIME2, for analysing amplicon sequence data. For metatranscriptomic data, I developed a pipeline implementing Trinity and its utility workflow for de novo assembly of transcripts and differential expression analysis. 
I developed a pipeline for analysing shotgun metagenomic data, implementing multiple algorithms, metaSPAdes, Maxbin2, prokka, BLAST and deepARG. I packaged all the algorithms in separate Singularity containers for version control and consistency of execution environment. All three pipelines were developed and launched using the Nextflow workflow management system. Using the respective data for each pipeline, the pipelines managed to run in an automated manner on a local university server and a PBSPro cluster. All three independent containerized pipelines were successfully implemented. Future work will include multi-stage development of containers, robust validation of the pipelines, and adding features like optional software algorithms to the pipelines. AFRIKAANS OPSOMMING: Vooruitgang in volgende generasie volgordebepalingtegnologieë het die ondersoek van mikrobiese genetiese materiaal, direk vanaf 'n biologiese monster, moontlik gemaak sonder dat dit nodig is om die mikrobe kweek. Dit het die veld van metagenomika aangedryf en het tegnieke soos amplikon-, haelgeweer-metagenomiese- en metatranskriptomiese volgordebepaling aan die voorpunt van die ondersoek van komplekse mikrobiese gemeenskappe gestel. Data van hierdie tegnieke is baie groot, meer as gigagrepe (GG) groot, en moet dikwels op hoëwerkverrigtingklusters en bedieners ontleed word. Hierdie berekeningsvereistes lei tot 'n probleem van veranderlikheid in rekenaaromgewings in, wat verder lei tot onreproduseerbaarheid. Die data behels baie dimensies en veelsydige komposisie, en daar is spesifieke algoritmes wat hierdie eienskappe van die data aanspreek. Die sagtewarealgoritmes word egter gereeld opgedateer, wat verskeie weergawes van dieselfde sagtewarealgoritme verskaf, en dit lei ook tot onreproduseerbaarheid, wat dus die integriteit van die wetenskap beïnvloed. 
Hierdie projek spreek onreproduseerbaarheid van die rekenaarsomgewing aan deur die ontwikkeling van drie onafhanklike berekeningspyplyne, wat ontwerp is om op Unix/Linux-gebaseerde rekenaarkluster en bedieners gebruik te word. Ek het 'n pyplyn ontwikkel wat QIIME2 implementeer, vir die ontleding van amplikonvolgordedata. Vir metatranskriptomiese data het ek 'n pyplyn ontwikkel wat Trinity en sy nutswerkvloei implementeer vir de novo samestelling van transkripsies en differensiële uitdrukkingsanalise. Ek het 'n pyplyn ontwikkel vir die ontleding van haelgeweer metagenomiese data, met die implementering van veelvuldige algoritmes, naamlik metaSPAdes, Maxbin2, prokka, BLAST en deepARG. Ek het al die algoritmes in aparte Singularity-houers verpak vir weergawebeheer en konsekwentheid van uitvoeringsomgewing. Al drie pyplyne is ontwikkel en bekendgestel met behulp van die Nextflow-werkvloeibestuurstelsel. Deur die onderskeie data vir elke pyplyn te gebruik, het die pyplyne daarin geslaag om op 'n outomatiese wyse op 'n plaaslike universiteitsbediener en 'n PBSPro-kluster te loop. Al drie onafhanklike houerpypleidings is suksesvol geïmplementeer. Toekomstige werk sal multi-stadium ontwikkeling van houers insluit, robuuste validering van die pyplyne, en die toevoeging van kenmerke soos opsionele sagteware-algoritmes by die pyplyne. Masters 2024-02-16T11:12:24Z 2024-04-27T01:31:00Z 2024-02-16T11:12:24Z 2024-04-27T01:31:00Z 2024-03 Thesis https://scholar.sun.ac.za/handle/10019.1/130655 en_ZA en_ZA Stellenbosch University xv, 85 pages : illustrations application/octet-stream application/pdf Stellenbosch : Stellenbosch University |
| spellingShingle | Metagenomics High-throughput nucleotide sequencing Microbial genetics Nucleotide sequence Madzime, Ruvarashe Joylyne Development of containerized pipelines for the reproducible analysis of amplicon-, shotgun metagenomic- and metatranscriptomic data. |
| title | Development of containerized pipelines for the reproducible analysis of amplicon-, shotgun metagenomic- and metatranscriptomic data. |
| title_full | Development of containerized pipelines for the reproducible analysis of amplicon-, shotgun metagenomic- and metatranscriptomic data. |
| title_fullStr | Development of containerized pipelines for the reproducible analysis of amplicon-, shotgun metagenomic- and metatranscriptomic data. |
| title_full_unstemmed | Development of containerized pipelines for the reproducible analysis of amplicon-, shotgun metagenomic- and metatranscriptomic data. |
| title_short | Development of containerized pipelines for the reproducible analysis of amplicon-, shotgun metagenomic- and metatranscriptomic data. |
| title_sort | development of containerized pipelines for the reproducible analysis of amplicon shotgun metagenomic and metatranscriptomic data |
| topic | Metagenomics High-throughput nucleotide sequencing Microbial genetics Nucleotide sequence |
| url | https://scholar.sun.ac.za/handle/10019.1/130655 |
| work_keys_str_mv | AT madzimeruvarashejoylyne developmentofcontainerizedpipelinesforthereproducibleanalysisofampliconshotgunmetagenomicandmetatranscriptomicdata |