Development of containerized pipelines for the reproducible analysis of amplicon-, shotgun metagenomic- and metatranscriptomic data.

Thesis (MSc)--Stellenbosch University, 2024.

Bibliographic Details
Main Author: Madzime, Ruvarashe Joylyne
Other Authors: Tromp, Gerard
Format: Thesis
Language: en_ZA
Published: Stellenbosch : Stellenbosch University 2024
Subjects: Metagenomics; High-throughput nucleotide sequencing; Microbial genetics; Nucleotide sequence
access_status_str Open Access
author Madzime, Ruvarashe Joylyne
author2 Tromp, Gerard
collection Thesis
dc_rights_str_mv Stellenbosch University
description Thesis (MSc)--Stellenbosch University, 2024.
format Thesis
id oai:scholar.sun.ac.za:10019.1/130655
institution Stellenbosch University (South Africa)
language en_ZA
last_indexed 2026-06-10T12:45:48.703Z
license_str Other — see source repository
provenance_str_mv Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository
publishDate 2024
publisher Stellenbosch : Stellenbosch University
record_format dspace
source_str SUNScholar — Stellenbosch University Repository
spelling oai:scholar.sun.ac.za:10019.1/130655
Contributors: Tromp, Gerard; Sanko, Tomasz; Stellenbosch University. Faculty of Medicine and Health Sciences. Dept. of Biomedical Sciences. Molecular Biology and Human Genetics.
Degree: Masters
Dates: 2024-02-16T11:12:24Z; 2024-04-27T01:31:00Z; 2024-03
Extent: xv, 85 pages : illustrations
File formats: application/octet-stream; application/pdf

ENGLISH ABSTRACT: Advances in next-generation sequencing technologies have enabled the investigation of microbial genetic material directly from a biological specimen without the need for culturing. This has propelled the field of metagenomics and set techniques like amplicon, shotgun metagenomic and metatranscriptomic sequencing at the forefront of investigating complex microbial communities. Data from these techniques are very large, gigabytes (GB) or more in size, and often need to be analysed on high-performance clusters and servers. These computational requirements introduce a problem of variation in compute environments, which leads to irreproducibility. The data are high-dimensional and compositional, and there are specific algorithms that address these qualities of the data. However, the software algorithms are updated regularly, so multiple versions of the same algorithm are in circulation, and this too leads to irreproducibility, affecting the integrity of science.

This project addresses computational irreproducibility through the development of three independent computational pipelines, designed to be used on Unix/Linux-based clusters and servers. I developed a pipeline implementing QIIME2 for analysing amplicon sequence data. For metatranscriptomic data, I developed a pipeline implementing Trinity and its utility workflow for de novo assembly of transcripts and differential expression analysis. I developed a pipeline for analysing shotgun metagenomic data, implementing multiple algorithms: metaSPAdes, Maxbin2, prokka, BLAST and deepARG. I packaged all the algorithms in separate Singularity containers for version control and consistency of execution environment. All three pipelines were developed and launched using the Nextflow workflow management system. Using the respective data for each pipeline, the pipelines ran in an automated manner on a local university server and a PBSPro cluster. All three independent containerized pipelines were successfully implemented. Future work will include multi-stage development of containers, robust validation of the pipelines, and adding features like optional software algorithms to the pipelines.

The thesis also includes an Afrikaans summary (opsomming) of this abstract.
title Development of containerized pipelines for the reproducible analysis of amplicon-, shotgun metagenomic- and metatranscriptomic data.
topic Metagenomics
High-throughput nucleotide sequencing
Microbial genetics
Nucleotide sequence
url https://scholar.sun.ac.za/handle/10019.1/130655